Sunday, November 22, 2009

Not-a-Bot: Improving Service Availability in the Face of Botnet Attacks

Summary:

This paper introduces a botnet detection and filtering mechanism called Not-a-Bot (NAB), which can help Internet services distinguish human-generated requests from bot-generated ones. NAB works by attesting valid human-generated traffic at its source with a trusted Attester module and then using a Verifier module at the server to filter requests.

The Attester is built on top of the Trusted Platform Module (TPM), a secure cryptoprocessor that is reportedly available in many of today's computer platforms. Human-generated traffic is identified by correlating keyboard and mouse activity with the traffic the host generates. As a result, even the most effective botnet can only generate attested traffic at the rate of human activity.
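To make the attest/verify split concrete, here is a minimal sketch of the idea, not NAB's actual protocol. Several things are assumptions: an HMAC over a shared key stands in for the TPM-backed signature, the one-second activity window is arbitrary, and the rule that each input event backs at most one attestation is a simplification. The names `Attester`, `attest` and `verify` are hypothetical.

```python
import hashlib
import hmac
from typing import Optional


def _sign(key: bytes, request: bytes, ts: float) -> str:
    # HMAC over request + timestamp; a stand-in for a TPM-backed signature (assumption).
    return hmac.new(key, request + repr(ts).encode(), hashlib.sha256).hexdigest()


class Attester:
    """Attests a request only if an unused keyboard/mouse event occurred recently."""

    def __init__(self, key: bytes, window_s: float = 1.0):
        self._key = key            # hypothetical: a real attester keeps keys inside the TPM
        self._window_s = window_s  # hypothetical activity window in seconds
        self._last_input: Optional[float] = None

    def on_user_input(self, ts: float) -> None:
        self._last_input = ts

    def attest(self, request: bytes, ts: float) -> Optional[str]:
        if self._last_input is None or not 0 <= ts - self._last_input <= self._window_s:
            return None  # no recent human activity: refuse to attest
        self._last_input = None  # each input event backs at most one attestation
        return _sign(self._key, request, ts)


def verify(key: bytes, request: bytes, ts: float, sig: Optional[str]) -> bool:
    """Server-side check: accept only requests carrying a valid attestation."""
    return sig is not None and hmac.compare_digest(_sign(key, request, ts), sig)
```

A bot running on the same machine gets no attestation unless it happens to ride on a genuine input event, which is where the "human activity rate" bound comes from.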

NAB has three distinct applications: spam mitigation, DDoS mitigation and click-fraud mitigation. Through experiments, the authors demonstrate the effectiveness of NAB in the tested scenarios. For example, in a spam-mitigation scenario in which the ISP requires all outgoing messages to be attested, NAB cuts down the forwarded spam (false negatives) by 92% while misclassifying only 0.08% of human-generated traffic.

Critique:

It seems that deploying NAB could subject non-NAB users to unfair access and prioritization. It would be interesting to discuss to what degree non-attested human-generated traffic would be disadvantaged.

In addition, it seems that a sophisticated botnet can produce traffic at the rate of human activity. What would be the difference between NAB and a simpler verifier that checks at the server whether the traffic received from a source arrives at too high a rate to be human-generated? I believe this is a standard technique for detecting malicious attacks, and I couldn't really figure out the advantage of NAB in this case.
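For comparison, the simpler server-side check I have in mind might look roughly like this: a per-source sliding-window counter that rejects requests arriving faster than some assumed human rate. The class name and thresholds are hypothetical choices for illustration, not anything from the paper.

```python
from collections import defaultdict, deque


class RateCheck:
    """Reject a source's request if it already sent max_requests within window_s seconds."""

    def __init__(self, max_requests: int, window_s: float):
        self.max_requests = max_requests
        self.window_s = window_s
        self._history = defaultdict(deque)  # source -> timestamps of recently allowed requests

    def allow(self, source: str, ts: float) -> bool:
        q = self._history[source]
        while q and ts - q[0] >= self.window_s:
            q.popleft()  # forget requests that fell out of the window
        if len(q) >= self.max_requests:
            return False  # too fast to plausibly be one human
        q.append(ts)
        return True
```

Note that this check only sees per-source request counts; it has no way to tell whether a given request coincided with keyboard or mouse input on the sending machine.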

Overall, I vote for dropping this paper from the syllabus, mainly because it is probably better suited to a network security class. Students without much security background (like me) will probably have a hard time understanding and critiquing the paper.

Skilled in the Art of Being Idle: Reducing Energy Waste in Networked Systems

Summary:

This is an interesting paper that examines the feasibility of, and the mechanisms needed for, improving the energy efficiency of networked systems by putting end nodes to sleep as much as possible. Many of today's networked computers spend much of their time idle, doing almost nothing, which results in a vast amount of wasted energy. The authors first study and analyze the traffic that an idle system typically sees in home and office environments, and then design a proxy-based architecture based on what they learned in the first stage.

The first observation is that, because idle-time packets arrive so frequently in both home and (especially) office environments, a simple Wake-on-LAN (WoL) scheme will not result in much savings. Having chosen the proxy approach as the better candidate, the authors further analyze and deconstruct the broadcast, multicast and unicast traffic in both home and office settings. The analyzed protocols can be categorized along one dimension as don't-wake, don't-ignore or policy-dependent, and along another as ignorable, handled via mechanical responses, or requiring specialized processing.

For example, don't-wake protocols are those with such high traffic frequency that, if end hosts had to be woken up for each of their packets, a proxy scheme would be as useless as WoL. Based on these categories, the authors propose four proxy designs with different levels of complexity and (environment-dependent) performance gain, and then present a general proxy architecture for this purpose.
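The categorization can be illustrated with a toy decision table. This is a hypothetical sketch: the protocol labels and the action assigned to each are placeholders, not the paper's measured classification, and defaulting unknown traffic to waking the host is my own conservative assumption. It also counts how often the host would be woken for a small made-up trace, compared with waking on every packet as a naive WoL setup would.

```python
from enum import Enum


class Action(Enum):
    IGNORE = "ignore"    # drop the packet, host stays asleep
    RESPOND = "respond"  # proxy answers mechanically on the host's behalf
    WAKE = "wake"        # packet needs the real host


# Illustrative placeholder table, not the paper's classification.
POLICY = {
    "ARP_request_for_host": Action.RESPOND,
    "SSDP_discovery": Action.IGNORE,
    "TCP_SYN_to_listening_port": Action.WAKE,
}


def proxy_action(protocol: str, policy=POLICY, default=Action.WAKE) -> Action:
    # Unknown traffic wakes the host: a conservative assumption of this sketch.
    return policy.get(protocol, default)


def count_wakeups(trace, policy=POLICY) -> int:
    """How many packets in `trace` would force the host to wake up under the proxy."""
    return sum(proxy_action(p, policy) is Action.WAKE for p in trace)
```

For a trace of three ARP requests, two SSDP packets, one TCP SYN and one unknown packet, the proxy wakes the host twice, whereas waking on every packet would wake it seven times.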

Comments:

I very much liked the methodology this paper used to approach the problem: first a detailed study of the underlying interactions, then a well-justified solution. I definitely recommend this paper for the syllabus.

Thursday, November 19, 2009

Cutting the Electric Bill for Internet-Scale Systems

Summary:

This paper investigates the savings that large, geographically distributed Internet-scale systems might achieve by exploiting hourly and geographic variations in electricity prices.

The authors use real data to investigate hour-by-hour price variations between pairs of locations and demonstrate that, in many cases, the distribution of hourly price differences has nearly zero mean and high variance. This is exactly what is needed to exploit price variations dynamically and tune the routing scheme accordingly. A one-way skewed distribution, by contrast, means the more expensive location should probably not be used at all, so there is no need for adaptation.
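The zero-mean/high-variance versus skewed distinction is easy to check for a pair of hourly price series. A minimal sketch follows; the function name and the price numbers in the usage example are made up for illustration and are not the paper's data.

```python
from statistics import mean, pstdev


def price_diff_stats(prices_a, prices_b):
    """Mean, std. dev. and fraction of hours where A is cheaper, for two hourly price series."""
    if len(prices_a) != len(prices_b) or not prices_a:
        raise ValueError("need two non-empty series of equal length")
    diffs = [a - b for a, b in zip(prices_a, prices_b)]
    frac_a_cheaper = sum(d < 0 for d in diffs) / len(diffs)
    return mean(diffs), pstdev(diffs), frac_a_cheaper
```

With hypothetical prices A = [30, 50, 40, 60] and B = [40, 40, 50, 50], the differences have mean 0, standard deviation 10, and A is cheaper half the time, so dynamic routing has something to exploit. Against B' = [20, 30, 25, 35], A is never cheaper (mean difference 17.5), which is the skewed case where one would simply avoid A.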

The paper proposes a bandwidth-constrained routing scheme that adapts to real-time, hour-by-hour price variations. The savings this scheme can achieve depend heavily on the energy elasticity of the distributed system, and the authors investigate the potential savings under different elasticity and power usage effectiveness (PUE) scenarios.
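Why elasticity matters so much can be seen with a toy model. This sketch is a deliberate simplification of my own, not the paper's routing algorithm: a per-site capacity stands in for the bandwidth constraint, distance/latency constraints are ignored, every site stays powered on, and power grows linearly from an assumed idle fraction of peak to full peak power.

```python
def route_load(total_load, sites):
    """Greedily place load on the cheapest sites first, up to each site's capacity.

    sites maps a site name to (electricity_price, capacity); capacity stands in
    for the bandwidth constraint in this simplified sketch.
    """
    assignment = {name: 0.0 for name in sites}
    remaining = total_load
    for name, (_price, capacity) in sorted(sites.items(), key=lambda kv: kv[1][0]):
        if remaining <= 0:
            break
        take = min(capacity, remaining)
        assignment[name] = take
        remaining -= take
    if remaining > 0:
        raise ValueError("not enough capacity for the offered load")
    return assignment


def energy_cost(assignment, sites, idle_fraction, peak_power=1.0):
    """Hourly cost with a linear power model; every site stays powered on.

    idle_fraction is idle power / peak power: 0.0 is perfectly elastic,
    1.0 means power draw does not depend on load at all.
    """
    cost = 0.0
    for name, load in assignment.items():
        price, capacity = sites[name]
        utilization = load / capacity
        power = peak_power * (idle_fraction + (1.0 - idle_fraction) * utilization)
        cost += price * power
    return cost
```

With two hypothetical sites priced 50 and 30 and capacity 100 each, sending all 100 units to the cheaper site costs 30 versus 40 for an even split when idle power is zero, but both cost 80 when idle power equals peak power: shifting load saves nothing if power draw does not follow load.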

Critique:

The paper investigates a very interesting idea. As the authors point out, the savings are heavily influenced by energy elasticity, which has been challenging researchers for a while now. Even though things are improving, a fully elastic system (0% idle power) might be too optimistic for the foreseeable future (unless something revolutionary happens), and the 40% gain under this scenario seems too futuristic.

The 2% gain predicted for a much more realistic scenario is far more believable and could still save a few million dollars for a Google-like system as it exists today. The open question is how much this energy-aware routing conflicts with the distance (latency) optimizations that many Internet systems currently perform, as described in the literature we have read.

Scalable Application Layer Multicast

Summary:

This paper introduces NICE, a new scalable application-layer multicast protocol based on a hierarchical, cluster-based overlay network. Its target applications are what the authors call data stream applications: applications with large multicast groups and low data rates.

The proposed architecture carefully groups the nodes into clusters at different hierarchical levels, such that cluster leaders are also members of clusters at the upper levels while every node is still present at level 0. Each cluster leader is chosen to be at the center of its cluster (with delay as the distance metric), which is crucial for achieving a good stretch (low latency). A pleasant feature of this structure is that, as long as membership does not change, the control overhead can be upper-bounded while good stress and stretch are still achieved.
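The leader choice and cluster-size maintenance can be sketched in a few lines. As I understand the paper, the leader is the graph-theoretic center of its cluster (smallest maximum delay to the other members) and cluster sizes are kept between k and 3k-1; the value k = 3 and the function names below are illustrative assumptions.

```python
def cluster_center(members, delay):
    """Graph-theoretic center: the member with the smallest worst-case delay to the others."""
    return min(members, key=lambda m: max((delay(m, o) for o in members if o != m), default=0.0))


def cluster_action(size, k=3):
    """Keep each cluster's size within [k, 3k - 1]: split when too big, merge when too small."""
    if size > 3 * k - 1:
        return "split"
    if size < k:
        return "merge"
    return "ok"
```

For four hosts on a line at positions 0, 1, 2 and 10 (delay = distance), the host at 2 becomes leader, since its worst-case delay (8) is the smallest.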

The paper then details the major operations NICE must support: joining, cluster maintenance, and host departure with leader selection. Since a new host needs to be directed precisely to the closest cluster, joining can take a long time, and temporary service needs to be provided by cluster leaders at upper levels in the meantime. Comparisons are made mainly against the Narada application-layer multicast system, through both simulations and experiments.
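The join can be viewed as a top-down descent: probe every member of the current cluster, move into the cluster led by the closest one, and repeat until reaching layer 0. This is a simplified sketch under my own assumptions: the `clusters` lookup and the `delay` probe function are hypothetical stand-ins for the protocol's query and probe messages, and failures and temporary service are ignored.

```python
def nice_join(top_layer, top_members, clusters, delay):
    """Find the layer-0 cluster a new host should join (simplified NICE join).

    clusters[(leader, layer)] lists the members of the cluster that `leader`
    heads at `layer`; delay(node) is the new host's measured delay to `node`.
    Returns (leader, members) of the chosen layer-0 cluster.
    """
    leader, members, layer = None, list(top_members), top_layer
    while layer > 0:
        leader = min(members, key=delay)     # probe all members, keep the closest
        layer -= 1
        members = clusters[(leader, layer)]  # descend into the cluster it leads
    return leader, members
```

Each step costs a round of probes, and the number of steps grows with the number of layers, which is where the slow joins come from.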

Critique:

I consider the main contribution of this paper to be the very low control overhead of NICE compared to the older Narada architecture, which has O(N^2) control overhead. This becomes a significant factor for larger groups. One main concern with NICE is the slowness of join operations and its robustness against simultaneous failures of multiple high-level cluster leaders. It would have been interesting to see results showing the effect of nodes failing together at different high levels. Nevertheless, I found the simulations and experiments very detailed and appropriate.

Wednesday, November 18, 2009

A Reliable Multicast Framework for Light-weight Sessions and Application Level Framing

Summary:

This paper introduces SRM (Scalable Reliable Multicast), a scalable framework supporting reliable delivery of application-level frames to multicast groups. SRM only supports reliability in its simplest form, guaranteed delivery, leaving more complex features such as in-order delivery to the application and higher layers.

In order to design a scalable reliable delivery mechanism for multicast, the authors question some of the fundamental design choices of typical unicast protocols such as TCP and make different ones. For example, a receiver-based approach to reliable delivery scales and performs much better than a sender-based approach. Furthermore, naming application-level data units (ADUs) rather than relying on protocol state indicators better fits group members that dynamically join and leave.

SRM is mainly a scalable request/repair mechanism within a multicast group. Receivers that believe they have missed an ADU multicast a repair request, which ideally is answered by a node close to the congested or broken link rather than by the source itself. Much of the analysis goes into tuning the random waits before issuing a request or sending a repair. A key feature of SRM is that members suppress each other's duplicate requests (and repairs), which makes the scheme scalable. The authors analyze and propose dynamic adaptation of the random-wait parameters in order to reduce the number of redundant request/repair packets across varying topologies.
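Request suppression can be sketched with a toy simulation. In the paper, a receiver that detects a loss waits a time drawn uniformly from [C1*d, (C1+C2)*d], where d is its estimated one-way delay to the source. The sketch below keeps only that request timer: repair timers, backoff and the adaptive tuning of C1 and C2 are omitted, the default parameter values are illustrative, and all receivers are assumed to detect the loss at the same instant.

```python
import random


def request_timer(d_source, c1, c2, rng):
    """SRM-style request wait, drawn uniformly from [c1*d, (c1+c2)*d]."""
    return rng.uniform(c1 * d_source, (c1 + c2) * d_source)


def simulate_requests(receivers, source, delay, c1=2.0, c2=2.0, seed=0):
    """Return the receivers that actually multicast a request for one lost packet.

    All receivers detect the loss at time 0; delay(a, b) is a symmetric one-way delay.
    A receiver stays quiet if an earlier request reaches it before its own timer fires.
    """
    rng = random.Random(seed)
    fire_at = {r: request_timer(delay(r, source), c1, c2, rng) for r in receivers}
    senders = []
    for r in sorted(receivers, key=lambda n: fire_at[n]):
        heard_earlier = any(fire_at[s] + delay(s, r) <= fire_at[r] for s in senders)
        if not heard_earlier:
            senders.append(r)
    return senders
```

With c2 = 0 and receivers at distances 10, 11 and 12 along a line from the source, only the nearest one sends a request and the other two are suppressed. Two receivers on opposite sides of the source (at 10 and -10) both send, because neither request reaches the other in time; the randomization (c2 > 0) exists to spread timers apart in exactly such cases.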

Critique:

I think this is an interesting paper to keep in the syllabus; it definitely helps the reader become familiar with the complexities involved in supporting scalable reliable multicast in a network. The simulations demonstrated good performance across varying network topologies, but the increased variance of the delay seems like a warning sign. It would be interesting to discuss the potential problems that a more complex, TCP-like upper-level protocol might face when run over SRM because of these characteristics. Can multicast ever compete with unicast protocols on these features?

Wednesday, November 11, 2009

X-Trace: A Pervasive Network Tracing Framework


Summary:

X-Trace is a tracing framework that goes beyond the single-application, single-protocol, single-layer and single-administrative-domain limitations of other tracing tools such as traceroute. X-Trace works by having the user insert X-Trace metadata along with a request; every X-Trace-enabled layer or device that sees the metadata then logs a report.

A key component of X-Trace is the construction of a task tree that includes all the elements involved in the task and also captures the direction of causality in every relationship. The task tree is constructed from the reports received from the network elements. Administrative domains may be unwilling to share certain information; this is addressed by aggregating reports within each domain and letting each administrative entity decide how much to expose to external users.
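The metadata propagation and tree reconstruction can be sketched as follows. This is loosely modeled on the paper's pushNext/pushDown propagation operations. The report format, the 8-hex-character operation IDs and all function names here are hypothetical, and real X-Trace metadata carries more fields, with reports collected through separate reporting infrastructure.

```python
import uuid
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Metadata:
    task_id: str  # shared by every operation in one task
    op_id: str    # identifies this particular operation


def _new_op_id() -> str:
    return uuid.uuid4().hex[:8]


def start_task(reports, label):
    md = Metadata(uuid.uuid4().hex, _new_op_id())
    reports.append({"task": md.task_id, "op": md.op_id, "parent": None, "edge": "root", "label": label})
    return md


def _propagate(md, edge, reports, label):
    # The receiving element gets fresh metadata and logs a report naming its parent op.
    child = Metadata(md.task_id, _new_op_id())
    reports.append({"task": md.task_id, "op": child.op_id, "parent": md.op_id, "edge": edge, "label": label})
    return child


def push_next(md, reports, label):
    return _propagate(md, "next", reports, label)  # same layer, next hop


def push_down(md, reports, label):
    return _propagate(md, "down", reports, label)  # into a lower layer


def build_task_tree(reports, task_id):
    """Group one task's reports by parent op, giving the edges of the task tree."""
    children = defaultdict(list)
    for r in reports:
        if r["task"] == task_id and r["parent"] is not None:
            children[r["parent"]].append((r["edge"], r["op"], r["label"]))
    return children


def walk(children, op_id, depth=0):
    """Depth-first traversal yielding (depth, edge type, label) for each descendant."""
    for edge, child, label in children.get(op_id, []):
        yield depth + 1, edge, label
        yield from walk(children, child, depth + 1)
```

Because each report names its parent operation, the tree can be rebuilt from reports arriving in any order, as long as none are lost (one of the open issues the authors list).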

The authors give two implementation examples, a web hosting site and an overlay network (based on i3 and Chord), which give strong indications of the usability and effectiveness of the X-Trace framework. The paper also has a good discussion section covering potential drawbacks and showstoppers. Security is again a main concern, since users in a different administrative domain can trigger potentially heavy report generation. Filtering malicious users might alleviate this problem, but it is hard to predict the effects of potential attacks until the system is widely deployed. Other areas that, as the authors indicate, require more investigation are non-tree request structures, partial deployment, report loss and managing report traffic.

Critique:

To me, X-Trace sounds like a neat idea, since the trace travels exactly along with the data. Nevertheless, it still seems to require many modifications at many different levels and at a very large scale if it aims to provide complete visibility. I would expect X-Trace to be adopted much faster at more limited scales.

The paper is well written overall. Since evaluating this system at a larger scale would require much more work, the authors were only able to demonstrate certain case studies, which seems a reasonable approach for this problem. Overall, an interesting paper, but I personally think there are quite a few more interesting papers (on other subjects, such as wireless) that we could have read instead.

NetFPGA: A Tool for Network Research and Education


Summary:

NetFPGA is a platform originally developed as an academic tool to give students hands-on exposure to layer-2/3 networking development. By 2006, NetFPGA-v1 had already been used for this purpose for several semesters, and both v1 and v2 have since been used in multiple academic research projects. The Shunt, "an FPGA-based accelerator for network intrusion prevention", was published in 2007, and there are likely many other projects built on this platform as well.

I will not go through the specifications, as they are detailed in the paper. The second version is clearly more powerful, with several key upgrades. NetFPGA-v2 is equipped with a more powerful FPGA (Virtex VP230) with two on-chip PowerPC cores. The PowerPC cores, or a MicroBlaze soft processor on the FPGA fabric, can be (and are planned to be) used to enable better integration of software and hardware modules. In addition, a PCI form factor makes the system rack-mountable and more portable.

It is hard to tell from the paper how mature the environment is for academic purposes compared to, for example, the Xilinx development platforms used for CS150 here at UC Berkeley. The board is definitely more optimized for router and switch development and is of great value to many research projects.

Critique:

Even though this platform is probably an important one for grad students to be aware of, I vote for removing this paper from the syllabus; I would recommend it as a side or third paper for a class session instead. I think it would be more worthwhile to read one of the interesting use cases that also gives an overview of the NetFPGA platform.

Tuesday, November 10, 2009

A Policy-aware Switching Layer for Data Centers

Summary:

This paper introduces PLayer, a policy-aware switching layer for data centers that enables efficient and flexible deployment of middleboxes. Currently, data centers must install middleboxes such as firewalls and SSL offloaders inline on the network path that flows traverse, in an ad-hoc fashion. This approach is inflexible, often cannot guarantee that traffic actually traverses the intended middleboxes, and requires heavy manual configuration.

PLayer addresses this problem by separating policy from reachability and using off-path middleboxes. The architecture is based on an interconnected layer of policy-aware switches (pswitches). Middleboxes are connected to pswitches (off the data path), and every pswitch forwards frames according to the policies specified by administrators. Administrators specify high-level policies giving the sequence of middleboxes a given type of traffic needs to traverse. PLayer translates these policies into rules that are implemented by the pswitches. Rule tables at the pswitches are managed by a centralized policy controller, which is also responsible for monitoring middleboxes and updating the pswitches on their status.
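The policy-to-rule translation can be sketched as follows. As I read the paper, a pswitch uses a frame's previous hop together with its header fields to decide which middlebox comes next. This toy version is my own simplification: it reduces the traffic selector to a destination port, works with middlebox types rather than instances (no load balancing across instances), and uses hypothetical names throughout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    dst_port: int    # simplified traffic selector (PLayer matches on richer header fields)
    sequence: tuple  # middlebox types to traverse, in order


def compile_rules(policies):
    """Translate policies into (selector, previous hop) -> next hop rules.

    Assumes each middlebox type appears at most once in a sequence, since the
    previous hop is what tells a pswitch how far along the sequence a frame is.
    """
    rules = {}
    for p in policies:
        hops = ("start",) + p.sequence + ("final_destination",)
        for prev, nxt in zip(hops, hops[1:]):
            rules[(p.dst_port, prev)] = nxt
    return rules


def next_hop(rules, dst_port, prev_hop):
    # Traffic that matches no policy is forwarded straight to its destination.
    return rules.get((dst_port, prev_hop), "final_destination")


def trace_path(rules, dst_port):
    hops, prev = [], "start"
    while prev != "final_destination":
        prev = next_hop(rules, dst_port, prev)
        hops.append(prev)
    return hops
```

For a policy sending port-80 traffic through a firewall and then a load balancer, port-80 frames visit both middleboxes in order, while port-22 frames go straight to their destination.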

The paper goes on to describe in detail the pswitch design, the policy specification, and the guarantees maintained under churn. Current data centers can be upgraded to PLayer gradually, with no need for a one-time major restructuring, which is a big advantage.

Critique:

PLayer definitely enables much more flexible and sophisticated use of middleboxes. The authors argue that using off-path middleboxes will not be much of a problem, since the data center is a very low-latency, high-bandwidth environment. I wonder whether the added latency and overhead are really as insignificant as the authors claim. I would vote for keeping a shorter version of this paper in the syllabus.

Internet Indirection Infrastructure

Summary:

This paper introduces the Internet Indirection Infrastructure (i3), an overlay network supporting a rendezvous-based communication abstraction between senders and receivers. In the i3 overlay, a sender transmits a packet to an identifier rather than to a receiver; the i3 server responsible for that identifier forwards the packet to interested receivers according to previously registered triggers (a kind of callback). i3 is built on top of the Chord overlay network, which supports scalable, balanced lookups of identifiers. i3 can efficiently support multicast, anycast and mobility simultaneously, without involving the sender, and this is the major contribution of the paper.
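Trigger matching on a single i3 server can be sketched as follows. As I understand the paper, a packet matches a trigger when their identifiers agree exactly on a fixed-length prefix, and among those candidates the longest prefix match wins. Multicast falls out of several receivers inserting triggers with the same identifier, and anycast falls out of receivers sharing the exact-match prefix. The 16-bit identifiers and 8-bit exact-match length are toy sizes I chose for readability (i3's identifiers are much longer), the Chord lookup that finds the responsible server is omitted, and delivering to all tied triggers is an assumption of this sketch.

```python
ID_BITS, EXACT_BITS = 16, 8  # toy sizes; i3 itself uses much longer identifiers


def common_prefix_len(a: int, b: int, bits: int = ID_BITS) -> int:
    """Number of leading bits on which two identifiers agree."""
    return bits - (a ^ b).bit_length()


class I3Server:
    def __init__(self):
        self.triggers = []  # list of (identifier, receiver address)

    def insert_trigger(self, ident: int, addr: str) -> None:
        self.triggers.append((ident, addr))

    def send(self, ident: int):
        """Receivers whose triggers match: exact on the first EXACT_BITS, then longest prefix."""
        cands = [(common_prefix_len(ident, t), addr) for t, addr in self.triggers]
        cands = [(n, addr) for n, addr in cands if n >= EXACT_BITS]
        if not cands:
            return []  # no matching trigger: the packet is dropped
        best = max(n for n, _ in cands)
        return [addr for n, addr in cands if n == best]
```

Two receivers that insert triggers with the same identifier both receive packets sent to it (multicast), while a packet sent to an identifier sharing only the 8-bit prefix reaches whichever receiver's trigger is the closest match (anycast).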

Using a stack of identifiers instead of a single identifier gives the i3 network special flexibility. Senders send a stack of identifiers along with the packet, which can be used, for example, to provide backup triggers that improve the robustness of the system. Identifier stacks also enable heterogeneous multicast, since receivers can specify the set of services or servers a packet needs to visit before arriving at the receiver (and therefore enable different kinds of processing on the packet).
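Identifier stacks can be illustrated with a small resolver. As I understand the paper, a trigger can map an identifier to a stack of further identifiers and/or addresses, and forwarding repeatedly replaces the identifier at the top of the packet's stack with the matching trigger's stack. In this sketch everything is an assumption: the `("id", x)` / `("addr", host)` tuple encoding, the `route` name, the single trigger per identifier, and the simplification that a host which receives the packet immediately forwards it on with the remaining stack.

```python
def route(stack, triggers, max_steps=32):
    """Resolve an identifier stack and return the hosts the packet visits, in order.

    triggers maps an identifier to a list of stack entries, each ("id", x) or ("addr", host).
    """
    stack = list(stack)
    visited = []
    for _ in range(max_steps):
        if not stack:
            return visited
        kind, value = stack.pop(0)
        if kind == "addr":
            visited.append(value)  # deliver to this host, which passes the rest of the stack on
            continue
        if value not in triggers:
            return visited  # no matching trigger: the packet is dropped
        stack = list(triggers[value]) + stack  # replace the identifier with the trigger's stack
    raise RuntimeError("too many forwarding steps (possible trigger loop)")
```

For example, if transcoder T registers trigger t -> T and receiver R registers trigger x -> [t, R], a packet sent to x visits T and then R, without the sender knowing that any transcoding happens.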

An important technique in the i3 design is the use of public and private triggers. A public trigger is known to everyone and can be used by anyone, for example to access a web server. Private triggers, on the other hand, are usually short-lived and are used for individual data transmissions between nodes. This feature is used to improve routing efficiency: a receiver can place its private trigger on an i3 server close to it, alleviating the triangle routing problem. The length and secrecy of private triggers also help with security issues such as eavesdropping.

Critique:

The authors use simulations and limited experiments to show the feasibility and performance of i3. An interesting result is that a receiver can improve end-to-end latency by obtaining only 16-32 samples from the i3 servers (in order to identify the closer ones). I wonder what happens when the multicast group is geographically widespread. A comparison of this case with IP multicast would have been interesting.

This paper introduces a very nice communication abstraction layer and I vote for keeping it in the syllabus.