Sunday, November 22, 2009

Not-a-Bot: Improving Service Availability in the Face of Botnet Attacks

Summary:

This paper introduces a botnet detection and filtering mechanism called Not-a-Bot (NAB) which can help Internet services distinguish human-generated requests from bot-generated ones. NAB works by authenticating valid human-generated traffic at the source of the request via a trusted Attester module, and then filtering requests at the server with a Verifier module.

The Attester is built on top of the Trusted Platform Module (TPM), a secure cryptoprocessor that is supposedly available on many computer platforms today. Human-generated traffic is attested by correlating recent mouse or keyboard activity with the outgoing request, so even the most effective malicious botnets can generate attested traffic only at the rate of actual human activity.
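
A toy sketch of the attest-then-verify idea, using a shared HMAC key in place of TPM-backed signing and a hypothetical one-second activity window (this is my own illustration, not the paper's actual protocol):

    import hmac, hashlib, time

    ATTESTER_KEY = b"tpm-sealed-key"   # stand-in for a key the TPM would protect
    ACTIVITY_WINDOW_S = 1.0            # assumed: how recent input must be to count as "human"

    class Attester:
        def __init__(self):
            self.last_input_ts = 0.0   # updated by keyboard/mouse event handlers

        def note_input_event(self):
            self.last_input_ts = time.time()

        def attest(self, request_bytes: bytes):
            """Sign the request only if it closely follows human input activity."""
            if time.time() - self.last_input_ts > ACTIVITY_WINDOW_S:
                return None            # no recent activity: refuse to attest
            return hmac.new(ATTESTER_KEY, request_bytes, hashlib.sha256).digest()

    def verify(request_bytes: bytes, tag) -> bool:
        """Server-side verifier: accept only requests carrying a valid attestation."""
        expected = hmac.new(ATTESTER_KEY, request_bytes, hashlib.sha256).digest()
        return tag is not None and hmac.compare_digest(tag, expected)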

The three distinct applications of NAB are spam mitigation, DDoS mitigation, and click-fraud mitigation. The authors demonstrate the effectiveness of NAB in these scenarios through experiments. For example, in a spam-mitigation scenario where the ISP requires all outgoing messages to be attested, NAB cuts the forwarded spam (false negatives) by 92% while misclassifying only 0.08% of human-generated traffic.

Critique:

It seems that by deploying NAB, non-NAB users could suffer from unfair access and lower priority. It would be interesting to discuss and understand to what degree non-attested human-generated traffic is disadvantaged.

In addition, it seems that a sophisticated botnet could simply produce traffic at the rate of human activity. How would NAB differ from a simpler verifier that checks at the server whether the traffic received from a source is arriving too fast to be human-generated? I believe this is a standard attack-detection technique, and I couldn't really see NAB's advantage in this case.

Overall, I vote for dropping this paper from the syllabus, mainly because it is probably better suited to a network security class. Students without much security background (like me) will probably have a hard time understanding, let alone critiquing, the paper.

Skilled in the Art of Being Idle: Reducing Energy Waste in Networked Systems

Summary:

This is an interesting paper that examines the feasibility of, and the mechanisms needed for, improving the energy efficiency of networked systems by putting end-hosts to sleep as much as possible. Many networked computers spend much of their time idle, doing almost nothing, which wastes a vast amount of energy. The authors first study and analyze the traffic an idle system typically sees in home and office environments, and then design a proxy-based architecture based on what they learn from that first stage.

The first observation is that, because of the high frequency of packets arriving at idle hosts both at home and (especially) in offices, a simple wake-on-LAN (WoL) scheme would not yield much savings. After settling on a proxy-based approach as the better candidate, the authors analyze and deconstruct the broadcast/multicast/unicast traffic in both home and office settings. The analyzed protocols can be categorized as don't-wake / don't-ignore / policy-dependent along one dimension, and as ignorable / handled via mechanical responses / requiring specialized processing along another.

For example, don't-wake protocols are those with such high traffic frequency that waking the end-host for each packet would make a proxy as useless as WoL. Based on these categories, the authors propose four proxy designs with different levels of complexity and (environment-dependent) performance gain, and then present a general proxy architecture for this purpose.
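
The classification essentially boils down to a per-protocol policy table that the proxy consults for every packet it sees on behalf of a sleeping host. A hypothetical sketch (the protocol names and their categories are illustrative, not the paper's exact lists):

    # Map traffic types seen for a sleeping host to proxy actions.
    POLICY = {
        "ARP":       "respond",   # answerable with a mechanical proxy response
        "ICMP_ECHO": "respond",
        "SSH_SYN":   "wake",      # someone actually wants the host: wake it up
        "NETBIOS":   "ignore",    # chatty broadcast traffic: safe to drop
    }

    def proxy_decision(packet_type: str) -> str:
        """Return "respond", "wake", "ignore", or "policy" (site-dependent handling)."""
        return POLICY.get(packet_type, "policy")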

Comments:

I very much liked the methodology this paper used in approaching the problem: first a detailed study of the underlying interactions, and then a well-justified solution. I definitely recommend this paper for the syllabus.

Thursday, November 19, 2009

Cutting the Electric Bill for Internet-Scale Systems

Summary:

This paper investigates the savings that large, Internet-scale distributed systems might achieve by exploiting geographic variation in hourly electricity prices.

The authors use real market data to investigate hour-by-hour variations between pairs of locations, and show that in many cases the distribution of the hourly price difference is nearly zero-mean with high variance. This is exactly what is needed to dynamically exploit the price variation and tune the routing accordingly; a one-sided (skewed) distribution would mean the pricier location should simply be avoided, with no need for adaptation.

The paper proposes a bandwidth-constrained routing scheme that adapts to hour-by-hour price variations in real time. The achievable savings depend heavily on the energy elasticity of the distributed system, so the authors evaluate the potential savings under different elasticity and power-usage-efficiency scenarios.
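
The core scheduling decision can be sketched as a greedy hourly assignment that favors cheap locations subject to latency and capacity limits (illustrative only; the paper's formulation also accounts for things like 95th-percentile bandwidth billing and energy elasticity):

    def assign_load(total_load, sites, price, capacity, max_latency, latency):
        """Greedy hourly plan: fill the cheapest acceptable sites first.
        price[s] is the current energy price at site s; latency[s] is the latency
        from site s to the client region being served."""
        plan, remaining = {}, total_load
        for site in sorted(sites, key=lambda s: price[s]):
            if latency[site] > max_latency or remaining <= 0:
                continue
            take = min(capacity[site], remaining)
            plan[site] = take
            remaining -= take
        return plan, remaining   # remaining > 0 means demand exceeded usable capacity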

Critique:

The paper investigates a very interesting idea. As the authors point out, the savings are heavily influenced by energy elasticity, which has been challenging researchers for a while now. Even though things are improving, a fully energy-elastic system (one drawing essentially no power when idle) is probably too optimistic for the near future (unless something revolutionary happens), and the ~40% gain under that scenario seems too futuristic.

The 2% gain predicted for the much more realistic scenario is far more believable, and could still amount to a few million dollars for a Google-like system at its current scale. The open question is how much this energy-aware routing conflicts with the distance (latency) optimizations that many Internet systems already perform, as we have seen in the literature!

Scalable Application Layer Multicast

Summary:

This paper introduces a new scalable application-layer multicast protocol, NICE, which is based on a hierarchical, cluster-based overlay network. The target applications of this architecture are what the authors call data-stream applications: applications with large multicast group sizes and low data rates.

The proposed architecture carefully groups nodes into clusters at different hierarchical layers, such that cluster leaders are also members of clusters at higher layers while every node remains a member at layer 0. Each cluster leader is chosen to be at the center of its cluster (with delay as the distance metric), which is crucial for achieving a good stretch metric (low latency). A pleasant feature of this structure is that, in steady state, the control overhead can be upper-bounded while still achieving good stress and stretch metrics.
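
Leader selection is essentially a graph-center computation over the cluster's pairwise delays; a minimal sketch:

    def pick_cluster_leader(members, delay):
        """Pick the cluster's center: the member whose worst-case delay to any other
        member is smallest (delay[a][b] is the measured delay; illustrative only)."""
        return min(members,
                   key=lambda m: max((delay[m][n] for n in members if n != m), default=0.0))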

The major operations NICE must support are then detailed: join, cluster maintenance, and host departure/leader selection. Since a new host needs to be directed precisely to the closest cluster, joining can take a long time, and temporary service must be provided by higher-layer cluster leaders in the meantime. Host departure and leader re-election are detailed as well. Comparisons are made mainly against the Narada application-layer multicast system, through both simulations and experiments.

Critique:

I consider the main contribution of this paper to be the remarkably low control overhead of NICE compared to the older Narada architecture, which had O(N^2) control overhead. This becomes a significant factor at larger group sizes. The main concerns with the NICE architecture are the slowness of join operations and the robustness against simultaneous failures of multiple high-level cluster heads. It would have been interesting to see results showing the effect of nodes at different high layers failing together. Nevertheless, I found their simulations and experiments detailed and appropriate.

Wednesday, November 18, 2009

A Reliable Multicast Framework for Light-weight Sessions and Application Level Framing

Summary:

This paper introduces SRM (Scalable Reliable Multicast), a scalable framework supporting reliable delivery of application-level frames to multicast groups. SRM supports reliability only in its simplest form, guaranteed delivery, leaving more complex features such as in-order delivery to the application and higher layers.

To design a scalable reliable delivery mechanism for multicast, the authors question some of the fundamental design choices of typical unicast protocols such as TCP and make different ones. For example, receiver-driven reliability scales and performs much better than sender-driven approaches. Furthermore, naming application-level data units (ADUs) rather than protocol state better fits the dynamic joining and leaving of group members.

SRM is mainly a scalable request/repair mechanism within a multicast group. Receivers that believe they have missed an ADU request a repair, which is ideally handled by a node close to the congested or broken link rather than by the source itself. Much of the analysis goes into tuning the random waits before issuing a request or sending out a repair. A key feature of SRM is that nodes suppress each other's requests, which is what makes the scheme scalable. The authors analyze and propose dynamic adaptation of the random-wait parameters to minimize the number of redundant request/repair packets across varying topologies.
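
The request and repair timers are drawn uniformly from intervals scaled by the node's estimated distance to the original source (for requests) or to the requester (for repairs); a sketch with placeholder constants (the paper adapts C1, C2, D1, D2 dynamically):

    import random

    def request_timer(d_to_source, C1=2.0, C2=2.0):
        """Wait a random time before multicasting a NACK; hearing someone else's request
        for the same data first suppresses (and backs off) our own."""
        return random.uniform(C1 * d_to_source, (C1 + C2) * d_to_source)

    def repair_timer(d_to_requester, D1=1.0, D2=1.0):
        """Nodes holding the data wait similarly, so a node close to the requester
        usually answers first and suppresses the other would-be repairers."""
        return random.uniform(D1 * d_to_requester, (D1 + D2) * d_to_requester)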

Critique:

I think this is an interesting paper to keep in the syllabus; it definitely helps the reader get familiar with the complexity that scalable multicast support requires in a network. The simulations demonstrate good performance across varying network topologies, but the increased delay variance sounded like an alarming hint. It could be interesting to discuss the problems a more complex, TCP-like upper-layer protocol might face when run over SRM because of these characteristics. Can multicast ever compete with unicast protocols on these features?

Wednesday, November 11, 2009

X-Trace: A Pervasive Network Tracing Framework


Summary:

X-Trace is a tracing framework that goes beyond the single-application/protocol/layer/administrative-domain limitations of other tracing tools such as traceroute. X-Trace works by having the user insert X-Trace metadata along with a request into the system; every X-Trace-enabled layer and device then logs a report when it sees the metadata.

A key component of X-Trace is the generation of a task tree that includes all the elements involved in the task and captures the direction of causality in every relationship. The task tree is constructed from the reports received from the network elements. One concern is that administrative regions may be unwilling to share certain information; this is addressed by aggregating the reports within each region and letting the administrative entities decide how much to expose to external users.
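
As a toy sketch of how metadata propagation builds the tree (X-Trace's real metadata and its pushNext/pushDown operations carry more structure; the field names below are illustrative):

    import os

    class XTraceMetadata:
        def __init__(self, task_id, op_id, parent_id=None):
            self.task_id, self.op_id, self.parent_id = task_id, op_id, parent_id

    def new_op_id():
        return os.urandom(4).hex()

    def push_next(md):
        """Propagate to the next hop in the same layer: new operation, causal edge from the current one."""
        return XTraceMetadata(md.task_id, new_op_id(), parent_id=md.op_id)

    def push_down(md):
        """Propagate to the layer below (e.g. HTTP -> TCP), again recording causality."""
        return XTraceMetadata(md.task_id, new_op_id(), parent_id=md.op_id)

    def report(md, element_name, reports):
        """Each X-Trace-enabled element logs (task, parent -> op); offline, these edges form the task tree."""
        reports.append((md.task_id, md.parent_id, md.op_id, element_name))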

The authors give two implementation examples, a web hosting site and an overlay network (based on i3 and Chord), which give strong indications of the usability and effectiveness of the X-Trace framework. The paper also has a nice discussion section on potential drawbacks and showstoppers. Security is again a main concern, since users outside an administrative boundary could trigger heavy report generation. Filtering malicious users might alleviate this, but it is hard to predict the effects of attacks until the system is widely deployed. Other areas the authors indicate need more investigation are non-tree request structures, partial deployment, report loss, and managing report traffic.

Critique:

To me, X-Trace sounds like a neat idea, since the trace travels exactly along with the data. Nevertheless, it still seems to require a lot of modifications at many different levels, and at a very large scale if it is aiming for complete visibility. I would think that at more limited scales X-Trace could be adopted much faster.

Overall the paper is well written. Since evaluating this system at a larger scale would require much more work, the authors were only able to demonstrate certain case studies, which seems a reasonable approach for this problem. Overall an interesting paper, but I personally think there are quite a few more interesting papers (on different subjects, such as wireless) that we could have read instead.

NetFPGA: A Tool for Network Research and Education


Summary:

NetFPGA is a platform originally developed as an academic tool to give students hands-on exposure to layer-2/3 networking development. NetFPGA-v1 has already (as of 2006) been used for this purpose for several semesters, and both v1 and v2 have also been used in multiple academic research projects. The Shunt project, "an FPGA-accelerator for network intrusion prevention," was published in 2007, and there are presumably many other projects built on this platform as well.

I will not go through the specs, as they are detailed in the paper. The second version is obviously more powerful, with several key upgrades. NetFPGA-v2 is equipped with a more powerful FPGA (Virtex VP230) containing two on-chip PowerPCs. The PowerPCs, or a MicroBlaze soft core on the FPGA fabric, can be (and are planned to be) used to enable better integration of software and hardware modules. In addition, a PCI form factor makes the system rack-mountable and more portable.

It is hard to tell from the paper how mature the environment is for academic use compared to, for example, the Xilinx development platforms used for CS150 here at UC Berkeley. The board is definitely more optimized for router and switch development and is of great value to many research projects.

Critique:

Even though this platform is probably an important one for grad students to be aware of, I vote for removing this paper from the syllabus; I would instead recommend it as a side or third paper for a class session. I think it would be more worthwhile to read one of the interesting use cases, which would also provide an overview of the NetFPGA platform.

Tuesday, November 10, 2009

A Policy-aware Switching Layer for Data Centers

Summary:

This paper introduces PLayer, a policy-aware switching layer that enables efficient and flexible deployment of middleboxes in data centers. Currently, data centers install various middleboxes, such as firewalls and SSL offloaders, implicitly on the paths that flows traverse, in an ad-hoc fashion. This approach is inflexible, usually cannot guarantee that traffic actually passes through the middleboxes, and requires heavy manual configuration.

PLayer addresses this problem by separating policy from reachability and by using off-path middleboxes. The architecture is based on an interconnected layer of policy-aware switches (pswitches). Middleboxes are connected to pswitches (off-path), and every pswitch forwards frames according to policies specified by administrators. Administrators write high-level policies specifying the sequence of services a class of traffic must traverse; PLayer translates these policies into rules implemented by the pswitches. Rule tables at the pswitches are managed by a centralized entity, which is also responsible for monitoring middleboxes and updating pswitches on their status.
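
A minimal sketch of the kind of rule lookup a pswitch performs, with a hypothetical firewall-then-load-balancer policy for web traffic (the real PLayer rule format and frame classification are richer):

    # (previous hop, destination port) -> remaining sequence of services to visit
    RULES = {
        ("core", 80):          ["firewall", "load_balancer", "web_server"],
        ("firewall", 80):      ["load_balancer", "web_server"],
        ("load_balancer", 80): ["web_server"],
    }

    def next_hop(previous_hop: str, dst_port: int) -> str:
        """Forward the frame to the next unvisited service dictated by policy."""
        sequence = RULES.get((previous_hop, dst_port))
        return sequence[0] if sequence else "default_forwarding"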

The paper goes on to a detailed description of the pswitches, the policy specification, and the guarantees under churn. Current data centers can be upgraded to PLayer gradually, with no need for a one-time major restructuring, which is a big advantage.

Critique:

PLayer definitely enables much more flexible and sophisticated use of middleboxes. The authors argue that using off-path middleboxes is not much of a problem, since the data center environment has very low latency and high bandwidth. I wonder whether the latency increase and overhead are really as insignificant as the authors claim. I would vote for keeping a shorter version of this paper in the syllabus.

Internet Indirection Infrastructure

Summary:

This paper introduces the Internet Indirection Infrastructure (i3), an overlay network supporting a rendezvous-based communication abstraction between senders and receivers. In i3, a sender transmits a packet to an identifier rather than to a receiver; the i3 server cluster forwards the packet to interested receivers according to previously registered triggers (a kind of callback). i3 is built on top of the Chord overlay, which provides scalable and load-balanced lookups of the identifiers. i3 can efficiently support multicast, anycast, and mobility simultaneously, without the involvement of the sender, and this is the major contribution of the paper.

Using a stack of identifiers instead of simple scalars gives i3 special flexibility. Senders can attach a stack of identifiers to a packet, which can be used, for example, for backup triggers that improve the robustness of the system. Identifier stacks also enable heterogeneous multicast, since receivers can specify the set of services/servers a packet must visit before arriving at the receiver (and therefore enable different kinds of processing on the packet).

An important technique in the i3 design is the use of public and private triggers. A public trigger is known to everyone and can be used by anyone to, for example, reach a web server. Private triggers, on the other hand, are usually short-lived and are used for individual data transfers between nodes. This feature is used to improve routing efficiency by choosing the i3 server closest to the receiver, alleviating the triangle-routing problem. The length and secrecy of private triggers also help with security issues such as eavesdropping.
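
A stripped-down sketch of the basic trigger mechanism (exact-match identifiers only; real i3 matches on an identifier prefix and also supports the identifier stacks described above):

    from collections import defaultdict

    triggers = defaultdict(list)   # identifier -> receiver addresses

    def insert_trigger(identifier: str, receiver_addr: str):
        """A receiver registers interest by inserting (id, addr); triggers are refreshed periodically."""
        triggers[identifier].append(receiver_addr)

    def send(identifier: str, data: bytes):
        """The sender addresses the identifier, not the receivers; the responsible i3 node
        forwards to every matching trigger, which is how multicast, anycast and mobility fall out."""
        for addr in triggers.get(identifier, []):
            deliver(addr, data)

    def deliver(addr, data):
        print(f"forwarding {len(data)} bytes to {addr}")   # placeholder for actual IP forwarding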

Critique:

The authors use simulations and limited experiments to show the feasibility and performance of i3. An interesting result is that a receiver can substantially improve end-to-end latency by obtaining only 16-32 samples from the i3 servers (in order to identify closer ones). I wonder what happens when the multicast group is geographically widespread? A comparison of this case with IP multicast would have been interesting.

This paper introduces a very nice communication abstraction layer and I vote for keeping it in the syllabus.

Saturday, October 31, 2009

Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications


Summary:

This paper introduces Chord, a distributed lookup protocol designed around a DHT. Even though the function this system supports sounds very simple (given a key, map it onto a node), quite a bit of intelligence is required to design a system that accomplishes this task efficiently, scalably, and resiliently.

As mentioned in the survey paper, Chord's lookup is symmetric and structured, meaning that nodes are of equal importance while data items are deterministically mapped to nodes. To achieve this, Chord uses a variant of consistent hashing that does not require global awareness at every node; instead it proposes a distributed routing table requiring O(log N) state and O(log N) messages to resolve a query, which makes it much more scalable.

In simple terms, each node has a finger table with pointers to other nodes at roughly power-of-two distances along the node identifier circle. Every key is stored at the node whose ID immediately follows it, so each forwarded message roughly halves the distance to the node responsible for the target key. The remaining operations that must be supported are node joins and leaves, which the authors show take O(log^2 N) messages, with high probability, to update all the necessary state.

An important requirement for the correctness of this protocol is the validity of each node's pointer to the next node on the identifier circle (its successor). These pointers guarantee that each step makes progress toward the target, and thereby ensure correctness. For this reason every node periodically runs a stabilization algorithm that maintains the validity of the pointer. Chord further keeps a successor list instead of a single pointer to remain robust against simultaneous node failures.
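
A compact sketch of the lookup logic (joins, stabilization, and the real power-of-two finger construction are omitted; the tiny ring at the end cheats by giving every node a full view):

    M = 6   # identifier bits, so IDs live on a circle of 2**M positions

    def in_interval(x, a, b):
        """True if x lies in (a, b] on the identifier circle."""
        return (a < x <= b) if a < b else (x > a or x <= b)

    class Node:
        def __init__(self, ident):
            self.id, self.finger, self.successor = ident, [], None

        def closest_preceding_finger(self, key):
            for node in reversed(self.finger):           # furthest finger that still precedes key
                if in_interval(node.id, self.id, key) and node.id != key:
                    return node
            return self

        def find_successor(self, key):
            node = self
            # Each hop roughly halves the remaining distance, giving O(log N) hops w.h.p.
            while not in_interval(key, node.id, node.successor.id):
                nxt = node.closest_preceding_finger(key)
                if nxt is node:
                    break
                node = nxt
            return node.successor

    ids = [1, 12, 25, 43, 55]
    nodes = {i: Node(i) for i in ids}
    for idx, i in enumerate(ids):
        nodes[i].successor = nodes[ids[(idx + 1) % len(ids)]]
        nodes[i].finger = [nodes[j] for j in ids if j != i]  # cheat: full view, not real fingers
    print(nodes[1].find_successor(30).id)                    # -> 43, the node storing key 30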

Critique:

The authors test Chord under different stress scenarios to demonstrate its true characteristics. One notable point made in the simulation section is that uniformly random ID assignment causes too much load variance across nodes; virtual nodes should be used at each physical node to alleviate this, which increases the finger table size, but not by too much. I wonder how important this variance is in practice and whether it really needs to be accounted for.

Another point of discussion could be whether Chord can be tuned to exploit an asymmetric underlying structure, e.g., assigning less load to less reliable nodes. Chord is definitely a great paper for this class.



Looking Up Data in P2P Systems


Summary:

This paper surveys the recent DHT-based solutions to the lookup problem in P2P systems. A scalable and robust lookup mechanism is a major challenge and requirement for large-scale P2P systems.

Earlier lookup mechanisms were either structured but asymmetric (each node holds well-defined information about particular other nodes, as in a hierarchy), making them sensitive to failures, or symmetric but unstructured (all nodes equally important from the lookup's point of view), making them inefficient and unable to give guarantees. The more recent proposals (CAN, Chord, Kademlia, Pastry and Tapestry) are all both structured and symmetric, enabling them to be resilient and to offer guarantees at the same time.

Interestingly, the DHT is the key technology that has enabled all these new systems. When an entity wants to access a file (for example), it computes the hash key and calls the lookup(key) function. Every file is stored at the node whose ID (usually chosen randomly) is "closest" to the key, where the definition of closeness depends on the scheme.

The differences among the proposed DHT-based lookup services lie mainly in whether they route in one dimension or several, and in the data structures they use. For example, Chord routes in a single dimension and uses a skiplist-like data structure to store the potential next hops at every node, achieving the O(log N) expected lookup cost common to other one-dimensional routing schemes.

Critique:

This paper gives a very educational and concrete overview of the proposed DHT-based lookup algorithms and offers some insight into the trade-offs and design-parameter choices (especially in the discussion section). It would have been interesting to see additional simulation or experimental results comparing these mechanisms in an actual setting, beyond big-O comparisons, even though that level of comparison is probably not expected from a survey. A great paper to keep in the syllabus indeed.

Thursday, October 29, 2009

Active Network Vision and Reality: Lessons from a Capsule-Based System


Summary:

This paper describes lessons and achievements from experimenting with "active networks" on the ANTS platform. An active network allows applications to receive customized network services through dynamic code import and execution on "active nodes".

The information about customized forwarding routines is injected by end-users via "capsules": regular IP packets with an extra header carrying instructions about the requested forwarding routine. Capsules direct themselves through the network by having the active nodes run their custom forwarding routines; legacy IP routers within the network continue to forward them as regular IP packets.

When the code for a specific forwarding routine is needed but not cached at an active node, it is requested from the previously visited active node. If the code download fails, the capsule is dropped and the end system must recover. The authors call this model of code distribution and execution extensible packet forwarding.
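
A toy sketch of the demand-loading logic (names such as code_id and fetch_code are my own, not the ANTS API):

    code_cache = {}   # code fingerprint -> forwarding routine

    def process_capsule(capsule, previous_node):
        """Run the capsule's forwarding routine, fetching the code from the previous
        active node if it is not cached; drop the capsule if the fetch fails."""
        routine = code_cache.get(capsule.code_id)
        if routine is None:
            routine = previous_node.fetch_code(capsule.code_id)
            if routine is None:
                return None            # download failed: drop; end systems must recover
            code_cache[capsule.code_id] = routine
        return routine(capsule)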


Wednesday, October 28, 2009

Resilient Overlay Networks


Summary:

Resilient Overlay Network (RON) is an application-layer overlay on top of Internet routing that allows distributed Internet applications to detect and recover from path outages (or serious performance degradation) more quickly.

The main goal of RON is to protect applications running on a group of cooperating nodes against failures of the underlying Internet paths. BGP-4 has been optimized for scalability and performs poorly with respect to this criterion. RON can be far more resilient to faults than BGP-4 largely because it does not have to worry about scalability: it targets groups of 2-50 nodes.

RON achieves this goal by aggressively probing and monitoring the paths among RON nodes. It detects when the default Internet path is the best path and uses it; otherwise data is routed through an intermediate node. RON nodes continuously monitor, measure, and share path properties. An interesting aspect of the architecture is a distributed performance database that supports summarization and efficient data exchange.
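
The core path-selection idea reduces to comparing the direct Internet path against every one-intermediate-hop alternative; a latency-only sketch (RON also maintains loss and throughput metrics, and real latencies are not symmetric):

    def best_ron_path(src, dst, latency, nodes):
        """latency[a][b]: currently probed latency between RON nodes a and b."""
        best = (latency[src][dst], [src, dst])        # default: the direct Internet path
        for hop in nodes:
            if hop in (src, dst):
                continue
            cost = latency[src][hop] + latency[hop][dst]
            if cost < best[0]:
                best = (cost, [src, hop, dst])        # route through a single intermediate node
        return best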

Interestingly, RON control packets are sent along the RON data paths to ensure resilience against the same Internet path failures; the RON routing protocol is therefore itself a client of RON.

Latency, packet loss, and throughput are the three metrics maintained by default for each virtual link within RON, but clients can plug in a different metric and influence routing with the parameters they care about. This is an important characteristic of the RON architecture, enabling tighter integration with applications.

To demonstrate these concepts, two distinct experiments are run over 12-node and 16-node RON deployments, demonstrating fault-avoidance rates of 60-100% with an average recovery time of 18 seconds.

Critique:

RON is an interesting approach to harnessing Internet resources to build a geographically distributed, small (up to ~50 nodes), resilient network. (As the authors themselves note) I wonder what would happen if the density of RON networks grew to a non-negligible portion of the Internet. Would the overhead or the special path optimizations done by RONs start to introduce behaviors in the Internet not seen before?

Another interesting point of discussion is scalability. There seems to be a range of applications in which the number of involved nodes is beyond 50 (perhaps on the order of hundreds) but not high enough to justify a dedicated network. Is there any hope for such applications?

I enjoyed this paper but will vote for dropping it just because I felt it was too lengthy.

Tuesday, October 27, 2009

On Inferring Autonomous System Relationships in the Internet


Summary:

This paper introduces a heuristic algorithm that infers the relationships among Internet Autonomous Systems (ASes). Inter-AS agreements in today's Internet take the form of customer-provider, peering, mutual transit, or mutual backup. This information matters because inter-domain routing (BGP) is policy-based, so connectivity and reachability are not equivalent. Furthermore, ISPs rarely expose their contractual agreements, so such algorithms can be quite useful for better understanding the state of the Internet and improving it.

The algorithm derives an annotated AS graph, with edges labeled provider-customer (or customer-provider), peer-to-peer, or sibling-to-sibling (mutual transit/backup), that is compatible with the BGP routing tables. Each route in a routing table is the result of the export policies recursively enforced by the different ASes. Each route entry can therefore be decomposed into an (optional) uphill path that traverses only customer-to-provider and sibling-to-sibling edges, (optionally) followed by a single peer-to-peer edge, (optionally) followed by a downhill path containing only provider-to-customer or sibling-to-sibling edges.
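
This "uphill, then at most one peer edge, then downhill" pattern (often called valley-free routing) is easy to express as a check; the edge labels below are my own shorthand, not the paper's notation:

    def is_valley_free(edge_labels):
        """c2p = customer-to-provider (uphill), p2c = provider-to-customer (downhill),
        p2p = peer-to-peer, s2s = sibling-to-sibling (allowed in either phase)."""
        phase = "up"
        for lbl in edge_labels:
            if lbl == "s2s":
                continue
            if lbl == "c2p":
                if phase != "up":
                    return False          # cannot climb again after peering or descending
            elif lbl == "p2p":
                if phase != "up":
                    return False          # at most one peer edge, at the top of the path
                phase = "down"
            elif lbl == "p2c":
                phase = "down"
            else:
                return False
        return True

    print(is_valley_free(["c2p", "p2p", "p2c"]))   # True: up, across, then down
    print(is_valley_free(["p2c", "c2p"]))          # False: a "valley"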

Combining this information with the proportionality between an AS's size and its degree, the proposed algorithm infers AT&T's relationships with its neighboring ASes with 90.5% accuracy, and more than 50% of the inferred siblings are confirmed through the WHOIS lookup service (I am not sure this last number indicates good performance on sibling-to-sibling prediction). ISPs can use this information to debug erroneous entries in their BGP routing tables and to plan future contractual agreements.

Critique:

I found this paper interesting but had hoped for a better quantitative evaluation of the importance, relevance, and impact of such algorithms (even though I agree with the general arguments). In general a nice paper, but I would agree to drop it in favor of a more exciting one.

Sunday, October 25, 2009

On Power-Law Relationships of the Internet Topology


Summary:


This paper introduces an interesting approach to characterizing Internet topologies with power laws. The authors argue and demonstrate that simply averaging the parameters of interest does not correctly capture the underlying skewed distributions.

To this end they experimentally identify three power laws (y ∝ x^a) and show that they hold over three instances of the Internet inter-domain topology taken during 1998 and a single instance of an intra-domain topology from 1995. The lines predicted by the identified power laws have correlation coefficients above 96% with the actual data points. The identified power laws are as follows:
  1. Power-Law 1 (rank exponent): The outdegree of a node is proportional to the rank of the node raised to the power of a constant R. The rank of a node is its index when the nodes are sorted in decreasing outdegree order. The constant R is called the "rank exponent" of the topology, and its value can be measured from the slope of the log-log plot of the two quantities. The rank exponents of the three inter-domain instances are quite close (~-0.80) and different from that of the intra-domain instance (~-0.48), suggesting it is a good metric for distinguishing network topologies.
  2. Power-Law 2 (outdegree exponent): The frequency of an outdegree is proportional to the outdegree raised to the power of a constant O, called the outdegree exponent of the topology. The value of O is quite close for all four instances (still slightly different for the intra-domain instance), suggesting that this power law is a fundamental property of such networks and a good check of how realistic a generated topology is.
  3. Power-Law 3 (eigen exponent): The i-th eigenvalue of the graph is proportional to i raised to the power of a constant E, where the eigenvalues are sorted in decreasing order. E is called the eigen exponent of the topology; it again differs significantly between the inter-domain instances (~-0.48) and the intra-domain one (-0.177), suggesting it is a good metric for distinguishing the differences among networks.
In addition to the above three power laws, the authors identify another one but refrain from calling it a law because of the limited number of sample points:
  • Approximation 1 (hop-plot exponent): The total number of pairs of nodes P(h) within h hops is proportional to h raised to the power of a constant H. The hop-plot exponent H can again be measured from the slope of the log-log plot when h is much less than the diameter of the network.
The hop-plot exponent is especially interesting since it gives a better tool for estimating average neighborhood size than the traditional approach based on average node degree. The advantage comes from the fact that simple averaging assumes a uniform outdegree, while the hop-plot-based estimate better captures the underlying skewed distribution.
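
As a rough illustration of how such an exponent is extracted, a least-squares fit on the log-log data for the rank exponent (the outdegrees below are made up; the paper uses the actual AS and router datasets):

    import numpy as np

    def rank_exponent(outdegrees):
        """Fit log(d_v) ~ R*log(r_v) + c; the slope R is the rank exponent."""
        d = np.sort(np.asarray(outdegrees, dtype=float))[::-1]   # decreasing outdegree
        r = np.arange(1, len(d) + 1)                             # rank of each node
        R, _ = np.polyfit(np.log(r), np.log(d), 1)
        corr = np.corrcoef(np.log(r), np.log(d))[0, 1]
        return R, corr

    R, corr = rank_exponent([12, 9, 7, 5, 4, 3, 3, 2, 2, 2, 1, 1, 1, 1, 1])
    print(R, corr)   # slope (rank exponent) and how close the log-log points are to a line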

The identified power-laws can be used to verify the validity of a simulation model, improve the protocol analysis and interpolate/estimate the topological parameters of interest.

Critique:

The paper gives some insight into the underlying effects that give rise to the identified power laws, which is quite interesting. After this first significant step of identifying the power laws, a more detailed study of those underlying effects seems like the natural next step.

Improving this understanding of the power laws will also require access to many more sample points and network instances (even though collecting them is a tedious process). I am very curious what the results of studying these power laws over a richer set of network instances have been. I believe this is a great paper to keep in the syllabus.

Friday, October 16, 2009

White Space Networking with Wi-Fi like Connectivity

This paper introduces WhiteFi, a Wi-Fi-like network architecture designed to operate in the UHF white space bands. WhiteFi is the first network prototype demonstrating the feasibility of Wi-Fi-like networking in the TV bands.

The incumbents in the UHF bands are (1) TV stations, whose usage varies strongly across space, and (2) wireless microphones, whose usage varies more over time. Any white-space network therefore needs to handle spatial and temporal variation in channel availability, in addition to exploiting contiguous chunks of free spectrum.

WhiteFi makes the following three key contributions to address the distinct characteristics of white-space networking:
  1. Channel assignment algorithm: Channel assignment in the UHF bands is much harder since the network must consider all the clients' observations. WhiteFi gathers observations from all clients, including which UHF channels are available to them and the airtime used in those channels by other white-space devices, and defines a new "multichannel airtime metric," which simulations and experiments show to be a good indicator of which channel and width should be used for communication.
  2. Access point discovery: The paper introduces a novel time-domain analysis technique called Signal Inspection before Fourier Transform (SIFT) to enable faster detection of the channel width used by an AP and to measure the medium utilization needed by the channel assignment algorithm. SIFT uses the packet duration and the time between a packet and its ACK to estimate the channel width, as the two are inversely related (for constant packet sizes). The authors further introduce two AP discovery algorithms based on SIFT, Linear-SIFT and Jump-SIFT, and show their superiority over traditional search models.
  3. Handling disconnection: To avoid any interference with incumbents, a client immediately vacates its channel when it detects their presence and starts chirping a special signal on a pre-agreed backup channel, containing information about its channel availability map. The access point periodically checks this backup channel and, if needed, tunes its radio to decode the lost client's packets; the network can then re-establish itself on a good operational channel.
The authors use extensive simulations with the QualNet package, along with a prototype implementation, to validate the accuracy of SIFT and the proposed channel metric as well as the overall performance of the system.

The paper introduces many clever solutions to the problems one encounters in white-space networking and does a great job of combining them. I felt the disconnection-handling algorithm was a bit too sensitive to corner cases and still somewhat immature. The need for the AP to switch its channel to decode a lost client's packets while the rest of the clients are still running could significantly degrade the quality of service users experience, especially if it happens often or there are false alarms.

A very interesting paper, and indeed worth keeping in the syllabus.



Thursday, October 8, 2009

A Performance Comparison of Multi-Hop Wireless Ad Hoc Network Routing Protocols

The goal of this paper is to provide a fair performance comparison among four major ad hoc routing protocols (as of 1998) under different mobility conditions. The protocols under study are DSDV, TORA, DSR and AODV.

The comparisons are done through simulation, and one of the main contributions of this paper is the extension of the ns simulator to better model node mobility, physical-layer effects, radio interfaces, and the 802.11 MAC protocol. The four protocols are tested over 210 scenario files (10, 20, or 30 source nodes over 70 different movement patterns) with maximum speeds of 1 m/s and 20 m/s, which are fed to the simulator. The pause time is the constant time a node waits between two consecutive moves, and varying it partially characterizes the mobility of the environment.

The metrics used for comparing the protocols are packet delivery ratio, routing overhead, and path optimality. Aside from DSDV, a loop-free distance-vector routing protocol, the other three are on-demand protocols, whose overhead varies as the pause time is changed from 0 s to 900 s. In general, Dynamic Source Routing performs better than the others under most mobility conditions, though the gap narrows in less mobile environments. AODV, which combines ideas from DSDV and DSR to remove the need to carry route information in each packet, performs close to DSR in general, except in the most mobile scenarios, where it does worse.

This paper gives a good overview of the main ad hoc routing protocols and a fair comparison among them, and therefore I recommend it for the syllabus. It would have been interesting to discuss the propagation model, and whether testing in an indoor versus outdoor environment could have produced slightly different results.

Wednesday, October 7, 2009

A High-Throughput Path Metric for Multi-Hop Wireless Routing

This paper introduces, for the first time, the Expected Transmission Count (ETX) metric as a replacement for the typical hop-count metric, and demonstrates its superiority through experiments.

Using hop count as the routing metric in a multi-hop network typically results in routes that are far from optimal: minimizing the hop count favors longer links, which tend to be lossier. Trivial fixes, such as masking errors at the link layer or discarding lossy links, do not work, since the former reduces link throughput and the latter disconnects nodes reachable only through lossy links.

With the insight gained from real experiments, the authors designed ETX to address the following issues:
  1. The wide range of link loss ratios: many links of intermediate quality exist.
  2. The existence of links with asymmetric loss ratios: 22 of 29 nodes had at least one link with more than a 25% difference in delivery ratio between the forward and reverse directions.
  3. The interference between successive hops of multi-hop paths: a two-hop route with perfect per-link delivery achieves only 50% of the throughput of a single link.
The ETX of a link is defined as the inverse of the product of the forward and reverse delivery ratios, which captures the typically asymmetric nature of links and the need for good quality in both directions. The ETX of a route is the sum of the ETX values of its member links. This metric estimates the number of transmissions required to deliver one packet to its destination; using the route with minimum ETX reduces the number of transmissions, which, besides improving throughput, also reduces inter-link interference in the network.

The delivery ratios are estimated by counting periodic broadcast probes emitted by each node. The authors demonstrate the usefulness of ETX by implementing it in the DSDV and DSR routing protocols, showing gains of up to 2x or more when paths are two hops or longer.
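
The metric itself is simple enough to state in a few lines; a sketch with hypothetical delivery ratios showing why ETX can prefer two good hops over one lossy hop:

    def link_etx(d_fwd: float, d_rev: float) -> float:
        """Expected transmissions on one link: 1 / (forward delivery * reverse delivery)."""
        return 1.0 / (d_fwd * d_rev)

    def path_etx(links):
        """Path ETX is the sum of the member links' ETX values."""
        return sum(link_etx(df, dr) for df, dr in links)

    two_hop = path_etx([(0.95, 0.95), (0.95, 0.95)])   # ~2.22 expected transmissions
    direct  = path_etx([(0.50, 0.50)])                 # 4.0: min hop count would still pick this lossy link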

I found this the most interesting paper we have read so far. The authors not only introduced a new concept and thoroughly evaluated it in a real setting, but were also fair in emphasizing the cases where ETX does not outperform other schemes by as much. They carefully studied the effect of packet size and showed that in a better-connected network, where the average hop count decreases, the performance gap closes.

I wonder if the packet-size issue could be alleviated by estimating the actual delivery rate from the one observed with a different packet size, given that the slopes of the curves in figure 14 are quite constant, and further by using different estimates for the forward and reverse directions, since ACK packets are typically smaller. I definitely vote for keeping this paper in the syllabus.

Sunday, October 4, 2009

Architecture and Evaluation of an Unplanned 802.11b Mesh Network

This project evaluates the effectiveness of mesh networks with unplanned placement, omnidirectional antennas, and multi-hop routing in bringing last-mile connectivity to communities. The lack of planning may speed up deployment, but it can also produce seriously flawed topologies and low performance. The authors make their case by experimenting with Roofnet, a network of 37 802.11b nodes with roof-mounted antennas spread over four square kilometers.

The nodes are installed by volunteers, who get to use the network for their own purposes. The network contains four gateways to the global Internet. Routing uses a combination of link-state and source-routing techniques, and the bit rates of the wireless cards are chosen by an algorithm called SampleRate. Broadcast probes are used to gather the metrics required by the routing and bit-rate selection algorithms. The routing algorithm tends to choose links with throughput above 2 Mbps, which are usually not too long, and discards low-quality links. Roofnet achieves an average throughput of 627 kbps and an average delay of 39 ms.

The authors show through simulation that dense deployment is a must for good connectivity and performance. The simulations also show that the network is not very sensitive to removing individual links but can be very sensitive to removing individual nodes: it takes only the failure of the two most important nodes for network performance to drop by 43%!

Even though the authors try, I believe it is very hard to compare this scheme with a single-hop access-point deployment. With single-hop access points, even though more gateways may be needed to cover the whole area, individuals are more likely to experience better throughput and robustness. Another advantage of single-hop access points is that they can be assigned orthogonal channels to minimize interference, and thus do not suffer from it as much as the Roofnet mesh does.

I also recommend another paper published for the same project which studies the physical layer effects in more depth.

Modeling Wireless Links for Transport Protocols

In this paper, Gurtov and Floyd discuss and study the complex interactions between wireless link characteristics and transport protocols. The TCP/IP stack has become the de facto technology in both wired and wireless networks, and transport protocols and wireless technologies should be designed with a better understanding of the interplay between these mechanisms.

Researchers working on new transport protocol designs need simulation models that capture the important and relevant characteristics. The authors argue that many current models suffer from one or more of the following four problems: (1) being unrealistic, (2) having a limited parameter space, (3) being overly realistic, and (4) being underspecified.

For example, it was once believed that the abrupt delay increases caused by link-level error recovery would trigger retransmission timeouts; it was later observed that, because of the relatively high delay variation, retransmission timers are already inflated, so rather than producing spurious timeouts this leads to slow recovery from actual packet losses.

The authors indicate the following to be the main wireless link characteristics which need to be properly addressed in any such model:
  1. Error losses and corruption
  2. Delay variation
  3. Packet reordering
  4. On-demand resource allocation
  5. Bandwidth variation
  6. Asymmetry in Bandwidth and Latency
In addition to the above link characteristics, queue management policies and mobility effects also need to be captured in link models, as they can significantly affect transport performance. The authors discuss the effect of each characteristic on transport protocols, its presence in current and future wireless links, and hints on how it should be modeled.

The paper is a good overview of the complex relationship between link characteristics and transport protocols. Even though the layered structure has been relatively successful at separating functions, the layers are by no means independent of each other, and a deep understanding of their interactions is required for any good design.

As the authors point out, cross-layer techniques are among the most effective methods for improving the interplay between layers, and it could be interesting to discuss the advantages and disadvantages of wider deployment of cross-layer techniques in today's wired/wireless Internet. I vote for keeping this paper in the syllabus.


Wednesday, September 30, 2009

A Comparison of Mechanisms for Improving TCP Performance over Wireless Links

The authors of this paper perform an in-depth experimental study of different proposed solutions for improving TCP performance over lossy wireless links. TCP confuses packet losses due to bad link quality with congestion signals, and therefore performs unnecessary congestion-control procedures that reduce throughput significantly. The authors divide the proposed solutions into three categories:
  1. End-to-end protocols
  2. Link-layer protocols
  3. Split connection protocols
The tests are done over a single-hop wireless link connected to either a single-hop wired connection (LAN topology) or a 16-hop wired connection (WAN topology). Most experiments use a fixed error rate of one error per 64 KB, injected artificially at the receiver. Later, the best-performing representatives of each category are compared against each other while the error rate is varied from one error per 16 KB to one per 256 KB.

Of all the schemes analyzed, LL-SMART-TCP-AWARE performs best, closely followed by LL-TCP-AWARE. LL stands for link-layer reliability/retransmission, which by itself improves TCP performance by almost 2x over raw TCP Reno; but since it does not guarantee in-order delivery, it performs better when paired with a scheme that suppresses the unnecessary duplicate ACKs (hence "TCP-aware"). Adding a form of selective ACKs (SMART) improves this scheme further, though the additional gain is smaller. Some schemes in the other categories, such as end-to-end with Explicit Loss Notification (ELN), which resolves TCP's confusion directly, also show impressive gains over TCP Reno, but in general they are not as effective as reliable link-layer methods.

In general it is very hard to criticize an ACM best-paper-award-winning paper (especially when one of the authors is your professor)! The paper gives a good overview of the possible solutions, makes fair and solid experimental comparisons among them, and draws sound conclusions; I vote for keeping it in the syllabus. As a possible short discussion topic, I was hoping to understand why the errors were injected manually over an almost reliable link rather than actually making the link less reliable, for instance by increasing noise or interference.

Tuesday, September 29, 2009

MACAW: A Media Access Protocol for Wireless LAN's

MACAW is an extension/modification of MACA, the Media Access Collision Avoidance protocol proposed earlier by other researchers. The premise of these protocols is that contention is at the receiver, not the transmitter, pointing to the familiar "hidden" and "exposed" terminal problems; they therefore build on explicit contention management. Interestingly, fifteen years after this paper's publication, most WLAN technologies rely on carrier sense and do not use such RTS/CTS mechanisms, even though they are part of the standards. So it was interesting for me to understand what the assumptions and arguments were back then, and why this approach has not been widely adopted and deployed since.

MACAW was designed for a single-channel WLAN operating at 5 GHz with 256 Kbps radios. The target WLAN had a nano-cellular structure with small cells (6 m diameter) and a single base station per cell; the radios are quite short range, 3-4 meters. The authors make several assumptions about the indoor environment under investigation that sound quite strange given what we know about indoor environments today:
  1. Attenuation was considered a strong function of distance.
  2. The communication links were considered symmetric.
  3. The communication range was modeled as binary in-range/out-of-range, and other effects were ignored.
I am sure they had strong observations supporting these assumptions, but it would have been interesting if the authors had presented the supporting experiments/analysis, as these assumptions are key to the rest of the paper's results (they are generally considered not to hold in today's WLANs, though that could be a consequence of longer-range, wider-bandwidth links!?). Based on these assumptions, the authors extend MACA's RTS-CTS-DATA exchange to MACAW's RTS-CTS-DS-DATA-ACK.

The addition of the ACK ensures that an unreliable link does not wastefully degrade TCP performance, and this can be considered one of the most significant contributions of the paper. Another interesting result is that, to achieve fair bandwidth allocation, contention estimation should be done collectively; an individual approach fails to achieve this. Furthermore, a Multiplicative Increase Linear Decrease (MILD) scheme is proposed in place of Binary Exponential Backoff (BEB) for adjusting the backoff parameter, and is shown to achieve better and fairer throughput when used along with the collective contention-estimation mechanisms.
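
A sketch of the MILD adjustment as I understand it (the 1.5x increase factor, the decrement of one slot, and the bounds are my recollection of the paper's choices, not verified values):

    BO_MIN, BO_MAX = 2, 64   # assumed backoff bounds, in slots

    def mild_update(backoff: float, success: bool) -> float:
        if success:
            return max(BO_MIN, backoff - 1)    # gentle linear decrease on success
        return min(BO_MAX, backoff * 1.5)      # multiplicative increase on collision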

The authors mainly compare their algorithm with MACA, showing that the extra overhead is negligible compared to the gains, and they mention achieving 78% of the link capacity. It is important to note that these calculations were done for 512-byte packets; with smaller packets, such as 128-byte 802.15.4 packets, one would end up with almost 50% of the channel capacity! Nevertheless, if the target system is designed only for large packet sizes, this is a valid claim.

In addition, the authors do not compare their scheme fairly against a CSMA/CA scheme. By discarding carrier sense they fail to investigate the cases in which CSMA can achieve much better performance than explicit contention management, and the overhead of MACAW-like schemes is not negligible in general. Throughout the paper it seemed the authors needed to introduce more and more control packets, adding to the overhead and complexity of the scheme. I believe this complexity, together with the large overhead for small packets, has been the main roadblock to wide deployment of such schemes in today's WLANs.

Monday, September 28, 2009

Understanding TCP Incast Throughput Collapse in Datacenter Networks

Y. Chen, R. Griffith, J. Liu, A. D. Joseph, and R. H. Katz. Understanding TCP incast throughput collapse in datacenter networks. In Proc. Workshop: Research on Enterprise Networking, Barcelona, Spain, Aug. 2009.

This work continues and complements the earlier work by CMU researchers. When I read the CMU paper, it seemed the incast problem was almost solved, or at least that the direction the research community should take to solve it was obvious. To me, the main contribution of the Berkeley incast paper is showing that the problem is much deeper than the CMU paper portrayed, and requires a deeper understanding of the underlying phenomena causing TCP collapse.

In this project the authors run experiments to reproduce the earlier published results and then try to build analytical models that explain the underlying factors contributing to incast behavior. The experimental results show three distinct regions: (1) an initial goodput collapse, (2) a goodput increase, and (3) a second goodput collapse. It would be interesting to know whether other distinct regions appear with a much larger number of servers; the experiments in this paper are limited to at most 50 servers, and the authors do not consider simulations reliable.

The results and explanations the authors arrive at are quite contradictory to the CMU paper. For example, the best-performing strategy they observe is a low-resolution 1 ms RTO timer combined with delayed ACKs! This raises the question of how much of the results of these two papers is environment/network dependent and how much can be applied as a general solution. The analytical models proposed by the authors sounded very interesting but seem to leave room for improvement and further investigation. In general I learned a lot from this paper, though I felt some of the ideas could have been studied better in isolation from each other. The methodology and line of thinking were presented well and were educational.


Safe and Effective Fine-grained TCP Retransmissions for Datacenter Communication

V. Vasudevan, A. Phanishayee, H. Shah, E. Krevat, D. G. Andersen, G. Ganger, G. A. Gibson, and B. Mueller. Safe and Effective Fine-grained TCP Retransmissions for Datacenter Communication. In SIGCOMM ’09: Proceedings of the ACM SIGCOMM 2009 conference on Data communication, 2009.

The usual values of the TCP retransmission timeout (RTOmin of roughly 200 ms or more) cause serious performance degradation in high-bandwidth, low-latency network environments (RTTs of tens to hundreds of microseconds) such as data centers. The so-called TCP incast collapse happens when clients issue barrier-synchronized requests with small per-request data sizes. The authors propose a solution based on Linux high-resolution timers (hrtimers), which enables microsecond resolution in estimating the RTT and RTO and significantly improves performance under incast workloads.

The authors first show clearly, through simulations, an implementation, and traces gathered from a larger-scale storage node, that microsecond granularity is needed to solve the incast problem in future, faster data centers. They emphasize that RTO estimation should be done at the same timescale as the RTTs, and that minimum bounds on the RTO reduce performance. I found it very interesting that with minor modifications to the TCP stack, using the already available generic time-of-day (GTOD) high-resolution timing framework, the authors demonstrate the feasibility of achieving this precision.
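
To make the timescale argument concrete, here is a standard Jacobson-style RTO update expressed in microseconds, with the minimum bound exposed as a parameter; setting it to zero mimics removing RTOmin (a sketch, not the authors' kernel patch):

    ALPHA, BETA, K = 1/8, 1/4, 4   # usual TCP RTO estimator gains

    def update_rto(srtt, rttvar, sample_us, rto_min_us=0.0):
        """Feed one RTT sample (in microseconds) and return updated (srtt, rttvar, rto)."""
        if srtt is None:                                   # first sample
            srtt, rttvar = sample_us, sample_us / 2
        else:
            rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - sample_us)
            srtt = (1 - ALPHA) * srtt + ALPHA * sample_us
        rto = max(rto_min_us, srtt + K * rttvar)
        return srtt, rttvar, rto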

The main hazards of removing RTOmin are spurious retransmissions and bad interactions with clients that use delayed acknowledgments (of, for example, 40 ms), causing unnecessary timeouts. In two experimental setups the authors evaluate these possible drawbacks, showing that in their specific setting removing RTOmin has no performance effect in the wide area, and a limited but noticeable effect when delayed ACKs are not disabled.

It would be interesting to discuss whether these experiments are enough to understand the full range of negative effects such aggressive, fine-grained timing can introduce. For example, if a noticeable fraction of wide-area servers switched to these fine-grained strategies, what would happen in terms of fairness and resource usage within the routers?

I vote for keeping this paper in the syllabus since it provides a clear overview of the TCP incast problem and the proposed solution.



Friday, September 18, 2009

Portland: A scalable fault-tolerant layer 2 data center network fabric

R. N. Mysore, A. Pamboris, N. Farrington, N. Huang, P. Miri, S. Radhakrishnan, V. Subramanya, and A. Vahdat, "PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric," in SIGCOMM '09, pp. 39-50, August 2009.

PortLand is a newly proposed architecture for a scalable, fault-tolerant data center fabric with plug-and-play-like deployment of the switches. PortLand's main distinction compared to other recent proposals is its exploitation of the currently dominant data center topology, the multi-rooted tree. Some of the requirements that the authors envision for a good data center fabric are:
  1. Seamless VM migration
  2. No need for manual configuration by administrators
  3. Efficient communication to any other end-host within DC along any available physical path
  4. No forwarding loops
  5. Rapid and efficient failure detection
Current L2 architectures fail to scale, while adding L3 mechanisms introduces significant management overhead and gives up plug-and-play deployment and seamless migration. PortLand builds on top of internal, location-encoded MAC addresses (Pseudo MACs, PMACs) along with the needed translations at the edge switches. These internal addresses enable efficient, loop-free routing/forwarding within the data center. The following are the main elements of the PortLand architecture:
  1. Logically centralized topology aware Fabric Manager (FM) responsible for assisting with:
    • ARP resolution
    • Fault tolerance
    • Multicast
  2. Positional Pseudo MACs (PMACs)
    • Location encoded
    • Basis for efficient routing/forwarding
    • Basis for seamless VM migration
    • Separates host identity from host location, transparently to the end host
  3. Proxy-based ARP
    • Utilizing Fabric Manager for ARP resolution
    • Falling back on efficient broadcast methods upon failure
  4. Distributed location discovery by Location Discovery Protocol (LDP)
    • Periodic Location Discovery Message (LDM) on all switch ports
    • Distributed location discovery without manual configuration through exploiting the knowledge of the baseline topology
  5. Loop free forwarding
    • Multicast support through a single core switch along with Fabric Manager involvement
    • Loop free: once a packet travels down it never travels up again!
By constraining itself to the multi-rooted tree topology, PortLand achieves better scalability and performance than TRILL and SEATTLE. The designers of SEATTLE assumed that every end host communicates with only a few other end hosts, an assumption the PortLand designers fairly reject (e.g., consider search or MapReduce workloads). Compared to the flat addresses used by TRILL and SEATTLE, PortLand's location-encoded PMAC strategy has significant advantages.
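
To make the PMAC idea concrete for myself, here is a small Python sketch of how such a location-encoded address might be packed and unpacked. The pod(16)/position(8)/port(8)/vmid(16) field layout is my reading of the paper; the helper functions are purely illustrative and not PortLand's actual code.

```python
# Sketch: a 48-bit positional PMAC of the form pod.position.port.vmid, so a
# core switch can forward on the pod prefix alone.

def encode_pmac(pod, position, port, vmid):
    assert pod < 2**16 and position < 2**8 and port < 2**8 and vmid < 2**16
    value = (pod << 32) | (position << 24) | (port << 16) | vmid
    return ":".join(f"{(value >> s) & 0xFF:02x}" for s in range(40, -8, -8))

def decode_pmac(pmac):
    value = int(pmac.replace(":", ""), 16)
    return {"pod": value >> 32, "position": (value >> 24) & 0xFF,
            "port": (value >> 16) & 0xFF, "vmid": value & 0xFFFF}

# Edge switches rewrite the VM's actual MAC (AMAC) to this PMAC on ingress and
# back on egress, so end hosts never see the location-encoded address.
pmac = encode_pmac(pod=2, position=1, port=4, vmid=7)
print(pmac, decode_pmac(pmac))
```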

The authors have clearly explained the elements of their architecture and demonstrated its first-order promises through simulation and an implementation on a small-scale network. I did not understand the advantage of using distributed location discovery versus having the Fabric Manager explicitly send locations to the switches, followed by further verification at each switch. Even though the authors claim and show that FM scalability is not a big issue, I wonder whether semi-centralized approaches could achieve better scalability characteristics and function just as well. In general this is very educational and interesting work, and I vote for keeping it in the syllabus.

Thursday, September 17, 2009

Detailed Diagnosis in Enterprise Networks

S. Kandula, R. Mahajan, P. Verkaik, S. Agarwal, J. Padhye, and P. Bahl, "Detailed diagnosis in enterprise networks," in SIGCOMM '09, New York, NY, USA: ACM, 2009, pp. 243-254.

This is a very recent work in the area of fault diagnosis in enterprise networks. Current diagnosis tools are either concerned with failures at the machine level rather than the application/process level, or require extensive knowledge of the applications, which makes the problem intractable. NetMedic enables detailed diagnosis at the process level with little application-specific knowledge by framing the problem as an inference problem and estimating the statistical failure relationships of entities from past observations.

In order to have better inference performance, a rich set of variables is used for the process state instead of a single health-state variable. The goal is to obtain a detailed diagnosis while the system remains application agnostic. The process NetMedic follows is to detect abnormality, compute edge weights indicating the historical correlation of failures, and then rank the likely causes of the failure. The experiments the authors performed showed the correct cause to always be one of the top five ranked sources, and to be the top-ranked one 80% of the time.
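
Here is how I picture that pipeline, as a grossly simplified Python sketch: the abnormality measure, the edge weight, and the ranking formula below are illustrative stand-ins, not the formulas NetMedic actually uses (its state comparison is considerably richer).

```python
# Simplified sketch of the flow: abnormality detection -> edge weights from
# history -> ranking of likely causes. Requires Python 3.10+ for correlation().
import statistics

def abnormality(history, current):
    """How far the current value is from historical behavior (simple z-score)."""
    mu, sigma = statistics.mean(history), statistics.pstdev(history) or 1.0
    return abs(current - mu) / sigma

def edge_weight(src_history, dst_history):
    """Historical co-movement of two components, used as likelihood of impact."""
    return max(statistics.correlation(src_history, dst_history), 0.0)

def rank_causes(affected, candidates, histories, currents):
    scores = {c: edge_weight(histories[c], histories[affected]) *
                 abnormality(histories[c], currents[c]) for c in candidates}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

histories = {"app": [10, 11, 9, 10, 12],
             "sql": [5, 5, 6, 5, 6],
             "disk": [1, 1, 1, 2, 1]}
currents = {"app": 40, "sql": 30, "disk": 1}
# The SQL process is both abnormal and historically correlated with the app,
# so it should be ranked above the disk as the likely cause.
print(rank_causes("app", ["sql", "disk"], histories, currents))
```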

NetMedic is an interesting application of statistical inference to real networks and can be quite useful if its scalability is further addressed.

Wednesday, September 16, 2009

Floodless in SEATTLE: A Scalable Ethernet Architecture for Large Enterprises

C. Kim, M. Caesar, and J. Rexford. Floodless in SEATTLE: a scalable ethernet architecture for large enterprises. In SIGCOMM, 2008.

This paper introduces SEATTLE, an advanced, scalable layer-2 network architecture with the plug-and-play manageability of Ethernet and the scalability of IP. Ethernet is not a scalable architecture: it uses flooding to locate end hosts, uses spanning-tree-based routing, and relies on broadcast for bootstrapping protocols (ARP and DHCP). Using IP on top of Ethernet to emulate a larger layer 2 has not really worked well, since IP addresses are hierarchical and topology dependent: a time-varying topology requires continuous monitoring and modification of router and subnetting configurations, and non-mobile IP does not support seamless mobility!

The SEATTLE architecture exploits the difference in timescales of the system's dynamics: switches are quite static and far fewer in number than end hosts. Therefore SEATTLE uses a link-state protocol to enable shortest-path routing between switches. For address resolution and location resolution, a one-hop DHT is used to provide the directory service. Furthermore, observing that end hosts usually communicate with only a few others, the authors argue that caching is a good strategy for keeping the routing tables limited in size. The results are verified through simulation and a limited deployment.
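
Here is a small Python sketch of how I imagine the one-hop DHT lookup working, using plain consistent hashing; the class and switch names are mine, and this is the generic pattern rather than SEATTLE's exact scheme.

```python
# Sketch: every switch knows all other switches (from link-state), hashes them
# onto a ring, and the "resolver" storing a host's <MAC -> location> entry is
# the switch closest to hash(MAC) on the ring.
import hashlib

def h(key: str) -> int:
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)

class OneHopDHT:
    def __init__(self, switch_ids):
        # Every switch can compute this ring locally, so lookups need one hop.
        self.ring = sorted((h(s), s) for s in switch_ids)

    def resolver_for(self, host_mac: str) -> str:
        """The switch responsible for storing/answering this host's location."""
        key = h(host_mac)
        for point, switch in self.ring:
            if point >= key:
                return switch
        return self.ring[0][1]   # wrap around the ring

dht = OneHopDHT(["sw-a", "sw-b", "sw-c", "sw-d"])
# The ingress switch publishes the host's location to this resolver; any other
# switch asks the same resolver instead of flooding the network.
print(dht.resolver_for("00:1b:44:11:3a:b7"))
```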

I found this paper to be quite an interesting one, with a full description of the methods used to eliminate the causes of Ethernet's non-scalability. The idea of using a DHT is quite interesting, even though it is not obvious what can happen if one of the switches is compromised! It seems like a lot of trust is placed in possibly hundreds of switches involved in this architecture. It would also be interesting to discuss the similarities and differences between the mobile IP approach to seamless mobility and the SEATTLE approach.





Friday, September 11, 2009

Congestion Control for High Bandwidth-Delay Product Networks

D. Katabi, M. Handley, and C. Rohrs, “Congestion control for high bandwidth-delay product networks,” presented at the ACM SIGCOMM 2002, Pittsburgh, PA.

The eXplicit Control Protocol (XCP) tries to address the low performance of TCP over networks with high bandwidth-delay products. XCP builds on top of several earlier proposals, such as Explicit Congestion Notification (ECN) and CSFQ, to improve link utilization and fairness in such networks. One idea behind this scheme is that implicit signaling is a poor feedback mechanism and that, by adding more bits to the feedback signal, XCP should be able to outperform TCP. Explicit signaling also has other advantages, such as potentially easier policing mechanisms and therefore better security.

The XCP architecture consists of an XCP sender, receiver, and router. In order to keep per-flow processing at the routers minimal, the XCP sender attaches its congestion window size and estimated RTT to each packet, which greatly simplifies the per-packet calculations the router needs in order to achieve per-flow fairness and high utilization.

One of the most interesting aspects of this architecture is the separation of the Efficiency Controller (EC) and the Fairness Controller (FC). Coupled linear congestion control algorithms need to do AIMD in order to converge to both efficiency and fairness: the increase needs to include an additive component, or otherwise the algorithm would not converge to fairness. By separating the EC and FC, the EC can do MIMD and therefore converge much faster to efficient link usage, while the FC still does AIMD separately. This also enables the FC to implement other notions of fairness, such as shadow pricing, independently of the EC.
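
Here is a simplified Python sketch of that split as I understand it; the alpha/beta constants are the ones reported in the paper, but the way the fairness controller hands out feedback below is a coarse per-flow approximation of what XCP really does per packet.

```python
# Sketch: EC computes an aggregate feedback from spare bandwidth and queue;
# FC splits it AIMD-style (equal increases, decreases proportional to rate).
ALPHA, BETA = 0.4, 0.226

def efficiency_controller(capacity, input_traffic, queue, avg_rtt):
    """Aggregate feedback in bytes for the next control interval (one avg RTT)."""
    spare = capacity - input_traffic            # bytes/sec of unused capacity
    return ALPHA * avg_rtt * spare - BETA * queue

def fairness_controller(aggregate, flow_rates):
    """Split the aggregate feedback across flows so they converge to fairness."""
    n, total = len(flow_rates), sum(flow_rates.values())
    if aggregate >= 0:                          # additive increase: equal shares
        return {f: aggregate / n for f in flow_rates}
    return {f: aggregate * rate / total         # multiplicative decrease
            for f, rate in flow_rates.items()}

# 100 Mbit/s link (12.5 MB/s), 90 Mbit/s arriving, small standing queue, 100 ms RTT.
phi = efficiency_controller(capacity=12.5e6, input_traffic=11.25e6,
                            queue=30_000, avg_rtt=0.1)
print(phi, fairness_controller(phi, {"flow1": 9e6, "flow2": 2.25e6}))
```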

The paper is fun to read and interesting. There were several places where I was not fully convinced by the arguments made for choosing parameters. For example, the control interval was chosen to be equal to the average RTT by the following argument:
Estimating parameters over intervals longer than the average RTT leads to sluggish response, while estimating parameters over shorter intervals leads to erroneous estimates.
It is not obvious why one average RTT is a better choice than 0.9 or 2 average RTTs, or whether this makes any difference. As a matter of fact, around 50% of the effects of one control decision have not yet been observed after one average RTT.

Even though simulations showed performance improvement over TCP/AQM schemes on certain simplistic topologies, without extensive implementation it is not easy to truly compare the schemes and understand the drawbacks of the XCP approach. It would be interesting to know whether XCP has been further employed or tested over the last seven years and whether it has been widely deployed!

Random Early Detection Gateways for Congestion Avoidance

S. Floyd and V. Jacobson, “Random early detection gateways for congestion avoidance,” IEEE/ACM Trans. Networking, vol. 1, pp. 397–413, Aug. 1993.

By 1993, the main gateway queueing methods were Drop Tail (FIFO with a limited buffer) and Random Drop algorithms. As several researchers had shown by that time, Drop Tail gateways are biased against bursty flows and drop their packets disproportionately to their share of bandwidth. Furthermore, they cause global synchronization among the sources' congestion control algorithms, which degrades utilization. Even though Random Drop algorithms with sufficiently large buffers should alleviate the bias against bursty traffic to some degree (although the synchronization problem still remains), a large buffer, and therefore a large average queue size, translates to large average queueing delays, which is unpleasant in high-speed networks. In order to avoid large buffers and global synchronization, there had been several studies of Early Random Drop gateways, in which, above a certain threshold, the algorithm drops packets with a fixed probability.

The main contribution of this paper is an improvement over Early Random Drop gateways called the Random Early Detection (RED) gateway algorithm, in which the drop probability is dynamically varied between two thresholds in order to better remove the bias against bursty traffic and, furthermore, to remove global synchronization. The algorithm does not aim at fair allocation of resources and does not include any per-flow state or processing.
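
A minimal Python sketch of that marking/dropping rule, leaving out refinements from the paper such as the count-based spreading of drops and the idle-queue adjustment (parameter values below are arbitrary):

```python
# Sketch of the RED rule: an EWMA of the queue size decides whether to drop,
# with drop probability growing linearly between two thresholds.
import random

class REDGateway:
    def __init__(self, min_th=5, max_th=15, max_p=0.02, w_q=0.002):
        self.min_th, self.max_th, self.max_p, self.w_q = min_th, max_th, max_p, w_q
        self.avg = 0.0                      # EWMA of the queue size

    def on_packet_arrival(self, queue_len):
        """Return True if the packet should be dropped (or marked)."""
        self.avg = (1 - self.w_q) * self.avg + self.w_q * queue_len
        if self.avg < self.min_th:
            return False                    # no congestion signal
        if self.avg >= self.max_th:
            return True                     # persistent congestion: drop
        # In between: probability grows linearly from 0 to max_p, so short
        # bursts that only briefly raise the instantaneous queue mostly survive.
        p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
        return random.random() < p

gw = REDGateway()
drops = sum(gw.on_packet_arrival(queue_len=12) for _ in range(10_000))
print(f"dropped {drops} of 10000 packets at a sustained queue of 12")
```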

Even though I found the RED algorithm an interesting improvement, I did not find the comparisons between the different schemes well designed. For example, in the previous-work section, the authors mention a version of the Early Random Drop gateway algorithm studied by Zhang [36] in which a misbehaving user was able to achieve 73% higher throughput than regular TCP users. I did not find any evidence that the RED algorithm performs any better against misbehaving users in the absence of extra identification mechanisms, which could be applied to the Zhang algorithm as well.

Furthermore, there is too much emphasis on comparing RED gateways with Drop Tail gateways! I would assume that Random Drop gateways had already made a first-order improvement over Drop Tail gateways, and that the true competitor of RED is Random Drop gateways, Early Random Drop schemes, or DECbit. For example, the authors could have shown that for similar traffic burstiness and throughput share, a Random Drop gateway requires a larger average queue size and therefore larger delay. In general I did not find the simulation results precise enough; for example, figures 13, 14, and 15 are given side by side while the x-axis of figure 15 is different, which can be quite misleading. Furthermore, it is not obvious how figure 15 would look for larger average queue sizes!

Wednesday, September 9, 2009

Core-Stateless Fair Queueing: Achieving Approximately Fair Bandwidth Allocations in High Speed Networks

Stoica, I., Shenker, S. and Zhang, H., “Core-Stateless Fair Queueing: Achieving Approximately Fair Bandwidth Allocations in High Speed Networks”, Proceedings of ACM SIGCOMM’98.

By 1998, even though the need for fair resource allocation within the core routers of networks was widely accepted, the proposed algorithms such as Deficit Round Robin (DRR) were still too complex and costly to be used in high-speed core routers (it would be interesting to know, ten years later, how widespread their usage is!).

Core-Stateless Fair Queueing (CSFQ) is a proposed architecture for islands of routers (possibly at the heart of ISPs) that should be able to allocate resources almost fairly. The attempt of the CSFQ architecture is to keep the high-speed, highly loaded core routers as simple and stateless as possible, while the edge routers (working at much lower line rates) carry most of the burden of per-flow management and control. Edge routers estimate the arrival rates of the flows and label packets with the corresponding rates. At the core routers, if the fair share of a flow is below its estimated rate, the packet is dropped probabilistically (the core router is basically thinning the random process). The interesting fact is that core routers only need the single label attached to each packet, and no per-flow state is kept in the core (there is much more detail to this, which you can find in the paper).
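
Here is a tiny Python sketch of the core drop rule as I understand it; the fair-share rate, which the paper estimates iteratively at the core, is simply taken as an input here.

```python
# Sketch: the label carried in each packet is the flow's arrival rate as
# estimated at the edge; the core probabilistically thins flows above the
# fair share, with no per-flow state.
import random

def core_forward(packet_label_rate, fair_share_rate):
    """Return True to forward, False to drop."""
    drop_prob = max(0.0, 1.0 - fair_share_rate / packet_label_rate)
    return random.random() >= drop_prob

# A flow labeled at 3 Mbit/s through a core whose fair share is 1 Mbit/s should
# see roughly one third of its packets survive, i.e. it is squeezed to its share.
kept = sum(core_forward(packet_label_rate=3.0, fair_share_rate=1.0)
           for _ in range(30_000))
print(f"forwarded ~{kept} of 30000 packets (expected ~10000)")
```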

The simulation results show that CSFQ drastically outperforms regular non-fair schemes (e.g., FIFO and RED), while working better than or close to FRED and below DRR. Just from reading the description of the architecture I had assumed that CSFQ would perform much closer to DRR (the precise fair queueing algorithm with high implementation complexity), but interestingly that is not the case: CSFQ performs below DRR, and not very close. The authors also mention that they are not yet aware of the reason behind this difference. Can the inaccuracy of the estimations cause such a gap? Regardless of this observation, the result is brilliant since it enables almost fair allocation with much less complexity and therefore cost.

It is also interesting to think about the scope of this architecture and what can be considered an island of routers. Would you have multiple islands within an ISP, one island per ISP, or expand it to multiple ISPs and possibly the whole internet? Trust will be crucial in drawing the boundaries of these islands, in addition to the obvious policy and economic reasons.

Very interesting work and paper!

Analysis and Simulation of a Fair Queueing Algorithm

A. Demers, S. Keshav, and S. Shenker, “Analysis and simulation of a fair queueing algorithm,” in Proc. ACM SIGCOMM ’89, Austin, TX, 1989, pp. 1–12.

Since it is not possible to ensure ethical and fair usage of network resources by trusting only the end nodes, the network gateways need to support fair resource allocation where it is desired. In other words, a fair and proper congestion avoidance scheme needs the involvement of the network and the end users simultaneously; otherwise it is not possible.

The authors of this paper propose a modification to the first fair queueing algorithm, proposed by Nagle (1987), that satisfies the design requirements they envision for any such fair scheme (fair bandwidth, fair buffer space, and continuity of service). Nagle proposed servicing distinct queues (one queue per flow) with a round-robin scheduler on a per-packet basis, which fails to allocate bandwidth fairly when packet sizes differ. The modified FQ algorithm tries to emulate a bit-by-bit round-robin (BR) scheduler by computing, as each packet arrives, the time at which its service would end under BR, and then servicing packets whole, in order of earliest finish time, instead of bit by bit.
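
Here is a simplified Python sketch of that finish-time bookkeeping; the real algorithm tracks the bit-by-bit round number R(t) exactly, whereas I let a single virtual clock stand in for it, which is enough to see why large packets stop buying extra bandwidth.

```python
# Sketch: each packet gets a finish time as if it were sent bit by bit; the
# scheduler always serves the packet with the smallest finish time.
import heapq

class FairQueue:
    def __init__(self):
        self.virtual_time = 0.0
        self.last_finish = {}        # per-flow finish time of its latest packet
        self.heap = []               # (finish_time, seq, flow, size)
        self.seq = 0

    def enqueue(self, flow, size):
        start = max(self.virtual_time, self.last_finish.get(flow, 0.0))
        finish = start + size        # service time proportional to packet size
        self.last_finish[flow] = finish
        heapq.heappush(self.heap, (finish, self.seq, flow, size))
        self.seq += 1

    def dequeue(self):
        finish, _, flow, size = heapq.heappop(self.heap)
        self.virtual_time = finish
        return flow, size

fq = FairQueue()
for _ in range(3):
    fq.enqueue("big-packets", 1500)    # flow sending 1500-byte packets
    fq.enqueue("small-packets", 500)   # flow sending 500-byte packets
print([fq.dequeue()[0] for _ in range(6)])
# The small-packet flow gets several packets out per large packet, so both
# flows receive roughly equal bytes, unlike Nagle's per-packet round robin.
```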

It is interesting to note that in many of today's high-speed routers, packets are divided into equal-size cells at the input, processed internally, and then reassembled at the output. This might mean that enforcing fairness and priorities over variable packet sizes is still hard to implement. Even though the paper's result is nice, the single-size-cell approach remains the dominant method in high-speed routers, and it would be interesting to better understand why, twenty years after the publication of this paper, it is still preferred to work with single-size units internally.

Another very interesting point in this paper is that the proposed FQ algorithm discourages ill-behaved sources by dropping their packets while still counting them towards their bandwidth usage! I find this very effective from a game-theoretic point of view. Overall, I found this paper to be a great mixture of intuition, analysis, simulation, and effective results, and therefore I suggest it be kept in the syllabus.



Saturday, September 5, 2009

CONGESTION AVOIDANCE AND CONTROL

V. Jacobson, "Congestion avoidance and control," in Proc. SIGCOMM symp. Communications Architecture and Protocols, 1988, pp. 314-329.

This paper is mainly about a group of proposed solutions to avoid congestion collapse, as first seen in 1986 in the LBL/UCB networks. The authors mention a group of seven new algorithms which, together, they claim considerably improve the robustness of networks against congestion collapse, and they describe five of them in some detail. The key goal of these algorithms is to make sure the network reaches its equilibrium state and that the "conservation of packets" principle is followed once it is there. This principle states that, under stable conditions, a new packet should not be injected into the network until an old one leaves.

The slow-start algorithm used in today's TCP was initially proposed in this paper, as were algorithms for load-dependent estimation of the round-trip time, which perform much better than the old load-agnostic version. The variability of the RTT increases as load increases, and ignoring this fact can cause faulty retransmission of packets. The congestion avoidance algorithm proposed in this paper is again AIMD, as in our last paper, although here the designers chose the additive-increase part heuristically; in the last paper we saw a more rigorous justification of why AIMD is a better candidate than other linear CA algorithms.
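
Here is a rough Python sketch of those window dynamics as I picture them (slow start doubling per RTT up to a threshold, then linear growth, with a loss halving the threshold); the numbers are purely illustrative.

```python
# Sketch of slow start + congestion avoidance window evolution over RTTs.

def simulate_sender(rtts=12, loss_at_rtt=6, ssthresh=64):
    cwnd = 1.0
    for rtt in range(rtts):
        phase = "slow start" if cwnd < ssthresh else "congestion avoidance"
        print(f"RTT {rtt:2d}: cwnd = {cwnd:5.1f} segments ({phase})")
        if rtt == loss_at_rtt:               # timeout: congestion inferred
            ssthresh = max(cwnd / 2, 2)      # remember half the window...
            cwnd = 1.0                       # ...and restart from slow start
            continue
        if cwnd < ssthresh:
            cwnd *= 2                        # slow start: exponential opening
        else:
            cwnd += 1                        # linear probing near the old limit
    return cwnd

simulate_sender()
```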

As seen in figures 8 and 9, different end users with slightly different TCP implementations can experience very different connection quality from the network. The consequences of this phenomenon, especially in terms of fairness, might be interesting to discuss. Furthermore, interestingly, the authors of this paper predicted one of the main problems TCP has in today's internet architecture: the ambiguity of the binary feedback signal, for example over wireless (or other unreliable link-level) connections. TCP assumes packets are being lost because of congestion and reduces its window size when it absolutely should not. This paper is very interesting since it walks the reader through the initial proposed solutions to the congestion collapse problem, many of which still exist in the current stack.


ANALYSIS OF THE INCREASE AND DECREASE ALGORITHMS FOR CONGESTION AVOIDANCE IN COMPUTER NETWORKS

D. Chiu and R. Jain, "Analysis of the increase and decrease algorithms for congestion avoidance in computer networks," Comput. Networks ISDN Syst., vol. 17, pp. 1-14, 1989.

This paper is mainly concerned with the analytic comparison of linear congestion avoidance algorithms over a synchronous binary-feedback model. In other words, there are n identical end hosts trying to share a link, and at each time step the system provides them with a single bit of feedback indicating the overload/underload status of the link. Even though the model's assumptions, such as the synchronicity of end users, may not hold in many scenarios, this result can be considered the basis of the current TCP congestion control/avoidance algorithm.

To have a solid comparison of algorithms, one needs to define the desired performance metrics. The authors choose efficiency (closeness of the load to the "knee"), fairness, distributedness, and convergence time (responsiveness and smoothness) as the key criteria. The paper proves (both analytically and pictorially) that within the class of linear CA algorithms, the decrease needs to be purely multiplicative while the increase needs to be additive, with a possible multiplicative part. They further show that, within this class of algorithms, Additive Increase Multiplicative Decrease (AIMD) has the fastest convergence time and is therefore the optimal algorithm within this model and set of criteria.
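
Here is a tiny Python sketch of the model and the AIMD rule; the constants are arbitrary, and the point is only that an initially unfair allocation drifts toward the fair share under this feedback.

```python
# Sketch of the synchronous binary-feedback model: both users add a constant
# when the link is underloaded and multiply by b < 1 when it is overloaded.

def aimd(x1, x2, capacity=100.0, a=1.0, b=0.5, steps=60):
    for _ in range(steps):
        if x1 + x2 > capacity:          # single-bit feedback: overloaded
            x1, x2 = b * x1, b * x2     # multiplicative decrease
        else:                           # underloaded
            x1, x2 = x1 + a, x2 + a     # additive increase
    return x1, x2

x1, x2 = aimd(5.0, 80.0)
print(f"x1 = {x1:.1f}, x2 = {x2:.1f}, ratio = {x2 / x1:.2f} (started at 16.0)")
# With a multiplicative increase or an additive decrease in the mix, the gap
# between the two users would not shrink, which is the crux of the argument.
```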

The authors argue against non-linear control algorithms because they are concerned with the system's sensitivity to parameters. It is interesting that today's TCP congestion control algorithm is a mixture of linear (congestion avoidance) and non-linear (slow start, fast retransmit, fast recovery, ...) parts and is therefore not linear. Considering the result of this paper alone, one cannot conclude whether further investigation of non-linear algorithms is meaningful or not!

I found this paper quite interesting since it elegantly shows the advantages of AIMD congestion avoidance within its model. It would still be interesting to analyze and discuss the model and criteria used by the authors. For example, how well does the result, which was proved for synchronous models, hold for asynchronous models? Furthermore, distributedness was considered a required criterion for CA algorithms. Is it correct to discard all non-distributed algorithms at once on the grounds that they will be hard to implement? Could there be semi-coordinated algorithms that perform much better and are possibly worth the extra coordination effort?

Wednesday, September 2, 2009

UNDERSTANDING BGP MISCONFIGURATION

The goal of this work is to identify and quantify how well the current wide-area routing protocol (BGP) is performing and what its main weaknesses are. Although we might only hear about BGP misconfiguration when there is an internet-wide breakdown, the results of this paper show that BGP misconfiguration is amazingly pervasive. For example, most (at least 75%) of new route announcements in the internet are the result of BGP misconfiguration, and 0.2-1.0% of the BGP table is corrupted! The surprising result is that all these misconfigurations have little effect on connectivity and reachability!

To achieve these results, the authors target only two kinds of unintended misconfiguration: the accidental insertion of routes (origin misconfiguration) and the accidental propagation of routes that should have been filtered (export misconfiguration). To monitor and investigate BGP misconfigurations, they flag short-lived route announcements and then use email surveys to investigate the possible causes of the problem. The authors identify several main causes of misconfiguration and propose certain improvements, for example to the user interface design methodologies of BGP systems.

INTERDOMAIN INTERNET ROUTING

This lecture by Hari Balakrishnan gives a high-level overview of interdomain routing in today's internet. The internet is constructed of many Autonomous Systems (ASes), which can be quite different in their internal structure and routing protocols. Each AS is usually owned by a different company and is in competition with other ASes (also referred to as ISPs). What enables global connectivity over the internet is the routing protocol between ISPs. The current wide-area routing protocol in use is the Border Gateway Protocol (BGP, version 4).

BGP is interestingly different from other routing protocols, since its primary goal is global reachability between ASes with a special focus on scalability. BGP is centered around following policies, which are primarily shaped by the relationships between ASes and, ultimately, by profit. In a typical Interior Gateway Protocol (IGP), the main concern is instead minimizing some kind of path metric.
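
Here is a hedged Python sketch of what "policy first, metrics second" might look like; the local-preference values and the customer > peer > provider ordering are typical conventions I am assuming, not something prescribed by BGP itself, and real BGP has several more tie-breaking steps.

```python
# Sketch: a simplified BGP-style decision that prefers routes by local
# preference (reflecting business relationships) before AS-path length.
from dataclasses import dataclass

LOCAL_PREF = {"customer": 300, "peer": 200, "provider": 100}

@dataclass
class Route:
    prefix: str
    as_path: tuple        # sequence of AS numbers the announcement traversed
    learned_from: str     # business relationship with the announcing neighbor

def best_route(routes):
    return max(routes, key=lambda r: (LOCAL_PREF[r.learned_from], -len(r.as_path)))

candidates = [
    Route("192.0.2.0/24", as_path=(7018, 3356, 64512), learned_from="provider"),
    Route("192.0.2.0/24", as_path=(64500, 64501, 64502, 64512), learned_from="customer"),
]
# The longer customer-learned route wins: policy (who pays whom) trumps path
# length, which is exactly how BGP differs from an IGP minimizing a metric.
print(best_route(candidates))
```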

The lecture gives a good overview of the process by which different ASes choose to advertise routes (and therefore offer them for use) to other ASes, and of how they absorb the routes advertised by others into their internal structures. The author also discusses route hijacking through false advertisements, convergence speed, and multi-homing as some of the main challenges of today's wide-area routing.

I found the provider-transit vs. peer-peer forms of relationship among ASes quite interesting. Why would a peer that carries possibly more than twice the traffic, possibly worth millions of dollars, make such an agreement? (Well, I'm sure there are good reasons for it!) Is the required billing infrastructure too expensive to justify such an approach? Furthermore, why can ISPs not have mixed relationships? For example, ISP A, which has two peers B and C, could act as a provider for B for traffic destined to C, and act as a peer for traffic destined to A and its transit customers. Or maybe this capability already exists and is in use.

In the end, I found the lecture a well-written overview of the BGP protocol.

Monday, August 31, 2009

THE DESIGN PHILOSOPHY OF THE DARPA INTERNET PROTOCOLS

D. D. Clark published this paper in 1988, almost fifteen years after the first TCP/IP proposal developed by DARPA. It is interesting to understand why certain directions were taken by the original designers and what evolutionary road the internet took during those fifteen years.

The DARPA internet project was launched to effectively interconnect two distinct networks, the ARPANET and the ARPA packet radio network, and this effective interconnection was the fundamental goal of the project. Being able to effectively connect networks can be seen as a genetic mutation in the evolutionary pattern of networks, one that enabled much more scalable and complex networks to be built than networks that are designed and upgraded uniformly!

Two of the original approaches of the internet, datagrams and packet switching, already existed in the networks under investigation and were the default candidates. Furthermore, they had proven by that time (and continued to prove) to be valid choices compared to competitors such as the virtual circuit paradigm.

The author also lists secondary and lower-tier goals throughout the paper. Since these networks were originally designed for military applications, reliability and robustness to failures were put at the top of the secondary list while accountability was put at the end. Noting that this paper was written in 1988, the author claims that if the internet had been designed for commercial applications, these two choices would have needed to be swapped! I find this point of view interesting since it suggests that in 1988 the importance of reliability was not as evident for commercial applications. Today's commercial applications count on the reliability of the internet, and an unreliable internet could have a destructive effect on many of them. I would assume that if Clark had written this paper in 2009, he would only have ranked accountability higher and kept reliability in the top two of the secondary list.

It is also notable that in a couple of places Clark implicitly points to the end-to-end argument as described in the first paper we read: one is the "fate-sharing" story, in which state information is kept at the end entities rather than in the network, and the other is the separation of services from the datagram facility (the TCP and IP layers).

I would also like to point out that throughout this paper, scalability does not seem to be considered an important issue. This is likely due to the year the paper was published and might suggest that the number of connected networks was still not very large. For example, the suggestion at the end of the paper about "soft state" does not consider the possible scalability problems of such a structure.

I would very much like this paper to be kept in the syllabus since it makes students think about the evolutionary path of internet and helps them obtain a more fundamental understanding of it.

Friday, August 28, 2009

END-TO-END ARGUMENTS IN SYSTEM DESIGN

In this paper, Saltzer et al. present a design principle called the “end-to-end argument” which should help designers with the placement of functionality in a distributed computer system. Usually, when the system involves some kind of communication, a modular boundary is drawn between the part responsible for communication (called the communication subsystem in this paper) and the rest of the system. The question is: when it is possible to implement a functionality at different levels, what should the designer choose? One can choose to implement the target functionality in one of the following ways:

1) by the communication subsystem

2) by the end clients

3) as a joint venture

4) all of the above (redundantly)

In brief, the “end-to-end argument” states that when the function under investigation can only be correctly and completely implemented with the help and knowledge of the end-user application, the communication subsystem can not provide this functionality to the application as a feature.

This does not mean that communication subsystem features and helper functionalities are unimportant, but rather that one should look at them from an engineering-tradeoff point of view and in support of (not in parallel with) the functionality the application requires. The implementation of a functionality at lower levels can become too costly for two reasons. First, because the lower-level subsystem is common to many applications, applications that do not need the functionality also pay for it. Second, the lower levels have access to less information and therefore might not be able to do the job efficiently.

Furthermore, the authors go over a series of examples where the “end-to-end argument” applies and gives insight to the designers of such systems, including: delivery guarantees, secure transmission of data (end-to-end encryption), duplicate message suppression, guaranteed FIFO message delivery, and transaction management.

The most important point of this paper, in my view, is: do not expect a system to accomplish what is not possible! The description of the “end-to-end argument” as given in this paper seems trivial and yet very important. As several examples in the paper show, it is easy to confuse the scope of the required functionality.

I still have a point of disagreement with the authors. The “end-to-end argument” as stated in the paper only applies when it is not possible for the communication subsystem to provide the exact target functionality; in that case it is obvious that the end-to-end application needs to implement it. But if a certain functionality can be implemented in the end clients, in the communication subsystem, or in a mixture of both, then we are entering the world of engineering tradeoffs, and no unique answer can be given for all systems. On a case-by-case basis, according to the class of target applications, functionalities, and system sub-elements, one tries to find the architecture that works close to optimal in some sense!

For example, the authors give an example of automatic recovery which goes as follows: “In telephone exchanges, a failure that could cause a single call to be lost is considered not worth providing explicit recovery for, since the caller will probably replace the call if it matters”, and they claim this is an instance of the end-to-end argument. I would argue that since it is possible to implement this functionality in the communication subsystem or in the end users, the end-to-end argument as stated above does not apply. Rather, the authors are implicitly solving an optimization problem that minimizes a certain cost function. If one were designing this system for a very lazy kind of user, one would probably design it differently.

I believe this is a good paper to keep in the syllabus, since it points to a very complex and confusing topic: function placement.