Wednesday, September 30, 2009

A Comparison of Mechanisms for Improving TCP Performance over Wireless Links

The authors of this paper perform an in-depth experimental study of different proposed solutions for improving TCP performance over lossy wireless links. TCP confuses packet losses caused by bad link quality with congestion signals, and therefore performs unnecessary congestion control procedures that reduce throughput significantly. The authors divide the proposed solutions to this problem into three categories:
  1. End-to-end protocols
  2. Link-layer protocols
  3. Split connection protocols
The tests are done over a single-hop wireless link connected to either a single-hop wired (LAN topology) or a 16-hop wired (WAN topology) connection. Most of the experiments use a fixed error rate of one error per 64 KBytes, artificially injected at the receiver. Later on, the better-performing representatives of each category are compared against each other while the error rate is varied from one error per 16 KBytes to one per 256 KBytes.

Out of all the schemes analyzed in this paper, LL-SMART-TCP-AWARE performs the best, closely followed by LL-TCP-AWARE. LL stands for link-layer reliability/retransmissions, which by itself can improve TCP performance almost 2x over raw TCP Reno, but since it doesn't guarantee in-order delivery it performs better when accompanied by a scheme that suppresses the unnecessary duplicate ACKs (and is therefore called TCP-aware). The addition of some form of selective ACKs (SMART) further improves the performance of this scheme, but the gain is smaller. Some of the schemes in other categories, such as end-to-end with Explicit Loss Notification (ELN), which resolves TCP's confusion, also show very impressive gains over TCP Reno but are in general not as effective as the reliable link-layer methods.
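To make the TCP-aware link-layer idea more concrete, here is a minimal toy sketch (my own illustration, not the paper's actual snoop implementation; the class name and the link object with a send() method are hypothetical) of a base-station agent that caches unacknowledged segments, retransmits locally when it sees duplicate ACKs, and suppresses those dupacks so the fixed sender never invokes congestion control for a purely wireless loss:

class SnoopLikeAgent:
    def __init__(self, wireless_link):
        self.link = wireless_link   # toy link object with a send() method (assumed)
        self.cache = {}             # seq -> segment not yet acked by the mobile host
        self.last_ack = -1

    def on_data_from_sender(self, seq, segment):
        self.cache[seq] = segment   # remember it for possible local retransmission
        self.link.send(segment)

    def on_ack_from_receiver(self, ack):
        if ack > self.last_ack:     # new cumulative ACK: clean the cache, pass it on
            for seq in [s for s in self.cache if s <= ack]:
                del self.cache[seq]
            self.last_ack = ack
            return ack              # forward to the fixed sender
        lost = ack + 1              # duplicate ACK: treat it as a wireless loss
        if lost in self.cache:
            self.link.send(self.cache[lost])
        return None                 # suppress the dupack so the sender stays quiet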

In general it is very hard to criticize an ACM best paper award winning paper (especially when one of the authors is your professor)! The paper gives a good overview of the different possible solutions, makes fair and solid experimental comparisons among them, and draws sound conclusions, and I vote for keeping this paper in the syllabus. As a possible short discussion topic, I was hoping to understand why the errors were injected artificially over an almost reliable link rather than actually making the link less reliable, for example by increasing noise/interference.

Tuesday, September 29, 2009

MACAW: A Media Access Protocol for Wireless LAN's

MACAW is an extension/modification of MACA, the Multiple Access with Collision Avoidance protocol proposed earlier by other researchers. The basis of these protocols is that contention happens at the receiver and not the transmitter (the familiar "hidden" and "exposed" terminal problems), and they therefore build on explicit contention management. Interestingly, 15 years after the publication date of this paper, most WLAN technologies use carrier sense and do not use such RTS/CTS mechanisms even though they are part of the standards. So it was interesting for me to understand what their assumptions and arguments were back then and why this approach has not really been accepted and deployed widely up to now.

MACAW was designed for a single-channel WLAN operating at 5GHz with 256Kbps radios. The target WLAN had a nano-cellular structure with small cells (about 6m in diameter) and a single base station per cell; the radios are quite short range, 3-4 meters. The authors make several assumptions about the indoor environment under investigation which sound quite strange given what we know about indoor environments today:
  1. Attenuation was considered a strong function of distance.
  2. The communication links were considered symmetric.
  3. The communication range was modeled as a binary in-range/out-of-range model and other effects were ignored.
I am sure they had strong observations supporting these assumptions, but it would have been interesting if the authors had presented the supporting experiments/analysis, as these assumptions are key to the rest of the results of the paper (they are generally considered not to hold in today's WLANs, but that could be a result of longer-range, wider-bandwidth links!?). Based on these assumptions the authors extend the RTS-CTS-DATA scheme of the MACA protocol to the RTS-CTS-DS-DATA-ACK scheme of MACAW.

The addition of the ACK ensures the unreliable link does not wastefully reduce TCP performance, and this can be considered one of the most significant contributions of this paper. Another interesting result is that in order to have fair bandwidth allocation, contention estimation should be done collectively; an individual approach fails to achieve this. Furthermore, a Multiplicative Increase Linear Decrease (MILD) scheme is proposed instead of Binary Exponential Backoff (BEB) for adjusting the backoff parameters and is shown to achieve better and fairer throughput when used along with the collective contention estimation mechanisms.
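As a rough sketch of the two backoff policies (the window bounds are placeholders; the 1.5 increase factor and the decrement-by-one on success are the adjustments I recall the paper proposing):

CW_MIN, CW_MAX = 2, 128   # placeholder bounds, not the paper's exact values

def beb(cw, collided):
    # binary exponential backoff: double on collision, reset on success
    return min(2 * cw, CW_MAX) if collided else CW_MIN

def mild(cw, collided):
    # MILD: multiply by 1.5 on collision, back off by one on success,
    # so a single success does not discard the learned contention level
    return min(int(1.5 * cw), CW_MAX) if collided else max(cw - 1, CW_MIN)

The point is that under MILD the backoff counter tracks the contention level instead of oscillating wildly between its extremes.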

The authors mainly compare their algorithm with MACA, showing that the extra overhead is negligible compared to the gains they achieve, and they mention that they reach 78% of the link capacity. It is important to note that these calculations were done for 512-byte packets; if one considers smaller packets, such as 802.15.4's 128-byte packets, the result drops to roughly half of the channel capacity! Nevertheless, if the target system was designed only for large packet sizes this is a valid claim.
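A back-of-the-envelope sketch of this point (the per-packet control cost below is an assumed placeholder, chosen so that 512-byte packets land near the reported 78%; it is not a number from the paper):

def efficiency(payload_bytes, overhead_bytes):
    # fixed per-packet control cost (RTS, CTS, DS, ACK, headers, gaps) vs payload
    return payload_bytes / (payload_bytes + overhead_bytes)

OVERHEAD = 140  # assumed bytes of control exchange amortized per data packet
for payload in (512, 128):
    print("%d-byte packets -> ~%d%% efficiency" % (payload, 100 * efficiency(payload, OVERHEAD)))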

In addition, the authors do not compare their scheme fairly against a CSMA/CA scheme. By discarding carrier sense they fail to investigate the cases in which CSMA can achieve much better performance than explicit contention management. The overhead of MACAW-like schemes is not negligible in general; throughout the paper it seemed the authors needed to introduce more and more control packets, which adds to the overhead and complexity of the scheme. I believe this complexity, in addition to the large overhead for small packets, has been the main roadblock to large deployments of such schemes in WLANs today.

Monday, September 28, 2009

Understanding TCP Incast Throughput Collapse in Datacenter Networks

Y. Chen, R. Griffith, J. Liu, A. D. Joseph, and R. H. Katz. Understanding TCP incast throughput collapse in datacenter networks. In Proc. Workshop: Research on Enterprise Networking, Barcelona, Spain, Aug. 2009.

This work is a continuation of, and complementary to, the previous work by the CMU researchers. When I read the CMU paper it seemed the incast problem was almost solved, or at least that it was obvious what direction the research community should take in order to solve it. The main contribution of the Berkeley incast paper, to me, was showing that the problem is much deeper than what the CMU paper portrayed and requires a deeper understanding of the underlying phenomena causing the TCP collapse.

In this project the authors perform experimental tests in order to reproduce the earlier published results and further try to come up with analytical models that would explain the underlying factors contributing to the incast behavior. The experimental results demonstrate three distinct regions: (1) an initial goodput collapse, (2) a goodput increase, and (3) a second goodput collapse. It would be interesting to know whether there is any other distinct region if one experiments with a much higher number of servers. The experiments in this paper are limited to at most 50 servers, and simulations are not considered reliable from the authors' point of view.

The results and explanations the authors come up with in this paper are quite contradictory to the CMU paper. For example, the best performing strategy they observe is a low-resolution 1ms RTO timer with delayed ACKs! This raises the question of how much of the results of these two papers is environment/network dependent and how much can be applied as a general solution. The analytical models proposed by the authors sounded very interesting but seem to have room for improvement and further investigation. In general I learned a lot from this paper, but I felt some of the ideas could have been better studied in isolation from each other. The methodology and line of thinking were presented well and were educational.


Safe and Effective Fine-grained TCP Retransmissions for Datacenter Communication

V. Vasudevan, A. Phanishayee, H. Shah, E. Krevat, D. G. Andersen, G. Ganger, G. A. Gibson, and B. Mueller. Safe and Effective Fine-grained TCP Retransmissions for Datacenter Communication. In SIGCOMM ’09: Proceedings of the ACM SIGCOMM 2009 conference on Data communication, 2009.

The regular values of the TCP retransmission timeout (RTOmin >~ 200ms) cause serious performance degradation in high-bandwidth, low-latency (RTTs of 10-100s of microseconds) network environments such as data centers. The so-called TCP incast collapse happens when clients are involved in barrier-synchronized requests with small data request sizes. The authors propose a solution based on Linux high-resolution timers (hrtimers) which enables microsecond resolution in estimating the RTT and RTO and significantly improves performance under incast workloads.

The authors first clearly show, through simulations, implementation, and traces gathered from a larger-scale storage node, that microsecond granularity is needed for the incast problem to be solved in future, faster data centers. They emphasize that the RTO estimation should be done at the same timescale as the RTTs and that minimum bounds on this value result in reduced performance. I found it very interesting that with minor modifications to the TCP stack, and using the currently available Generic Time of Day (GTOD) high-resolution timing framework, the authors demonstrate the feasibility of achieving these precisions.
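To illustrate the scale mismatch, here is a small sketch of the standard Jacobson-style RTT/RTO estimator (the usual gains of 1/8 and 1/4 and RTO = SRTT + 4*RTTVAR), fed with microsecond-scale datacenter RTT samples, with and without a 200 ms floor; the sample values are made up:

def make_rto_estimator():
    srtt = rttvar = None
    def update(sample_us):
        nonlocal srtt, rttvar
        if srtt is None:
            srtt, rttvar = sample_us, sample_us / 2.0
        else:
            rttvar = 0.75 * rttvar + 0.25 * abs(srtt - sample_us)
            srtt = 0.875 * srtt + 0.125 * sample_us
        return srtt + 4 * rttvar              # raw RTO, in microseconds
    return update

update = make_rto_estimator()
for rtt_us in (100, 120, 90, 110, 400, 100):  # made-up datacenter RTT samples
    rto = update(rtt_us)
    print("fine-grained RTO: %7.0f us    with 200ms floor: %d us" % (rto, max(rto, 200000)))

With a 200 ms minimum, every timeout is clamped three to four orders of magnitude above what the network actually needs, which is exactly the gap the paper is closing.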

The main hazards of removing RTOmin are spurious retransmissions and faulty interactions with clients that use delayed acknowledgments (of, for example, 40ms), causing unnecessary timeouts. In two experimental setups the authors try to evaluate these possible drawbacks, showing that in their specific setup removing RTOmin has no performance effect in wide area networks, and a limited but noticeable effect when delayed ACKs are not disabled.

It would be interesting to discuss whether these experiments are enough to understand the full range of negative effects the aggressiveness of fine-grained timing calculations can bring. For example, if a noticeable percentage of wide-area servers switch to this kind of fine-grained strategy, what happens in terms of fairness and resource usage within the routers?

I vote for keeping this paper in the syllabus since it provides a clear overview of TCP incast problem and the proposed solution.



Friday, September 18, 2009

Portland: A scalable fault-tolerant layer 2 data center network fabric

R. N. Mysore, A. Pamboris, N. Farrington, N. Huang, P. Miri, S. Radhakrishnan, and V. Subram, "Portland: A scalable fault-tolerant layer 2 data center network fabric - ccr online," pp. 39-50, August 2009.

PortLand is a newly proposed architecture for a scalable, fault-tolerant data center fabric with plug-and-play deployment of switches. PortLand's main distinction compared to other recent proposals is its exploitation of the currently dominant data center topology, the multi-rooted tree. Some of the requirements the authors envision for a good data center fabric are:
  1. Seamless VM migration
  2. No need for manual configuration by administrators
  3. Efficient communication to any other end-host within DC along any available physical path
  4. No forwarding loops
  5. Rapid and efficient failure detection
Current L2 architectures fail to scale, while adding L3 mechanisms introduces significant management overhead and lacks plug-and-play and seamless-migration characteristics. PortLand builds on top of internal, location-encoded MAC addresses (Pseudo MACs, PMACs) along with the needed translations at the edge switches. These internal addresses enable efficient loop-free routing/forwarding within the data center. The following are the main elements of the PortLand architecture:
  1. Logically centralized topology aware Fabric Manager (FM) responsible for assisting with:
    • ARP resolution
    • Fault tolerance
    • Multicast
  2. Positional Pseudo MACs (PMACs)
    • Location encoded
    • Basis for efficient routing/forwarding
    • Basis for seamless VM migration
    • Separates host identity from its location transparent to end-host
  3. Proxy-based ARP
    • Utilizing Fabric Manager for ARP resolution
    • Falling back on efficient broadcast methods upon failure
  4. Distributed location discovery by Location Discovery Protocol (LDP)
    • Periodic Location Discovery Message (LDM) on all switch ports
    • Distributed location discovery without manual configuration through exploiting the knowledge of the baseline topology
  5. Loop free forwarding
    • Multicast support through a single core switch along with Fabric Manager involvement
    • Loop free: once a packet travels down it never travels up again!
The PortLand architecture achieves better scalability and performance over the constrained multi-rooted topology compared to TRILL and SEATTLE. The designers of the SEATTLE architecture had assumed that every end-host communicates with only a few other end-hosts, an assumption the PortLand designers fairly reject (e.g., consider search or MapReduce). Compared to TRILL and SEATTLE, which use flat addresses, PortLand's location-encoded PMAC strategy has significant advantages.
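As an illustration of the location-encoded addressing (the pod.position.port.vmid field widths below are my assumption, used purely for illustration), a PMAC might be packed like this; an edge switch would rewrite a host's actual MAC into such a PMAC on ingress so that upstream switches can forward on the pod/position prefix alone:

POD_BITS, POS_BITS, PORT_BITS, VMID_BITS = 16, 8, 8, 16   # assumed 48-bit split

def make_pmac(pod, position, port, vmid):
    pmac = ((pod << (POS_BITS + PORT_BITS + VMID_BITS)) |
            (position << (PORT_BITS + VMID_BITS)) |
            (port << VMID_BITS) | vmid)
    # render as a familiar colon-separated 48-bit MAC string
    return ":".join("%02x" % ((pmac >> shift) & 0xFF) for shift in range(40, -8, -8))

print(make_pmac(pod=3, position=1, port=7, vmid=42))   # -> 00:03:01:07:00:2a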

The authors have clearly explained the elements of their architecture and demonstrated its first-order promises by simulation and implementation on a small-scale network. I did not understand the advantage of using distributed location discovery versus having the Fabric Manager explicitly send locations to the switches, followed by further verification by each switch. Even though the authors claim and show that FM scalability is not a big issue, I wonder whether semi-centralized approaches could achieve better scalability characteristics and function just as well. In general this is very educational and interesting work and I vote for keeping it in the syllabus.

Thursday, September 17, 2009

Detailed Diagnosis in Enterprise Networks

S. Kandula, R. Mahajan, P. Verkaik, S. Agarwal, J. Padhye, and P. Bahl, "Detailed diagnosis in enterprise networks," in SIGCOMM '09, New York, NY, USA: ACM, 2009, pp. 243-254.

This is a very recent work in the area of fault diagnosis in enterprise networks. Current diagnosis tools are either concerned with failures at the machine level rather than the application/process level, or require extensive knowledge of the applications, which makes the problem intractable. NetMedic enables detailed diagnosis at the process level with little application-specific knowledge by framing diagnosis as an inference problem and estimating the statistical failure relationships of entities from past observations.

In order to have better inference performance, a rich set of variables is used for the process state instead of a single health state variable. The goal is to obtain a detailed diagnosis while the system remains application agnostic. The process NetMedic follows is to detect abnormality, compute edge weights indicating the historical correlation of failures, and then rank the likely causes of a failure. The experiments the authors performed showed the correct source to always be among the top five ranked causes, and to be the top-ranked one 80% of the time.
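The ranking step is easier to picture with a toy sketch. This is not NetMedic's actual algorithm, only the general shape I took away from the paper: score how abnormal each component currently is relative to its own history, weight an edge by how often its two endpoints were abnormal together in the past, and rank a component's neighbors as candidate causes by combining the two.

import math

def abnormality(history, current):
    # crude z-score of one state variable against its own history
    mean = sum(history) / len(history)
    std = math.sqrt(sum((x - mean) ** 2 for x in history) / len(history)) or 1.0
    return abs(current - mean) / std

def edge_weight(src_abnormal_times, dst_abnormal_times):
    # fraction of the destination's abnormal periods in which the source was
    # also abnormal -- a stand-in for historical failure correlation
    return (len(src_abnormal_times & dst_abnormal_times) / len(dst_abnormal_times)
            if dst_abnormal_times else 0.0)

def rank_causes(neighbor_abnormality, weight_to_affected):
    # score each neighbor by its own abnormality times its edge weight toward
    # the affected component, highest first
    return sorted(neighbor_abnormality,
                  key=lambda n: neighbor_abnormality[n] * weight_to_affected[n],
                  reverse=True)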

NetMedic is an interesting application of statistical inference to real networks, and it can be quite useful if its scalability is further addressed.

Wednesday, September 16, 2009

Floodless in SEATTLE: A Scalable Ethernet Architecture for Large Enterprises

C. Kim, M. Caesar, and J. Rexford. Floodless in SEATTLE: a scalable ethernet architecture for large enterprises. In SIGCOMM, 2008.

This paper introduces SEATTLE, an advanced, scalable layer-2 network architecture with the plug-and-play manageability of Ethernet and the scalability of IP. Ethernet is not a scalable architecture: it uses flooding for locating end-hosts, uses spanning-tree based routing, and relies on broadcast for bootstrapping protocols (ARP and DHCP). Using IP on top of Ethernet to emulate an extended layer 2 has not really worked well, since IP addresses are hierarchical and topology dependent: a time-varying topology requires continuous monitoring and modification of router and subnetting configurations, and non-mobile IP does not support seamless mobility!

The SEATTLE architecture exploits the difference in the timescales of the system's dynamics. Switches are quite static and far fewer in number than end-hosts, so SEATTLE uses a link-state protocol to enable shortest-path routing between switches. For address resolution and location resolution, a one-hop DHT provides the directory service. Furthermore, observing that end-hosts usually communicate with only a few others, the authors argue that caching is a good strategy for keeping the routing tables limited in size. The results are verified through simulation and limited deployment.
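A minimal sketch of the one-hop DHT idea, under my own assumptions (plain SHA-1 consistent hashing with successor placement; SEATTLE's exact hash function and replication details differ): because link-state gives every switch the full list of live switches, each one can independently compute which switch is responsible for a given host's entry, so a lookup takes a single hop.

import hashlib
from bisect import bisect_right

def h(key):
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)

class OneHopDHT:
    def __init__(self, switch_ids):
        # every switch builds the same ring from the link-state switch list
        self.ring = sorted((h(s), s) for s in switch_ids)
        self.keys = [k for k, _ in self.ring]

    def resolver_for(self, host_mac):
        # the switch whose hash follows the host's hash on the ring stores
        # (and answers queries for) that host's location entry
        idx = bisect_right(self.keys, h(host_mac)) % len(self.ring)
        return self.ring[idx][1]

dht = OneHopDHT(["sw-%d" % i for i in range(8)])
print(dht.resolver_for("00:1a:2b:3c:4d:5e"))   # every switch computes the same answer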

I found this paper quite interesting, with a full description of the methods used to eliminate the causes of Ethernet's non-scalability. The idea of using a DHT is quite interesting, even though it is not obvious what can happen if one of the switches is hacked! It seems like a lot of trust is placed in the possibly hundreds of switches involved in this architecture. It would also be interesting to discuss the similarities and differences between the mobile IP approach to seamless mobility and the SEATTLE approach.





Friday, September 11, 2009

Congestion Control for High Bandwidth-Delay Product Networks

D. Katabi, M. Handley, and C. Rohrs, “Congestion control for high bandwidth-delay product networks,” presented at the ACM SIGCOMM 2002, Pittsburgh, PA.

The eXplicit Control Protocol (XCP) tries to address TCP's low performance over networks with high bandwidth-delay products. XCP builds on several earlier proposals, such as Explicit Congestion Notification (ECN) and CSFQ, to improve link utilization and fairness in such networks. One idea behind this scheme is that implicit signaling is a poor signaling mechanism, and that by adding more bits to the feedback signal XCP should be able to outperform TCP. Explicit signaling also has other advantages, such as potentially easier policing mechanisms and therefore better security.

The XCP architecture consists of an XCP sender, receiver, and router. In order to keep per-flow processing at the routers minimal, the XCP sender attaches the flow's window size and estimated RTT to each packet, which greatly simplifies the per-packet calculations needed at the router to achieve per-flow fairness and high utilization.

One of the most interesting aspects of this architecture is the separation of the Efficiency Controller (EC) and the Fairness Controller (FC). Linear control algorithms need to do AIMD in order to converge to both efficiency and fairness: the increase needs to be additive, or the algorithm will not converge to fairness. By separating the EC and FC, the EC can do MIMD and therefore converge much faster to efficient link usage while the FC still does AIMD separately. This also enables the FC to implement other notions of fairness, such as shadow pricing, independently of the EC.
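For concreteness, here is a simplified sketch of the efficiency controller following the aggregate feedback rule given in the paper, phi = alpha*d*S - beta*Q, where d is the average RTT, S the spare bandwidth, Q the persistent queue, and alpha = 0.4, beta = 0.226. The fairness controller, not shown, then shuffles this aggregate among packets so that positive feedback is additive per flow and negative feedback is proportional to each flow's rate.

ALPHA, BETA = 0.4, 0.226   # stability constants from the XCP paper

def aggregate_feedback(capacity_bps, input_rate_bps, avg_rtt_s, queue_bytes):
    spare_bytes_per_s = (capacity_bps - input_rate_bps) / 8.0
    return ALPHA * avg_rtt_s * spare_bytes_per_s - BETA * queue_bytes

# illustrative numbers: 100 Mbps link, 90 Mbps of offered traffic,
# 80 ms average RTT, 30 KB of persistent queue
phi = aggregate_feedback(100e6, 90e6, 0.08, 30000)
print("aggregate feedback this control interval: %.0f bytes" % phi)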

The paper is fun to read and interesting. There were several places, though, where I was not convinced by the arguments made for choosing parameters. For example, the control interval was chosen to be equal to the average RTT with the following argument:
Estimating parameters over intervals longer than the average RTT leads to sluggish response, while estimating parameters over shorter intervals leads to erroneous estimates.
It is not obvious why 1 average RTT is a better choice than 0.9 average RTT or 2 average RTTs, or whether this makes any difference. As a matter of fact, around 50% of the effects of one control decision have not yet been observed after one average RTT.

Even though the simulations showed performance improvements over TCP/AQM schemes on certain simplistic topologies, without extensive implementation it is not easy to truly compare the schemes and understand the drawbacks of the XCP approach. It would be interesting to see whether XCP has been further employed or tested over the last 7 years and whether it has been widely deployed!

Random Early Detection Gateways for Congestion Avoidance

S. Floyd and V. Jacobson, “Random early detection gateways for congestion avoidance,” IEEE/ACM Trans. Networking, vol. 1, pp. 397–413, Aug. 1993.

By 1993, the main gateway queueing methods were Drop Tail (FIFO with a limited buffer) and Random Drop. As shown by several researchers by that time, Drop Tail gateways are biased against bursty flows and drop their packets disproportionately to their share of bandwidth. Furthermore, they cause global synchronization among source congestion control algorithms, which degrades utilization. Even though Random Drop algorithms with sufficiently large buffers should alleviate the bias against bursty traffic to some degree (the synchronization problem still remains), a large buffer, and therefore a large average queue size, translates to large average queueing delays, which is undesirable in high-speed networks. In order to avoid large buffer sizes and global synchronization, there had been several studies of Early Random Drop gateways, in which the gateway drops arriving packets with a fixed probability once the queue exceeds a certain threshold.

The main contribution of this paper is an improvement over Early Random Drop gateways called the Random Early Detection (RED) gateway algorithm, in which the drop probability is dynamically varied between two thresholds in order to better remove the bias against bursty traffic and, furthermore, avoid global synchronization. The algorithm does not aim at fair allocation of resources and does not include any per-flow state or processing.
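A minimal sketch of the drop decision as I read it (an EWMA of the queue size and a linear probability ramp between the two thresholds; I omit the count-based spacing of drops the paper also uses, and the constants below are illustrative rather than recommended):

import random

class REDQueue:
    def __init__(self, min_th=5, max_th=15, max_p=0.02, wq=0.002):
        self.min_th, self.max_th, self.max_p, self.wq = min_th, max_th, max_p, wq
        self.avg = 0.0   # exponentially weighted average queue length

    def drop_on_arrival(self, instantaneous_qlen):
        self.avg = (1 - self.wq) * self.avg + self.wq * instantaneous_qlen
        if self.avg < self.min_th:
            return False                        # enqueue normally
        if self.avg >= self.max_th:
            return True                         # drop (or mark) everything
        # probability ramps linearly from 0 at min_th to max_p at max_th
        pb = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
        return random.random() < pb

Because the decision depends on the average rather than the instantaneous queue, short bursts pass through while sustained congestion is signaled early and to randomly chosen flows.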

Even though I found the RED algorithm an interesting improvement, I did not find the comparisons between different schemes well designed. For example, in the previous-work section the authors mention a version of the Early Random Drop gateway algorithm studied by Zhang [36], in which a misbehaving user was able to achieve 73% higher throughput than regular TCP users. I did not find any evidence that the RED algorithm performs any better against misbehaving users in the absence of an extra identification mechanism, which could be applied to the Zhang algorithm as well.

Furthermore, there is too much emphasis on comparing RED gateways with Drop Tail gateways! I would assume that Random Drop gateways had already made a first-order improvement over Drop Tail, and that the true competitor of RED is Random Drop, an Early Random Drop scheme, or DECbit. For example, the authors could have shown that for similar traffic burstiness and throughput share, a Random Drop gateway requires a larger average queue size and therefore larger delay. In general I did not find the simulation results precise enough; for example, Figures 13, 14 and 15 are given side by side while the x-axis is different in Figure 15, which can be quite misleading. Furthermore, it is not obvious how Figure 15 would look for larger average queue sizes!

Wednesday, September 9, 2009

Core-Stateless Fair Queueing: Achieving Approximately Fair Bandwidth Allocations in High Speed Networks

Stoica, I., Shenker, S. and Zhang, H., “Core-Stateless Fair Queueing: Achieving Approximately Fair Bandwidth Allocations in High Speed Networks”, Proceedings of ACM SIGCOMM’98.

By 1998, even though the need for fair resource allocation within the core routers of the network was widely accepted, the proposed algorithms such as Deficit Round Robin (DRR) were still too complex and costly to be used in high-speed core routers (it would be interesting to know, 10 years later, how widespread their usage is!?).

Core-Stateless Fair Queueing (CSFQ) is a proposed architecture for islands of routers (possibly at the heart of ISPs) that can allocate resources approximately fairly. CSFQ tries to keep the high-speed, highly loaded core routers as simple and stateless as possible while the edge routers (working at much lower line rates) carry most of the burden of per-flow management and control. Edge routers estimate the arrival rates of the flows and label packets with the corresponding rates. At a core router, if the fair share of a flow is below its estimated rate, the packet is dropped probabilistically (the core router is essentially thinning the random process). The interesting fact is that core routers only need the single label attached to each packet, and there is no per-flow state at any core router (there is much more detail to this, which you can find in the paper).
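The core-router drop rule reduces to a few lines; this is a simplified sketch of my reading of it (the fair-share estimation itself, which the paper spends considerable care on, is taken as given here):

import random

def keep_packet(labeled_rate_bps, fair_share_bps):
    # drop with probability max(0, 1 - fair_share / labeled_rate), which thins
    # every flow down to the fair share without any per-flow state in the core
    drop_prob = max(0.0, 1.0 - fair_share_bps / labeled_rate_bps)
    return random.random() >= drop_prob

# e.g. a flow labeled at 8 Mbps through a core whose fair share is 2 Mbps
# keeps roughly a quarter of its packets
kept = sum(keep_packet(8e6, 2e6) for _ in range(10000))
print("forwarded fraction ~ %.2f" % (kept / 10000.0))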

The simulation results show that CSFQ drastically outperforms regular non-fair schemes (e.g. FIFO and RED) while performing better than or close to FRED and below DRR. Just from reading the description of the architecture I had assumed that CSFQ should perform much closer to DRR (the precise fair queueing algorithm with much higher implementation complexity), but interestingly that is not the case: CSFQ performs below DRR, and not too close. The authors also mention that they are still not aware of the reason behind this difference. Can inaccuracy in the estimations cause such a difference? Regardless of this observation, the result is brilliant since it enables approximately fair allocation with much less complexity and therefore cost.

It is also interesting to think about the scope of this architecture and what can be considered an island of routers. Would you have multiple islands within an ISP, one island per ISP, or expand an island to multiple ISPs and possibly the whole Internet? Trust will be crucial in drawing the boundaries of these islands, in addition to the obvious policy and economic considerations.

Very interesting work and paper!

Analysis and Simulation of a Fair Queueing Algorithm

A. Demers, S. Keshav, and S. Shenker, “Analysis and simulation of a fair queueing algorithm,” in Proc. ACM SIGCOMM '89, Austin, TX, 1989, pp. 1-12.

As it is not possible to ensure ethical and fair usage of network resources by trusting only the end nodes, the network gateways need to support fair resource allocation where it is desired. In other words, a fair and proper congestion avoidance algorithm needs the involvement of the network and the end users simultaneously; otherwise it is not possible.

The authors of this paper propose a modification to the fair queueing algorithm first proposed by Nagle (1987) which satisfies the design requirements they envision for any such fair scheme (fair bandwidth, fair buffer space, and continuity of service). Nagle proposed servicing distinct queues (one queue per flow) with a packet-by-packet round-robin scheduler, which fails to allocate bandwidth fairly when packet sizes differ. The modified FQ algorithm emulates a bit-by-bit round-robin (BR) scheduler by computing, as each packet arrives, the time its service would end under BR, but then servicing packets whole instead of bit by bit, always transmitting the packet with the earliest BR finish time first.
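A small sketch of that emulation (my own simplification: R(t), the bit-by-bit round number at arrival time, is passed in rather than computed, and flow identifiers are assumed comparable so ties break deterministically):

import heapq

class FairQueue:
    def __init__(self):
        self.last_finish = {}   # flow -> finish number of its previous packet
        self.backlog = []       # heap of (finish number, flow, packet length)

    def arrive(self, flow, length, round_number):
        # F = max(F_previous_of_flow, R(arrival)) + packet length (in bits)
        start = max(self.last_finish.get(flow, 0.0), round_number)
        finish = start + length
        self.last_finish[flow] = finish
        heapq.heappush(self.backlog, (finish, flow, length))

    def next_packet(self):
        # transmit, as a whole, the packet that would finish first under
        # bit-by-bit round-robin service
        return heapq.heappop(self.backlog) if self.backlog else None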

It is interesting to note that in many of today's high-speed routers, packets are divided into equal-size cells at the input, processed internally, and then reassembled at the output. This might mean that enforcing fairness and priorities over variable packet sizes is still hard to implement after 20 years. Even though the paper's result is nice, the fixed-size cell approach is still the dominant method in high-speed routers. It would be interesting to better understand why, 20 years after the publication of this paper, it is still preferred to work with fixed-size cells.

Another very interesting point in this paper is that the proposed FQ algorithm discourages ill-behaved sources by dropping their packets while still counting them toward their bandwidth usage! I find this very interesting and effective from a game-theoretic point of view. Overall I found this paper to be a great mixture of intuition, analysis, simulation and effective results, and I would therefore suggest keeping it in the syllabus.



Saturday, September 5, 2009

CONGESTION AVOIDANCE AND CONTROL

V. Jacobson, "Congestion avoidance and control," in Proc. SIGCOMM symp. Communications Architecture and Protocols, 1988, pp. 314-329.

This paper is mainly about a group of proposed solutions to avoid congestion collapse, as first seen in 1986 on the LBL/UCB network. The author mentions a group of seven new algorithms which together, he claims, considerably improve the robustness of networks against congestion collapse, and describes five of them in some detail. The key goal of these algorithms is to make sure the network reaches its equilibrium state and that the "conservation of packets" principle is followed when it is in equilibrium. This principle states that, under stable conditions, a new packet should not be inserted into the network until an old one has left.

The slow-start algorithm used in today's TCP was first proposed in this paper, along with an algorithm for load-dependent estimation of the round-trip time which performs much better than the old load-agnostic version: the variability of the RTT increases as load increases, and ignoring this fact can cause spurious retransmissions of packets. The congestion avoidance algorithm proposed in this paper is again AIMD, as in our last paper, although here the designers chose the additive-increase part heuristically; in the last paper we saw a more rigorous justification of why AIMD is a better candidate than other linear CA algorithms.
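In compressed form, the window dynamics this paper introduces survive in today's TCP roughly as follows (a sketch in units of segments, Tahoe-style, ignoring fast retransmit, timers, and many other details):

def on_ack(cwnd, ssthresh):
    if cwnd < ssthresh:
        return cwnd + 1, ssthresh           # slow start: exponential growth per RTT
    return cwnd + 1.0 / cwnd, ssthresh      # congestion avoidance: ~+1 segment per RTT

def on_loss(cwnd, ssthresh):
    return 1, max(cwnd / 2.0, 2)            # cut the threshold, restart from one segment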

As seen in Figures 8 and 9, different end-to-end users with slightly different TCP implementations can experience very different connection quality from the network. The consequences of this phenomenon, especially in terms of fairness, might be interesting to discuss. Interestingly, the author of this paper also predicted one of the main problems TCP has in today's Internet: the ambiguity of the binary feedback signal over, for example, wireless (or other unreliable link-level) connections. TCP assumes packets are being lost because of congestion and reduces its window size when it absolutely should not. This paper is very interesting since it walks the reader through the initial proposed solutions to the congestion collapse problem, many of which still exist in the current stack.


ANALYSIS OF THE INCREASE AND DECREASE ALGORITHMS FOR CONGESTION AVOIDANCE IN COMPUTER NETWORKS

D. Chiu and R. Jain, "Analysis of the increase and decrease algorithms for congestion avoidance in computer networks," Comput. Networks ISDN Syst., vol. 17, pp. 1-14, 1989.

This paper is mainly concerned with the analytic comparison of linear congestion avoidance (CA) algorithms under a synchronous binary feedback model. In other words, there are n identical end-hosts trying to share a link, and the system provides them with a single-bit feedback indicating the overload/underload status of the link at each time step. Even though the model's assumptions, such as the synchronicity of the end-users, may not hold in many scenarios, this result can be considered the basis of the current TCP congestion control/avoidance algorithm.

To be able to make a solid comparison of algorithms, one needs to define the desired performance metrics. The authors choose efficiency (closeness of the load to the "knee"), fairness, distributedness, and convergence time (responsiveness and smoothness) as the key criteria. The paper proves (both analytically and pictorially) that within the class of linear CA algorithms, the decrease needs to be purely multiplicative while the increase needs to be additive, with a possible multiplicative part. They further show that within this class of algorithms, Additive Increase Multiplicative Decrease (AIMD) has the fastest convergence and is therefore the optimal algorithm within this model and set of criteria.
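A toy rendition of the synchronous model makes the convergence easy to see (the constants are arbitrary illustrative choices; the feedback is the single overload bit):

def aimd_step(loads, capacity, add=1.0, mult=0.5):
    if sum(loads) > capacity:                    # the binary feedback bit
        return [x * mult for x in loads]         # multiplicative decrease
    return [x + add for x in loads]              # additive increase

def jain_fairness(loads):
    return sum(loads) ** 2 / (len(loads) * sum(x * x for x in loads))

loads = [1.0, 20.0]                              # two users, started far from fair
for _ in range(60):
    loads = aimd_step(loads, capacity=30)
print("loads:", [round(x, 1) for x in loads], " fairness:", round(jain_fairness(loads), 3))

Running this, the two allocations oscillate around the knee while the fairness index climbs toward one, which is exactly the cyclic convergence the paper illustrates with its vector diagrams.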

The authors argue against non-linear control algorithms because they are concerned about the system's sensitivity to parameters. It is interesting that today's TCP congestion control algorithm is a mixture of linear (congestion avoidance) and non-linear (slow start, fast retransmit, fast recovery, ...) parts and is therefore not linear. Considering only the result of this paper, one cannot conclude whether further investigation of non-linear algorithms is meaningful or not!

I found this paper quite interesting since it elegantly shows the advantages of the AIMD congestion avoidance algorithm within its model. It would still be interesting to analyze and discuss the model and criteria used by the authors. For example, how well does the result, which was proved for synchronous models, hold for asynchronous models? Furthermore, distributedness was considered a required criterion for CA algorithms. Is it correct to discard all non-distributed algorithms at once on the grounds that they would be hard to implement? Could there be semi-coordinated algorithms with much better performance that would be worth the extra coordination effort?

Wednesday, September 2, 2009

UNDERSTANDING BGP MISCONFIGURATION

The goal of this project has been to identify and quantify how well current wide-area routing (BGP) is performing and what its main weaknesses are. Although we might only hear about BGP misconfiguration when there is an Internet-wide breakdown, the results of this paper show that BGP misconfiguration is amazingly pervasive. For example, most (at least 75%) of new route announcements in the Internet are the result of BGP misconfiguration, and 0.2-1.0% of the BGP table is corrupted! The surprising result is that all these misconfigurations have little effect on connectivity and reachability!

To obtain these results the authors target only two kinds of unintended misconfigurations: the accidental insertion of routes (origin misconfiguration) and the accidental propagation of routes that should have been filtered (export misconfiguration). To monitor and investigate BGP misconfigurations, they flag short-lived route announcements and then use email to investigate the possible causes of each problem. The authors identify several main causes of misconfiguration and propose certain improvements, for example to the user interface design of BGP configuration systems.

INTERDOMAIN INTERNET ROUTING

This lecture by Hari Balakrishnan gives a high-level overview of interdomain routing in today's Internet. The Internet is constructed of many Autonomous Systems (ASes) which can be quite different in their internal structure and routing protocols. Each AS is usually owned by a different company and is in competition with other ASes (also referred to as ISPs). What enables global connectivity over the Internet is the routing protocol between ISPs; the wide-area routing protocol currently in use is the Border Gateway Protocol (BGP, version 4).

BGP is interestingly different from other routing protocols in that its primary goal is global reachability between ASes, with a special focus on scalability. BGP is centered around following policies, which are primarily shaped by the relationships between ASes and ultimately by profit. In a typical Interior Gateway Protocol (IGP), by contrast, the main concern is minimizing some path metric.

The lecture gives a good overview of the process by which different ASes choose to advertise routes to other ASes (and thereby declare their willingness to carry traffic for them) and how they absorb the routes advertised by others into their internal structures. The author also discusses route hijacking through false advertisements, convergence speed, and multi-homing as some of the main challenges of today's wide-area routing.

I found the provider-transit vs. peer-peer forms of relationship among ASes quite interesting. Why would a peer that carries possibly more than twice the other's traffic, traffic possibly worth millions of dollars, enter such an agreement? (Well, I'm sure there are good reasons for it!) Is the required billing infrastructure too expensive to justify such an approach? Furthermore, why can't ISPs have mixed relationships? For example, ISP A, which has two peers B and C, could act as a provider for B for traffic destined to C, and act as a peer for traffic destined to A and its transit customers. Or maybe this capability already exists and is in use.

In the end, I found the lecture a well-written overview of the BGP protocol.