Saturday, October 31, 2009

Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications


Summary:

This paper introduces Chord, a distributed lookup protocol built on distributed hash table (DHT) technology. Even though the function the system supports sounds very simple (given a key, map it onto a node), quite a bit of intelligence is required to accomplish this task in an efficient, scalable, and resilient fashion.

As mentioned in the survey paper, Chord's lookup is structured and symmetric, meaning data items can be deterministically mapped to nodes while all nodes are of equal importance. To achieve this, Chord uses a variant of consistent hashing that does not require global awareness at every node; instead it proposes a distributed routing table requiring O(log N) state per node and O(log N) messages to resolve a query, which makes it much more scalable.

In simple terms, each node has a finger table holding pointers to other nodes at roughly power-of-two distances on the node identifier circle. Every key is stored at the node whose ID immediately follows it, and with each message we roughly halve the remaining distance to the node responsible for the target key. The other important operations that need to be supported are joins and leaves, which the authors show will, with high probability, take O(log^2 N) messages to update all the state.
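
To make the finger-table idea concrete, here is a small Python sketch of my own (not the paper's pseudocode): a toy ring with 6-bit identifiers where each hop jumps to the farthest finger that does not pass the key.

```python
M = 6                      # identifier bits; Chord itself uses 160-bit SHA-1 IDs
RING = 2 ** M

def between(x, a, b, inclusive_right=False):
    """True if x lies in the clockwise ring interval (a, b) (or (a, b] if requested)."""
    if a == b:
        return inclusive_right if x == b else x != a
    if a < b:
        return a < x < b or (inclusive_right and x == b)
    return x > a or x < b or (inclusive_right and x == b)

def successor_of(key, ids):
    """The node responsible for `key`: the first node ID clockwise from key."""
    later = sorted(i for i in ids if i >= key)
    return later[0] if later else min(ids)

def fingers(n, ids):
    """finger[i] = successor(n + 2**i): pointers at roughly power-of-two distances."""
    return [successor_of((n + 2 ** i) % RING, ids) for i in range(M)]

def find_successor(start, key, ids):
    """Iteratively route toward key's owner; each hop roughly halves the remaining distance."""
    node, hops = start, [start]
    while not between(key, node, successor_of((node + 1) % RING, ids), inclusive_right=True):
        # jump to the farthest finger that does not overshoot the key
        preceding = [f for f in fingers(node, ids) if between(f, node, key)]
        node = (max(preceding, key=lambda f: (f - node) % RING)
                if preceding else successor_of((node + 1) % RING, ids))
        hops.append(node)
    return successor_of(key, ids), hops

ids = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]    # a small made-up ring
print(find_successor(8, 54, ids))                # -> (56, [8, 42, 51])
```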

An important requirement for the correctness of this protocol is the validity of the pointer each node keeps to the next node on the identifier circle (its successor). These pointers guarantee that at each step positive progress is made toward the target, which ensures correctness. For this reason every node periodically runs a stabilization algorithm that maintains the validity of this pointer. Chord further keeps a successor list instead of a single pointer to remain robust against simultaneous node failures.
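
A rough sketch of what the stabilize/notify exchange looks like; this is my own simplification of the paper's pseudocode, not a faithful copy.

```python
def between(x, a, b):
    """True if x lies strictly inside the clockwise ring interval (a, b)."""
    if a < b:
        return a < x < b
    if a > b:
        return x > a or x < b
    return x != a          # (n, n) is the whole ring except n itself

class ChordNode:
    """Just enough state to illustrate the periodic stabilize/notify exchange."""

    def __init__(self, node_id):
        self.id = node_id
        self.successor = self        # a lone node is its own successor
        self.predecessor = None
        self.successor_list = []     # r backup successors tolerate simultaneous failures

    def stabilize(self):
        """Run periodically: adopt any node that slipped in between us and our successor."""
        candidate = self.successor.predecessor
        if candidate is not None and between(candidate.id, self.id, self.successor.id):
            self.successor = candidate
        self.successor.notify(self)

    def notify(self, node):
        """`node` thinks it may be our predecessor; accept it if it is closer than the current one."""
        if self.predecessor is None or between(node.id, self.predecessor.id, self.id):
            self.predecessor = node
```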

Critique:

The authors test Chord against different stress scenarios to demonstrate its true characteristics. A notable point made in the simulation section is that uniformly random ID assignment causes too much load variance across nodes, and virtual nodes should be used at each physical node to alleviate this problem; this increases the finger table size, but not by much. I wonder how important this variance is in reality and whether it really needs to be accounted for.

Another point of discussion could be whether Chord can be tuned to exploit an asymmetric underlying structure, e.g., by giving less load to nodes that are less reliable. Chord is definitely a great paper for this class.



Looking Up Data in P2P Systems


Summary:

This paper surveys recent DHT-based solutions to the lookup problem proposed for P2P systems. A scalable and robust lookup mechanism is a major challenge/requirement that large-scale P2P systems need to address.

Lookup mechanisms that are only structured (each node holds well-defined information about other nodes) or only symmetric (all nodes are of equal importance from the point of view of lookup) are either too sensitive to failures or unable to give guarantees and inefficient. The more recent proposals (CAN, Chord, Kademlia, Pastry and Tapestry) are all both structured and symmetric, enabling them to be resilient and offer guarantees at the same time.

Interestingly, the DHT is the key abstraction that has enabled all these new systems. When an entity wants to access a file (for example), it computes the file's hash key and calls the lookup(key) function. Every file is stored at the node with the "closest" ID (usually chosen randomly) to the key, where the definition of closeness depends on the scheme.
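
In code, the application-side flow the survey describes is roughly the following; the `lookup` and `fetch` callables here are placeholders for whatever routing and transfer the particular system provides.

```python
import hashlib

def key_for(name: str) -> int:
    """Hash a human-readable name into the DHT's flat identifier space."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

def get(name, lookup, fetch):
    """Resolve the responsible node with lookup(key), then retrieve the item from it."""
    key = key_for(name)
    node = lookup(key)          # the DHT's job: route to the node "closest" to key
    return fetch(node, key)     # the application's job: actually transfer the data
```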

The differences among the proposed DHT-based lookup services mainly come down to whether they route in a single dimension or in several, and which data structures they use. For example, Chord routes in a single dimension and uses a skiplist-like data structure to store the potential next hops at every node, achieving the O(log N) expected lookup cost common to other one-dimensional routing schemes.

Critique:

This paper gives a very educational and concrete overview of the proposed DHT-based lookup algorithms and further gives some insight into the trade-offs and choices of design parameters (especially in the discussion section). It would have been interesting to see additional simulation or experimental results comparing these mechanisms in an actual setting beyond the big-O comparisons, even though that level of comparison is probably not expected from a survey paper. A great paper to keep in the syllabus indeed.

Thursday, October 29, 2009

Active Network Vision and Reality: Lessons from a Capsule-Based System


Summary:

This paper describes some of the lessons learned and results achieved by experimenting with "active networks" on the ANTS platform. Active networks allow applications to receive customized network services by dynamically importing and executing code on "active nodes".

The information describing the customized forwarding routines is injected by end users via "capsules": regular IP packets with an extra header that identifies the requested forwarding routine. Capsules direct themselves through the network by running custom forwarding routines executed at the active nodes, while legacy IP routers within the network continue to forward them as ordinary IP packets.

When the code for a specific forwarding routine is needed but not cached at an active node, it is requested from the previously visited active node. If the code download fails, the capsule is dropped and the end system has to recover from the loss. The authors call this model of code distribution and execution extensible packet forwarding.
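
A schematic of this demand-pull code loading, with hypothetical `code_id` and `prev_node` capsule fields standing in for whatever ANTS actually carries:

```python
def process_capsule(capsule, code_cache, fetch_from):
    """Run the capsule's forwarding routine, pulling the code on a cache miss."""
    routine = code_cache.get(capsule.code_id)
    if routine is None:
        # Ask the previous active node the capsule came from for the missing code.
        routine = fetch_from(capsule.prev_node, capsule.code_id)
        if routine is None:
            return None              # download failed: drop; end systems must recover
        code_cache[capsule.code_id] = routine
    return routine(capsule)          # the capsule directs its own forwarding
```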


Wednesday, October 28, 2009

Resilient Overlay Networks


Summary:

Resilient Overlay Network (RON) is an application-layer overlay on top of Internet routing that allows distributed Internet applications to detect and recover from path outages (or serious performance degradation) more quickly.

The main goal of RON is to protect applications running on a group of cooperating nodes against failures of the underlying Internet paths. BGP-4 has been optimized for scalability and performs poorly with respect to this criterion. The reason RON is much more resilient to faults than BGP-4 lies in the fact that RON does not have to consider scalability (it targets 2-50 nodes).

RON achieves this goal by aggressively probing and monitoring the paths among RON nodes. It detects when the default Internet path is the best path and uses it; otherwise data is routed through an intermediate node. RON nodes continuously monitor, measure and share path properties. An interesting aspect of the architecture is a distributed performance database that supports summarization and efficient data exchange.
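
A toy version of this routing decision (my illustration only; the real RON policy also applies hysteresis and combines several metrics rather than a single cost number):

```python
def best_ron_path(src, dst, nodes, metric):
    """Use the direct Internet path unless relaying via one RON node looks better."""
    best_cost, best_path = metric(src, dst), (src, dst)
    for hop in nodes:
        if hop in (src, dst):
            continue
        cost = metric(src, hop) + metric(hop, dst)   # single-intermediate paths only
        if cost < best_cost:
            best_cost, best_path = cost, (src, hop, dst)
    return best_path, best_cost
```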

Interestingly, RON control packets are sent along the RON data paths in order to make them resilient to the same Internet path failures; the RON routing protocol is therefore itself a client of RON.

Latency, packet loss and throughput are the three metrics maintained by default for each virtual link within RON, while clients can supply a different metric and influence routing with the parameters they care about. This is an important characteristic of the RON architecture that enables a tighter integration with applications.

To demonstrate these concepts, two distinct experiments are run over 12-node and 16-node RON deployments, demonstrating a fault avoidance rate of 60-100% with an average recovery time of 18 seconds.

Critique:

RON is an interesting approach to harnessing Internet resources to build a geographically distributed, small (up to ~50 nodes), resilient network. As the authors themselves ask, I wonder what would happen if the density of RON networks grew to a non-negligible portion of the Internet. Would the overhead or the special path optimizations done by RONs start to introduce behaviors in the Internet not seen before?

Another interesting point of discussion is scalability. There seems to be a range of applications in which the number of involved nodes is beyond 50 (maybe on the order of hundreds) but not high enough to justify a dedicated network. Is there any hope for such applications?

I enjoyed this paper but will vote for dropping it just because I felt it was too lengthy.

Tuesday, October 27, 2009

On Inferring Autonomous System Relationships in the Internet


Summary:

This paper introduces a heuristic algorithm that infers the relationships among Internet Autonomous Systems (ASes). Inter-AS agreements in today's Internet take one of the forms customer-provider, peering, mutual transit, or mutual backup. The importance of this information lies in the fact that inter-domain routing (BGP) is policy-based, so connectivity and reachability are not equivalent. Furthermore, ISPs mostly do not expose their contractual agreements, so such algorithms can be quite useful in better understanding the state of the Internet and improving it.

The algorithm derives an annotated AS graph, with edges labeled as provider-customer (or customer-provider), peer-to-peer, or sibling-to-sibling (mutual transit/backup), that is compatible with the BGP routing tables. Each route in a routing table is the result of export policies recursively enforced by different ASes. Each route entry can be decomposed into an (optional) uphill path that traverses only customer-provider and sibling-to-sibling edges, (optionally) followed by a single peer-to-peer edge, (optionally) followed by a downhill path that contains only provider-customer or sibling-to-sibling edges.
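
The "uphill, at most one peer edge, then downhill" pattern is easy to state in code; the sketch below is my own illustration and assumes the edge annotations are already known:

```python
def is_valley_free(path_edges):
    """path_edges: labels along an AS path, each one of 'c2p' (customer->provider),
    'p2c' (provider->customer), 'p2p' (peer), or 's2s' (sibling)."""
    phase = "uphill"                      # uphill -> (one peer edge) -> downhill
    for edge in path_edges:
        if edge == "s2s":
            continue                      # sibling edges are allowed anywhere
        if phase == "uphill":
            if edge == "c2p":
                continue
            phase = "peer" if edge == "p2p" else "downhill"
        else:
            if edge != "p2c":
                return False              # only downhill edges may follow
            phase = "downhill"
    return True

# e.g. is_valley_free(["c2p", "s2s", "p2p", "p2c"]) -> True
#      is_valley_free(["p2c", "c2p"]) -> False (a "valley")
```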

Combining this information with the size/degree of the ASes and the set of possible relationships, the proposed algorithm infers AT&T's relationships with its neighboring ASes with 90.5% accuracy, and more than 50% of the inferred siblings are confirmed through the WHOIS lookup service (I'm not sure whether this last number indicates good performance on sibling-to-sibling prediction). This information can be used by ISPs to debug erroneous entries in their BGP routing tables and to plan future contractual agreements.

Critique:

I found this paper interesting but had hoped for a better quantitative evaluation of the importance/relevance and impact of such algorithms (even though I agree with the general arguments). In general a nice paper, but I would agree to dropping it for a more exciting one.

Sunday, October 25, 2009

On Power-Law Relationships of the Internet Topology


Summary:


This paper introduces an interesting approach to characterizing Internet topologies by power-laws. The authors argue and demonstrate that simple averaging of the parameters of interest does not correctly capture the underlying skewed distributions.

To this end they experimentally identify three power-laws (y ∝ x^a) and show that they hold over three instances of the Internet inter-domain topology taken during 1998 and a single instance of an intra-domain topology from 1995. The lines predicted by the identified power-laws have correlation coefficients above 96% with the actual data points. The identified power-laws are as follows:
  1. Power-Law 1 (rank exponent): The outdegree of a node is proportional to its rank to the power of a constant R. The rank of a node is its index when the nodes are sorted in decreasing order of outdegree. The constant R is called the "rank exponent" of the network topology, and its value can be measured from the slope of the log-log plot of the corresponding parameters. The rank exponents of the three inter-domain instances are quite close to each other (~ -0.80) and different from that of the intra-domain instance (~ -0.48), suggesting that it is a good metric for distinguishing differences among network topologies.
  2. Power-Law 2 (outdegree exponent): The frequency of an outdegree is proportional to the outdegree to the power of a constant O, called the outdegree exponent of the network topology. The value of O is quite close for all four instances (still slightly different for the intra-domain instance), suggesting that PL2 is a fundamental property of such networks and a good check for how realistic a generated topology is.
  3. Power-Law 3 (eigen exponent): The eigenvalues of the graph are proportional to their order i, to the power of a constant E, where i indexes the eigenvalues sorted in decreasing order. E is called the eigen exponent of the network topology, and it again differs significantly between the inter-domain instances (~ -0.48) and the intra-domain instance (-0.177), suggesting it is a good metric for distinguishing differences among networks.
In addition to the above three power-laws the authors identify another, but refrain from calling it a law because of the lack of sample points:
  • Approximation 1 (hop-plot exponent): The total number of pairs of nodes, P(h), within h hops is proportional to the number of hops to the power of a constant H. The hop-plot exponent H can again be measured from the slope of the log-log plot when h is much less than the diameter of the network.
The hop-plot exponent is especially interesting since it gives a better tool for estimating average neighborhood size than the traditional approach based on average node degree. The advantage comes from the fact that the averaging approach assumes uniform outdegree, while the hop-plot-based estimate better captures the underlying skewed distribution.
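
Reading an exponent off a log-log plot boils down to a least-squares fit; here is a generic sketch on synthetic data (the numbers are made up, only the method mirrors the paper's):

```python
import numpy as np

def powerlaw_exponent(x, y):
    """Fit log(y) = a*log(x) + b; the slope a is the exponent; also report the correlation."""
    lx, ly = np.log10(x), np.log10(y)
    slope, intercept = np.polyfit(lx, ly, 1)
    corr = np.corrcoef(lx, ly)[0, 1]
    return slope, corr

ranks = np.arange(1, 101, dtype=float)
outdegrees = 50.0 * ranks ** -0.8             # synthetic data obeying PL1 exactly
print(powerlaw_exponent(ranks, outdegrees))   # -> (-0.8, correlation ~ -1.0)
```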

The identified power-laws can be used to verify the validity of a simulation model, improve the protocol analysis and interpolate/estimate the topological parameters of interest.

Critique:

The paper gives some insight into the underlying effects that give rise to the identified power-laws, which are quite interesting. After this first significant step of identifying the power-laws, a more detailed study of those underlying effects seems like the natural next step.

Improving our understanding of these power-laws will also require access to many more sample points (even though gathering them is a tedious process) and network instances. I'm very curious to know what the results of studying these power-laws over a richer set of network instances have been. I believe this is a great paper to be kept in the syllabus.

Friday, October 16, 2009

White Space Networking with Wi-Fi like Connectivity

This paper introduces WhiteFi, a Wi-Fi-like network architecture designed to operate in the UHF white space bands. WhiteFi is the first network prototype demonstrating the feasibility of Wi-Fi-like networking in the TV bands.

The incumbents in the UHF bands are 1) TV stations, whose usage varies strongly in space, and 2) wireless microphones, whose usage varies more over time. Therefore any white space network needs to address spatial and temporal variation in channel availability, in addition to exploiting contiguous chunks of spectrum.

WhiteFi introduces the following three key contributions to address the distinct characteristics of white space networking:
  1. Channel assignment algorithm: Channel assignment in the UHF bands is much harder since the network has to consider the observations of all clients. WhiteFi gathers observations from all clients, including their available UHF channels and the airtime used on those channels by other white space devices, and defines a new "multichannel airtime metric", shown by simulation and experimental results to be a good indicator of which channel and width should be used for communication.
  2. Access point discovery: The paper introduces a novel time-domain analysis technique called Signal Inspection before Fourier Transform (SIFT) to enable faster detection of the channel width used by the AP and to measure the medium utilization needed by the channel assignment algorithm. SIFT uses packet durations and the time between packet and ACK transmissions to estimate the waveform width, as the two are inversely proportional (for constant packet sizes); a toy version of this mapping is sketched after this list. The authors further introduce two SIFT-based access point discovery algorithms, Linear-SIFT and Jump-SIFT, and show their superiority over traditional search models.
  3. Handling disconnection: To avoid any interference with incumbents, a client immediately changes channel when it detects their presence and starts chirping a special signal on an agreed-upon backup channel, containing information about its channel map. The access point periodically checks this backup channel and, when needed, tunes its radio to decode the lost client's packets; the network can then re-establish itself on a good operational channel.
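
The duration-to-width intuition behind SIFT (item 2) can be illustrated with a toy classifier; the timings below are invented for illustration, not measurements from the paper.

```python
CANDIDATE_WIDTHS_MHZ = [5, 10, 20]

def guess_channel_width(observed_duration_us, ref_duration_20mhz_us):
    """Pick the candidate width whose expected on-air duration best matches the observation."""
    best, best_err = None, float("inf")
    for width in CANDIDATE_WIDTHS_MHZ:
        expected = ref_duration_20mhz_us * (20.0 / width)   # half the width, twice the time
        err = abs(observed_duration_us - expected)
        if err < best_err:
            best, best_err = width, err
    return best

# e.g. guess_channel_width(observed_duration_us=960, ref_duration_20mhz_us=240) -> 5
```
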
The authors further use extensive simulations with the QualNet package, plus a prototype implementation, to validate the accuracy of SIFT and the proposed channel metric along with the overall performance of the system.

The paper introduces many clever solutions to the problems one encounters in white space networking and does a great job of combining them. I felt the disconnection algorithm is a bit too sensitive to corner cases and still somewhat immature. The need for the AP to switch its channel to decode a lost client's packets while the rest of the clients are still running can cause significant degradation in the quality of service users experience, especially when it happens often or when there are false alarms.

A very interesting paper and indeed worth being in the syllabus.



Thursday, October 8, 2009

A Performance Comparison of Multi-Hop Wireless Ad Hoc Network Routing Protocols

The goal of this paper is to provide a fair performance comparison of four major ad hoc routing protocols (as of 1998) under different mobility conditions. The routing protocols under study are DSDV, TORA, DSR and AODV.

The comparisons are done through simulation, and one of the main contributions of this paper is the extension of the ns simulator to better model node mobility, physical-layer interactions, radio interfaces and the 802.11 MAC protocol. The four protocols are tested over 210 different scenario files (10, 20 or 30 source nodes over 70 different movement patterns) at 1 m/s and 20 m/s maximum speeds, which are fed to the simulator. The pause time is the constant time between two consecutive moves of a node, and varying it partially characterizes the mobility of the environment.

The metrics used for comparing the protocols are packet delivery ratio, routing overhead and path optimality. Except for DSDV, which is a loop-free distance-vector routing protocol, the other three are on-demand protocols and have overheads that vary as the pause time is changed from 0 s to 900 s. In general the Dynamic Source Routing protocol performs better than the others in most mobility conditions, even though the gap narrows in less mobile environments. AODV, which combines ideas from DSDV and DSR to remove the need for encapsulating the route in each packet, performs fairly close to DSR in general, except in the most mobile scenarios, where it does worse.

This paper gives a good overview of the main ad hoc routing protocols and a fair comparison among them, and therefore I recommend it for the syllabus. It would have been interesting to discuss their propagation model and whether testing for an indoor versus outdoor environment could change the results.

Wednesday, October 7, 2009

A High-Throughput Path Metric for Multi-Hop Wireless Routing

This paper introduces, for the first time, the Expected Transmission count (ETX) metric, to be used instead of the typical hop-count metric, and demonstrates its superiority through experiments.

Using hop count as the routing metric over a multi-hop network can easily result in choosing a route that is far from optimal. Minimizing the hop count typically means using longer links, which tend to be more lossy. Trivial solutions such as masking errors at the link layer or discarding lossy links fail to work: the first reduces link throughput and the second disconnects nodes reachable only through lossy links.

With the insight gained from real experiments, the authors designed ETX to address the following issues:
  1. The wide range of link loss ratios: many links with intermediate quality exist.
  2. The existence of links with asymmetric loss ratios: 22 of 29 nodes had at least one link whose forward and reverse delivery ratios differed by more than 25%.
  3. The interference between successive hops of multi-hop paths: a two-hop route with perfect per-link delivery achieves 50% of the throughput of a single link.
The ETX of a link is defined as the inverse of the product of the forward and reverse delivery ratios, which captures the typically asymmetric nature of links and the need for good link quality in both directions. The ETX of a route is the sum of the ETX values of its member links. This metric estimates the number of transmissions needed for one packet to reach its destination; using the route with minimum ETX reduces the number of transmissions, which in addition to the throughput gain also reduces inter-link interference in the network.
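
The metric itself fits in a few lines; the delivery ratios below are assumed to have already been measured from the broadcast probes.

```python
def link_etx(df, dr):
    """Expected transmissions on one link: 1 / (forward ratio * reverse ratio)."""
    if df == 0 or dr == 0:
        return float("inf")          # a dead direction makes the link unusable
    return 1.0 / (df * dr)

def path_etx(links):
    """Route metric: the sum of the per-link ETX values along the path."""
    return sum(link_etx(df, dr) for df, dr in links)

# Two perfect hops cost 2.0; one asymmetric hop (df=0.9, dr=0.5) alone costs ~2.22,
# so ETX can prefer two clean links over a single lossy one.
print(path_etx([(1.0, 1.0), (1.0, 1.0)]), path_etx([(0.9, 0.5)]))
```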

The delivery ratios are estimated by counting periodic broadcast probe packets emitted from each node. The authors demonstrate the usefulness of ETX by implementing it in the DSDV and DSR routing protocols, showing gains of 2x or more when routes are two or more hops long.

I found this paper the most interesting paper we have read so far. The authors not only introduced a new concept and thoroughly evaluated it in a real setting, but were also fair in emphasizing the cases where ETX does not outperform other schemes by as much. They carefully studied the effect of variable packet size and showed that in a better-connected network, a smaller average hop count closes the performance gap.

I wonder whether the packet-size issue could be alleviated by estimating the actual delivery rate from the one observed with a different packet size, given that the slopes of the curves in Figure 14 are fairly constant, and further by using different estimates for the forward and reverse directions since ACK packets are typically smaller. I definitely vote for keeping this paper in the syllabus.

Sunday, October 4, 2009

Architecture and Evaluation of an Unplanned 802.11b Mesh Network

This project evaluates the effectiveness of mesh networks with unplanned placement, omnidirectional antennas and multi-hop routing in bringing last-mile connectivity to communities. The lack of planning may increase the rate of deployment, but it may also result in seriously faulty topologies and low performance. The authors make their case by experimenting with Roofnet, a network of 37 802.11b wireless nodes with roof-mounted antennas spread over 4 square kilometers.

The nodes are installed by volunteers who get to use the network for their own personal purposes. The network contains 4 gateways to the global Internet. Routing uses a combination of link-state and source-routing techniques, and the bit-rates of the wireless cards are set by an optimized algorithm called SampleRate. Broadcast probes are used to gather the metrics required by the routing and bit-rate selection algorithms. The routing algorithm tends to choose links with throughput higher than 2 Mbps, which are usually not too long, and discards the low-quality links. The Roofnet network achieves 627 kbps average throughput and 39 ms average delay.

The authors show through simulation that dense deployment is a must for good connectivity and performance. The simulations also show that the network is not very sensitive to the deletion of individual links but can be very sensitive to the deletion of individual nodes: it only takes the two most important nodes failing for network performance to drop by 43%!

Even though the authors try, I believe it is very hard to compare this scheme with a single-hop access point architecture. With single-hop access points, even though more gateways might be needed to cover the whole area, individual users are likely to see better throughput and robustness. Another advantage of single-hop access points is that they can be set to orthogonal channels to minimize interference, and thus will not suffer from it as much as the Roofnet mesh network does.

I also recommend another paper published by the same project, which studies the physical-layer effects in more depth.

Modeling Wireless Links for Transport Protocols

In this paper, Gurtov and Floyd discuss and study the complex interactions between wireless link characteristics and transport protocols. The TCP/IP stack has become the de facto technology in both wired and wireless networks, so transport protocols and wireless technologies should be designed with a better understanding of the interplay between these mechanisms.

Researchers working on new transport protocol designs need simulation models that capture the important and relevant characteristics. The authors argue that many current models suffer from one or more of the following four problems: 1) being unrealistic, 2) having a limited parameter space, 3) being overly realistic, or 4) being underspecified.

For example, it was believed that abrupt increases in delay caused by link-level effects would produce retransmission timeouts, while it was later observed that, because of the relatively high delay variation, retransmission timers are already inflated; rather than producing spurious timeouts, such delays cause slow recovery from actual packet losses.

The authors identify the following as the main wireless link characteristics that need to be properly addressed in any such model:
  1. Error losses and corruption
  2. Delay variation
  3. Packet reordering
  4. On-demand resource allocation
  5. Bandwidth variation
  6. Asymmetry in Bandwidth and Latency
In addition to the above link characteristics, queue management policies and mobility effects also need to be captured in link models, as they can have significant effects on transport performance. The authors discuss the effect of the above characteristics on transport protocols, their presence in current and future wireless links, and hints on how they should be modeled.

The paper is a good overview of the complex relationship between link characteristics and transport protocols. Even though the layered structure has been relatively successful in separating functions, the layers are not in any way independent of each other, and a deep understanding of their interactions is required for any good design.

As the authors point out, cross-layer techniques are one of the most effective methods for improving the interplay between layers, and it could be interesting to discuss the advantages and disadvantages of wider deployment of cross-layer techniques in today's wired/wireless Internet. I vote for keeping this paper in the syllabus.