Friday, September 18, 2009

Portland: A scalable fault-tolerant layer 2 data center network fabric

R. N. Mysore, A. Pamboris, N. Farrington, N. Huang, P. Miri, S. Radhakrishnan, and V. Subram, "Portland: A scalable fault-tolerant layer 2 data center network fabric - ccr online," pp. 39-50, August 2009.

PortLand is a proposed architecture for a scalable, fault-tolerant data center fabric with plug-and-play deployment of switches. Its main distinction from other recent proposals is that it exploits the currently dominant data center topology, the multi-rooted tree. The authors envision the following requirements for a good data center fabric:
  1. Seamless VM migration
  2. No need for manual configuration by administrators
  3. Efficient communication with any other end host in the data center along any available physical path
  4. No forwarding loops
  5. Rapid and efficient failure detection
Current L2 architectures fail to scale, while adding L3 mechanisms introduces significant management overhead and gives up plug-and-play deployment and seamless migration. PortLand builds on internal, location-encoded MAC addresses (Pseudo MACs, PMACs), with the necessary translation performed at the edge switches. These internal addresses enable efficient, loop-free routing and forwarding within the data center. The following are the main elements of the PortLand architecture:
  1. Logically centralized, topology-aware Fabric Manager (FM) responsible for assisting with:
    • ARP resolution
    • Fault tolerance
    • Multicast
  2. Positional Pseudo MACs (PMACs)
    • Location encoded
    • Basis for efficient routing/forwarding
    • Basis for seamless VM migration
    • Separates host identity from host location, transparently to the end host
  3. Proxy-based ARP
    • Utilizing Fabric Manager for ARP resolution
    • Falling back on efficient broadcast methods upon failure
  4. Distributed location discovery by Location Discovery Protocol (LDP)
    • Periodic Location Discovery Message (LDM) on all switch ports
    • Switches infer their own location without manual configuration by exploiting knowledge of the baseline topology
  5. Loop-free forwarding
    • Multicast support through a single core switch along with Fabric Manager involvement
    • Loop free: once a packet travels down it never travels up again!
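To make the location encoding concrete: the PortLand paper lays out a 48-bit PMAC as pod.position.port.vmid, with 16, 8, 8, and 16 bits respectively. Below is a minimal Python sketch of packing and unpacking that layout. The PMAC tuple and the function names are hypothetical helpers written for this post, not part of any PortLand code.

```python
from typing import NamedTuple


class PMAC(NamedTuple):
    """Fields of a pseudo MAC (hypothetical helper type for illustration)."""
    pod: int       # 16 bits: pod containing the edge switch
    position: int  # 8 bits: position of the edge switch within the pod
    port: int      # 8 bits: edge switch port the host is attached to
    vmid: int      # 16 bits: distinguishes VMs behind the same port


def encode_pmac(p: PMAC) -> str:
    """Pack the four fields into a colon-separated 48-bit MAC string."""
    limits = (1 << 16, 1 << 8, 1 << 8, 1 << 16)
    for value, limit in zip(p, limits):
        if not 0 <= value < limit:
            raise ValueError(f"field value {value} out of range")
    packed = (p.pod << 32) | (p.position << 24) | (p.port << 16) | p.vmid
    return ":".join(f"{b:02x}" for b in packed.to_bytes(6, "big"))


def decode_pmac(text: str) -> PMAC:
    """Recover the location fields from a colon-separated MAC string."""
    raw = bytes.fromhex(text.replace(":", ""))
    if len(raw) != 6:
        raise ValueError("a MAC address must be exactly 6 bytes")
    packed = int.from_bytes(raw, "big")
    return PMAC(
        pod=packed >> 32,
        position=(packed >> 24) & 0xFF,
        port=(packed >> 16) & 0xFF,
        vmid=packed & 0xFFFF,
    )
```

Because the pod and position sit in the high-order bits, a switch can forward on a prefix of the address rather than on the whole 48-bit value.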
When restricted to a multi-rooted tree topology, PortLand achieves better scalability and performance than TRILL and SEATTLE. The designers of SEATTLE assumed that each end host communicates with only a few other end hosts; the PortLand designers reasonably reject this assumption (consider search or MapReduce, for example). Compared with the flat addresses used by TRILL and SEATTLE, PortLand's location-encoded PMACs offer significant advantages.
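One way to see the advantage of location-encoded addresses is that a switch can pick a direction from the destination's pod and position fields alone, without a per-host table. The following is a simplified Python sketch of that decision on a three-level multi-rooted tree. The function name, the level strings, and the return format are assumptions made for illustration, and the choice among several uplinks (for example by flow hashing) is left out.

```python
from typing import Optional, Tuple

# Destination PMAC fields: (pod, position, port, vmid)
Destination = Tuple[int, int, int, int]


def next_hop(level: str, switch_pod: int, switch_position: int,
             dst: Destination) -> Tuple[str, Optional[int]]:
    """Return ("down", selector) or ("up", None) for a destination PMAC.

    The selector is the field the switch matches on when sending down:
    the pod at a core switch, the edge position at an aggregation
    switch, and the host port at an edge switch.
    """
    dst_pod, dst_position, dst_port, _vmid = dst
    if level == "core":
        # Core switches sit at the top of the tree and only send down.
        return ("down", dst_pod)
    if level == "aggregation":
        if dst_pod == switch_pod:
            return ("down", dst_position)
        return ("up", None)
    if level == "edge":
        if dst_pod == switch_pod and dst_position == switch_position:
            return ("down", dst_port)
        return ("up", None)
    raise ValueError(f"unknown switch level: {level}")
```

In this sketch a switch sends a packet down only when the destination lies beneath it, so every switch below that point on the path also chooses "down". That is the property noted in the list above: once a packet travels down, it never travels up again.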

The authors clearly explain the elements of their architecture and demonstrate its first-order promises through simulation and an implementation on a small-scale network. I did not understand the advantage of distributed location discovery over having the Fabric Manager explicitly send locations to the switches, followed by verification at each switch. Even though the authors claim and show that FM scalability is not a big issue, I wonder whether semi-centralized approaches could scale better and function just as well. Overall, this is very educational and interesting work, and I vote for keeping it in the syllabus.
