ABSTRACT
Data center networks encode locality and topology information into their server and switch addresses for performance and routing purposes. For this reason, traditional address configuration protocols such as DHCP require a huge amount of manual input, which makes them error-prone.
In this paper, we present DAC, a generic and automatic Data center Address Configuration system. Given an automatically generated blueprint that defines the connections of servers and switches labeled by logical IDs, e.g., IP addresses, DAC first learns the physical topology labeled by device IDs, e.g., MAC addresses. At the core of DAC are its device-to-logical ID mapping and its malfunction detection. DAC's innovation is to abstract the device-to-logical ID mapping as the graph isomorphism problem and to solve it with low time complexity by leveraging the attributes of data center network topologies. Its malfunction detection scheme detects errors such as device and link failures and miswirings, including the most difficult case, in which miswirings do not cause any node degree change.
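To make the graph isomorphism framing concrete, the following is a minimal sketch that maps device IDs to logical IDs on a toy topology. It uses a generic backtracking search with degree pruning, not DAC's own algorithm, which the abstract says exploits data center topology attributes for speed. The function names (`build_graph`, `find_mapping`), the MAC and IP values, and the toy topology are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency map from a list of links."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def find_mapping(physical: Graph, blueprint: Graph) -> Optional[Dict[str, str]]:
    """Return one device-ID -> logical-ID isomorphism, or None if none exists."""
    # Cheap rejections: node count and degree sequence must agree.
    if len(physical) != len(blueprint):
        return None
    if sorted(map(len, physical.values())) != sorted(map(len, blueprint.values())):
        return None

    # Place high-degree devices (switches) first to prune the search early.
    devices = sorted(physical, key=lambda d: -len(physical[d]))
    mapping: Dict[str, str] = {}
    used: Set[str] = set()

    def consistent(device: str, logical: str) -> bool:
        if len(physical[device]) != len(blueprint[logical]):
            return False
        # Every already-mapped neighbor must land on a neighbor of `logical`.
        return all(
            mapping[nb] in blueprint[logical]
            for nb in physical[device]
            if nb in mapping
        )

    def extend(i: int) -> bool:
        if i == len(devices):
            return True
        device = devices[i]
        for logical in blueprint:
            if logical not in used and consistent(device, logical):
                mapping[device] = logical
                used.add(logical)
                if extend(i + 1):
                    return True
                del mapping[device]
                used.discard(logical)
        return False

    return dict(mapping) if extend(0) else None


# Blueprint labeled by logical IDs (hypothetical IP addresses).
blueprint = build_graph([
    ("10.0.0.1", "10.0.0.2"),   # switch A <-> switch B
    ("10.0.0.1", "10.0.1.1"),   # switch A <-> server
    ("10.0.0.1", "10.0.1.2"),   # switch A <-> server
    ("10.0.0.2", "10.0.2.1"),   # switch B <-> server
])

# Learned physical topology labeled by device IDs (hypothetical MAC addresses).
physical = build_graph([
    ("mac:0a", "mac:0b"),
    ("mac:0a", "mac:01"),
    ("mac:0a", "mac:02"),
    ("mac:0b", "mac:03"),
])

# A miswired topology: the third server is cabled to switch A instead of B.
miswired = build_graph([
    ("mac:0a", "mac:0b"),
    ("mac:0a", "mac:01"),
    ("mac:0a", "mac:02"),
    ("mac:0a", "mac:03"),
])

mapping = find_mapping(physical, blueprint)
no_mapping = find_mapping(miswired, blueprint)
```

In this toy case the two switches and the server behind switch B have a unique image, while the two servers behind switch A are interchangeable, so any mapping that respects the links is acceptable. The miswired variant changes node degrees and is rejected by the degree-sequence check alone. A miswiring that preserves all degrees would pass that check and be caught only by the edge-consistency test during the search, which is the harder case the abstract refers to.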
We have evaluated DAC via simulation, implementation and experiments. Our simulation results show that DAC can accurately find all the hardest-to-detect malfunctions and can autoconfigure a large data center with 3.8 million devices in 46 seconds. In our implementation, we successfully autoconfigure a small 64-server BCube network within 300 milliseconds and show that DAC is a viable solution for data center autoconfiguration.
Generic and automatic address configuration for data center networks
SIGCOMM '10