research-article

A scalable, commodity data center network architecture

Published: 17 August 2008

Abstract

Today's data centers may contain tens of thousands of computers with significant aggregate bandwidth requirements. The network architecture typically consists of a tree of routing and switching elements with progressively more specialized and expensive equipment moving up the network hierarchy. Unfortunately, even when deploying the highest-end IP switches/routers, resulting topologies may only support 50% of the aggregate bandwidth available at the edge of the network, while still incurring tremendous cost. Non-uniform bandwidth among data center nodes complicates application design and limits overall system performance.

In this paper, we show how to leverage largely commodity Ethernet switches to support the full aggregate bandwidth of clusters consisting of tens of thousands of elements. Similar to how clusters of commodity computers have largely replaced more specialized SMPs and MPPs, we argue that appropriately architected and interconnected commodity switches may deliver more performance at less cost than available from today's higher-end solutions. Our approach requires no modifications to the end host network interface, operating system, or applications; critically, it is fully backward compatible with Ethernet, IP, and TCP.

      • Published in

        ACM SIGCOMM Computer Communication Review, Volume 38, Issue 4
        October 2008
        436 pages
        ISSN: 0146-4833
        DOI: 10.1145/1402946

        SIGCOMM '08: Proceedings of the ACM SIGCOMM 2008 conference on Data communication
        August 2008
        452 pages
        ISBN: 9781605581750
        DOI: 10.1145/1402958

        Copyright © 2008 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States
