ABSTRACT
The design space for large, multipath datacenter networks is large and complex, and no one design fits all purposes. Network architects must trade off many criteria to design cost-effective, reliable, and maintainable networks, and typically cannot explore much of the design space. We present Condor, our approach to enabling a rapid, efficient design cycle. Condor allows architects to express their requirements as constraints via a Topology Description Language (TDL), rather than having to directly specify network structures. Condor then uses constraint-based synthesis to rapidly generate candidate topologies, which can be analyzed against multiple criteria. We show that TDL supports concise descriptions of topologies such as fat-trees, BCube, and DCell; that we can generate known and novel variants of fat-trees with simple changes to a TDL file; and that we can synthesize large topologies in tens of seconds. We also show that Condor supports the daunting task of designing multi-phase network expansions that can be carried out on live networks.
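To make the idea of constraint-based synthesis concrete, the following is a minimal sketch, not Condor's TDL or its solver, neither of which is specified in this abstract. It assumes a hypothetical two-tier network of leaf and spine switches and three illustrative constraints: every leaf uses all of its uplinks, every spine uses all of its downlinks, and no leaf-spine pair carries more than a fixed number of links. The function name `synthesize` and all of its parameters are assumptions introduced for this example. A simple backtracking search then either returns a link-count matrix that satisfies the constraints or reports that none exists.

```python
from typing import List, Optional


def synthesize(leaves: int, spines: int, uplinks_per_leaf: int,
               downlinks_per_spine: int,
               max_links_per_pair: int) -> Optional[List[List[int]]]:
    """Return links[l][s], the number of links between leaf l and spine s,
    or None if the (hypothetical) constraints cannot be satisfied."""
    # Necessary condition: total leaf uplinks must equal total spine downlinks.
    if leaves * uplinks_per_leaf != spines * downlinks_per_spine:
        return None

    links = [[0] * spines for _ in range(leaves)]
    spine_free = [downlinks_per_spine] * spines  # unused ports per spine

    def place(leaf: int, spine: int, leaf_left: int) -> bool:
        # All leaves assigned: succeed only if every spine port is used.
        if leaf == leaves:
            return all(free == 0 for free in spine_free)
        # End of this leaf's row: it must have used all of its uplinks.
        if spine == spines:
            return leaf_left == 0 and place(leaf + 1, 0, uplinks_per_leaf)
        # Prune: the remaining spines cannot absorb this leaf's uplinks.
        capacity = sum(min(max_links_per_pair, spine_free[s])
                       for s in range(spine, spines))
        if capacity < leaf_left:
            return False
        most = min(max_links_per_pair, spine_free[spine], leaf_left)
        for n in range(most, -1, -1):  # try larger link counts first
            links[leaf][spine] = n
            spine_free[spine] -= n
            if place(leaf, spine + 1, leaf_left - n):
                return True
            spine_free[spine] += n  # undo and try a smaller count
        links[leaf][spine] = 0
        return False

    return links if place(0, 0, uplinks_per_leaf) else None


if __name__ == "__main__":
    # Four leaves with 2 uplinks each, two spines with 4 downlinks each,
    # at most one link per leaf-spine pair.
    print(synthesize(4, 2, 2, 4, 1))
```

Changing one parameter, for example raising `max_links_per_pair`, admits different candidate topologies without rewriting the search, which is the general appeal of stating requirements as constraints rather than wiring the structure by hand. A production tool would use a real constraint solver and far richer constraints than this toy search.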
Condor: Better Topologies Through Declarative Design
SIGCOMM '15