research-article

Unifying on-chip and inter-node switching within the Anton 2 network

Authors:
Brian Towles (D. E. Shaw Research, New York, NY)
J. P. Grossman (D. E. Shaw Research, New York, NY)
Brian Greskamp (D. E. Shaw Research, New York, NY)
David E. Shaw (D. E. Shaw Research, New York, NY and Columbia University, New York, NY)

ACM SIGARCH Computer Architecture News, Volume 42, Issue 3, June 2014, pp. 1–12. https://doi.org/10.1145/2678373.2665677

Published: 14 June 2014


Abstract

The design of network architectures has become increasingly complex as the chips connected by inter-node networks have emerged as distributed systems in their own right, complete with their own on-chip networks. In Anton 2, a massively parallel special-purpose supercomputer for molecular dynamics simulations, we managed this complexity by reusing the on-chip network as a switch for inter-node traffic. This unified network approach introduces several design challenges. Maintaining fairness within the inter-node network is difficult, as each hop becomes a sequence of many on-chip routing decisions. We addressed this problem with an inverse-weighted arbiter that ensures fairness with low implementation costs. Balancing the load of inter-node traffic across the on-chip network is also critical, and we adopted an optimization approach to design an appropriate routing algorithm. Finally, the on-chip routers carry inter-node traffic, so they must implement inter-node virtual channels to avoid deadlock. In order to keep the routers small and fast, we developed a deadlock-free routing algorithm that reduces the number of virtual channels by one-third relative to previous approaches. The resulting Anton 2 network implementation efficiently utilizes its inter-node channels and provides low messaging latency, while occupying a modest amount of silicon area.
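
The abstract names an inverse-weighted arbiter but does not describe how it works. The sketch below is one plausible reading, not the Anton 2 design: each input keeps an accumulator that grows by the inverse of its weight whenever it wins, and the requester with the smallest accumulator is granted (essentially stride scheduling). The class name, the integer weights, the lowest-index tie-break, and the unbounded exact-fraction accumulators are all assumptions made for illustration; real hardware would use finite-width counters.

```python
from fractions import Fraction


class InverseWeightArbiter:
    """Hypothetical arbiter: grants are shared in proportion to the weights."""

    def __init__(self, weights):
        if any(w <= 0 for w in weights):
            raise ValueError("weights must be positive integers")
        # Cost charged to an input each time it is granted (assumption: 1/weight).
        self.inverse = [Fraction(1, w) for w in weights]
        self.accumulated = [Fraction(0)] * len(weights)

    def grant(self, requests):
        """Return the index of the granted input, or None if nobody requests."""
        candidates = [i for i, requesting in enumerate(requests) if requesting]
        if not candidates:
            return None
        # Smallest accumulator wins; ties go to the lowest index.
        winner = min(candidates, key=lambda i: (self.accumulated[i], i))
        self.accumulated[winner] += self.inverse[winner]
        return winner


def grant_counts(weights, cycles):
    """Count grants per input when every input requests on every cycle."""
    arbiter = InverseWeightArbiter(weights)
    counts = [0] * len(weights)
    for _ in range(cycles):
        counts[arbiter.grant([True] * len(weights))] += 1
    return counts


if __name__ == "__main__":
    # Input 0 stands in for a port carrying three times the traffic of input 1.
    print(grant_counts([3, 1], 8))
```

Under constant contention the grants split in the ratio of the weights, so an input that aggregates more upstream traffic is served proportionally more often instead of being treated as a single equal competitor.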



Published in
ACM SIGARCH Computer Architecture News, Volume 42, Issue 3 (ISCA '14), June 2014, 552 pages
ISSN: 0163-5964
DOI: 10.1145/2678373
Editor: Doug DeGroot

ISCA '14: Proceedings of the 41st Annual International Symposium on Computer Architecture, June 2014, 566 pages
ISBN: 9781479943944
General Chairs: Pen-Chung Yew (University of Minnesota), Antonia Zhai (University of Minnesota)
Program Chair: Steve Keckler (NVIDIA/University of Texas at Austin)
Copyright © 2014 IEEE
Publisher: Association for Computing Machinery, New York, NY, United States