research-article

Unifying on-chip and inter-node switching within the Anton 2 network

Published: 14 June 2014

Abstract

The design of network architectures has become increasingly complex as the chips connected by inter-node networks have emerged as distributed systems in their own right, complete with their own on-chip networks. In Anton 2, a massively parallel special-purpose supercomputer for molecular dynamics simulations, we managed this complexity by reusing the on-chip network as a switch for inter-node traffic. This unified network approach introduces several design challenges. Maintaining fairness within the inter-node network is difficult, as each hop becomes a sequence of many on-chip routing decisions. We addressed this problem with an inverse-weighted arbiter that ensures fairness at low implementation cost. Balancing the load of inter-node traffic across the on-chip network is also critical, and we adopted an optimization approach to design an appropriate routing algorithm. Finally, the on-chip routers carry inter-node traffic, so they must implement inter-node virtual channels to avoid deadlock. To keep the routers small and fast, we developed a deadlock-free routing algorithm that reduces the number of virtual channels by one-third relative to previous approaches. The resulting Anton 2 network implementation efficiently utilizes its inter-node channels and provides low messaging latency, while occupying a modest amount of silicon area.
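
The abstract names an inverse-weighted arbiter but does not describe its internals. The following is a minimal illustrative sketch of one way weighted fair arbitration can be built from inverse weights, in the style of stride scheduling; it is not taken from the paper. Each input keeps an accumulator, the requesting input with the smallest accumulator wins, and the winner's accumulator grows by the inverse of its weight. The class name, the `simulate` helper, and the weights are all hypothetical.

```python
from fractions import Fraction


class InverseWeightedArbiter:
    """Illustrative weighted arbiter (an assumption, not the Anton 2 design).

    An input with weight w is charged 1/w per grant, so heavily weighted
    inputs accumulate cost slowly and win proportionally more often.
    """

    def __init__(self, weights):
        # Exact fractions avoid floating-point drift in this software model.
        self.increment = [Fraction(1, w) for w in weights]
        self.accumulator = [Fraction(0) for _ in weights]

    def grant(self, requests):
        """Return the index of the granted input, or None if nobody requests."""
        candidates = [i for i, req in enumerate(requests) if req]
        if not candidates:
            return None
        # Smallest accumulator wins; ties go to the lowest index.
        winner = min(candidates, key=lambda i: (self.accumulator[i], i))
        self.accumulator[winner] += self.increment[winner]
        return winner


def simulate(weights, cycles):
    """Count grants per input when every input requests on every cycle."""
    arbiter = InverseWeightedArbiter(weights)
    counts = [0] * len(weights)
    for _ in range(cycles):
        counts[arbiter.grant([True] * len(weights))] += 1
    return counts


if __name__ == "__main__":
    # Input 0 is assumed to carry three times as much traffic as input 1.
    print(simulate([3, 1], 400))
```

With weights of 3 and 1 and both inputs always requesting, the grants split 3:1. A hardware arbiter would need bounded-width accumulators, which this sketch does not model.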



Published in

  • ACM SIGARCH Computer Architecture News, Volume 42, Issue 3
    ISCA '14
    June 2014
    552 pages
    ISSN: 0163-5964
    DOI: 10.1145/2678373
  • ISCA '14: Proceedings of the 41st Annual International Symposium on Computer Architecture
    June 2014
    566 pages
    ISBN: 9781479943944

Copyright © 2014 IEEE

Publisher

Association for Computing Machinery
New York, NY, United States
