Research Article · DOI: 10.1145/3624062.3624203

Latency and Bandwidth Microbenchmarks of US Department of Energy Systems in the June 2023 Top 500 List

Published: 12 November 2023

ABSTRACT

As a rule, Top 500 class supercomputers are extensively benchmarked as part of their acceptance testing. However, aside from publicly posted LINPACK/HPCG results, most benchmark results are inaccessible outside the hosting institution. Moreover, these higher-level benchmarks do not give easy answers to common questions such as “What is the realizable memory bandwidth?” or “What is the launch latency on the accelerator?” To partially address these issues, we executed selected single-node microbenchmarks, focused on latencies and memory bandwidth, on every US Department of Energy system above rank 150 of the June 2023 Top 500 list, excepting NERSC’s Cori and ORNL’s Frontier TDS (now decommissioned or repurposed). We hope to provide an easy “first stop” reference for users of current Top 500 systems, and to inspire users and administrators of other Top 500 systems to compile and publish similar benchmark results for their machines.
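
To make the first question concrete, below is a minimal STREAM-triad-style CUDA sketch of a device memory bandwidth probe. This is an illustration of the measurement only, not the code used in the paper (established suites such as STREAM or BabelStream do this more carefully); the array length, block size, and trial count are illustrative assumptions.

// bandwidth.cu: an illustrative device-bandwidth probe (not the paper's code).
// Build: nvcc -O3 bandwidth.cu -o bandwidth
// Error checking is omitted for brevity.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void triad(double *a, const double *b, const double *c,
                      double scalar, size_t n) {
  size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
  if (i < n) a[i] = b[i] + scalar * c[i];  // 2 reads + 1 write per element
}

int main() {
  const size_t n = 1 << 27;   // illustrative: ~1 GiB per array, far above cache
  const int trials = 100;     // illustrative trial count
  double *a, *b, *c;
  cudaMalloc(&a, n * sizeof(double));
  cudaMalloc(&b, n * sizeof(double));
  cudaMalloc(&c, n * sizeof(double));
  cudaMemset(b, 0, n * sizeof(double));  // contents are irrelevant to bandwidth
  cudaMemset(c, 0, n * sizeof(double));

  const int block = 256;
  const int grid = (int)((n + block - 1) / block);
  triad<<<grid, block>>>(a, b, c, 3.0, n);  // warm-up launch
  cudaDeviceSynchronize();

  cudaEvent_t t0, t1;
  cudaEventCreate(&t0);
  cudaEventCreate(&t1);
  cudaEventRecord(t0);
  for (int t = 0; t < trials; ++t)
    triad<<<grid, block>>>(a, b, c, 3.0, n);
  cudaEventRecord(t1);
  cudaEventSynchronize(t1);

  float ms = 0.0f;
  cudaEventElapsedTime(&ms, t0, t1);  // elapsed GPU time in milliseconds
  double gb = 3.0 * n * sizeof(double) * trials / 1e9;  // total bytes moved, in GB
  printf("triad: %.1f GB/s\n", gb / (ms / 1e3));
  return 0;
}

The gap between the figure such a probe reports and the accelerator's nominal peak is exactly what the “realizable memory bandwidth” question targets.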
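
The second question can be probed in the same spirit: time many launches of an empty kernel, synchronizing after each one so that every iteration pays the full host-to-device round trip. Again a hedged sketch under assumed parameters; synchronizing per launch measures round-trip latency, whereas synchronizing only once at the end would instead measure back-to-back launch throughput.

// launch_latency.cu: an illustrative kernel-launch latency probe.
// Build: nvcc -O3 launch_latency.cu -o launch_latency
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

__global__ void empty() {}

int main() {
  const int trials = 10000;  // illustrative trial count
  empty<<<1, 1>>>();         // warm-up: absorbs one-time context creation
  cudaDeviceSynchronize();

  auto t0 = std::chrono::steady_clock::now();
  for (int i = 0; i < trials; ++i) {
    empty<<<1, 1>>>();
    cudaDeviceSynchronize();  // per-launch sync => full round-trip latency
  }
  auto t1 = std::chrono::steady_clock::now();

  double us = std::chrono::duration<double, std::micro>(t1 - t0).count();
  printf("mean launch+sync latency: %.2f us\n", us / trials);
  return 0;
}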


Published in

SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis
November 2023, 2180 pages
ISBN: 9798400707858
DOI: 10.1145/3624062
Publisher: Association for Computing Machinery, New York, NY, United States

Copyright © 2023 ACM. Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
