ABSTRACT
As a rule, Top 500-class supercomputers are extensively benchmarked as part of their acceptance testing process. However, apart from publicly posted LINPACK and HPCG results, most benchmark results are inaccessible outside the hosting institution. Moreover, these higher-level benchmarks do not provide easy answers to common questions such as “What is the realizable memory bandwidth?” or “What is the launch latency on the accelerator?” To partially address these issues, we executed selected single-node micro-benchmarks, focused on latencies and memory bandwidth, on every US Department of Energy system above rank 150 of the June 2023 Top 500 list, excepting NERSC’s Cori and ORNL’s Frontier TDS (now decommissioned or repurposed). We hope to provide an easy “first stop” reference for users of current Top 500 systems and to inspire users and administrators of other Top 500 systems to similarly compile and publish benchmark results for their systems.
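For illustration only (this is not the paper’s benchmark code), the following CUDA sketch shows how the two quantities named in the abstract are commonly measured on an NVIDIA accelerator: realizable device memory bandwidth via a STREAM-style triad kernel, and kernel launch latency via a timed empty kernel. The array size, block size, and repetition counts are arbitrary assumptions chosen for the sketch, and error checking is omitted.

```cuda
// Illustrative sketch only; not the micro-benchmark code used in the paper.
// Measures (1) device memory bandwidth with a STREAM-style triad and
// (2) kernel launch + synchronization latency with an empty kernel.
#include <cstdio>
#include <chrono>
#include <cuda_runtime.h>

__global__ void triad(double *a, const double *b, const double *c,
                      double scalar, size_t n) {
  size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
  if (i < n) a[i] = b[i] + scalar * c[i];
}

__global__ void empty_kernel() {}

int main() {
  const size_t n = 1 << 25;   // ~33M doubles per array (assumed size)
  const int reps = 100;       // assumed repetition count
  double *a, *b, *c;
  cudaMalloc(&a, n * sizeof(double));
  cudaMalloc(&b, n * sizeof(double));
  cudaMalloc(&c, n * sizeof(double));
  cudaMemset(b, 0, n * sizeof(double));   // initialize inputs (simplified)
  cudaMemset(c, 0, n * sizeof(double));

  dim3 block(256), grid((unsigned)((n + 255) / 256));

  // Warm up, then time the triad; 3 arrays are moved per iteration.
  triad<<<grid, block>>>(a, b, c, 3.0, n);
  cudaDeviceSynchronize();
  auto t0 = std::chrono::steady_clock::now();
  for (int r = 0; r < reps; ++r) triad<<<grid, block>>>(a, b, c, 3.0, n);
  cudaDeviceSynchronize();
  auto t1 = std::chrono::steady_clock::now();
  double secs = std::chrono::duration<double>(t1 - t0).count();
  double gbytes = 3.0 * n * sizeof(double) * reps / 1e9;
  printf("triad bandwidth: %.1f GB/s\n", gbytes / secs);

  // Launch latency: time launch plus completion of an empty kernel.
  empty_kernel<<<1, 1>>>();
  cudaDeviceSynchronize();
  t0 = std::chrono::steady_clock::now();
  for (int r = 0; r < reps; ++r) {
    empty_kernel<<<1, 1>>>();
    cudaDeviceSynchronize();
  }
  t1 = std::chrono::steady_clock::now();
  secs = std::chrono::duration<double>(t1 - t0).count();
  printf("launch + sync latency: %.1f us\n", secs / reps * 1e6);

  cudaFree(a); cudaFree(b); cudaFree(c);
  return 0;
}
```

A comparable measurement on AMD accelerators would use HIP; established suites such as BabelStream and the OSU micro-benchmarks apply a more careful methodology (multiple repetitions with min/max reporting, verified results) than this minimal sketch.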