ABSTRACT
The advent of exascale computing invites an assessment of existing best practices for developing application readiness on the world's largest supercomputers. This work details observations from the last four years of preparing scientific applications to run on the Oak Ridge Leadership Computing Facility's (OLCF) Frontier system. The paper addresses a range of software topics, including programmability, tuning, and portability considerations, that are key to moving applications from existing systems to future installations. A set of representative workloads provides case studies for general system and software testing. We evaluate the use of early-access systems for development across several generations of hardware. Finally, we discuss how best practices were identified and disseminated to the community through a wide range of activities, including user guides and training events. We conclude with recommendations for ensuring application readiness on future leadership computing systems.