DOI: 10.1145/3624062.3624188
research-article

Performance Evaluation of Heterogeneous GPU Programming Frameworks for Hemodynamic Simulations

Published: 12 November 2023

ABSTRACT

Preparing for the deployment of large scientific and engineering codes on upcoming exascale systems with GPU-dense nodes is made challenging by the unprecedented diversity of device architectures and heterogeneous programming models. In this work, we evaluate the process of porting a massively parallel fluid dynamics code written in CUDA to SYCL, HIP, and Kokkos with a range of backends, using a combination of automated tools and manual tuning. We use a proxy application along with a custom performance model to inform the results and identify additional optimization strategies. At-scale performance of the programming-model implementations is evaluated on pre-production GPU node architectures for Frontier and Aurora, as well as on the current NVIDIA device-based systems Summit and Polaris. Real-world workloads representing 3D blood flow calculations in complex vasculature are assessed. Our analysis highlights critical trade-offs between code performance, portability, and development time.


        • Published in

          cover image ACM Other conferences
          SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis
          November 2023
          2180 pages
          ISBN:9798400707858
          DOI:10.1145/3624062

Copyright © 2023 ACM. Publication rights licensed to ACM.

Publisher: Association for Computing Machinery, New York, NY, United States
