Performance Evaluation of Heterogeneous GPU Programming Frameworks for Hemodynamic Simulations

Authors:
Aristotle Martin, Biomedical Engineering, Duke University, United States of America (ORCID 0000-0002-8704-764X)
Geng Liu, Argonne National Laboratory, United States of America (ORCID 0000-0002-8636-5996)
William Ladd, Biomedical Engineering, Duke University, United States of America (ORCID 0009-0002-1989-8757)
Seyong Lee, Oak Ridge National Laboratory, United States of America (ORCID 0000-0001-8872-4932)
John Gounley, Oak Ridge National Laboratory, United States of America (ORCID 0000-0001-8424-4982)
Jeffrey Vetter, Oak Ridge National Laboratory, United States of America (ORCID 0000-0002-2449-6720)
Saumil Patel, Argonne National Laboratory, United States of America (ORCID 0000-0003-0803-5761)
Silvio Rizzi, Argonne National Laboratory, United States of America (ORCID 0000-0002-3804-2471)
Victor Mateevitsi, Argonne National Laboratory, United States of America (ORCID 0000-0002-6677-7520)
Joseph Insley, Argonne National Laboratory, United States of America (ORCID 0000-0002-6955-869X)
Amanda Randles, Biomedical Engineering, Duke University, United States of America (ORCID 0000-0001-6318-3885)

SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, November 2023, Pages 1126–1137. https://doi.org/10.1145/3624062.3624188

Published: 12 November 2023

ABSTRACT

Preparing large scientific and engineering codes for deployment on upcoming exascale systems with GPU-dense nodes is made challenging by the unprecedented diversity of device architectures and heterogeneous programming models. In this work, we evaluate the process of porting a massively parallel fluid dynamics code written in CUDA to SYCL, HIP, and Kokkos with a range of backends, using a combination of automated tools and manual tuning. We use a proxy application along with a custom performance model to inform the results and identify additional optimization strategies. At-scale performance of the programming model implementations is evaluated on pre-production GPU node architectures for Frontier and Aurora, as well as on the current NVIDIA-based systems Summit and Polaris. Real-world workloads representing 3D blood flow calculations in complex vasculature are assessed. Our analysis highlights critical trade-offs between code performance, portability, and development time.

Index Terms

1. Computing methodologies
  1. Modeling and simulation
    1. Simulation types and techniques
      1. Massively parallel and high-performance simulations
  2. Parallel computing methodologies
    1. Parallel programming languages
2. Hardware
  1. Emerging technologies
    1. Analysis and design of emerging devices and systems
      1. Emerging architectures

Published in

SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis
November 2023
2180 pages
ISBN: 9798400707858
DOI: 10.1145/3624062

Copyright © 2023 ACM
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Computational fluid dynamics
Performance portability
Proxy applications
Qualifiers
- research-article
- Research
- Refereed limited