DOI: 10.1145/3588195.3592991
HPDC Conference Proceedings
Research Article | Open Access

Design and Evaluation of GPU-FPX: A Low-Overhead Tool for Floating-Point Exception Detection in NVIDIA GPUs

Published: 07 August 2023

ABSTRACT

Floating-point exceptions occurring during numerical computations can be a serious threat to the validity of the computed results if they are not caught and diagnosed. Unfortunately, on NVIDIA GPUs, today's most widely used accelerators, which lack hardware exception traps, this task must be carried out in software. Given the prevalence of closed-source kernels, efficient binary-level exception tracking is essential. It is also important to know how exceptions flow through the code, whether they alter its behavior, and whether they can be detected at the program outputs or are killed along internal flow-paths.
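As a concrete illustration of the last point, the following minimal CUDA sketch (our own example, not taken from the paper; the kernel name kill_nan and the input values are purely illustrative) produces both kinds of exceptional values: the NaN from 0.0f/0.0f is masked by fminf and never reaches the output, while the -Inf computed for the negative input does.

#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: some exceptional values reach the output, others are
// "killed" along the way. 0.0f/0.0f yields NaN (INVALID); nonzero/0.0f yields
// +/-Inf (DIV_BY_ZERO). fminf returns the non-NaN operand, so the NaN is
// masked before it is ever stored.
__global__ void kill_nan(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float q = in[i] / 0.0f;   // NaN when in[i] == 0, +/-Inf otherwise
        out[i] = fminf(q, 1.0f);  // NaN is dropped; -Inf still propagates
    }
}

int main() {
    const int n = 4;
    float h_in[n] = {0.0f, 1.0f, -2.0f, 3.0f}, h_out[n];
    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);
    kill_nan<<<1, n>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    // Only out[2] shows -inf; the NaN produced for in[0] is invisible here.
    for (int i = 0; i < n; ++i) printf("out[%d] = %f\n", i, h_out[i]);
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}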

In this paper, we introduce GPU-FPX, a low-overhead tool that provides deep insight into where exceptions originate, how they flow, and how code optimizations modify them. We evaluate GPU-FPX on 151 widely used GPU programs from HPC and ML, detecting 26 serious exceptions that were not previously reported. Our results show that GPU-FPX is 16× faster in geometric-mean runtime than the only comparable prior tool, while also helping debug a larger class of codes more effectively.
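To make concrete what binary-level tracking spares the programmer from doing by hand, here is a hedged sketch (again our own, with hypothetical names report_exceptional and checked_div, not the tool's interface) of the per-result classification that would otherwise have to be written into every kernel; GPU-FPX performs this kind of check on the instrumented binary, without source changes.

#include <cstdio>
#include <math.h>

// Hypothetical helper: classify a freshly computed value on the device.
// A binary-level detector effectively applies this check after every
// floating-point instruction, with no source modification required.
__device__ void report_exceptional(float v, const char *where) {
    if (isnan(v))      printf("NaN produced in %s\n", where);
    else if (isinf(v)) printf("Inf produced in %s\n", where);
}

__global__ void checked_div(const float *a, const float *b, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float r = a[i] / b[i];
        report_exceptional(r, "checked_div");  // manual, per-site instrumentation
        out[i] = r;
    }
}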

