DOI: 10.1145/3588195.3592991
HPDC Conference Proceedings
Research Article | Open Access

Design and Evaluation of GPU-FPX: A Low-Overhead Tool for Floating-Point Exception Detection in NVIDIA GPUs

Published: 07 August 2023

ABSTRACT

Floating-point exceptions occurring during numerical computations can be a serious threat to the validity of the computed results if they are not caught and diagnosed. Unfortunately, on NVIDIA GPUs, today's most widely used accelerators, which lack hardware exception traps, this task must be carried out in software. Given the prevalence of closed-source kernels, efficient binary-level exception tracking is essential. It is also important to know how exceptions flow through the code, whether they alter its behavior, and whether they can be detected at the program outputs or are killed along internal flow-paths.
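As a concrete illustration of the last point, the following minimal CUDA sketch (our own example, not taken from the paper; the kernel name kill_nan and the input values are purely illustrative) produces both kinds of exceptional values: the NaN from 0.0f/0.0f is masked by fminf and never reaches the output, while the -Inf computed for the negative input does.

#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: some exceptional values reach the output, others are
// "killed" along the way. 0.0f/0.0f yields NaN (INVALID); nonzero/0.0f yields
// +/-Inf (DIV_BY_ZERO). fminf returns the non-NaN operand, so the NaN is
// masked before it is ever stored.
__global__ void kill_nan(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float q = in[i] / 0.0f;   // NaN when in[i] == 0, +/-Inf otherwise
        out[i] = fminf(q, 1.0f);  // NaN is dropped; -Inf still propagates
    }
}

int main() {
    const int n = 4;
    float h_in[n] = {0.0f, 1.0f, -2.0f, 3.0f}, h_out[n];
    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);
    kill_nan<<<1, n>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    // Only out[2] shows -inf; the NaN produced for in[0] is invisible here.
    for (int i = 0; i < n; ++i) printf("out[%d] = %f\n", i, h_out[i]);
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}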

In this paper, we introduce GPU-FPX, a low-overhead tool that provides deep insight into where exceptions originate, how they flow, and how code optimizations modify them. We evaluate GPU-FPX on 151 widely used GPU programs from HPC and ML, detecting 26 serious exceptions that were not previously reported. Our results show that GPU-FPX is 16× faster in geometric-mean runtime than the only comparable prior tool, while also helping debug a larger class of codes more effectively.
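To make concrete what binary-level tracking spares the programmer from doing by hand, here is a hedged sketch (again our own, with hypothetical names report_exceptional and checked_div, not the tool's interface) of the per-result classification that would otherwise have to be written into every kernel; GPU-FPX performs this kind of check on the instrumented binary, without source changes.

#include <cstdio>
#include <math.h>

// Hypothetical helper: classify a freshly computed value on the device.
// A binary-level detector effectively applies this check after every
// floating-point instruction, with no source modification required.
__device__ void report_exceptional(float v, const char *where) {
    if (isnan(v))      printf("NaN produced in %s\n", where);
    else if (isinf(v)) printf("Inf produced in %s\n", where);
}

__global__ void checked_div(const float *a, const float *b, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float r = a[i] / b[i];
        report_exceptional(r, "checked_div");  // manual, per-site instrumentation
        out[i] = r;
    }
}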

