ABSTRACT
Floating-point exceptions occurring during numerical computations can be a serious threat to the validity of the computed results if they are not caught and diagnosed. Unfortunately, on NVIDIA GPUs (today's most widely used type, and one that lacks hardware exception traps), this task must be carried out in software. Given the prevalence of closed-source kernels, efficient binary-level exception tracking is essential. It is also important to know how exceptions flow through the code, whether they alter the code's behavior, and whether they can be detected at the program outputs or are killed inside program flow-paths.
In this paper, we introduce GPU-FPX, a low-overhead tool that provides a deep understanding of the origin and flow of exceptions and of how code optimizations modify them. We measure GPU-FPX's performance over 151 widely used GPU programs from HPC and ML, detecting 26 serious exceptions that were previously unreported. Our results show that, in geometric-mean runtime, GPU-FPX is 16× faster than the only comparable prior tool, while also helping debug a larger class of codes more effectively.
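The distinction drawn above between exceptions that reach program outputs and exceptions that are killed along a flow-path can be illustrated with a minimal host-side sketch. This is not GPU-FPX's implementation, which works at the GPU binary level; the helper `classify_fp32` and its labels are hypothetical names chosen for illustration, and the sketch only assumes IEEE 754 single-precision encoding.

```python
import struct


def classify_fp32(x: float) -> str:
    """Classify a value by its IEEE 754 single-precision bit pattern.

    Hypothetical helper for illustration only; it is not part of GPU-FPX.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    exponent = (bits >> 23) & 0xFF   # 8 exponent bits
    mantissa = bits & 0x7FFFFF       # 23 mantissa bits
    if exponent == 0xFF:
        return "NaN" if mantissa else "INF"
    if exponent == 0 and mantissa:
        return "SUBNORMAL"
    return "OK"


inf = float("inf")
nan = inf - inf                      # INF - INF produces a NaN

# An exception that survives to the output: NaN propagates through arithmetic.
propagated = nan * 2.0 + 1.0

# Exceptions that are killed inside the flow-path:
killed_by_division = 1.0 / inf               # INF in the denominator yields 0.0
killed_by_branch = 1.0 if nan > 0 else 0.0   # comparisons with NaN are False

print(classify_fp32(propagated))          # NaN
print(classify_fp32(killed_by_division))  # OK
print(classify_fp32(killed_by_branch))    # OK
```

In the last two cases, checking only the final output would miss the exception entirely, which is why tracking how exceptional values flow through intermediate instructions matters.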
Index Terms
- Design and Evaluation of GPU-FPX: A Low-Overhead tool for Floating-Point Exception Detection in NVIDIA GPUs