ABSTRACT
As we approach the limits of Moore's law, researchers are exploring new paradigms for future high-performance computing (HPC) systems. Approximate computing has gained traction by promising to deliver substantial computing power. However, due to the stringent accuracy requirements of HPC scientific applications, the broad adoption of approximate computing methods in HPC requires an in-depth understanding of the application's amenability to approximations.
We develop HPAC, a framework with compiler and runtime support for code annotation and transformation, and for accuracy vs. performance trade-off analysis of OpenMP HPC applications. We use HPAC to perform an in-depth analysis of the effectiveness of approximate computing techniques when applied to HPC applications. The results reveal the possible performance gains of approximation and its interplay with parallel execution. For instance, in the LULESH proxy application, approximation provides substantial performance gains because it reduces memory accesses. In the leukocyte benchmark, however, approximation induces load imbalance in the parallel execution, which limits the performance gains.
Index Terms
- HPAC: evaluating approximate computing techniques on HPC OpenMP applications