HPAC: evaluating approximate computing techniques on HPC OpenMP applications

Authors:
Konstantinos Parasyris, Giorgis Georgakoudis, Harshitha Menon, James Diffenderfer, Ignacio Laguna, Daniel Osei-Kuffuor, Markus Schordan

Lawrence Livermore National Laboratory (all authors)

SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, November 2021, Article No.: 86, Pages 1–14. https://doi.org/10.1145/3458817.3476216

Published: 13 November 2021

ABSTRACT

As we approach the limits of Moore's law, researchers are exploring new paradigms for future high-performance computing (HPC) systems. Approximate computing has gained traction by promising to deliver substantial computing power. However, due to the stringent accuracy requirements of HPC scientific applications, the broad adoption of approximate computing methods in HPC requires an in-depth understanding of the application's amenability to approximations.

We develop HPAC, a framework with compiler and runtime support for code annotation and transformation, and for accuracy vs. performance trade-off analysis of OpenMP HPC applications. We use HPAC to perform an in-depth analysis of the effectiveness of approximate computing techniques when applied to HPC applications. The results reveal the possible performance gains of approximation and its interplay with parallel execution. For instance, in the LULESH proxy application, approximation provides substantial performance gains due to the reduction of memory accesses. However, in the leukocyte benchmark, approximation induces load imbalance in the parallel execution, thus limiting the performance gains.
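
To make the accuracy vs. performance trade-off concrete, the following is a minimal sketch of loop perforation, a well-known approximate computing technique that skips a fraction of loop iterations. It is plain Python, not HPAC's annotation syntax or runtime (neither is shown in this abstract); the function names `perforated_sum` and `relative_error` and the synthetic input are assumptions made purely for illustration.

```python
import math


def perforated_sum(values, skip):
    """Approximate sum(values) by visiting only every `skip`-th element.

    The partial sum is rescaled by the ratio of total to visited elements,
    so skip=1 reproduces the exact sum. (Hypothetical helper, not HPAC API.)
    """
    if skip < 1:
        raise ValueError("skip must be >= 1")
    sampled = values[::skip]
    if not sampled:
        return 0.0
    return sum(sampled) * (len(values) / len(sampled))


def relative_error(exact, approx):
    """Relative error of `approx` with respect to a nonzero `exact` value."""
    return abs(exact - approx) / abs(exact)


# Synthetic, smoothly varying input standing in for an application kernel.
values = [math.sin(0.001 * i) ** 2 for i in range(100_000)]
exact = sum(values)

for skip in (1, 2, 4, 8):
    approx = perforated_sum(values, skip)
    visited = len(values[::skip])
    print(f"skip={skip}: visited {visited} elements, "
          f"relative error {relative_error(exact, approx):.2e}")
```

A larger `skip` visits fewer elements (less work and fewer memory accesses) at the cost of a larger error. In a parallel loop, skipping work unevenly across threads is one way approximation could introduce the kind of load imbalance the abstract reports for the leukocyte benchmark, although this sketch does not model that effect.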

Index Terms

Software and its engineering > Software notations and tools > Compilers > Runtime environments

Published in
SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2021, 1493 pages
ISBN: 9781450384421
DOI: 10.1145/3458817
General Chair: Bronis R. de Supinski
Program Chairs: Mary Hall, Todd Gamblin
Copyright © 2021 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
OpenMP
approximate computing
programming models
runtime systems
Acceptance Rates
Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%