ABSTRACT
Porting scientific codes to accelerator-based computers using OpenACC and OpenMP is an important topic for the HPC community. Programmability, performance portability and developer productivity are key issues for the widespread adoption of these systems. In the scope of general-purpose parallel computing, Parallware is a new commercial OpenMP-enabling source-to-source compiler that automatically adds OpenMP directives to scientific programs. Extending Parallware with OpenACC or OpenMP 4.x support would thus help to improve programmability and developer productivity. The performance portability of such an approach, however, still needs to be demonstrated in practice. This paper presents a preliminary study towards extending Parallware with OpenACC support for GPU devices. A simple benchmark suite has been designed to mimic important features and computational patterns of real scientific applications. Hand-coded OpenACC versions are compared to OpenMP versions automatically generated by Parallware. Performance is evaluated with the PGI OpenACC compiler on systems accelerated with NVIDIA GPUs.
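To illustrate the kind of comparison the study describes, the following SAXPY-style sketch (not taken from the paper or its benchmark suite) contrasts the two directive-based models: the OpenMP directive mirrors what an auto-parallelizing tool such as Parallware could emit for a multicore CPU, while the OpenACC directive is a hand-coded GPU-offload counterpart. Function names and clauses are illustrative assumptions only.

/*
 * Hedged sketch: one parallel loop expressed with OpenMP 3.x (CPU) and
 * with OpenACC (GPU offload). Names are hypothetical, not from the paper.
 */
void saxpy_openmp(int n, float a, const float *x, float *y)
{
    /* Shared-memory CPU version: a single directive on the parallel loop. */
    #pragma omp parallel for default(none) shared(n, a, x, y)
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

void saxpy_openacc(int n, float a, const float *x, float *y)
{
    /* GPU offload version: data clauses make host-device transfers explicit. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

The hand-coded OpenACC variant must also state which arrays move between host and device memory, which is part of the programmability gap the paper examines.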