Multiple Target Task Sharing Support for the OpenMP Accelerator Model

Ozen, Guray; Mateo, Sergi; Ayguadé, Eduard; Labarta, Jesús; Beyer, James

doi:10.1007/978-3-319-45550-1_19


Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 9903)

Included in the following conference series:

International Workshop on OpenMP

Abstract

The use of GPU accelerators is becoming common in HPC platforms due to their effective performance and energy efficiency. In addition, new generations of multicore processors are being designed with wider vector units and/or larger hardware thread counts, which also contribute to the peak performance of the whole system. Although current directive-based paradigms, such as OpenMP or OpenACC, support both accelerators and multicore-based hosts, they do not provide an effective and efficient way to use them concurrently. The usual result is an accelerated program in which the potential computational performance of the host is not exploited. In this paper we propose an extension to the OpenMP 4.5 directive-based programming model to support the specification and execution of multiple instances of task regions on different devices (i.e., accelerators in conjunction with the vector and heavily multithreaded capabilities of multicore processors). The compiler is responsible for generating device-specific code for each device kind, and delegates to the runtime system the dynamic scheduling of tasks onto the available devices. The newly proposed clause conveys useful insight to guide the scheduler while keeping a clean, abstract and machine-independent programmer interface. The potential of the proposal is analyzed in a prototype implementation in the OmpSs compiler and runtime infrastructure. Performance is evaluated using three kernels (N-Body, tiled matrix multiply and Stream) on different GPU-capable systems based on ARM, Intel x86 and IBM Power8. From the evaluation we observe speedups in the 8–20% range compared to versions in which only the GPU is used, reaching 96% of the additional peak performance thanks to the reduction of data transfers and the benefits introduced by the OmpSs NUMA-aware scheduler.
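To make the limitation described in the abstract concrete, the following is a minimal sketch of what sharing one loop between the host and an accelerator looks like using only standard OpenMP 4.5 constructs. It does not use the clause proposed in the paper, whose syntax is not given in this abstract. The function name `saxpy_static_split` and the parameters `chunk` and `device_tenths` are hypothetical names chosen for illustration. The point to note is that the programmer has to fix the host/device split by hand, which is the decision the proposal delegates to the runtime scheduler.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical illustration: computes y = a*x + y over n elements in chunks.
 * Out of every 10 chunks, the first `device_tenths` are offloaded with
 * `target nowait`; the rest run as ordinary host tasks. The ratio is fixed
 * by the caller, i.e. the split is static, not decided by the runtime.
 * Without OpenMP enabled the pragmas are ignored and the code runs serially;
 * without an offload device the target regions fall back to the host.
 */
void saxpy_static_split(float a, const float *x, float *y,
                        size_t n, size_t chunk, unsigned device_tenths)
{
    assert(chunk > 0 && device_tenths <= 10);

    #pragma omp parallel
    #pragma omp single
    {
        size_t c = 0; /* chunk counter, used only to pick host or device */
        for (size_t lo = 0; lo < n; lo += chunk, ++c) {
            size_t len = (n - lo < chunk) ? n - lo : chunk;

            if (c % 10 < device_tenths) {
                /* Device chunk: map only the slice this chunk touches. */
                #pragma omp target map(to: x[lo:len]) map(tofrom: y[lo:len]) nowait
                for (size_t i = 0; i < len; ++i)
                    y[lo + i] += a * x[lo + i];
            } else {
                /* Host chunk: a regular task run by the host thread team. */
                #pragma omp task
                for (size_t i = 0; i < len; ++i)
                    y[lo + i] += a * x[lo + i];
            }
        }
    } /* implicit barrier: all host and target tasks have finished here */
}
```

Calling `saxpy_static_split(2.0f, x, y, n, 64, 7)` sends roughly 70% of the chunks to the device regardless of how fast each side actually is. A poorly chosen ratio leaves either the host or the accelerator idle, which is the situation a dynamic, runtime-driven assignment of task instances to devices is meant to avoid.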

Notes

1. Currently under discussion in the accelerators subcommittee of the OpenMP Language Committee.
2. MACC is an abbreviation for “Mercurium ACCelerator Compiler”.

Acknowledgments

This work is partially supported by the IBM/BSC Deep Learning Center Initiative, by the Spanish Government through Programa Severo Ochoa (SEV-2015-0493), by the Spanish Ministry of Science and Technology through the TIN2015-65316-P project, and by the Generalitat de Catalunya (contract 2014-SGR-1051).

Author information

Authors and Affiliations

Universitat Politècnica de Catalunya (UPC–BarcelonaTECH), Barcelona, Spain
Guray Ozen, Sergi Mateo, Eduard Ayguadé & Jesús Labarta
Barcelona Supercomputing Center (BSC-CNS), Barcelona, Spain
Guray Ozen, Sergi Mateo, Eduard Ayguadé & Jesús Labarta
Nvidia Corporation, Santa Clara, USA
James Beyer

Corresponding author

Correspondence to Guray Ozen.

Editor information

Editors and Affiliations

RIKEN AICS, Kobe, Japan
Naoya Maruyama
Lawrence Livermore National Laboratory, Livermore, California, USA
Bronis R. de Supinski
RIKEN AICS, Kobe, Japan
Mohamed Wahib

About this paper

Cite this paper

Ozen, G., Mateo, S., Ayguadé, E., Labarta, J., Beyer, J. (2016). Multiple Target Task Sharing Support for the OpenMP Accelerator Model. In: Maruyama, N., de Supinski, B., Wahib, M. (eds) OpenMP: Memory, Devices, and Tasks. IWOMP 2016. Lecture Notes in Computer Science, vol 9903. Springer, Cham. https://doi.org/10.1007/978-3-319-45550-1_19

DOI: https://doi.org/10.1007/978-3-319-45550-1_19
Published: 21 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45549-5
Online ISBN: 978-3-319-45550-1
eBook Packages: Computer Science, Computer Science (R0)
