Dynamic Task and Data Placement over NUMA Architectures: An OpenMP Runtime Perspective

Broquedis, François; Furmento, Nathalie; Goglin, Brice; Namyst, Raymond; Wacrenier, Pierre-André

doi:10.1007/978-3-642-02303-3_7

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 5568)

Included in the following conference series:

International Workshop on OpenMP

Abstract

Exploiting the full computational power of current hierarchical multiprocessor machines requires a very careful distribution of threads and data among the underlying non-uniform architecture so as to avoid memory access penalties. Directive-based programming languages such as OpenMP provide programmers with an easy way to structure the parallelism of their application and to transmit this information to the runtime system.

Our runtime, which is based on a multi-level thread scheduler combined with a NUMA-aware memory manager, converts this information into “scheduling hints” to solve thread/memory affinity issues. It enables dynamic load distribution guided by application structure and hardware topology, thus helping to achieve performance portability. First experiments show that mixed solutions (migrating threads and data) outperform next-touch-based data distribution policies and open possibilities for new optimizations.
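The benefit of keeping related threads next to their data can be illustrated with a toy model. The sketch below is not the authors' scheduler: the function names, the two-node machine, and the assumption that each team's data sits on a single NUMA node are all hypothetical, chosen only to show why a placement that follows the application's team structure can avoid remote memory accesses that a structure-blind placement incurs.

```python
from typing import Dict, List


def round_robin(teams: List[List[str]], n_nodes: int) -> Dict[str, int]:
    """Place threads on NUMA nodes one by one, ignoring team structure."""
    placement: Dict[str, int] = {}
    i = 0
    for team in teams:
        for thread in team:
            placement[thread] = i % n_nodes
            i += 1
    return placement


def team_aware(teams: List[List[str]], n_nodes: int) -> Dict[str, int]:
    """Keep every thread of a team on the same node (team index modulo node count)."""
    return {
        thread: t % n_nodes
        for t, team in enumerate(teams)
        for thread in team
    }


def remote_threads(
    teams: List[List[str]],
    placement: Dict[str, int],
    data_node: Dict[int, int],
) -> int:
    """Count threads running on a node other than the one holding their team's data."""
    return sum(
        1
        for t, team in enumerate(teams)
        for thread in team
        if placement[thread] != data_node[t]
    )


# Hypothetical workload: two teams of two threads, each team's data on its own node.
teams = [["t0", "t1"], ["t2", "t3"]]
data_node = {0: 0, 1: 1}

blind = remote_threads(teams, round_robin(teams, 2), data_node)
guided = remote_threads(teams, team_aware(teams, 2), data_node)
print(blind, guided)
```

In this model the structure-blind placement splits each team across both nodes, while the team-aware placement keeps each team on the node that holds its data. A real runtime would also have to balance load and decide when to migrate data instead of threads, which this sketch leaves out.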

Author information

Authors and Affiliations

University of Bordeaux, France
François Broquedis, Raymond Namyst & Pierre-André Wacrenier
INRIA, France
Brice Goglin
CNRS LaBRI, 351 cours de la Libération, F-33405, Talence, France
Nathalie Furmento

Editor information

Editors and Affiliations

Center for Information Services and High Performance Computing (ZIH), Dresden University of Technology, 01062, Dresden, Germany
Matthias S. Müller
Lawrence Livermore National Laboratory, Center for Applied Scientific Computing, CA 94551-0808, Livermore, USA
Bronis R. de Supinski
Dept. of Computer Science, University of Houston, 501 Philip G. Hoffman Hall, 4800 Calhoun Rd, 77204-3475, Houston, TX, USA
Barbara M. Chapman

About this paper

Cite this paper

Broquedis, F., Furmento, N., Goglin, B., Namyst, R., Wacrenier, PA. (2009). Dynamic Task and Data Placement over NUMA Architectures: An OpenMP Runtime Perspective. In: Müller, M.S., de Supinski, B.R., Chapman, B.M. (eds) Evolving OpenMP in an Age of Extreme Parallelism. IWOMP 2009. Lecture Notes in Computer Science, vol 5568. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02303-3_7

DOI: https://doi.org/10.1007/978-3-642-02303-3_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02284-5
Online ISBN: 978-3-642-02303-3
eBook Packages: Computer Science, Computer Science (R0)
