HPX: A Task Based Programming Model in a Global Address Space

Authors:
Hartmut Kaiser, Center for Computation and Technology, Louisiana State University, Louisiana, U.S.A.
Thomas Heller, Computer Science 3, Computer Architectures, Friedrich-Alexander-University, Erlangen, Germany
Bryce Adelstein-Lelbach, Center for Computation and Technology, Louisiana State University, Louisiana, U.S.A.
Adrian Serio, Center for Computation and Technology, Louisiana State University, Louisiana, U.S.A.
Dietmar Fey, Computer Science 3, Computer Architectures, Friedrich-Alexander-University, Erlangen, Germany

PGAS '14: Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, October 2014, Article No. 6, Pages 1–11. https://doi.org/10.1145/2676870.2676883

Published: 06 October 2014

ABSTRACT

The significant increase in complexity of Exascale platforms due to energy-constrained, billion-way parallelism, with major changes to processor and memory architecture, requires new energy-efficient and resilient programming techniques that are portable across multiple future generations of machines. We believe that guaranteeing adequate scalability, programmability, performance portability, resilience, and energy efficiency requires a fundamentally new approach, combined with a transition path for existing scientific applications, to fully explore the rewards of today's and tomorrow's systems. We present HPX, a parallel runtime system that extends the C++11/14 standard to facilitate distributed operations, enable fine-grained constraint-based parallelism, and support runtime-adaptive resource management. This provides a widely accepted API enabling programmability, composability, and performance portability of user applications. By employing a global address space, we seamlessly augment the standard to apply to the distributed case. We present HPX's architecture, design decisions, and results selected from a diverse set of application runs showing superior performance, scalability, and efficiency over conventional practice.
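
The abstract describes HPX as an extension of the C++11/14 standard API. As a rough illustration of the future-based style that the standard already offers on a single node, the sketch below uses only `std::async` and `std::future` from the C++11 standard library. It does not use HPX itself, and the helper name `chunked_sum` and its chunking scheme are assumptions made for this example, not something taken from the paper.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper (not from the paper): sums `data` by splitting it into
// up to `chunks` pieces and summing each piece in its own asynchronous task.
long long chunked_sum(const std::vector<int>& data, std::size_t chunks) {
    if (chunks == 0) chunks = 1;                         // guard against division by zero
    const std::size_t n = data.size();
    const std::size_t step = (n + chunks - 1) / chunks;  // ceil(n / chunks)

    std::vector<std::future<long long>> parts;
    for (std::size_t begin = 0; begin < n; begin += step) {
        const std::size_t end = std::min(begin + step, n);
        // std::async returns a future immediately; the partial sum runs as a separate task.
        parts.push_back(std::async(std::launch::async, [&data, begin, end] {
            return std::accumulate(data.begin() + begin, data.begin() + end, 0LL);
        }));
    }

    long long total = 0;
    for (auto& part : parts) total += part.get();  // block until each partial result is ready
    return total;
}
```

For example, calling `chunked_sum` on a vector holding the integers 1 through 1000 with four chunks returns 500500. The caller only expresses which results it depends on through futures; how a runtime schedules the tasks is left open, which is the property a task-based model builds on.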

Published in
PGAS '14: Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models
October 2014
199 pages
ISBN:9781450332477
DOI:10.1145/2676870
Conference Chair: Allen D. Malony
Program Chair: Jeff Hammond
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Exascale
Global Address Space
High Performance Computing
Parallel Runtime Systems
Programming Models
Qualifiers
- research-article
- Research
- Refereed limited
Article Metrics
- Total Citations: 180
- Total Downloads: 716
- Downloads (last 12 months): 62
- Downloads (last 6 weeks): 13