HPX: A Task Based Programming Model in a Global Address Space

Published: 6 October 2014
DOI: 10.1145/2676870.2676883

ABSTRACT

The significant increase in complexity of Exascale platforms due to energy-constrained, billion-way parallelism, with major changes to processor and memory architecture, requires new energy-efficient and resilient programming techniques that are portable across multiple future generations of machines. We believe that guaranteeing adequate scalability, programmability, performance portability, resilience, and energy efficiency requires a fundamentally new approach, combined with a transition path for existing scientific applications, to fully explore the rewards of today's and tomorrow's systems. We present HPX -- a parallel runtime system that extends the C++11/14 standard to facilitate distributed operations, enable fine-grained constraint-based parallelism, and support runtime-adaptive resource management. This provides a widely accepted API enabling programmability, composability, and performance portability of user applications. By employing a global address space, we seamlessly augment the standard to apply to the distributed case. We present HPX's architecture, design decisions, and results selected from a diverse set of application runs showing superior performance, scalability, and efficiency over conventional practice.
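
To make the programming model concrete, the listing below sketches the style of code the abstract describes: standard C++ futures extended by HPX with asynchronous tasks, continuations, and remotely invocable actions addressed through the global address space. This is a minimal illustrative sketch rather than code from the paper; the function square, the action square_action, and the sum-of-squares computation are hypothetical examples, and exact header paths may differ between HPX versions.

// Minimal sketch of HPX-style task parallelism (illustrative only, not from
// the paper). It uses HPX's future/async extensions of the C++11 standard
// and a plain action that can be invoked on any locality in the global
// address space. Header paths may vary between HPX versions.
#include <hpx/hpx_init.hpp>
#include <hpx/include/actions.hpp>
#include <hpx/include/async.hpp>
#include <hpx/include/lcos.hpp>
#include <hpx/include/runtime.hpp>

#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

// An ordinary C++ function, registered as an action so it becomes
// remotely invocable (square and square_action are hypothetical names).
std::uint64_t square(std::uint64_t x) { return x * x; }
HPX_PLAIN_ACTION(square, square_action);

int hpx_main(int argc, char* argv[])
{
    // Every node (locality) participating in the run is reachable
    // through the global address space.
    std::vector<hpx::id_type> localities = hpx::find_all_localities();

    // Launch one asynchronous task per locality; hpx::async returns a
    // future immediately instead of blocking.
    std::vector<hpx::future<std::uint64_t>> results;
    for (std::size_t i = 0; i != localities.size(); ++i)
        results.push_back(hpx::async(square_action(), localities[i], i));

    // Express the dependency as a constraint: the continuation runs only
    // once all remote tasks have produced their values.
    hpx::future<std::uint64_t> sum = hpx::when_all(results).then(
        [](hpx::future<std::vector<hpx::future<std::uint64_t>>> all)
        {
            std::uint64_t s = 0;
            for (auto& f : all.get())
                s += f.get();
            return s;
        });

    std::cout << "sum of squares: " << sum.get() << std::endl;
    return hpx::finalize();
}

int main(int argc, char* argv[])
{
    return hpx::init(argc, argv);   // boots the HPX runtime and calls hpx_main
}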

Published in

PGAS '14: Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models
October 2014, 199 pages
ISBN: 9781450332477
DOI: 10.1145/2676870

Copyright © 2014 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States
