
An early prototype of an autonomic performance environment for exascale

Published: 10 June 2013

ABSTRACT

Extreme-scale computing requires a new perspective on the role of performance observation in the Exascale system software stack. Because of the anticipated high concurrency and dynamic operation in these systems, it is no longer reasonable to expect that a post-mortem performance measurement and analysis methodology will suffice. Rather, there is a strong need for performance observation that merges first- and third-person observation, in situ analysis, and introspection across stack layers to serve online dynamic feedback and adaptation. In this paper, we describe the DOE-funded XPRESS project and the role of autonomic performance support in Exascale systems. XPRESS will build an integrated Exascale software stack (called OpenX) that supports the ParalleX execution model and is targeted towards future Exascale platforms. An initial version of an autonomic performance environment called APEX has been developed for OpenX using the current TAU performance technology, and results are presented that highlight the challenges of highly integrative observation and runtime analysis.
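The abstract's central idea, runtime introspection feeding a policy that adapts the executing software online rather than analyzing it post-mortem, can be illustrated with a minimal sketch. Everything below is invented for illustration: the `PolicyEngine` class, its `on_sample` callback, and the queue-length thresholds are hypothetical and are not the actual APEX or TAU API.

```python
# Hypothetical sketch of an autonomic feedback loop: a measurement
# component periodically samples a runtime metric (here, an observed
# task-queue length), and a policy engine adjusts a runtime knob
# (here, a cap on worker-thread concurrency) in response.
class PolicyEngine:
    def __init__(self, thread_cap: int):
        self.thread_cap = thread_cap

    def on_sample(self, queue_len: float) -> None:
        # Throttle concurrency when the observed queue is long
        # (contention); widen it when the queue runs nearly empty.
        if queue_len > 8.0 and self.thread_cap > 1:
            self.thread_cap -= 1
        elif queue_len < 2.0:
            self.thread_cap += 1


policy = PolicyEngine(16)
# Simulated metric samples over five observation epochs.
for sample in [10.0, 9.5, 9.0, 1.5, 1.0]:
    policy.on_sample(sample)
print(policy.thread_cap)  # prints 15
```

The point of the sketch is the control loop itself: measurement and adaptation happen inside the running system, which is the shift from post-mortem analysis that the paper argues Exascale software requires.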


Published in

ROSS '13: Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers
June 2013, 75 pages
ISBN: 9781450321464
DOI: 10.1145/2491661

          Copyright © 2013 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

ROSS '13 paper acceptance rate: 9 of 18 submissions, 50%. Overall acceptance rate: 58 of 169 submissions, 34%.
