research-article

An early prototype of an autonomic performance environment for exascale

Authors:
Kevin Huck, University of Oregon, Eugene, Oregon
Sameer Shende, University of Oregon, Eugene, Oregon
Allen Malony, University of Oregon, Eugene, Oregon
Hartmut Kaiser, Louisiana State University, Baton Rouge, LA
Allan Porterfield, RENCI, Chapel Hill, NC
Rob Fowler, RENCI, Chapel Hill, NC
Ron Brightwell, Sandia National Labs, Albuquerque, NM

ROSS '13: Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers, June 2013, Article No.: 8, Pages 1–8
https://doi.org/10.1145/2491661.2481434

Published: 10 June 2013

ABSTRACT

Extreme-scale computing requires a new perspective on the role of performance observation in the Exascale system software stack. Because of the anticipated high concurrency and dynamic operation in these systems, it is no longer reasonable to expect that a post-mortem performance measurement and analysis methodology will suffice. Rather, there is a strong need for performance observation that merges first- and third-person observation, in situ analysis, and introspection across stack layers, and that serves online dynamic feedback and adaptation. In this paper, we describe the DOE-funded XPRESS project and the role of autonomic performance support in Exascale systems. XPRESS will build an integrated Exascale software stack (called OpenX) that supports the ParalleX execution model and is targeted towards future Exascale platforms. An initial version of an autonomic performance environment called APEX has been developed for OpenX using the current TAU performance technology, and results are presented that highlight the challenges of highly integrative observation and runtime analysis.

Published in
ROSS '13: Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers
June 2013
75 pages
ISBN: 9781450321464
DOI: 10.1145/2491661
Conference Chairs: Torsten Hoefler (ETH Zurich, Switzerland), Kamil Iskra (Argonne National Laboratory)
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
ROSS '13 paper acceptance rate: 9 of 18 submissions, 50%. Overall acceptance rate: 58 of 169 submissions, 34%.