ABSTRACT
Extreme-scale computing requires a new perspective on the role of performance observation in the Exascale system software stack. Because of the anticipated high concurrency and dynamic operation in these systems, it is no longer reasonable to expect that a post-mortem performance measurement and analysis methodology will suffice. Rather, there is a strong need for performance observation that merges first- and third-person observation, in situ analysis, and introspection across stack layers, and that serves online dynamic feedback and adaptation. In this paper we describe the DOE-funded XPRESS project and the role of autonomic performance support in Exascale systems. XPRESS will build an integrated Exascale software stack (called OpenX) that supports the ParalleX execution model and is targeted towards future Exascale platforms. An initial version of an autonomic performance environment called APEX has been developed for OpenX using the current TAU performance technology, and results are presented that highlight the challenges of highly integrative observation and runtime analysis.