DOI: 10.1145/2063384.2063415
research-article

Gyrokinetic toroidal simulations on leading multi- and manycore HPC systems

Published: 12 November 2011

ABSTRACT

The gyrokinetic Particle-in-Cell (PIC) method is a critical computational tool enabling petascale fusion simulation research. In this work, we present novel multi- and manycore-centric optimizations to enhance the performance of GTC, a PIC-based production code for studying plasma microturbulence in tokamak devices. Our optimizations encompass all six GTC subroutines and include multi-level particle and grid decompositions designed to improve multi-node parallel scaling, particle binning for improved load balance, GPU acceleration of key subroutines, and memory-centric optimizations to improve single-node scaling and reduce memory utilization. The new hybrid MPI-OpenMP and MPI-OpenMP-CUDA GTC versions achieve up to a 2× speedup over the production Fortran code on four parallel systems: clusters based on the AMD Magny-Cours, Intel Nehalem-EP, IBM BlueGene/P, and NVIDIA Fermi architectures. Finally, strong scaling experiments provide insight into parallel scalability, memory utilization, and programmability trade-offs for large-scale gyrokinetic PIC simulations, while attaining a 1.6× speedup on 49,152 XE6 cores.
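
The abstract names particle binning as one of the optimizations but does not describe how it is done. As a rough illustration only, the sketch below groups particles by grid cell with a counting sort, a common way to bin particles in PIC codes. The helper `bin_particles`, the one-dimensional uniform grid, and the sample positions are all assumptions made for this example; they do not come from GTC, which works on a toroidal mesh.

```python
def bin_particles(positions, num_cells, domain_length):
    """Group particle indices by cell using a counting sort.

    Hypothetical helper for illustration: assumes a 1D uniform grid and
    positions in [0, domain_length]. Not GTC's actual implementation.
    """
    cell_width = domain_length / num_cells
    # Map each particle to a cell index, clamping the right boundary.
    cells = [min(int(x / cell_width), num_cells - 1) for x in positions]

    # Counting pass: number of particles in each cell.
    counts = [0] * num_cells
    for c in cells:
        counts[c] += 1

    # Exclusive prefix sum: offsets[c] is where cell c starts in the output.
    offsets = [0] * (num_cells + 1)
    for c in range(num_cells):
        offsets[c + 1] = offsets[c] + counts[c]

    # Scatter pass: write each particle index into its cell's next free slot.
    order = [0] * len(positions)
    cursor = offsets[:num_cells]  # copy, so offsets stays intact
    for i, c in enumerate(cells):
        order[cursor[c]] = i
        cursor[c] += 1
    return order, offsets


positions = [0.9, 0.1, 0.5, 0.3, 0.7, 0.15]
order, offsets = bin_particles(positions, num_cells=4, domain_length=1.0)
print(order)    # particle indices, contiguous per cell
print(offsets)  # start offset of each cell, plus the total count
```

After binning, the particles of cell `c` are `order[offsets[c]:offsets[c + 1]]`, so work can be divided among threads by cell range and particles that touch the same grid points are processed together.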

      • Published in

        SC '11: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
        November 2011
        866 pages
        ISBN: 9781450307710
        DOI: 10.1145/2063384

        Copyright © 2011 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        SC '11 Paper Acceptance Rate: 74 of 352 submissions, 21%
        Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%