ABSTRACT
The gyrokinetic Particle-in-Cell (PIC) method is a critical computational tool enabling petascale fusion simulation research. In this work, we present novel multi- and manycore-centric optimizations to enhance the performance of GTC, a PIC-based production code for studying plasma microturbulence in tokamak devices. Our optimizations encompass all six GTC subroutines and include multi-level particle and grid decompositions designed to improve multi-node parallel scaling, particle binning for improved load balance, GPU acceleration of key subroutines, and memory-centric optimizations to improve single-node scaling and reduce memory utilization. The new hybrid MPI-OpenMP and MPI-OpenMP-CUDA GTC versions achieve up to a 2× speedup over the production Fortran code on four parallel systems: clusters based on the AMD Magny-Cours, Intel Nehalem-EP, IBM BlueGene/P, and NVIDIA Fermi architectures. Finally, strong scaling experiments provide insight into parallel scalability, memory utilization, and programmability trade-offs for large-scale gyrokinetic PIC simulations, and the optimized code attains a 1.6× speedup on 49,152 XE6 cores.
- Gyrokinetic toroidal simulations on leading multi- and manycore HPC systems