Editorial Notes
The authors have requested minor, non-substantive changes to the VoR and, in accordance with ACM policies, a Corrected VoR was published on March 14, 2022. For reference purposes the VoR may still be accessed via the Supplemental Material section on this page.
ABSTRACT
The cloud microphysics scheme, CASIM, and the radiation scheme, SOCRATES, are two computationally intensive parts of the Met Office's Unified Model (UM). This study enables CASIM and SOCRATES to use accelerated multi-core systems for optimal computational performance of the UM. Using profiling to guide our efforts, we refactored the code for optimal threading and kernel arrangement and implemented OpenACC directives either manually or through the CLAW source-to-source translator. Initial porting results achieved speedups of 10.02x for CASIM and 9.25x for SOCRATES on 1 GPU compared with 1 CPU core. A granular performance analysis of the strategy and the bottlenecks is discussed. These improvements will enable the UM to run on heterogeneous computers, and a path forward for further improvements is provided.
Supplemental Material
Version of Record for "Progress towards accelerating the unified model on hybrid multi-core systems" by Zhang et al., Proceedings of the Platform for Advanced Scientific Computing Conference (PASC '21).