Articles | Volume 15, issue 14
Geosci. Model Dev., 15, 5739–5756, 2022
https://doi.org/10.5194/gmd-15-5739-2022
Model description paper
25 Jul 2022

swNEMO_v4.0: an ocean model based on NEMO4 for the new-generation Sunway supercomputer

Yuejin Ye1,2,3,4, Zhenya Song2,3,4, Shengchang Zhou2,4,5, Yao Liu6, Qi Shu2,3,4, Bingzhuo Wang1, Weiguo Liu4,5, Fangli Qiao2,3,4, and Lanning Wang4,7
  • 1National Supercomputing Center, Wuxi 214000, China
  • 2First Institute of Oceanography, and Key Laboratory of Marine Science and Numerical Modeling, Ministry of Natural Resources, Qingdao 266061, China
  • 3Shandong Key Laboratory of Marine Science and Numerical Modeling, Qingdao 266061, China
  • 4Laboratory for Regional Oceanography and Numerical Modeling, Pilot National Laboratory for Marine Science and Technology, Qingdao 266237, China
  • 5School of Software, Shandong University, Jinan 250101, China
  • 6School of Data Science and Engineering, East China Normal University, Shanghai 200062, China
  • 7College of Global Change and Earth System Science, Beijing Normal University, Beijing 100875, China

Abstract. Ocean general circulation models (OGCMs) currently face a barrier to large-scale parallelism, which makes it difficult to meet the computing demand of high-resolution simulation. Fully considering both the computational characteristics of OGCMs and the heterogeneous many-core architecture of the new Sunway supercomputer, we develop swNEMO_v4.0, an ocean model with ultrahigh scalability based on NEMO4 (Nucleus for European Modelling of the Ocean version 4). Our work presents three innovations: (1) A highly adaptive, efficient four-level parallelization framework for OGCMs is proposed to expose a new level of parallelism along the compute-dependency column dimension. (2) A many-core optimization method using blocking by remote memory access (RMA) and a dynamic cache scheduling strategy is applied, effectively utilizing the temporal and spatial locality of data. Tests show that the measured direct memory access (DMA) bandwidth after optimization exceeds 90 % of the ideal bandwidth, with a maximum of 95 %. (3) A mixed-precision optimization method with half, single and double precision is explored, which can effectively improve the computational performance while maintaining the simulation accuracy of OGCMs. The results demonstrate that swNEMO_v4.0 has ultrahigh scalability, achieving up to 99.29 % parallel efficiency at a resolution of 500 m using 27 988 480 cores and reaching a peak performance of 1.97 PFLOPS.
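The trade-off behind the mixed-precision approach in point (3) can be illustrated with a toy accumulation. The following is a minimal sketch, not the swNEMO_v4.0 implementation: the helper names (`to_half`, `to_single`, `accumulate`) and the test data are hypothetical, and the abstract does not say which variables the model keeps in which precision. The sketch uses only the Python standard library and emulates half and single precision by rounding through the `struct` formats `e` and `f`.

```python
import struct


def to_half(x: float) -> float:
    """Round a double to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_single(x: float) -> float:
    """Round a double to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def identity(x: float) -> float:
    """Keep the value in double precision (Python's native float)."""
    return x


def accumulate(values, store, acc):
    """Sum `values`; `store` rounds each input, `acc` rounds the running total."""
    total = 0.0
    for v in values:
        total = acc(total + store(v))
    return total


# Hypothetical data: 10 000 equal increments whose exact sum is 1000.
values = [0.1] * 10_000
reference = 1000.0

half_err = abs(accumulate(values, to_half, to_half) - reference)
single_err = abs(accumulate(values, to_single, to_single) - reference)
double_err = abs(accumulate(values, identity, identity) - reference)
# Mixed: inputs stored in half precision, running total kept in double.
mixed_err = abs(accumulate(values, to_half, identity) - reference)

for name, err in [("half", half_err), ("single", single_err),
                  ("double", double_err), ("mixed", mixed_err)]:
    print(f"{name:>6}: absolute error = {err:.3e}")
```

In this toy case the all-half sum loses far more accuracy than the variant that stores inputs in half precision but accumulates in double, which is the general reason a mixed scheme may keep accuracy-critical quantities in higher precision while storing or moving other data in lower precision.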

Short summary
swNEMO_v4.0 is developed with ultrahigh scalability through hardware–software co-design based on the characteristics of the new Sunway supercomputer and NEMO4. It introduces three breakthroughs: an adaptive four-level parallelization design, many-core optimization and mixed-precision optimization. The simulations achieve 71.48 %, 83.40 % and 99.29 % parallel efficiency at resolutions of 2 km, 1 km and 500 m, respectively, using 27 988 480 cores.
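For readers unfamiliar with the metric, parallel efficiency is commonly computed as the measured speedup divided by the ideal speedup relative to a reference run. The sketch below assumes this strong-scaling definition; the summary does not state which definition or baseline core count the authors use, and the function name and timings here are hypothetical, not values from the paper.

```python
def strong_scaling_efficiency(ref_cores: int, ref_seconds: float,
                              cores: int, seconds: float) -> float:
    """Return measured speedup divided by ideal speedup (1.0 = perfect scaling)."""
    speedup = ref_seconds / seconds   # how much faster the larger run is
    ideal = cores / ref_cores         # how much faster it could be at best
    return speedup / ideal


# Hypothetical timings: 100 s on 1000 cores, 14 s on 8000 cores.
efficiency = strong_scaling_efficiency(1000, 100.0, 8000, 14.0)
print(f"parallel efficiency = {efficiency:.2%}")
```

Under this definition, an efficiency close to 100 % means that adding cores reduces the run time almost proportionally.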