https://doi.org/10.5194/gmd-2022-33
Submitted as: model description paper
02 Mar 2022
Status: this preprint is currently under review for the journal GMD.

swNEMO_v4.0: an ocean model NEMO for the next generation Sunway supercomputer

Yuejin Ye1, Zhenya Song2,3,4, Shengchang Zhou2,4,5, Yao Liu6, Qi Shu2,3,4, Bingzhuo Wang1, Weiguo Liu4,5, Fangli Qiao2,3,4, and Lanning Wang4,7
  • 1National Supercomputing Center in Wuxi, 214000, China
  • 2First Institute of Oceanography, and Key Laboratory of Marine Science and Numerical Modeling, Ministry of Natural Resources, Qingdao 266061, China
  • 3Shandong Key Laboratory of Marine Science and Numerical Modeling, Qingdao 266061, China
  • 4Laboratory for Regional Oceanography and Numerical Modeling, Pilot National Laboratory for Marine Science and Technology, Qingdao 266237, China
  • 5Shandong University, Jinan 250101, China
  • 6East China Normal University, Shanghai 200062, China
  • 7Beijing Normal University, Beijing 100875, China

Abstract. Ocean general circulation models (OGCMs) currently face a large-scale parallelization barrier that makes it difficult to meet the computing demands of high-resolution simulation. Taking into account both the computational characteristics of OGCMs and the heterogeneous many-core architecture of the new Sunway supercomputer, we developed swNEMO_v4.0, which has ultrahigh scalability. Our work presents three innovations: (1) A highly adaptive, efficient four-level parallelization framework for OGCMs that releases a new level of parallelism along the compute-dependency column dimension. (2) A many-core optimization method that uses blocking by remote memory access (RMA) and a dynamic cache scheduling strategy to exploit the temporal and spatial locality of data effectively. Tests show that the actual DMA bandwidth after optimization exceeds 90 % of the ideal bandwidth, with a maximum of 95 %. (3) A mixed-precision optimization method using half, single, and double precision, which effectively improves computational performance while maintaining the simulation accuracy of OGCMs. The results demonstrate that swNEMO_v4.0 has ultrahigh scalability, achieving up to 99.29 % parallel efficiency at a resolution of 500 m using 27,988,480 cores and reaching a peak performance of 1.97 PFlops.
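To illustrate why mixing half, single, and double precision requires care, the following minimal sketch accumulates a small increment many times while rounding the running sum to each format. It is not the authors' method and uses no swNEMO code: the helper names (`round_to`, `accumulate`, `relative_error`) and the test values are assumptions chosen for illustration, and the narrower formats are emulated with Python's standard `struct` module.

```python
import struct


def round_to(value, fmt):
    """Round a Python float (double) to a narrower IEEE 754 format.

    fmt is a struct format code: 'e' = half, 'f' = single, 'd' = double.
    """
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def accumulate(increment, steps, fmt):
    """Add `increment` `steps` times, rounding the running sum after each add."""
    inc = round_to(increment, fmt)
    total = 0.0
    for _ in range(steps):
        total = round_to(total + inc, fmt)
    return total


def relative_error(approx, exact):
    """Relative deviation of `approx` from the reference value `exact`."""
    return abs(approx - exact) / abs(exact)


# Hypothetical test case: 10,000 additions of 0.01, reference value 100.
exact = 100.0
errors = {
    name: relative_error(accumulate(0.01, 10_000, fmt), exact)
    for name, fmt in (("half", "e"), ("single", "f"), ("double", "d"))
}

for name, err in errors.items():
    print(f"{name:>6}: relative error = {err:.3e}")
```

In this toy case the half-precision sum stalls once the increment falls below half the spacing between representable values, which is the kind of behavior a mixed-precision scheme has to avoid by keeping sensitive accumulations in a wider format.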

Status: open (extended)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on gmd-2022-33', Anonymous Referee #1, 10 Apr 2022

Viewed

Total article views: 480 (including HTML, PDF, and XML)
  • HTML: 312
  • PDF: 160
  • XML: 8
  • Total: 480
  • BibTeX: 4
  • EndNote: 3
Views and downloads calculated since 02 Mar 2022.

Viewed (geographical distribution)

Total article views: 407 (including HTML, PDF, and XML) Thereof 407 with geography defined and 0 with unknown origin.
Latest update: 20 May 2022
Short summary
swNEMO_v4.0, an ocean model with ultrahigh scalability, was developed through hardware-software co-design based on the characteristics of the new Sunway supercomputer and NEMO 4. Its simulations achieve up to 71.48 %, 83.40 %, and 99.29 % parallel efficiency at ultrahigh resolutions of 2 km, 1 km, and 500 m, respectively, using 27,988,480 cores.
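For readers unfamiliar with the metric, parallel efficiency is commonly computed as the achieved speedup divided by the ideal speedup relative to a reference run. The sketch below assumes this conventional strong-scaling definition; the summary does not state which definition or baseline the authors use, and the function name and timings are hypothetical values for illustration only, not results from the paper.

```python
def parallel_efficiency(t_ref, cores_ref, t_run, cores_run):
    """Strong-scaling efficiency of a run relative to a reference run.

    t_ref, t_run: wall-clock times for the same problem (same units).
    cores_ref, cores_run: number of cores used in each run.
    """
    if min(t_ref, t_run) <= 0 or min(cores_ref, cores_run) <= 0:
        raise ValueError("times and core counts must be positive")
    speedup = t_ref / t_run                # how much faster the larger run is
    ideal_speedup = cores_run / cores_ref  # speedup under perfect scaling
    return speedup / ideal_speedup


# Hypothetical timings: 10x the cores gives an 8x speedup.
eff = parallel_efficiency(t_ref=100.0, cores_ref=1_000, t_run=12.5, cores_run=10_000)
print(f"parallel efficiency = {eff:.2%}")
```

An efficiency near 100 % means the additional cores are almost fully converted into speedup; values below that reflect communication, load imbalance, or other overheads.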