https://doi.org/10.5194/gmd-2023-222
Submitted as: development and technical paper | 17 Jan 2024
Status: a revised version of this preprint is currently under review for the journal GMD.

GPU-HADVPPM4HIP V1.0: higher model accuracy on China's domestically GPU-like accelerator using heterogeneous compute interface for portability (HIP) technology to accelerate the piecewise parabolic method (PPM) in an air quality model (CAMx V6.10)

Kai Cao, Qizhong Wu, Lingling Wang, Hengliang Guo, Nan Wang, Huaqiong Cheng, Xiao Tang, Lina Liu, Dongqing Li, Hao Wu, and Lanning Wang

Abstract. Graphics processing units (GPUs) are becoming a compelling acceleration strategy for geoscience numerical models owing to their powerful computing performance. In this study, AMD's heterogeneous compute interface for portability (HIP) was used to port the GPU-accelerated version of the Piecewise Parabolic Method (PPM) solver (GPU-HADVPPM) from NVIDIA GPUs to China's domestic GPU-like accelerators as GPU-HADVPPM4HIP. A multi-level hybrid parallelism scheme was then introduced to improve the overall computational performance of the HIP version of CAMx (CAMx-HIP) on China's domestic heterogeneous cluster. The experimental results show that the acceleration of GPU-HADVPPM on the different GPU accelerators becomes more pronounced as the computing scale grows. The maximum speedup of GPU-HADVPPM on the domestic GPU-like accelerator is 28.9 times. Hybrid parallelism combining the message passing interface (MPI) and HIP achieves a speedup of up to 17.2 times when 32 CPU cores and GPU-like accelerators are configured on the domestic heterogeneous cluster. Introducing OpenMP further reduces the computation time of the CAMx-HIP model by a factor of 1.9. More importantly, we compared the GPU-HADVPPM simulation results on NVIDIA GPUs with those on domestic GPU-like accelerators. The results on the domestic GPU-like accelerators show smaller differences than those on the NVIDIA GPUs, possibly because the NVIDIA GPUs sacrifice some accuracy for improved computing performance. Overall, the domestic GPU-like accelerators are more accurate for scientific computing in the field of geoscience numerical models.
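The 17.2-times MPI+HIP speedup and the further 1.9-fold OpenMP reduction quoted above can be related to the 32.7x figure in the short summary on this page. A minimal worked check, assuming the OpenMP factor is measured relative to the 17.2x MPI+HIP configuration and both refer to the same CPU baseline:

```latex
S_{\mathrm{MPI+HIP+OpenMP}} \approx S_{\mathrm{MPI+HIP}} \times S_{\mathrm{OpenMP}} = 17.2 \times 1.9 \approx 32.7
```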
Furthermore, we show that the efficiency of data transfer between CPU and GPU has an important impact on heterogeneous computing. We identify optimizing this transfer efficiency as one of the important future directions for improving the computing efficiency of geoscience numerical models on heterogeneous clusters.
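The point about CPU-GPU data transfer can be made concrete with a simple cost model. The sketch below assumes three things. First, each offloaded call pays a fixed latency plus the bytes moved divided by an effective bandwidth. Second, transfers do not overlap with computation. Third, the function name `offload_speedup` and every input value are hypothetical; none are measurements from this study.

```python
def offload_speedup(t_cpu: float, t_kernel: float, n_bytes: float,
                    bandwidth: float, latency: float = 0.0) -> float:
    """Effective speedup of offloading one routine, counting host<->device copies.

    t_cpu     -- run time of the routine on the CPU (s)
    t_kernel  -- compute-only run time on the accelerator (s)
    n_bytes   -- bytes copied to and from the device per call
    bandwidth -- effective transfer bandwidth (bytes/s), must be > 0
    latency   -- fixed per-call transfer overhead (s), hypothetical
    """
    if bandwidth <= 0.0:
        raise ValueError("bandwidth must be positive")
    # Assumption: copies and kernel run back to back (no overlap via streams).
    t_transfer = latency + n_bytes / bandwidth
    return t_cpu / (t_kernel + t_transfer)


if __name__ == "__main__":
    # Hypothetical timings: compute-only speedup of 28.9x, 1 GB/s effective link.
    for gigabytes in (0.0, 0.5, 1.0, 2.0):
        s = offload_speedup(28.9, 1.0, gigabytes * 1e9, 1e9)
        print(f"{gigabytes:.1f} GB moved per call -> speedup {s:.2f}x")
```

Take hypothetical timings whose compute-only speedup equals the 28.9x maximum above. A transfer that lasts as long as the kernel itself halves the effective speedup (28.9 / 2 = 14.45). This is why reducing or overlapping transfers matters.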


Status: final response (author comments only)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on gmd-2023-222', Anonymous Referee #1, 30 Jan 2024
    • AC1: 'Reply on RC1', Qizhong Wu, 23 Feb 2024
  • RC2: 'Comment on gmd-2023-222', Anonymous Referee #2, 05 Apr 2024
    • AC2: 'Reply on RC2', Qizhong Wu, 22 Apr 2024

Viewed

Total article views: 483 (including HTML, PDF, and XML)
  • HTML: 395
  • PDF: 61
  • XML: 27
  • Total: 483
  • Supplement: 22
  • BibTeX: 30
  • EndNote: 23
Views and downloads (calculated since 17 Jan 2024)

Viewed (geographical distribution)

Total article views: 468 (including HTML, PDF, and XML), of which 468 have a defined geography and 0 are of unknown origin.
Latest update: 28 Apr 2024
Short summary
AMD's HIP was used to port the PPM solver from NVIDIA GPUs to China's GPU-like accelerators. The results show that the larger the model scale, the greater the acceleration on the GPU-like accelerator, reaching up to 28.9x. The multi-level hybrid parallelism achieves up to 32.7x speedup on the heterogeneous cluster. A comparison of the simulation results shows that the GPU-like accelerators achieve higher accuracy for geoscience numerical models.
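For readers unfamiliar with the scheme that GPU-HADVPPM accelerates, below is a minimal Python sketch of a textbook one-dimensional PPM advection step after Colella and Woodward (1984). It assumes a constant positive wind, a uniform periodic grid and a Courant number of at most 1. It is not the CAMx HADVPPM code. The names `_limited_slope` and `ppm_advect_step` are hypothetical, and a real model solver must also handle variable winds, grid metrics and boundary conditions. In a 3-D model, independent 1-D sweeps along separate grid rows are plausibly what make such a solver a good fit for GPU threads, but that mapping is an assumption here.

```python
import math
from typing import List


def _limited_slope(a_minus: float, a_0: float, a_plus: float) -> float:
    """Monotonized mean slope of a cell (Colella & Woodward, 1984)."""
    if (a_plus - a_0) * (a_0 - a_minus) <= 0.0:
        return 0.0  # local extremum: use a flat slope
    d = 0.5 * (a_plus - a_minus)
    return math.copysign(min(abs(d), 2.0 * abs(a_0 - a_minus), 2.0 * abs(a_plus - a_0)), d)


def ppm_advect_step(a: List[float], courant: float) -> List[float]:
    """Advance cell averages `a` one step on a periodic 1-D grid.

    Assumes a constant wind u > 0 and courant = u * dt / dx in (0, 1].
    """
    if not 0.0 < courant <= 1.0:
        raise ValueError("courant must lie in (0, 1]")
    n = len(a)
    if n < 3:
        raise ValueError("need at least 3 cells")

    # 1) Limited slopes and interface values a_{j+1/2} (periodic indexing).
    slope = [_limited_slope(a[j - 1], a[j], a[(j + 1) % n]) for j in range(n)]
    edge = [
        a[j] + 0.5 * (a[(j + 1) % n] - a[j]) - (slope[(j + 1) % n] - slope[j]) / 6.0
        for j in range(n)
    ]

    # flux[j]: mean value of the parabola swept through interface j+1/2 in dt.
    flux = [0.0] * n
    for j in range(n):
        a_l, a_r, a_j = edge[j - 1], edge[j], a[j]  # a_{j-1/2}, a_{j+1/2}, mean
        # 2) Monotonicity constraint on the parabola.
        if (a_r - a_j) * (a_j - a_l) <= 0.0:
            a_l = a_r = a_j  # local extremum: flatten the cell
        else:
            d = a_r - a_l
            m = a_j - 0.5 * (a_l + a_r)
            if d * m > d * d / 6.0:
                a_l = 3.0 * a_j - 2.0 * a_r
            elif -d * d / 6.0 > d * m:
                a_r = 3.0 * a_j - 2.0 * a_l
        # 3) Parabola coefficients and the average over the rightmost
        #    fraction `courant` of cell j (the upwind side for u > 0).
        delta = a_r - a_l
        a6 = 6.0 * (a_j - 0.5 * (a_l + a_r))
        flux[j] = a_r - 0.5 * courant * (delta - (1.0 - 2.0 * courant / 3.0) * a6)

    # 4) Conservative flux-form update: a_j - C * (f_{j+1/2} - f_{j-1/2}).
    return [a[j] - courant * (flux[j] - flux[j - 1]) for j in range(n)]


if __name__ == "__main__":
    field = [0.0] * 5 + [1.0] * 5 + [0.0] * 10  # square wave on 20 cells
    for _ in range(20):
        field = ppm_advect_step(field, 0.5)
    print([round(x, 3) for x in field])
```

With `courant = 1.0` the step reduces to an exact one-cell shift. Because the update is written in flux form, the domain total is conserved to rounding error.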