Articles | Volume 17, issue 17
https://doi.org/10.5194/gmd-17-6887-2024
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.
GPU-HADVPPM4HIP V1.0: using the heterogeneous-compute interface for portability (HIP) to speed up the piecewise parabolic method in the CAMx (v6.10) air quality model on China's domestic GPU-like accelerator
College of Global Change and Earth System Science, Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China
College of Global Change and Earth System Science, Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China
Joint Center for Earth System Modeling and High Performance Computing, Beijing Normal University, Beijing 100875, China
Lingling Wang
CORRESPONDING AUTHOR
Henan Ecological Environmental Monitoring and Safety Center, Henan Key Laboratory of Environmental Monitoring Technology, Zhengzhou 450008, China
Hengliang Guo
National Supercomputing Center in Zhengzhou, Zhengzhou 450001, China
Nan Wang
Henan Ecological Environmental Monitoring and Safety Center, Henan Key Laboratory of Environmental Monitoring Technology, Zhengzhou 450008, China
Huaqiong Cheng
College of Global Change and Earth System Science, Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China
Joint Center for Earth System Modeling and High Performance Computing, Beijing Normal University, Beijing 100875, China
Xiao Tang
State Key Laboratory of Atmospheric Boundary Layer Physics and Atmospheric Chemistry, Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing 100029, China
Dongxing Li
College of Global Change and Earth System Science, Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China
Joint Center for Earth System Modeling and High Performance Computing, Beijing Normal University, Beijing 100875, China
Lina Liu
National Supercomputing Center in Zhengzhou, Zhengzhou 450001, China
Dongqing Li
College of Global Change and Earth System Science, Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China
Hao Wu
National Supercomputing Center in Zhengzhou, Zhengzhou 450001, China
Lanning Wang
CORRESPONDING AUTHOR
College of Global Change and Earth System Science, Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China
Joint Center for Earth System Modeling and High Performance Computing, Beijing Normal University, Beijing 100875, China
Related authors
Zehua Bai, Qizhong Wu, Kai Cao, Yiming Sun, and Huaqiong Cheng
Geosci. Model Dev., 17, 4383–4399, https://doi.org/10.5194/gmd-17-4383-2024, 2024
Short summary
There is relatively limited research on the application of scientific computing on RISC CPU platforms. MIPS architecture CPUs, a type of RISC CPU, have distinct advantages in energy efficiency and scalability. The air quality modeling system can run stably on the MIPS and LoongArch platforms, and the experimental results verify the stability of scientific computing on these platforms. The work provides a technical foundation for scientific applications based on MIPS and LoongArch.
Kai Cao, Qizhong Wu, Lingling Wang, Nan Wang, Huaqiong Cheng, Xiao Tang, Dongqing Li, and Lanning Wang
Geosci. Model Dev., 16, 4367–4383, https://doi.org/10.5194/gmd-16-4367-2023, 2023
Short summary
Offline performance experiments show that GPU-HADVPPM on a V100 GPU can achieve speedups of up to 1113.6× over its original version on an E5-2682 v4 CPU. After a series of optimization measures, the CAMx-CUDA model improves computing efficiency by 128.4× on a single V100 GPU card. A parallel architecture with an MPI plus CUDA hybrid paradigm is presented, and it can achieve a speedup of up to 4.5× when launching eight CPU cores and eight GPU cards.
Jiayi Lai, Lanning Wang, Qizhong Wu, Yizhou Yang, and Fang Wang
Geosci. Model Dev., 18, 1089–1102, https://doi.org/10.5194/gmd-18-1089-2025, 2025
Short summary
High-performance computing limitations often hinder numerical model development. Traditional models use double precision for accuracy, which is computationally expensive. Lower precision reduces costs but can introduce errors. The quasi-double-precision (QDP) algorithm helps mitigate these errors. This study applies the QDP algorithm to the Model for Prediction Across Scales – Atmosphere, showing reduced errors and computational time, making it an efficient solution for large-scale simulations.
Haoming Bao, Jiandong Shang, Jinzhu Li, Gang Wu, Haitao Wei, Lingling Wang, Nan Wang, Jingye Shi, Wenge Zhou, Feng Chen, Jiahui Guo, Jinyang Wang, Dujuan Zhang, and Hengliang Guo
EGUsphere, https://doi.org/10.5194/egusphere-2024-3495, 2025
Short summary
An analysis of ozone pollution in Henan Province, China, from 2015 to 2022 was conducted. The spatiotemporal distribution patterns of ozone pollution in Henan Province during this period and its driving factors were examined from the perspectives of pollutant concentrations, meteorological conditions, and socioeconomic factors. Time-series analysis and machine learning techniques were employed to predict both short-term and long-term ozone concentrations in the region.
Dylan Jones, Lucas Prates, Zhen Qu, William Cheng, Kazuyuki Miyazaki, Takashi Sekiya, Antje Inness, Rajesh Kumar, Xiao Tang, Helen Worden, Gerbrand Koren, and Vincent Huijen
EGUsphere, https://doi.org/10.5194/egusphere-2024-3759, 2025
Short summary
We evaluate five chemical reanalysis products to assess their potential to provide useful information on tropospheric ozone variability. We find that the reanalyses produce consistent information on ozone variations in the free troposphere but have large discrepancies at the surface. The results suggest that improvements in the reanalyses are needed to better exploit the assimilated observations and enhance the utility of the reanalysis products at the surface.
Zichen Wu, Xueshun Chen, Zifa Wang, Huansheng Chen, Zhe Wang, Qing Mu, Lin Wu, Wending Wang, Xiao Tang, Jie Li, Ying Li, Qizhong Wu, Yang Wang, Zhiyin Zou, and Zijian Jiang
Geosci. Model Dev., 17, 8885–8907, https://doi.org/10.5194/gmd-17-8885-2024, 2024
Short summary
We developed a model to simulate polycyclic aromatic hydrocarbons (PAHs) from global to regional scales. The model can reproduce PAH distribution well. The concentration of BaP (indicator species for PAHs) could exceed the target values of 1 ng m⁻³ over some areas (e.g., in central Europe, India, and eastern China). The change in BaP is lower than that in PM2.5 from 2013 to 2018. China still faces significant potential health risks posed by BaP although the Action Plan has been implemented.
Lei Kong, Xiao Tang, Zifa Wang, Jiang Zhu, Jianjun Li, Huangjian Wu, Qizhong Wu, Huansheng Chen, Lili Zhu, Wei Wang, Bing Liu, Qian Wang, Duohong Chen, Yuepeng Pan, Jie Li, Lin Wu, and Gregory R. Carmichael
Earth Syst. Sci. Data, 16, 4351–4387, https://doi.org/10.5194/essd-16-4351-2024, 2024
Short summary
A new long-term inversed emission inventory for Chinese air quality (CAQIEI) is developed in this study, which contains constrained monthly emissions of NOx, SO2, CO, PM2.5, PM10, and NMVOCs in China from 2013 to 2020 with a horizontal resolution of 15 km. Emissions of different air pollutants and their changes during 2013–2020 were investigated and compared with previous emission inventories, which sheds new light on the complex variations of air pollutant emissions in China.
Jiaxu Guo, Juepeng Zheng, Yidan Xu, Haohuan Fu, Wei Xue, Lanning Wang, Lin Gan, Ping Gao, Wubing Wan, Xianwei Wu, Zhitao Zhang, Liang Hu, Gaochao Xu, and Xilong Che
Geosci. Model Dev., 17, 3975–3992, https://doi.org/10.5194/gmd-17-3975-2024, 2024
Short summary
To enhance the efficiency of experiments using SCAM, we train a learning-based surrogate model to facilitate large-scale sensitivity analysis and tuning of combinations of multiple parameters. Employing a hybrid method, we investigate the joint sensitivity of multi-parameter combinations across typical cases, identifying the most sensitive three-parameter combination out of 11. Subsequently, we conduct a tuning process aimed at reducing output errors in these cases.
Yaqi Wang, Lanning Wang, Juan Feng, Zhenya Song, Qizhong Wu, and Huaqiong Cheng
Geosci. Model Dev., 16, 6857–6873, https://doi.org/10.5194/gmd-16-6857-2023, 2023
Short summary
In this study, to noticeably improve precipitation simulation in steep mountains, we propose a sub-grid parameterization scheme for the topographic vertical motion in CAM5-SE to revise the original vertical velocity by adding the topographic vertical motion. The dynamic lifting effect of topography is extended from the lowest layer to multiple layers, thus improving the positive deviations of precipitation simulation in high-altitude regions and negative deviations in low-altitude regions.
Xianwei Wu, Liang Hu, Lanning Wang, Haitian Lu, and Juepeng Zheng
Geosci. Model Dev. Discuss., https://doi.org/10.5194/gmd-2023-164, 2023
Revised manuscript not accepted
Short summary
To build an effective surrogate model for the community atmospheric model (CAM), we present a surrogate model-based parameter tuning framework for the CAM, apply it to improve the CAM5 precipitation performance, and propose a multilevel surrogate model-based optimization method. We design a nonuniform parameter parameterization scheme and integrate the parameters using a parameter smoothing scheme, and the experimental results improve in four regions.
Jinming Feng, Meng Luo, Jun Wang, Yuan Qiu, Qizhong Wu, and Ke Wang
EGUsphere, https://doi.org/10.5194/egusphere-2023-867, 2023
Preprint withdrawn
Short summary
We modified the code of the Weather Research and Forecasting Model (WRF) v3.8.1 to include forcing components beyond greenhouse gases and evaluated the impact of forcing configurations on climate simulation results in China. Different external forcing configurations in WRF had a considerable impact on the annual temperature and precipitation trends; this impact was stronger than that of parameterization schemes but weaker than that of spectral nudging.
Lei Kong, Xiao Tang, Jiang Zhu, Zifa Wang, Yele Sun, Pingqing Fu, Meng Gao, Huangjian Wu, Miaomiao Lu, Qian Wu, Shuyuan Huang, Wenxuan Sui, Jie Li, Xiaole Pan, Lin Wu, Hajime Akimoto, and Gregory R. Carmichael
Atmos. Chem. Phys., 23, 6217–6240, https://doi.org/10.5194/acp-23-6217-2023, 2023
Short summary
A multi-air-pollutant inversion system has been developed in this study to estimate emission changes in China during COVID-19 lockdown. The results demonstrate that the lockdown is largely a nationwide road traffic control measure with NOx emissions decreasing by ~40 %. Emissions of other species only decreased by ~10 % due to smaller effects of lockdown on other sectors. Assessment results further indicate that the lockdown only had limited effects on the control of PM2.5 and O3 in China.
Yuejin Ye, Zhenya Song, Shengchang Zhou, Yao Liu, Qi Shu, Bingzhuo Wang, Weiguo Liu, Fangli Qiao, and Lanning Wang
Geosci. Model Dev., 15, 5739–5756, https://doi.org/10.5194/gmd-15-5739-2022, 2022
Short summary
The swNEMO_v4.0 is developed with ultrahigh scalability through the concepts of hardware–software co-design based on the characteristics of the new Sunway supercomputer and NEMO4. Three breakthroughs, including an adaptive four-level parallelization design, many-core optimization and mixed-precision optimization, are designed. The simulations achieve 71.48 %, 83.40 % and 99.29 % parallel efficiency with resolutions of 2 km, 1 km and 500 m using 27 988 480 cores, respectively.
Zixuan Jia, Ruth M. Doherty, Carlos Ordóñez, Chaofan Li, Oliver Wild, Shipra Jain, and Xiao Tang
Atmos. Chem. Phys., 22, 6471–6487, https://doi.org/10.5194/acp-22-6471-2022, 2022
Short summary
This study investigates the modulation of daily PM2.5 over three major populated regions in China by regional meteorology and large-scale circulation during winter. These results demonstrate the benefits of considering the large-scale circulation for air quality studies. The novel circulation indices proposed here can explain a considerable fraction of the day-to-day variability of PM2.5 and can be combined with regional meteorology to improve our capability to predict the variability of PM2.5.
Qian Ma, Kaicun Wang, Yanyi He, Liangyuan Su, Qizhong Wu, Han Liu, and Youren Zhang
Earth Syst. Sci. Data, 14, 463–477, https://doi.org/10.5194/essd-14-463-2022, 2022
Short summary
Surface incident solar radiation plays a key role in atmospheric circulation, the water cycle, and ecological equilibrium on Earth. A homogenized century-long surface incident solar radiation dataset was obtained over Japan.
Qian Ye, Jie Li, Xueshun Chen, Huansheng Chen, Wenyi Yang, Huiyun Du, Xiaole Pan, Xiao Tang, Wei Wang, Lili Zhu, Jianjun Li, Zhe Wang, and Zifa Wang
Geosci. Model Dev., 14, 7573–7604, https://doi.org/10.5194/gmd-14-7573-2021, 2021
Short summary
We developed a global tropospheric atmospheric chemistry source–receptor model. This model can quantify the contributions of multiple air pollutants from various source regions in one simulation without introducing the nonlinear error of atmospheric chemistry. The S-R relationships of PM2.5 and O3 from a global high-resolution (0.5° × 0.5°) simulation were given and compared with previous studies. This model will be useful for creating a link between the scientific community and policymakers.
Ying Wei, Xueshun Chen, Huansheng Chen, Yele Sun, Wenyi Yang, Huiyun Du, Qizhong Wu, Dan Chen, Xiujuan Zhao, Jie Li, and Zifa Wang
Geosci. Model Dev., 14, 4411–4428, https://doi.org/10.5194/gmd-14-4411-2021, 2021
Short summary
The sub-grid particle formation (SGPF) in plumes plays an important role in air pollution and climate. We coupled an SGPF scheme to a chemical transport model with an aerosol microphysics module and applied it to investigate the SGPF impact over China. The scheme clearly improved the model performance in simulating aerosol components and particle number at typical sites influenced by point sources. The results indicate the significant effects of SGPF on aerosol particles in industrial areas.
Xueshun Chen, Fangqun Yu, Wenyi Yang, Yele Sun, Huansheng Chen, Wei Du, Jian Zhao, Ying Wei, Lianfang Wei, Huiyun Du, Zhe Wang, Qizhong Wu, Jie Li, Junling An, and Zifa Wang
Atmos. Chem. Phys., 21, 9343–9366, https://doi.org/10.5194/acp-21-9343-2021, 2021
Short summary
Atmospheric aerosol particles have significant climate and health effects that depend on aerosol size, composition, and mixing state. A new global-regional nested aerosol model with an advanced particle microphysics module and a volatility basis set organic aerosol module was developed to simulate aerosol microphysical processes. Simulations strongly suggest the important role of anthropogenic organic species in particle formation over the areas influenced by anthropogenic sources.
Hui Wang, Qizhong Wu, Alex B. Guenther, Xiaochun Yang, Lanning Wang, Tang Xiao, Jie Li, Jinming Feng, Qi Xu, and Huaqiong Cheng
Atmos. Chem. Phys., 21, 4825–4848, https://doi.org/10.5194/acp-21-4825-2021, 2021
Short summary
We assessed the influence of the greening trend on BVOC emission in China. The comparison among different scenarios showed that vegetation changes resulting from land cover management are the main driver of BVOC emission change in China. Climate variability contributed significantly to interannual variations but not much to the long-term trend during the study period.
Tie Dai, Yueming Cheng, Daisuke Goto, Yingruo Li, Xiao Tang, Guangyu Shi, and Teruyuki Nakajima
Atmos. Chem. Phys., 21, 4357–4379, https://doi.org/10.5194/acp-21-4357-2021, 2021
Short summary
The anthropogenic emission of sulfur dioxide (SO2) over China has significantly declined as a consequence of the clean air actions. We have developed a new emission inversion system to dynamically update the SO2 emission grid by grid over China by assimilating ground-based SO2 observations. The inverted SO2 emission over China in November 2016 on average had declined by 49.4 % since 2010, which is well in agreement with the bottom-up estimation of 48.0 %.
Lei Kong, Xiao Tang, Jiang Zhu, Zifa Wang, Jianjun Li, Huangjian Wu, Qizhong Wu, Huansheng Chen, Lili Zhu, Wei Wang, Bing Liu, Qian Wang, Duohong Chen, Yuepeng Pan, Tao Song, Fei Li, Haitao Zheng, Guanglin Jia, Miaomiao Lu, Lin Wu, and Gregory R. Carmichael
Earth Syst. Sci. Data, 13, 529–570, https://doi.org/10.5194/essd-13-529-2021, 2021
Short summary
China's air pollution has changed substantially since 2013. Here we have developed a 6-year-long high-resolution air quality reanalysis dataset over China from 2013 to 2018 to illustrate such changes and to provide a basic dataset for relevant studies. Surface fields of PM2.5, PM10, SO2, NO2, CO, and O3 concentrations are provided, and the evaluation results indicate that the reanalysis dataset has excellent performance in reproducing the magnitude and variation of air pollution in China.
Han Xiao, Qizhong Wu, Xiaochun Yang, Lanning Wang, and Huaqiong Cheng
Geosci. Model Dev., 14, 223–238, https://doi.org/10.5194/gmd-14-223-2021, 2021
Short summary
Few studies have investigated the effects of initial conditions on the simulation or prediction of PM2.5 concentrations. Here, sensitivity experiments are used to explore the effects of three initial mechanisms (clean, restart, and continuous) and emissions in Xi’an in December 2016. According to this work, if the restart mechanism cannot be used due to computing resource and storage space limitations when forecasting PM2.5 concentrations, a spin-up time of at least 27 h is needed.
Shaoqing Zhang, Haohuan Fu, Lixin Wu, Yuxuan Li, Hong Wang, Yunhui Zeng, Xiaohui Duan, Wubing Wan, Li Wang, Yuan Zhuang, Hongsong Meng, Kai Xu, Ping Xu, Lin Gan, Zhao Liu, Sihai Wu, Yuhu Chen, Haining Yu, Shupeng Shi, Lanning Wang, Shiming Xu, Wei Xue, Weiguo Liu, Qiang Guo, Jie Zhang, Guanghui Zhu, Yang Tu, Jim Edwards, Allison Baker, Jianlin Yong, Man Yuan, Yangyang Yu, Qiuying Zhang, Zedong Liu, Mingkui Li, Dongning Jia, Guangwen Yang, Zhiqiang Wei, Jingshan Pan, Ping Chang, Gokhan Danabasoglu, Stephen Yeager, Nan Rosenbloom, and Ying Guo
Geosci. Model Dev., 13, 4809–4829, https://doi.org/10.5194/gmd-13-4809-2020, 2020
Short summary
Science advancement and societal needs require Earth system modelling with higher resolutions that demand tremendous computing power. We successfully scale the 10 km ocean and 25 km atmosphere high-resolution Earth system model to a new leading-edge heterogeneous supercomputer using state-of-the-art optimizing methods, promising the solution of high spatial resolution and time-varying frequency. Corresponding technical breakthroughs are of significance in modelling and HPC design communities.
Baozhu Ge, Syuichi Itahashi, Keiichi Sato, Danhui Xu, Junhua Wang, Fan Fan, Qixin Tan, Joshua S. Fu, Xuemei Wang, Kazuyo Yamaji, Tatsuya Nagashima, Jie Li, Mizuo Kajino, Hong Liao, Meigen Zhang, Zhe Wang, Meng Li, Jung-Hun Woo, Junichi Kurokawa, Yuepeng Pan, Qizhong Wu, Xuejun Liu, and Zifa Wang
Atmos. Chem. Phys., 20, 10587–10610, https://doi.org/10.5194/acp-20-10587-2020, 2020
Short summary
The simulated deposition of different reduced N (Nr) species in China was evaluated with the Model Inter-Comparison Study for Asia. Results showed that simulated wet deposition of oxidized N was overestimated in northeastern China and underestimated in south China, but Nr was underpredicted in all regions by all models. Oxidized N has larger uncertainties than Nr, indicating that the chemical reaction process is one of the most important factors affecting model performance.
Short summary
AMD’s heterogeneous-compute interface for portability (HIP) was used to port the piecewise parabolic method solver from NVIDIA GPUs to China's GPU-like accelerators. The results show that the larger the model scale, the greater the acceleration on the GPU-like accelerator, up to 28.9 times. Multi-level parallelism achieves a speedup of 32.7 times on the heterogeneous cluster. A comparison of the results indicates that the GPU-like accelerators give more accurate results for geoscience numerical models.