Articles | Volume 16, issue 15
https://doi.org/10.5194/gmd-16-4367-2023
© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.
GPU-HADVPPM V1.0: a high-efficiency parallel GPU design of the piecewise parabolic method (PPM) for horizontal advection in an air quality model (CAMx V6.10)
College of Global Change and Earth System Science, Beijing Normal
University, Beijing 100875, China
College of Global Change and Earth System Science, Beijing Normal
University, Beijing 100875, China
Lingling Wang
CORRESPONDING AUTHOR
Henan Ecological Environment Monitoring and Safety Center, Henan Key Laboratory of Environmental Monitoring Technology, Zhengzhou 450000, China
Nan Wang
Henan Ecological Environment Monitoring and Safety Center, Henan Key Laboratory of Environmental Monitoring Technology, Zhengzhou 450000, China
Huaqiong Cheng
College of Global Change and Earth System Science, Beijing Normal
University, Beijing 100875, China
Xiao Tang
State Key Laboratory of Atmospheric Boundary Layer Physics and
Atmospheric Chemistry, Institute of Atmospheric Physics, Chinese Academy of
Science, Beijing 100029, China
Dongqing Li
College of Global Change and Earth System Science, Beijing Normal
University, Beijing 100875, China
Lanning Wang
CORRESPONDING AUTHOR
College of Global Change and Earth System Science, Beijing Normal
University, Beijing 100875, China
Related authors
Kai Cao, Qizhong Wu, Lingling Wang, Hengliang Guo, Nan Wang, Huaqiong Cheng, Xiao Tang, Dongxing Li, Lina Liu, Dongqing Li, Hao Wu, and Lanning Wang
Geosci. Model Dev., 17, 6887–6901, https://doi.org/10.5194/gmd-17-6887-2024, 2024
Short summary
AMD’s heterogeneous-compute interface for portability was implemented to port the piecewise parabolic method solver from NVIDIA GPUs to China's GPU-like accelerators. The results show that the larger the model scale, the greater the acceleration on the GPU-like accelerator, up to 28.9 times. The multi-level parallelism achieves a speedup of 32.7 times on the heterogeneous cluster. A comparison of the results indicates that the GPU-like accelerators provide higher accuracy for geoscience numerical models.
Zehua Bai, Qizhong Wu, Kai Cao, Yiming Sun, and Huaqiong Cheng
Geosci. Model Dev., 17, 4383–4399, https://doi.org/10.5194/gmd-17-4383-2024, 2024
Short summary
There is relatively limited research on the application of scientific computing on RISC CPU platforms. MIPS-architecture CPUs, a type of RISC CPU, have distinct advantages in energy efficiency and scalability. The air quality modeling system can run stably on the MIPS and LoongArch platforms, and the experimental results verify the stability of scientific computing on these platforms. The work provides a technical foundation for scientific applications based on MIPS and LoongArch.
Jiayi Lai, Lanning Wang, Qizhong Wu, Yizhou Yang, and Fang Wang
Geosci. Model Dev., 18, 1089–1102, https://doi.org/10.5194/gmd-18-1089-2025, 2025
Short summary
High-performance computing limitations often hinder numerical model development. Traditional models use double precision for accuracy, which is computationally expensive. Lower precision reduces costs but can introduce errors. The quasi-double-precision (QDP) algorithm helps mitigate these errors. This study applies the QDP algorithm to the Model for Prediction Across Scales – Atmosphere, showing reduced errors and computational time, making it an efficient solution for large-scale simulations.
Haoming Bao, Jiandong Shang, Jinzhu Li, Gang Wu, Haitao Wei, Lingling Wang, Nan Wang, Jingye Shi, Wenge Zhou, Feng Chen, Jiahui Guo, Jinyang Wang, Dujuan Zhang, and Hengliang Guo
EGUsphere, https://doi.org/10.5194/egusphere-2024-3495, 2025
Short summary
An analysis of ozone pollution in Henan Province, China, from 2015 to 2022 was conducted. The spatiotemporal distribution patterns of ozone pollution in Henan Province during this period and its driving factors were examined from the perspectives of pollutant concentrations, meteorological conditions, and socioeconomic factors. Time-series analysis and machine learning techniques were employed to predict both short-term and long-term ozone concentrations in the region.
Dylan Jones, Lucas Prates, Zhen Qu, William Cheng, Kazuyuki Miyazaki, Takashi Sekiya, Antje Inness, Rajesh Kumar, Xiao Tang, Helen Worden, Gerbrand Koren, and Vincent Huijen
EGUsphere, https://doi.org/10.5194/egusphere-2024-3759, 2025
Short summary
We evaluate five chemical reanalysis products to assess their potential to provide useful information on tropospheric ozone variability. We find that the reanalyses produce consistent information on ozone variations in the free troposphere but have large discrepancies at the surface. The results suggest that improvements in the reanalyses are needed to better exploit the assimilated observations and enhance the utility of the reanalysis products at the surface.
Zichen Wu, Xueshun Chen, Zifa Wang, Huansheng Chen, Zhe Wang, Qing Mu, Lin Wu, Wending Wang, Xiao Tang, Jie Li, Ying Li, Qizhong Wu, Yang Wang, Zhiyin Zou, and Zijian Jiang
Geosci. Model Dev., 17, 8885–8907, https://doi.org/10.5194/gmd-17-8885-2024, 2024
Short summary
We developed a model to simulate polycyclic aromatic hydrocarbons (PAHs) from global to regional scales. The model can reproduce PAH distribution well. The concentration of BaP (indicator species for PAHs) could exceed the target value of 1 ng m⁻³ over some areas (e.g., in central Europe, India, and eastern China). The change in BaP is lower than that in PM2.5 from 2013 to 2018. China still faces significant potential health risks posed by BaP although the Action Plan has been implemented.
Lei Kong, Xiao Tang, Zifa Wang, Jiang Zhu, Jianjun Li, Huangjian Wu, Qizhong Wu, Huansheng Chen, Lili Zhu, Wei Wang, Bing Liu, Qian Wang, Duohong Chen, Yuepeng Pan, Jie Li, Lin Wu, and Gregory R. Carmichael
Earth Syst. Sci. Data, 16, 4351–4387, https://doi.org/10.5194/essd-16-4351-2024, 2024
Short summary
A new long-term inversed emission inventory for Chinese air quality (CAQIEI) is developed in this study, which contains constrained monthly emissions of NOx, SO2, CO, PM2.5, PM10, and NMVOCs in China from 2013 to 2020 with a horizontal resolution of 15 km. Emissions of different air pollutants and their changes during 2013–2020 were investigated and compared with previous emission inventories, which sheds new light on the complex variations of air pollutant emissions in China.
Jiaxu Guo, Juepeng Zheng, Yidan Xu, Haohuan Fu, Wei Xue, Lanning Wang, Lin Gan, Ping Gao, Wubing Wan, Xianwei Wu, Zhitao Zhang, Liang Hu, Gaochao Xu, and Xilong Che
Geosci. Model Dev., 17, 3975–3992, https://doi.org/10.5194/gmd-17-3975-2024, 2024
Short summary
To enhance the efficiency of experiments using SCAM, we train a learning-based surrogate model to facilitate large-scale sensitivity analysis and tuning of combinations of multiple parameters. Employing a hybrid method, we investigate the joint sensitivity of multi-parameter combinations across typical cases, identifying the most sensitive three-parameter combination out of 11. Subsequently, we conduct a tuning process aimed at reducing output errors in these cases.
Yaqi Wang, Lanning Wang, Juan Feng, Zhenya Song, Qizhong Wu, and Huaqiong Cheng
Geosci. Model Dev., 16, 6857–6873, https://doi.org/10.5194/gmd-16-6857-2023, 2023
Short summary
In this study, to noticeably improve precipitation simulation in steep mountains, we propose a sub-grid parameterization scheme for the topographic vertical motion in CAM5-SE to revise the original vertical velocity by adding the topographic vertical motion. The dynamic lifting effect of topography is extended from the lowest layer to multiple layers, thus improving the positive deviations of precipitation simulation in high-altitude regions and negative deviations in low-altitude regions.
Xianwei Wu, Liang Hu, Lanning Wang, Haitian Lu, and Juepeng Zheng
Geosci. Model Dev. Discuss., https://doi.org/10.5194/gmd-2023-164, 2023
Revised manuscript not accepted
Short summary
To build an effective surrogate model for the Community Atmosphere Model (CAM), we present a surrogate model-based parameter tuning framework for CAM, apply it to improve CAM5 precipitation performance, and propose a multilevel surrogate model-based optimization method. We design a nonuniform parameter parameterization scheme and integrate the parameters using a parameter smoothing scheme; the experimental results improve in four regions.
Jinming Feng, Meng Luo, Jun Wang, Yuan Qiu, Qizhong Wu, and Ke Wang
EGUsphere, https://doi.org/10.5194/egusphere-2023-867, 2023
Preprint withdrawn
Short summary
We modified the code of the Weather Research and Forecasting Model (WRF) v3.8.1 to include forcing components beyond greenhouse gases and evaluated the impact of forcing configurations on climate simulation results in China. Different external forcing configurations in WRF had a considerable impact on the annual temperature and precipitation trends, which was stronger than that of parameterization schemes but weaker than that of spectral nudging.
Lei Kong, Xiao Tang, Jiang Zhu, Zifa Wang, Yele Sun, Pingqing Fu, Meng Gao, Huangjian Wu, Miaomiao Lu, Qian Wu, Shuyuan Huang, Wenxuan Sui, Jie Li, Xiaole Pan, Lin Wu, Hajime Akimoto, and Gregory R. Carmichael
Atmos. Chem. Phys., 23, 6217–6240, https://doi.org/10.5194/acp-23-6217-2023, 2023
Short summary
A multi-air-pollutant inversion system has been developed in this study to estimate emission changes in China during COVID-19 lockdown. The results demonstrate that the lockdown is largely a nationwide road traffic control measure with NOx emissions decreasing by ~40 %. Emissions of other species only decreased by ~10 % due to smaller effects of lockdown on other sectors. Assessment results further indicate that the lockdown only had limited effects on the control of PM2.5 and O3 in China.
Yuejin Ye, Zhenya Song, Shengchang Zhou, Yao Liu, Qi Shu, Bingzhuo Wang, Weiguo Liu, Fangli Qiao, and Lanning Wang
Geosci. Model Dev., 15, 5739–5756, https://doi.org/10.5194/gmd-15-5739-2022, 2022
Short summary
The swNEMO_v4.0 is developed with ultrahigh scalability through the concepts of hardware–software co-design based on the characteristics of the new Sunway supercomputer and NEMO4. Three breakthroughs, including an adaptive four-level parallelization design, many-core optimization and mixed-precision optimization, are designed. The simulations achieve 71.48 %, 83.40 % and 99.29 % parallel efficiency with resolutions of 2 km, 1 km and 500 m using 27 988 480 cores, respectively.
Zixuan Jia, Ruth M. Doherty, Carlos Ordóñez, Chaofan Li, Oliver Wild, Shipra Jain, and Xiao Tang
Atmos. Chem. Phys., 22, 6471–6487, https://doi.org/10.5194/acp-22-6471-2022, 2022
Short summary
This study investigates the modulation of daily PM2.5 over three major populated regions in China by regional meteorology and large-scale circulation during winter. These results demonstrate the benefits of considering the large-scale circulation for air quality studies. The novel circulation indices proposed here can explain a considerable fraction of the day-to-day variability of PM2.5 and can be combined with regional meteorology to improve our capability to predict the variability of PM2.5.
Qian Ma, Kaicun Wang, Yanyi He, Liangyuan Su, Qizhong Wu, Han Liu, and Youren Zhang
Earth Syst. Sci. Data, 14, 463–477, https://doi.org/10.5194/essd-14-463-2022, 2022
Short summary
Surface incident solar radiation plays a key role in atmospheric circulation, the water cycle, and ecological equilibrium on Earth. A homogenized century-long surface incident solar radiation dataset was obtained over Japan.
Qian Ye, Jie Li, Xueshun Chen, Huansheng Chen, Wenyi Yang, Huiyun Du, Xiaole Pan, Xiao Tang, Wei Wang, Lili Zhu, Jianjun Li, Zhe Wang, and Zifa Wang
Geosci. Model Dev., 14, 7573–7604, https://doi.org/10.5194/gmd-14-7573-2021, 2021
Short summary
We developed a global tropospheric atmospheric chemistry source–receptor model. This model can quantify the contributions of multiple air pollutants from various source regions in one simulation without introducing the nonlinear error of atmospheric chemistry. The S-R relationships of PM2.5 and O3 from a global high-resolution (0.5° × 0.5°) simulation were given and compared with previous studies. This model will be useful for creating a link between the scientific community and policymakers.
Ying Wei, Xueshun Chen, Huansheng Chen, Yele Sun, Wenyi Yang, Huiyun Du, Qizhong Wu, Dan Chen, Xiujuan Zhao, Jie Li, and Zifa Wang
Geosci. Model Dev., 14, 4411–4428, https://doi.org/10.5194/gmd-14-4411-2021, 2021
Short summary
The sub-grid particle formation (SGPF) in plumes plays an important role in air pollution and climate. We coupled an SGPF scheme to a chemical transport model with an aerosol microphysics module and applied it to investigate the SGPF impact over China. The scheme clearly improved the model performance in simulating aerosol components and particle number at typical sites influenced by point sources. The results indicate the significant effects of SGPF on aerosol particles in industrial areas.
Xueshun Chen, Fangqun Yu, Wenyi Yang, Yele Sun, Huansheng Chen, Wei Du, Jian Zhao, Ying Wei, Lianfang Wei, Huiyun Du, Zhe Wang, Qizhong Wu, Jie Li, Junling An, and Zifa Wang
Atmos. Chem. Phys., 21, 9343–9366, https://doi.org/10.5194/acp-21-9343-2021, 2021
Short summary
Atmospheric aerosol particles have significant climate and health effects that depend on aerosol size, composition, and mixing state. A new global-regional nested aerosol model with an advanced particle microphysics module and a volatility basis set organic aerosol module was developed to simulate aerosol microphysical processes. Simulations strongly suggest the important role of anthropogenic organic species in particle formation over the areas influenced by anthropogenic sources.
Hui Wang, Qizhong Wu, Alex B. Guenther, Xiaochun Yang, Lanning Wang, Tang Xiao, Jie Li, Jinming Feng, Qi Xu, and Huaqiong Cheng
Atmos. Chem. Phys., 21, 4825–4848, https://doi.org/10.5194/acp-21-4825-2021, 2021
Short summary
We assessed the influence of the greening trend on BVOC emission in China. The comparison among different scenarios showed that vegetation changes resulting from land cover management are the main driver of BVOC emission change in China. Climate variability contributed significantly to interannual variations but not much to the long-term trend during the study period.
Tie Dai, Yueming Cheng, Daisuke Goto, Yingruo Li, Xiao Tang, Guangyu Shi, and Teruyuki Nakajima
Atmos. Chem. Phys., 21, 4357–4379, https://doi.org/10.5194/acp-21-4357-2021, 2021
Short summary
The anthropogenic emission of sulfur dioxide (SO2) over China has significantly declined as a consequence of the clean air actions. We have developed a new emission inversion system to dynamically update the SO2 emission grid by grid over China by assimilating ground-based SO2 observations. The inverted SO2 emission over China in November 2016 on average had declined by 49.4 % since 2010, which is well in agreement with the bottom-up estimation of 48.0 %.
Lei Kong, Xiao Tang, Jiang Zhu, Zifa Wang, Jianjun Li, Huangjian Wu, Qizhong Wu, Huansheng Chen, Lili Zhu, Wei Wang, Bing Liu, Qian Wang, Duohong Chen, Yuepeng Pan, Tao Song, Fei Li, Haitao Zheng, Guanglin Jia, Miaomiao Lu, Lin Wu, and Gregory R. Carmichael
Earth Syst. Sci. Data, 13, 529–570, https://doi.org/10.5194/essd-13-529-2021, 2021
Short summary
China's air pollution has changed substantially since 2013. Here we have developed a 6-year-long high-resolution air quality reanalysis dataset over China from 2013 to 2018 to illustrate such changes and to provide a basic dataset for relevant studies. Surface fields of PM2.5, PM10, SO2, NO2, CO, and O3 concentrations are provided, and the evaluation results indicate that the reanalysis dataset has excellent performance in reproducing the magnitude and variation of air pollution in China.
Han Xiao, Qizhong Wu, Xiaochun Yang, Lanning Wang, and Huaqiong Cheng
Geosci. Model Dev., 14, 223–238, https://doi.org/10.5194/gmd-14-223-2021, 2021
Short summary
Few studies have investigated the effects of initial conditions on the simulation or prediction of PM2.5 concentrations. Here, sensitivity experiments are used to explore the effects of three initial mechanisms (clean, restart, and continuous) and emissions in Xi’an in December 2016. According to this work, if the restart mechanism cannot be used due to computing resource and storage space limitations when forecasting PM2.5 concentrations, a spin-up time of at least 27 h is needed.
Shaoqing Zhang, Haohuan Fu, Lixin Wu, Yuxuan Li, Hong Wang, Yunhui Zeng, Xiaohui Duan, Wubing Wan, Li Wang, Yuan Zhuang, Hongsong Meng, Kai Xu, Ping Xu, Lin Gan, Zhao Liu, Sihai Wu, Yuhu Chen, Haining Yu, Shupeng Shi, Lanning Wang, Shiming Xu, Wei Xue, Weiguo Liu, Qiang Guo, Jie Zhang, Guanghui Zhu, Yang Tu, Jim Edwards, Allison Baker, Jianlin Yong, Man Yuan, Yangyang Yu, Qiuying Zhang, Zedong Liu, Mingkui Li, Dongning Jia, Guangwen Yang, Zhiqiang Wei, Jingshan Pan, Ping Chang, Gokhan Danabasoglu, Stephen Yeager, Nan Rosenbloom, and Ying Guo
Geosci. Model Dev., 13, 4809–4829, https://doi.org/10.5194/gmd-13-4809-2020, 2020
Short summary
Science advancement and societal needs require Earth system modelling with higher resolutions that demand tremendous computing power. We successfully scale the 10 km ocean and 25 km atmosphere high-resolution Earth system model to a new leading-edge heterogeneous supercomputer using state-of-the-art optimizing methods, promising the solution of high spatial resolution and time-varying frequency. Corresponding technical breakthroughs are of significance in modelling and HPC design communities.
Baozhu Ge, Syuichi Itahashi, Keiichi Sato, Danhui Xu, Junhua Wang, Fan Fan, Qixin Tan, Joshua S. Fu, Xuemei Wang, Kazuyo Yamaji, Tatsuya Nagashima, Jie Li, Mizuo Kajino, Hong Liao, Meigen Zhang, Zhe Wang, Meng Li, Jung-Hun Woo, Junichi Kurokawa, Yuepeng Pan, Qizhong Wu, Xuejun Liu, and Zifa Wang
Atmos. Chem. Phys., 20, 10587–10610, https://doi.org/10.5194/acp-20-10587-2020, 2020
Short summary
The performance of simulated deposition for different reduced N (Nr) species in China was evaluated within the Model Inter-Comparison Study for Asia. Results showed that simulated wet deposition of oxidized N was overestimated in northeastern China and underestimated in south China, but Nr was underpredicted in all regions by all models. Oxidized N has larger uncertainties than Nr, indicating that the chemical reaction process is one of the most important factors affecting model performance.
Short summary
Offline performance experiments show that GPU-HADVPPM on a V100 GPU achieves a speedup of up to 1113.6× over its original version on an E5-2682 v4 CPU. After a series of optimization measures, the CAMx-CUDA model improves computing efficiency by 128.4× on a single V100 GPU card. A parallel architecture with an MPI plus CUDA hybrid paradigm is presented, and it achieves a speedup of up to 4.5× when launching eight CPU cores and eight GPU cards.