Articles | Volume 19, issue 9
https://doi.org/10.5194/gmd-19-3783-2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Actionable reporting of CPU-GPU performance comparisons: insights from a CLUBB case study
Department of Mathematical Sciences, University of Wisconsin-Milwaukee, Milwaukee, WI 53211-3029, USA
NSF National Center for Atmospheric Research, Boulder, CO 80307-3000, USA
Vincent E. Larson
Department of Mathematical Sciences, University of Wisconsin-Milwaukee, Milwaukee, WI 53211-3029, USA
Pacific Northwest National Laboratory, Richland, WA 99352, USA
John M. Dennis
NSF National Center for Atmospheric Research, Boulder, CO 80307-3000, USA
Sheri A. Voelz
NSF National Center for Atmospheric Research, Boulder, CO 80307-3000, USA
Short summary
Central processing units (CPUs) and graphics processing units (GPUs) are different devices that suit different kinds of work. Using a climate modeling component, we provide a clearer way to tell which device type is faster for a given task. This matters because runs usually use only one device type. Our results are actionable: they guide device choice, report performance gains fairly, highlight code areas to improve, and show how code structure and optimization can change conclusions.