Articles | Volume 14, issue 8
https://doi.org/10.5194/gmd-14-5205-2021
© Author(s) 2021. This work is distributed under
the Creative Commons Attribution 4.0 License.
Copula-based synthetic data augmentation for machine-learning emulators
David Meyer
CORRESPONDING AUTHOR
Department of Meteorology, University of Reading, Reading, UK
Department of Civil and Environmental Engineering, Imperial College London, London, UK
Thomas Nagler
Mathematical Institute, Leiden University, Leiden, the Netherlands
Robin J. Hogan
European Centre for Medium-Range Weather Forecasts, Reading, UK
Department of Meteorology, University of Reading, Reading, UK
Short summary
A major limitation in training machine-learning emulators is often a lack of data. This paper presents a cheap way to increase the size of training datasets using statistical techniques and thereby improve the performance of machine-learning emulators.