Preprints
https://doi.org/10.5194/gmd-2020-427

Submitted as: development and technical paper | 05 Jan 2021

Review status: this preprint is currently under review for the journal GMD.

Copula-based synthetic data generation for machine learning emulators in weather and climate: application to a simple radiation model

David Meyer1,2, Thomas Nagler3, and Robin J. Hogan4,1
  • 1Department of Meteorology, University of Reading, Reading, UK
  • 2Department of Civil and Environmental Engineering, Imperial College London, London, UK
  • 3Mathematical Institute, Leiden University, Leiden, The Netherlands
  • 4European Centre for Medium-Range Weather Forecasts, Reading, UK

Abstract. Can we improve machine learning (ML) emulators with synthetic data? Training ML models on real data alone often imposes major limitations. For example, real data may be (a) representative of only a subset of situations and domains, (b) expensive to source, or (c) limited to specific individuals due to licensing restrictions. Although synthetic data are becoming increasingly popular in computer vision, the training of ML emulators in weather and climate still relies on real datasets. Here we investigate whether copula-based, synthetically augmented datasets improve the predictions of ML emulators for estimating downwelling longwave radiation. Results show that bulk errors are cut by up to 75 % for the mean bias error (from 0.08 to −0.02 W m⁻²) and by up to 62 % for the mean absolute error (from 1.17 to 0.44 W m⁻²), showing potential for improving the generalization of future ML emulators.
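To make the idea of copula-based augmentation concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state which copula family or software is used, so the sketch assumes a bivariate Gaussian copula with empirical margins, chosen only because it can be written with the Python standard library. The function names (`normal_scores`, `pearson`, `sample_gaussian_copula`) and the toy data are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal used for the copula transform


def normal_scores(values):
    """Map a sample to normal scores via its ranks (pseudo-observations)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # rank / (n + 1) keeps the pseudo-observation strictly inside (0, 1)
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def pearson(a, b):
    """Plain Pearson correlation coefficient."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def sample_gaussian_copula(xs, ys, n_new, rng):
    """Draw n_new synthetic (x, y) pairs from a Gaussian copula fit to xs, ys."""
    # Assumption: dependence is summarised by one correlation of normal scores.
    rho = pearson(normal_scores(xs), normal_scores(ys))
    resid = math.sqrt(max(0.0, 1.0 - rho * rho))
    sx, sy = sorted(xs), sorted(ys)
    n = len(xs)
    out = []
    for _ in range(n_new):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + resid * rng.gauss(0.0, 1.0)
        # Back-transform through the empirical quantiles of each margin.
        ix = min(int(STD_NORMAL.cdf(z1) * n), n - 1)
        iy = min(int(STD_NORMAL.cdf(z2) * n), n - 1)
        out.append((sx[ix], sy[iy]))
    return out


# Toy "real" data (hypothetical values): two positively dependent variables.
rng = random.Random(0)
real_x = [rng.uniform(250.0, 300.0) for _ in range(200)]
real_y = [0.5 * x + rng.gauss(0.0, 3.0) for x in real_x]

# Augment the real sample with synthetic pairs drawn from the fitted copula.
synthetic = sample_gaussian_copula(real_x, real_y, 400, rng)
augmented = list(zip(real_x, real_y)) + synthetic
```

Because the margins are back-transformed through empirical quantiles, each synthetic value is one already observed in that margin; only the pairings are new. A smoothed or parametric margin would be needed to generate values between or beyond the observations.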

Status: open (until 02 Mar 2021)

Data sets

Data archive for paper "Copula-based synthetic data generation for machine learning emulators in weather and climate: application to a simple radiation model" David Meyer https://doi.org/10.5281/zenodo.4320794

Model code and software

Data archive for paper "Copula-based synthetic data generation for machine learning emulators in weather and climate: application to a simple radiation model" David Meyer https://doi.org/10.5281/zenodo.4320794

Latest update: 22 Jan 2021