Articles | Volume 14, issue 8
https://doi.org/10.5194/gmd-14-5205-2021
© Author(s) 2021. This work is distributed under
the Creative Commons Attribution 4.0 License.
Copula-based synthetic data augmentation for machine-learning emulators
David Meyer
CORRESPONDING AUTHOR
Department of Meteorology, University of Reading, Reading, UK
Department of Civil and Environmental Engineering, Imperial College
London, London, UK
Thomas Nagler
Mathematical Institute, Leiden University, Leiden,
the Netherlands
Robin J. Hogan
European Centre for Medium-Range Weather Forecasts,
Reading, UK
Department of Meteorology, University of Reading, Reading, UK
Viewed
Total article views: 6,806 (including HTML, PDF, and XML)
Cumulative views and downloads
(calculated since 05 Jan 2021)
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 4,368 | 2,319 | 119 | 6,806 | 136 | 160 |
Total article views: 4,799 (including HTML, PDF, and XML)
Cumulative views and downloads
(calculated since 18 Aug 2021)
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 3,665 | 1,031 | 103 | 4,799 | 125 | 153 |
Total article views: 2,007 (including HTML, PDF, and XML)
Cumulative views and downloads
(calculated since 05 Jan 2021)
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 703 | 1,288 | 16 | 2,007 | 11 | 7 |
Viewed (geographical distribution)
Total article views: 6,806 (including HTML, PDF, and XML)
Thereof 6,559 with geography defined
and 247 with unknown origin.
Total article views: 4,799 (including HTML, PDF, and XML)
Thereof 4,676 with geography defined
and 123 with unknown origin.
Total article views: 2,007 (including HTML, PDF, and XML)
Thereof 1,883 with geography defined
and 124 with unknown origin.
Cited
24 citations as recorded by crossref.
- Machine Learning Emulation of 3D Cloud Radiative Effects D. Meyer et al.
- Machine Learning Emulation of Urban Land Surface Processes D. Meyer et al.
- Challenges and Benchmark Datasets for Machine Learning in the Atmospheric Sciences: Definition, Status, and Outlook P. Dueben et al.
- PSO Based TCN Hybrid Optimization for Turbulent Heat Transfer Prediction of Multiple Synthetic Jets in Crossflow S. Akçay et al.
- Integrative modeling of heterogeneous soil salinity using sparse ground samples and remote sensing images L. Wang et al.
- Enhancing Lobular Inflammation Severity Prediction in NAFLD Using Synthetic Data Augmentation and XAI: A Novel Approach D. Kumar et al.
- Synthia: multidimensional synthetic data generation in Python D. Meyer & T. Nagler
- Improving Predictions of Technical Inefficiency R. James et al.
- Tabular data generation models: An in-depth survey and performance benchmarks with extensive tuning G. Kindji et al.
- Wind Reference Year: A New Approach R. Lázaro et al.
- CoLIME with 2D Copulas for Reliable Local Explanations on Imbalanced Network Data M. Bacevicius et al.
- Yet Another Discriminant Analysis (YADA): A Probabilistic Model for Machine Learning Applications R. Field et al.
- Improving Weather Forecasts for Sailing Events Using a Combination of a Numerical Forecast Model and Machine Learning Postprocessing S. Beimel et al.
- Soybean yield prediction using machine learning algorithms under a cover crop management system L. Santos et al.
- Synthetic data generation using Copula model and driving behavior analysis E. Savran & F. Karpat
- Statistical mechanics in climate emulation: Challenges and perspectives I. Sudakow et al.
- Human-in-the-Loop Digital Twin Framework for Ergonomics of Exoskeletons in Construction A. Afolabi et al.
- Quality Data Extractor (QDE): Elevating Synthetic Data Augmentation Through Post-Generation Filtration P. Sachdeva et al.
- Generative and explainable artificial intelligence models for enhancing scour depth prediction around a cubical artificial reef T. Nguyen et al.
- Estimation of the correlation between temperature and precipitation in Bafra Plain using Copula Ç. Sözen et al.
- Enhancing Drought Prediction in Semi-Arid Climates: A Synthetic Data and Neural Network Approach Applied to Karaman Region, Turkey A. Duvan & S. Yildizel
- Integrative Stacking Machine Learning Model for Small Cell Lung Cancer Prediction Using Metabolomics Profiling M. Sumon et al.
- A novel classical machine learning framework for early sepsis prediction using electronic health record data from ICU patients J. Prithula et al.
- A framework to create, evaluate and select synthetic datasets for survival prediction in oncology A. Christoforou et al.
Latest update: 06 May 2026
Short summary
A major limitation in training machine-learning emulators is often the lack of data. This paper presents a cheap way to increase the size of training datasets using copula-based statistical techniques and thereby improve the performance of machine-learning emulators.
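To make the idea in the summary concrete, the following is a minimal sketch of copula-based augmentation, not the authors' implementation. It assumes a Gaussian copula with empirical marginals, which is only one of several possible copula models; the function names `fit_gaussian_copula`, `sample_gaussian_copula`, and `augment`, and the `factor` parameter, are hypothetical and chosen for illustration.

```python
import numpy as np
from scipy import stats


def fit_gaussian_copula(data):
    """Estimate the Gaussian-copula correlation matrix of data (n_samples, n_features)."""
    n = data.shape[0]
    # Pseudo-observations: column-wise ranks rescaled into the open interval (0, 1).
    u = stats.rankdata(data, axis=0) / (n + 1)
    # Normal scores; their correlation matrix parameterizes the Gaussian copula.
    z = stats.norm.ppf(u)
    return np.corrcoef(z, rowvar=False)


def sample_gaussian_copula(data, corr, n_new, rng):
    """Draw n_new synthetic rows with the empirical marginals of data and dependence corr."""
    d = data.shape[1]
    z = rng.multivariate_normal(np.zeros(d), corr, size=n_new)
    u = stats.norm.cdf(z)
    # Map the uniforms back through each column's empirical quantile function.
    return np.column_stack([np.quantile(data[:, j], u[:, j]) for j in range(d)])


def augment(data, factor, seed=0):
    """Return the original rows followed by factor * n_samples synthetic rows (assumed scheme)."""
    rng = np.random.default_rng(seed)
    corr = fit_gaussian_copula(data)
    synthetic = sample_gaussian_copula(data, corr, factor * data.shape[0], rng)
    return np.vstack([data, synthetic])
```

In this sketch the synthetic rows only cover the emulator inputs and stay within the range of the observed values, because the marginals are empirical quantiles. Training targets for the synthetic inputs would still have to be obtained separately, for example by running the physical model being emulated.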