Articles | Volume 14, issue 8
Development and technical paper 18 Aug 2021
Copula-based synthetic data augmentation for machine-learning emulators
David Meyer et al.
Beatriz M. Monge-Sanz, Alessio Bozzo, Nicholas Byrne, Martyn P. Chipperfield, Michail Diamantakis, Johannes Flemming, Lesley J. Gray, Robin J. Hogan, Luke Jones, Linus Magnusson, Inna Polichtchouk, Theodore G. Shepherd, Nils Wedi, and Antje Weisheimer
Atmos. Chem. Phys. Discuss.,
Preprint under review for ACP
Short summary
The stratosphere is emerging as one of the keys to improving tropospheric weather and climate predictions. This study provides evidence of the role the stratospheric ozone layer plays in improving weather predictions at different time scales. Using a new ozone modelling approach suitable for high-resolution global models that provide operational forecasts from days to seasons, we find significant improvements in stratospheric meteorological fields and stratosphere-troposphere coupling.
Robin J. Hogan and Marco Matricardi
Geosci. Model Dev., 13, 6501–6521,
Short summary
A key component of computer models used to predict weather and climate is the radiation scheme, which calculates how solar and infrared radiation heats and cools the atmosphere and surface, including the important role of greenhouse gases. This paper describes the experimental protocol and large datasets for a new project, CKDMIP, to evaluate and improve the accuracy of the treatment of atmospheric gases in the radiation schemes used worldwide, as well as their computational speed.
Shannon L. Mason, Robin J. Hogan, Christopher D. Westbrook, Stefan Kneifel, Dmitri Moisseev, and Leonie von Terzi
Atmos. Meas. Tech., 12, 4993–5018,
Short summary
The mass contents of snowflakes are critical to remotely sensed estimates of snowfall. The signatures of snow measured at three radar frequencies can distinguish fluffy, fractal snowflakes from dense and more homogeneous rimed snow. However, we show that the shape of the particle size spectrum also has a significant impact on triple-frequency radar signatures and must be accounted for when making triple-frequency radar estimates of snow that include variations in particle structure and density.
Jeronimo Escribano, Alessio Bozzo, Philippe Dubuisson, Johannes Flemming, Robin J. Hogan, Laurent C.-Labonnote, and Olivier Boucher
Geosci. Model Dev., 12, 805–827,
Short summary
Accurate shortwave radiance computations are becoming increasingly important for some applications in atmospheric composition. In this work we propose a benchmark protocol and dataset to assess the accuracy and computing runtime of radiance calculations of radiative transfer models. It is applied to four models, showing the potential of this benchmark to evaluate the model performance under a variety of atmospheric conditions, viewing geometries, aerosol loading, and optical properties.
Robin J. Hogan, Tristan Quaife, and Renato Braghiere
Geosci. Model Dev., 11, 339–350,
Short summary
This paper describes a fast new method for calculating how much sunlight is absorbed and reflected by forests and other types of vegetation, rigorously taking account of the complex 3-D structure. Careful evaluation shows it to perform well even in difficult scenes with snow on the ground. The method is suitable for use within the computer models used to make weather and climate forecasts, where it has the potential to improve predictions of near-surface temperature and photosynthesis rates.
Shannon L. Mason, J. Christine Chiu, Robin J. Hogan, and Lin Tian
Atmos. Chem. Phys., 17, 11567–11589,
Short summary
Airborne Doppler radar measurements are used to estimate the properties of tropical stratiform rain. Doppler velocity measurements provide sufficient information to estimate the rain rate over land and also to retrieve the raindrop size distribution over ocean, addressing major uncertainties in current satellite measurements of rain. These results suggest that EarthCARE, with the first space-borne Doppler radar, will facilitate improved global measurements of rain.
Related subject area
Earth and space science informatics
Automated geological map deconstruction for 3D model construction using map2loop 1.0 and map2model 1.0
A spatially explicit approach to simulate urban heat mitigation with InVEST (v3.8.0)
S-SOM v1.0: a structural self-organizing map algorithm for weather typing
Using Shapley additive explanations to interpret extreme gradient boosting predictions of grassland degradation in Xilingol, China
Current status on the need for improved accessibility to climate models code
dh2loop 1.0: an open-source python library for automated processing and classification of geological logs
ClimateNet: an expert-labeled open dataset and deep learning architecture for enabling high-precision analyses of extreme weather
A spatiotemporal weighted regression model (STWR v1.0) for analyzing local nonstationarity in space and time
A new end-to-end workflow for the Community Earth System Model (version 2.0) for the Coupled Model Intercomparison Project Phase 6 (CMIP6)
HyLands 1.0: a hybrid landscape evolution model to simulate the impact of landslides and landslide-derived sediment on landscape evolution
Comparative analysis of atmospheric radiative transfer models using the Atmospheric Look-up table Generator (ALG) toolbox (version 2.0)
Fast domain-aware neural network emulation of a planetary boundary layer parameterization in a numerical weather forecast model
VISIR-1.b: ocean surface gravity waves and currents for energy-efficient navigation
Topological data analysis and machine learning for recognizing atmospheric river patterns in large climate datasets
Global hydro-climatic biomes identified via multitask learning
A run control framework to streamline profiling, porting, and tuning simulation runs and provenance tracking of geoscientific applications
An improved logistic regression model based on a spatially weighted technique (ILRBSWT v1.0) and its application to mineral prospectivity mapping
High-performance software framework for the calculation of satellite-to-satellite data matchups (MMS version 1.2)
A data model of the Climate and Forecast metadata conventions (CF-1.6) with a software implementation (cf-python v2.1)
Reverse engineering model structures for soil and ecosystem respiration: the potential of gene expression programming
A high-fidelity multiresolution digital elevation model for Earth systems
CPMIP: measurements of real computational performance of Earth system models in CMIP6
Automatic delineation of geomorphological slope units with r.slopeunits v1.0 and their optimization for landslide susceptibility modeling
Community Intercomparison Suite (CIS) v1.4.0: a tool for intercomparing models and observations
Asynchronous communication in spectral-element and discontinuous Galerkin methods for atmospheric dynamics – a case study using the High-Order Methods Modeling Environment (HOMME-homme_dg_branch)
GO2OGS 1.0: a versatile workflow to integrate complex geological information with fault data into numerical simulation models
An open and extensible framework for spatially explicit land use change modelling: the lulcc R package
Plant functional type classification for earth system models: results from the European Space Agency's Land Cover Climate Change Initiative
Non-singular spherical harmonic expressions of geomagnetic vector and gradient tensor fields in the local north-oriented reference frame
An approach to enhance pnetCDF performance in environmental modeling applications
A strategy for GIS-based 3-D slope stability modelling over large areas
An approach to computing direction relations between separated object groups
Improving computational efficiency in large linear inverse problems: an example from carbon dioxide flux estimation
Coupling technologies for Earth System Modelling
Quality assessment concept of the World Data Center for Climate and its application to CMIP5 data
A web service based tool to plan atmospheric research flights
Automated continuous verification for numerical simulation
Mark Jessell, Vitaliy Ogarko, Yohan de Rose, Mark Lindsay, Ranee Joshi, Agnieszka Piechocka, Lachlan Grose, Miguel de la Varga, Laurent Ailleres, and Guillaume Pirot
Geosci. Model Dev., 14, 5063–5092,
Short summary
We have developed software that extracts enough information from unmodified digital maps and associated datasets to build 3D geological models automatically. Automating the process removes human bias from the procedure and makes the workflow reproducible.
Martí Bosch, Maxence Locatelli, Perrine Hamel, Roy P. Remme, Jérôme Chenal, and Stéphane Joost
Geosci. Model Dev., 14, 3521–3537,
Short summary
The article presents a novel approach to simulate urban heat mitigation from land use/land cover data based on three biophysical mechanisms: tree shade, evapotranspiration and albedo. An automated procedure is proposed to calibrate the model parameters to best fit temperature observations from monitoring stations. A case study in Lausanne, Switzerland, shows that the approach outperforms regressions based on satellite data and provides valuable insights for designing heat mitigation policies.
Quang-Van Doan, Hiroyuki Kusaka, Takuto Sato, and Fei Chen
Geosci. Model Dev., 14, 2097–2111,
Short summary
This study proposes a novel structural self-organizing map (S-SOM) algorithm. The advantage of S-SOM is that it better recognizes the difference (or similarity) among the spatial (or temporal) data used for training and thus improves clustering quality compared to traditional SOM algorithms.
Batunacun, Ralf Wieland, Tobia Lakes, and Claas Nendel
Geosci. Model Dev., 14, 1493–1510,
Short summary
Extreme gradient boosting (XGBoost) can provide alternative insights that conventional land-use models are unable to generate. Shapley additive explanations (SHAP) can interpret the results of the purely data-driven approach. XGBoost achieved similar and robust simulation results. SHAP values were useful for analysing the complex relationship between the different drivers of grassland degradation.
Juan A. Añel, Michael García-Rodríguez, and Javier Rodeiro
Geosci. Model Dev., 14, 923–934,
Short summary
This work shows that it continues to be hard, if not impossible, to obtain some of the most used climate models worldwide. We reach this conclusion through a systematic study and encourage all development teams and research centres to make public the models they use to produce scientific results.
Ranee Joshi, Kavitha Madaiah, Mark Jessell, Mark Lindsay, and Guillaume Pirot
Geosci. Model Dev. Discuss.,
Revised manuscript accepted for GMD
Short summary
We have developed software that allows the user to extract and standardize drill hole information from legacy datasets and/or different drilling campaigns. It also provides functionality to upscale the lithological information. These functionalities were made possible by developing thesauri to identify and group geological terminologies together.
Prabhat, Karthik Kashinath, Mayur Mudigonda, Sol Kim, Lukas Kapp-Schwoerer, Andre Graubner, Ege Karaismailoglu, Leo von Kleist, Thorsten Kurth, Annette Greiner, Ankur Mahesh, Kevin Yang, Colby Lewis, Jiayi Chen, Andrew Lou, Sathyavat Chandran, Ben Toms, Will Chapman, Katherine Dagon, Christine A. Shields, Travis O'Brien, Michael Wehner, and William Collins
Geosci. Model Dev., 14, 107–124,
Short summary
Detecting extreme weather events is a crucial step in understanding how they change due to climate change. Deep learning (DL) is remarkable at pattern recognition; however, it works best only when labeled datasets are available. We create ClimateNet, an expert-labeled curated dataset, to train a DL model for detecting weather events and predicting changes in extreme precipitation. This work paves the way for DL-based automated, high-fidelity, and highly precise analytics of climate data.
Xiang Que, Xiaogang Ma, Chao Ma, and Qiyu Chen
Geosci. Model Dev., 13, 6149–6164,
Short summary
This paper presents a spatiotemporal weighted regression (STWR) model for exploring nonstationary spatiotemporal processes in nature and socioeconomics. A value change rate is introduced in the temporal kernel, which presents significant model fitting and accuracy in both simulated and real-world data. STWR fully incorporates observed data in the past and outperforms geographic temporal weighted regression (GTWR) and geographic weighted regression (GWR) models in several experiments.
Sheri Mickelson, Alice Bertini, Gary Strand, Kevin Paul, Eric Nienhouse, John Dennis, and Mariana Vertenstein
Geosci. Model Dev., 13, 5567–5581,
Short summary
Every generation of MIP exercises introduces new layers of complexity and an exponential growth in the amount of data requested. CMIP6 required us to develop a new tool chain and forced us to change our methodologies. The new methods discussed in this paper gave us an 18-fold speedup over our existing methods. This allowed us to meet our deadlines and we were able to publish more than half a million data sets on the Earth System Grid Federation (ESGF) for the CMIP6 project.
Benjamin Campforts, Charles M. Shobe, Philippe Steer, Matthias Vanmaercke, Dimitri Lague, and Jean Braun
Geosci. Model Dev., 13, 3863–3886,
Short summary
Landslides shape the Earth’s surface and are a dominant source of terrestrial sediment. Rivers, then, act as conveyor belts evacuating landslide-produced sediment. Understanding the interaction among rivers and landslides is important to predict the Earth’s surface response to past and future environmental changes and for mitigating natural hazards. We develop HyLands, a new numerical model that provides a toolbox to explore how landslides and rivers interact over several timescales.
Jorge Vicent, Jochem Verrelst, Neus Sabater, Luis Alonso, Juan Pablo Rivera-Caicedo, Luca Martino, Jordi Muñoz-Marí, and José Moreno
Geosci. Model Dev., 13, 1945–1957,
Short summary
The modeling of light propagation through the atmosphere is key to processing satellite images and understanding atmospheric processes. However, existing atmospheric models can be complex to use in practical applications. Here we aim to provide a new software tool that facilitates the use of advanced models and the generation of large databases of simulated data. As a test case, we use this tool to analyze differences between several atmospheric models, showing the capabilities of this open-source tool.
Jiali Wang, Prasanna Balaprakash, and Rao Kotamarthi
Geosci. Model Dev., 12, 4261–4274,
Short summary
Parameterizations are frequently used in models representing physical phenomena and are often the computationally expensive portions of the code. Using model output from simulations performed using a weather model, we train deep neural networks to provide an accurate alternative to a physics-based parameterization. We demonstrate that a domain-aware deep neural network can successfully simulate the entire diurnal cycle of the boundary layer physics and the results are transferable.
Gianandrea Mannarini and Lorenzo Carelli
Geosci. Model Dev., 12, 3449–3480,
Short summary
The VISIR ship-routing model is updated in order to deal with ocean currents. The optimal tracks we computed through VISIR in the Atlantic ocean show great seasonal and regional variability, following a variable influence of surface gravity waves and currents. We assess how these tracks contribute to voyage energy-efficiency gains through a standard indicator (EEOI) of the International Maritime Organization. Also, the new model features are validated against an exact analytical benchmark.
Grzegorz Muszynski, Karthik Kashinath, Vitaliy Kurlin, Michael Wehner, and Prabhat
Geosci. Model Dev., 12, 613–628,
Short summary
We present an automated method for recognizing atmospheric rivers in climate data, i.e., climate model output and reanalysis products. The method is based on topological data analysis and machine learning, both of which are powerful tools that the climate science community often does not use. An advantage of the proposed method is that it does not require selecting subjective threshold conditions on a physical variable. This method is also suitable for rapidly analyzing large amounts of data.
Christina Papagiannopoulou, Diego G. Miralles, Matthias Demuzere, Niko E. C. Verhoest, and Willem Waegeman
Geosci. Model Dev., 11, 4139–4153,
Short summary
Common global land cover and climate classifications are based on vegetation–climatic characteristics derived from observational data, ignoring the interaction between the local climate and biome. Here, we model the interplay between vegetation and local climate by discovering spatial relationships among different locations. The resulting global hydro-climatic biomes correspond to regions of coherent climate–vegetation interactions that agree well with traditional global land cover maps.
Wendy Sharples, Ilya Zhukov, Markus Geimer, Klaus Goergen, Sebastian Luehrs, Thomas Breuer, Bibi Naz, Ketan Kulkarni, Slavko Brdar, and Stefan Kollet
Geosci. Model Dev., 11, 2875–2895,
Short summary
Next-generation geoscientific models are based on complex model implementations and workflows. Next-generation HPC systems require new programming paradigms and code optimization. In order to meet the challenge of running complex simulations on new massively parallel HPC systems, we developed a run control framework that facilitates code portability, code profiling, and provenance tracking to reduce both the duration and the cost of code migration and development, while ensuring reproducibility.
Daojun Zhang, Na Ren, and Xianhui Hou
Geosci. Model Dev., 11, 2525–2539,
Short summary
Geographically weighted regression is a widely used method to deal with spatial heterogeneity, which is common in geostatistics. However, most existing software does not support logistic regression and cannot deal with missing data, which exist extensively in mineral prospectivity mapping. This work generalized logistic regression to spatial statistics based on a spatially weighted technique. The new model also supports an anisotropic local window, which is another innovative point.
Thomas Block, Sabine Embacher, Christopher J. Merchant, and Craig Donlon
Geosci. Model Dev., 11, 2419–2427,
Short summary
For calibration and validation purposes it is necessary to detect simultaneous data acquisitions from different spaceborne platforms. We present an algorithm and a software system which implements a general approach to resolve this problem. The multisensor matchup system (MMS) can detect simultaneous acquisitions in a large dataset (> 100 TB) and extract data for matching locations for further analysis. The MMS implements a flexible software infrastructure and allows for high parallelization.
David Hassell, Jonathan Gregory, Jon Blower, Bryan N. Lawrence, and Karl E. Taylor
Geosci. Model Dev., 10, 4619–4646,
Short summary
We present a formal data model for version 1.6 of the CF (Climate and Forecast) metadata conventions that provide a description of the physical meaning of geoscientific data and their spatial and temporal properties. We describe the CF conventions and how they lead to our CF data model, and compare it with other data models for storing data and metadata. We present cf-python version 2.1: a software implementation of the CF data model capable of manipulating any CF-compliant dataset.
Iulia Ilie, Peter Dittrich, Nuno Carvalhais, Martin Jung, Andreas Heinemeyer, Mirco Migliavacca, James I. L. Morison, Sebastian Sippel, Jens-Arne Subke, Matthew Wilkinson, and Miguel D. Mahecha
Geosci. Model Dev., 10, 3519–3545,
Short summary
Accurate representation of land-atmosphere carbon fluxes is essential for future climate projections, although some of the responses of CO2 fluxes to climate often remain uncertain. The increase in available data allows for new approaches in their modelling. We automatically developed models for ecosystem and soil carbon respiration using a machine learning approach. When compared with established respiration models, we found that they predict better and offer new insights.
Xinqiao Duan, Lin Li, Haihong Zhu, and Shen Ying
Geosci. Model Dev., 10, 239–253,
Short summary
This article proposes an optimized transformation for topographic datasets. The resulting topographic grid exhibits good surface approximation and quasi-uniform high quality. Both features of the processed topography build a concrete base from which improved endogenous or exogenous parameters can be derived, and make it suitable for Earth and environmental simulations.
Venkatramani Balaji, Eric Maisonnave, Niki Zadeh, Bryan N. Lawrence, Joachim Biercamp, Uwe Fladrich, Giovanni Aloisio, Rusty Benson, Arnaud Caubel, Jeffrey Durachta, Marie-Alice Foujols, Grenville Lister, Silvia Mocavero, Seth Underwood, and Garrett Wright
Geosci. Model Dev., 10, 19–34,
Short summary
Climate models are among the most computationally expensive scientific applications in the world. We present a set of measures of computational performance that can be used to compare models that are independent of underlying hardware and the model formulation. They are easy to collect and reflect performance actually achieved in practice. We are preparing a systematic effort to collect these metrics for the world's climate models during CMIP6, the next Climate Model Intercomparison Project.
Massimiliano Alvioli, Ivan Marchesini, Paola Reichenbach, Mauro Rossi, Francesca Ardizzone, Federica Fiorucci, and Fausto Guzzetti
Geosci. Model Dev., 9, 3975–3991,
Short summary
Slope units are morphological mapping units bounded by drainage and divide lines that maximize within-unit homogeneity and between-unit heterogeneity. We use r.slopeunits, software for the automatic delineation of slope units. We outline an objective procedure to optimize the software input parameters for landslide susceptibility (LS) zonation. Optimization is achieved by maximizing an objective function that simultaneously evaluates terrain aspect segmentation quality and LS model performance.
Duncan Watson-Parris, Nick Schutgens, Nicholas Cook, Zak Kipling, Philip Kershaw, Edward Gryspeerdt, Bryan Lawrence, and Philip Stier
Geosci. Model Dev., 9, 3093–3110,
Short summary
In this paper we describe CIS, a new command line tool for the easy visualization, analysis and comparison of a wide variety of gridded and ungridded data sets used in Earth sciences. Users can now use a single tool to not only view plots of satellite, aircraft, station or model data, but also bring them onto the same spatio-temporal sampling. This allows robust, quantitative comparisons to be made easily. CIS is an open-source project and welcomes input from the community.
Benjamin F. Jamroz and Robert Klöfkorn
Geosci. Model Dev., 9, 2881–2892,
Short summary
The scalability of computational applications on current and next-generation supercomputers is increasingly limited by the cost of inter-process communication. We implement communication-hiding data exchange in the High-Order Methods Modeling Environment (HOMME) for the time integration of the hydrostatic fluid equations using both the spectral-element and discontinuous Galerkin methods. The presented approach produces significant performance and scalability gains in large-scale simulations.
T. Fischer, D. Naumov, S. Sattler, O. Kolditz, and M. Walther
Geosci. Model Dev., 8, 3681–3694,
Short summary
We present a workflow to convert geological models into the open-source VTU format for usage in numerical simulation models. Tackling relevant scientific questions or engineering tasks often involves multidisciplinary approaches. Conversion workflows are needed between the diverse tools of the various disciplines. Our approach offers an open-source, platform-independent, robust, and comprehensible method that is potentially useful for a multitude of similar environmental studies.
S. Moulds, W. Buytaert, and A. Mijic
Geosci. Model Dev., 8, 3215–3229,
Short summary
The contribution of lulcc is to provide a free and open-source framework for land use change modelling. The software, which is provided as an R package, addresses problems associated with the current paradigm of closed-source, specialised land use change modelling software, which disrupts the scientific process. It is an attempt to move the discipline towards open and transparent science and to ensure land use change models are accessible to scientists working across the geosciences.
B. Poulter, N. MacBean, A. Hartley, I. Khlystova, O. Arino, R. Betts, S. Bontemps, M. Boettcher, C. Brockmann, P. Defourny, S. Hagemann, M. Herold, G. Kirches, C. Lamarche, D. Lederer, C. Ottlé, M. Peters, and P. Peylin
Geosci. Model Dev., 8, 2315–2328,
Short summary
Land cover is an essential variable in earth system models and determines conditions driving biogeochemical, energy and water exchange between ecosystems and the atmosphere. A methodology is presented for mapping plant functional types used in global vegetation models from an updated land cover classification system and open-source conversion tool, resulting from a consultative process among map producers and modelers engaged in the European Space Agency’s Land Cover Climate Change Initiative.
J. Du, C. Chen, V. Lesur, and L. Wang
Geosci. Model Dev., 8, 1979–1990,
D. C. Wong, C. E. Yang, J. S. Fu, K. Wong, and Y. Gao
Geosci. Model Dev., 8, 1033–1046,
M. Mergili, I. Marchesini, M. Alvioli, M. Metz, B. Schneider-Muntau, M. Rossi, and F. Guzzetti
Geosci. Model Dev., 7, 2969–2982,
Short summary
The article deals with strategies to (i) reduce computation time and (ii) appropriately account for uncertain input parameters when applying an open-source GIS sliding surface model to estimate landslide susceptibility for a 90 km² study area in central Italy. For (i), the area is split into a large number of tiles, enabling the exploitation of multi-processor computing environments. For (ii), the model is run with various parameter combinations to compute the slope failure probability.
H. Yan, Z. Wang, and J. Li
Geosci. Model Dev., 6, 1591–1599,
V. Yadav and A. M. Michalak
Geosci. Model Dev., 6, 583–590,
S. Valcke, V. Balaji, A. Craig, C. DeLuca, R. Dunlap, R. W. Ford, R. Jacob, J. Larson, R. O'Kuinghttons, G. D. Riley, and M. Vertenstein
Geosci. Model Dev., 5, 1589–1596,
M. Stockhause, H. Höck, F. Toussaint, and M. Lautenschlager
Geosci. Model Dev., 5, 1023–1032,
M. Rautenhaus, G. Bauer, and A. Dörnbrack
Geosci. Model Dev., 5, 55–71,
P. E. Farrell, M. D. Piggott, G. J. Gorman, D. A. Ham, C. R. Wilson, and T. M. Bond
Geosci. Model Dev., 4, 435–449,
Aas, K., Czado, C., Frigessi, A., and Bakken, H.: Pair-copula constructions of multiple dependence, Insur. Math. Econ., 44, 182–198, https://doi.org/10.1016/j.insmatheco.2007.02.001, 2009.
Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., Kudlur, M., Levenberg, J., Monga, R., Moore, S., Murray, D. G., Steiner, B., Tucker, P., Vasudevan, V., Warden, P., Wicke, M., Yu, Y., and Zheng, X.: TensorFlow: A System for Large-Scale Machine Learning, in: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), Savannah, GA, 265–283, 2016.
Bolton, T. and Zanna, L.: Applications of Deep Learning to Ocean Data Inference and Subgrid Parameterization, J. Adv. Model. Earth Syst., 11, 376–399, https://doi.org/10.1029/2018MS001472, 2019.
Brenowitz, N. D. and Bretherton, C. S.: Prognostic Validation of a Neural Network Unified Physics Parameterization, Geophys. Res. Lett., 45, 6289–6298, https://doi.org/10.1029/2018GL078510, 2018.
Cheruy, F., Chevallier, F., Morcrette, J.-J., Scott, N. A., and Chédin, A.: Une méthode utilisant les techniques neuronales pour le calcul rapide de la distribution verticale du bilan radiatif thermique terrestre, Comptes Rendus de l'Academie des Sciences Serie II, 322, 665–672, hal-02954375, 1996.
Chevallier, F., Ruy, F. C., Scott, N. A., and Din, A. C.: A Neural Network Approach for a Fast and Accurate Computation of a Longwave Radiative Budget, J. Appl. Meteorol. Climatol., 37, 1385–1397, https://doi.org/10.1175/1520-0450(1998)037<1385:ANNAFA>2.0.CO;2, 1998.
Chevallier, F., Morcrette, J.-J., Chéruy, F., and Scott, N. A.: Use of a neural-network-based long-wave radiative-transfer scheme in the ECMWF atmospheric model, Q. J. Roy. Meteor. Soc., 126, 761–776, https://doi.org/10.1002/qj.49712656318, 2000.
Czado, C.: Analyzing Dependent Data with Vine Copulas: A Practical Guide With R, Springer International Publishing, Cham, https://doi.org/10.1007/978-3-030-13785-4, 2019.
Dißmann, J., Brechmann, E. C., Czado, C., and Kurowicka, D.: Selecting and estimating regular vine copulae and application to financial returns, Comput. Stat. Data Anal., 59, 52–69, https://doi.org/10.1016/j.csda.2012.08.010, 2013.
Elsasser, W. M.: Heat transfer by infrared radiation in the atmosphere, Blue Hill Meteorological Observatory, Harvard University, Milton, MA, USA, 1942.
Eresmaa, R. and McNally, A. P.: Diverse profile datasets from the ECMWF 137-level short-range forecasts, EUMETSAT Satellite Application Facility (NWP SAF), European Centre for Medium-range Weather Forecasts Shinfield Park, Reading, RG2 9AX, UK, 2014.
Gentine, P., Pritchard, M., Rasp, S., Reinaudi, G., and Yacalis, G.: Could Machine Learning Break the Convection Parameterization Deadlock?, Geophys. Res. Lett., 45, 5742–5751, https://doi.org/10.1029/2018GL078202, 2018.
Goodfellow, I., Bengio, Y., and Courville, A.: Deep learning, MIT Press, Cambridge, 775 pp., 2016.
Hocking, J., Vidot, J., Brunel, P., Roquet, P., Silveira, B., Turner, E., and Lupu, C.: A new gas absorption optical depth parameterisation for RTTOV version 13, Geosci. Model Dev., 14, 2899–2915, https://doi.org/10.5194/gmd-14-2899-2021, 2021.
Hogan, R. J. and Bozzo, A.: A Flexible and Efficient Radiation Scheme for the ECMWF Model, J. Adv. Model. Earth Syst., 10, 1990–2008, https://doi.org/10.1029/2018MS001364, 2018.
Hogan, R. J. and Matricardi, M.: Evaluating and improving the treatment of gases in radiation schemes: the Correlated K-Distribution Model Intercomparison Project (CKDMIP), Geosci. Model Dev., 13, 6501–6521, https://doi.org/10.5194/gmd-13-6501-2020, 2020.
Huntingford, C., Jeffers, E. S., Bonsall, M. B., Christensen, H. M., Lees, T., and Yang, H.: Machine learning and artificial intelligence to aid climate change research and preparedness, Environ. Res. Lett., 14, 124007, https://doi.org/10.1088/1748-9326/ab4e55, 2019.
Joe, H.: Dependence Modeling with Copulas, 1st edn., Chapman and Hall/CRC, https://doi.org/10.1201/b17116, 2014.
Krasnopolsky, V. M. and Lin, Y.: A Neural Network Nonlinear Multimodel Ensemble to Improve Precipitation Forecasts over Continental US, Adv. Meteorol., 2012, 649450 , https://doi.org/10.1155/2012/649450, 2012.
Krasnopolsky, V. M., Chalikov, D. V., and Tolman, H. L.: A neural network technique to improve computational efficiency of numerical oceanic models, Ocean Model., 21, 363–383, https://doi.org/10.1016/S1463-5003(02)00010-0, 2002.
Krasnopolsky, V. M., Fox-Rabinovitz, M. S., and Chalikov, D. V.: New Approach to Calculation of Atmospheric Model Physics: Accurate and Fast Neural Network Emulation of Longwave Radiation in a Climate Model, Mon. Wea. Rev., 133, 1370–1383, https://doi.org/10.1175/MWR2923.1, 2005.
Krasnopolsky, V. M., Fox-Rabinovitz, M. S., and Belochitski, A. A.: Using Ensemble of Neural Networks to Learn Stochastic Convection Parameterizations for Climate and Numerical Weather Prediction Models from Data Simulated by a Cloud Resolving Model, Advances in Artificial Neural Systems, 2013, 485913, https://doi.org/10.1155/2013/485913, 2013.
Kurtzer, G. M., Sochat, V., and Bauer, M. W.: Singularity: Scientific containers for mobility of compute, PLoS ONE, 12, e0177459, https://doi.org/10.1371/journal.pone.0177459, 2017.
López-Pintado, S. and Romo, J.: On the Concept of Depth for Functional Data, J. Am. Stat. Assoc., 104, 718–734, https://doi.org/10.1198/jasa.2009.0108, 2009.
Meyer, D.: Data archive for paper “Copula-based synthetic data augmentation for machine learning-emulators” (Version 1.2.0) [Data set], https://doi.org/10.5281/zenodo.5150327, 2021.
Meyer, D. and Nagler, T.: Synthia: multidimensional synthetic data generation in Python (Version 0.3.0), Zenodo, https://doi.org/10.5281/zenodo.5150200, 2020.
Meyer, D. and Nagler, T.: Synthia: Multidimensional synthetic data generation in Python, Journal of Open Source Software, https://doi.org/10.21105/joss.02863, 2021.
Meyer, D., Schoetter, R., Riechert, M., Verrelle, A., Tewari, M., Dudhia, J., Masson, V., Reeuwijk, M., and Grimmond, S.: WRF-TEB: Implementation and Evaluation of the Coupled Weather Research and Forecasting (WRF) and Town Energy Balance (TEB) Model, J. Adv. Model. Earth Syst., 12, e2019MS001961, https://doi.org/10.1029/2019MS001961, 2020.
Meyer, D., Hogan, R. J., Dueben, P. D., and Mason, S. L.: Machine Learning Emulation of 3D Cloud Radiative Effects, J. Adv. Model. Earth Syst., https://doi.org/10.1029/2021MS002550, 2021.
Nagler, T., Schellhase, C., and Czado, C.: Nonparametric estimation of simplified vine copula models: comparison of methods, Dependence Model., 5, 99–120, https://doi.org/10.1515/demo-2017-0007, 2017.
Nowack, P., Braesicke, P., Haigh, J., Abraham, N. L., Pyle, J., and Voulgarakis, A.: Using machine learning to build temperature-based ozone parameterizations for climate sensitivity simulations, Environ. Res. Lett., 13, 104016, https://doi.org/10.1088/1748-9326/aae2be, 2018.
O'Gorman, P. A. and Dwyer, J. G.: Using Machine Learning to Parameterize Moist Convection: Potential for Modeling of Climate, Climate Change, and Extreme Events, J. Adv. Model. Earth Syst., 10, 2548–2563, https://doi.org/10.1029/2018MS001351, 2018.
Patki, N., Wedge, R., and Veeramachaneni, K.: The Synthetic Data Vault, in: 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA), Montreal, QC, Canada, 399–410, https://doi.org/10.1109/DSAA.2016.49, 2016.
Petty, G. W.: A First Course in Atmospheric Radiation, Sundog Publishing, Madison, Wis., 459 pp., 2006.
Rasp, S. and Lerch, S.: Neural Networks for Postprocessing Ensemble Weather Forecasts, Mon. Weather Rev., 146, 3885–3900, https://doi.org/10.1175/MWR-D-18-0187.1, 2018.
Rasp, S., Pritchard, M. S., and Gentine, P.: Deep learning to represent subgrid processes in climate models, P. Natl. Acad. Sci. USA, 115, 9684–9689, https://doi.org/10.1073/pnas.1810286115, 2018.
Reichstein, M., Camps-Valls, G., Stevens, B., Jung, M., Denzler, J., Carvalhais, N., and Prabhat: Deep learning and process understanding for data-driven Earth system science, Nature, 566, 195–204, https://doi.org/10.1038/s41586-019-0912-1, 2019.
Seitola, T., Mikkola, V., Silen, J., and Järvinen, H.: Random projections in reducing the dimensionality of climate simulation data, Tellus A, 66, 25274, https://doi.org/10.3402/tellusa.v66.25274, 2014.
Shorten, C. and Khoshgoftaar, T. M.: A survey on Image Data Augmentation for Deep Learning, J. Big Data, 6, 60, https://doi.org/10.1186/s40537-019-0197-0, 2019.
Sklar, M.: Fonctions de répartition à n dimensions et leurs marges, Publ. Inst. Statist. Univ. Paris, 8, 229–231, 1959.
Tagasovska, N., Ackerer, D., and Vatter, T.: Copulas as high-dimensional generative models: Vine copula autoencoders, in: Advances in neural information processing systems 32, edited by: Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., and Garnett, R., Curran Associates, Inc., 6528–6540, 2019.
Trivedi, P. K. and Zimmer, D. M.: Copula Modeling: An Introduction for Practitioners, FNT in Econometrics, 1, 1–111, https://doi.org/10.1561/0800000005, 2006.
Veerman, M. A., Pincus, R., Stoffer, R., van Leeuwen, C. M., Podareanu, D., and van Heerwaarden, C. C.: Predicting atmospheric optical properties for radiative transfer computations using neural networks, Phil. Trans. R. Soc. A., 379, 20200095, https://doi.org/10.1098/rsta.2020.0095, 2021.
Wan, Z., Zhang, Y., and He, H.: Variational autoencoder based synthetic data generation for imbalanced learning, in: 2017 IEEE Symposium Series on Computational Intelligence (SSCI), Honolulu, HI, 27 November–1 December 2017, https://doi.org/10.1109/SSCI.2017.8285168, 2017.
Xu, L. and Veeramachaneni, K.: Synthesizing Tabular Data using Generative Adversarial Networks, arXiv [preprint], arXiv:1811.11264, 27 November 2018.
Training machine-learning emulators is often limited by a lack of data. This paper presents a cheap way to increase the size of training datasets using statistical techniques and thereby improve the performance of machine-learning emulators.