Benchmark datasets and machine learning algorithms for Earth system science data (ESSD/GMD inter-journal SI)
Editor(s): GMD topic editors | Coordinator: Peter Düben
Special issue jointly organized by Earth System Science Data and Geoscientific Model Development
A large number of machine learning studies are currently being performed in Earth system science to extract information from observations or model data, to learn the dynamics of components of the Earth system, or to build tools that improve weather and climate predictions (for example, in model post-processing). These studies require both the domain knowledge of Earth system scientists and the expertise of machine learning specialists. The remarkable success of machine learning in other fields, such as computer vision, speech recognition, robotics, and autonomous driving, is due in no small part to the free and open sharing of benchmark datasets and software, which makes results easy for the community to reproduce and allows the quality of machine learning solutions to be compared quantitatively. Unlike in many other machine learning disciplines, Earth system datasets and validation methods require thorough documentation to provide a fair and firm basis for model comparisons according to accepted community standards. The Earth system science community has therefore expressed a need for dedicated benchmark datasets and machine learning tools for processing Earth system data.

With this special issue, we offer a platform for developing such benchmark datasets, together with specific machine learning tasks and evaluation metrics, and for sharing them openly, along with innovative solutions for machine learning applications in Earth system science, with the wider community of machine learning researchers and environmental scientists.

The special issue is managed jointly by ESSD and GMD and invites manuscripts focusing either on the description of new benchmark datasets (ESSD) or on new machine learning model developments (GMD). Dataset papers may reference model papers and vice versa, but this is not required as long as both the datasets and the model code are freely available and easily accessible. To be accepted as a special issue contribution, a manuscript must define a specific machine learning task, including a loss function and an evaluation metric, and the authors must provide executable code that demonstrates the machine learning problem with a vanilla (baseline) solution. The usual guidelines of both journals apply.


28 Feb 2024
Towards variance-conserving reconstructions of climate indices with Gaussian process regression in an embedding space
Marlene Klockmann, Udo von Toussaint, and Eduardo Zorita
Geosci. Model Dev., 17, 1765–1787, 2024
13 Sep 2023
Simulation model of Reactive Nitrogen Species in an Urban Atmosphere using a Deep Neural Network: RNDv1.0
Junsu Gil, Meehye Lee, Jeonghwan Kim, Gangwoong Lee, Joonyoung Ahn, and Cheol-Hee Kim
Geosci. Model Dev., 16, 5251–5263, 2023
29 Aug 2023
A gridded air quality forecast through fusing site-available machine learning predictions from RFSML v1.0 and chemical transport model results from GEOS-Chem v13.1.0 using the ensemble Kalman filter
Li Fang, Jianbing Jin, Arjo Segers, Hong Liao, Ke Li, Bufan Xu, Wei Han, Mijie Pang, and Hai Xiang Lin
Geosci. Model Dev., 16, 4867–4882, 2023
10 Aug 2023
Automatic snow type classification of snow micropenetrometer profiles with machine learning algorithms
Julia Kaltenborn, Amy R. Macfarlane, Viviane Clay, and Martin Schneebeli
Geosci. Model Dev., 16, 4521–4550, 2023
27 Jul 2023
HR-GLDD: a globally distributed dataset using generalized deep learning (DL) for rapid landslide mapping on high-resolution (HR) satellite imagery
Sansar Raj Meena, Lorenzo Nava, Kushanav Bhuyan, Silvia Puliero, Lucas Pedrosa Soares, Helen Cristina Dias, Mario Floris, and Filippo Catani
Earth Syst. Sci. Data, 15, 3283–3298, 2023
28 Jun 2023
The EUPPBench postprocessing benchmark dataset v1.0
Jonathan Demaeyer, Jonas Bhend, Sebastian Lerch, Cristina Primo, Bert Van Schaeybroeck, Aitor Atencia, Zied Ben Bouallègue, Jieyu Chen, Markus Dabernig, Gavin Evans, Jana Faganeli Pucer, Ben Hooper, Nina Horat, David Jobst, Janko Merše, Peter Mlakar, Annette Möller, Olivier Mestre, Maxime Taillardat, and Stéphane Vannitsem
Earth Syst. Sci. Data, 15, 2635–2653, 2023
23 May 2023
CLGAN: a generative adversarial network (GAN)-based video prediction model for precipitation nowcasting
Yan Ji, Bing Gong, Michael Langguth, Amirpasha Mozaffari, and Xiefei Zhi
Geosci. Model Dev., 16, 2737–2752, 2023
09 May 2023
ClinoformNet-1.0: stratigraphic forward modeling and deep learning for seismic clinoform delineation
Hui Gao, Xinming Wu, Jinyu Zhang, Xiaoming Sun, and Zhengfa Bi
Geosci. Model Dev., 16, 2495–2513, 2023
20 Apr 2023
Causal deep learning models for studying the Earth system
Tobias Tesch, Stefan Kollet, and Jochen Garcke
Geosci. Model Dev., 16, 2149–2166, 2023
24 Mar 2023
DL-RMD: a geophysically constrained electromagnetic resistivity model database (RMD) for deep learning (DL) applications
Muhammad Rizwan Asif, Nikolaj Foged, Thue Bording, Jakob Juul Larsen, and Anders Vest Christiansen
Earth Syst. Sci. Data, 15, 1389–1401, 2023
10 Jan 2023
A machine learning approach to address air quality changes during the COVID-19 lockdown in Buenos Aires, Argentina
Melisa Diaz Resquin, Pablo Lichtig, Diego Alessandrello, Marcelo De Oto, Darío Gómez, Cristina Rössler, Paula Castesana, and Laura Dawidowski
Earth Syst. Sci. Data, 15, 189–209, 2023
20 Dec 2022
WaterBench-Iowa: a large-scale benchmark dataset for data-driven streamflow forecasting
Ibrahim Demir, Zhongrun Xiang, Bekir Demiray, and Muhammed Sit
Earth Syst. Sci. Data, 14, 5605–5616, 2022
13 Dec 2022
Temperature forecasting by deep learning methods
Bing Gong, Michael Langguth, Yan Ji, Amirpasha Mozaffari, Scarlet Stadtler, Karim Mache, and Martin G. Schultz
Geosci. Model Dev., 15, 8931–8956, 2022
13 Dec 2022
Representing chemical history in ozone time-series predictions – a model experiment study building on the MLAir (v1.5) deep learning framework
Felix Kleinert, Lukas H. Leufen, Aurelia Lupascu, Tim Butler, and Martin G. Schultz
Geosci. Model Dev., 15, 8913–8930, 2022
24 Oct 2022
Development of a regional feature selection-based machine learning system (RFSML v1.0) for air pollution forecasting over China
Li Fang, Jianbing Jin, Arjo Segers, Hai Xiang Lin, Mijie Pang, Cong Xiao, Tuo Deng, and Hong Liao
Geosci. Model Dev., 15, 7791–7807, 2022
11 Oct 2022
Estimation of missing building height in OpenStreetMap data: a French case study using GeoClimate 0.0.1
Jérémy Bernard, Erwan Bocher, Elisabeth Le Saux Wiederhold, François Leconte, and Valéry Masson
Geosci. Model Dev., 15, 7505–7532, 2022
22 Sep 2022
Calving fronts and where to find them: a benchmark dataset and methodology for automatic glacier calving front extraction from synthetic aperture radar imagery
Nora Gourmelon, Thorsten Seehaus, Matthias Braun, Andreas Maier, and Vincent Christlein
Earth Syst. Sci. Data, 14, 4287–4313, 2022
14 Sep 2022
Improving Latin American Soil Information Database for Digital Soil Mapping enhances its usability and scalability
Sergio Díaz-Guadarrama, Iván Lizarazo, Mario Guevara, Marcos Angelini, Gustavo A. Araujo-Carrillo, Jainer Argeñal, Daphne Armas, Rafael A. Balta, Adriana Bolivar, Nelson Bustamante, Ricardo O. Dart, Martin Dell Aqua, Arnulfo Encina, Hernán Figueredo, Fernando Fontes, Joan S. Gutiérrez-Diaz, Wilmer Jiménez, Raúl S. Lavado, Jesús F. Mansilla-Baca, Maria de Lourdes Mendonça-Santos, Lucas M. Moretti, Iván D. Muñoz, Carolina Olivera, Guillermo Olmedo, Christian Omuto, Sol Ortiz, Carla Pascale, Marco Pfeiffer, Iván A. Ramos, Danny Ríos, Rafael Rivera, Lady M. Rodríguez, Darío M. Rodríguez, Albán Rosales, Kenset Rosales, Guillermo Schulz, Victor Sevilla, Leonardo M. Tenti, Ronald Vargas, Viviana M. Varón-Ramírez, Gustavo M. Vasques, Yusuf Yigini, and Yolanda Rubiano
Earth Syst. Sci. Data Discuss., 2022
Revised manuscript accepted for ESSD (discussion: closed, 6 comments)
03 Jun 2022
Global, high-resolution mapping of tropospheric ozone – explainable machine learning and impact of uncertainties
Clara Betancourt, Timo T. Stomberg, Ann-Kathrin Edrich, Ankit Patnala, Martin G. Schultz, Ribana Roscher, Julia Kowalski, and Scarlet Stadtler
Geosci. Model Dev., 15, 4331–4354, 2022
30 Mar 2022
TimeSpec4LULC: a global multispectral time series database for training LULC mapping models with machine learning
Rohaifa Khaldi, Domingo Alcaraz-Segura, Emilio Guirado, Yassir Benhammou, Abdellatif El Afia, Francisco Herrera, and Siham Tabik
Earth Syst. Sci. Data, 14, 1377–1411, 2022
23 Feb 2022
Using neural network ensembles to separate ocean biogeochemical and physical drivers of phytoplankton biogeography in Earth system models
Christopher Holder, Anand Gnanadesikan, and Marie Aude-Pradal
Geosci. Model Dev., 15, 1595–1617, 2022
27 Jan 2022
EuLerian Identification of ascending AirStreams (ELIAS 2.0) in numerical weather prediction and climate models – Part 1: Development of deep learning model
Julian F. Quinting and Christian M. Grams
Geosci. Model Dev., 15, 715–730, 2022
27 Jan 2022
EuLerian Identification of ascending AirStreams (ELIAS 2.0) in numerical weather prediction and climate models – Part 2: Model application to different datasets
Julian F. Quinting, Christian M. Grams, Annika Oertel, and Moritz Pickl
Geosci. Model Dev., 15, 731–744, 2022
24 Jan 2022
An inventory of supraglacial lakes and channels across the West Antarctic Ice Sheet
Diarmuid Corr, Amber Leeson, Malcolm McMillan, Ce Zhang, and Thomas Barnes
Earth Syst. Sci. Data, 14, 209–228, 2022
20 Dec 2021
Model calibration using ESEm v1.1.0 – an open, scalable Earth system emulator
Duncan Watson-Parris, Andrew Williams, Lucia Deaconu, and Philip Stier
Geosci. Model Dev., 14, 7659–7672, 2021
24 Jun 2021
AQ-Bench: a benchmark dataset for machine learning on global air quality metrics
Clara Betancourt, Timo Stomberg, Ribana Roscher, Martin G. Schultz, and Scarlet Stadtler
Earth Syst. Sci. Data, 13, 3013–3033, 2021
CC BY 4.0