Geoscientific Model Development An interactive open-access journal of the European Geosciences Union
© Author(s) 2020. This work is distributed under
the Creative Commons Attribution 4.0 License.

Submitted as: development and technical paper 22 Oct 2020


Review status
This preprint is currently under review for the journal GMD.

Strengths and weaknesses of three Machine Learning methods for pCO2 interpolation

Jake Stamell1, Rea R. Rustagi1, Lucas Gloege1, and Galen A. McKinley1,2
  • 1Columbia University, New York, NY 10027, USA
  • 2Lamont-Doherty Earth Observatory, Palisades, NY 10964, USA

Abstract. Using the Large Ensemble Testbed, a collection of 100 members from four independent Earth system models, we test three general-purpose Machine Learning (ML) approaches to understand their strengths and weaknesses in statistically reconstructing full-coverage surface ocean pCO2 from sparse in situ data. To apply the Testbed, we sample the full-field model pCO2 of each ensemble member following the pattern of real-world pCO2 observations collected from 1982–2016. We then use ML approaches to reconstruct the full field and compare the result with the original model full-field pCO2 to assess reconstruction skill. We use feed-forward neural network (NN), XGBoost (XGB), and random forest (RF) approaches to perform the reconstructions. Our baseline is the NN, since this approach has previously been shown to be a successful method for pCO2 reconstruction. The XGB and RF allow us to test tree-based approaches. We perform comparisons to a test set, which consists of 20% of the real-world sampled data that are withheld from training. Statistical comparisons with this test set are equivalent to those that could be derived using real-world data. Unique to the Testbed is that it allows for comparison to all the "unseen" points to which the ML algorithms extrapolate. When compared to the test set, XGB and RF both perform better than NN based on a suite of regression metrics. However, when compared to the unseen data, degradation of performance is large with XGB and even larger with RF. Degradation is comparatively small with NN, indicating a greater ability to generalize. Despite its larger degradation, in the final comparison to unseen data, XGB slightly outperforms NN and greatly outperforms RF, with the lowest mean bias and more consistent performance across Testbed members. All three approaches perform best in the open ocean and for seasonal variability, but performance drops off at longer time scales and in regions of low sampling, such as the Southern Ocean and coastal zones. For decadal variability, all methods overestimate the amplitude of variability and have moderate skill in reconstructing phase. For this timescale, the greater ability of the NN to generalize allows it to slightly outperform XGB. Taking into account all comparisons, we find XGB to be best able to reconstruct surface ocean pCO2 from the limited available data.
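
The evaluation design described in the abstract (sample a known full field sparsely, withhold 20% of the samples as a test set, then score the reconstruction against both the test set and the unseen points) can be illustrated with a minimal sketch. This is not the authors' code. Everything named here is an assumption made for illustration: `true_field` is a hypothetical analytic stand-in for one ensemble member's pCO2, the sampling mask is invented, and a simple k-nearest-neighbour average stands in for the NN, XGB, and RF regressors.

```python
import math
import random


def true_field(lat, lon):
    """Hypothetical stand-in for one member's full-field pCO2 (uatm)."""
    return (360.0
            + 30.0 * math.sin(2.0 * math.radians(lat))
            + 10.0 * math.cos(math.radians(lon)))


def knn_predict(train, point, k=3):
    """Stand-in regressor: mean target of the k nearest training samples."""
    nearest = sorted(
        train,
        key=lambda s: (s[0][0] - point[0]) ** 2 + (s[0][1] - point[1]) ** 2,
    )[:k]
    return sum(target for _, target in nearest) / len(nearest)


def bias_and_rmse(pred, truth):
    """Mean bias and root-mean-square error of predictions against truth."""
    errors = [p - t for p, t in zip(pred, truth)]
    bias = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return bias, rmse


def run_testbed(seed=0):
    rng = random.Random(seed)
    grid = [(lat, lon) for lat in range(-60, 61, 5) for lon in range(0, 360, 10)]

    # Invented sampling mask: dense in the north, sparse elsewhere,
    # loosely mimicking uneven real-world coverage.
    sampled = [p for p in grid if rng.random() < (0.4 if p[0] > 0 else 0.05)]
    sampled_set = set(sampled)
    unseen = [p for p in grid if p not in sampled_set]

    # Withhold 20% of the sampled points as the test set.
    rng.shuffle(sampled)
    n_test = len(sampled) // 5
    test_pts, train_pts = sampled[:n_test], sampled[n_test:]
    train = [(p, true_field(*p)) for p in train_pts]

    # Score the same trained stand-in model on both evaluation sets.
    results = {}
    for name, pts in (("test", test_pts), ("unseen", unseen)):
        pred = [knn_predict(train, p) for p in pts]
        truth = [true_field(*p) for p in pts]
        results[name] = bias_and_rmse(pred, truth)
    return results


if __name__ == "__main__":
    for name, (bias, rmse) in run_testbed().items():
        print(f"{name}: bias={bias:.2f} rmse={rmse:.2f}")
```

Only the "test" score would be available with real observations; the "unseen" score exists because the full field is known, which is the comparison the Testbed makes possible. The gap between the two scores is one way to express the degradation that the abstract reports for each method.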


Interactive discussion

Status: open (until 06 Jan 2021)


Data sets

ML methods for pCO2 reconstruction - Large Ensemble Testbed - NN/XGB/RF Jake Stamell and Galen A. McKinley



Latest update: 01 Dec 2020
Short summary
Using simulated surface ocean pCO2 from Earth System Models, we test three Machine Learning methods (neural network, XGBoost, random forest) to assess their ability to reconstruct global coverage from sparse observations. Because the data are synthetic, we can train on real-world sampling patterns and then evaluate against the known full-coverage result of the original simulation. ML approaches perform best in the open ocean, but struggle in regions of low sampling. XGBoost performed best overall.