Submitted as: development and technical paper 30 Sep 2020

Review status: a revised version of this preprint is currently under review for the journal GMD.

Data reduction for inverse modeling: an adaptive approach v1.0

Xiaoling Liu1, August L. Weinbren1, He Chang1, Jovan Tadić2, Marikate E. Mountain3, Michael E. Trudeau4, Arlyn E. Andrews4, Zichong Chen1, and Scot M. Miller1
  • 1Department of Environmental Health and Engineering, Johns Hopkins University, Baltimore, MD, USA
  • 2Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  • 3Atmospheric and Environmental Research, Inc., Lexington, MA, USA
  • 4Global Monitoring Laboratory, National Oceanic and Atmospheric Administration, Boulder, CO, USA

Abstract. The number of greenhouse gas (GHG) observing satellites has greatly expanded in recent years, and these new datasets provide an unprecedented constraint on global GHG sources and sinks. However, a continuing challenge for inverse models that are used to estimate these sources and sinks is the sheer number of satellite observations, sometimes in the millions per day. These massive datasets often make it computationally prohibitive to implement inverse modeling calculations and/or to assimilate the observations using many types of atmospheric models. Although these satellite datasets are very large, the information content of any single observation is often modest and non-exclusive because of redundancy with neighboring observations and because of measurement noise. In this study, we develop an adaptive approach to reduce the size of satellite datasets using geostatistics. A guiding principle is to reduce the data more in regions with little variability in the observations and less in regions with high variability. We subsequently tune and evaluate the approach using synthetic and real data case studies for North America from NASA's Orbiting Carbon Observatory-2 (OCO-2) satellite. The proposed approach to data reduction yields more accurate CO2 flux estimates than the commonly used method of binning and averaging the satellite data. We further develop a metric for choosing a level of data reduction; we can reduce the satellite dataset to an average of one observation per ~80–140 km for the specific case studies here without substantially compromising the flux estimate, but we find that reducing the data further quickly degrades the accuracy of the estimated fluxes. Overall, the approach developed here could be applied to a range of inverse problems that use very large trace gas datasets.
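The guiding principle in the abstract (reduce more where the observations vary little, less where they vary a lot) can be illustrated with a minimal Python sketch. This is not the geostatistical algorithm of the paper: it is a simplified greedy stand-in that works on a 1-D sequence of observations along a track, and the function names (`fixed_binning`, `adaptive_reduce`), the range-based variability criterion, and the parameters `max_spread` and `max_group` are all assumptions made for illustration.

```python
import numpy as np


def fixed_binning(values, bin_size):
    """Baseline: average consecutive observations in equally sized bins."""
    values = np.asarray(values, dtype=float)
    return np.array([values[i:i + bin_size].mean()
                     for i in range(0, len(values), bin_size)])


def adaptive_reduce(values, max_spread, max_group):
    """Greedy, variability-aware averaging along a 1-D track (illustrative only).

    A group keeps absorbing the next observation while the range (max - min)
    of its members stays within `max_spread` and the group would have at most
    `max_group` members. Returns the group means and the group sizes.
    """
    values = np.asarray(values, dtype=float)
    means, sizes = [], []
    start = 0
    for end in range(1, len(values) + 1):
        at_end = end == len(values)
        if not at_end:
            # Would the group still be acceptable if it absorbed values[end]?
            candidate = values[start:end + 1]
            grows = (candidate.max() - candidate.min() <= max_spread
                     and len(candidate) <= max_group)
        if at_end or not grows:
            group = values[start:end]
            means.append(group.mean())
            sizes.append(len(group))
            start = end
    return np.array(means), np.array(sizes)


# Hypothetical values: a flat stretch followed by a strongly varying stretch.
track = [400.0] * 12 + [402.0, 405.0, 408.0, 411.0]
adaptive_means, adaptive_sizes = adaptive_reduce(track, max_spread=1.0, max_group=6)
binned_means = fixed_binning(track, bin_size=4)
```

In this toy example the flat stretch collapses into two averages of six observations each, while every observation in the variable stretch is kept. Fixed binning instead averages the four variable observations into a single value and so discards the gradient there.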

Status: final response (author comments only)

Data sets

Geostatistical inverse modeling with large atmospheric data: data files for a case study from OCO-2
Scot M. Miller, Arvind K. Saibaba, Michael E. Trudeau, Arlyn E. Andrews, Thomas Nehrkorn, and Marikate E. Mountain

Model code and software

Data reduction for large atmospheric satellite datasets
Xiaoling Liu, Scot M. Miller, and August Weinbren

Geostatistical inverse modeling with large atmospheric datasets
Scot M. Miller and Arvind K. Saibaba

Short summary
Observations of greenhouse gases have become far more numerous in recent years due to new satellite observations. The sheer size of these datasets makes it challenging to incorporate these data into statistical models and use these data to estimate greenhouse gas sources and sinks. In this paper, we develop an approach to reduce the size of these datasets while preserving the most information possible. We subsequently test this approach using satellite observations of carbon dioxide.