Satellite remote sensing provides a global view of processes on Earth and has unique benefits compared to measurements made on the ground, such as global coverage and enormous data volume. The typical downsides are spatial and temporal gaps and potentially low data quality. Meaningful statistical inference from such data requires overcoming these problems and developing efficient and robust computational tools. We design and implement a computationally efficient multi-scale Gaussian process (GP) software package, satGP, geared towards remote sensing applications. The software is able to handle problems of enormous size and to compute marginals of, and sample from, the random field conditioned on at least hundreds of millions of observations. This is achieved by optimizing the computation through, e.g., randomization and by splitting the problem into parallel local subproblems that aggressively discard uninformative data.

We describe the mean function of the Gaussian process by approximating marginals of a Markov random field (MRF). Variability around the mean is modeled with a multi-scale covariance kernel, which consists of Matérn, exponential, and periodic components. We also demonstrate how winds can be used to inform covariances locally. The covariance kernel parameters are learned by calculating an approximate marginal maximum likelihood estimate, and the validity of both the multi-scale approach and the method used to learn the kernel parameters is verified in synthetic experiments.

We apply these techniques to a moderate size ozone data set produced by an atmospheric chemistry model and to the very large number of observations retrieved from the Orbiting Carbon Observatory 2 (OCO-2) satellite. The satGP software is released under an open-source license.

Climate change is one of the most important present-day global environmental challenges. The underlying reason is anthropogenic carbon emissions. According to the Intergovernmental Panel on Climate Change, carbon dioxide (

Several instruments orbiting the Earth produce enormous quantities of remote sensing data, used to compute local estimates of

There are two key advances in this work. First, we describe the computational approaches that allow satGP to tackle remote-sensing-related spatial statistics problems of enormous size. Second, we present formulations of a multi-scale covariance function and a space-dependent mean function of types that we have not seen used in the remote sensing community. We also show how these functions can be efficiently learned from data.

Related to this work, several kriging studies have been published before in the
context of remotely sensed

More recently

Applications to remote sensing data have also resulted in publications more focused on methods.

For Gaussian processes, various approaches have been studied to overcome the difficulties posed by large amounts of data. For instance,

In this work we describe the satGP program, which solves very large spatiotemporal statistics problems with up to at least the order of

We validate the multi-scale covariance modeling approach by learning the covariance function parameters of a data set drawn with satGP from the prior of a multi-scale Gaussian process.
To demonstrate the computational capabilities of this early version of satGP, we computed global

Mean function

In addition to the OCO-2 work, we demonstrate the capabilities of satGP with synthetic ozone data from the Whole Atmosphere Community Climate Model (WACCM4)

The rest of the paper is organized in the following manner. Section 2 describes the methods both generally and as implemented in satGP. Section 3 discusses the computational details in satGP. Section 4 presents and discusses simulation results, including a multi-scale synthetic parameter identifiability study, an application to synthetic WACCM4-generated data, and applications using the OCO-2 V9 data. In the concluding Sect. 5, current limitations and some possible future directions are briefly mentioned.

In geosciences, kriging

With Gaussian processes, we want to learn properties of a spatiotemporal surface from some observational data of some quantity of interest. To each point in space and time corresponds a Gaussian distribution of that quantity, whose mean and variance can be calculated by solving a local regression problem. This is closely related to solving a spatiotemporal interpolation problem when the observations have Gaussian errors.
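The local regression problem described above can be illustrated with a minimal Python sketch. It is not taken from satGP, which is written in C. The one-dimensional inputs, the exponential kernel, and the names `exp_kernel`, `cholesky`, `chol_solve`, and `gp_predict` are assumptions made for illustration only. The sketch conditions a Gaussian prior on a handful of nearby observations and returns the mean and variance of the resulting Gaussian distribution at one test point.

```python
import math

def exp_kernel(x1, x2, amplitude=1.0, length_scale=1.0):
    """Exponential covariance between two scalar inputs (illustrative choice)."""
    return amplitude**2 * math.exp(-abs(x1 - x2) / length_scale)

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def chol_solve(low, b):
    """Solve (L L^T) x = b by forward and then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x

def gp_predict(x_obs, y_obs, x_test, noise_var=0.0, mean=0.0, kernel=exp_kernel):
    """Posterior mean and variance at x_test given the selected observations."""
    n = len(x_obs)
    # Covariance of the observations, with noise variance on the diagonal.
    k_mat = [[kernel(x_obs[i], x_obs[j]) + (noise_var if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    # Covariance between the test point and each observation.
    k_star = [kernel(x_test, xi) for xi in x_obs]
    low = cholesky(k_mat)
    alpha = chol_solve(low, [yi - mean for yi in y_obs])
    beta = chol_solve(low, k_star)
    post_mean = mean + sum(ks * a for ks, a in zip(k_star, alpha))
    post_var = kernel(x_test, x_test) - sum(ks * b for ks, b in zip(k_star, beta))
    return post_mean, post_var
```

With zero observation noise the posterior mean reproduces an observation exactly at its own location, which is the interpolation property, while far from all observations the prediction reverts to the prior mean and variance.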

The theory of Bayesian statistics, Gaussian processes, and Markov random fields used in this work is well known. The novel aspects of this section therefore lie mainly in the computational methods and modifications that are presented, such as observation selection schemes in Sect.

This section goes through the Gaussian process formalism and presents both generic and satGP-specific forms of mean and covariance functions. This is followed by discussion of how observation selection is carried out for solving local subproblems and how model parameters are learned. The presentation of the general Gaussian process problem is based on

Most commonly used notation related to inputs and mean/covariance functions in Sect.

A Gaussian process is a stochastic process, which can be thought of as an infinite-dimensional Gaussian distribution in that the joint distributions of the process at any finite set of space–time points are multivariate normal. We denote points in the spatiotemporal domain by

The Gaussian process, or Gaussian random field, is denoted by

The function

In what follows, the domain

For the mean function

The parameter vectors

The definition of

The covariance function

The paradigm of Bayesian statistics is standard for analyzing data and uncertainties, and it is also widely used in geosciences

The

Even though a full posterior distribution of the parameters is not obtained this way, the solution of the Gaussian process itself is Bayesian in that the posterior marginals at each

For prediction in the context of Gaussian random functions, the properties of multivariate normal distributions are exploited for calculating marginals of the random field

Equation (

When satGP is not used for learning GP covariance parameters or generating synthetic training sets, the finite set of test inputs

In addition to solving this spatial problem, the marginal distributions of the

A Markov random field is a probabilistic model that describes the conditional independence structure in a set of random variables. In satGP, an MRF is used to describe how the

Technically, the MRF in satGP is an undirected graphical model

The set of edges

The satGP program needs to compute the marginal distributions of each

The marginal distribution

The modification of the algorithm loses the ability of the upper-right and lower-left corners to communicate effectively, but since most remote sensing data sets contain at least some observations for some time period for most nodes, the far-away information does not affect results in many practical scenarios. Techniques such as generalized belief propagation

The results should not change when the user-chosen grid resolution changes. For this reason satGP down-weights the edges exponentially with the distance between the (geographical) coordinates of the connected nodes. The rate of this exponential decay is user configurable by the

Assume that for the vertex

In the particular form of the mean function

Select

Find a best-guess

Given

Find the posterior marginal distribution of

Using the

The smoothness, amplitude, and length scales of the Gaussian process realizations are determined by the covariance kernel used.
The satGP program supports several different types of covariance function components for forming the full covariance function

For convenience, let

The exponential family of covariance functions with parameters

The Matérn family of covariance functions, with

A periodic kernel with

satGP contains an additional covariance function that utilizes local wind information when computing the covariances. The underlying rationale is that winds affect how quantities of interest such as gases in the atmosphere or algae blooms in surface water spread. For this reason, if wind data are available, it is natural to try to use them for inference with the Gaussian process.
We define the wind-informed covariance kernel with parameters

The spatial scaling

Equicovariance ellipses from the wind-informed kernel with various wind vectors

The covariance functions used in this work to model

The kernel components of a multi-scale kernel are in this work called

Instead of calling

Using a large number of observations makes solving the Gaussian process Eqs. (

Assume that the multi-scale kernel defined by the user contains

Algorithm for selecting observations for carrying out predictions at test input

Since the subkernels are handled sequentially, their order may affect which observations are selected due to the exclusion in Eq. (

For learning the spatially varying

Selecting the observations could also be done based on values of

From Sect.

In the presence of a huge number of observations, calculating the determinant of the full covariance

Most important satGP control variables and high-level C structs: first section contains parameters for program logic, second for domain specification, third for covariance and mean function definition, and last for observation handling. This list is by no means exhaustive – the configuration file contains lots of variables that can control the program. Some additional tweaking is possible by changing hard-coded values directly in the source code, such as those listed in Appendix

While the selection of inputs included in

The maximum likelihood estimate approximation in Eq. (

While this method is suitable for finding point estimates for the parameters

By default the scaled posterior

The software can also learn the covariance parameters using optimization algorithms such as COBYLA or SBPLEX, which are available in NLopt. These methods are much faster than MCMC but tend to get stuck in local minima, which limits their usefulness.
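To make the objective of such an optimization concrete, the following Python sketch evaluates the negative log marginal likelihood of a zero-mean GP and minimizes it over one length-scale parameter. It is only an illustration under stated assumptions: it uses the full likelihood of a tiny one-dimensional data set rather than the approximate likelihood over local subproblems used by satGP, and a plain log-spaced scan between parameter limits stands in for COBYLA or SBPLEX. The function names and default values are hypothetical.

```python
import math

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def neg_log_marginal_likelihood(length_scale, x, y, amplitude=1.0, noise_var=1e-2):
    """Negative log marginal likelihood for an exponential kernel (assumed form)."""
    n = len(x)
    k_mat = [[amplitude**2 * math.exp(-abs(x[i] - x[j]) / length_scale)
              + (noise_var if i == j else 0.0) for j in range(n)] for i in range(n)]
    low = cholesky(k_mat)
    # Solve L z = y, so that y^T K^{-1} y = z^T z.
    z = [0.0] * n
    for i in range(n):
        z[i] = (y[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    quad = sum(v * v for v in z)
    # log det K is twice the sum of the log-diagonal of the Cholesky factor.
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return 0.5 * (quad + log_det + n * math.log(2.0 * math.pi))

def scan_length_scale(x, y, lower, upper, n_grid=50):
    """Log-spaced scan between parameter limits; a stand-in for a real optimizer."""
    best = None
    for g in range(n_grid):
        ell = lower * (upper / lower) ** (g / (n_grid - 1))
        nll = neg_log_marginal_likelihood(ell, x, y)
        if best is None or nll < best[1]:
            best = (ell, nll)
    return best
```

A scan over the whole admissible range is slow in more than a few dimensions, but unlike a local optimizer it cannot be trapped by a local minimum inside the limits, which is the failure mode noted above.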

The satGP code is written in C, with visualization scripts written in Python and parallelization implemented with OpenMP directives. The program reads data from netCDF and text files and the configuration from a C header file. For linear algebra satGP uses LAPACKE and CBLAS, the C interfaces of LAPACK and BLAS, and optimization tasks are carried out with the NLopt library. The computations are performed in single precision, both to save memory with the largest data sets and to improve performance.

The most important configuration variables are listed in Table

Several parameters can be tweaked to improve computational efficiency, including all of those in the second and last sections of Table

Overview of satGP execution. After initialization, data are read for training

The execution of the program is presented in Fig.

For computing the marginals, the spatial domain can be decomposed with

The

The function

The Gaussian process algorithm is an interpolation algorithm when the observation noise is 0, and interpolation algorithms may misbehave when used for extrapolation. In a large spatiotemporal grid, when

In this section we present several simulation studies. The first experiment examines parameter identifiability with the multi-scale kernel using satGP-generated data. We then demonstrate what satGP posterior distributions look like compared to the truth using synthetic ozone fields from the WACCM4 model.

After that we concentrate on analyzing satGP results produced using the OCO-2 Level 2 data. First, we learn the parameters of the locally varying mean function of the form in Eq. (

We performed a synthetic study to confirm the identifiability of the multi-scale covariance function parameters. The synthetic data were generated by satGP by sampling from zero-mean processes with known covariance parameters and with a random spatial pattern from the prior, adding 1 % noise. The parameters were then estimated by computing the posterior mean estimates using adaptive Metropolis.
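The data-generation step of such a study can be sketched in Python as follows. This is not the satGP implementation. The kernel is an assumed two-component sum of an exponential term and a textbook periodic term with made-up parameter values, the inputs are one-dimensional, and "1 % noise" is interpreted here as a noise standard deviation equal to 1 % of the prior standard deviation, which is also an assumption. The estimation step with adaptive Metropolis is not shown.

```python
import math
import random

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def multiscale_kernel(x1, x2):
    """Sum of a long-scale exponential and a periodic subkernel (illustrative values)."""
    d = abs(x1 - x2)
    long_part = 1.0**2 * math.exp(-d / 5.0)
    # Textbook periodic kernel with period 1.0 and length scale 0.8.
    periodic_part = 0.5**2 * math.exp(-2.0 * math.sin(math.pi * d / 1.0)**2 / 0.8**2)
    return long_part + periodic_part

def sample_prior(x, kernel, noise_frac=0.01, seed=0):
    """Draw one zero-mean prior realization at inputs x and add observation noise."""
    rng = random.Random(seed)
    n = len(x)
    # Small diagonal jitter guards against round-off in the factorization.
    k_mat = [[kernel(x[i], x[j]) + (1e-10 if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    low = cholesky(k_mat)
    # If z ~ N(0, I), then L z ~ N(0, L L^T) = N(0, K).
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    field = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
    noise_sd = noise_frac * math.sqrt(kernel(x[0], x[0]))
    data = [f + rng.gauss(0.0, noise_sd) for f in field]
    return field, data
```

Because the true parameters of `multiscale_kernel` are known, any estimate obtained from `data` can be compared against them directly, which is the point of an identifiability experiment.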

The identifiability experiment was performed with various kernels, and recovering the true parameters became more difficult as the kernel became more complex. With a single Matérn, exponential, or periodic kernel, the parameters could be recovered very easily. This was also true for a combination of exponential and Matérn kernels with a relatively small

The covariance kernel parameters were still recoverable with a combination of three kernels, Matérn with

The parameter limits, true values, and posterior means of the synthetic experiment with three kernels are given in Table

Lower and upper limits, with true and estimated parameter values. The three-kernel synthetic covariance function parameter estimation problem is already very difficult, here resulting in slight overestimation of the parameters of the smallest kernel.

Scaled MCMC posteriors from a synthetic study where data were generated with a multi-scale Gaussian process. The figure demonstrates that even with three subkernels, multi-scale Gaussian process kernel parameters can be recovered. The lower-left part shows the pairwise marginal distributions of the parameters, and the black crosses denote the true parameter values. The axis labels are on the left and below the figure. The upper-right triangle shows sample correlations between the parameters from the chain, with axis labels on the left and on the top. Small within-subkernel positive correlations are present. The contours shown include 85 % (black), 50 % (red), and 15 % (blue) of the posterior mass.

How well parameters can be learned from data always depends on the data and on the exact form of the Gaussian process chosen. While the identifiability studies presented here show that the parameter calibration procedure works and that covariance parameters are recoverable in a synthetic setting, identifiability cannot always be expected. Still, even in these cases, the MAP and/or posterior mean estimates of the covariance parameters should provide good point estimates for

A synthetic study using WACCM4-generated ozone data was conducted to verify and to illustrate that the methods to learn the model parameters

The WACCM4 model is an atmospheric component of the Community Earth System Model from NCAR

Ozone data at approximately 400 locations were sampled daily over a two-year period in a random pattern from the domain of the experiment to learn the parameters of the mean and covariance functions. The training data set was then generated by interpolating to these points from the simulated WACCM4 data. This sampling procedure corresponds to creating on average one observation daily for each

Using these data, the mean function parameters were fitted locally using the method in Sect.

Covariance function parameter values learned from OCO-2 data. First column shows the Matérn kernel parameters, and the second column shows the exponential kernel parameters. The spatial length-scale parameters are given as distance on the unit sphere, with

For computing the posterior predictive distributions, the observational data

The marginal posterior predictive distributions were computed globally in a uniform grid with the resolution of

Figure

Ozone field mixing ratios at 3.7 kPa for 2 December 2002. Panel

The simulations with non-synthetic remote sensing data use the V9 data from the OCO-2 satellite. OCO-2 was launched in 2014, and it orbits the Earth in a Sun-synchronous orbit

The present work uses the

Calibrating the mean function from OCO-2 V9

The constant term

At high latitudes

Figure

Mean values of mean function coefficients that were described as a Markov random field, calculated in a

The OCO-2 data have several natural spatial and temporal length scales. The distance between adjacent observations is only 1 to 2 km in space and some hundredths of a second in time, but the distance between consecutive orbits is thousands of kilometers in space and several hours in time. On consecutive days the satellite passes close to the trajectory of the previous day at a distance of tens to 300 km depending on the latitude. The Earth has natural temporal diurnal and annual cycles, but since OCO-2 is Sun-synchronous, only the latter matters with OCO-2 data. Since the annual cycle is already fitted by finding the mean function coefficients

The covariance parameters for the two-component kernel are given in Table

Learning the covariance parameters from OCO-2 V9 data used the following configuration parameters for satGP:

The marginal posterior predictive distribution at test points

Figures

Data from the OCO-2 can be used to demonstrate how the multi-scale kernel formulation affects the predictive posterior distributions. Figure

Comparison of a multi-scale kernel with the two components described in Sect.

Figure

The total kernel size was kept at 1024 (

The wind-informed kernel, Eq. (

The covariance parameters for a single wind kernel were learned by taking the median of an MCMC posterior, as was done in Sect.

The simulation results are shown in Fig.

Ideally, the wind-informed kernel should use winds taken directly from a weather or climate model or from a wind data product, rather than winds recomputed from the observations as was done here for convenience. The satGP program contains configuration options for doing this. The optimal covariance function parameter values are conditional on the wind data, so the values should be learned separately for each new application and wind data set.

In this work we introduced the first version of a fast general-purpose Gaussian process software package, satGP v0.1.2, which is intended in particular for use with remote sensing data. We showed how the program solves spatial statistics problems of enormous size by using a spatially varying mean function, learned by computing marginals of an MRF, and a multi-scale covariance function, whose parameters are found either with optimization algorithms or with adaptive Markov chain Monte Carlo. We also showed how satGP allows synthetic parameter identification studies to be conducted by sampling from Gaussian process prior and posterior distributions, which can be done with any prescribed kernel, including a nonstationary wind-informed kernel. The features of satGP were demonstrated first with a small-scale synthetic ozone study and then using the enormous

Various aspects of satGP can be improved in future versions. These include improving the observation selection/thinning scheme for statistical optimality, adding support for multivariate models and higher input dimensions, and adding methods for finding locally stationary model parameters in order to describe heterogeneous scenes better. Despite the room for development, satGP is already a useful tool in its present state, and with little additional modeling it may be used, e.g., to fuse data from different sources, such as GOSAT, GOSAT-2, OCO-2, TANSAT, and OCO-3. This will enable more precise posterior estimates and, with that, a more complete picture of the evolution of, for instance, the atmospheric carbon dioxide distribution. Such statistically principled products that incorporate uncertainty information can then be used as a robust backbone for both policy decisions and further scientific analyses.

The satGP software by design allows considerable flexibility in defining how to model the quantity of interest as a Gaussian random field. This section goes over those possibilities and gives some practical recommendations. The parameters in Table

Of the four sections in Table

The

The parameter

The

In the third section of Table

The parameters

Learning the covariance parameters

For learning the covariance parameters, parameter limits need to be given. These should correspond to the expected length scales in the data, e.g., long-range fluctuations with low amplitude and short-scale variations due to local effects. In practice it is best if the parameter ranges do not overlap.

If the exponent of the exponential kernel needs to be changed, that needs to be done by changing the

For constructing the mean function, the configuration file contains the parameter

In the last section, the

The

The variable

Whether the observations for computing the local values are chosen at random or greedily is determined by the variable

In addition to the parameters and variables listed here, there are also other parameters in the configuration file and in the code, even though those should not need to be changed. Any variables that the user might want to tweak are generally accompanied by at least some comments describing their effects.

In the current version, the satGP program is run with the script

The satGP code is available as a Supplement to this paper under the MIT license. The OCO-2 V9 data used are freely available directly from NASA. The WACCM4 model is available from UCAR as a component of the Community Earth System Model.

The supplement related to this article is available online at:

JS, AS, HH, and YM designed the study. TH produced the WACCM4-specific results. JS prepared this paper, wrote the satGP code, chose, tested, and implemented the computational methods, and performed the non-WACCM4 simulations, with contributions from all coauthors.

The authors declare that they have no conflict of interest.

We would like to thank Pekka Verronen and Monika Andersson from the Finnish Meteorological Institute for providing the WACCM4 data fields. The research was partly carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration (80NM0018D0004). US Government support acknowledged.

This work was supported by the Centre of Excellence of Inverse Modelling and Imaging (CoE), Academy of Finland, decision number 312122.

This paper was edited by Klaus Gierens and reviewed by two anonymous referees.