This directory contains satGP, a Gaussian process program meant to be
used with remote sensing data sets. The
capabilities and computational details are described in the following
paper:

Susiluoto et al. "Efficient multi-scale Gaussian process regression
for massive remote sensing data with satGP v0.1.2", Geoscientific
Model Development, 2020.

The software code in version 0.1.2 has been cleaned up and restructured,
and the version number has been bumped to 0.2.1. There are no
differences in how the computations are carried out between these two
versions. The satGP code is available under the MIT license, see
LICENSE for the full text.

Any bugs can be reported (for now) to jouni.i.susiluoto@jpl.nasa.gov.
The git tree of this code will hopefully become available online
sooner or later.


1. Installation and dependencies

satGP is written in C99, and should hence work on any platform with a
C compiler. However, so far all development has been carried out on
Debian (buster) and Ubuntu Linux (18.04) systems using the gcc
compiler.

The compilation uses the meson build system (https://mesonbuild.com/).
The current dependencies are LAPACKE, BLAS, netCDF, NLOpt, and
OpenMP. If jemalloc is available, that can be used, but it is not
required. Plotting is done with python3 and matplotlib. The run script
gproc.sh includes a standard option for running under valgrind. On a
Debian-based Linux system, install the packages with apt, using
something like:

$ apt install gcc meson liblapacke-dev libnetcdf-dev libblas-dev \
    libnlopt-dev libjemalloc-dev valgrind

The exact names of the packages change from system to system.


2. Running satGP

===>>> Please also read Susiluoto et al. 2020, 
       especially Sect. 3 and Appendix A. <<<===

The software is compiled every time it is run. The compilation and
running are done by the gproc.sh script. It takes two arguments: an
experiment name and a number specifying what kind of model run is
done. The top of that script lists the options. For example,

$ ./gproc.sh test 1

will create the directories experiments/test-build and experiments/test,
and in the latter of these a regular multi-threaded experiment
will be run. By default, the number of threads is 12, but that can
easily be changed in the file.

satGP requires data for running experiments. The config file,
gpconfig.h, defaults to using OCO2-v9 data, which should be put in the
data/oco2-v9 directory.

The configuration of satGP is mostly performed by changing the file
gpconfig.h, which contains lots of comments to explain the various
options. There are quirks and issues, and you'll possibly need to use
a debugger to learn what's wrong (run by ./gproc.sh dirname 3), but
the code contains a lot of comments to help out with sorting out any
problems.


3. Plotting results

satGP code comes with the scripts used to plot results as in the
Susiluoto et al. paper. The plotting scripts reside in the plotting
subdirectory, and probably the most useful ones are
plot_field_and_unc.py and plot_mean_function_betas.py. These will
probably need to be modified somewhat to work with other data.


4. Using data other than OCO-2 in satGP

The current version of satGP requires modifying the code to change the
data used to drive the Gaussian process. The Susiluoto et al. 2020
article lists some of these modifications in the appendix, but here
are some others:

- To change the mean function, the functions

    mean_functions.h:global_mean_on_day(),
    mean_functions.h:grad_helper_global_mean_on_day(), and
    mean_functions.h:fit_beta_parameters_with_unc()

  need to be changed. In the last one of these, it is the construction
  of the F matrix that needs to be modified.

- There is a constant offset to the time variable, and if data from
  before 2009 is used, that may need to be changed (but it might
  just work). The relevant places are found by grepping for 10004800
  in the source files.

- Data is read from netCDF files in gputils:readnetCDF(). Currently
  there is a line saying

    if (((E->mode) || (dv[i] > 380)) && /* Discard CO2 readings under 380 ppm */

  and since data values (dv) under 380 may be perfectly valid for
  other quantities of interest, this condition needs to be changed.
  Any data filtering is best done at this point anyway.

- In the plotting scripts, parameter limits need to be changed if data
  other than CO2 is used.
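
The data-filtering condition in readnetCDF() quoted above could be
factored into a small helper, so that the threshold is no longer
hard-coded. The following is only a minimal sketch and is not part of
satGP: the obs_filter struct, keep_observation(), and the bypass
argument (standing in for E->mode) are hypothetical names, and the
upper bound is an added assumption.

```c
#include <assert.h>
#include <math.h>      /* INFINITY, for switching off a bound */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical filter settings; these names do not exist in satGP. */
typedef struct {
    double min_valid;  /* values must be strictly above this */
    double max_valid;  /* values must be strictly below this */
} obs_filter;

/* Return true if a data value should be kept.
 * bypass plays the role of E->mode in the quoted condition:
 * when it is nonzero, the range check is skipped entirely.
 * NaN values fail both comparisons and are dropped unless bypassed. */
static bool keep_observation(const obs_filter *f, int bypass, double value)
{
    assert(f != NULL && f->min_valid <= f->max_valid);
    return bypass || (value > f->min_valid && value < f->max_valid);
}
```

With such a helper, the current CO2 behaviour would correspond to an
obs_filter of {380.0, INFINITY}, and the remaining part of the
original condition would stay as it is.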


5. Shortcomings in the current version
  
- Reading data from e.g. text files is not implemented at the moment
  for running GP in a grid, but all the functions to do that exist in
  the code base, so if needed, this should be easy to accomplish by
  just looping over the observations and repeatedly calling
  gputils.h:add_datapoint_to_S().

- The mean function code has been written in an OCO-2 specific
  fashion, and e.g. modifying the form of the mean function should be
  easier than it currently is.

- S->first_day_noon should be moved to the config file or be
  resolved automatically from the data.

- Dealing with reading and writing different types of data has not
  been properly implemented.
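
The text-file reading loop mentioned in the first item above could
look roughly like the sketch below. It is not satGP code and makes
several assumptions: the five-column row layout, the text_obs struct,
and read_text_observations() are hypothetical, and because the real
signature of gputils.h:add_datapoint_to_S() is not reproduced here, a
callback stands in for it.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical observation record; satGP's own structs differ. */
typedef struct {
    double lat, lon, time, value, unc;
} text_obs;

/* Callback standing in for gputils.h:add_datapoint_to_S(); a real
 * wrapper would unpack obs and call the satGP function itself. */
typedef void (*add_obs_fn)(void *state, const text_obs *obs);

/* Read whitespace-separated "lat lon time value unc" rows from fp and
 * pass each one to add(). Lines starting with '#' and lines that do
 * not parse into five numbers are skipped. Lines are assumed to be
 * shorter than the buffer. Returns the number of observations added. */
static long read_text_observations(FILE *fp, add_obs_fn add, void *state)
{
    char line[512];
    long n = 0;

    assert(fp != NULL && add != NULL);
    while (fgets(line, sizeof line, fp) != NULL) {
        text_obs o;
        if (line[0] == '#')
            continue;
        if (sscanf(line, "%lf %lf %lf %lf %lf",
                   &o.lat, &o.lon, &o.time, &o.value, &o.unc) != 5)
            continue;
        add(state, &o);
        n++;
    }
    return n;
}

/* Example callback that only accumulates the data values. */
static void sum_values(void *state, const text_obs *obs)
{
    *(double *)state += obs->value;
}
```

The callback keeps the reader independent of the state struct, so the
same loop could feed either the regular or the calibration variant of
the add function.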


6. Function names in code corresponding to those in Figure 5
   (algorithm) in Susiluoto et al. 2020. Note that these are just the
   first called functions, which then call various other routines.

   | Function name in paper   | File                   | Function in code          |
   |--------------------------+------------------------+---------------------------|
   | ReadData()               | gaussian_proc.h        | add_datafile()            |
   | AddToState()             | gputils.h              | add_datapoint_to_S()      |
   | FindLocalMeanfunCoeffs() | mean_functions.h       | fit_all_beta_parameters() |
   | ReInitializeState()      | -/gpstructs.h          | free()/initialize_state() |
   | FindCovFunCoeffs()       | calibrate.h            | find_GP_parameters()      |
   | Decompose()              | gp.c                   | (not a separate function) |
   | SelectObservations()     | covariance_functions.h | pick_observations()       |
   | ComputeMarginal()        | gaussian_proc.h        | predict()                 |

   Even though it is not mentioned in the paper, when finding the
   covariance kernel parameters, data is added with
   gaussian_proc.h:add_datapoint_to_S_for_calibration() instead of
   gputils.h:add_datapoint_to_S().
