A method to reconstruct missing data in sea surface temperature data using a neural network is presented. Satellite sensors working in the optical and infrared bands are affected by clouds, which obscure part of the ocean underneath. In this paper, a neural network with the structure of a convolutional auto-encoder is developed to reconstruct the missing data based on the available cloud-free pixels in satellite images. In contrast to standard image reconstruction with neural networks, this application requires a method to handle missing data (or data with variable accuracy) in the training phase. The present work shows a consistent approach which uses the satellite data and its expected error variance as input and provides the reconstructed field along with its expected error variance as output. The neural network is trained by maximizing the likelihood of the observed value. The approach, called DINCAE (Data INterpolating Convolutional Auto-Encoder), is applied to a 25-year time series of Advanced Very High Resolution Radiometer (AVHRR) sea surface temperature data and compared to DINEOF (Data INterpolating Empirical Orthogonal Functions), a commonly used method to reconstruct missing data based on an EOF (empirical orthogonal function) decomposition. The reconstruction error of both approaches is computed using cross-validation and in situ observations from the World Ocean Database. DINCAE results have lower error while showing higher variability than the DINEOF reconstruction.

The ocean temperature is an essential variable to study the dynamics of the ocean, because density is a function of temperature and therefore the ocean velocity variability depends partially on ocean temperature. The amount of heat stored in the ocean is also critical for weather predictions at various scales (e.g., hurricane path prediction in the short range, as well as for seasonal and climate predictions).

The ocean sea surface temperature (SST) has been routinely measured since the beginning of the 1980s. However, as for any measuring technique working in the infrared or visible bands, clouds often obscure large parts of the field of view. Several techniques have been proposed for reconstructing gappy satellite data, but often small-scale information is filtered out.

DINEOF

Neural networks are mathematical models that can efficiently extract nonlinear relationships from a mapping problem (i.e., an input/output relationship that can be determined through a mathematical function). Neural networks are therefore especially well positioned to learn nonlinear, stochastic features measured at the sea surface by satellite sensors, and their use might prove efficient in retaining these structures when analyzing satellite data, e.g., for reconstructing missing data.

Neural networks can be composed of a wide variety of building blocks, such as fully connected layers

The use of neural networks in the frame of Earth observation has been increasing recently.

The objective of this article is to present a neural network in the form of a convolutional auto-encoder which can be trained on gappy satellite observations in order to reconstruct missing observations and also to provide an error estimate of the reconstruction. This neural network is referred to in the following as DINCAE (Data INterpolating Convolutional Auto-Encoder). An auto-encoder is a particular type of network which can compress and decompress the information in an input dataset

In Sect.

For this study we used the longest available time series coming from the Advanced Very High Resolution Radiometer (AVHRR) dataset

For this study, only SST data with quality flags of 4 or higher are retained

To assess the accuracy of the reconstruction method, cross-validation is used

Initially, the average cloud coverage of the dataset is 46 % (over all 25 years). The cloud coverage for the last 50 scenes is increased to 77 % when the cross-validation points are excluded. A significant part of the scene is obscured after marking the data for cross-validation, but in the Mediterranean Sea the cloud coverage is relatively low compared to the globally averaged cloud coverage, which is 75 %

The red rectangle delimits the studied region, and the color represents the bathymetry in meters. The arrows represent the main currents: the Western Corsican Current (WCC), the Eastern Corsican Current (ECC) and the Northern Current (NC).

Convolutional and other deep neural networks are extensively used in computer vision, and they find an increasing number of applications in Earth sciences

The handling of missing data is done in analogy to data assimilation in numerical ocean models. The standard optimal interpolation equations

The time average has been removed from the SST dataset (computed over all years but excluding the cross-validation dataset). The neural network thus works with anomalies relative to this mean SST. To obtain reasonable results, the network uses more input than merely SST divided by its error variance and the inverse of the error variance. The total list of input parameters is consequently the following:

SST anomalies scaled by the inverse of the error variance (the scaled anomaly is zero if the data are missing),

inverse of the error variance (zero if the data are missing),

scaled SST anomalies and inverse of error variance of the previous day,

scaled SST anomalies and inverse of error variance of the next day,

longitude (scaled linearly between

latitude (scaled linearly between

cosine of 2π times the day of the year divided by 365.25,

sine of 2π times the day of the year divided by 365.25.
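As an illustration, the input channels listed above can be assembled per image as in the following minimal NumPy sketch. This is not the actual DINCAE code; the function and variable names (`build_input`, `errvar`, etc.) are illustrative:

```python
import numpy as np

def build_input(sst, sst_prev, sst_next, errvar, lon, lat, dayofyear):
    """Assemble the per-image input channels described above.

    sst, sst_prev, sst_next: 2-D SST anomaly fields (NaN where cloudy);
    errvar: expected error variance of the observations;
    lon, lat: 2-D coordinate grids; dayofyear: scalar day of the year.
    """
    def scaled(field):
        # anomaly scaled by the inverse error variance, and the inverse
        # error variance itself; both are zero where data are missing
        s = np.where(np.isnan(field), 0.0, field / errvar)
        w = np.where(np.isnan(field), 0.0, 1.0 / errvar)
        return s, w

    s0, w0 = scaled(sst)
    sp, wp = scaled(sst_prev)
    sn, wn = scaled(sst_next)

    # longitude and latitude scaled linearly to [-1, 1]
    lon_s = 2 * (lon - lon.min()) / (lon.max() - lon.min()) - 1
    lat_s = 2 * (lat - lat.min()) / (lat.max() - lat.min()) - 1

    # seasonal encoding of the day of the year
    angle = 2 * np.pi * dayofyear / 365.25
    cos_t = np.full_like(lon_s, np.cos(angle))
    sin_t = np.full_like(lat_s, np.sin(angle))

    # 10 channels in total, matching the list above
    return np.stack([s0, w0, sp, wp, sn, wn,
                     lon_s, lat_s, cos_t, sin_t], axis=-1)
```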

The complete dataset is thus represented by an array of the size

SST scaled by the inverse of the expected error variance,

logarithm of the inverse of the expected error variance.
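The two output channels can be turned back into a reconstructed SST field and an expected error variance; exponentiating the logarithm of the inverse error variance guarantees a positive variance. A minimal sketch (the names are illustrative, not from the DINCAE code):

```python
import numpy as np

def decode_output(scaled_sst, log_inv_errvar):
    """Recover the reconstructed SST and its expected error variance
    from the two output channels of the network."""
    inv_errvar = np.exp(log_inv_errvar)  # strictly positive by construction
    errvar = 1.0 / inv_errvar            # expected error variance
    sst = scaled_sst * errvar            # undo the scaling by 1/errvar
    return sst, errvar
```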

The overall structure of the neural network (Table 2) is a convolutional auto-encoder

A commonly used activation function in neural networks is the rectified linear unit (RELU), which is defined as

However, in our case it quickly leads (in 10 epochs) to a zero gradient and thus to no improvement in training. This problem is solved by choosing a leaky RELU
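A leaky RELU can be sketched as follows; the slope `alpha` for negative inputs used here is an illustrative value, not necessarily the one used in DINCAE:

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    # a standard RELU would be np.maximum(x, 0); the leaky variant keeps
    # a small slope alpha for negative inputs so the gradient does not
    # vanish and training does not stall
    return np.where(x > 0, x, alpha * x)
```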

List of all steps in DINCAE. The additional dimension of the size of the minibatch is omitted in the output sizes below. Max pooling and average pooling are tested for the pooling layers.

The input dataset is randomly shuffled (over the time dimension) and partitioned into so-called minibatches of 50 images, as an array of the size

For every input image, additional data points are masked (beyond the cross-validation points) by applying a randomly chosen cloud mask during training. The cloud mask of a training image is thus the union of the cloud mask of the input dataset and a randomly chosen cloud mask. This allows us to assess the capability of the network to recover missing data under clouds. Without the additional clouds, the neural network would simply learn to reproduce the SST values that it already receives as input. At every epoch a different mask is applied to a given image to mitigate overfitting and aid generalization.
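The mask augmentation described above amounts to taking the union of an image's own cloud mask with a cloud mask drawn at random from another time instance; a minimal sketch with illustrative names:

```python
import numpy as np

rng = np.random.default_rng(42)

def augment_cloud_mask(image_mask, mask_bank):
    """Union of the image's own cloud mask with a randomly drawn mask
    from the dataset (True means cloudy/missing). Drawing a different
    mask at every epoch varies the training targets over time."""
    extra = mask_bank[rng.integers(len(mask_bank))]
    return image_mask | extra
```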

The aim of DINCAE is to provide a good SST reconstruction but also an assessment of the accuracy of the reconstruction. The output of the neural network is assumed to be a Gaussian probability distribution function (pdf) characterized by a mean

The cost function finally has the following form:

The loss function per individual scalar sample is the term in brackets of the previous equation. The first term is directly related to the mean square error but scaled by the estimated error standard deviation. The second term penalizes any over-estimation of the error standard deviation. The third term is a constant term which can be neglected in the following as it does not influence the gradient. The sum in the previous equation runs over all grid points where a measurement is available but excluding the measurements withheld for cross-validation as the latter are never used during training.
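The per-sample loss described above (a scaled squared error, a penalty on the estimated error variance, and a constant) can be sketched as a negative log likelihood of a Gaussian pdf. This is a minimal NumPy sketch with illustrative names, assuming the network's log inverse error variance has already been converted to a log error variance:

```python
import numpy as np

def nll_loss(y_true, y_pred, log_errvar):
    """Negative log likelihood of the observations under a Gaussian pdf
    with mean y_pred and variance exp(log_errvar).

    The first term is the squared error scaled by the estimated error
    variance, the second penalizes an overestimated error variance, and
    the constant log(2*pi) term does not influence the gradient."""
    errvar = np.exp(log_errvar)
    per_sample = (y_true - y_pred) ** 2 / errvar + log_errvar + np.log(2 * np.pi)
    return 0.5 * np.mean(per_sample)
```

In practice the mean would run only over the grid points where a measurement is available, excluding the cross-validation points.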

We used the Adam optimizer

During the development of the neural network, it became clear that the network tended to overfit the provided observations, leading to degraded results when compared to the cross-validation data. Commonly used strategies were therefore adopted to avoid overfitting, namely introducing a dropout layer between the fully connected layers of the network. The dropout layer randomly sets, with a probability of 0.3, the output of these intermediate layers to zero during the training of the network. We also added Gaussian-distributed noise to the input of the network, with a zero mean and a standard deviation of 0.05
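The two regularization strategies can be sketched as follows. These are minimal NumPy versions (TensorFlow provides equivalent layers); the inverted-dropout scaling shown here is the common convention rather than a detail stated above:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_input_noise(x, noise_std=0.05):
    # additive zero-mean Gaussian noise on the network input
    return x + rng.normal(0.0, noise_std, size=x.shape)

def dropout(x, p=0.3, training=True):
    # randomly zero activations with probability p during training;
    # inverted dropout scales the survivors by 1/(1-p) so that the
    # expected activation is unchanged, and inference is a no-op
    if not training:
        return x
    keep = rng.random(x.shape) >= p
    return np.where(keep, x / (1.0 - p), 0.0)
```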

It is useful to compare the proposed approach to the traditional auto-encoder to highlight the different choices that have been adopted. The essential steps to implement and validate an auto-encoder are the following:

Some data are marked for validation and never used during training.

The network is given some data as input and produces an output which should be as close as possible to the input. All training data are thus given to the network at every epoch.

The network is validated using the validation dataset that was set aside.

In essence, the traditional auto-encoder optimizes how well the provided input data can be recovered after dimensionality reduction. In the present approach, there are two steps where data are intentionally hidden from the network:

The validation data that were set aside and never used during the training, similar to the traditional auto-encoder.

Some additional data in every minibatch are set aside to compute the reconstruction error and its gradient (unlike the traditional auto-encoder). This additional subset is chosen at random.

This is done because the main objective is to assess the ability of the network to reconstruct the missing data using the available data. The proposed method does not withhold less data than the traditional auto-encoder. The downside of the approach is that the cost function fluctuates more, because it is computed over a relatively small subset of the data. But for us this is acceptable (and controlled by taking the average of the output of the network at several epochs, as explained later), because the cost function reflects more closely the objective: reconstructing missing data from the available data (instead of reproducing the input data, as is the case for the traditional auto-encoder).

The traditional auto-encoder approach trained using only clear images was not considered, because only 13 images out of 5266 have a cloud coverage of less than 5 %. So the ability to handle missing data was a requirement for us from the start.

The results of the DINCAE method are compared to the reconstruction obtained by the DINEOF method

The classical DINEOF technique reconstructs the cross-validation data points withheld in the last 50 images with an error of 0.4629

Comparison with the independent cross-validation data and the dependent data used for training (in

Figure

The cost function computed internally for every minibatch during the optimization.

The neural network is updated using the gradient for every minibatch during training, and after every 10 epochs the current state of the neural network is used to infer the missing data over the whole time series, in particular to reconstruct the missing data in the cross-validation dataset. Importantly, however, the network is never updated using the cross-validation data.

Figure

Instead of the average, the median reconstruction was also tested, as the median is more robust to outliers. The results were very similar, and slightly better with the average than with the median SST. In the following, only the average estimate is used.
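Combining the reconstructions inferred every 10 epochs into a single estimate can be sketched as follows (illustrative names; both variants discussed above are shown):

```python
import numpy as np

def combine_reconstructions(recs, how="mean"):
    """Combine a list of reconstructed fields (one per saved epoch) into
    a single estimate; the mean damps the epoch-to-epoch fluctuations of
    the cost function, the median is more robust to outliers."""
    stacked = np.stack(recs)
    if how == "median":
        return np.median(stacked, axis=0)
    return np.mean(stacked, axis=0)
```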

RMSE difference with cross-validation dataset as a function of iteration. The solid blue line represents the DINCAE reconstruction at different steps of the iterative minimization algorithm. The dashed cyan line is the DINEOF reconstruction and the dashed red line is the average DINCAE reconstruction between epochs 200 and 1000.

Training this network for 1000 epochs takes 4.5 h on a GeForce GTX 1080 and Intel Core i7-7700 with the neural network library TensorFlow

The original SST versus the reconstructed SST for the cross-validation dataset. The color represents the estimated expected error standard deviation.

Figure

Scaled errors are computed as the difference between the reconstructed SST and the actual measured SST (withheld during cross-validation) divided by the expected standard deviation error.

To obtain a clearer idea of the reliability of the expected error, we computed the difference between the cross-validation SST and the reconstructed SST divided by the expected error standard deviation. A histogram of the scaled differences is shown in Fig.
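The scaled differences described above can be computed as follows (illustrative sketch); for a well-calibrated error estimate they should be approximately standard normal:

```python
import numpy as np

def scaled_errors(sst_true, sst_rec, err_std):
    """Reconstruction errors divided by the expected error standard
    deviation at each cross-validation point."""
    return (sst_rec - sst_true) / err_std
```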

An interpolation technique commonly used in an operational context is optimal interpolation. This technique is able to provide an expected error variance of the interpolated fields based on a series of assumptions, in particular that the errors are Gaussian distributed with a known covariance and zero mean. Given these assumptions, the error variance of the optimal interpolation algorithm is found to be only weakly related to the observed RMSE in a study by

Different variants of the neural network are tested in order to optimize its structure.
The number of skip connections has a significant impact on the results. The cross-validation RMSE is reduced from 0.4458

Increasing the number of filters of the convolutional layers from 16, 24, 36, 54 to 16, 32, 48, 64 (with the input convolution layer fixed at 10 filters, as it has to correspond to the number of inputs)
and increasing the number of neurons of the bottleneck accordingly
leads to a slight degradation for the present case compared to the cross-validation dataset, which indicates that the neural network starts to overfit if the number of filters is increased. A subsequent test with narrower convolutional layers of sizes 16, 22, 31 and 44 leads to very similar but slightly worse results with 0.3928

Variants of the DINCAE neural network with an increased or decreased number of layers (5 or 3 convolutional layers) did not improve the results. However, it is possible that the appropriate depth of the neural network depends on the available training dataset and that for a more extensive dataset increasing the number of layers could have a positive effect.

Max pooling layers are commonly used in image classification problems

For every time instance, we use the data from three time instances in the reconstruction: the current day, as well as the data from the previous and next days. As a variant of the previous reconstruction experiment, we increased the number of time instances from 3 to 5, centered at the current time instance. However, the cross-validation error for this experiment is 0.433

In all cases the biases are relatively small and the present discussion is essentially also valid when considering the centered RMSE (i.e., the RMSE difference when the bias is removed). In the following, we only use DINCAE with all skip connections and four convolutional layers with the number of filters set to 16, 24, 36 and 54 and average pooling for future comparison.

Panel

Figure

Panel

In some cases the DINCAE reconstruction also introduces some artifacts, such as zonal and meridional gradients near the open boundaries (Fig.

Comparison with the World Ocean Database for SST grid points covered by clouds. The RMSE, CRMSE and bias are in degrees Celsius.

Standard deviation computed around the seasonal average in degrees Celsius.

To further quantify how well the reconstruction methods could recover data under a cloud cover, we use in situ temperature from the World Ocean Database 2018

In Fig.

This paper presents a consistent way to handle missing data in satellite images for neural networks. Essentially, the neural network uses the measured data divided by its expected error variance. Missing data are thus treated as data with an infinitely large error variance. The cost function of the neural network is chosen such that the network provides the reconstruction but also the confidence of the reconstruction (quantified by the expected error variance). Over- and underestimation of the expected error variance are both penalized by maximizing the likelihood and assuming Gaussian-distributed errors. This approach can easily be generalized to parametric probability distributions, in particular to log-normal distributions for concentrations such as remotely sensed chlorophyll

The presented reconstruction method DINCAE compared favorably to the widely used DINEOF reconstruction method, which is based on a truncated EOF analysis. Formally, there are similarities between an auto-encoder (composed of just two fully connected layers) and an EOF projection followed by an EOF reconstruction

The expected error for the reconstruction reflects well the areas covered by the satellite measurements as well as the areas with more intrinsic variability (like meanders of the Northern Current). The expected error predicted by the neural network provides a good indication of the accuracy of the reconstruction.

The accuracy of the reconstructed data under clouds was also assessed by comparing the results to in situ observations of the World Ocean Database 2018. Also compared to this dataset, the RMSE of the DINCAE reconstruction is lower than the corresponding results from DINEOF.

It is quite common that data analysis methods to reconstruct missing data tend to smooth the available observations in order to fill the area of missing observations. Therefore, the temporal variability (relative to the seasonal cycle) of the reconstructed sea surface temperature was computed from the original data and from the reconstructed data using DINCAE and DINEOF. The variability of the reconstructed SST with DINEOF generally underestimated the variability in the original dataset, but the variability of the DINCAE reconstruction matched the variability of the original data relatively well.

The tests conducted in this paper show that DINCAE is able to provide a good reconstruction of missing data in satellite SST observations while retaining more variability than the DINEOF method. In addition, the expected error variance of the reconstruction is estimated while avoiding several assumptions (difficult to justify in practice) of other methods such as optimal interpolation.

The open-source code is released under the terms of the GNU General Public Licence v3 (or, at your discretion, any later version) and is available at the following address:

AB designed and implemented the neural network. AAA made the DINEOF simulations. AB, AAA, ML and JMB contributed to the planning and discussions and to the writing of the article.

The authors declare that they have no conflict of interest.

We thank the anonymous reviewers and Zhaohui Han for carefully reading the article and providing constructive remarks and interesting interpretations of the results.

The F.R.S.-FNRS (Fonds de la Recherche Scientifique de Belgique) is acknowledged for funding the position of Alexander Barth. This research was partly performed with funding from the Belgian Science Policy Office (BELSPO) STEREO III program in the framework of the MULTI-SYNC project (contract SR/00/359). Matjaz Licer would like to acknowledge COST action ES1402 – “Evaluation of Ocean Syntheses” for funding his contribution to this work. Computational resources have been provided in part by the Consortium des Équipements de Calcul Intensif (CÉCI), funded by the F.R.S.-FNRS under grant no. 2.5020.11 and by the Walloon Region. The AVHRR v5 dataset was obtained from the NASA EOSDIS Physical Oceanography Distributed Active Archive Center (PO.DAAC) at the Jet Propulsion Laboratory, Pasadena, CA. The National Centers for Environmental Information (NOAA, USA) and the International Oceanographic Data and Information Exchange (IODE) are thanked for the World Ocean Database 2018.

This research has been supported by the Belgian Science Policy Office (contract no. SR/00/359), the F.R.S.-FNRS (Belgium) (grant no. 2.5020.11), and the COST Action (grant no. ES1402).

This paper was edited by Patrick Jöckel and reviewed by two anonymous referees.