4DVarNet-SSH: end-to-end learning of variational interpolation schemes for nadir and wide-swath satellite altimetry

Beauchamp, Maxime; Febvre, Quentin; Georgenthum, Hugo; Fablet, Ronan

doi:https://doi.org/10.5194/gmd-16-2119-2023

Articles | Volume 16, issue 8


Development and technical paper

19 Apr 2023


Abstract

The reconstruction of sea surface currents from satellite altimeter data is a key challenge in spatial oceanography, especially with the upcoming wide-swath SWOT (Surface Water and Ocean Topography) altimeter mission. Operational systems, however, generally fail to retrieve mesoscale dynamics for horizontal scales below 100 km and timescales below 10 d. Here, we address this challenge through the 4DVarNet framework, an end-to-end neural scheme based on a variational data assimilation formulation. We introduce a parameterization of the 4DVarNet scheme dedicated to the space–time interpolation of satellite altimeter data. Within an observing system simulation experiment (NATL60), we demonstrate the relevance of the proposed approach, both for nadir and nadir plus SWOT altimeter configurations, for two case study regions with contrasting upper ocean dynamics. We report a relative improvement with respect to the operational optimal interpolation between 30 % and 60 % in terms of the reconstruction error. Interestingly, for the nadir plus SWOT altimeter configuration, we reach resolved space–timescales below 70 km and 7 d. The code is open source to enable reproducibility and future collaborative developments. Beyond its applicability to large-scale domains, we also address the uncertainty quantification issues and generalization properties of the proposed learning setting. We further discuss future research avenues and extensions to other ocean data assimilation and space oceanography challenges.


Received: 28 Sep 2022 – Discussion started: 15 Nov 2022 – Revised: 29 Jan 2023 – Accepted: 10 Mar 2023 – Published: 19 Apr 2023

1 Introduction

Satellite altimetry is the main data source for the observation and reconstruction of sea surface dynamics on a global scale (Chelton et al., 2001). Current satellite altimeters only deliver along-track nadir observations, which results in a very sparse sampling of the ocean surface. Interpolation schemes are therefore key components of the operational processing of satellite altimetry data. Current operational products (Taburet et al., 2019; Lellouche et al., 2018), however, show a limited ability to retrieve the full range of mesoscale dynamics. The upcoming wide-swath altimetry SWOT (Surface Water and Ocean Topography) mission (see, e.g., Gaultier et al., 2015) will provide, for the first time, a two-dimensional observation of the sea surface height (SSH). The space–time sampling of satellite altimeters will, however, remain sparse for a long time, which has motivated the recent surge in research literature on improving the interpolation of satellite-derived SSH fields (see, e.g., Lopez-Radcenco et al., 2019, Lguensat et al., 2017, Beauchamp et al., 2021, and Ballarotta et al., 2019).

Besides operational schemes based on optimal interpolation techniques (Taburet et al., 2019) and data assimilation schemes for ocean circulation models (Benkiran et al., 2021), we may sort the proposed SSH interpolation schemes into three main categories, namely extensions of optimal interpolation approaches towards multiscale schemes (Ardhuin et al., 2020), data assimilation schemes using sea surface dynamical priors such as quasi-geostrophic (QG) dynamics (Le Guillou et al., 2020), and data-driven interpolation methods. The latter comprise EOF-based (empirical orthogonal function) techniques (Beckers and Rixen, 2003b; Alvera-Azcárate et al., 2009), analog approaches (Lguensat et al., 2017; Tandeo et al., 2020), and, more recently, deep learning schemes (Fablet et al., 2020; Fablet and Chapron, 2022; Manucharyan et al., 2021; Beauchamp et al., 2020).

Here, we further explore this avenue and, more specifically, the 4DVarNet framework recently introduced in Fablet et al. (2021). As it relies on a variational data assimilation formulation, it appears to be particularly suited to the space–time interpolation of sea surface variables from irregularly sampled observations. We propose a parameterization of the 4DVarNet scheme dedicated to SSH interpolation from satellite altimeter data and report OSSE (observing system simulation experiment) results to support the relevance of the proposed scheme. Our main contributions are as follows:

The proposed 4DVarNet-SSH scheme delivers an end-to-end neural architecture using raw satellite altimeter data and optimally interpolated fields as input. We also address uncertainty quantification issues using an ensemble method.
For the OSSE on two case study regions, one along the Gulf Stream and one in an open-ocean area dominated by mesoscale eddy dynamics, the 4DVarNet-SSH scheme outperforms previous work and significantly improves the performance metrics with respect to the operational processing. We also support the relevance of wide-swath SWOT altimeter data to significantly improve the reconstruction of sea surface dynamics compared to nadir-only satellite altimeters.
We deliver open-source code for the proposed 4DVarNet-SSH scheme. It relies on PyTorch and associated state-of-the-art packages. As such, it supports multi-GPU configurations and can scale up to large-scale domains.

We believe that these contributions can help in the development of deep learning approaches for satellite altimetry and, more broadly, for operational oceanography.

This paper is organized as follows. Section 2 briefly reviews the key methodological aspects and related work. We describe the proposed 4DVarNet-SSH approach in Sect. 3, and Sect. 4 presents the considered OSSE setting. We report our results in Sect. 5 and further discuss our main contributions in Sect. 6.

2 Background and related work

From a methodological point of view, interpolation problems in geoscience are classically regarded as data assimilation issues (Asch et al., 2016). They aim at estimating the state x_t of a multidimensional dynamical system as follows:

\begin{equation} \begin{cases} \dfrac{d x_{t}}{d t} = \mathcal{M}(x_{t}) + \eta_{t} \\ y_{t} = \mathcal{H}_{t}(x_{t}) + \varepsilon_{t} \end{cases} . \tag{1} \end{equation}

The first equation relates to the forecast step, which describes the evolution of the system from time t to t+dt according to the potentially nonlinear model ℳ, written in discrete form as $x_{k + 1} = \mathcal{M} (x_{k})$ . The second equation introduces the observations y_t at time t, where ℋ_t is the corresponding observation operator, which is usually known but may also be learned. η_t is the model error and ε_t the observation error. Both errors are generally assumed to be Gaussian, unbiased, and uncorrelated over time. When discretized on a spatiotemporal grid, where index $k = 1, \dots, T$ refers to time t_k, their associated covariance matrices are $Q_{k} \in \mathbb{R}^{m \times m}$ and $R_{k} \in \mathbb{R}^{p_{k} \times p_{k}}$ .

Broadly speaking, a vast family of data assimilation methods stems from the minimization of some energy or functional which involves two terms: a dynamical prior and an observation term. We may distinguish two main categories of data assimilation approaches (Evensen, 2009), namely variational and statistical data assimilation. Specifically, within a variational data assimilation framework, the state analysis x^a results from a gradient-based minimization of the defined variational cost $J (x) = J_{Φ} (x, y, Ω)$ (Asch et al., 2016). The latter is generally the sum of an observation term and a regularization term involving an operator Φ, as follows:

\begin{align*} J_{\Phi} (x, y, \Omega) = {} & \frac{1}{2} \sum_{i = k - L}^{k + L} \| y_{i} - \mathcal{H}_{i} (x_{i}) \|_{R_{i}^{-1}}^{2} \\ & + \frac{1}{2} \sum_{i = k - L}^{k + L} \| x_{i} - \Phi_{i} (x_{i}) \|_{Q_{i}^{-1}}^{2} . \end{align*}

The prior operator Φ_k is a time-stepping operator at time t_k. In a model-based data assimilation, it generally relates to a dynamical model ℳ to provide a background estimation (i.e., the physical prior) corresponding to the deterministic forecast. $x = \{x_{k - L}, \dots, x_{k + L}\}$ and $y = \{y_{k - L}, \dots, y_{k + L}\}$ , respectively, denote the temporal vectors of sizes $(2 L + 1) \times m$ and $\prod_{i = k - L}^{k + L} p_{i}$ associated with the state of the system and the observations within the data assimilation window (DAW) of length 2L+1 centered around t_k. ℋ_k is the observation operator, and Ω={Ω_k} is the set of subdomains of 𝒟 with observations at times t_k, $k = 1, \dots, T$ . Last, Q_k and R_k are the background and the observation error covariance matrices. This formulation of the functional $J_{Φ} (x, y, Ω)$ directly relates to weak-constraint 4D-Var (see, e.g., Carrassi et al., 2018), which aims at estimating the optimal trajectory of a system in a predefined DAW, given the additional constraint of model uncertainty in the objective function 𝒥.
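As an illustration only, the weak-constraint cost above can be evaluated numerically as in the following minimal sketch. It assumes linear observation operators stored as matrices and a prior passed as a plain Python callable; the function name `weak_4dvar_cost`, the toy dimensions, and the damping prior are hypothetical choices and are not taken from the 4DVarNet-SSH code.

```python
import numpy as np

def weak_4dvar_cost(x, y, H, Phi, R_inv, Q_inv):
    """Evaluate the weak-constraint cost over one assimilation window.

    x     : array (T, m), states x_i in the window (T = 2L + 1)
    y     : list of T observation vectors, y[i] has shape (p_i,)
    H     : list of T linear observation operators, H[i] has shape (p_i, m)
    Phi   : callable mapping a state (m,) to its prior estimate (m,)
    R_inv : list of T inverse observation covariances, shape (p_i, p_i)
    Q_inv : inverse covariance (m, m) weighting the prior residual
    """
    obs_term = 0.0
    prior_term = 0.0
    for i in range(len(x)):
        d = y[i] - H[i] @ x[i]          # observation residual y_i - H_i(x_i)
        obs_term += d @ R_inv[i] @ d    # squared Mahalanobis norm, weight R_i^{-1}
        r = x[i] - Phi(x[i])            # prior residual x_i - Phi_i(x_i)
        prior_term += r @ Q_inv @ r     # squared Mahalanobis norm, weight Q_i^{-1}
    return 0.5 * obs_term + 0.5 * prior_term

# Toy window: L = 1 (three time steps), m = 4 state variables, 2 observed.
T, m = 3, 4
H_demo = [np.eye(m)[:2] for _ in range(T)]        # observe the first two variables
x_demo = np.linspace(0.0, 1.0, T * m).reshape(T, m)
y_demo = [H_demo[i] @ x_demo[i] + 0.1 for i in range(T)]
cost_demo = weak_4dvar_cost(
    x_demo, y_demo, H_demo,
    Phi=lambda s: 0.9 * s,                        # hypothetical damping prior
    R_inv=[np.eye(2) for _ in range(T)],
    Q_inv=np.eye(m),
)
```

Keeping the observation operators in a per-time-step list lets the number of observations p_i vary from one time step to the next, which is the typical situation with along-track altimetry.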

Regarding statistical data assimilation, many state-of-the-art methods rely on optimal interpolation (OI), which is the basic block of all statistical methods, especially regarding SSH-related datasets. OI has been used for decades (Taburet et al., 2019) for the interpolation of along-track nadir altimeter datasets and is still used today for the operational marine (Copernicus Marine Environment Monitoring Service, CMEMS) and climate (Copernicus Climate Change Service, C3S) production of the EU Copernicus program. It involves a significant smoothing for solving spatial scales up to 150 km. Extensions of OI schemes to multiscales to better account for mesoscale sea surface dynamics have recently been proposed (Ardhuin et al., 2020; Ubelmann et al., 2016).
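To make the OI analysis step concrete, the sketch below applies the standard best linear unbiased estimator formula, x^a = x^b + B Hᵀ (H B Hᵀ + R)⁻¹ (y − H x^b), on a toy one-dimensional grid. The Gaussian background covariance, its length scale, the observation locations, and the function name `optimal_interpolation` are illustrative assumptions and do not reproduce the operational configuration.

```python
import numpy as np

def optimal_interpolation(xb, B, y, H, R):
    """Standard OI analysis: xa = xb + B H^T (H B H^T + R)^{-1} (y - H xb)."""
    innovation = y - H @ xb                  # misfit between data and background
    S = H @ B @ H.T + R                      # innovation covariance
    return xb + B @ H.T @ np.linalg.solve(S, innovation)

# Toy 1-D grid with 30 points, 5 of which are observed (assumed setup).
m = 30
grid = np.arange(m, dtype=float)
dist = np.abs(grid[:, None] - grid[None, :])
B = np.exp(-(dist / 5.0) ** 2)               # assumed Gaussian covariance model
obs_idx = np.array([2, 9, 14, 21, 27])
H = np.zeros((len(obs_idx), m))
H[np.arange(len(obs_idx)), obs_idx] = 1.0    # point-sampling observation operator
R = 0.01 * np.eye(len(obs_idx))              # assumed observation error covariance
truth = np.sin(2.0 * np.pi * grid / m)
y = H @ truth                                # noise-free synthetic observations
xb = np.zeros(m)                             # zero background
xa = optimal_interpolation(xb, B, y, H, R)
```

Solving the linear system with `np.linalg.solve` avoids forming an explicit inverse; only a matrix of size p × p, with p the number of observations, is factorized.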

Variational DA schemes have also been widely explored for the assimilation of satellite altimeter data in ocean general circulation models (see, e.g., Ngodock et al., 2015, Benkiran et al., 2021, or Li et al., 2021). Previous works have also considered quasi-geostrophic (QG) dynamics as an approximate and reduced-order dynamical prior for sea surface dynamics, leading to state-of-the-art performance (Ubelmann et al., 2016; Le Guillou et al., 2020).

Whereas model-driven and optimal interpolation approaches are the state-of-the-art solutions for operational products, data-driven strategies have recently emerged as promising alternatives to improve the space–time resolution of interpolated products. We may cite, among others, DINEOF (Data Interpolating Empirical Orthogonal Functions; Beckers and Rixen, 2003a; Alvera-Azcárate et al., 2005, 2009), the analog data assimilation (AnDA; Lguensat et al., 2017; Tandeo et al., 2020), and the recent developments of deep learning schemes (Barth et al., 2019). Beauchamp et al. (2020) have reported a benchmarking experiment, which supported the relevance of data-driven schemes compared with the operational OI product.

Here, we further explore deep learning approaches, and more particularly the 4DVarNet scheme (Fablet and Chapron, 2022), which bridges variational data assimilation and deep learning. Because the analyzed state obtained from OI matches the minimization of the 3D-Var cost function, this establishes the formal link between the statistical DA framework and the optimal control theory used in the variational formulation. When adding time as an extra dimension, 4D-Var generalizes the stationary case of the 3D-Var formulation (see, e.g., Carrassi et al., 2018). It makes the 4DVarNet framework relevant for comparison with traditional DA methods. The BOOST-SWOT 2020 data challenge (https://github.com/ocean-data-challenges/2020a_SSH_mapping_NATL60, last access: 2022) provides a representative benchmarking framework to assess the performance of SSH mapping schemes for nadir-only and nadir plus SWOT (denoted as nadir+swot in the figures) altimetry datasets.

As detailed hereafter, we introduce a parameterization of the 4DVarNet scheme dedicated to SSH interpolation issues and demonstrate its relevance in the context of the benchmarking settings introduced in the BOOST-SWOT 2020 data challenge.

3 Method

This section details the proposed learning-based framework for the interpolation of satellite altimeter data. We first briefly review the 4DVarNet framework recently introduced in Fablet et al. (2021) in Sect. 3.1 and present the proposed parameterization for SSH mapping from nadir and SWOT altimeter data in Sect. 3.2. We describe the proposed learning setting in Sect. 3.3 and the resulting PyTorch package and its associated implementation details in Sect. 3.4.

3.1 4DVarNet framework

The 4DVarNet framework introduced in Fablet et al. (2021) provides a generic approach for the training of 4D-Var models and solvers. The trained models and solvers have been shown to outperform classic 4D-Var solvers for toy case studies, such as Lorenz-63 and Lorenz-96 dynamics, when considering partially observed systems. The 4DVarNet framework can be regarded as an extension, using trainable gradient-based solvers, of the deep learning scheme that led to the best SSH interpolation performance in our previous work (Beauchamp et al., 2020).

From a methodological point of view, the 4DVarNet framework derives an end-to-end neural architecture from an underlying variational data assimilation (DA) formulation (see Sect. 2) as follows:

\begin{equation} J_{\Phi} (x, y, \Omega) = \lambda_{1} \| y - \mathcal{H} (x) \|_{\Omega}^{2} + \lambda_{2} \| x - \Phi (x) \|^{2}, \tag{2} \end{equation}

where λ_1 and λ_2 are predefined or tunable scalar weights, and where we replace the Mahalanobis norms $\| \cdot \|_{R^{-1}}$ and $\| \cdot \|_{Q^{-1}}$ with standard mean square norms for the sake of simplicity. In the regularization term, we substitute the traditional dynamical prior ℳ with a neural operator Φ based on a convolutional architecture. We can then exploit the automatic differentiation tools embedded in deep learning libraries to build the following iterative gradient-based solver, denoted as Γ, for the minimization of the variational cost 𝒥_Φ with respect to the state x:

\begin{equation} \begin{cases} g^{(i + 1)} = \mathrm{LSTM} \left[ \alpha \cdot \nabla_{x} J_{\Phi} (x^{(i)}, y, \Omega),\, h(i),\, c(i) \right] \\ x^{(i + 1)} = x^{(i)} - \mathcal{T} \left( g^{(i + 1)} \right) \end{cases} , \tag{3} \end{equation}

where LSTM denotes a convolutional long short-term memory model (see, e.g., Shi et al., 2015), α is a normalization scalar, h(i) and c(i) denote the internal states of the LSTM, and 𝒯 is a linear mapping. The key idea relies on the capability of the LSTM to learn an adaptive gradient update g⁽ⁱ⁺¹⁾ from the gradient of the variational cost ∇_x𝒥_Φ(x⁽ⁱ⁾) in order to considerably speed up the optimization and reach the corresponding optimal state. This iterative rule, based on a trainable LSTM operator, is similar to meta-learning schemes (Andrychowicz et al., 2016). Because LSTM models can capture long-term dependencies, this amounts to a trainable gradient descent with momentum.
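The structure of this iterative rule can be sketched on a toy one-dimensional problem. The example below is not the 4DVarNet solver: as loudly stated assumptions, the trained convolutional LSTM is swapped for a fixed heavy-ball momentum update (the momentum buffer plays the role of the hidden state), the neural prior Φ is replaced by a periodic three-point moving average, the observation operator is the identity restricted to the observed points, and the gradient is written analytically instead of relying on automatic differentiation. All function names and parameter values are hypothetical.

```python
import numpy as np

def phi(x):
    """Stand-in prior: periodic 3-point moving average (NOT a trained network)."""
    return (np.roll(x, 1) + x + np.roll(x, -1)) / 3.0

def cost(x, y, mask, lam1=1.0, lam2=1.0):
    """Cost of Eq. (2) with H = identity on the observed points (mask = Omega)."""
    obs = np.sum(mask * (y - x) ** 2)
    reg = np.sum((x - phi(x)) ** 2)
    return lam1 * obs + lam2 * reg

def grad_cost(x, y, mask, lam1=1.0, lam2=1.0):
    """Analytic gradient of the cost; valid because phi is linear and symmetric."""
    r = x - phi(x)
    return -2.0 * lam1 * mask * (y - x) + 2.0 * lam2 * (r - phi(r))

def solve(y, mask, n_iter=200, alpha=1.0, beta=0.5, step=0.1):
    """Iterate in the spirit of Eq. (3), with momentum standing in for the LSTM."""
    x = mask * y                      # initial state: observations, zeros elsewhere
    g = np.zeros_like(x)              # momentum buffer, analogue of the hidden state
    for _ in range(n_iter):
        g = beta * g + alpha * grad_cost(x, y, mask)   # update direction g^(i+1)
        x = x - step * g                               # linear mapping T = step * g
    return x

# Toy data: a sine wave observed at every fourth grid point.
n = 64
truth = np.sin(2.0 * np.pi * np.arange(n) / n)
mask = np.zeros(n)
mask[::4] = 1.0
y = mask * truth
x_hat = solve(y, mask)
```

In the learned setting described above, the fixed coefficients `beta` and `step` would be replaced by trainable components, and the gradient would be obtained by automatic differentiation through Φ.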

Figure 1. Sketch of the gradient-based algorithm. The upper-left stack of images corresponds to an example of the temporal sequence of SSH observations, with missing data used as input. The upper-right stack of images is an example of intermediate reconstruction of the SSH gradient at iteration i, while the bottom-left stack of images identifies the updated reconstruction fields used as new inputs after each iteration of the algorithm.
