These authors contributed equally to this work.

Geoscientific models face increasing challenges in exploiting the growing datasets coming from remote sensing. Universal differential equations (UDEs), aided by differentiable programming, provide a new scientific modelling paradigm that enables both complex functional inversions, which can potentially discover new physical laws, and data assimilation from heterogeneous and sparse observations. As a proof of concept, we demonstrate an application of UDEs to learn the creep component of ice flow, i.e. a nonlinear diffusivity differential equation, of a glacier evolution model. By combining a mechanistic model based on a two-dimensional shallow-ice approximation partial differential equation with an embedded neural network, i.e. a UDE, we can learn parts of an equation as nonlinear functions that can then be translated into mathematical expressions. We implemented this modelling framework as

In the past decade, remote sensing observations have sparked a revolution in scientific computing and modelling within Earth sciences, with a particular impact on the field of glaciology

Universal differential equations (UDEs;

Basic representation of universal differential equations (UDEs) and their associated modelling philosophy. UDEs sit at the intersection of physical domain knowledge, represented by differential equations; numerical methods, used to solve the differential equations; and data-driven models, often represented by machine learning.

In glaciology, models have not escaped these general trends. As for machine learning in general, classification methods have been more popular than regression methods (e.g.

In terms of data assimilation and model parameter calibration, many different approaches to obtain differentiable glacier models have been developed

We present an application of universal differential equations, i.e. differential equations with an embedded function approximator (e.g. a neural network, NN). For the purpose of this study, this NN is used to infer a prescribed artificial law determining the ice creep coefficient in Glen's law

In this section we introduce the partial differential equation (PDE) describing the flow of ice under the shallow-ice approximation (SIA), and we present its equivalent universal differential equation (UDE) with an embedded neural network.

We consider the SIA equation to describe the temporal variation in the ice thickness

Gradients here always refer to the spatial derivatives in

Importantly for our analysis, some of the coefficients that play a central role in the ice flow dynamics of a glacier (e.g.

An important property of the SIA equation is that the ice surface velocity

In the last few years there has been an increasing interest in transitioning physical models to a data-driven domain, where unknowns in the laws governing the physical system of interest are identified using machine learning algorithms. The philosophy behind universal differential equations is to embed a rich family of parametric functions inside a differential equation so that the base structure of the differential equation is preserved, but the model gains flexibility when fitting observed data. The members of this family, usually referred to as universal approximators because of their ability to approximate a large class of functions, include, among others, neural networks, polynomial expansions and splines. An example of this is a universal ordinary differential equation
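As a generic illustration of this idea, the following minimal Python sketch embeds a small neural network inside an ordinary differential equation. It is not the model used in this study: the linear decay term standing in for the known physics, the network size, the explicit Euler solver and all function names (`init_mlp`, `mlp`, `ude_rhs`, `solve_euler`) are assumptions chosen for brevity, and the network is left untrained.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Small random weights and zero biases for a fully connected network."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Evaluate the network: tanh hidden layers, linear output layer."""
    a = np.atleast_1d(x)
    for W, b in params[:-1]:
        a = np.tanh(a @ W + b)
    W, b = params[-1]
    return a @ W + b

def ude_rhs(u, params, decay=1.0):
    """Known physics (hypothetical linear decay) plus a learnable term NN(u)."""
    return -decay * u + mlp(params, u)

def solve_euler(u0, params, t_end=1.0, n_steps=100):
    """Integrate du/dt = ude_rhs(u) with the explicit Euler method."""
    dt = t_end / n_steps
    u = np.atleast_1d(np.asarray(u0, dtype=float))
    trajectory = [u.copy()]
    for _ in range(n_steps):
        u = u + dt * ude_rhs(u, params)
        trajectory.append(u.copy())
    return np.array(trajectory)

rng = np.random.default_rng(0)
params = init_mlp(rng, [1, 8, 1])      # 1 input, 8 hidden units, 1 output
trajectory = solve_euler(1.0, params)  # array of shape (n_steps + 1, 1)
```

The structure of the differential equation is fixed by `ude_rhs`, and only the parameters of the embedded network would be adjusted during training. With all weights set to zero, the solution reduces to that of the purely mechanistic equation.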

In this study, the function

We consider a simple synthetic example where we fix

Overview of

The combination of Python tools from the Open Global Glacier Model (OGGM) with the UDE glacier modelling framework in Julia has resulted in the creation of a new Julia package named

The following data are used for the initial conditions of simulated glaciers: a digital elevation model (DEM) for the glacier surface elevation

In order to solve Eq. (

In order to create conditions similar to those one would encounter when using remote sensing observations for the functional inversions, we add Gaussian noise with a mean of 0 and standard deviation of

For the SIA UDE, we substitute

In order to compute the glacier surface mass balance (i.e. a source/sink) for both the PDEs and the UDEs, we used a very simple temperature-index model with a single melt factor and a precipitation factor set to 5 mm d

In order to find a good compromise between computational efficiency and memory usage, we preprocess raw climate files from W5E5

In order to minimize the loss function from Eq. (

Here we compare the evaluation of the gradients using a continuous adjoint method integrated with automatic differentiation and a hybrid method that combines automatic differentiation with finite differences.

For the first method based on pure automatic differentiation, we used the

The second method consists of using AD just for the neural network and finite differences for capturing the variability in the loss function with respect to the parameter

Once the gradient has been computed by one of the previous methods, the total loss function, without any extra regularization penalty on the weights, is optimized using a Broyden–Fletcher–Goldfarb–Shanno (BFGS) optimizer with parameter

In recent years, scientific workflows in Earth and environmental sciences have benefited from transitioning from local to cloud computing

Despite its apparent simplicity, it is not a straightforward problem to invert the spatial function of

Training the UDE with full batches using the continuous adjoint method described in Appendix B2 converges in around 20 epochs. The NN is capable of successfully approximating the prescribed nonlinear function of

We also compared the efficiency of our approach when using the finite-difference scheme. Since this does not require heavy backward operations as the continuous adjoint method does, the finite-difference method runs faster (around 1 min per epoch). However, we encountered difficulties in picking the right step size

The addition of surface mass balance (i.e. a source/sink) to the SIA equation further complicates the functional inversion, particularly from a computational point of view. The accumulation and ablation (subtraction) of mass on the glacier introduce additional noise to the pure ice flow signal. The mass loss in the lower elevations of the glacier slows down ice flow near the tongue, whereas the accumulation of mass in the higher elevations accelerates the ice flow on the upper parts of the glacier.

As an experiment to test the robustness of the functional inversions made by the UDE, we used different surface mass balance models for the reference simulation (i.e. the ground truth) and the UDE. This means that the surface mass balance signal during training is totally different from the one in the ground truth. We achieved this by using a temperature-index model with a melt factor of 4 mm d

This shows the robustness of this modelling approach, particularly when the ice surface velocities

Differences in surface elevation for a 5-year simulation, coming from the different applied surface mass balance rates, between the ground truth data and the training of the UDE. Despite the noise coming from the different surface mass balance signals, the UDE is perfectly capable of learning the underlying nonlinear function of

This first implementation of a UDE on glacier ice flow modelling serves as a baseline to tackle more complex problems with large datasets. One main simplification of this current setup needs to be overcome in order to make the model useful at a global scale for exploring and discovering empirical laws. In this study, only ice deformation (creep) has been taken into account in the diffusivity. Basal sliding, at the ice–bedrock interface, will have to be included in the SIA equation to accommodate different configurations and behaviours of many glaciers around the world. Therefore, a logical next step would be to infer

Interestingly, depending on the ice surface velocity observations used, a numerical solver and a UDE are not strictly required for a functional inversion. For a single snapshot of ice surface velocities between two dates (e.g. 2017–2018 in

Alternatively, the optimization of the NN for ice rheology inference based on ice surface velocities has proved to be robust to the noise added by the SMB component. This serves to validate an alternative glacier ice dynamic model calibration strategy to that of the majority of large-scale glacier models (e.g. OGGM and GloGEM;

Automatic differentiation is the centrepiece of the modelling framework presented in this study. In the Julia programming language, multiple AD packages exist, which are compatible with both differential equation and neural network packages, as part of the SciML ecosystem. Each package has advantages and disadvantages, which make them suitable for different tasks. In our case,

Overall, Julia's vision for AD is highly ambitious: it attempts to perform AD directly on source code, with minimal impact on the user side and with the possibility of easily switching AD back ends. From a technical point of view, this is much more complex to achieve than hard-coded gradients linked to specific operators, an approach followed by JAX

In this work, we model glacier ice flow using a two-dimensional SIA equation described by Eq. (

A recent alternative to such a computationally heavy approach is the use of convolutional neural networks as emulators for the solutions of the differential equations of a high-order ice flow model

The combination of solvers for differential equations with modern machine learning techniques opens the door to new methodological questions that include the standard ones about the design of the machine learning method (loss function, optimization method, regularization) but also new phenomena that emerge purely by the use of numerical solutions of differential equations in the loss function. Although this intersection between data science and dynamical systems has been widely explored (see

Another interesting question pertains to the training and regularization of UDEs and related physics-informed neural networks. During training, we observed that the neural network never overfitted the noisy version of prescribed law

Despite the ever-increasing numbers of new Earth observations coming from remote sensing, it is still extremely challenging to translate complex, sparse, noisy data into actual models and physical laws. Paraphrasing

We presented

This study can serve as a baseline for other researchers interested in applying UDEs to similar nonlinear diffusivity problems. It also provides a code base to be used as a backbone to explore new parametrizations for large-scale glacier modelling, such as for glacier ice rheology, basal sliding or more complex hybrid surface mass balance models.

Solving Eq. (

We consider a uniform grid on points

An important consideration when working with numerical schemes for differential equations is the stability of the method. Here we consider just explicit methods, although the spatial discretization is the same for implicit methods. Explicit methods for the SIA equation are conditionally stable. In order to get stability, we need to undertake the following

Evaluate the diffusivity in a staggered grid

Choose a temporal step size

Assume we know the value of

Compute (another) average diffusivity but now on the edges of the primal grid (blue arrows in Figure

Compute the diffusive part of the SIA equations on the point in the primal grid

Update the value of
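The update steps listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation used in this study: it assumes the standard SIA form dH/dt = b + ∇·(D∇S) with D = Γ H^(n+2) |∇S|^(n-1) and Γ = 2A(ρg)^n/(n+2), a Glen exponent n = 3, an ice density of 900 kg m^-3, a creep coefficient A in Pa^-3 yr^-1 and the common explicit stability bound Δt ≤ Δx²/(4 max D) with a safety factor. All function names are hypothetical.

```python
import numpy as np

RHO = 900.0    # ice density (kg m^-3), assumed value
G = 9.81       # gravitational acceleration (m s^-2)
N_GLEN = 3.0   # Glen exponent, assumed value

def diffusivity(H, B, A, dx, dy):
    """Diffusivity D on the staggered grid, plus surface slopes on primal edges."""
    S = B + H                                  # surface elevation
    dSdx = np.diff(S, axis=0) / dx             # slopes on x-edges, shape (nx-1, ny)
    dSdy = np.diff(S, axis=1) / dy             # slopes on y-edges, shape (nx, ny-1)
    # Average slopes and thickness onto the staggered nodes, shape (nx-1, ny-1)
    dSdx_s = 0.5 * (dSdx[:, :-1] + dSdx[:, 1:])
    dSdy_s = 0.5 * (dSdy[:-1, :] + dSdy[1:, :])
    H_s = 0.25 * (H[:-1, :-1] + H[1:, :-1] + H[:-1, 1:] + H[1:, 1:])
    gamma = 2.0 * A * (RHO * G) ** N_GLEN / (N_GLEN + 2.0)
    slope = np.sqrt(dSdx_s**2 + dSdy_s**2)
    D = gamma * H_s ** (N_GLEN + 2.0) * slope ** (N_GLEN - 1.0)
    return D, dSdx, dSdy

def stable_dt(D, dx, dy, safety=0.5):
    """Time step below the explicit diffusion bound dx^2 / (4 max D)."""
    return safety * min(dx, dy) ** 2 / (4.0 * max(D.max(), 1e-12))

def sia_step(H, B, A, dx, dy, dt, mb=0.0):
    """One explicit update of the ice thickness on the interior of the primal grid."""
    D, dSdx, dSdy = diffusivity(H, B, A, dx, dy)
    # Second average: staggered diffusivity onto the edges of the primal grid
    D_x = 0.5 * (D[:, :-1] + D[:, 1:])         # shape (nx-1, ny-2)
    D_y = 0.5 * (D[:-1, :] + D[1:, :])         # shape (nx-2, ny-1)
    Fx = D_x * dSdx[:, 1:-1]                   # D * dS/dx on x-edges
    Fy = D_y * dSdy[1:-1, :]                   # D * dS/dy on y-edges
    div = np.diff(Fx, axis=0) / dx + np.diff(Fy, axis=1) / dy
    H_new = H.copy()
    # Add mass balance and clip to keep the ice thickness non-negative
    H_new[1:-1, 1:-1] = np.maximum(H[1:-1, 1:-1] + dt * (div + mb), 0.0)
    return H_new
```

A typical call sequence would evaluate `diffusivity` once to obtain `D`, pick `dt = stable_dt(D, dx, dy)` and then call `sia_step`. Because the update is written in flux form, total ice volume is conserved when the mass balance is zero and no ice reaches the domain boundary.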

Staggered grid used to solve the shallow-ice approximation PDE. The black squares represent the primal grid, empty circles the staggered grid, diamonds the points in the grid where the diffusivity evaluated in the staggered grid is averaged (Eqs.

In practice, the step size

In this section we provide a high-level explanation of the two methods we used to compute the gradient of functions involving solutions of differential equations, namely finite differences and continuous adjoint sensitivity analysis. Consider a system of ordinary differential equations given by

The simplest way of evaluating a derivative is by computing the difference between the function evaluated at a given point and at a slightly perturbed point and dividing by the size of the perturbation. In the case of a loss function, we can approximate

However, this approach has a series of problems. The first one is how it scales with the number of parameters
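The sensitivity of finite differences to the choice of step size can be illustrated with a toy problem. The sketch below is only an example under assumptions: the loss is the squared misfit at the final time of the scalar equation du/dt = -θu solved with explicit Euler, for which the exact gradient of the discrete solution is known in closed form. The function names and the three step sizes are hypothetical choices.

```python
import numpy as np

def loss(theta, u0=1.0, u_obs=0.5, t_end=2.0, n_steps=200):
    """Squared misfit at t_end for du/dt = -theta * u solved with explicit Euler."""
    dt = t_end / n_steps
    u = u0
    for _ in range(n_steps):
        u = u + dt * (-theta * u)
    return (u - u_obs) ** 2

def exact_gradient(theta, u0=1.0, u_obs=0.5, t_end=2.0, n_steps=200):
    """Closed-form gradient of the discrete loss, using u_N = u0 (1 - theta dt)^N."""
    dt = t_end / n_steps
    u_end = u0 * (1.0 - theta * dt) ** n_steps
    du_dtheta = -u0 * n_steps * dt * (1.0 - theta * dt) ** (n_steps - 1)
    return 2.0 * (u_end - u_obs) * du_dtheta

def forward_difference(fun, theta, eps):
    """One-sided finite-difference approximation of d fun / d theta."""
    return (fun(theta + eps) - fun(theta)) / eps

theta = 0.5
# Absolute error of the finite-difference gradient for three step sizes
errors = {eps: abs(forward_difference(loss, theta, eps) - exact_gradient(theta))
          for eps in (1e-1, 1e-6, 1e-17)}
```

A step that is too large is dominated by truncation error, whereas a step below floating-point resolution leaves the parameter unchanged and returns a zero gradient. Each partial derivative also costs one extra solve of the differential equation, so the total cost grows linearly with the number of parameters.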

Consider an integrated loss function of the form

solve the original differential equation

solve the backward adjoint differential equation Eq. (

compute the simplified version of the full gradient in Eq. (
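The three steps above can be sketched for a scalar toy problem. The example below is a minimal illustration and not the implementation used in this study. It assumes the dynamics du/dt = f(u, θ) = -θu, the integrated loss L = ∫ (u - d)² dt with a constant target d, an initial condition that does not depend on θ, and one common sign convention for the adjoint: dλ/dt = -(∂f/∂u)λ - ∂h/∂u with λ(T) = 0, and dL/dθ = ∫ λ ∂f/∂θ dt. The fixed-step RK4 solver and all names are hypothetical.

```python
import numpy as np

def f(u, theta):          # right-hand side of du/dt = f(u, theta)
    return -theta * u

def df_du(u, theta):      # partial derivative of f with respect to u
    return -theta

def df_dtheta(u, theta):  # partial derivative of f with respect to theta
    return -u

def dh_du(u, d):          # derivative of the integrand h = (u - d)^2 w.r.t. u
    return 2.0 * (u - d)

def trapezoid(y, t):
    """Trapezoidal quadrature of samples y on the grid t."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

def adjoint_gradient(theta, u0=1.0, d=0.5, t_end=2.0, n=2000):
    t = np.linspace(0.0, t_end, n + 1)
    dt = t[1] - t[0]
    # Step 1: solve the original differential equation forward in time (RK4)
    u = np.empty(n + 1)
    u[0] = u0
    for k in range(n):
        k1 = f(u[k], theta)
        k2 = f(u[k] + 0.5 * dt * k1, theta)
        k3 = f(u[k] + 0.5 * dt * k2, theta)
        k4 = f(u[k] + dt * k3, theta)
        u[k + 1] = u[k] + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    # Step 2: solve the adjoint equation backward in time from lam(T) = 0
    def g(lam_, u_):
        return -df_du(u_, theta) * lam_ - dh_du(u_, d)
    lam = np.empty(n + 1)
    lam[n] = 0.0
    for k in range(n, 0, -1):
        u_mid = 0.5 * (u[k] + u[k - 1])   # linear interpolation of the state
        k1 = g(lam[k], u[k])
        k2 = g(lam[k] - 0.5 * dt * k1, u_mid)
        k3 = g(lam[k] - 0.5 * dt * k2, u_mid)
        k4 = g(lam[k] - dt * k3, u[k - 1])
        lam[k - 1] = lam[k] - dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    # Step 3: assemble the gradient as an integral over the trajectory
    grad = trapezoid(lam * df_dtheta(u, theta), t)
    loss = trapezoid((u - d) ** 2, t)
    return loss, grad
```

For example, `adjoint_gradient(0.5)` returns the loss and its derivative with respect to θ using one forward and one backward solve, regardless of the number of parameters. In this toy setting the result can be checked against a central finite difference of the returned loss.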

Table C1 includes all the details of the glaciers used in this study to train the UDE. Glaciers were picked randomly across the world to sample different climates with long-term air temperatures ranging from

Table of glaciers used for training the UDE. Grid size and grid res (i.e. resolution) indicate the adaptive constant grid used by OGGM to adapt all gridded data for each glacier.

The source code of

JB conceived the project, designed the study, developed the model, wrote the paper and made the figures. FS designed the study, developed the model, investigated the sensitivity methods and wrote the paper. FM helped with retrieving the datasets to force the model with OGGM, contributed to the coupling between both models and provided glaciological advice. RL helped with the experiment design and technical choices. BW provided glaciological advice and helped design the project. FP provided advice on the methods and software development and helped design the project. All authors contributed to the writing of the paper by providing comments and feedback.

At least one of the (co-)authors is a member of the editorial board of

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

We would like to thank Kurt Cuffey for valuable discussions and comments on glacier modelling and physics; Harry Zekollari for all the conversations and help related to large-scale glacier modelling; Per-Olof Persson for the discussions on differential equations and numerical solvers; Giles Hooker for useful feedback regarding the statistical analysis; Chris Rackauckas for all the insights and discussions regarding scientific machine learning in the Julia ecosystem; and the Julia community for the technical support, bug hunting, and the interesting discussions in the Julia Slack and Discourse. We also thank the Jupyter Meets the Earth and 2i2c team (Erik Sundell, Yuvi Panda and many others) for helping with the infrastructure of the JupyterHub. Jordi Bolibar would like to thank CNRS – UGA, Institut des Géosciences de l'Environnement, UMR 5001, in Grenoble (France) for hosting him as an invited researcher. We thank Douglas Brinkerhoff and the anonymous reviewer for their constructive comments that helped improve the quality of this paper.

This research has been supported by the Nederlandse Organisatie voor Wetenschappelijk Onderzoek, Stichting voor de Technische Wetenschappen (Vidi grant 016.Vidi.171.063) and the National Science Foundation (EarthCube programme under awards 1928406 and 1928374).

This paper was edited by Ludovic Räss and reviewed by Douglas Brinkerhoff and one anonymous referee.