Estimating parameters of chaotic geophysical models is challenging due to their inherent unpredictability. These models cannot be calibrated with standard least squares or filtering methods if observations are temporally sparse. Obvious remedies, such as averaging over temporal and spatial data to characterize the mean behavior, do not capture the subtleties of the underlying dynamics. We perform Bayesian inference of parameters in high-dimensional and computationally demanding chaotic dynamical systems by combining two approaches: (i) measuring model–data mismatch by comparing chaotic attractors and (ii) mitigating the computational cost of inference by using surrogate models. Specifically, we construct a likelihood function suited to chaotic models by evaluating a distribution over distances between points in the phase space; this distribution defines a summary statistic that depends on the geometry of the attractor, rather than on pointwise matching of trajectories. This statistic is computationally expensive to simulate, compounding the usual challenges of Bayesian computation with physical models. Thus, we develop an inexpensive surrogate for the log likelihood with the local approximation Markov chain Monte Carlo method, which in our simulations reduces the time required for accurate inference by orders of magnitude. We investigate the behavior of the resulting algorithm with two smaller-scale problems and then use a quasi-geostrophic model to demonstrate its large-scale application.

Time evolution of many geophysical dynamical systems is chaotic. Chaoticity means that the state of a system sufficiently far in the future cannot be predicted, even if we know the dynamics and the initial conditions very precisely. Commonly used examples of chaotic systems include climate, weather, and the solar system.

A system being chaotic does not mean that it is random: the dynamics of models of chaotic systems are still determined by parameters, which may be either deterministic or random

Parameters of a dynamical system model are most commonly inferred by minimizing a cost function that captures model–observation mismatch

This straightforward strategy – for instance, using the squared Euclidean distance between model outputs and data to construct a Gaussian likelihood – is, however, inadequate for chaotic models, where small changes in parameters, or even in the tolerances used for numerical solvers, can lead to arbitrarily large differences in model outputs

The present work combines two recent methods to tackle the problems caused by model chaoticity and computational cost. Chaoticity is tamed by using
correlation integral likelihood (CIL)

The CIL method is based on the concept of fractal dimension from mathematical physics, which, broadly speaking, characterizes the space-filling properties of the trajectory of a dynamical system. Earlier work

The LA-MCMC method

The rest of this paper is organized as follows. Section

Traditional parameter estimation methods, which directly utilize the model–observation mismatch, constrain the modeling to limited time intervals when the model is chaotic. This avoids the eventual divergence (chaotic behavior) of orbits that are initially close to each other.
A classical example is variational data assimilation for weather prediction, where the initial states of the model are estimated using observational data and algorithms such as 4D-Var, after which a short-time forecast can be simulated

Sequential data assimilation methods, such as the Kalman filter (KF)

Filtering-based approaches generally introduce additional tuning parameters, such as the length of the assimilation time window, the model error covariance matrix, and covariance inflation parameters.
These choices have an impact on model parameter estimation and may introduce bias. Indeed, as discussed in

Climate model parameters have in previous studies

Computational limitations make applying algorithms such as MCMC challenging for weather and climate models.
Generating even very short MCMC chains
may require methods such as parallel sampling and early rejection for tractability

Several Monte Carlo methods have been presented to tackle expensive or intractable likelihoods; see, e.g.,

In this work, we employ a different summary statistic, where the observations are considered as samples from the underlying attractor. Due to the nature of the summary statistic used in CIL, the observation time stamps are not explicitly used. This allows arbitrarily sparse observation time series, and consecutive observations may be farther apart than any window over which the system remains predictable. To the best of our knowledge, parameter estimation in this setting has not been discussed in the literature. Another difference from the synthetic likelihood approach is that the latter involves regenerating data to compute the likelihood at every new model parameter value, which would be computationally infeasible in our setting.

We first construct a likelihood function that models the observations by comparing certain summary statistics of the observations to the corresponding statistics of a trajectory simulated from the chaotic model. As a source of statistics, we will choose the correlation integral, which depends on the fractal dimension of the chaotic attractor. Unlike other statistics – such as the ergodic averages of a trajectory – the correlation integral is able to constrain the parameters of a chaotic model

Let us denote by

Using the model–observation mismatch at a collection of times

Instead, we need a summary statistic that retains information relevant for parameter estimation but still defines a computationally tractable likelihood. To this end,

We will use the CIL to evaluate the “difference” between two chaotic attractors. For this purpose, we will first describe how to statistically characterize the geometry of a given attractor, given suitable observations

Intuitively, the CIL thus interprets observations of a chaotic trajectory as samples from a fixed distribution over phase space. It allows the time between observations to be arbitrarily large – importantly, much longer than the window over which the system remains predictable.

Now we describe the CIL construction in detail. Suppose that we have collected a data set

Now we define

The Gaussian distribution of

Because the feature vectors

Note that the CIL approach described above already reduces the computational cost of inference by only requiring simulation of the (potentially expensive) chaotic model for a single epoch. We compare each epoch of the data to the same single-epoch model output. Each of these comparisons results in an estimate of the log likelihood, which we then average over data epochs. A larger data set
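As an illustrative sketch of this construction (in Python for brevity; the reference implementation in the Supplement is MATLAB), the feature vector collects the fractions of pairwise distances falling below each bin radius, and the log likelihood is a Gaussian in these features, averaged over data epochs. The radii, epoch sizes, and random stand-in trajectories below are hypothetical placeholders for real model output:

```python
import numpy as np

def correlation_vector(a, b, radii):
    """Modified correlation sum: fraction of cross-pairs of points from the
    point clouds a and b that lie closer than each bin radius R."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return np.array([(d < R).mean() for R in radii])

rng = np.random.default_rng(1)
radii = np.logspace(-1, 1, 10)            # hypothetical bin radii R_1 < ... < R_M

# Random stand-ins for one simulated trajectory and the data epochs.
sim = rng.standard_normal((200, 3))
epochs = [rng.standard_normal((200, 3)) for _ in range(12)]

# Empirical Gaussian for the feature vectors, estimated from the data epochs.
feats = np.array([correlation_vector(e, sim, radii) for e in epochs])
mu = feats.mean(axis=0)
cov = np.cov(feats, rowvar=False) + 1e-9 * np.eye(len(radii))  # jitter for stability

def cil_loglike(traj, data_epochs):
    """Gaussian log likelihood of the features, averaged over data epochs."""
    ll = 0.0
    for e in data_epochs:
        r = correlation_vector(e, traj, radii) - mu
        ll -= 0.5 * r @ np.linalg.solve(cov, r)
    return ll / len(data_epochs)
```

Because only distances between phase-space points enter the statistic, the observation time stamps play no role, which is what permits arbitrarily sparse observations.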

Moreover, the initial values are randomized for all simulations, and sampling is started only after the model has integrated beyond the initial, predictable time window. The independence of the sampled parameter posteriors from the initial values was verified, both here and in earlier works, by repeated experiments.

Our approach is broadly similar to the synthetic likelihood method (e.g.,

Even with the developments described above, estimating the CIL at each candidate parameter value

First introduced in

Here, we briefly summarize one step of the LA-MCMC construction and refer to

“Refinement” in stage (i) consists of adding a computationally intensive log-likelihood evaluation at some parameter value

Intuitively, if the surrogate converges to the true log likelihood, then the samples generated with LA-MCMC will (asymptotically) be drawn from the true posterior distribution. After any finite number of steps, however, the surrogate error introduces a bias into the sampling algorithm. The refinement strategy must therefore ensure that this bias is not the dominant source of error. At the same time, refinements must occur infrequently to ensure that LA-MCMC is computationally cheaper than using the true log likelihood.
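The following deliberately simplified sketch (Python, for illustration only) conveys the flavor of such a sampler: the surrogate is a local linear fit through the nearest previously evaluated points, and, instead of the error-indicator-driven refinement of the actual LA-MCMC algorithm, a randomly triggered refinement schedule with decaying probability is used. The design size, step size, and schedule are all hypothetical:

```python
import numpy as np

def local_surrogate(theta, pts, vals, k=6):
    """Local linear fit of the log likelihood from the k nearest evaluated points."""
    idx = np.argsort(np.linalg.norm(pts - theta, axis=1))[:k]
    X = np.hstack([np.ones((k, 1)), pts[idx] - theta])
    coef, *_ = np.linalg.lstsq(X, vals[idx], rcond=None)
    return coef[0]                                 # fitted value at theta itself

def la_mcmc(loglike, theta0, n_steps, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    pts = theta + 0.1 * rng.standard_normal((8, len(theta)))   # initial design
    vals = np.array([loglike(p) for p in pts])
    chain = [theta.copy()]
    for t in range(1, n_steps + 1):
        prop = theta + step * rng.standard_normal(len(theta))
        if rng.random() < 0.1 / np.sqrt(t):        # decaying refinement schedule
            pts = np.vstack([pts, prop])
            vals = np.append(vals, loglike(prop))  # one expensive true evaluation
        a = local_surrogate(theta, pts, vals)
        b = local_surrogate(prop, pts, vals)
        if np.log(rng.random()) < b - a:           # Metropolis accept/reject
            theta = prop
        chain.append(theta.copy())
    return np.array(chain), len(vals)              # chain and no. of true evaluations

# Toy target: a standard Gaussian log likelihood in two dimensions.
chain, n_evals = la_mcmc(lambda th: -0.5 * float(th @ th), np.zeros(2), 400)
```

Even in this toy version, the number of expensive log-likelihood evaluations grows much more slowly than the chain length, which is the source of the reported speed-ups.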

Our examples use an adaptive proposal density

The parameters of the algorithm are fixed as given in

This section contains numerical experiments to illustrate the methods introduced in the previous sections. As a large-scale example, we characterize the posterior distribution of parameters in the two-layer quasi-geostrophic (QG) model. The computations needed to characterize the posterior distribution with standard MCMC methods in this example would be prohibitive without massive computational resources and are therefore omitted. In contrast, we will show that the LA-MCMC method is able to simulate from the parameter posterior distribution.

Before presenting this example, we first demonstrate that the posteriors produced by LA-MCMC agree with those obtained via exact MCMC sampling methods in cases where the latter are computationally tractable using two examples: the classical Lorenz 63 system and the higher-dimensional Kuramoto–Sivashinsky (KS) model. In both examples, we quantify the computational savings due to LA-MCMC, and in the second we introduce additional ways to speed up computation using parallel (GPU) integration.

Let

For numerical tests, one can either use one long time series or integrate over a shorter time interval several times with different initial values to create the training set for the likelihood.
For these experiments, the latter method was used with

The range of the bin radii

As always with histograms, the number of bins

To balance the possibly different magnitudes of the components of the state vector, each component is scaled and shifted to the range

In all three experiments, we create MCMC chains of length

The Lorenz 63 model was integrated with a standard Runge–Kutta solver. The numerical solution of the KS model is based on our in-house fast Fourier transform (FFT)-based solver, which runs on the GPU and is built around the Nvidia compute unified device architecture (CUDA) toolchain and the cuFFT library (part of the CUDA ecosystem). The quasi-geostrophic model employs a semi-Lagrangian solver and runs entirely on the CPU, but the code has been significantly optimized, with performance-critical parts, such as the advection operator, compiled using the Intel Implicit SPMD Program Compiler (ISPC) with support for Advanced Vector Extensions 2 (AVX2) vectorization.

We use the classical three-dimensional Lorenz 63 system

The reference data were generated with parameter values
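As a minimal sketch (Python here, for illustration; any standard Runge–Kutta implementation works the same way), the Lorenz 63 right-hand side with its classical textbook parameter values can be integrated as follows; the initial state, step size, and trajectory length are arbitrary choices:

```python
import numpy as np

def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the classical Lorenz 63 system."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z), x * y - beta * z])

def rk4(f, state, dt, n_steps):
    """Standard fourth-order Runge-Kutta integration, returning the trajectory."""
    traj = np.empty((n_steps + 1, len(state)))
    traj[0] = state
    for i in range(n_steps):
        k1 = f(traj[i])
        k2 = f(traj[i] + 0.5 * dt * k1)
        k3 = f(traj[i] + 0.5 * dt * k2)
        k4 = f(traj[i] + dt * k3)
        traj[i + 1] = traj[i] + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return traj

traj = rk4(lorenz63, np.array([1.0, 1.0, 1.0]), dt=0.01, n_steps=2000)
```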

The set of vectors

Left: for all combinations of

Two-dimensional posterior marginal distributions of the parameters of the Lorenz 63 model obtained with LA-MCMC and AM.

Pairwise two-dimensional marginals of the parameter posterior are shown in Fig.

To get an idea of the computational savings achieved with LA-MCMC, the computation of the MCMC chains of length

Comparison of the cumulative number of full likelihood evaluations when using AM (black line) and LA-MCMC (colored lines). Each colored line corresponds to a different chain obtained with LA-MCMC using the same likelihood.

The second example is the 256-dimensional Kuramoto–Sivashinsky (KS) partial differential equation (PDE) system.
The purpose of this example is to introduce ways to improve computational efficiency via piecewise parallel integration over the time interval of the given data. We also
demonstrate how decreasing the number of observed components impacts the accuracy of parameter estimation.
Even though the posterior evaluation proves to be relatively expensive, direct verification of the results with those obtained by using standard adaptive MCMC is still possible.
The Kuramoto–Sivashinsky model is given by the fourth-order PDE:

Assume that the solution for this problem can be represented by a truncation of the Fourier series
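A minimal pseudospectral sketch (plain Python/NumPy, not the in-house CUDA/cuFFT solver used in this work) illustrates the Fourier-truncation idea: the stiff linear terms are advanced implicitly through their Fourier symbol and the nonlinearity explicitly, without dealiasing, relying on the strong fourth-order damping at high wavenumbers. The domain length, resolution, step size, and initial condition are hypothetical:

```python
import numpy as np

# Sketch for u_t = -u u_x - u_xx - u_xxxx on a periodic domain [0, L].
N, L, dt = 128, 32.0 * np.pi, 0.01
x = np.arange(N) * L / N
k = 2.0 * np.pi * np.fft.rfftfreq(N, d=L / N)   # angular wavenumbers for rfft
lin = k**2 - k**4                               # Fourier symbol of -u_xx - u_xxxx

uh = np.fft.rfft(np.cos(x / 16.0) * (1.0 + np.sin(x / 16.0)))  # smooth start
for _ in range(500):                            # integrate to t = 5
    u = np.fft.irfft(uh, n=N)
    nonlin = -0.5j * k * np.fft.rfft(u * u)     # Fourier transform of -u u_x
    uh = (uh + dt * nonlin) / (1.0 - dt * lin)  # semi-implicit Euler step
u = np.fft.irfft(uh, n=N)
```

A first-order splitting like this is only a sketch; production solvers for the KS equation typically use higher-order exponential or Runge–Kutta time stepping.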

A total of 64 epochs of the 256-dimensional KS model are integrated over the time interval

The time needed to integrate the model up to

Parameter values of the four parameter vectors used in the forward KS model simulation examples in Fig.

Parameter posterior distributions from the KS system, produced with MCMC both with and without the local approximation surrogate, are shown in Fig.

Model trajectories from simulations with four different parameter vectors are shown in Fig.

Posterior distribution of the parameters of the KS system. The parameter values are shown in Table

Example model trajectories from the KS system. Panel (1) shows simulation using the true parameters, the parameters used for (4) are inside the posterior distribution, and (2) and (3) are generated from simulations with parameters outside the posterior distribution, shown in Fig.

Additional experiments were performed to evaluate the stability of the method when not all of the model states were observed. Keeping the setup otherwise fixed, the number of observed elements of the state vector was reduced stepwise from the full 256 to 128, 64, and 32.
The resulting MCMC chains are presented in Fig.

Comparison between the KS system's posterior distribution in cases where all or only a part of the states are observed.

The methodology is here applied to a computationally intensive model, where a brute-force parameter posterior estimation would be too time consuming. We employ the well-known quasi-geostrophic model

The QG model approximates the behavior on a latitudinal “stripe” at two given atmospheric heights, projected onto a two-layered cylinder.
The model geometry implies periodic boundary conditions, seamlessly stitching together the extreme eastern and western parts of the rectangular spatial domain with coordinates

An example of the layer structure of the two-layer quasi-geostrophic model. The terms

In a non-dimensional form, the QG system can be written as

An example of the 6050-dimensional state of the quasi-geostrophic model. The contour lines for both the stream function and potential vorticity are shown for both layers. Note the cylindrical boundary conditions.

It is assumed that the motion determined by the model is geostrophic, essentially meaning that the potential vorticity of the flow is preserved on both layers:

The numerical integration of this system is carried out using a semi-Lagrangian scheme,
where the potential vorticities

Finally, the velocity field is updated by Eq. (
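The essence of a semi-Lagrangian step can be sketched in one dimension (Python, for illustration; the actual QG solver advects the two-dimensional potential vorticity fields and uses its own interpolation): each grid point traces its departure point upstream along the velocity field and interpolates the advected quantity there. The grid size, velocity, and pulse below are hypothetical:

```python
import numpy as np

def semi_lagrangian_step(q, u, dx, dt):
    """One semi-Lagrangian step for q_t + u q_x = 0 on a periodic grid:
    trace each grid point's departure point upstream, then interpolate
    q there linearly."""
    n = len(q)
    xi = np.arange(n) - u * dt / dx            # departure points, in index units
    i0 = np.floor(xi).astype(int)
    w = xi - i0                                # linear interpolation weights
    return (1.0 - w) * q[i0 % n] + w * q[(i0 + 1) % n]

n, u = 128, 1.0
dx, dt = 1.0 / n, 1.0 / 250.0                  # 250 steps advect once around
x = np.arange(n) * dx
q0 = np.exp(-200.0 * (x - 0.5) ** 2)           # Gaussian pulse
q = q0.copy()
for _ in range(250):
    q = semi_lagrangian_step(q, u, dx, dt)
```

Because the departure-point interpolation is unconditionally stable, the scheme tolerates much longer time steps than explicit Eulerian advection, at the cost of some interpolation diffusion.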

For estimating model parameters from synthetic data, a reference data set is created with 64 epochs each containing

The model state is characterized by two distinct fields, the vorticities and stream functions, which are naturally dependent on each other. As shown in

The Gaussian likelihood of the state is created by stacking these two feature vectors one after another.

The normality of the resulting

For parameter estimation, inferring the layer heights from synthetic data is considered. The reference data set with

In the experiments performed, the number of forward model evaluations needed ranged in the interval

The clearly non-Gaussian posterior distribution of the

Bayesian parameter estimation with computationally demanding computer models is highly non-trivial. The associated computational challenges often become insurmountable when the model dynamics are chaotic. In this work, we showed that it is possible to overcome these challenges by combining the CIL with an MCMC method based on local surrogates of the log-likelihood function (LA-MCMC). The CIL captures changes in the geometry of the underlying attractor of the chaotic system, while local approximation MCMC makes generating long MCMC chains based on this likelihood tractable, with computational savings of roughly 2 orders of magnitude, as shown in Table

Summary of results. This table shows the speed-up due to the CIL/LA-MCMC combination. Since running the quasi-geostrophic model 100 000 times was not possible, the nominal length of the MCMC chain and the speed-up due to LA-MCMC are reported in parentheses in the last column. The numbers of forward model evaluations with LA-MCMC (second row) are rough averages over several MCMC simulations.

There are many potential directions for extension of this work. First, it should be feasible to run parallel LA-MCMC chains that share model evaluations in a single evaluated set; doing so can accelerate the construction of accurate local surrogate models, as demonstrated in

While answering these questions will require further work, we believe the research presented in this paper provides a promising and reasonable step towards estimating parameters in the context of expensive operational models.

The MATLAB code that documents the CIL and LA-MCMC approaches is available in the Supplement. Forward model code for performing model simulations (Lorenz, Kuramoto–Sivashinsky, and quasi-geostrophic model) is also available in the Supplement.

The data were created using the code provided in the Supplement.

The supplement related to this article is available online at:

HH and YM designed the study with input from all authors. SS, HH, AD, and JS combined the CIL and LA-MCMC methods for carrying out the research. AB wrote and provided implementations of the KS and QG models for GPUs, including custom numerics and testing. SS wrote the CIL code and the version of LA-MCMC used (based on earlier work by Antti Solonen), and carried out the simulations. All authors discussed the results and shared the responsibility of writing the manuscript. SS prepared the figures.

The authors declare that they have no conflict of interest.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by the Centre of Excellence of Inverse Modelling and Imaging (CoE), Academy of Finland, decision no. 312 122. Sebastian Springer was supported by the Academy of Finland, project no. 334 817. Youssef Marzouk and Andrew Davis were supported by the US Department of Energy, Office of Advanced Scientific Computing Research (ASCR), SciDAC program, as part of the FASTMath Institute.

This research has been supported by the Academy of Finland (grant no. 312122).

This paper was edited by Rohitash Chandra and reviewed by two anonymous referees.