Efficient Bayesian inference for large chaotic dynamical systems



Abstract.
Estimating parameters of chaotic geophysical models is challenging due to these models' inherent unpredictability. With temporally sparse long-range observations, these models cannot be calibrated using standard least squares or filtering methods. Obvious remedies, such as averaging over temporal and spatial data to characterize the mean behavior, do not capture the subtleties of the underlying dynamics. We perform Bayesian inference of parameters in high-dimensional and computationally demanding chaotic dynamical systems by combining two approaches: (i) measuring model-data mismatch by comparing chaotic attractors, and (ii) mitigating the computational cost of inference by using surrogate models. Specifically, we construct a likelihood function suited to chaotic models by evaluating a distribution over distances between points in the phase space; this distribution defines a summary statistic that depends on the attractor's geometry, rather than on pointwise matching of trajectories. This statistic is computationally expensive to simulate, compounding the usual challenges of Bayesian computation with physical models. Thus we develop an inexpensive surrogate for the log-likelihood via local approximation Markov chain Monte Carlo, which in our simulations reduces the time required for accurate inference by orders of magnitude.

Introduction

A standard approach to forecasting with chaotic models is data assimilation for weather prediction, where the initial states of the model are estimated using observational data and algorithms such as 4DVar, after which a short-time forecast can be simulated (Asch et al., 2016).
Sequential data assimilation methods, such as the Kalman filter (Law et al., 2015), allow us to recursively update both the model state and the model parameters by conditioning them on observational data obtained over sufficiently short time scales. Here, ensemble filtering methods provide a useful alternative to the extended Kalman filter or variational methods; see Houtekamer and Zhang (2016) for a recent review of various ensemble variants. With methods such as state augmentation, model parameters can be updated as part of the filtering problem (Liu and West, 2001). Alternatively, the state values can be "integrated out" to obtain the marginal likelihood over the model parameters; see Durbin and Koopman (2012). In Hakkarainen et al. (2012), this so-called filter likelihood approach was used to estimate parameters of chaotic systems. Yet essentially all filtering-based approaches introduce additional tuning parameters, such as the length of the assimilation time window, the model error covariance matrix, and covariance inflation parameters. These choices have an impact on model parameter estimation and may introduce bias. Indeed, as discussed in Hakkarainen et al. (2013), changing the filtering method requires updating the parameters of the dynamical model. Operational ensemble prediction systems (EPS) can be used even in the absence of ensemble filtering. Parameter calibration methods such as those in Järvinen et al. (2011) and Laine et al. (2011) have been applied to the Integrated Forecast System (IFS) weather models (Ollinaho et al., 2013, 2014) at the European Centre for Medium-Range Weather Forecasts (ECMWF). However, these approaches are heuristic and again limited to relatively short predictive windows.

Previous work (for example, Roeckner et al. (2003); ECMWF (2013); Stevens et al. (2013)) calibrates parameters of climate models by matching summary statistics of quantities of interest, such as top-of-atmosphere radiation, with the corresponding summary statistics from re-analysis data or output from competing models. The vast majority of these approaches produce only point estimates. Järvinen et al. (2010) infer the closure parameters of a large-scale, computationally intensive climate model, ECHAM5, using a Bayesian approach based on several summary statistics and realized via MCMC. However, computational limitations make applying algorithms such as MCMC challenging. Even short MCMC chains of roughly 2000 iterations require methods such as parallel sampling and early rejection for tractability (Solonen et al., 2012). Moreover, even if these computational challenges can be overcome, finding statistics that actually constrain the parameters is difficult, and inference results can thus be inconclusive. The failure of the summary statistic approach in Järvinen et al. (2010) can be explained intuitively: the chosen statistics average out too much information, and therefore fail to characterize the geometry of the underlying chaotic attractor in a meaningful way.
The present work combines two recent approaches that together enable a Bayesian approach to inference and uncertainty quantification for the parameters of chaotic high-dimensional models. The correlation integral likelihood (CIL) (Haario et al., 2015) is able to constrain the parameters of chaotic dynamical systems, and the local approximation MCMC (LA-MCMC) method for posterior sampling (Conrad et al., 2016) makes asymptotically exact posterior characterization feasible, even with computationally expensive models.
The CIL method is based on the concept of fractal dimension from mathematical physics, which, broadly speaking, characterizes the space-filling properties of the trajectory of a dynamical system. Earlier work (e.g., Cencini et al. (2010)) describes a number of different approaches for estimating the fractal dimension.

Our previous work extends this concept: instead of computing the fractal dimension of a single trajectory, a similar computation measures the distance between different model trajectories (Haario et al., 2015). The modification provides a normally distributed summary statistic of the data, which is sensitive to changes in the underlying attractor from which the data was sampled. Statistics that better describe the attractor yield likelihood functions that can better constrain the model parameters and, therefore, more meaningful parameter posterior distributions.
A related approach, discussed in the context of intractable likelihoods, is Bayesian inference using synthetic likelihoods, proposed as an alternative to approximate Bayesian computation (ABC); see Wood (2010) and Price et al. (2018). The method we present here uses a different summary statistic. Moreover, the synthetic likelihood approach involves re-creating the likelihood for every new model parameter value, which would require excessive CPU times in our setting. Recent work by Morzfeld et al. (2018) describes another feature-vector approach for data assimilation. For more details and comparisons among these approaches, see the discussion below in Section 2.1.
The LA-MCMC method (Conrad et al., 2016, 2018) approximates the computationally expensive log-likelihood function using local polynomial regression. In this method, the MCMC sampler directly uses the approximation of the log-likelihood to construct proposals and evaluate the Metropolis acceptance probability. "Full" likelihood evaluations are, however, infrequently but regularly added to the point set used to construct the local polynomial regression, which continually improves the approximation. Expensive full likelihood evaluations are thus used only to construct the approximation or "surrogate" model. Conrad et al. (2016) show that, given an appropriate mechanism for triggering likelihood evaluations, the resulting Markov chain converges to the true posterior distribution while reducing the number of expensive likelihood evaluations (and hence forward model simulations) by orders of magnitude. Davis et al. (2020) extend this analysis to the setting of noisy likelihood evaluations and propose the rate-optimal refinement strategy that we employ here.
The rest of this paper is organized as follows. Section 2 describes the methodologies used in this work, including the CIL, the stochastic LA-MCMC algorithm, and the merging of these two approaches. Section 3 is dedicated to numerical experiments. The approach is first verified by comparing results in cases where the parameter posteriors can also be obtained with standard sampling methods. Further modifications to reduce the computational demand of evaluating the CIL are also presented. A fine-grid version of the quasi-geostrophic (QG) model is then used to demonstrate that model identification is possible in long-range simulations, beyond the reach of standard methods. These examples are followed by a concluding discussion in Section 4.

Correlation integral likelihood
We first construct a likelihood function that models the observations by comparing certain summary statistics of the observations to the corresponding statistics of a trajectory simulated from the chaotic model. As a source of statistics, we will choose the correlation integral, which depends on the fractal dimension of the chaotic attractor. Unlike other statistics, such as the ergodic averages of a trajectory, the correlation integral is able to constrain the parameters of a chaotic model (Haario et al., 2015). Consider a dynamical system with state $u(t) \in \mathbb{R}^n$, initial state $u_0 \in \mathbb{R}^n$, and parameters $\theta \in \mathbb{R}^q$. The time-discretized system, with time steps $t_i \in \{t_1, \dots, t_\tau\}$ denoting selected observation points, can be written as

$$u_{i+1} = \mathcal{M}(u_i; \theta), \qquad i = 1, \dots, \tau - 1. \tag{1}$$

Either the full state $u_i \in \mathbb{R}^n$ or a subset $s_i \in \mathbb{R}^d$, $d \leq n$, of the state components is observed. We will use $S = \{s_1, \dots, s_\tau\}$ to denote a collection of these observables at successive times.
Using the model-observation mismatch at a collection of times $t_i$ to constrain the value of the parameters $\theta$ is not suitable when the system (1) has chaotic dynamics, since the state vector values $s_i$ are unpredictable after a finite time interval. Though long-time trajectories $s(t)$ of chaotic systems are not predictable in the time domain, they do, however, represent samples from an underlying attractor in the phase space. The states are generated deterministically, but the model's chaotic nature allows us to interpret the states as samples from a particular $\theta$-dependent distribution. Yet obvious choices for summary statistics $T$ that depend on the observed states $S$, such as ergodic averages, ignore important aspects of the dynamics and are thus unable to constrain the model parameters. For example, the statistic $T(S) = \frac{1}{\tau} \sum_{i=1}^{\tau} s_i$ is easy to compute and is normally distributed in the limit $\tau \to \infty$ (under appropriate conditions), but this ergodic mean says very little about the shape of the chaotic attractor.
Instead, we need a summary statistic that retains information relevant for parameter estimation, but still defines a computationally tractable likelihood. To this end, Haario et al. (2015) devised the correlation integral likelihood (CIL), which retains enough information about the attractor to constrain the model parameters. We first review the CIL and then discuss how to make evaluation of the likelihood tractable.
We will use the CIL to evaluate the "difference" between two chaotic attractors. For this purpose, we will first describe how to statistically characterize the geometry of a given attractor, given suitable observations $S$. In particular, constructing the CIL will require three steps: (i) computing distances between observables sampled from a given attractor; (ii) evaluating the empirical cumulative distribution function (ECDF) of these distances and deriving certain summary statistics $T$ from the ECDF; and (iii) estimating the mean and covariance of $T$ by repeating steps (i) and (ii). Intuitively, the CIL thus interprets observations of a chaotic trajectory as samples from a fixed distribution over phase space. It allows the time between observations to be arbitrarily large, and in particular much longer than the window over which the system remains predictable.

Now we describe the CIL construction in detail. Suppose that we have collected a data set $S$ comprising observations of the dynamical system of interest. Let $S$ be split into $n_{\mathrm{epo}}$ different subsets called epochs. The epochs can, in principle, be any subsets of length $N$ from the reference data set $S$. In this paper, however, we restrict the epochs to be time-consecutive intervals of $N$ evenly spaced observations. Let $s^k = (s^k_1, \dots, s^k_N)$ and $s^l = (s^l_1, \dots, s^l_N)$ denote the observables in epochs $k$ and $l$, respectively. In other words, superscripts refer to different epochs and subscripts refer to the time points within those epochs. Haario et al. (2015) then define the modified correlation integral sum $C(R, N, s^k, s^l)$ by counting all pairs of observations that are less than a distance $R > 0$ from each other:

$$C(R, N, s^k, s^l) = \frac{1}{N^2} \sum_{i,j=1}^{N} \mathbb{1}\left( \| s^k_i - s^l_j \| < R \right), \tag{3}$$

where $\mathbb{1}$ denotes the indicator function and $\|\cdot\|$ is the Euclidean norm on $\mathbb{R}^d$. In the physics literature, evaluating Eq. (3) in the limit $R \to 0$, with $k = l$ and $i \neq j$, numerically approximates the fractal dimension of the attractor that produced $s^k = s^l$ (Grassberger and Procaccia, 1983a, b). Here, we instead use Eq. (3) to characterize the distribution of distances between $s^k$ and $s^l$ at all relevant scales. We assume that the state space is bounded; therefore, an $R_0$ covering all pairwise distances in Eq. (3) exists. For a prescribed set of radii $R_m = R_0 b^{-m}$, $m = 0, \dots, M$, Eq. (3) defines a discretization of the empirical CDF of the distances $\| s^k_i - s^l_j \|$, with discretization boundaries given by the numbers $R_m$. Now we define $y^{k,l}_m = C(R_m, N, s^k, s^l)$ as components of a statistic $T(s^k, s^l) = y^{k,l} := (y^{k,l}_0, \dots, y^{k,l}_M)$. This statistic is also called the feature vector. According to Borovkova et al. (2001) and Neumeyer (2004), the vectors $y^{k,l}$ are normally distributed. This is a generalization of the classical result of Donsker (1951), which applies to i.i.d. samples from a scalar-valued distribution. We characterize this normal distribution by subsampling the full data set $S$. Specifically, we approximate the mean $\mu$ and covariance $\Sigma$ of $T$ by the sample mean and sample covariance of the set $\{y^{k,l} : 1 \leq k, l \leq n_{\mathrm{epo}},\ k \neq l\}$, evaluated for all $\frac{1}{2} n_{\mathrm{epo}}(n_{\mathrm{epo}} - 1)$ pairs of epochs $(s^k, s^l)$ using fixed values of $R_0$, $b$, $M$, and $N$.
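To make the construction concrete, the following minimal sketch computes the feature vectors of Eq. (3) and their Gaussian summary. All names are illustrative; the geometric radii $R_m = R_0 b^{-m}$ follow the fixed values of $R_0$, $b$, and $M$ used later in the text.

```python
import numpy as np
from scipy.spatial.distance import cdist

def feature_vector(s_k, s_l, radii):
    """Empirical CDF of pairwise distances between two epochs, cf. Eq. (3).

    s_k, s_l : (N, d) arrays of observed states from epochs k and l.
    radii    : decreasing sequence R_0 > R_1 > ... > R_M.
    """
    dists = cdist(s_k, s_l)                       # all N^2 pairwise distances
    n_pairs = dists.size
    return np.array([(dists < R).sum() / n_pairs for R in radii])

def cil_statistics(epochs, radii):
    """Sample mean and covariance of the feature vectors over all epoch pairs."""
    ys = [feature_vector(epochs[k], epochs[l], radii)
          for k in range(len(epochs)) for l in range(k + 1, len(epochs))]
    ys = np.array(ys)                             # n_epo*(n_epo-1)/2 rows
    return ys.mean(axis=0), np.cov(ys, rowvar=False)

# Radii on a geometric grid, e.g. with the Lorenz 63 values used below:
R0, b, M = 2.85, 1.51, 14
radii = R0 / b ** np.arange(M + 1)
```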

The Gaussian distribution of $T$ effectively characterizes the geometry of the attractor represented in the data set $S$. Now we wish to use this distribution to infer the parameters $\theta$. Given a candidate parameter value $\theta$, we use the model to generate states $s^*$ for the length of a single epoch. We then evaluate the statistics $y^{k,*}_m = C(R_m, N, s^k, s^*(\theta))$ as in Eq. (3), by computing the distances between elements of $s^*(\theta)$ and the states of an epoch $s^k$ selected from the data $S$. Combining these statistics into a feature vector $y^{k,*}(\theta) = (y^{k,*}_m)_{m=0}^{M}$, we can write a noisy estimate of the log-likelihood function:

$$\ell^k(\theta) = -\frac{1}{2} \left( y^{k,*}(\theta) - \mu \right)^{\top} \Sigma^{-1} \left( y^{k,*}(\theta) - \mu \right). \tag{4}$$

Comparing $s^*(\theta)$ with other epochs drawn from the data set $S$, however, will produce different realizations of the feature vector. We thus average the resulting log-likelihoods over all epochs,

$$\ell(\theta) = \frac{1}{n_{\mathrm{epo}}} \sum_{k=1}^{n_{\mathrm{epo}}} \ell^k(\theta). \tag{5}$$

This averaging, which involves evaluating Eq. (4) $n_{\mathrm{epo}}$ times, requires only new distance computations and is thus cheap relative to time integration of the dynamical model.
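A corresponding sketch of Eqs. (4)-(5), reusing `feature_vector` from the snippet above; `simulate_epoch` is a hypothetical routine standing in for the forward model.

```python
def cil_log_likelihood(theta, epochs, mu, Sigma_inv, radii, simulate_epoch):
    """Noisy CIL log-likelihood, Eqs. (4)-(5), averaged over all data epochs.

    simulate_epoch(theta) is assumed to integrate the model for the length
    of one epoch and return an (N, d) array of observables.
    """
    s_star = simulate_epoch(theta)      # the single required forward simulation
    total = 0.0
    for s_k in epochs:                  # remaining work is distance computations only
        r = feature_vector(s_k, s_star, radii) - mu
        total += -0.5 * r @ Sigma_inv @ r
    return total / len(epochs)
```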
Because the feature vectors $y^{k,*}$ are random for any finite $N$, and because the number of epochs $n_{\mathrm{epo}}$ is also finite, the log-likelihood in Eq. (5) is necessarily random. It is then useful to view Eq. (5) as an estimate of an underlying true log-likelihood. We are therefore in a setting where we cannot evaluate the unnormalized posterior density exactly; we only have access to a noisy approximation of it. Previous work (Springer et al., 2019) has demonstrated that derivative-free optimizers such as the differential evolution (DE) algorithm can successfully identify the posterior mode in this setting, yielding a point estimate of $\theta$. In the fully Bayesian setting, one could characterize the posterior $p(\theta|S)$ using pseudo-marginal MCMC methods (Andrieu and Roberts, 2009), but at significant computational expense. Below, we will instead use a surrogate model constructed via local approximation MCMC to make posterior sampling tractable.

Our approach is broadly similar to the synthetic likelihood method (e.g., Wood (2010); Price et al. (2018)), but differs in two key respects: (i) we use a novel summary statistic that is able to characterize chaotic attractors, and (ii) we only need to evaluate the forward model for a single epoch. In contrast, synthetic likelihoods typically use summary statistics such as auto-covariances at a given lag or regression coefficients, and they require long-time integration of the forward model for each candidate parameter value $\theta$, rather than integration for only one epoch. Morzfeld et al. (2018) also discuss several ways of using feature vectors for inference in geophysics. A distinction of the present work is that we use an ECDF-based summary statistic that is provably Gaussian, and we perform extensive Bayesian analysis of the parameter posteriors via novel MCMC methods. These methods are described next.

Local Approximation MCMC
Even with the developments described above, estimating the correlation integral likelihood at each candidate parameter value $\theta$ can still be computationally intensive. We thus introduce a surrogate modeling method that replaces many of these CIL evaluations with an inexpensive approximation, while still providing convergence guarantees. This method is called local approximation MCMC (LA-MCMC). Each LA-MCMC step consists of three stages: (i) possibly refining a local approximation of the log-likelihood, (ii) generating a proposal, and (iii) accepting or rejecting the proposed parameter value. The key difference relative to standard Metropolis-Hastings MCMC is that the acceptance probability in stage (iii) is computed only using the approximation or surrogate model of the log-likelihood, at both the current and proposed states. This introduces an error, relative to computation of the acceptance probability with exact likelihood evaluations, but stage (i) of the algorithm is designed to control and incrementally reduce this error at the appropriate rate.
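Schematically, one LA-MCMC step can be written as follows. This is a sketch only: `propose`, `log_prior`, `surrogate`, and `maybe_refine` are placeholders for the components described in this section and in Davis et al. (2020).

```python
import numpy as np

def la_mcmc_step(theta, rng, surrogate, propose, log_prior, maybe_refine):
    maybe_refine(theta)                          # stage (i): occasional full evaluation
    theta_prop = propose(theta, rng)             # stage (ii): generate a proposal
    log_alpha = (surrogate(theta_prop) + log_prior(theta_prop)
                 - surrogate(theta) - log_prior(theta))
    if np.log(rng.uniform()) < log_alpha:        # stage (iii): accept/reject using
        return theta_prop                        # only surrogate log-likelihoods
    return theta
```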
"Refinement" in stage (i) consists of adding a computationally intensive log-likelihood evaluation at some parameter value θ i , denoted by L(θ i ), to the evaluated set {(θ i , L(θ i ))} K i=1 . These K pairs are used to construct the local approximation via a kernel-weighted local polynomial regression (Kohler, 2002). The values {θ i } K i=1 are called "support points" in this paper. Details on the regression formulation are in Davis 240 et al. (2020); Conrad et al. (2016). As the support points cover the regions of high posterior probability more densely, the accuracy of the local polynomial surrogate will increase. This error is well understood (Kohler, 2002;Conn et al., 2009) and, crucially, takes advantage of smoothness in the underlying true log-likelihood function. This smoothness ultimately allows the cardinality of the evaluated set to be much smaller than the number of MCMC steps.

Intuitively, if the surrogate converges to the true log-likelihood, then the samples generated with LA-MCMC will (asymptotically) be drawn from the true posterior distribution. After any finite number of steps, however, the surrogate error introduces a bias into the sampling algorithm. The refinement strategy must therefore ensure that this bias is not the dominant source of error. At the same time, refinements must occur infrequently, to ensure that LA-MCMC is computationally cheaper than using the true log-likelihood. Davis et al. (2020) analyze the trade-off between surrogate-induced bias and MCMC variance and propose a rate-optimal refinement strategy. We use essentially the same algorithm here, but add an isotropic $\ell_2$ penalty on the polynomial coefficients of the kernel-weighted local regression problem solved to evaluate the surrogate at any parameter value $\theta$; in other words, we perform local ridge regression rather than ordinary least squares, which improves performance with noisy likelihoods.
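The following sketch illustrates one way to evaluate such a kernel-weighted local quadratic ridge regression at a point $\theta$. The bandwidth choice, the Epanechnikov-type weights, and the regularization weight `lam` are our illustrative assumptions; the production algorithm is Algorithm 1 of Davis et al. (2020).

```python
import numpy as np

def local_surrogate(theta, support_pts, support_vals, k, lam=1e-6):
    """Fit a degree-2 polynomial to the k nearest support points; value at theta."""
    diffs = support_pts - theta                            # centered coordinates
    r = np.linalg.norm(diffs, axis=1)
    idx = np.argsort(r)[:k]                                # k nearest neighbors
    X, y, rk = diffs[idx], support_vals[idx], r[idx]
    w = np.sqrt(np.maximum(1.0 - (rk / rk.max()) ** 2, 0.0))  # kernel weights

    # Quadratic basis in centered coordinates: 1, x_i, x_i * x_j (i <= j).
    q = theta.size
    cross = [X[:, i] * X[:, j] for i in range(q) for j in range(i, q)]
    Phi = np.column_stack([np.ones(len(X)), X] + cross)

    # Weighted ridge regression: the isotropic l2 penalty stabilizes the fit
    # when the log-likelihood evaluations are noisy.
    A = w[:, None] * Phi
    coef = np.linalg.solve(A.T @ A + lam * np.eye(Phi.shape[1]), A.T @ (w * y))
    return coef[0]                                         # value at the center theta
```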

For pseudocode of the full LA-MCMC algorithm, we refer to Davis et al. (2020, Algorithm 1). Parameters of the algorithm are as follows: (i) initial error threshold $\gamma_0 = 1$; (ii) error threshold decay rate $\gamma_1 = 1.0$; (iii) maximum poisedness constant $\bar{\Lambda} = 100$; (iv) tail-correction parameter $\eta = 0$ (no tail correction); (v) local polynomial degree $p = 2$. The number of nearest neighbors $k$ used to construct each local polynomial surrogate is chosen to be $k = k_0 + (K - k_0)^{1/3}$, where $k_0 = \sqrt{q}\,D$, $q$ is the dimension of the parameters $\theta \in \mathbb{R}^q$, and $D$ is the number of coefficients in the local polynomial approximation of total degree $p = 2$, i.e., $D = (q + 2)(q + 1)/2$. If we had $k = D$, the approximation would be an interpolant. Instead we oversample by a factor $\sqrt{q}$, as suggested in Conrad et al. (2016), and allow $k$ to grow slowly with the size $K$ of the evaluated set, as in Davis (2018).
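As a worked example of these formulas (the rounding via `ceil` and the guard for small $K$ are our assumptions): for $q = 2$ parameters, $D = (q+2)(q+1)/2 = 6$ coefficients and $k_0 = \sqrt{2}\cdot 6 \approx 8.5$, so roughly 9 neighbors are used early in the chain, growing slowly as the evaluated set expands.

```python
import math

def num_neighbors(K, q):
    """Nearest-neighbor count for the local quadratic surrogate."""
    D = (q + 2) * (q + 1) // 2     # coefficients of a total-degree-2 polynomial
    k0 = math.sqrt(q) * D          # oversample the interpolation count by sqrt(q)
    return math.ceil(k0 + max(K - k0, 0.0) ** (1.0 / 3.0))
```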

Numerical experiments
This section contains numerical experiments to illustrate the methods introduced in the previous sections.
As a large-scale example, we characterize the posterior distribution of parameters in the two-layer quasi-geostrophic (QG) model. The computations needed to characterize the posterior distribution with standard MCMC methods in this example would be prohibitive without massive computational resources and are therefore omitted. In contrast, we will show that the LA-MCMC method is able to simulate from the posterior at a small fraction of this cost.

In all three experiments we create MCMC chains of length $10^5$. However, due to the use of the LA-MCMC approach, the number of full forward model evaluations is much lower, around 1000 or less; we will report these values more specifically below.

Lorenz 63
We use the classical three-dimensional Lorenz 63 system (Lorenz, 1963) as a simple first example to demonstrate how LA-MCMC can be successfully paired with the CIL and the Adaptive Metropolis (AM) algorithm (Haario et al., 2001, 2006) to obtain the posterior distribution for chaotic systems at a greatly reduced computational cost, compared to AM without the local approximation. The time evolution of the state vector $s = (X, Y, Z)$ is given by

$$\dot{X} = \sigma (Y - X), \qquad \dot{Y} = X(\rho - Z) - Y, \qquad \dot{Z} = XY - \beta Z.$$

This system of equations is often said to describe an extreme simplification of a weather model.

The reference data were generated with parameter values $\sigma = 10$, $\rho = 28$, and $\beta = 8/3$ by performing $n_{\mathrm{epo}} = 64$ distinct model simulations, with observations made at 2000 evenly distributed times in the interval $[10, 20000]$. These observations were perturbed with 5% multiplicative Gaussian noise. The length of the predictable time window is roughly 7, which is less than the time between consecutive observations. The parameters of the CIL method were obtained as described at the start of Section 3, with values $M = 14$, $R_0 = 2.85$, and $b = 1.51$.
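A minimal sketch of this data-generation step follows. The random initial states and integrator tolerances are assumptions; only the parameter values, observation times, and noise level come from the text.

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz63(t, s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    X, Y, Z = s
    return [sigma * (Y - X), X * (rho - Z) - Y, X * Y - beta * Z]

rng = np.random.default_rng(0)
t_obs = np.linspace(10, 20000, 2000)   # sparse: spacing exceeds predictability window
epochs = []
for _ in range(64):                    # n_epo distinct model simulations
    s0 = rng.normal(size=3)            # assumed random initial state
    sol = solve_ivp(lorenz63, (0, 20000), s0, t_eval=t_obs, rtol=1e-8)
    obs = sol.y.T * (1 + 0.05 * rng.normal(size=sol.y.T.shape))  # 5% mult. noise
    epochs.append(obs)
```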
The set of vectors $\{y^{k,l} : k, l \leq n_{\mathrm{epo}}\}$ is shown in Fig. 1 on a log-log scale. The figure shows that the variability of these vectors is quite small, and it supports the normality assumption for the feature vectors.

The Kuramoto-Sivashinsky model
The second example is the 256-dimensional Kuramoto-Sivashinsky (KS) PDE system. The purpose of this example is to introduce ways to improve computational efficiency via piecewise parallel integration over the time interval of the given data. We also demonstrate that MCMC inference is still possible when the number of observed state components is decreased. The Kuramoto-Sivashinsky model is given by the fourth-order PDE

$$\frac{\partial s}{\partial t} + \eta \frac{\partial (s^2)}{\partial x} + \frac{\partial^2 s}{\partial x^2} + \gamma \frac{\partial^4 s}{\partial x^4} = 0, \tag{9}$$

where $s = s(x, t)$ is a real function of $x \in \mathbb{R}$ and $t \in \mathbb{R}_+$. In addition, it is assumed that $s$ is spatially periodic with period $L$, i.e., $s(x + L, t) = s(x, t)$. This experiment uses the parametrization from Smyrlis and Papageorgiou (1996) that maps the spatial domain $[-\frac{L}{2}, \frac{L}{2}]$ to $[-\pi, \pi]$ by setting $\tilde{x} = \frac{2\pi}{L} x$ and $\tilde{t} = \frac{2\pi}{L^2} t$.
With $L = 100$, the true value of the parameter $\gamma$ is $(\pi/50)^2 \approx 0.0039$, and the true value of $\eta$ becomes $1/2$. These two parameters are the ones that are estimated with the LA-MCMC method. This system was derived by Kuramoto et al. (Kuramoto and Yamada, 1976; Kuramoto, 1978) as a model for phase turbulence in reaction-diffusion systems. Sivashinsky (1977) used the same system for describing instabilities of laminar flames.
Assume that the solution of this problem can be represented by a truncation of the Fourier series,

$$s(\tilde{x}, t) \approx \sum_{j=1}^{m} \left[ A_j(t) \cos(j \tilde{x}) + B_j(t) \sin(j \tilde{x}) \right].$$

Using this form reduces Eq. (9) to a system of ordinary differential equations for the unknown coefficients $A_j(t)$ and $B_j(t)$,

$$\dot{A} = F_1(A, B), \qquad \dot{B} = F_2(A, B),$$

where the terms $F_1(\cdot)$ and $F_2(\cdot)$ are polynomials of the vectors $A$ and $B$. For details, see Huttunen et al. (2018). The solution can be computed effectively on graphics processors in parallel, and if computational resources permit, the time interval of the data can be split into pieces that are integrated simultaneously. A pseudo-spectral sketch of the right-hand side is given below, after the discussion of the simulated trajectories.

Model trajectories from simulations with four different parameter vectors are shown in Fig. 5. These parameter values were 1) the "true" value which was used to generate the training data, 2) another parameter from inside the posterior distribution, and 3-4) two other parameters from outside the posterior distribution.
These parameters are also shown in Fig. 4. Visually inspecting the outputs, cases 1-3 look similar, while the results using parameter vector 4, furthest from the posterior, are markedly different. Even though the third parameter vector lies outside the posterior, the resulting trajectory is not easily distinguishable from those of cases 1 and 2, indicating that the CIL method differentiates between the trajectories more efficiently than visual inspection does.
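For illustration, the right-hand side of the rescaled system can be evaluated pseudo-spectrally as sketched below. This assumes the reconstructed form of Eq. (9) above; the actual GPU implementation follows Huttunen et al. (2018).

```python
import numpy as np

m = 128                                       # Fourier modes; 2*m = 256 grid points
j = np.fft.rfftfreq(2 * m, d=1.0 / (2 * m))   # integer wavenumbers 0, 1, ..., m

def ks_rhs(s_hat, gamma, eta):
    """Time derivative of the Fourier coefficients of s, from the reconstructed
    Eq. (9): s_t = -eta*(s^2)_x - s_xx - gamma*s_xxxx."""
    s = np.fft.irfft(s_hat)                   # back to physical space
    nonlin = 1j * j * np.fft.rfft(s ** 2)     # spectral derivative of s^2
    return -eta * nonlin + j ** 2 * s_hat - gamma * j ** 4 * s_hat
```

Stiff exponential integrators are typically used to advance such spectral systems in time; any such scheme would call `ks_rhs` (or its linear/nonlinear split) at each step.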

Additional experiments were performed to evaluate the stability of the method when not all of the model states were observed. Keeping the setup otherwise fixed, the number of observed elements of the state vector was reduced from the full 256 step by step to 128, 64, and 32. The resulting MCMC chains are presented in Fig. 6; as expected, when less is observed, the size of the posterior distribution grows. The corresponding parameter estimates are summarized in Table 1, while examples of the respective integrated trajectories are given in Fig. 5.

The quasi-geostrophic model
The methodology is here applied to a computationally intensive model, for which brute-force parameter posterior estimation would be too time-consuming. We employ the well-known quasi-geostrophic model (Fandry and Leslie, 1984; Pedlosky, 1987) using a dense grid to achieve complex chaotic dynamics in high dimensions. The wall-clock time for one forward model simulation is roughly 10 minutes, so a naïve calculation of a posterior sample of size $10^5$ would take around two years. We demonstrate how the combination of CIL and LA-MCMC makes this computation feasible. Along with the forcing and the topographic constraints, the model parameters include the thicknesses of the two atmospheric layers, denoted by $H_1$ and $H_2$. Furthermore, the QG model accounts for the Coriolis force, whose strength is controlled by the parameter $f_0$. An example of the two-layer geometry is presented in Fig. 7.
In non-dimensional form, the QG system can be written in terms of the potential vorticities $q_i$ and stream functions $\psi_i$ of the two layers ($i = 1, 2$),

$$q_1 = \nabla^2 \psi_1 - F_1 (\psi_1 - \psi_2) + \beta y, \tag{13}$$
$$q_2 = \nabla^2 \psi_2 - F_2 (\psi_2 - \psi_1) + \beta y + R_s, \tag{14}$$

where the coupling coefficients $F_i$ depend on the layer thicknesses $H_i$, $R_s$ describes the topography, and $\epsilon = U / (f_0 L)$ is the Rossby number of the system, with $L$ and $U$ designating the length and speed scales, respectively.
It is assumed that the motion determined by the model is geostrophic, essentially meaning that the potential vorticity of the flow is preserved on both layers:

$$\frac{\partial q_i}{\partial t} + u_i \frac{\partial q_i}{\partial x} + v_i \frac{\partial q_i}{\partial y} = 0, \qquad i = 1, 2. \tag{15}$$

Here $u_i$ and $v_i$ are velocity fields, which are functions of both space and time. They are obtained from the stream functions $\psi_i$ via

$$u_i = -\frac{\partial \psi_i}{\partial y}, \qquad v_i = \frac{\partial \psi_i}{\partial x}. \tag{16}$$

Equations (13)-(16) define the spatio-temporal evolution of the quantities $q_i$, $\psi_i$, $i = 1, 2$.
The numerical integration of this system is carried out using a semi-Lagrangian scheme, in which the potential vorticities $q_i$ are advanced according to Eq. (15) for given velocities $u_i$ and $v_i$. With these $q_i$, the stream functions can then be obtained from Eqs. (13) and (14) with a two-stage finite difference scheme.
Finally, the velocity field is updated by Eq. (16) for the next iteration round. For more details, see Fandry and Leslie (1984).
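As a small illustration of the velocity update in Eq. (16), using centered differences via `np.gradient` (the actual solver uses the semi-Lagrangian scheme described above):

```python
import numpy as np

def velocities(psi, dx, dy):
    """u = -dpsi/dy, v = dpsi/dx for one layer's stream function psi[y, x]."""
    dpsi_dy, dpsi_dx = np.gradient(psi, dy, dx)   # gradients along y, then x
    return -dpsi_dy, dpsi_dx
```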
For estimating model parameters from synthetic data, a reference data set is created with 64 epochs, each containing $N = 1000$ observations. These data are sampled from the model trajectory with $\Delta_t = 8$ (where a time step of length 1 corresponds to 6 h) in the time interval $[192, 8192]$, which amounts to a long-range integration of roughly 5-6 years of a climate model. The spatial domain is discretized into a $55 \times 55$ grid, which results in consistently chaotic behavior and more complex dynamics than with the often-used $20 \times 40$ grid. This is reflected in the higher variability of the feature vectors, as seen in Fig. 1. A snapshot of the 6050-dimensional trajectory of the QG system is displayed in Fig. 8.
The model state is characterized by two distinct fields, the vorticities and the stream functions, which naturally depend on each other. But as shown in Haario et al. (2015), it is useful to construct separate feature vectors to characterize the dynamics in such situations. For this reason, two separate feature vectors are constructed: one for the potential vorticity on both layers and the other for the stream function.
The Gaussian likelihood of the state is created by stacking these two feature vectors one after another.
The normality of the resulting $2(M+1)$-dimensional vector may again be verified as shown in Fig. 1. In our implementation, the feature vectors were computed on a GPU, and the required computation time for doing this was therefore negligible compared to the model integration time. The posterior distribution of the two parameters is presented in Fig. 9.
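Schematically, the stacked statistic is just a concatenation of two CIL feature vectors, reusing `feature_vector` from the earlier sketch (all variable names are illustrative):

```python
import numpy as np

def stacked_feature_vector(q_k, q_star, psi_k, psi_star, radii):
    """2(M+1)-dimensional statistic: CIL feature vectors for the potential
    vorticities and the stream functions, stacked one after another."""
    return np.concatenate([feature_vector(q_k, q_star, radii),
                           feature_vector(psi_k, psi_star, radii)])
```

The mean and covariance of this stacked vector are then estimated over epoch pairs exactly as before, yielding the Gaussian likelihood of Eq. (4).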

Conclusions and future work
Bayesian parameter estimation with computationally demanding computer models is highly non-trivial.

The associated computational challenges often become insurmountable when the model dynamics are chaotic. In this work we showed that it is possible to overcome these challenges by combining the correlation integral likelihood (CIL) with an MCMC method based on local surrogates of the log-likelihood function (LA-MCMC). The CIL captures changes in the geometry of the underlying attractor of the chaotic system, while LA-MCMC drastically reduces the number of full model evaluations required for posterior sampling; without this combination, the quasi-geostrophic example presented here would have been computationally intractable. We note that the computational demands of the QG model already come quite close to those of weather models at coarse resolutions.
There are many potential directions for extension of this work. First, it should be feasible to run parallel LA-MCMC chains that share model evaluations in a single evaluated set; doing so can accelerate the construction of accurate local surrogate models, as demonstrated in Conrad et al. (2018), and is a useful way of harnessing parallel computational resources within surrogate-based MCMC. Extending this approach to higher-dimensional parameters is also of interest. While LA-MCMC has been successfully applied to problems of parameter dimension up to $q = 12$ (Conrad et al., 2018), future work should explore sparsity and other truncations of the local polynomial approximation to improve scaling with dimension. From the CIL perspective, calibrating more complex models, such as weather models, often requires choosing the part of the state vector from which the feature vectors are computed. While computing the likelihood from the full high-dimensional state is computationally feasible, Haario et al. (2015) showed that carefully choosing a subset of the state for the feature vectors performs better. Also, the epochs may need to be chosen to be sufficiently long to include potential rare events, so that changes in rare-event patterns can be identified. This, naturally, will increase the computational cost if one wants to be confident of the inclusion of such events.
While answering these questions will require further work, we believe the research presented in this paper provides a promising and reasonable step towards estimating parameters in the context of expensive operational models.
Code availability. Code is available on request.

Author contributions. HH and YM designed the study with input from all authors. SS, HH, AD, and JS combined the CIL and LA-MCMC methods for carrying out the research. AB wrote and provided implementations of the KS and QG models for GPUs, including custom numerics and testing. SS wrote the CIL code and the version of LA-MCMC used (based on earlier work by Antti Solonen), and carried out the simulations. All authors discussed the results and shared the responsibility of writing the manuscript. SS prepared the figures.