<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" xml:lang="en" dtd-version="3.0" article-type="research-article"><?xmltex \makeatother\@nolinetrue\makeatletter?>
  <front>
    <journal-meta><journal-id journal-id-type="publisher">GMD</journal-id><journal-title-group>
    <journal-title>Geoscientific Model Development</journal-title>
    <abbrev-journal-title abbrev-type="publisher">GMD</abbrev-journal-title><abbrev-journal-title abbrev-type="nlm-ta">Geosci. Model Dev.</abbrev-journal-title>
  </journal-title-group><issn pub-type="epub">1991-9603</issn><publisher>
    <publisher-name>Copernicus Publications</publisher-name>
    <publisher-loc>Göttingen, Germany</publisher-loc>
  </publisher></journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5194/gmd-14-4319-2021</article-id><title-group><article-title>Efficient Bayesian inference for large chaotic dynamical systems</article-title><alt-title>Efficient Bayesian inference in chaotic dynamical systems</alt-title>
      </title-group><?xmltex \runningtitle{Efficient Bayesian inference in chaotic dynamical systems}?><?xmltex \runningauthor{S.~Springer et al.}?>
      <contrib-group>
        <contrib contrib-type="author" corresp="yes" rid="aff1 aff2">
          <name><surname>Springer</surname><given-names>Sebastian</given-names></name>
          <email>sebastian.springer@lut.fi</email>
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1 aff3">
          <name><surname>Haario</surname><given-names>Heikki</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1 aff3 aff4">
          <name><surname>Susiluoto</surname><given-names>Jouni</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1 aff6">
          <name><surname>Bibov</surname><given-names>Aleksandr</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff4 aff5">
          <name><surname>Davis</surname><given-names>Andrew</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff4">
          <name><surname>Marzouk</surname><given-names>Youssef</given-names></name>
          
        </contrib>
        <aff id="aff1"><label>1</label><institution>Department of Computational and Process Engineering, Lappeenranta University of Technology, Lappeenranta, Finland</institution>
        </aff>
        <aff id="aff2"><label>2</label><institution>Research unit of Mathematical Sciences, University of Oulu, Oulu, Finland</institution>
        </aff>
        <aff id="aff3"><label>3</label><institution>Finnish Meteorological Institute, Helsinki, Finland</institution>
        </aff>
        <aff id="aff4"><label>4</label><institution>Department of Aeronautics and Astronautics, Massachusetts Institute of Technology, Cambridge, MA, USA</institution>
        </aff>
        <aff id="aff5"><label>5</label><institution>Courant Institute of Mathematical Sciences, New York University, New York, NY, USA</institution>
        </aff>
        <aff id="aff6"><label>6</label><institution>Varjo Technologies Oy, Helsinki, Finland</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Sebastian Springer (sebastian.springer@lut.fi)</corresp></author-notes><pub-date><day>9</day><month>July</month><year>2021</year></pub-date>
      
      <volume>14</volume>
      <issue>7</issue>
      <fpage>4319</fpage><lpage>4333</lpage>
      <history>
        <date date-type="received"><day>17</day><month>October</month><year>2020</year></date>
           <date date-type="rev-request"><day>26</day><month>October</month><year>2020</year></date>
           <date date-type="rev-recd"><day>17</day><month>May</month><year>2021</year></date>
           <date date-type="accepted"><day>29</day><month>May</month><year>2021</year></date>
      </history>
      <permissions>
        <copyright-statement>Copyright: © 2021 Sebastian Springer et al.</copyright-statement>
        <copyright-year>2021</copyright-year>
      <license license-type="open-access"><license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p></license></permissions><self-uri xlink:href="https://gmd.copernicus.org/articles/14/4319/2021/gmd-14-4319-2021.html">This article is available from https://gmd.copernicus.org/articles/14/4319/2021/gmd-14-4319-2021.html</self-uri><self-uri xlink:href="https://gmd.copernicus.org/articles/14/4319/2021/gmd-14-4319-2021.pdf">The full text article is available as a PDF file from https://gmd.copernicus.org/articles/14/4319/2021/gmd-14-4319-2021.pdf</self-uri>
      <abstract><title>Abstract</title>
    <p id="d1e161">Estimating parameters of chaotic geophysical models is challenging due to their inherent unpredictability. These models cannot be calibrated with   standard least squares or filtering methods if observations are temporally sparse. Obvious remedies, such as averaging over temporal and spatial data to characterize the mean behavior, do not capture the subtleties of the underlying dynamics. We perform Bayesian inference of parameters in high-dimensional and computationally demanding chaotic dynamical systems by combining two approaches:
(i) measuring model–data mismatch by comparing chaotic attractors and (ii) mitigating the computational cost of inference by using surrogate models. Specifically, we construct a likelihood function suited to chaotic models by evaluating a distribution over  distances between points in the phase space; this distribution defines a summary statistic that depends on the  geometry of the attractor, rather than on pointwise matching of trajectories.
This statistic is computationally expensive to simulate, compounding the usual challenges of Bayesian computation with physical models. Thus, we develop
an inexpensive surrogate for the log likelihood with the local approximation Markov chain Monte Carlo method, which in our simulations reduces the time required for accurate inference by orders of magnitude. We investigate the behavior of the resulting algorithm with two smaller-scale problems and then use a quasi-geostrophic model to demonstrate its large-scale application.</p>
  </abstract>
    </article-meta>
  </front>
<body>
      

<sec id="Ch1.S1" sec-type="intro">
  <label>1</label><title>Introduction</title>
      <p id="d1e173">The time evolution of many geophysical dynamical systems is chaotic. Chaoticity means that the state of a system sufficiently far in the future cannot be predicted, even if we know the dynamics and the initial conditions very precisely. Commonly used examples of chaotic systems include climate, weather, and the solar system.</p>
      <p id="d1e176">A system being chaotic does not mean that it is random: the dynamics of models of chaotic systems are still determined by parameters, which may be either deterministic or random <xref ref-type="bibr" rid="bib1.bibx16" id="paren.1"/>. For example, Monte Carlo methods may be used to simulate future climate variability, but the distribution of possible climates will depend on the parameters of the climate model, and using the wrong model parameter distribution will lead to potentially biased results with inaccurate uncertainties. For this reason, parameter estimation in chaotic models is an important problem for a range of geophysical applications.
This paper focuses on Bayesian approaches to parameter inference in settings where (a) model dynamics are chaotic, and (b) sequential observations of the system are obtained so rarely that the model behavior has become unpredictable.</p>
      <p id="d1e182">Parameters of a dynamical system model are most commonly inferred by minimizing a cost function that captures model–observation mismatch <xref ref-type="bibr" rid="bib1.bibx52" id="paren.2"/>. In the Bayesian setting <xref ref-type="bibr" rid="bib1.bibx16" id="paren.3"/>, modeling this mismatch probabilistically yields a likelihood function, which enables maximum likelihood estimation or fully Bayesian<?pagebreak page4320?> inference. In fully Bayesian inference, the problem is further regularized by prescribing a prior distribution on the model state. In practice, Bayesian inference is often realized via Markov chain Monte Carlo (MCMC) methods <xref ref-type="bibr" rid="bib1.bibx15 bib1.bibx44" id="paren.4"/>.</p>
      <p id="d1e194">This straightforward strategy – for instance, using the squared Euclidean distance between model outputs and data to construct a Gaussian likelihood – is, however, inadequate for chaotic models, where small changes in parameters, or even in the tolerances used for numerical solvers, can lead to arbitrarily large differences in model outputs <xref ref-type="bibr" rid="bib1.bibx46" id="paren.5"/>. Furthermore, modeling dynamical systems is often computationally very demanding, which makes sample generation time-consuming. Since successful application of MCMC generally requires large numbers of model evaluations, performing Bayesian inference with MCMC is often computationally infeasible for such models.</p>
      <p id="d1e201">The present work combines two recent methods to tackle the problems caused by model chaoticity and computational cost. Chaoticity is tamed by using the
correlation integral likelihood (CIL) <xref ref-type="bibr" rid="bib1.bibx21" id="paren.6"/>, which is able to constrain the parameters of chaotic dynamical systems. We couple CIL with local approximation MCMC (LA-MCMC) <xref ref-type="bibr" rid="bib1.bibx7" id="paren.7"/>, which is a surrogate modeling technique that makes asymptotically exact posterior characterization feasible for computationally expensive models. We show how combining these methods enables a Bayesian approach to infer the parameters of chaotic high-dimensional models and quantify their uncertainties in situations previously discussed as intractable <xref ref-type="bibr" rid="bib1.bibx46" id="paren.8"/>. Moreover, we introduce several computational improvements to further enhance the applicability of the approach.</p>
      <p id="d1e213">The CIL method is based on the concept of fractal dimension from mathematical physics, which, broadly speaking, characterizes the space-filling properties of the trajectory of a dynamical system. Earlier work <xref ref-type="bibr" rid="bib1.bibx5" id="paren.9"><named-content content-type="pre">e.g.,</named-content></xref> describes a number of different approaches for estimating the fractal dimension. Our previous work extends this concept: instead of computing the fractal dimension of a single trajectory, a similar computation measures the distance between different model trajectories <xref ref-type="bibr" rid="bib1.bibx21" id="paren.10"/>, from which a specific summary statistic, called the feature vector, is computed. This modification provides a normally distributed statistic of the data, which is sensitive to changes in the underlying attractor from which the data were sampled. Statistics that are sensitive to changes in the attractor yield likelihood functions that can better constrain the model parameters and therefore also result in more meaningful parameter posterior distributions.</p>
      <p id="d1e224">The LA-MCMC method <xref ref-type="bibr" rid="bib1.bibx7 bib1.bibx8" id="paren.11"/> approximates the computationally expensive log-likelihood function using local polynomial regression. In this method, the MCMC sampler directly uses the approximation of the log likelihood to construct proposals and evaluate the Metropolis acceptance probability. “Full” likelihood evaluations are added infrequently but regularly to the point set used to construct the local polynomial regression, which continually improves the approximation. Expensive full likelihood evaluations are thus used only to construct the approximation or “surrogate” model. <xref ref-type="bibr" rid="bib1.bibx7" id="text.12"/> show that, given an appropriate mechanism for triggering likelihood evaluations, the resulting Markov chain converges to the true posterior distribution while reducing the number of expensive likelihood evaluations (and hence forward model simulations) by orders of magnitude. <xref ref-type="bibr" rid="bib1.bibx9" id="text.13"/> show that LA-MCMC converges with approximately the expected <inline-formula><mml:math id="M1" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>/</mml:mo><mml:msqrt><mml:mi>T</mml:mi></mml:msqrt></mml:mrow></mml:math></inline-formula> error decay rate after a finite number of steps <inline-formula><mml:math id="M2" display="inline"><mml:mi>T</mml:mi></mml:math></inline-formula>, and <xref ref-type="bibr" rid="bib1.bibx10" id="text.14"/> introduces a numerical parameter that ensures convergence even if only noisy estimates of the target density are available. This modification is useful for the chaotic systems studied here.</p>
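      <p>The local regression idea can be illustrated with a minimal sketch. It is not the LA-MCMC algorithm itself: the refinement trigger, any weighting of neighbors, and the MCMC sampler are omitted. The function names, the neighborhood size <monospace>k</monospace>, and the stand-in <monospace>expensive_loglike</monospace> are hypothetical choices made only for illustration. The sketch fits a quadratic polynomial to the nearest stored log-likelihood evaluations and uses its value at the query point as the surrogate.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def quadratic_basis(X):
    """Design matrix with columns 1, x_i, and x_i * x_j for i <= j."""
    n, q = X.shape
    cols = [np.ones(n)] + [X[:, i] for i in range(q)]
    cols += [X[:, i] * X[:, j] for i in range(q) for j in range(i, q)]
    return np.column_stack(cols)


def local_quadratic_surrogate(theta, thetas, loglikes, k):
    """Fit a quadratic to the k nearest stored evaluations; predict at theta."""
    dist = np.linalg.norm(thetas - theta, axis=1)
    nearest = np.argsort(dist)[:k]
    # Centre the regression at the query point, so that the constant
    # coefficient is the fitted value at theta.
    A = quadratic_basis(thetas[nearest] - theta)
    coef, *_ = np.linalg.lstsq(A, loglikes[nearest], rcond=None)
    return coef[0]


def expensive_loglike(theta):
    """Hypothetical stand-in for a costly log likelihood (assumption)."""
    return -0.5 * np.sum((theta - 1.0) ** 2)


rng = np.random.default_rng(1)
thetas = rng.uniform(-1.0, 3.0, size=(200, 2))   # stored parameter values
loglikes = np.array([expensive_loglike(t) for t in thetas])

query = np.array([0.7, 1.4])
approx = local_quadratic_surrogate(query, thetas, loglikes, k=20)
exact = expensive_loglike(query)
```
]]></preformat>
      <p>Because the stand-in log likelihood is itself quadratic, the local fit reproduces it up to rounding error. For a general log likelihood the fit is only locally accurate, which is why the point set has to be refined as the chain explores the parameter space.</p>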
      <p id="d1e260">The rest of this paper is organized as follows. Section <xref ref-type="sec" rid="Ch1.S2"/> reviews additional background literature and related work, and Sect. <xref ref-type="sec" rid="Ch1.S3"/> describes the methodologies used in this work, including the CIL, the stochastic LA-MCMC algorithm, and the merging of these two approaches.
Section <xref ref-type="sec" rid="Ch1.S4"/> is dedicated to numerical experiments, where the CIL/LA-MCMC approach is applied to several progressively more demanding examples. These examples are followed by a concluding discussion in Sect. <xref ref-type="sec" rid="Ch1.S5"/>.</p>
</sec>
<sec id="Ch1.S2">
  <label>2</label><title>Background and related work </title>
      <p id="d1e279">Traditional parameter estimation methods, which directly utilize the model–observation mismatch,  constrain the modeling to limited time intervals when the model is chaotic. This avoids the eventual divergence (chaotic behavior) of orbits that are initially close to each other.
A classical example is variational data assimilation for weather prediction, where the initial states of the model are estimated using observational data and algorithms such as 4D-Var, after which a short-time forecast can be simulated <xref ref-type="bibr" rid="bib1.bibx2" id="paren.15"/>.</p>
      <p id="d1e285">Sequential data assimilation methods, such as the Kalman filter (KF) <xref ref-type="bibr" rid="bib1.bibx33" id="paren.16"/>, allow parameter estimation by recursively updating both the model state and the model parameters by conditioning them on observational data obtained over sufficiently short timescales.
With methods such as state augmentation, model parameters can be updated as part of the filtering problem <xref ref-type="bibr" rid="bib1.bibx34" id="paren.17"/>.
Alternatively, the state values can be integrated out to obtain the marginal likelihood over the model parameters <xref ref-type="bibr" rid="bib1.bibx12" id="paren.18"/>. <xref ref-type="bibr" rid="bib1.bibx22" id="text.19"/> use this filter likelihood approach to estimate parameters of chaotic systems.
For models with strongly non-linear dynamics, ensemble filtering methods provide a useful alternative to the extended Kalman filter or variational methods; see <xref ref-type="bibr" rid="bib1.bibx24" id="text.20"/> for a recent review of various ensemble variants.</p>
      <?pagebreak page4321?><p id="d1e303"><?xmltex \hack{\newpage}?>Filtering-based approaches generally introduce additional tuning parameters, such as the length of the assimilation time window, the model error covariance matrix, and covariance inflation parameters.
These choices have an impact on model parameter estimation and may introduce bias. Indeed, as discussed in <xref ref-type="bibr" rid="bib1.bibx23" id="text.21"/>, changing the filtering method requires updating the parameters of the dynamical model.
Alternatives to KF-based parameter estimation methods that do not require ensemble filtering include operational ensemble prediction systems (EPSs); for example, ensemble parameter calibration methods by
<xref ref-type="bibr" rid="bib1.bibx27" id="text.22"/> and <xref ref-type="bibr" rid="bib1.bibx32" id="text.23"/> have been applied to the Integrated Forecast System (IFS)
weather models <xref ref-type="bibr" rid="bib1.bibx39 bib1.bibx40 bib1.bibx41" id="paren.24"/> at the European Centre for Medium-Range Weather Forecasts (ECMWF).
However, these approaches are heuristic and again limited to relatively short predictive windows.</p>
      <p id="d1e319">Climate model parameters have in previous studies  <xref ref-type="bibr" rid="bib1.bibx45 bib1.bibx13 bib1.bibx51" id="paren.25"><named-content content-type="pre">e.g.,</named-content></xref> been calibrated  by matching summary statistics of quantities of interest, such as top-of-atmosphere radiation, with the corresponding summary statistics from reanalysis data or output from competing models.
The vast majority of these approaches produce only point estimates. A fully Bayesian parameter inversion was performed by
<xref ref-type="bibr" rid="bib1.bibx26" id="text.26"/>, who inferred closure parameters of a large-scale computationally intensive climate model, ECHAM5, using MCMC and  several different summary statistics.</p>
      <p id="d1e331">Computational limitations make applying algorithms such as MCMC challenging for weather and climate models.
Generating even very short MCMC chains
may require methods such as parallel sampling and early rejection for tractability <xref ref-type="bibr" rid="bib1.bibx49" id="paren.27"/>. Moreover, even if these computational challenges can be overcome, finding statistics that actually constrain the parameters is difficult, and inference results can thus be inconclusive.
The failure of the summary statistic approach in <xref ref-type="bibr" rid="bib1.bibx26" id="text.28"/> can be explained intuitively: the chosen statistics average out too much information and therefore fail to characterize the geometry of the underlying chaotic attractor in a meaningful way.</p>
      <p id="d1e340">Several Monte Carlo methods have been presented to tackle expensive or intractable likelihoods; see, e.g., <xref ref-type="bibr" rid="bib1.bibx36" id="text.29"/> for a recent comprehensive literature review. Two notable such methods are approximate Bayesian computation (ABC) <xref ref-type="bibr" rid="bib1.bibx3" id="paren.30"/> and pseudo-marginal sampling <xref ref-type="bibr" rid="bib1.bibx1" id="paren.31"/>.
The approach that is most  closely related to the one presented in this paper is Bayesian inference using synthetic likelihoods, which was proposed as an alternative to ABC <xref ref-type="bibr" rid="bib1.bibx53 bib1.bibx43" id="paren.32"/>. Recent work by  <xref ref-type="bibr" rid="bib1.bibx37" id="text.33"/> describes another feature vector approach for data assimilation. For  more details and comparisons among these approaches, see the discussion below in Sect. <xref ref-type="sec" rid="Ch1.S3.SS1"/>.</p>
      <p id="d1e361">In this work, we employ  a different summary statistic,
where the observations are considered as samples from the underlying attractor. Due to the nature of the summary statistic used in CIL, the observation time stamps are not explicitly used. This allows arbitrarily sparse observation time series, and consecutive observations may be farther apart than any window within which the system remains predictable. To the best of our knowledge, parameter estimation in this setting has not been discussed in the literature.
Another difference from the synthetic likelihood
approach is that the latter involves regenerating data for computing the likelihood at every new model parameter value, which would be computationally infeasible in our setting.</p>
</sec>
<sec id="Ch1.S3">
  <label>3</label><title>Methods</title>
<sec id="Ch1.S3.SS1">
  <label>3.1</label><title>Correlation integral likelihood</title>
      <p id="d1e379">We first construct a likelihood function that models the observations by comparing certain summary statistics of the observations to the corresponding statistics of a trajectory simulated from the chaotic model. As a source of statistics, we will choose the correlation integral, which depends on the fractal dimension of the chaotic attractor. Unlike other statistics – such as the ergodic averages of a trajectory – the correlation integral is able to constrain the parameters of a chaotic model <xref ref-type="bibr" rid="bib1.bibx21" id="paren.34"/>.</p>
      <p id="d1e385">Let us denote by
            <disp-formula id="Ch1.E1" content-type="numbered"><label>1</label><mml:math id="M3" display="block"><mml:mrow><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mi mathvariant="normal">d</mml:mi><mml:mi mathvariant="bold">u</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="normal">d</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>=</mml:mo><mml:mi>f</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">u</mml:mi><mml:mo mathvariant="bold">,</mml:mo><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mo>)</mml:mo><mml:mo>,</mml:mo><mml:mspace linebreak="nobreak" width="0.33em"/><mml:mspace linebreak="nobreak" width="0.33em"/><mml:mi mathvariant="bold">u</mml:mi><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0</mml:mn><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant="bold">u</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
          a dynamical system with state <inline-formula><mml:math id="M4" display="inline"><mml:mrow><mml:mi mathvariant="bold">u</mml:mi><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo><mml:mo>∈</mml:mo><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mi>n</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>, initial state
<inline-formula><mml:math id="M5" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">u</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>∈</mml:mo><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mi>n</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>, and parameters <inline-formula><mml:math id="M6" display="inline"><mml:mrow><mml:mi mathvariant="italic">θ</mml:mi><mml:mo>∈</mml:mo><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mi>q</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>.
The time-discretized system, with time steps <inline-formula><mml:math id="M7" display="inline"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>∈</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mi mathvariant="italic">τ</mml:mi></mml:msub><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula> denoting selected observation points, can be written as
            <disp-formula id="Ch1.E2" content-type="numbered"><label>2</label><mml:math id="M8" display="block"><mml:mrow><mml:msub><mml:mi mathvariant="bold">u</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>≡</mml:mo><mml:mi mathvariant="bold">u</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mi>F</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>;</mml:mo><mml:msub><mml:mi mathvariant="bold">u</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="italic">θ</mml:mi><mml:mo>)</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
          Either the full state <inline-formula><mml:math id="M9" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">u</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>∈</mml:mo><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mi>n</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> or a subset <inline-formula><mml:math id="M10" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>∈</mml:mo><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mrow><mml:mi>d</mml:mi><mml:mo>≤</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> of the state components are observed. We will use <inline-formula><mml:math id="M11" display="inline"><mml:mrow><mml:mi mathvariant="bold">S</mml:mi><mml:mo>=</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi mathvariant="italic">τ</mml:mi></mml:msub><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula> to denote a collection of these observables at successive times.</p>
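      <p>As a concrete illustration of Eqs. (<xref ref-type="disp-formula" rid="Ch1.E1"/>) and (<xref ref-type="disp-formula" rid="Ch1.E2"/>), the following sketch integrates a dynamical system and records a subset of the state at successive observation times. The Lorenz 63 system as the right-hand side, the hand-written fourth-order Runge–Kutta integrator, the step size, the observation spacing, and all function names are assumptions chosen for illustration; none of them is prescribed by the formulation above.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def lorenz63(u, theta):
    """Right-hand side f(u, theta); Lorenz 63 is an illustrative choice."""
    sigma, rho, beta = theta
    x, y, z = u
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])


def rk4_step(f, u, theta, dt):
    """One classical fourth-order Runge-Kutta step of size dt."""
    k1 = f(u, theta)
    k2 = f(u + 0.5 * dt * k1, theta)
    k3 = f(u + 0.5 * dt * k2, theta)
    k4 = f(u + dt * k3, theta)
    return u + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def observe(f, u0, theta, n_obs, steps_between, dt, observed_idx):
    """Return S = {s_1, ..., s_tau} with tau = n_obs observed sub-states."""
    u = np.asarray(u0, dtype=float)
    S = np.empty((n_obs, len(observed_idx)))
    for i in range(n_obs):
        # Advance the full state u from one observation time to the next.
        for _ in range(steps_between):
            u = rk4_step(f, u, theta, dt)
        # Keep only the observed components s_i of the state u_i.
        S[i] = u[observed_idx]
    return S


theta = (10.0, 28.0, 8.0 / 3.0)          # commonly used Lorenz 63 values
S = observe(lorenz63, [1.0, 1.0, 1.0], theta,
            n_obs=200, steps_between=50, dt=0.01, observed_idx=[0, 2])
```
]]></preformat>
      <p>Here the state has dimension 3 and only the first and third components are observed, so each observable has dimension 2 and the collection of observations is stored as a 200 by 2 array.</p>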
      <p id="d1e656">Using the model–observation mismatch at a collection of times <inline-formula><mml:math id="M12" display="inline"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> to constrain the value of the parameters <inline-formula><mml:math id="M13" display="inline"><mml:mi mathvariant="italic">θ</mml:mi></mml:math></inline-formula> is not suitable when the system (<xref ref-type="disp-formula" rid="Ch1.E1"/>) has chaotic dynamics, since the state vector values <inline-formula><mml:math id="M14" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> are unpredictable after a finite time interval.
Though long-time trajectories <inline-formula><mml:math id="M15" display="inline"><mml:mrow><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> of chaotic systems are not predictable in the time domain, they do, however, represent samples from an underlying attractor in the phase space. The states are generated deterministically, but the model's chaotic nature allows us to interpret the states as samples from a particular <inline-formula><mml:math id="M16" display="inline"><mml:mi mathvariant="italic">θ</mml:mi></mml:math></inline-formula>-dependent distribution.
Yet obvious choices for summary statistics <inline-formula><mml:math id="M17" display="inline"><mml:mi>T</mml:mi></mml:math></inline-formula> that depend on the observed states <inline-formula><mml:math id="M18" display="inline"><mml:mi mathvariant="bold">S</mml:mi></mml:math></inline-formula>, such as ergodic averages, ignore important aspects of the dynamics and are thus unable to constrain the model parameters. For example, the statistic <inline-formula><mml:math id="M19" display="inline"><mml:mrow><mml:mi>T</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="bold">S</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mn mathvariant="normal">1</mml:mn><mml:mi mathvariant="italic">τ</mml:mi></mml:mfrac></mml:mstyle><mml:msubsup><mml:mo>∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi mathvariant="italic">τ</mml:mi></mml:msubsup><mml:msub><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is easy to compute and is normally distributed in the limit <inline-formula><mml:math id="M20" display="inline"><mml:mrow><mml:mi mathvariant="italic">τ</mml:mi><mml:mo>→</mml:mo><mml:mi mathvariant="normal">∞</mml:mi></mml:mrow></mml:math></inline-formula> (under appropriate conditions),<?pagebreak page4322?> but this ergodic mean says very little about the shape of the chaotic attractor.</p>
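      <p>A toy example makes the limitation of the mean concrete. The two point sets below are not chaotic trajectories; they are points on circles of radius 1 and 2, chosen as an assumption purely for illustration, and the helper names are hypothetical. Their means coincide, yet the fraction of point pairs closer than a given distance differs clearly between them.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def ergodic_mean(S):
    """The statistic T(S) = (1 / tau) * sum_i s_i."""
    return S.mean(axis=0)


def circle(radius, n):
    """n evenly spaced points on a circle centred at the origin (toy data)."""
    angles = 2.0 * np.pi * np.arange(n) / n
    return radius * np.column_stack([np.cos(angles), np.sin(angles)])


def frac_pairs_within(S, R):
    """Fraction of all point pairs (self-pairs included) closer than R."""
    d = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=-1)
    return (d < R).mean()


S_a = circle(1.0, 400)
S_b = circle(2.0, 400)

mean_a, mean_b = ergodic_mean(S_a), ergodic_mean(S_b)
frac_a = frac_pairs_within(S_a, 1.5)
frac_b = frac_pairs_within(S_b, 1.5)
```
]]></preformat>
      <p>Both means are zero up to rounding error, so the mean cannot tell the two sets apart, whereas the distance-based fractions do distinguish them.</p>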
      <p id="d1e777">Instead, we need a summary statistic that retains information relevant for parameter estimation but still defines a computationally tractable likelihood. To this end, <xref ref-type="bibr" rid="bib1.bibx21" id="text.35"/> devised the  CIL, which retains enough information about the attractor to constrain the model parameters. We first review the CIL and then discuss how to make evaluation of the likelihood tractable.</p>
      <p id="d1e784">We will use the CIL to evaluate the “difference” between two chaotic attractors. For this purpose, we will first describe how to statistically characterize the geometry of an attractor from suitable observations <inline-formula><mml:math id="M21" display="inline"><mml:mi mathvariant="bold">S</mml:mi></mml:math></inline-formula>. In particular, constructing the CIL requires three steps: (i) computing distances between observables sampled from a given attractor; (ii) evaluating the empirical cumulative distribution function (ECDF) of these distances and deriving certain summary statistics <inline-formula><mml:math id="M22" display="inline"><mml:mi>T</mml:mi></mml:math></inline-formula> from the ECDF; and (iii) estimating the mean and covariance of <inline-formula><mml:math id="M23" display="inline"><mml:mi>T</mml:mi></mml:math></inline-formula> by repeating steps (i) and (ii).</p>
      <p id="d1e808">Intuitively, the CIL thus interprets observations of a chaotic trajectory as samples from a fixed distribution over phase space. It allows the time between observations to be arbitrarily large – importantly, much longer than the system's non-chaotic prediction window.</p>
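      <p>The three steps can be sketched as follows, under several assumptions that are not taken from the text: the data are split into consecutive blocks, every unordered pair of distinct blocks is used, the ECDF is evaluated at a hypothetical fixed set of radii, and independent Gaussian samples stand in for observations of an attractor. All function names are illustrative.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import itertools

import numpy as np


def pairwise_distances(a, b):
    """Step (i): Euclidean distances between all points of two blocks."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def ecdf_features(a, b, radii):
    """Step (ii): ECDF of the distances, evaluated at each radius R."""
    d = pairwise_distances(a, b).ravel()
    return np.array([(d < r).mean() for r in radii])


def feature_statistics(S, n_blocks, radii):
    """Step (iii): mean and covariance of the features over block pairs."""
    N = len(S) // n_blocks
    blocks = [S[k * N:(k + 1) * N] for k in range(n_blocks)]
    pairs = itertools.combinations(range(n_blocks), 2)
    feats = np.array([ecdf_features(blocks[k], blocks[l], radii)
                      for k, l in pairs])
    return feats.mean(axis=0), np.cov(feats, rowvar=False)


rng = np.random.default_rng(0)
S = rng.normal(size=(600, 3))             # placeholder data, NOT an attractor
radii = np.array([0.5, 1.0, 2.0, 4.0])    # hypothetical radii
mu, Sigma = feature_statistics(S, n_blocks=6, radii=radii)
```
]]></preformat>
      <p>The resulting mean vector and covariance matrix are the ingredients of a Gaussian model for the summary statistic. Note that the computation never uses the time stamps of the observations, only their positions in phase space.</p>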
      <p id="d1e811">Now we describe the CIL construction in detail. Suppose that we have collected a data set <inline-formula><mml:math id="M24" display="inline"><mml:mi mathvariant="bold">S</mml:mi></mml:math></inline-formula> comprising observations of the dynamical system of interest. Let <inline-formula><mml:math id="M25" display="inline"><mml:mi mathvariant="bold">S</mml:mi></mml:math></inline-formula> be split into <inline-formula><mml:math id="M26" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mtext>epo</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> different subsets called epochs. The epochs can, in principle, be any subsets of length <inline-formula><mml:math id="M27" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> from the reference data set <inline-formula><mml:math id="M28" display="inline"><mml:mi mathvariant="bold">S</mml:mi></mml:math></inline-formula>. In this paper, however, we restrict the epochs to be time-consecutive intervals of <inline-formula><mml:math id="M29" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> evenly spaced observations. 
Let <inline-formula><mml:math id="M30" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:msubsup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:msubsup><mml:msubsup><mml:mo mathvariant="italic">}</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M31" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">s</mml:mi><mml:mi>l</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:msubsup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>j</mml:mi><mml:mi>l</mml:mi></mml:msubsup><mml:msubsup><mml:mo mathvariant="italic">}</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula>, with  <inline-formula><mml:math id="M32" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>≤</mml:mo><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi><mml:mo>≤</mml:mo><mml:msub><mml:mi>n</mml:mi><mml:mtext>epo</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M33" display="inline"><mml:mrow><mml:mi>k</mml:mi><mml:mo>≠</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:math></inline-formula>, be two such disjoint epochs. 
The individual observable vectors <inline-formula><mml:math id="M34" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:msubsup><mml:mo>∈</mml:mo><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mi>d</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M35" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>j</mml:mi><mml:mi>l</mml:mi></mml:msubsup><mml:mo>∈</mml:mo><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mi>d</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> comprising each epoch come from time intervals <inline-formula><mml:math id="M36" display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mi>N</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>)</mml:mo><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M37" display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>l</mml:mi><mml:mi>N</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>l</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>)</mml:mo><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula>, respectively. In other words, superscripts refer to different epochs and  subscripts refer to the time points within those epochs.
<xref ref-type="bibr" rid="bib1.bibx21" id="text.36"/> then define the modified correlation integral sum  <inline-formula><mml:math id="M38" display="inline"><mml:mrow><mml:mi>C</mml:mi><mml:mo>(</mml:mo><mml:mi>R</mml:mi><mml:mo>,</mml:mo><mml:mi>N</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>l</mml:mi></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> by counting all pairs of observations that are less than a distance <inline-formula><mml:math id="M39" display="inline"><mml:mrow><mml:mi>R</mml:mi><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula> from each other:

                <disp-formula id="Ch1.E3" content-type="numbered"><label>3</label><mml:math id="M40" display="block"><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mi>C</mml:mi><mml:mo>(</mml:mo><mml:mi>R</mml:mi><mml:mo>,</mml:mo><mml:mi>N</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>l</mml:mi></mml:msup><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mrow><mml:msup><mml:mi>N</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:mfrac></mml:mstyle><mml:munder><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>≤</mml:mo><mml:mi>N</mml:mi></mml:mrow></mml:munder><mml:msub><mml:mn mathvariant="bold">1</mml:mn><mml:mrow><mml:mo>[</mml:mo><mml:mn mathvariant="normal">0</mml:mn><mml:mo>,</mml:mo><mml:mi>R</mml:mi><mml:mo>]</mml:mo></mml:mrow></mml:msub><mml:mfenced close=")" open="("><mml:mfenced close="∥" open="∥"><mml:mrow><mml:msubsup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:msubsup><mml:mo>-</mml:mo><mml:msubsup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>j</mml:mi><mml:mi>l</mml:mi></mml:msubsup></mml:mrow></mml:mfenced></mml:mfenced><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          where  <bold>1</bold> denotes the indicator function   and <inline-formula><mml:math id="M41" display="inline"><mml:mrow><mml:mo>∥</mml:mo><mml:mo>⋅</mml:mo><mml:mo>∥</mml:mo></mml:mrow></mml:math></inline-formula> is the  Euclidean norm on <inline-formula><mml:math id="M42" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mi>d</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>.
In the physics literature, evaluating Eq. (<xref ref-type="disp-formula" rid="Ch1.E3"/>) in the limit <inline-formula><mml:math id="M43" display="inline"><mml:mrow><mml:mi>R</mml:mi><mml:mo>→</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula>, with <inline-formula><mml:math id="M44" display="inline"><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M45" display="inline"><mml:mrow><mml:mi>i</mml:mi><mml:mo>≠</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:math></inline-formula>, numerically approximates the fractal dimension of the attractor that produced <inline-formula><mml:math id="M46" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>l</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx17 bib1.bibx18" id="paren.37"/>. Here, we instead use Eq. (<xref ref-type="disp-formula" rid="Ch1.E3"/>) to characterize the distribution of distances between <inline-formula><mml:math id="M47" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M48" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>l</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> at all relevant scales. We assume that the state space is bounded; therefore, an <inline-formula><mml:math id="M49" display="inline"><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> covering all pairwise distances in Eq. (<xref ref-type="disp-formula" rid="Ch1.E3"/>) exists.
For a prescribed  set  of radii <inline-formula><mml:math id="M50" display="inline"><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mi>m</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:msup><mml:mi>b</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mi>m</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>, with <inline-formula><mml:math id="M51" display="inline"><mml:mrow><mml:mi>b</mml:mi><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M52" display="inline"><mml:mrow><mml:mi>m</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:mi>M</mml:mi></mml:mrow></mml:math></inline-formula>, Eq. (<xref ref-type="disp-formula" rid="Ch1.E3"/>) defines a discretization of the ECDF of the distances  <inline-formula><mml:math id="M53" display="inline"><mml:mrow><mml:mo>∥</mml:mo><mml:msubsup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:msubsup><mml:mo>-</mml:mo><mml:msubsup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>j</mml:mi><mml:mi>l</mml:mi></mml:msubsup><mml:mo>∥</mml:mo></mml:mrow></mml:math></inline-formula>, with discretization boundaries given by the numbers <inline-formula><mml:math id="M54" display="inline"><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mi>m</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>.</p>
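      <p>As a minimal illustration of Eq. (<xref ref-type="disp-formula" rid="Ch1.E3"/>), the following Python sketch evaluates the correlation integral sum for two epochs at a geometric sequence of radii. The function names <monospace>radii</monospace> and <monospace>correlation_sum</monospace>, the array layout (one observation per row), and the use of NumPy are assumptions made for this example only; the sketch does not reproduce the implementation used for the experiments.</p>
      <preformat>
```python
import numpy as np


def radii(r0, b, M):
    """Radii R_m = R_0 * b**(-m) for m = 0, ..., M (requires b > 1)."""
    return r0 * float(b) ** (-np.arange(M + 1))


def correlation_sum(sk, sl, R):
    """Eq. (3): fraction of pairs (i, j) with ||s_i^k - s_j^l|| in [0, R].

    sk, sl : arrays of shape (N, d), one observation per row.
    R      : scalar radius or 1-D array of radii.
    Returns one value per radius.
    """
    sk = np.asarray(sk, dtype=float)
    sl = np.asarray(sl, dtype=float)
    # All N * N pairwise Euclidean distances between the two epochs.
    diff = sk[:, None, :] - sl[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1)).ravel()
    R = np.atleast_1d(np.asarray(R, dtype=float))
    # Indicator of [0, R] averaged over all pairs, i.e. the 1 / N**2 sum.
    return (R[:, None] >= dist[None, :]).mean(axis=1)
```
      </preformat>
      <p>Evaluating <monospace>correlation_sum(sk, sl, radii(r0, b, M))</monospace> returns the discretized ECDF of the pairwise distances at the radii described above, ordered from the largest radius to the smallest.</p>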
      <p id="d1e1450">Now we define <inline-formula><mml:math id="M55" display="inline"><mml:mrow><mml:msubsup><mml:mi>y</mml:mi><mml:mi>m</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mi>C</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mi>m</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mi>N</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>l</mml:mi></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> as components of a statistic <inline-formula><mml:math id="M56" display="inline"><mml:mrow><mml:mi>T</mml:mi><mml:mo>(</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>l</mml:mi></mml:msup><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mo>:=</mml:mo><mml:mo>(</mml:mo><mml:msubsup><mml:mi>y</mml:mi><mml:mn mathvariant="normal">0</mml:mn><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msubsup><mml:mi>y</mml:mi><mml:mi>M</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>. This statistic is also called the feature vector. 
According to <xref ref-type="bibr" rid="bib1.bibx4" id="text.38"/> and <xref ref-type="bibr" rid="bib1.bibx38" id="text.39"/>, the vectors <inline-formula><mml:math id="M57" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>  are normally distributed, and the estimates of the mean and covariance converge at the rate <inline-formula><mml:math id="M58" display="inline"><mml:msqrt><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">epo</mml:mi></mml:msub></mml:mrow></mml:msqrt></mml:math></inline-formula> to their limit points.
This is a generalization of the classical result of <xref ref-type="bibr" rid="bib1.bibx11" id="text.40"/>, which applies to independent and identically distributed samples from a scalar-valued distribution. We characterize this normal distribution by subsampling the full data set <inline-formula><mml:math id="M59" display="inline"><mml:mi mathvariant="bold">S</mml:mi></mml:math></inline-formula>. Specifically, we approximate the mean <inline-formula><mml:math id="M60" display="inline"><mml:mi mathvariant="bold-italic">μ</mml:mi></mml:math></inline-formula> and covariance <inline-formula><mml:math id="M61" display="inline"><mml:mi mathvariant="normal">Σ</mml:mi></mml:math></inline-formula> of <inline-formula><mml:math id="M62" display="inline"><mml:mi>T</mml:mi></mml:math></inline-formula> by the sample mean and sample covariance of the set
<inline-formula><mml:math id="M63" display="inline"><mml:mrow><mml:mo mathvariant="italic">{</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>:</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mn mathvariant="normal">1</mml:mn><mml:mo>≤</mml:mo><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi><mml:mo>≤</mml:mo><mml:msub><mml:mi>n</mml:mi><mml:mtext>epo</mml:mtext></mml:msub><mml:mo>,</mml:mo><mml:mi>k</mml:mi><mml:mo>≠</mml:mo><mml:mi>l</mml:mi><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula>,  evaluated for all <inline-formula><mml:math id="M64" display="inline"><mml:mrow><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mn mathvariant="normal">1</mml:mn><mml:mn mathvariant="normal">2</mml:mn></mml:mfrac></mml:mstyle><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">epo</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">epo</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> pairs of epochs <inline-formula><mml:math id="M65" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>l</mml:mi></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> using fixed values of <inline-formula><mml:math id="M66" display="inline"><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M67" display="inline"><mml:mi>b</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M68" display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula>, and 
<inline-formula><mml:math id="M69" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula>.</p>
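      <p>The subsampling step can be sketched as follows, assuming the data set is stored as an array with one observation per row and is split into consecutive epochs of equal length. The names <monospace>feature_vector</monospace> and <monospace>cil_statistics</monospace> are hypothetical. Because the correlation integral sum is symmetric in its two epochs, the sketch loops over unordered pairs only.</p>
      <preformat>
```python
import itertools

import numpy as np


def feature_vector(sk, sl, R):
    """Feature vector y^{k,l}: Eq. (3) evaluated at every radius in R."""
    diff = sk[:, None, :] - sl[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1)).ravel()
    return (R[:, None] >= dist[None, :]).mean(axis=1)


def cil_statistics(S, N, R):
    """Estimate the mean and covariance of the feature vectors.

    S : array of shape (n_epo * N, d) holding the full data set.
    N : number of observations per epoch.
    R : 1-D array of radii R_0, ..., R_M.
    """
    n_epo = S.shape[0] // N
    # Time-consecutive, non-overlapping epochs of N observations each.
    epochs = [S[k * N:(k + 1) * N] for k in range(n_epo)]
    # One feature vector per unordered pair of distinct epochs,
    # n_epo * (n_epo - 1) / 2 vectors in total.
    Y = np.array([feature_vector(epochs[k], epochs[l], R)
                  for k, l in itertools.combinations(range(n_epo), 2)])
    mu = Y.mean(axis=0)
    Sigma = np.cov(Y, rowvar=False)
    return mu, Sigma, Y
```
      </preformat>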
      <p id="d1e1778">The Gaussian distribution of <inline-formula><mml:math id="M70" display="inline"><mml:mi>T</mml:mi></mml:math></inline-formula> effectively characterizes the geometry of the attractor represented in the data set <inline-formula><mml:math id="M71" display="inline"><mml:mi mathvariant="bold">S</mml:mi></mml:math></inline-formula>. Now we wish to use this distribution to infer the parameters <inline-formula><mml:math id="M72" display="inline"><mml:mi mathvariant="italic">θ</mml:mi></mml:math></inline-formula>. Given a  candidate parameter value <inline-formula><mml:math id="M73" display="inline"><mml:mover accent="true"><mml:mi mathvariant="italic">θ</mml:mi><mml:mo mathvariant="normal" stretchy="false">̃</mml:mo></mml:mover></mml:math></inline-formula>, we use the model to generate states <inline-formula><mml:math id="M74" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mo>*</mml:mo></mml:msup><mml:mspace width="-0.125em" linebreak="nobreak"/><mml:mo>(</mml:mo><mml:mover accent="true"><mml:mi mathvariant="italic">θ</mml:mi><mml:mo mathvariant="normal" stretchy="false">̃</mml:mo></mml:mover><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:msubsup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>i</mml:mi><mml:mo>*</mml:mo></mml:msubsup><mml:mspace width="-0.125em" linebreak="nobreak"/><mml:mo>(</mml:mo><mml:mover accent="true"><mml:mi mathvariant="italic">θ</mml:mi><mml:mo mathvariant="normal" stretchy="false">̃</mml:mo></mml:mover><mml:mo>)</mml:mo><mml:msubsup><mml:mo mathvariant="italic">}</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula> for the length of a single epoch. 
We then evaluate the statistics <inline-formula><mml:math id="M75" display="inline"><mml:mrow><mml:msubsup><mml:mi>y</mml:mi><mml:mi>m</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mo>*</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mi>C</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mi>m</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mi>N</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mo>*</mml:mo></mml:msup><mml:mspace width="-0.125em" linebreak="nobreak"/><mml:mo>(</mml:mo><mml:mover accent="true"><mml:mi mathvariant="italic">θ</mml:mi><mml:mo mathvariant="normal" stretchy="false">̃</mml:mo></mml:mover><mml:mo>)</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> as in Eq. (<xref ref-type="disp-formula" rid="Ch1.E3"/>), by computing the distances between elements of <inline-formula><mml:math id="M76" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mo>*</mml:mo></mml:msup><mml:mspace width="-0.125em" linebreak="nobreak"/><mml:mo>(</mml:mo><mml:mover accent="true"><mml:mi mathvariant="italic">θ</mml:mi><mml:mo mathvariant="normal" stretchy="false">̃</mml:mo></mml:mover><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and the states of an epoch <inline-formula><mml:math id="M77" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> selected from the data <inline-formula><mml:math id="M78" display="inline"><mml:mi mathvariant="bold">S</mml:mi></mml:math></inline-formula>. 
Combining these statistics into a feature vector <inline-formula><mml:math id="M79" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mo>*</mml:mo></mml:mrow></mml:msup><mml:mspace linebreak="nobreak" width="-0.125em"/><mml:mo>(</mml:mo><mml:mover accent="true"><mml:mi mathvariant="italic">θ</mml:mi><mml:mo stretchy="false" mathvariant="normal">̃</mml:mo></mml:mover><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mo>(</mml:mo><mml:msubsup><mml:mi>y</mml:mi><mml:mi>m</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mo>*</mml:mo></mml:mrow></mml:msubsup><mml:msubsup><mml:mo>)</mml:mo><mml:mrow><mml:mi>m</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow><mml:mi>M</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula>, we can write a noisy estimate of the log-likelihood function:
            <disp-formula id="Ch1.E4" content-type="numbered"><label>4</label><mml:math id="M80" display="block"><mml:mtable class="split" rowspacing="0.2ex" displaystyle="true" columnalign="right left"><mml:mtr><mml:mtd><mml:mrow><mml:mi>log⁡</mml:mi><mml:mi>p</mml:mi><mml:mo>(</mml:mo><mml:mover accent="true"><mml:mi mathvariant="italic">θ</mml:mi><mml:mo stretchy="false" mathvariant="normal">̃</mml:mo></mml:mover><mml:mo>|</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mn mathvariant="normal">2</mml:mn></mml:mfrac></mml:mstyle><mml:msup><mml:mfenced open="(" close=")"><mml:mrow><mml:msup><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mo>*</mml:mo></mml:mrow></mml:msup><mml:mspace linebreak="nobreak" width="-0.125em"/><mml:mo>(</mml:mo><mml:mover accent="true"><mml:mi mathvariant="italic">θ</mml:mi><mml:mo stretchy="false" mathvariant="normal">̃</mml:mo></mml:mover><mml:mo>)</mml:mo><mml:mo>-</mml:mo><mml:mi mathvariant="italic">μ</mml:mi></mml:mrow></mml:mfenced><mml:mo>⊤</mml:mo></mml:msup><mml:msup><mml:mi mathvariant="normal">Σ</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup><mml:mfenced open="(" close=")"><mml:mrow><mml:msup><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mo>*</mml:mo></mml:mrow></mml:msup><mml:mspace width="-0.125em" linebreak="nobreak"/><mml:mo>(</mml:mo><mml:mover accent="true"><mml:mi mathvariant="italic">θ</mml:mi><mml:mo stretchy="false" mathvariant="normal">̃</mml:mo></mml:mover><mml:mo>)</mml:mo><mml:mo>-</mml:mo><mml:mi 
mathvariant="italic">μ</mml:mi></mml:mrow></mml:mfenced></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd/><mml:mtd><mml:mrow><mml:mo>+</mml:mo><mml:mtext>constant</mml:mtext><mml:mo>.</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
          Comparing <inline-formula><mml:math id="M81" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mo>*</mml:mo></mml:msup><mml:mspace linebreak="nobreak" width="-0.125em"/><mml:mo>(</mml:mo><mml:mover accent="true"><mml:mi mathvariant="italic">θ</mml:mi><mml:mo stretchy="false" mathvariant="normal">̃</mml:mo></mml:mover><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> with other epochs drawn from the data set <inline-formula><mml:math id="M82" display="inline"><mml:mi mathvariant="bold">S</mml:mi></mml:math></inline-formula>, however,  will produce different realizations of the feature vector. We thus average the resulting log likelihoods over all epochs:
            <disp-formula id="Ch1.E5" content-type="numbered"><label>5</label><mml:math id="M83" display="block"><mml:mrow><mml:mi>log⁡</mml:mi><mml:mi>p</mml:mi><mml:mo>(</mml:mo><mml:mover accent="true"><mml:mi mathvariant="italic">θ</mml:mi><mml:mo stretchy="false" mathvariant="normal">̃</mml:mo></mml:mover><mml:mi mathvariant="normal">|</mml:mi><mml:mi mathvariant="bold">S</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mtext>epo</mml:mtext></mml:msub></mml:mrow></mml:mfrac></mml:mstyle><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mtext>epo</mml:mtext></mml:msub></mml:mrow></mml:munderover><mml:mi>log⁡</mml:mi><mml:mi>p</mml:mi><mml:mo>(</mml:mo><mml:mover accent="true"><mml:mi mathvariant="italic">θ</mml:mi><mml:mo stretchy="false" mathvariant="normal">̃</mml:mo></mml:mover><mml:mi mathvariant="normal">|</mml:mi><mml:msup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>)</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
          This averaging, which evaluates Eq. (<xref ref-type="disp-formula" rid="Ch1.E4"/>) <inline-formula><mml:math id="M84" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mtext>epo</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> times, requires only new distance computations and is thus cheap relative to time integration of the dynamical model.</p>
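      <p>A sketch of Eqs. (<xref ref-type="disp-formula" rid="Ch1.E4"/>) and (<xref ref-type="disp-formula" rid="Ch1.E5"/>) under the same assumptions is given below. The function name <monospace>cil_log_likelihood</monospace> and its argument layout are hypothetical, the additive constant is dropped, and the mean and covariance are assumed to have been estimated beforehand from the data.</p>
      <preformat>
```python
import numpy as np


def feature_vector(sk, sl, R):
    """Eq. (3) evaluated at every radius in R for epochs sk and sl."""
    diff = sk[:, None, :] - sl[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1)).ravel()
    return (R[:, None] >= dist[None, :]).mean(axis=1)


def cil_log_likelihood(s_star, epochs, R, mu, Sigma):
    """Eq. (5): average of Eq. (4) over data epochs, constant omitted.

    s_star : simulated single-epoch trajectory, shape (N, d).
    epochs : list of data epochs, each of shape (N, d).
    """
    total = 0.0
    for sk in epochs:
        resid = feature_vector(sk, s_star, R) - mu
        # Solve Sigma x = resid rather than forming the inverse explicitly.
        total += -0.5 * resid @ np.linalg.solve(Sigma, resid)
    return total / len(epochs)
```
      </preformat>
      <p>Only the distance computations inside <monospace>feature_vector</monospace> are repeated for each data epoch; the simulated trajectory <monospace>s_star</monospace> is generated once per candidate parameter value.</p>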
      <?pagebreak page4323?><p id="d1e2253">Because the feature vectors <inline-formula><mml:math id="M85" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mo>*</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> are random for any finite <inline-formula><mml:math id="M86" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula>, and because the number of epochs <inline-formula><mml:math id="M87" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mtext>epo</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> is also finite, the log likelihood in Eq. (<xref ref-type="disp-formula" rid="Ch1.E5"/>) is necessarily random. It is then useful to view Eq. (<xref ref-type="disp-formula" rid="Ch1.E5"/>) as an estimate of an underlying true log likelihood. We are therefore in a setting where we cannot evaluate the unnormalized posterior density exactly; we only have access to a noisy approximation of it. Previous work <xref ref-type="bibr" rid="bib1.bibx50" id="paren.41"/> has demonstrated that derivative-free optimizers such as the differential evolution (DE) algorithm can successfully identify the posterior mode in this setting, yielding a point estimate of <inline-formula><mml:math id="M88" display="inline"><mml:mi mathvariant="italic">θ</mml:mi></mml:math></inline-formula>. In the fully Bayesian setting, one could characterize the posterior <inline-formula><mml:math id="M89" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="italic">θ</mml:mi><mml:mi mathvariant="normal">|</mml:mi><mml:mi mathvariant="bold">S</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> using pseudo-marginal MCMC methods <xref ref-type="bibr" rid="bib1.bibx1" id="paren.42"/> but at significant computational expense. 
Below, we will use a surrogate model constructed adaptively during MCMC sampling to reduce this computational burden.</p>
      <p id="d1e2327">Note that the CIL approach described above already reduces the computational cost of inference by only requiring simulation of the (potentially expensive) chaotic model for a single epoch. We compare each epoch of the data to the same single-epoch model output. Each of these comparisons results in an estimate of the log likelihood, which we then average over data epochs. A larger data set <inline-formula><mml:math id="M90" display="inline"><mml:mi mathvariant="bold">S</mml:mi></mml:math></inline-formula> can reduce the variance of this average, but does not require additional simulations of the dynamical model. Also, we do not require any knowledge about the initial conditions of the model; we omit an initial time interval before extracting <inline-formula><mml:math id="M91" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mo>*</mml:mo></mml:msup><mml:mspace width="-0.125em" linebreak="nobreak"/><mml:mo>(</mml:mo><mml:mover accent="true"><mml:mi mathvariant="italic">θ</mml:mi><mml:mo stretchy="false" mathvariant="normal">̃</mml:mo></mml:mover><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> to ensure that the observed trajectory is on the chaotic attractor.</p>
      <p id="d1e2358">Moreover, the initial values are randomized for all simulations, and sampling is started only after the model has been integrated beyond the initial, predictable time window. The independence of the sampled parameter posteriors from the initial values was verified both here and in earlier works by repeated experiments.</p>
      <p id="d1e2361">Our approach is broadly similar to the synthetic likelihood method (e.g., <xref ref-type="bibr" rid="bib1.bibx53 bib1.bibx43" id="altparen.43"/>) but differs in two key respects: (i) we use a novel summary statistic that is able to characterize chaotic attractors, and (ii) we only need to  evaluate the forward model for a single epoch. Comparatively, synthetic likelihoods typically use summary statistics such as auto-covariances at a given lag or regression coefficients. These methods also require long-time integration of the forward model for each candidate parameter value <inline-formula><mml:math id="M92" display="inline"><mml:mi mathvariant="italic">θ</mml:mi></mml:math></inline-formula>, rather than integration for only one epoch. <xref ref-type="bibr" rid="bib1.bibx37" id="text.44"/> also discuss several ways of using feature vectors for inference in geophysics. A distinction of the present work is that we use an ECDF-based summary statistic that is provably Gaussian, and we perform extensive Bayesian analysis of the parameter posteriors via novel MCMC methods. These methods are described next.</p>
</sec>
<sec id="Ch1.S3.SS2">
  <label>3.2</label><title>Local approximation MCMC</title>
      <p id="d1e2385">Even with the developments described above, estimating the  CIL at each candidate parameter value <inline-formula><mml:math id="M93" display="inline"><mml:mover accent="true"><mml:mi mathvariant="italic">θ</mml:mi><mml:mo mathvariant="normal" stretchy="false">̃</mml:mo></mml:mover></mml:math></inline-formula> is computationally intensive. We thus use local approximation MCMC (LA-MCMC) <xref ref-type="bibr" rid="bib1.bibx7 bib1.bibx8 bib1.bibx9" id="paren.45"/> – a surrogate modeling method that replaces many of these CIL evaluations with an inexpensive approximation. Replacing expensive density evaluations with a surrogate was first introduced by  <xref ref-type="bibr" rid="bib1.bibx47" id="text.46"/> and <xref ref-type="bibr" rid="bib1.bibx28" id="text.47"/>. LA-MCMC extends these ideas by continually refining the surrogate during sampling, which guarantees convergence.</p>
      <p id="d1e2407">First introduced in <xref ref-type="bibr" rid="bib1.bibx7" id="text.48"/>, LA-MCMC builds local surrogate models for the log likelihood while simultaneously sampling the posterior. The surrogate is incrementally and infinitely refined during sampling and thus tailored to the problem – i.e., made more accurate in regions of high posterior probability. Specifically, the surrogate model is a local polynomial computed by fitting nearby evaluations of the “true” log likelihood. We emphasize that the approximation itself is not locally supported. At each point <inline-formula><mml:math id="M94" display="inline"><mml:mover accent="true"><mml:mi mathvariant="italic">θ</mml:mi><mml:mo stretchy="false" mathvariant="normal">̃</mml:mo></mml:mover></mml:math></inline-formula>, we locally construct a polynomial approximation, which globally defines a piecewise polynomial surrogate model. This is an important distinction because the piecewise polynomial approximation is not necessarily a probability density function. In fact, the surrogate function may not even be integrable. Despite this challenge, <xref ref-type="bibr" rid="bib1.bibx9" id="text.49"/> devise a refinement strategy that ensures convergence and bounds the error after a finite number of samples. In particular, <xref ref-type="bibr" rid="bib1.bibx9" id="text.50"/> show that the error in the approximate Markov chain computed with the local surrogate model decays at approximately the expected <inline-formula><mml:math id="M95" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>/</mml:mo><mml:msqrt><mml:mi>T</mml:mi></mml:msqrt></mml:mrow></mml:math></inline-formula> rate, where <inline-formula><mml:math id="M96" display="inline"><mml:mi>T</mml:mi></mml:math></inline-formula> is the number of MCMC steps. 
<xref ref-type="bibr" rid="bib1.bibx10" id="text.51"/> demonstrated that noisy estimates of the likelihood are sufficient to construct the surrogate model and still retain asymptotic convergence. Empirical studies <xref ref-type="bibr" rid="bib1.bibx7 bib1.bibx8 bib1.bibx9" id="paren.52"/> on problems of moderate parameter dimension showed that the number of expensive likelihood evaluations per MCMC step can be reduced by orders of magnitude, with no discernable loss of accuracy in posterior expectations.</p>
      <p id="d1e2456">Here, we briefly summarize one step of the LA-MCMC construction and refer to <xref ref-type="bibr" rid="bib1.bibx9" id="text.53"/> for details. Each LA-MCMC step consists of four stages: (i) possibly refine the local polynomial approximation of the log likelihood, (ii) propose a new candidate MCMC state, (iii) compute the acceptance probability, and (iv) accept or reject the proposed state. The major distinction between this algorithm and standard Metropolis–Hastings MCMC is that the acceptance probability in stage (iii) is computed only using the approximation or surrogate model of the log likelihood, at both the current and proposed states. This introduces an error, relative to computation of the acceptance probability with exact likelihood evaluations, but stage (i) of the algorithm is designed to control and incrementally reduce this error at the appropriate rate.</p>
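      <p>The four stages can be sketched schematically as below. This is an illustration only: the function <monospace>la_mcmc_step</monospace>, the callbacks <monospace>surrogate_loglik</monospace> and <monospace>maybe_refine</monospace>, and the symmetric Gaussian random-walk proposal are assumptions of this example, and the refinement rule of the cited work is left as a placeholder callback rather than reproduced.</p>
      <preformat>
```python
import numpy as np


def la_mcmc_step(theta, surrogate_loglik, maybe_refine, log_prior, step, rng):
    """One schematic LA-MCMC step with a symmetric random-walk proposal.

    surrogate_loglik : callable returning the surrogate log likelihood.
    maybe_refine     : placeholder callback that may add an expensive
                       log-likelihood evaluation near theta (hypothetical).
    """
    # (i) Possibly refine the local approximation near the current state.
    maybe_refine(theta)
    # (ii) Propose a new candidate state.
    proposal = theta + step * rng.standard_normal(theta.shape)
    # (iii) Acceptance probability from surrogate values only, at both
    #       the current and the proposed state.
    log_alpha = (surrogate_loglik(proposal) + log_prior(proposal)
                 - surrogate_loglik(theta) - log_prior(theta))
    # (iv) Accept or reject the proposed state.
    if log_alpha > np.log(rng.uniform()):
        return proposal
    return theta
```
      </preformat>
      <p>If <monospace>surrogate_loglik</monospace> were replaced by the exact log likelihood and <monospace>maybe_refine</monospace> by a no-op, this step would reduce to standard random-walk Metropolis sampling.</p>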
      <p id="d1e2462">“Refinement” in stage (i) consists of adding a computationally intensive log-likelihood evaluation at some parameter value <inline-formula><mml:math id="M97" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">θ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, denoted by <inline-formula><mml:math id="M98" display="inline"><mml:mrow><mml:mi mathvariant="script">L</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="italic">θ</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, to the evaluated set <inline-formula><mml:math id="M99" display="inline"><mml:mrow><mml:mo mathvariant="italic">{</mml:mo><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="italic">θ</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="script">L</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="italic">θ</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>)</mml:mo><mml:msubsup><mml:mo mathvariant="italic">}</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>K</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula>. These <inline-formula><mml:math id="M100" display="inline"><mml:mi>K</mml:mi></mml:math></inline-formula> pairs are used to construct the local approximation via a kernel-weighted local polynomial regression <xref ref-type="bibr" rid="bib1.bibx29" id="paren.54"/>.  The values <inline-formula><mml:math id="M101" display="inline"><mml:mrow><mml:mo mathvariant="italic">{</mml:mo><mml:msub><mml:mi mathvariant="italic">θ</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:msubsup><mml:mo mathvariant="italic">}</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>K</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula> are called “support points” in this paper. 
Details on the regression formulation are in <xref ref-type="bibr" rid="bib1.bibx9 bib1.bibx7" id="text.55"/>. As the support points cover the regions of high posterior probability more densely, the accuracy of the local polynomial surrogate will increase. This error is well understood <xref ref-type="bibr" rid="bib1.bibx29 bib1.bibx6" id="paren.56"/> and, crucially, takes advantage of smoothness in the underlying true log-likelihood function. This smoothness ultimately allows the cardinality of the<?pagebreak page4324?> evaluated set to be much smaller than the number of MCMC steps.</p>
      <p id="d1e2578">Intuitively, if the surrogate converges to the true log likelihood, then the samples generated with LA-MCMC will (asymptotically) be drawn from the true posterior distribution. After any finite number of steps, however, the surrogate error introduces a bias into the sampling algorithm. The refinement strategy must therefore ensure that this bias is not the dominant source of error. At the same time, refinements must occur infrequently to ensure that LA-MCMC is computationally cheaper than using the true log likelihood. <xref ref-type="bibr" rid="bib1.bibx9" id="text.57"/> analyzes the trade-off between surrogate-induced bias and MCMC variance and proposes a rate-optimal refinement strategy.
We use a similar algorithm, adding only an isotropic <inline-formula><mml:math id="M102" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">ℓ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> penalty on the polynomial coefficients. More specifically, we rescale the variables so that their maximum and minimum values are <inline-formula><mml:math id="M103" display="inline"><mml:mrow><mml:mo>±</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> (often called coded units) and then regularize the regression with a penalty parameter (in our cases, the value <inline-formula><mml:math id="M104" display="inline"><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> was found to be sufficient). This penalty term turns the ordinary least squares problem into a local ridge regression, which improves performance with noisy likelihoods.</p>
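      <p>The regularized fit described above can be illustrated with a minimal sketch. It assumes a one-dimensional parameter, a quadratic basis, no kernel weights, and a penalty applied to all coefficients including the constant term; the function names <monospace>to_coded_units</monospace>, <monospace>solve</monospace>, and <monospace>ridge_quadratic_fit</monospace> are hypothetical and are not taken from the MATLAB implementation.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
def to_coded_units(xs):
    """Affinely rescale values so that min -> -1 and max -> +1.

    Assumes xs contains at least two distinct values.
    """
    lo, hi = min(xs), max(xs)
    half = 0.5 * (hi - lo)
    mid = 0.5 * (hi + lo)
    return [(x - mid) / half for x in xs]


def solve(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ridge_quadratic_fit(thetas, loglikes, alpha=1.0):
    """Fit c0 + c1*u + c2*u**2 in coded units u with an isotropic l2 penalty.

    Illustrative assumption: the penalty alpha acts on every coefficient.
    """
    us = to_coded_units(thetas)
    basis = [[1.0, u, u * u] for u in us]
    d = 3
    # Normal equations of ridge regression: (Phi^T Phi + alpha I) c = Phi^T y.
    gram = [[sum(row[i] * row[j] for row in basis) + (alpha if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(row[i] * y for row, y in zip(basis, loglikes)) for i in range(d)]
    return solve(gram, rhs)
```
]]></preformat>
      <p>With <monospace>alpha=0</monospace> the sketch reduces to ordinary least squares; a positive <monospace>alpha</monospace> shrinks the coefficients, which is the stabilizing effect that motivates the penalty for noisy likelihood values.</p>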
      <p id="d1e2617">Our examples use an adaptive proposal density <xref ref-type="bibr" rid="bib1.bibx20" id="text.58"/>. This choice deviates slightly from the theory in <xref ref-type="bibr" rid="bib1.bibx9" id="text.59"/>, which assumes a constant-in-time proposal density. However, this does not necessarily imply that adaptive or gradient-based methods will not converge. In particular, <xref ref-type="bibr" rid="bib1.bibx7" id="text.60"/> show asymptotic convergence using an adaptive proposal density, and <xref ref-type="bibr" rid="bib1.bibx8" id="text.61"/> strengthen this result by showing that the Metropolis-adjusted Langevin algorithm, which is a gradient-based MCMC method, is asymptotically exact when using a continually refined local polynomial approximation. These results require some additional assumptions about the target density's tail behavior, and the stronger rate-optimal result from <xref ref-type="bibr" rid="bib1.bibx9" id="text.62"/> has not been shown for such algorithms. However, in practice, we see that adaptive methods still work well in our applications. Exploring the theoretical implications of this is interesting and merits further discussion but is beyond the scope of this paper.</p>
      <p id="d1e2635">The parameters of the algorithm are fixed as given in <xref ref-type="bibr" rid="bib1.bibx9" id="text.63"/>, for all the examples discussed here: (i) initial error threshold <inline-formula><mml:math id="M105" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">γ</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>; (ii) error threshold
decay rate <inline-formula><mml:math id="M106" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">γ</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>; (iii) maximum poisedness constant <inline-formula><mml:math id="M107" display="inline"><mml:mrow><mml:mover accent="true"><mml:mi mathvariant="normal">Λ</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mn mathvariant="normal">100</mml:mn></mml:mrow></mml:math></inline-formula>; (iv) tail-correction parameter <inline-formula><mml:math id="M108" display="inline"><mml:mrow><mml:mi mathvariant="italic">η</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula> (no tail correction); (v) local polynomial degree <inline-formula><mml:math id="M109" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula>. 
The number of nearest neighbors <inline-formula><mml:math id="M110" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula> used to construct each local polynomial surrogate is chosen to be <inline-formula><mml:math id="M111" display="inline"><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mi>k</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>+</mml:mo><mml:mo>(</mml:mo><mml:mi>K</mml:mi><mml:mo>-</mml:mo><mml:msub><mml:mi>k</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:msup><mml:mo>)</mml:mo><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>/</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> where <inline-formula><mml:math id="M112" display="inline"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:msqrt><mml:mi>q</mml:mi></mml:msqrt><mml:mi>D</mml:mi></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M113" display="inline"><mml:mi>q</mml:mi></mml:math></inline-formula> is the dimension of the parameters <inline-formula><mml:math id="M114" display="inline"><mml:mrow><mml:mi mathvariant="italic">θ</mml:mi><mml:mo>∈</mml:mo><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mi>q</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M115" display="inline"><mml:mi>D</mml:mi></mml:math></inline-formula> is the number of coefficients in the local polynomial approximation of total degree <inline-formula><mml:math id="M116" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula>, i.e., <inline-formula><mml:math id="M117" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>=</mml:mo><mml:mo>(</mml:mo><mml:mi>q</mml:mi><mml:mo>+</mml:mo><mml:mn 
mathvariant="normal">2</mml:mn><mml:mo>)</mml:mo><mml:mo>(</mml:mo><mml:mi>q</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>)</mml:mo><mml:mo>/</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula>. If we had <inline-formula><mml:math id="M118" display="inline"><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mi>D</mml:mi></mml:mrow></mml:math></inline-formula>, the approximation would be an interpolant. Instead, we oversample by a factor <inline-formula><mml:math id="M119" display="inline"><mml:msqrt><mml:mi>q</mml:mi></mml:msqrt></mml:math></inline-formula>, as suggested in <xref ref-type="bibr" rid="bib1.bibx7" id="text.64"/>, and allow <inline-formula><mml:math id="M120" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula> to grow slowly with the size <inline-formula><mml:math id="M121" display="inline"><mml:mi>K</mml:mi></mml:math></inline-formula> of the evaluated set as in <xref ref-type="bibr" rid="bib1.bibx10" id="text.65"/>.
All these details together with example runs can be found in the MATLAB  implementation available in the Supplement.</p>
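      <p>As a worked illustration of the neighbor-count rule above, the following sketch evaluates the number of coefficients and the number of nearest neighbors for a given size of the evaluated set. The rounding up to an integer and the clamping of the difference at zero are assumptions made here for illustration, and the function names are hypothetical.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import math


def num_coefficients(q, p=2):
    """Number of monomials of total degree <= p in q variables.

    For p = 2 this equals (q + 2) * (q + 1) / 2.
    """
    return math.comb(q + p, p)


def num_neighbors(K, q):
    """Evaluate k = k0 + (K - k0)**(1/3) with k0 = sqrt(q) * D.

    Assumptions: the result is rounded up, and K - k0 is clamped at zero
    so that the cube root stays real while the evaluated set is small.
    """
    D = num_coefficients(q)
    k0 = math.sqrt(q) * D
    k = k0 + max(K - k0, 0.0) ** (1.0 / 3.0)
    return math.ceil(k)
```
]]></preformat>
      <p>For example, three parameters give ten quadratic coefficients, and the neighbor count grows only with the cube root of the number of stored likelihood evaluations.</p>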
</sec>
</sec>
<sec id="Ch1.S4">
  <label>4</label><title>Numerical experiments</title>
      <p id="d1e2901">This section contains numerical experiments to illustrate the methods introduced in the previous sections. As a large-scale example, we characterize the posterior distribution of parameters in the two-layer quasi-geostrophic (QG) model. The computations needed to characterize the posterior distribution with standard MCMC methods in this example would be prohibitive without massive computational resources and are therefore omitted. In contrast, we will show that the LA-MCMC method is able to simulate from the parameter posterior distribution.</p>
      <p id="d1e2904">Before presenting this example, we first use two examples, the classical Lorenz 63 system and the higher-dimensional Kuramoto–Sivashinsky (KS) model, to demonstrate that the posteriors produced by LA-MCMC agree with those obtained via exact MCMC sampling methods in cases where the latter are computationally tractable. In both examples, we quantify the computational savings due to LA-MCMC, and in the second we introduce additional ways to speed up computation using parallel (GPU) integration.</p>
      <p id="d1e2907">Let <inline-formula><mml:math id="M122" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> denote the time difference between consecutive observations; one epoch thus contains the times in the interval <inline-formula><mml:math id="M123" display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:mi>i</mml:mi><mml:mi>N</mml:mi><mml:msub><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>)</mml:mo><mml:mi>N</mml:mi><mml:msub><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>. The number of data points in one epoch <inline-formula><mml:math id="M124" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> varies between 1000 and 2000, depending on the example. The  training set  <inline-formula><mml:math id="M125" display="inline"><mml:mi mathvariant="bold">S</mml:mi></mml:math></inline-formula> consists of a collection of  <inline-formula><mml:math id="M126" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">epo</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> such intervals.
In all the examples, we choose <inline-formula><mml:math id="M127" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> to be relatively large, beyond the predictable window. This is more for demonstration purposes than a necessity; the background theory from <inline-formula><mml:math id="M128" display="inline"><mml:mi>U</mml:mi></mml:math></inline-formula> statistics allows the subsequent state vectors to be weakly dependent. Numerically, a set of observations that is chosen too densely results in the <inline-formula><mml:math id="M129" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="italic">χ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> test failing, and for this reason we recommend always checking for normality before starting the parameter estimation.</p>
      <p id="d1e3015">For numerical tests, one can either use one long time series or integrate a shorter time interval several times using different initial values to create the training set for the likelihood.
For these experiments, the latter method was used with <inline-formula><mml:math id="M130" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">epo</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">64</mml:mn></mml:mrow></mml:math></inline-formula>, yielding <inline-formula><mml:math id="M131" display="inline"><mml:mrow><mml:mfenced open="(" close=")"><mml:mfrac linethickness="0"><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">epo</mml:mi></mml:msub><mml:mn mathvariant="normal">2</mml:mn></mml:mfrac></mml:mfenced><mml:mo>=</mml:mo><mml:mn mathvariant="normal">2016</mml:mn></mml:mrow></mml:math></inline-formula> different pairs <inline-formula><mml:math id="M132" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>l</mml:mi></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, each of which resulted in an ECDF constructed from <inline-formula><mml:math id="M133" display="inline"><mml:mrow><mml:msup><mml:mi>N</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> pairwise distances.
According to tests performed while calibrating the algorithm, these values of <inline-formula><mml:math id="M134" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M135" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mtext>epo</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> are sufficient to obtain robust posterior estimates. With less data, the parameter posteriors will be less precise.</p>
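      <p>The size of the training workload implied by these numbers can be checked with a few lines of arithmetic. The sketch below assumes the largest epoch length mentioned above (2000 data points) purely for illustration; the variable names are hypothetical.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import itertools
import math

n_epo = 64   # number of epochs in the training set
N = 2000     # data points per epoch (illustrative; between 1000 and 2000 in the text)

# Unordered pairs of distinct epochs (k, l) with k < l.
pairs = list(itertools.combinations(range(n_epo), 2))
n_pairs = len(pairs)

# Each pair of epochs contributes N * N cross distances to one ECDF.
distances_per_pair = N * N
total_distances = n_pairs * distances_per_pair
```
]]></preformat>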
      <?pagebreak page4325?><p id="d1e3109">The range of the bin radii <inline-formula><mml:math id="M136" display="inline"><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mi>m</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mspace width="0.33em" linebreak="nobreak"/><mml:mi>m</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:mi>M</mml:mi></mml:mrow></mml:math></inline-formula> is selected by examining the distances within the training set, keeping in mind that a positive variance is needed for every bin to avoid a singular covariance matrix. So the largest radius <inline-formula><mml:math id="M137" display="inline"><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> can be obtained from
          <disp-formula id="Ch1.E6" content-type="numbered"><label>6</label><mml:math id="M138" display="block"><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:munder><mml:mo movablelimits="false">min⁡</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>≠</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:munder><mml:mfenced close="}" open="{"><mml:mrow><mml:munder><mml:mo movablelimits="false">max⁡</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:munder><mml:mfenced open="∥" close="∥"><mml:mrow><mml:msubsup><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:msubsup><mml:mo>-</mml:mo><mml:msubsup><mml:mi>s</mml:mi><mml:mi>j</mml:mi><mml:mi>l</mml:mi></mml:msubsup></mml:mrow></mml:mfenced></mml:mrow></mml:mfenced></mml:mrow></mml:math></disp-formula>
        over the disjoint  subsets of the samples  <inline-formula><mml:math id="M139" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>k</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> and  <inline-formula><mml:math id="M140" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>l</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> of length <inline-formula><mml:math id="M141" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula>.  The smallest radius is  selected  by requiring that for all the possible pairs  <inline-formula><mml:math id="M142" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>l</mml:mi></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, it holds that <inline-formula><mml:math id="M143" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="script">B</mml:mi><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mi>M</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mo>(</mml:mo><mml:msubsup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:msubsup><mml:mo>)</mml:mo><mml:mo>∩</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>l</mml:mi></mml:msup><mml:mo>≠</mml:mo><mml:mi mathvariant="normal">∅</mml:mi></mml:mrow></mml:math></inline-formula>, where <inline-formula><mml:math id="M144" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="script">B</mml:mi><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mi>M</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mo>(</mml:mo><mml:msubsup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is the ball of radius <inline-formula><mml:math id="M145" 
display="inline"><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mi>M</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> centered at <inline-formula><mml:math id="M146" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula>. That is,
          <disp-formula id="Ch1.E7" content-type="numbered"><label>7</label><mml:math id="M147" display="block"><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mi>M</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:munder><mml:mo movablelimits="false">max⁡</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>≠</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:munder><mml:mfenced close="}" open="{"><mml:mrow><mml:munder><mml:mo movablelimits="false">min⁡</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:munder><mml:mfenced open="∥" close="∥"><mml:mrow><mml:msubsup><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:msubsup><mml:mo>-</mml:mo><mml:msubsup><mml:mi>s</mml:mi><mml:mi>j</mml:mi><mml:mi>l</mml:mi></mml:msubsup></mml:mrow></mml:mfenced></mml:mrow></mml:mfenced><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
        The base value <inline-formula><mml:math id="M148" display="inline"><mml:mi>b</mml:mi></mml:math></inline-formula> is  obtained by <inline-formula><mml:math id="M149" display="inline"><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mi>M</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:msup><mml:mi>b</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mi>M</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>, and using this value we fix all the other radii <inline-formula><mml:math id="M150" display="inline"><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mi>m</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>.</p>
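      <p>The selection of the radii in Eqs. (6) and (7) can be sketched as follows. This is an illustrative implementation under stated assumptions: each epoch is a list of already-scaled state vectors, Euclidean distances are used, and the function names <monospace>cross_distances</monospace> and <monospace>bin_radii</monospace> are hypothetical.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import math


def cross_distances(sk, sl):
    """All Euclidean distances between the points of two epochs."""
    return [math.dist(a, b) for a in sk for b in sl]


def bin_radii(epochs, M):
    """Return ([R_0, ..., R_M], b) with R_m = R_0 * b**(-m).

    R_0 follows Eq. (6) and R_M follows Eq. (7); b is then fixed by
    R_M = R_0 * b**(-M). Assumes R_M > 0, i.e., no coincident points
    in the pair of epochs that determines R_M.
    """
    n = len(epochs)
    pair_max, pair_min = [], []
    # Distances are symmetric, so pairs with k < l cover all k != l.
    for k in range(n):
        for l in range(k + 1, n):
            d = cross_distances(epochs[k], epochs[l])
            pair_max.append(max(d))
            pair_min.append(min(d))
    R0 = min(pair_max)  # Eq. (6): smallest of the per-pair maxima
    RM = max(pair_min)  # Eq. (7): largest of the per-pair minima
    b = (R0 / RM) ** (1.0 / M)
    return [R0 * b ** (-m) for m in range(M + 1)], b
```
]]></preformat>
      <p>Choosing the radii geometrically between these two bounds means that every bin receives at least one distance for every pair of epochs, which is what keeps the bin variances positive.</p>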
      <p id="d1e3443">As always with histograms, the number of bins <inline-formula><mml:math id="M151" display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula> must be selected first. Too small an <inline-formula><mml:math id="M152" display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula> loses information, while too large values yield noisy histograms, and this noisiness can be seen also in the ECDFs. However, numerical experiments show that the final results – the parameter posteriors – are not too sensitive to the specific value of <inline-formula><mml:math id="M153" display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula>. For instance, for the  Lorenz 63 case below, the range of <inline-formula><mml:math id="M154" display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula> was varied between 5 and 40, and only a minor decrease of the size of the parameter posteriors was noticed for increasing <inline-formula><mml:math id="M155" display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula>. Any slight increase of accuracy comes with a computational  cost: higher values of <inline-formula><mml:math id="M156" display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula> increase the stochasticity of the likelihood evaluations, which leads to smaller acceptance rates in the MCMC sampling, e.g., from 0.36 to 0.17 to 0.03 for <inline-formula><mml:math id="M157" display="inline"><mml:mrow><mml:mi>M</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">5</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">15</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">40</mml:mn></mml:mrow></mml:math></inline-formula>, respectively, when using the standard adaptive Metropolis sampler.
In the examples presented in this section, the length <inline-formula><mml:math id="M158" display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula> of the feature vector is fixed to 14 for the Lorenz 63 model and 32 for the higher-dimensional KS and QG models.</p>
      <p id="d1e3516">To balance the possibly different magnitudes of the components of the state vector, each component is scaled and shifted to the range <inline-formula><mml:math id="M159" display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula> before computing the distances. While this scaling could also be performed in other ways, this method worked well in practice for the models considered. The normality of the ensemble of feature vectors is ascertained by comparing the histograms of the quadratic forms in Eq. (<xref ref-type="disp-formula" rid="Ch1.E4"/>) visually to the appropriate <inline-formula><mml:math id="M160" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="italic">χ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> distribution.</p>
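      <p>A minimal sketch of the component scaling and of an ECDF-type feature vector evaluated at the bin radii is given below. The scaling bounds are taken over the states passed to the function, and the strict inequality in the ECDF is a choice made here for illustration; both are assumptions, as are the function names.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import math


def scale_components(states):
    """Map each component of the state vectors affinely onto [-1, 1].

    Assumes every component takes at least two distinct values.
    """
    dims = range(len(states[0]))
    lo = [min(s[d] for s in states) for d in dims]
    hi = [max(s[d] for s in states) for d in dims]
    return [tuple(2.0 * (s[d] - lo[d]) / (hi[d] - lo[d]) - 1.0 for d in dims)
            for s in states]


def ecdf_features(sk, sl, radii):
    """Fraction of cross-epoch distances that fall below each radius."""
    dists = [math.dist(a, b) for a in sk for b in sl]
    return [sum(d < r for d in dists) / len(dists) for r in radii]
```
]]></preformat>
      <p>Scaling before computing distances prevents a component with a large numerical range from dominating the Euclidean norm.</p>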
      <p id="d1e3550">In all three experiments, we create MCMC chains of length <inline-formula><mml:math id="M161" display="inline"><mml:mrow><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mn mathvariant="normal">5</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>. However, because of the LA-MCMC approach, the number of full forward model evaluations is much lower, around 1000 or fewer; we report these values more specifically below.</p>
      <p id="d1e3564">The Lorenz 63 model was integrated with a standard Runge–Kutta solver.
The numerical solution of the KS model is based on our in-house fast Fourier transform (FFT)-based solver, which runs on the GPU and is built around the Nvidia Compute Unified Device Architecture (CUDA) toolchain and the cuFFT library (which is part of the CUDA ecosystem). The quasi-geostrophic model employs a semi-Lagrangian solver and runs entirely on the CPU, but the code has been significantly optimized, with performance-critical parts, such as the advection operator, compiled using the Intel Implicit SPMD Program Compiler (ISPC) with support for Advanced Vector Extensions 2 (AVX2) vectorization.</p>
<sec id="Ch1.S4.SS1">
  <label>4.1</label><title>Lorenz 63</title>
      <p id="d1e3574">We use the classical three-dimensional Lorenz 63 system <xref ref-type="bibr" rid="bib1.bibx35" id="paren.66"/> as a simple first example to demonstrate how  LA-MCMC can be successfully paired with the CIL and the adaptive Metropolis (AM) algorithm <xref ref-type="bibr" rid="bib1.bibx19 bib1.bibx20" id="paren.67"/> to obtain the posterior distribution for chaotic systems at a greatly reduced computational cost, compared to AM  without the local approximation.
The time evolution of the state vector  <inline-formula><mml:math id="M162" display="inline"><mml:mrow><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mo>=</mml:mo><mml:mo>(</mml:mo><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:mi>Y</mml:mi><mml:mo>,</mml:mo><mml:mi>Z</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is given by
            <disp-formula id="Ch1.E8" content-type="numbered"><label>8</label><mml:math id="M163" display="block"><mml:mtable class="split" rowspacing="0.2ex" displaystyle="true" columnalign="right left"><mml:mtr><mml:mtd><mml:mrow><mml:mover accent="true"><mml:mi>X</mml:mi><mml:mo mathvariant="normal">˙</mml:mo></mml:mover></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mo>=</mml:mo><mml:mi mathvariant="italic">σ</mml:mi><mml:mo>(</mml:mo><mml:mi>Y</mml:mi><mml:mo>-</mml:mo><mml:mi>X</mml:mi><mml:mo>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mover accent="true"><mml:mi>Y</mml:mi><mml:mo mathvariant="normal">˙</mml:mo></mml:mover></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mo>=</mml:mo><mml:mi>X</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="italic">ρ</mml:mi><mml:mo>-</mml:mo><mml:mi>Z</mml:mi><mml:mo>)</mml:mo><mml:mo>-</mml:mo><mml:mi>Y</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mover accent="true"><mml:mi>Z</mml:mi><mml:mo mathvariant="normal">˙</mml:mo></mml:mover></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mo>=</mml:mo><mml:mi>X</mml:mi><mml:mi>Y</mml:mi><mml:mo>-</mml:mo><mml:mi mathvariant="italic">β</mml:mi><mml:mi>Z</mml:mi><mml:mo>.</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
          This system of equations is often said to describe an extreme simplification of a  weather model.</p>
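      <p>For readers who wish to experiment, Eq. (8) can be integrated with a classical fourth-order Runge–Kutta scheme as sketched below. The default parameter values are the reference values used in this section; the step size, the initial state, and the function names are illustrative assumptions and do not reproduce the solver settings of our experiments.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
def lorenz63(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz 63 system, Eq. (8)."""
    x, y, z = s
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(f, s, dt):
    """One classical fourth-order Runge-Kutta step for ds/dt = f(s)."""
    k1 = f(s)
    k2 = f(tuple(si + 0.5 * dt * ki for si, ki in zip(s, k1)))
    k3 = f(tuple(si + 0.5 * dt * ki for si, ki in zip(s, k2)))
    k4 = f(tuple(si + dt * ki for si, ki in zip(s, k3)))
    return tuple(si + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for si, a, b, c, d in zip(s, k1, k2, k3, k4))


def integrate(s0, dt, n_steps):
    """Advance the state n_steps times with a fixed step size dt."""
    s = tuple(s0)
    for _ in range(n_steps):
        s = rk4_step(lorenz63, s, dt)
    return s


# Illustrative run: step size and initial state are arbitrary choices.
final_state = integrate((1.0, 1.0, 1.0), dt=0.01, n_steps=1000)
```
]]></preformat>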
      <p id="d1e3693">The reference data were generated with parameter values <inline-formula><mml:math id="M164" display="inline"><mml:mrow><mml:mi mathvariant="italic">σ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">10</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M165" display="inline"><mml:mrow><mml:mi mathvariant="italic">ρ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">28</mml:mn></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M166" display="inline"><mml:mrow><mml:mi mathvariant="italic">β</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mn mathvariant="normal">8</mml:mn><mml:mn mathvariant="normal">3</mml:mn></mml:mfrac></mml:mstyle></mml:mrow></mml:math></inline-formula> by performing <inline-formula><mml:math id="M167" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">epo</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">64</mml:mn></mml:mrow></mml:math></inline-formula> distinct model simulations, with observations made at 2000 evenly spaced times in the interval <inline-formula><mml:math id="M168" display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:mn mathvariant="normal">10</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">20</mml:mn><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mn mathvariant="normal">000</mml:mn><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula>. These observations were
perturbed with 5 % multiplicative Gaussian noise.
The length of the predictable time window is roughly 7, which is less than the time between consecutive observations.
The parameters of the CIL method were obtained as described at the start of Sect. <xref ref-type="sec" rid="Ch1.S4"/>, with values <inline-formula><mml:math id="M169" display="inline"><mml:mrow><mml:mi>M</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">14</mml:mn></mml:mrow></mml:math></inline-formula>,  <inline-formula><mml:math id="M170" display="inline"><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">2.85</mml:mn></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M171" display="inline"><mml:mrow><mml:mi>b</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1.51</mml:mn></mml:mrow></mml:math></inline-formula>.</p>
      <p id="d1e3812">The set of vectors <inline-formula><mml:math id="M172" display="inline"><mml:mrow><mml:mo mathvariant="italic">{</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mo>|</mml:mo><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi><mml:mo>≤</mml:mo><mml:msub><mml:mi>n</mml:mi><mml:mtext>epo</mml:mtext></mml:msub><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula> is shown in Fig. <xref ref-type="fig" rid="Ch1.F1"/> on a log–log scale. The figure shows that the variability of these vectors is quite small.
The right-hand panels of Fig. <xref ref-type="fig" rid="Ch1.F1"/> support the normality assumption for the feature vectors.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F1" specific-use="star"><?xmltex \currentcnt{1}?><?xmltex \def\figurename{Figure}?><label>Figure 1</label><caption><p id="d1e3857">Left: for all combinations of <inline-formula><mml:math id="M173" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M174" display="inline"><mml:mi>l</mml:mi></mml:math></inline-formula>, the feature vectors  for  Lorenz 63 and Kuramoto–Sivashinsky, along with concatenated feature vectors for the quasi-geostrophic system are shown. Right: normality check; the <inline-formula><mml:math id="M175" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="italic">χ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> density function versus the histograms of the respective negative log-likelihood (NLL) values; see Eq. (<xref ref-type="disp-formula" rid="Ch1.E4"/>).  </p></caption>
          <?xmltex \igopts{width=483.69685pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/4319/2021/gmd-14-4319-2021-f01.png"/>

        </fig>

      <?xmltex \floatpos{t}?><fig id="Ch1.F2" specific-use="star"><?xmltex \currentcnt{2}?><?xmltex \def\figurename{Figure}?><label>Figure 2</label><caption><p id="d1e3895">Two-dimensional posterior marginal distributions of the parameters of the Lorenz 63 model obtained with LA-MCMC and AM.</p></caption>
          <?xmltex \igopts{width=398.338583pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/4319/2021/gmd-14-4319-2021-f02.png"/>

        </fig>

      <p id="d1e3904">Pairwise two-dimensional marginals of the parameter posterior are shown in Fig. <xref ref-type="fig" rid="Ch1.F2"/>, obtained both by sampling the posterior with full forward model simulations (AM) and by generating the chain with the surrogate sampling approach (LA-MCMC).
These two posteriors are almost perfectly superimposed. Indeed, the difference is at the same level as that between repetitions of the standard AM sampling alone.</p>
      <p id="d1e3909">To get an idea of the computational savings achieved with LA-MCMC, the computation of the MCMC chains of length <inline-formula><mml:math id="M176" display="inline"><mml:mrow><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mn mathvariant="normal">5</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> was repeated 10 times. The cumulative number of full likelihood evaluations is presented in Fig. <xref ref-type="fig" rid="Ch1.F3"/>. At the end of the chains, the number of full likelihood evaluations varied between 955 and 1016. Thus, by using LA-MCMC in this setting, the number of full likelihood evaluations is reduced by roughly 2 orders of magnitude.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F3"><?xmltex \currentcnt{3}?><?xmltex \def\figurename{Figure}?><label>Figure 3</label><caption><p id="d1e3927">Comparison of the cumulative number of full likelihood evaluations while using AM (black line) and LA-MCMC (colored lines). Every colored line corresponds to a different chain obtained with LA-MCMC by using the same likelihood.</p></caption>
          <?xmltex \igopts{width=236.157874pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/4319/2021/gmd-14-4319-2021-f03.png"/>

        </fig>

</sec>
<sec id="Ch1.S4.SS2">
  <label>4.2</label><title>The Kuramoto–Sivashinsky model</title>
      <p id="d1e3945">The second example is the 256-dimensional Kuramoto–Sivashinsky (KS) partial differential equation (PDE) system.
The purpose of this example is to show how computational efficiency can be improved by piecewise parallel integration over the time interval of the given data. In addition,
we demonstrate how decreasing the number of observed components<?pagebreak page4326?> impacts the accuracy of parameter estimation.
Even though the posterior evaluation is relatively expensive, the results can still be verified directly against those obtained with standard adaptive MCMC.
The Kuramoto–Sivashinsky model is given by the fourth-order PDE:
            <disp-formula id="Ch1.E9" content-type="numbered"><label>9</label><mml:math id="M177" display="block"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mi>s</mml:mi><mml:msub><mml:mi>s</mml:mi><mml:mi>x</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mi mathvariant="italic">η</mml:mi></mml:mfrac></mml:mstyle><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>x</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:mi mathvariant="italic">γ</mml:mi><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>x</mml:mi><mml:mi>x</mml:mi><mml:mi>x</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
          where <inline-formula><mml:math id="M178" display="inline"><mml:mrow><mml:mi>s</mml:mi><mml:mo>=</mml:mo><mml:mi>s</mml:mi><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is a real function of <inline-formula><mml:math id="M179" display="inline"><mml:mrow><mml:mi>x</mml:mi><mml:mo>∈</mml:mo><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M180" display="inline"><mml:mrow><mml:mi>t</mml:mi><mml:mo>∈</mml:mo><mml:msub><mml:mi mathvariant="double-struck">R</mml:mi><mml:mo>+</mml:mo></mml:msub></mml:mrow></mml:math></inline-formula>. In addition, it is assumed that <inline-formula><mml:math id="M181" display="inline"><mml:mi>s</mml:mi></mml:math></inline-formula> is spatially periodic with period <inline-formula><mml:math id="M182" display="inline"><mml:mi>L</mml:mi></mml:math></inline-formula>, i.e., <inline-formula><mml:math id="M183" display="inline"><mml:mrow><mml:mi>s</mml:mi><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>+</mml:mo><mml:mi>L</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mi>s</mml:mi><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>. 
This experiment uses the parametrization from <xref ref-type="bibr" rid="bib1.bibx54" id="paren.68"/> that maps the spatial domain <inline-formula><mml:math id="M184" display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:mo>-</mml:mo><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mi>L</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:mfrac></mml:mstyle><mml:mo>,</mml:mo><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mi>L</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:mfrac></mml:mstyle><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula> to <inline-formula><mml:math id="M185" display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:mo>-</mml:mo><mml:mi mathvariant="italic">π</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="italic">π</mml:mi><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula> by setting  <inline-formula><mml:math id="M186" display="inline"><mml:mrow><mml:mover accent="true"><mml:mi>x</mml:mi><mml:mo stretchy="false" mathvariant="normal">̃</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mi mathvariant="italic">π</mml:mi></mml:mrow><mml:mi>L</mml:mi></mml:mfrac></mml:mstyle><mml:mi>x</mml:mi></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M187" display="inline"><mml:mrow><mml:mover accent="true"><mml:mi>t</mml:mi><mml:mo mathvariant="normal" stretchy="false">̃</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:msup><mml:mfenced open="(" close=")"><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mi mathvariant="italic">π</mml:mi></mml:mrow><mml:mi>L</mml:mi></mml:mfrac></mml:mstyle></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mi>t</mml:mi></mml:mrow></mml:math></inline-formula>. 
With <inline-formula><mml:math id="M188" display="inline"><mml:mrow><mml:mi>L</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">100</mml:mn></mml:mrow></mml:math></inline-formula>, the true value of parameter  <inline-formula><mml:math id="M189" display="inline"><mml:mi mathvariant="italic">γ</mml:mi></mml:math></inline-formula> is <inline-formula><mml:math id="M190" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:mi mathvariant="italic">π</mml:mi><mml:mo>/</mml:mo><mml:mn mathvariant="normal">50</mml:mn><mml:msup><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>≈</mml:mo><mml:mn mathvariant="normal">0.0039</mml:mn></mml:mrow></mml:math></inline-formula>,  and the true value of <inline-formula><mml:math id="M191" display="inline"><mml:mi mathvariant="italic">η</mml:mi></mml:math></inline-formula> becomes <inline-formula><mml:math id="M192" display="inline"><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mn mathvariant="normal">1</mml:mn><mml:mn mathvariant="normal">2</mml:mn></mml:mfrac></mml:mstyle></mml:math></inline-formula>. These two parameters are the ones that are then estimated with the LA-MCMC method.
This system was derived by <xref ref-type="bibr" rid="bib1.bibx31" id="text.69"/> and <xref ref-type="bibr" rid="bib1.bibx30" id="text.70"/> as a model for phase turbulence in reaction–diffusion systems.  <xref ref-type="bibr" rid="bib1.bibx48" id="text.71"/> used the same system for describing instabilities of laminar flames.</p>
      <?pagebreak page4327?><p id="d1e4276">Assume that the solution for this problem can be represented by a truncation of the Fourier series
            <disp-formula id="Ch1.E10" content-type="numbered"><label>10</label><mml:math id="M193" display="block"><mml:mrow><?xmltex \hack{\hbox\bgroup\fontsize{9.2}{9.2}\selectfont$\displaystyle}?><mml:mi>s</mml:mi><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow><mml:mi mathvariant="normal">∞</mml:mi></mml:munderover><mml:mfenced open="[" close="]"><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo><mml:mi>sin⁡</mml:mi><mml:mfenced close=")" open="("><mml:mrow><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mi mathvariant="italic">π</mml:mi></mml:mrow><mml:mi>L</mml:mi></mml:mfrac></mml:mstyle><mml:mi>j</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:mfenced><mml:mo>+</mml:mo><mml:msub><mml:mi>B</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo><mml:mi>cos⁡</mml:mi><mml:mfenced close=")" open="("><mml:mrow><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mi mathvariant="italic">π</mml:mi></mml:mrow><mml:mi>L</mml:mi></mml:mfrac></mml:mstyle><mml:mi>j</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:mfenced></mml:mrow></mml:mfenced><mml:mo>.</mml:mo><?xmltex \hack{$\egroup}?></mml:mrow></mml:math></disp-formula>
          Using this form reduces Eq. (<xref ref-type="disp-formula" rid="Ch1.E9"/>)  to a system of ordinary differential equations for the unknown coefficients <inline-formula><mml:math id="M194" display="inline"><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M195" display="inline"><mml:mrow><mml:msub><mml:mi>B</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>,

                <disp-formula specific-use="gather" content-type="numbered"><mml:math id="M196" display="block"><mml:mtable displaystyle="true"><mml:mlabeledtr id="Ch1.E11"><mml:mtd><mml:mtext>11</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:msub><mml:mover accent="true"><mml:mi>A</mml:mi><mml:mo mathvariant="normal">˙</mml:mo></mml:mover><mml:mi>j</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant="italic">α</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:msup><mml:mi>j</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:msub><mml:mi>A</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mi mathvariant="italic">α</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msup><mml:mi>j</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msup><mml:msub><mml:mi>A</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>(</mml:mo><mml:mi>A</mml:mi><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="Ch1.E12"><mml:mtd><mml:mtext>12</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:msub><mml:mover accent="true"><mml:mi>B</mml:mi><mml:mo mathvariant="normal">˙</mml:mo></mml:mover><mml:mi>j</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:msup><mml:mi>j</mml:mi><mml:mn 
mathvariant="normal">2</mml:mn></mml:msup><mml:msub><mml:mi>B</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msup><mml:mi>j</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msup><mml:msub><mml:mi>B</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>(</mml:mo><mml:mi>B</mml:mi><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo><mml:mo>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr></mml:mtable></mml:math></disp-formula>

            where the terms <inline-formula><mml:math id="M197" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>(</mml:mo><mml:mo>⋅</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M198" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>(</mml:mo><mml:mo>⋅</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> are polynomials of the vectors <inline-formula><mml:math id="M199" display="inline"><mml:mi mathvariant="bold-italic">A</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M200" display="inline"><mml:mi mathvariant="bold-italic">B</mml:mi></mml:math></inline-formula>. For details, see <xref ref-type="bibr" rid="bib1.bibx25" id="text.72"/>. The solution can be computed efficiently in parallel on graphics processors, and if computational resources allow, several instances of Eq. (<xref ref-type="disp-formula" rid="Ch1.E9"/>) can be solved simultaneously. Even on fast consumer-level laptops, several thousand simulations can be performed in parallel when the discretization of the <inline-formula><mml:math id="M201" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula> dimension contains around 500 points.</p>
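      <p>For illustration only, a Fourier discretization of Eq. (<xref ref-type="disp-formula" rid="Ch1.E9"/>) can be sketched in a few lines of Python with NumPy. The sketch assumes the rescaled periodic domain from -π to π, advances the Fourier coefficients with a simple semi-implicit Euler step (linear terms implicit, nonlinear term explicit) and applies 2/3-rule dealiasing. It is not the GPU implementation referred to above; the function name ks_step, the time step, and the initial state are hypothetical choices.</p>
      <preformat><![CDATA[
```python
import numpy as np

def ks_step(s, dt, eta=0.5, gamma=(np.pi / 50) ** 2):
    """One semi-implicit Euler step of s_t = -s s_x - s_xx/eta - gamma s_xxxx."""
    n = s.size
    k = np.fft.rfftfreq(n, d=1.0 / n)            # integer wavenumbers 0..n/2
    lin = k**2 / eta - gamma * k**4              # Fourier symbol of the linear terms
    s_hat = np.fft.rfft(s)
    nonlin_hat = -0.5j * k * np.fft.rfft(s * s)  # -s s_x written as -(s^2 / 2)_x
    nonlin_hat[k > n / 3] = 0.0                  # 2/3-rule dealiasing
    # Linear part treated implicitly, nonlinear part explicitly.
    s_hat_new = (s_hat + dt * nonlin_hat) / (1.0 - dt * lin)
    return np.fft.irfft(s_hat_new, n=n)

n = 256
x = np.linspace(-np.pi, np.pi, n, endpoint=False)
s = 0.1 * np.cos(x) + 0.05 * np.sin(3 * x)       # arbitrary smooth, zero-mean initial state
for _ in range(20):
    s = ks_step(s, dt=1e-3)
print(s.shape, float(np.abs(s).max()))
```
]]></preformat>
      <p>The step size is kept small in this sketch because, for these parameter values, the fastest-growing linear mode has a growth rate of roughly 250, and the implicit factor must stay positive. Long chaotic integrations would probably call for a higher-order scheme.</p>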
      <p id="d1e4658">A total of 64 epochs of the 256-dimensional KS model are integrated over the time interval <inline-formula><mml:math id="M202" display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:mn mathvariant="normal">0</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">150</mml:mn><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mn mathvariant="normal">000</mml:mn><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula>, and as in the
case of the Lorenz 63 model, the initial predictable time window is discarded and  <inline-formula><mml:math id="M203" display="inline"><mml:mn mathvariant="normal">1024</mml:mn></mml:math></inline-formula> equidistant measurements from <inline-formula><mml:math id="M204" display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:mn mathvariant="normal">500</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">150</mml:mn><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mn mathvariant="normal">000</mml:mn><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula> are
selected, with <inline-formula><mml:math id="M205" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>≈</mml:mo><mml:mn mathvariant="normal">146</mml:mn></mml:mrow></mml:math></inline-formula>.
The parameters used for the CIL method were <inline-formula><mml:math id="M206" display="inline"><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1801.7</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M207" display="inline"><mml:mrow><mml:mi>M</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">32</mml:mn></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M208" display="inline"><mml:mrow><mml:mi>b</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1.025</mml:mn></mml:mrow></mml:math></inline-formula>.</p>
      <p id="d1e4761">The time needed to integrate the model up to <inline-formula><mml:math id="M209" display="inline"><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">150</mml:mn><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mn mathvariant="normal">000</mml:mn></mml:mrow></mml:math></inline-formula> is approximately 103 s with the Nvidia 1070 GPU,
implying that generating an MCMC chain with 100 000 samples with standard MCMC algorithms would take almost 4 months.
The use of LA-MCMC alone again shortens the time needed by a factor of about 100, to around 28 h.
However, the calculations can be accelerated considerably further by parallel computing.
In practice, this replaces the generation of a single candidate trajectory of length 150 000 with the generation of observations from
several shorter time intervals. In our example, an efficient division is to perform
128 parallel calculations, each of length 4500, with randomized initial values
close to the  values selected from the training set. Discarding the predictable interval <inline-formula><mml:math id="M210" display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:mn mathvariant="normal">0</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">500</mml:mn><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula>  and taking eight observations at intervals of 500 yields the same number (1024) of observations as in the initial setting.
Although the total integration time increases, this reduces the wall-clock time needed to compute a single candidate simulation from 103 to 2.5 s. The full MCMC chain can then be generated in about 70 h without the surrogate model and in about 42 min using LA-MCMC.</p>
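      <p>The bookkeeping behind these estimates can be reproduced with the short Python sketch below. It uses only the figures quoted in the text, approximates the effect of LA-MCMC as needing about 1 % of the full evaluations, and assumes that the eight observations in each segment are taken at times 1000, 1500, and so on up to 4500; the exact placement within a segment is not stated in the text, and all variable names are illustrative.</p>
      <preformat><![CDATA[
```python
# Figures quoted in the text for the KS experiment.
n_samples = 100_000     # MCMC chain length
t_serial = 103.0        # s, one serial integration up to t = 150 000
t_parallel = 2.5        # s, 128 parallel segments of length 4500
la_fraction = 1 / 100   # assumption: LA-MCMC needs about 1 % of the full evaluations

n_segments, spinup, obs_interval, obs_per_segment = 128, 500, 500, 8

# Assumed observation times in one segment, after discarding [0, 500].
obs_times = [spinup + obs_interval * (i + 1) for i in range(obs_per_segment)]
n_obs = n_segments * obs_per_segment

def hours(seconds):
    """Convert seconds to hours."""
    return seconds / 3600.0

wall_clock_h = {
    "AM, serial": hours(n_samples * t_serial),
    "LA-MCMC, serial": hours(n_samples * t_serial * la_fraction),
    "AM, parallel": hours(n_samples * t_parallel),
    "LA-MCMC, parallel": hours(n_samples * t_parallel * la_fraction),
}
print("observations in total:", n_obs)
for name, h in wall_clock_h.items():
    print(f"{name}: {h:.1f} h")
```
]]></preformat>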

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T1"><?xmltex \currentcnt{1}?><label>Table 1</label><caption><p id="d1e4799">Parameter values of the four parameter vectors used in the forward KS model simulation examples in Fig. <xref ref-type="fig" rid="Ch1.F5"/>. Case 1 contains the true parameters, and Case 2 lies inside the posterior. Cases 3 and 4 lie outside the posterior. These parameter vectors correspond to the points shown in the posterior distribution in Fig. <xref ref-type="fig" rid="Ch1.F4"/>.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="5">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="center"/>
     <oasis:colspec colnum="3" colname="col3" align="center"/>
     <oasis:colspec colnum="4" colname="col4" align="center"/>
     <oasis:colspec colnum="5" colname="col5" align="center"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">Case 1</oasis:entry>
         <oasis:entry colname="col3">Case 2</oasis:entry>
         <oasis:entry colname="col4">Case 3</oasis:entry>
         <oasis:entry colname="col5">Case 4</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M211" display="inline"><mml:mi mathvariant="italic">η</mml:mi></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col2">0.50000</oasis:entry>
         <oasis:entry colname="col3">0.47820</oasis:entry>
         <oasis:entry colname="col4">0.49500</oasis:entry>
         <oasis:entry colname="col5">0.52000</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M212" display="inline"><mml:mi mathvariant="italic">γ</mml:mi></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col2">0.00395</oasis:entry>
         <oasis:entry colname="col3">0.00467</oasis:entry>
         <oasis:entry colname="col4">0.00350</oasis:entry>
         <oasis:entry colname="col5">0.00500</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d1e4891">Parameter posterior distributions from the KS system, produced with MCMC both with and without the local approximation surrogate, are shown in Fig. <xref ref-type="fig" rid="Ch1.F4"/>.
Repeating the<?pagebreak page4328?> calculations several times yielded no meaningful differences in the results.
In this experiment, the number of forward model evaluations LA-MCMC needed for generating a chain of length 100 000  was in the range <inline-formula><mml:math id="M213" display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:mn mathvariant="normal">1131</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1221</mml:mn><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula>.</p>
      <p id="d1e4912">Model trajectories from simulations with four different parameter vectors are shown in Fig. <xref ref-type="fig" rid="Ch1.F5"/>. These parameter values were (1) the “true” value which was used to generate training data, (2) another parameter from inside the posterior distribution, and (3–4) two other parameters from outside the posterior distribution. These parameters are also shown in Fig. <xref ref-type="fig" rid="Ch1.F4"/>.
On visual inspection, cases 1–3 look similar, while the results for parameter vector 4, the furthest from the posterior, are markedly different. Even though the third parameter vector is outside the posterior, the resulting trajectory is not easily distinguishable from cases 1 and 2, indicating that the CIL method differentiates between the trajectories more effectively than visual inspection.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F4"><?xmltex \currentcnt{4}?><?xmltex \def\figurename{Figure}?><label>Figure 4</label><caption><p id="d1e4921">Posterior distribution of the parameters of the KS system. The parameter values are shown in Table <xref ref-type="table" rid="Ch1.T1"/>, while examples of the respective integrated trajectories are given in Fig. <xref ref-type="fig" rid="Ch1.F5"/>.</p></caption>
          <?xmltex \igopts{width=236.157874pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/4319/2021/gmd-14-4319-2021-f04.png"/>

        </fig>

      <?xmltex \floatpos{t}?><fig id="Ch1.F5" specific-use="star"><?xmltex \currentcnt{5}?><?xmltex \def\figurename{Figure}?><label>Figure 5</label><caption><p id="d1e4936">Example model trajectories from the KS system. Panel (1) shows a simulation using the true parameters, the parameters used for (2) are inside the posterior distribution, and (3) and (4) are generated from simulations with parameters outside the posterior distribution shown in Fig. <xref ref-type="fig" rid="Ch1.F4"/>. The values of the parameter vectors 1, 2, 3, and 4 are given in Table <xref ref-type="table" rid="Ch1.T1"/>. The <inline-formula><mml:math id="M214" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula> axis shows the 256-dimensional state vector, and the <inline-formula><mml:math id="M215" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula> axis the time evolution of the system.</p></caption>
          <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/4319/2021/gmd-14-4319-2021-f05.png"/>

        </fig>

      <p id="d1e4964">Additional experiments were performed to evaluate the stability of the method when not all of the model states were observed. Keeping the setup otherwise fixed, the number of observed elements of the state vector was reduced step by step from the full 256 to 128, 64, and 32.
The resulting MCMC chains are presented in Fig. <xref ref-type="fig" rid="Ch1.F6"/>; as expected, the posterior distribution widens as fewer states are observed.</p>
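      <p>One simple way to set up such reduced observations is sketched below in Python with NumPy. The text does not state which components were retained, so the evenly spaced selection, the function name observed_indices, and the random placeholder data are assumptions made purely for illustration.</p>
      <preformat><![CDATA[
```python
import numpy as np

def observed_indices(n_state, n_observed):
    """Evenly spaced indices of the observed state components (assumed layout)."""
    if n_state % n_observed != 0:
        raise ValueError("n_observed must divide n_state")
    return np.arange(0, n_state, n_state // n_observed)

# Placeholder data: 1024 observation times of a 256-dimensional state.
rng = np.random.default_rng(0)
state = rng.standard_normal((1024, 256))

for n_observed in (256, 128, 64, 32):
    partial = state[:, observed_indices(256, n_observed)]
    print(n_observed, partial.shape)
```
]]></preformat>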

      <?xmltex \floatpos{t}?><fig id="Ch1.F6"><?xmltex \currentcnt{6}?><?xmltex \def\figurename{Figure}?><label>Figure 6</label><caption><p id="d1e4971">Comparison of the KS system's posterior distributions when all or only a part of the states are observed.</p></caption>
          <?xmltex \igopts{width=236.157874pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/4319/2021/gmd-14-4319-2021-f06.png"/>

        </fig>

</sec>
<sec id="Ch1.S4.SS3">
  <label>4.3</label><title>The quasi-geostrophic model</title>
      <p id="d1e4988">Here the methodology is applied to a computationally intensive model for which brute-force parameter posterior estimation would be too time consuming. We employ the well-known quasi-geostrophic model
<xref ref-type="bibr" rid="bib1.bibx14 bib1.bibx42" id="paren.73"/> using a  dense grid to achieve complex chaotic dynamics in high dimensions. The wall-clock time for one long-time forward model simulation is roughly 10 min, so
a naïve  calculation of a posterior sample of size 100 000 would take around 2 years. We demonstrate how the application of the methods verified in the two previous examples reduces this time to a few hours.</p>
      <p id="d1e4994">The QG model approximates the behavior on a latitudinal “stripe” at two given atmospheric heights, projected onto a two-layered cylinder.
The model geometry implies periodic boundary conditions, seamlessly stitching together the extreme eastern and western parts of the rectangular spatial domain with coordinates <inline-formula><mml:math id="M216" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M217" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula>.
For the northern and southern edges, user-specified time-independent Dirichlet boundary conditions are used. In addition to these conditions and the topographic constraints, the model parameters include the mean  thicknesses of  the two interacting atmospheric layers, denoted by <inline-formula><mml:math id="M218" display="inline"><mml:mrow><mml:msub><mml:mi>H</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M219" display="inline"><mml:mrow><mml:msub><mml:mi>H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>. The QG model also accounts for the Coriolis force.
An example of the two-layer geometry is  presented in Fig. <xref ref-type="fig" rid="Ch1.F7"/>.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F7"><?xmltex \currentcnt{7}?><?xmltex \def\figurename{Figure}?><label>Figure 7</label><caption><p id="d1e5037">An example of the layer structure of the two-layer quasi-geostrophic model. The terms <inline-formula><mml:math id="M220" display="inline"><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M221" display="inline"><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> denote the mean zonal flows in the top and bottom layers, respectively.</p></caption>
          <?xmltex \igopts{width=199.169291pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/4319/2021/gmd-14-4319-2021-f07.png"/>

        </fig>

      <p id="d1e5069">In a non-dimensional form, the QG system can be written as

                <disp-formula specific-use="align" content-type="numbered"><mml:math id="M222" display="block"><mml:mtable displaystyle="true"><mml:mlabeledtr id="Ch1.E13"><mml:mtd><mml:mtext>13</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:msub><mml:mi>q</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mo>=</mml:mo><mml:mi mathvariant="normal">Δ</mml:mi><mml:msub><mml:mi mathvariant="italic">ψ</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="italic">ψ</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="italic">ψ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>)</mml:mo><mml:mo>+</mml:mo><mml:mi mathvariant="italic">β</mml:mi><mml:mi>y</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="Ch1.E14"><mml:mtd><mml:mtext>14</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:msub><mml:mi>q</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mo>=</mml:mo><mml:mi mathvariant="normal">Δ</mml:mi><mml:msub><mml:mi mathvariant="italic">ψ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="italic">ψ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="italic">ψ</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>)</mml:mo><mml:mo>+</mml:mo><mml:mi 
mathvariant="italic">β</mml:mi><mml:mi>y</mml:mi><mml:mo>+</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mi>s</mml:mi></mml:msub><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr></mml:mtable></mml:math></disp-formula>

            where   <inline-formula><mml:math id="M223" display="inline"><mml:mrow><mml:msub><mml:mi>q</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> are potential vorticities, and <inline-formula><mml:math id="M224" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">ψ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> are stream functions with indexes <inline-formula><mml:math id="M225" display="inline"><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula> for the upper and the lower layers, respectively. Both the <inline-formula><mml:math id="M226" display="inline"><mml:mrow><mml:msub><mml:mi>q</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M227" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">ψ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> are functions of time <inline-formula><mml:math id="M228" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula> and spatial coordinates <inline-formula><mml:math id="M229" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M230" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula>.
The coefficients <inline-formula><mml:math id="M231" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mrow><mml:msubsup><mml:mi>f</mml:mi><mml:mn mathvariant="normal">0</mml:mn><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:msup><mml:mi>L</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow><mml:mrow><mml:mover accent="true"><mml:mi>g</mml:mi><mml:mo mathvariant="normal">´</mml:mo></mml:mover><mml:msub><mml:mi>H</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow></mml:math></inline-formula> control how much the model layers interact, and    <inline-formula><mml:math id="M232" display="inline"><mml:mrow><mml:mi mathvariant="italic">β</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mi>L</mml:mi><mml:mo>/</mml:mo><mml:mi>U</mml:mi></mml:mrow></mml:math></inline-formula>  gives a nondimensional version of <inline-formula><mml:math id="M233" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>, the northward gradient of the Coriolis force that gives rise to faster cyclonic flows closer to the poles. 
The Coriolis parameter is given by <inline-formula><mml:math id="M234" display="inline"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mi>W</mml:mi><mml:mi>sin⁡</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="normal">ℓ</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, where <inline-formula><mml:math id="M235" display="inline"><mml:mi>W</mml:mi></mml:math></inline-formula> is the angular speed of Earth and <inline-formula><mml:math id="M236" display="inline"><mml:mi mathvariant="normal">ℓ</mml:mi></mml:math></inline-formula> is the latitude of interest. <inline-formula><mml:math id="M237" display="inline"><mml:mi>L</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M238" display="inline"><mml:mi>U</mml:mi></mml:math></inline-formula> give the length and speed scales, respectively, and <inline-formula><mml:math id="M239" display="inline"><mml:mover accent="true"><mml:mi>g</mml:mi><mml:mo mathvariant="normal">´</mml:mo></mml:mover></mml:math></inline-formula> is a gravity constant.
Finally, <inline-formula><mml:math id="M240" display="inline"><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mi>s</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mrow><mml:mi>S</mml:mi><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="italic">η</mml:mi><mml:msub><mml:mi>H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow></mml:math></inline-formula> defines the   topography for the lower layer, where <inline-formula><mml:math id="M241" display="inline"><mml:mrow><mml:mi mathvariant="italic">η</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mi>U</mml:mi><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mi>L</mml:mi></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow></mml:math></inline-formula> is the Rossby number of the system.
For further details, see <xref ref-type="bibr" rid="bib1.bibx14" id="text.74"/> and <xref ref-type="bibr" rid="bib1.bibx42" id="text.75"/>.</p>
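      <p>As a rough numerical illustration of the scalings above, the following Python sketch evaluates the Coriolis parameter, the Rossby number, and the scaled topography. The latitude, the speed and length scales, and all function names are assumptions chosen for illustration only; they are not the configuration used in the experiments.</p>
      <preformat><![CDATA[
```python
import math

# Angular speed of Earth in rad/s (standard value); W in the text.
EARTH_ANGULAR_SPEED = 7.2921e-5


def coriolis_parameter(latitude_deg, angular_speed=EARTH_ANGULAR_SPEED):
    """f0 = 2 W sin(latitude), with the latitude given in degrees."""
    return 2.0 * angular_speed * math.sin(math.radians(latitude_deg))


def rossby_number(speed_scale, length_scale, f0):
    """eta = U / (f0 L)."""
    return speed_scale / (f0 * length_scale)


def scaled_topography(surface_height, eta, lower_layer_depth):
    """R_s = S / (eta H_2), evaluated pointwise."""
    return surface_height / (eta * lower_layer_depth)


# Hypothetical mid-latitude scales (assumptions, not values from the paper).
f0 = coriolis_parameter(45.0)                                 # 1/s
eta = rossby_number(speed_scale=10.0, length_scale=1.0e6, f0=f0)
```
]]></preformat>
      <p>With these assumed scales the Rossby number is of order 0.1, which is the small-Rossby-number regime in which the quasi-geostrophic approximation is usually applied.</p>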

      <?xmltex \floatpos{t}?><fig id="Ch1.F8" specific-use="star"><?xmltex \currentcnt{8}?><?xmltex \def\figurename{Figure}?><label>Figure 8</label><caption><p id="d1e5491">An example of the  6050-dimensional state of the quasi-geostrophic model.
The contour lines for both the stream function and potential vorticity are shown for both layers.
Note the cylindrical boundary conditions.</p></caption>
          <?xmltex \igopts{width=483.69685pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/4319/2021/gmd-14-4319-2021-f08.png"/>

        </fig>

      <?pagebreak page4329?><p id="d1e5500">It is assumed that the motion determined by the model is geostrophic, essentially meaning that the potential vorticity of the flow is conserved in both layers:
            <disp-formula id="Ch1.E15" content-type="numbered"><label>15</label><mml:math id="M242" display="block"><mml:mrow><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mo>∂</mml:mo><mml:msub><mml:mi>q</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>∂</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>+</mml:mo><mml:msub><mml:mi>u</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mo>∂</mml:mo><mml:msub><mml:mi>q</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>∂</mml:mo><mml:mi>x</mml:mi></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>+</mml:mo><mml:msub><mml:mi>v</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mo>∂</mml:mo><mml:msub><mml:mi>q</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>∂</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0</mml:mn><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
          Here, <inline-formula><mml:math id="M243" display="inline"><mml:mrow><mml:msub><mml:mi>u</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M244" display="inline"><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> are velocity fields, which are functions of both space and time. They
are obtained from the stream functions  <inline-formula><mml:math id="M245" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">ψ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> via
            <disp-formula id="Ch1.E16" content-type="numbered"><label>16</label><mml:math id="M246" display="block"><mml:mrow><mml:msub><mml:mi>u</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mo>∂</mml:mo><mml:msub><mml:mi mathvariant="italic">ψ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>∂</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>,</mml:mo><mml:mspace width="2em" linebreak="nobreak"/><mml:msub><mml:mi>v</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mo>∂</mml:mo><mml:msub><mml:mi mathvariant="italic">ψ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>∂</mml:mo><mml:mi>x</mml:mi></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
          Equations (<xref ref-type="disp-formula" rid="Ch1.E13"/>)–(<xref ref-type="disp-formula" rid="Ch1.E16"/>) define the spatiotemporal evolution of the quantities <inline-formula><mml:math id="M247" display="inline"><mml:mrow><mml:msub><mml:mi>q</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="italic">ψ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M248" display="inline"><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula>.</p>
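      <p>The relation between the stream function and the velocity field in Eq. (16) can be sketched with centred finite differences. The Python snippet below is a minimal illustration, assuming a uniform grid stored as psi[iy, ix]; the function name is hypothetical, and the one-sided differences that numpy.gradient uses at the array edges are a simplification that ignores the cylindrical boundary conditions of the model.</p>
      <preformat><![CDATA[
```python
import numpy as np


def velocities_from_stream_function(psi, dx, dy):
    """Return (u, v) with u = -d(psi)/dy and v = d(psi)/dx.

    psi is indexed as psi[iy, ix], so axis 0 is y and axis 1 is x.
    """
    dpsi_dy, dpsi_dx = np.gradient(psi, dy, dx)
    return -dpsi_dy, dpsi_dx


# Linear test field psi = 2 x + 3 y, for which u = -3 and v = 2 everywhere.
x = np.linspace(0.0, 1.0, 11)
y = np.linspace(0.0, 2.0, 21)
X, Y = np.meshgrid(x, y)  # both arrays have shape (len(y), len(x))
psi = 2.0 * X + 3.0 * Y
u, v = velocities_from_stream_function(psi, dx=x[1] - x[0], dy=y[1] - y[0])
```
]]></preformat>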
      <p id="d1e5703">The numerical integration of this system is carried out using a semi-Lagrangian scheme,
where the potential vorticities <inline-formula><mml:math id="M249" display="inline"><mml:mrow><mml:msub><mml:mi>q</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> are computed according to Eq. (<xref ref-type="disp-formula" rid="Ch1.E15"/>) for given velocities <inline-formula><mml:math id="M250" display="inline"><mml:mrow><mml:msub><mml:mi>u</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M251" display="inline"><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. With these  <inline-formula><mml:math id="M252" display="inline"><mml:mrow><mml:msub><mml:mi>q</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> the stream functions can then be obtained from Eqs. (<xref ref-type="disp-formula" rid="Ch1.E13"/>) and (<xref ref-type="disp-formula" rid="Ch1.E14"/>) with a two-stage finite difference scheme.</p>
      <p id="d1e5757">Finally, the velocity field is updated by Eq. (<xref ref-type="disp-formula" rid="Ch1.E16"/>) for the next iteration round.</p>
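      <p>The advection part of a semi-Lagrangian step can be illustrated in one dimension. The Python sketch below is not the scheme used for the QG model (which is two-dimensional and also solves for the stream functions); it assumes a periodic grid, first-order backtracking of the departure points, and linear interpolation, and the function name is hypothetical.</p>
      <preformat><![CDATA[
```python
import numpy as np


def semi_lagrangian_step(q, u, dt, dx):
    """Advance dq/dt + u dq/dx = 0 by one step on a periodic 1-D grid.

    Each grid point is traced back along the flow to its departure point,
    and q is interpolated linearly there.
    """
    n = q.size
    x = dx * np.arange(n)
    departure = x - u * dt  # first-order estimate of the departure points
    return np.interp(departure, x, q, period=n * dx)


# With u * dt equal to one grid spacing the profile shifts by exactly one cell.
n = 16
q0 = np.sin(2.0 * np.pi * np.arange(n) / n)
q1 = semi_lagrangian_step(q0, u=1.0, dt=1.0, dx=1.0)
```
]]></preformat>
      <p>Because the field is evaluated at departure points rather than differenced on the grid, the step remains stable for time steps larger than an explicit Eulerian scheme would allow, which is one common reason for choosing semi-Lagrangian advection.</p>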
      <?pagebreak page4330?><p id="d1e5763">For estimating model parameters from synthetic data, a reference data set is created with 64 epochs, each containing <inline-formula><mml:math id="M253" display="inline"><mml:mrow><mml:mi>N</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1000</mml:mn></mml:mrow></mml:math></inline-formula> observations. These data are sampled from the model trajectory with <inline-formula><mml:math id="M254" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">8</mml:mn></mml:mrow></mml:math></inline-formula> (where a time step of length 1 corresponds to 6 h) in the time interval <inline-formula><mml:math id="M255" display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:mn mathvariant="normal">192</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">8192</mml:mn><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula>, which amounts to a long-range integration of roughly 5–6 years of a climate model.
The spatial domain is discretized into a <inline-formula><mml:math id="M256" display="inline"><mml:mrow><mml:mn mathvariant="normal">55</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">55</mml:mn></mml:mrow></mml:math></inline-formula> grid, which results in consistent chaotic behavior and more complex dynamics than with the often-used <inline-formula><mml:math id="M257" display="inline"><mml:mrow><mml:mn mathvariant="normal">20</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">40</mml:mn></mml:mrow></mml:math></inline-formula> grid. This is reflected in higher variability in the feature vectors, as seen in Fig. <xref ref-type="fig" rid="Ch1.F1"/>. A snapshot of the 6050-dimensional trajectory of the QG system is displayed in Fig. <xref ref-type="fig" rid="Ch1.F8"/>.</p>
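      <p>The numbers quoted in this paragraph can be cross-checked with a few lines of Python. The sketch assumes that the right end point of the sampling interval is excluded (which gives exactly 1000 sampling times) and that the state dimension counts one 55 × 55 field per layer; both readings are assumptions made for this check rather than statements taken from the model code.</p>
      <preformat><![CDATA[
```python
import numpy as np

N_EPOCHS = 64
T_START, T_END, DT_OBS = 192, 8192, 8  # in model time steps
HOURS_PER_STEP = 6
GRID_NX, GRID_NY, N_LAYERS = 55, 55, 2

# Sampling times within one epoch (end point excluded by assumption).
obs_times = np.arange(T_START, T_END, DT_OBS)
n_obs_per_epoch = obs_times.size

# Length of the sampled interval in years.
span_years = (T_END - T_START) * HOURS_PER_STEP / 24.0 / 365.25

# One 55 x 55 field on each of the two layers.
state_dimension = N_LAYERS * GRID_NX * GRID_NY

total_observations = N_EPOCHS * n_obs_per_epoch
```
]]></preformat>
      <p>Under these assumptions the interval spans about 5.5 years, consistent with the 5–6 years stated above, and the state dimension matches the 6050 quoted for Fig. 8.</p>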
      <p id="d1e5838">The model state is characterized by two distinct fields, the vorticities and the stream functions, which naturally depend on each other. As shown in <xref ref-type="bibr" rid="bib1.bibx21" id="paren.76"/>, it is useful to construct separate feature vectors to characterize the dynamics in such situations.
For this reason, two  separate feature vectors are constructed – one for the potential vorticity on both layers and  the other for the stream function.</p>
      <p id="d1e5844">The Gaussian likelihood of the state is created by stacking these two feature vectors one after another.</p>
      <p id="d1e5847">The normality of the resulting <inline-formula><mml:math id="M258" display="inline"><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mo>(</mml:mo><mml:mi>M</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>-dimensional  vector may again be  verified as shown in  Fig. <xref ref-type="fig" rid="Ch1.F1"/>. The number of bins was set to 32, leading to
parameter values <inline-formula><mml:math id="M259" display="inline"><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">55</mml:mn></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M260" display="inline"><mml:mrow><mml:mi>b</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1.075</mml:mn></mml:mrow></mml:math></inline-formula> for potential vorticity, and <inline-formula><mml:math id="M261" display="inline"><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">31</mml:mn></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M262" display="inline"><mml:mrow><mml:mi>b</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1.046</mml:mn></mml:mrow></mml:math></inline-formula> for the stream function.</p>
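      <p>The construction of the stacked feature vector can be sketched as follows. The Python snippet assumes that each feature vector is the fraction of pairwise distances between two trajectories that fall below radii of the form R_k = R_0 b^(-k), k = 0, ..., M, as the parameters R_0 and b suggest; the function name, the toy trajectories, and the toy values of R_0, b, and M are hypothetical and unrelated to the QG experiment.</p>
      <preformat><![CDATA[
```python
import numpy as np


def correlation_feature_vector(traj_a, traj_b, r0, b, m):
    """Fraction of pairs (a_i, b_j) closer than R_k = r0 * b**(-k), k = 0..m.

    traj_a and traj_b have shape (n_samples, state_dim). All pairwise
    differences are formed at once, which is only practical for small
    toy problems such as this one.
    """
    diffs = traj_a[:, None, :] - traj_b[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1)).ravel()
    radii = r0 * b ** (-np.arange(m + 1, dtype=float))
    return np.array([(dists < r).mean() for r in radii])


# Toy data standing in for potential vorticity and stream function states.
rng = np.random.default_rng(0)
q_a, q_b = rng.normal(size=(50, 4)), rng.normal(size=(50, 4))
psi_a, psi_b = rng.normal(size=(50, 3)), rng.normal(size=(50, 3))

M = 7  # toy value
y_q = correlation_feature_vector(q_a, q_b, r0=5.0, b=1.3, m=M)
y_psi = correlation_feature_vector(psi_a, psi_b, r0=4.0, b=1.2, m=M)

# Stack the two feature vectors into one 2(M+1)-dimensional vector.
feature = np.concatenate([y_q, y_psi])
```
]]></preformat>
      <p>Keeping a separate R_0 and b for each field lets the radii cover the range of distances of that field, which is why two parameter pairs are reported above.</p>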
      <p id="d1e5925">For parameter estimation, inferring the layer heights from synthetic data is considered. The reference data  set with <inline-formula><mml:math id="M263" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">epo</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">64</mml:mn></mml:mrow></mml:math></inline-formula> integrations is produced using the values <inline-formula><mml:math id="M264" display="inline"><mml:mrow><mml:msub><mml:mi>H</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">5500</mml:mn></mml:mrow></mml:math></inline-formula> and  <inline-formula><mml:math id="M265" display="inline"><mml:mrow><mml:msub><mml:mi>H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">4500</mml:mn></mml:mrow></mml:math></inline-formula>. A single forward model evaluation takes 10 min on a fast laptop, and therefore generating MCMC chains
of length <inline-formula><mml:math id="M266" display="inline"><mml:mrow><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mn mathvariant="normal">5</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> with brute force would take around 2 years to run. As before, using LA-MCMC reduces the computation time by a factor of roughly 100.</p>
      <p id="d1e5984">In the experiments performed, the number of forward model evaluations needed ranged within the interval <inline-formula><mml:math id="M267" display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:mn mathvariant="normal">682</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">762</mml:mn><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula>, which translates to around 1 week of computing time. As verified with the Kuramoto–Sivashinsky example, the forward model integration can be split into segments computed in parallel, which further reduced the time required to generate data for computing the likelihood by a factor of around 50, corresponding to around 3 h for generating the MCMC chain. The pairwise distances for generating the feature vectors were computed on a GPU, and therefore the computation time required for this was negligible compared to the
model integration time. The posterior distribution of the two parameters is presented in Fig. <xref ref-type="fig" rid="Ch1.F9"/>.</p>
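      <p>The wall-clock estimates quoted above follow from simple arithmetic, reproduced in the Python sketch below. It uses only the figures stated in the text (10 min per forward evaluation, a chain of length 10^5, between 682 and 762 evaluations with LA-MCMC, and a parallel speed-up of about 50) and assumes that all other costs are negligible; the variable names are illustrative.</p>
      <preformat><![CDATA[
```python
MINUTES_PER_EVALUATION = 10
CHAIN_LENGTH = 10 ** 5
LA_MCMC_EVALUATIONS = (682, 762)  # range reported for the experiments
PARALLEL_SPEEDUP = 50
MINUTES_PER_DAY = 60 * 24

# Brute-force MCMC: one forward evaluation per chain step.
brute_force_years = (
    CHAIN_LENGTH * MINUTES_PER_EVALUATION / MINUTES_PER_DAY / 365.25
)

# LA-MCMC: only the reported number of forward evaluations is needed.
la_mcmc_days = [
    n * MINUTES_PER_EVALUATION / MINUTES_PER_DAY for n in LA_MCMC_EVALUATIONS
]

# Splitting the integration into parallel segments.
parallel_hours = [days * 24 / PARALLEL_SPEEDUP for days in la_mcmc_days]

# Reduction in forward evaluations relative to brute force.
reduction_factors = [CHAIN_LENGTH / n for n in LA_MCMC_EVALUATIONS]
```
]]></preformat>
      <p>The results (about 1.9 years, about 5 days, and between 2 and 3 h) agree with the rounded figures given in the text.</p>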

      <?xmltex \floatpos{t}?><fig id="Ch1.F9"><?xmltex \currentcnt{9}?><?xmltex \def\figurename{Figure}?><label>Figure 9</label><caption><p id="d1e6008">The clearly non-Gaussian posterior distribution of the <inline-formula><mml:math id="M268" display="inline"><mml:mrow><mml:msub><mml:mi>H</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M269" display="inline"><mml:mrow><mml:msub><mml:mi>H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> parameters of the  quasi-geostrophic system shows how these parameters anticorrelate with each other.</p></caption>
          <?xmltex \igopts{width=236.157874pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/4319/2021/gmd-14-4319-2021-f09.png"/>

        </fig>

</sec>
</sec>
<sec id="Ch1.S5" sec-type="conclusions">
  <label>5</label><title>Conclusions and future work</title>
      <p id="d1e6049">Bayesian parameter estimation with computationally demanding computer models is highly non-trivial. The associated computational challenges often become insurmountable when the model dynamics are chaotic. In this work, we showed that it is possible to overcome these challenges by combining the CIL with an MCMC method based on local surrogates of the log-likelihood function (LA-MCMC). The CIL captures changes in the geometry of the underlying attractor of the chaotic system, while local approximation MCMC makes generating long MCMC chains based on this likelihood tractable, with computational savings of roughly 2 orders of magnitude, as shown in Table <xref ref-type="table" rid="Ch1.T2"/>. Our methods were verified by sampling the parameter posteriors of the Lorenz 63 and the Kuramoto–Sivashinsky models, where an (expensive) comparison to exact MCMC with<?pagebreak page4331?> the CIL was still feasible. Then we applied our approach to the quasi-geostrophic model with a deliberately extended grid size. Without the CIL, parameter estimation would not have been possible with chaotic models such as these; without LA-MCMC, the generation of sufficiently long and accurate MCMC chains for the higher-resolution QG model parameters would have been computationally intractable. We note that the computational demands of the QG model already come quite close to those of weather models at coarse resolutions. We believe that the approach developed here can provide ways to solve problems such as the climate model closure parameter estimation investigated in <xref ref-type="bibr" rid="bib1.bibx26" id="text.77"/> or long-time assimilation problems with uncertain model parameters, discussed in <xref ref-type="bibr" rid="bib1.bibx46" id="text.78"/> as unsolved and intractable.</p>

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T2"><?xmltex \currentcnt{2}?><label>Table 2</label><caption><p id="d1e6063">Summary of results. This table shows the speed-up due to the  CIL/LA-MCMC combination. Since running the quasi-geostrophic model 100 000 times was not possible, the nominal length of the MCMC chain and the speed-up due to LA-MCMC are  reported in parentheses in the last column. The numbers of forward model evaluations with LA-MCMC (second row) are rough averages over several MCMC simulations. </p></caption><oasis:table frame="topbot"><?xmltex \begin{scaleboxenv}{.95}[.95]?><oasis:tgroup cols="4">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">L63</oasis:entry>
         <oasis:entry colname="col3">KS</oasis:entry>
         <oasis:entry colname="col4">QG</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Model evaluations, AM</oasis:entry>
         <oasis:entry colname="col2">100 000</oasis:entry>
         <oasis:entry colname="col3">100 000</oasis:entry>
         <oasis:entry colname="col4">(100 000)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Model evaluations, LA-MCMC</oasis:entry>
         <oasis:entry colname="col2">1000</oasis:entry>
         <oasis:entry colname="col3">1000</oasis:entry>
         <oasis:entry colname="col4">700</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Speed-up factor</oasis:entry>
         <oasis:entry colname="col2">100</oasis:entry>
         <oasis:entry colname="col3">100</oasis:entry>
         <oasis:entry colname="col4">(143)</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup><?xmltex \end{scaleboxenv}?></oasis:table></table-wrap>

      <p id="d1e6145">There are many potential directions for extension of this work. First, it should be feasible to run parallel LA-MCMC chains that share model evaluations in a single evaluated set; doing so can accelerate the construction of accurate local surrogate models, as demonstrated in <xref ref-type="bibr" rid="bib1.bibx8" id="text.79"/>, and is a useful way of harnessing parallel computational resources within surrogate-based MCMC. Extending this approach to higher-dimensional parameters is also of interest. While LA-MCMC has been successfully applied to chains of dimension up to <inline-formula><mml:math id="M270" display="inline"><mml:mrow><mml:mi>q</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">12</mml:mn></mml:mrow></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx8" id="paren.80"/>, future work should explore sparsity and other truncations of the local polynomial approximation to improve scaling with dimension. From the CIL perspective, calibrating more complex models, such as weather models, often requires choosing the part of the state vector from which the feature vectors are computed. While computing the likelihood from the full high-dimensional state is computationally feasible, <xref ref-type="bibr" rid="bib1.bibx21" id="text.81"/> showed that carefully choosing a subset of the state for the feature vectors performs better. Also, the epochs may need to be chosen sufficiently long to include potential rare events, so that changes in rare event patterns can be identified. This, naturally, will increase the computational cost if one wants to be confident in the inclusion of such events.</p>
      <p id="d1e6170">While answering these questions will require further work, we believe the research presented in this paper provides a promising and reasonable step towards estimating parameters in the context of expensive operational models.</p>
</sec>

      
      </body>
    <back><notes notes-type="codeavailability"><title>Code availability</title>

      <p id="d1e6177">The MATLAB code that documents the CIL and  LA-MCMC approaches is available in the Supplement. Forward model code for performing model simulations (Lorenz, Kuramoto–Sivashinsky, and quasi-geostrophic model) is also available in the Supplement.</p>
  </notes><notes notes-type="dataavailability"><title>Data availability</title>

      <p id="d1e6183">The data were created using the code provided in the Supplement.</p>
  </notes><app-group>
        <supplementary-material position="anchor"><p id="d1e6186">The supplement related to this article is available online at: <inline-supplementary-material xlink:href="https://doi.org/10.5194/gmd-14-4319-2021-supplement" xlink:title="zip">https://doi.org/10.5194/gmd-14-4319-2021-supplement</inline-supplementary-material>.</p></supplementary-material>
        </app-group><notes notes-type="authorcontribution"><title>Author contributions</title>

      <p id="d1e6195">HH and YM designed the study with input from all authors.
SS, HH, AD, and JS combined the CIL and LA-MCMC methods for carrying out the research.
AB wrote and provided implementations of the KS and QG models for GPUs, including custom numerics and testing.
SS wrote the CIL code and the version of LA-MCMC used (based on earlier work by Antti Solonen), and carried out the simulations.
All authors discussed the results and shared the responsibility of writing the manuscript.
SS prepared the figures.</p>
  </notes><notes notes-type="competinginterests"><title>Competing interests</title>

      <p id="d1e6201">The authors declare that they have no conflict of interest.</p>
  </notes><notes notes-type="disclaimer"><title>Disclaimer</title>

      <p id="d1e6207">Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.</p>
  </notes><ack><title>Acknowledgements</title><?pagebreak page4332?><p id="d1e6213">This work was supported by the  Centre of Excellence of Inverse Modelling and Imaging (CoE), Academy of Finland, decision no. 312 122. Sebastian Springer was supported by the Academy of Finland, project no. 334 817. Youssef Marzouk and Andrew Davis were supported by the US Department of Energy, Office of Advanced Scientific Computing Research (ASCR), SciDAC program, as part of the FASTMath Institute.</p></ack><notes notes-type="financialsupport"><title>Financial support</title>

      <p id="d1e6218">This research has been supported by the Academy of Finland (grant no. 312122).</p>
  </notes><notes notes-type="reviewstatement"><title>Review statement</title>

      <p id="d1e6224">This paper was edited by Rohitash Chandra and reviewed by two anonymous referees.</p>
  </notes><ref-list>
    <title>References</title>

      <ref id="bib1.bibx1"><?xmltex \def\ref@label{{Andrieu and Roberts(2009)}}?><label>Andrieu and Roberts(2009)</label><?label Andrieu?><mixed-citation>Andrieu, C. and Roberts, G. O.: The pseudo-marginal approach for efficient
Monte Carlo computations, Ann. Statist., 37, 697–725,
<ext-link xlink:href="https://doi.org/10.1214/07-AOS574" ext-link-type="DOI">10.1214/07-AOS574</ext-link>, 2009.</mixed-citation></ref>
      <ref id="bib1.bibx2"><?xmltex \def\ref@label{{Asch et~al.(2016)}}?><label>Asch et al.(2016)</label><?label Bocquet16?><mixed-citation>
Asch, M., Bocquet, M., and Nodet, M.: Data Assimilation: Methods, Algorithms,
and Applications, SIAM, Society for Industrial and Applied Mathematics, Philadelphia, PA, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx3"><?xmltex \def\ref@label{{Beaumont et~al.(2002)}}?><label>Beaumont et al.(2002)</label><?label Beaumont2025?><mixed-citation>
Beaumont, M. A., Zhang, W., and Balding, D. J.: Approximate Bayesian
Computation in Population Genetics, Genetics, 162, 2025–2035, 2002.</mixed-citation></ref>
      <ref id="bib1.bibx4"><?xmltex \def\ref@label{{Borovkova et~al.(2001)}}?><label>Borovkova et al.(2001)</label><?label LimTheor?><mixed-citation>Borovkova, S., Burton, R., and Dehling, H.: Limit theorems for functionals of
mixing processes with applications to <inline-formula><mml:math id="M271" display="inline"><mml:mi>U</mml:mi></mml:math></inline-formula>-statistics and dimension
estimation, T. Am. Math. Soc., 353,
4261–4318, <ext-link xlink:href="https://doi.org/10.1090/S0002-9947-01-02819-7" ext-link-type="DOI">10.1090/S0002-9947-01-02819-7</ext-link>, 2001.</mixed-citation></ref>
      <ref id="bib1.bibx5"><?xmltex \def\ref@label{{Cencini et~al.(2010)}}?><label>Cencini et al.(2010)</label><?label cencini2010chaos?><mixed-citation>Cencini, M., Cecconi, F., and Vulpiani, A.: Chaos: From Simple Models to
Complex Systems, Series on advances in statistical
mechanics, World Scientific, <ext-link xlink:href="https://doi.org/10.1142/7351" ext-link-type="DOI">10.1142/7351</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx6"><?xmltex \def\ref@label{{Conn et~al.(2009)}}?><label>Conn et al.(2009)</label><?label Conn09?><mixed-citation>Conn, A. R., Scheinberg, K., and Vicente, L. N.: Introduction to
derivative-free optimization, MPS-SIAM, Society for Industrial and Applied Mathematics, <ext-link xlink:href="https://doi.org/10.1137/1.9780898718768" ext-link-type="DOI">10.1137/1.9780898718768</ext-link>, 2009.</mixed-citation></ref>
      <ref id="bib1.bibx7"><?xmltex \def\ref@label{{Conrad et~al.(2016)}}?><label>Conrad et al.(2016)</label><?label Conrad16?><mixed-citation>Conrad, P. R., Marzouk, Y. M., Pillai, N. S., and Smith, A.: Accelerating
asymptotically exact MCMC for computationally intensive models via local
approximations, J. Am. Stat. Assoc., 111,
1591–1607, <ext-link xlink:href="https://doi.org/10.1080/01621459.2015.1096787" ext-link-type="DOI">10.1080/01621459.2015.1096787</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx8"><?xmltex \def\ref@label{{Conrad et~al.(2018)}}?><label>Conrad et al.(2018)</label><?label Conrad18?><mixed-citation>Conrad, P. R., Davis, A. D., Marzouk, Y. M., Pillai, N. S., and Smith, A.:
Parallel local approximation MCMC for expensive models, SIAM/ASA Journal on
Uncertainty Quantification, 6, 339–373, <ext-link xlink:href="https://doi.org/10.1137/16M1084080" ext-link-type="DOI">10.1137/16M1084080</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx9"><?xmltex \def\ref@label{{Davis et~al.(2020)}}?><label>Davis et al.(2020)</label><?label Davis20?><mixed-citation>Davis, A., Marzouk, Y., Smith, A., and Pillai, N.: Rate-optimal refinement
strategies for local approximation MCMC, arXiv, available at: <uri>https://arxiv.org/abs/2006.00032</uri>, arXiv:2006.00032, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx10"><?xmltex \def\ref@label{{Davis(2018)}}?><label>Davis(2018)</label><?label Davis_thesis_18?><mixed-citation>
Davis, A. D.: Prediction under uncertainty: from models for marine-terminating
glaciers to Bayesian computation, PhD thesis, Massachusetts Institute of
Technology, Massachusetts, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx11"><?xmltex \def\ref@label{{Donsker(1951)}}?><label>Donsker(1951)</label><?label Donsker51?><mixed-citation>
Donsker, M. D.: An invariance principle for certain probability limit theorems,
Mem. Am. Math. Soc., 6, 1–10, 1951.</mixed-citation></ref>
      <ref id="bib1.bibx12"><?xmltex \def\ref@label{{Durbin and Koopman(2012)}}?><label>Durbin and Koopman(2012)</label><?label Koopman?><mixed-citation>
Durbin, J. and Koopman, S. J.: Time Series Analysis by State Space Methods,
Oxford University Press, Oxford, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx13"><?xmltex \def\ref@label{{ECMWF(2013)}}?><label>ECMWF(2013)</label><?label ifs2013?><mixed-citation>ECMWF: IFS Documentation – Cy40r1, Tech. Rep. 349, European Centre for
Medium-Range Weather Forecasts, Reading, England,
available at: <uri>https://www.ecmwf.int/sites/default/files/elibrary/2014/9203-part-iii-dynamics-and-numerical-procedures.pdf</uri> (last access: October 2020),
2013.</mixed-citation></ref>
      <ref id="bib1.bibx14"><?xmltex \def\ref@label{{Fandry and Leslie(1984)}}?><label>Fandry and Leslie(1984)</label><?label Fandry?><mixed-citation>Fandry, C. B. and Leslie, L. M.: A Two-Layer Quasi-Geostrophic Model of Summer
Trough Formation in the Australian Subtropical Easterlies, J.
Atmos. Sci., 41, 807–818,
<ext-link xlink:href="https://doi.org/10.1175/1520-0469(1984)041&lt;0807:ATLQGM&gt;2.0.CO;2" ext-link-type="DOI">10.1175/1520-0469(1984)041&lt;0807:ATLQGM&gt;2.0.CO;2</ext-link>, 1984.</mixed-citation></ref>
      <ref id="bib1.bibx15"><?xmltex \def\ref@label{{Gamerman(1997)}}?><label>Gamerman(1997)</label><?label gamerman1997?><mixed-citation>
Gamerman, D.: Markov Chain Monte Carlo: Stochastic Simulation for
Bayesian Inference, Chapman &amp; Hall/CRC Texts in Statistical Science,
Taylor &amp; Francis, London, 1997.</mixed-citation></ref>
      <ref id="bib1.bibx16"><?xmltex \def\ref@label{{Gelman et~al.(2013)}}?><label>Gelman et al.(2013)</label><?label gelman2013?><mixed-citation>
Gelman, A., Carlin, J., Stern, H., Dunson, D., Vehtari, A., and Rubin, D.:
Bayesian Data Analysis, Chapman and Hall/CRC, 3rd edn., 2013.</mixed-citation></ref>
      <ref id="bib1.bibx17"><?xmltex \def\ref@label{{Grassberger and Procaccia(1983{\natexlab{a}})}}?><label>Grassberger and Procaccia(1983a)</label><?label Grassberger83?><mixed-citation>Grassberger, P. and Procaccia, I.: Estimation of the Kolmogorov
entropy from a chaotic signal, Phys. Rev. A, 28, 2591–2593, <ext-link xlink:href="https://doi.org/10.1103/PhysRevA.28.2591" ext-link-type="DOI">10.1103/PhysRevA.28.2591</ext-link>,
1983a.</mixed-citation></ref>
      <ref id="bib1.bibx18"><?xmltex \def\ref@label{{Grassberger and Procaccia(1983{\natexlab{b}})}}?><label>Grassberger and Procaccia(1983b)</label><?label Grassberger83P?><mixed-citation>Grassberger, P. and Procaccia, I.: Measuring the strangeness of strange
attractors, Physica D, 9, 189–208,
<ext-link xlink:href="https://doi.org/10.1016/0167-2789(83)90298-1" ext-link-type="DOI">10.1016/0167-2789(83)90298-1</ext-link>, 1983b.</mixed-citation></ref>
      <ref id="bib1.bibx19"><?xmltex \def\ref@label{{Haario et~al.(2001)}}?><label>Haario et al.(2001)</label><?label Haario01?><mixed-citation>
Haario, H., Saksman, E., and Tamminen, J.: An adaptive Metropolis algorithm,
Bernoulli, 7, 223–242, 2001.</mixed-citation></ref>
      <ref id="bib1.bibx20"><?xmltex \def\ref@label{{Haario et~al.(2006)}}?><label>Haario et al.(2006)</label><?label Haario06?><mixed-citation>Haario, H., Laine, M., Mira, A., and Saksman, E.: DRAM: Efficient adaptive
MCMC, Stat. Comput., 16, 339–354,
<ext-link xlink:href="https://doi.org/10.1007/s11222-006-9438-0" ext-link-type="DOI">10.1007/s11222-006-9438-0</ext-link>, 2006.</mixed-citation></ref>
      <ref id="bib1.bibx21"><?xmltex \def\ref@label{{Haario et~al.(2015)}}?><label>Haario et al.(2015)</label><?label Haario15?><mixed-citation>Haario, H., Kalachev, L., and Hakkarainen, J.: Generalized correlation integral
vectors: A distance concept for chaotic dynamical systems, Chaos, 25, 063102,
<ext-link xlink:href="https://doi.org/10.1063/1.4921939" ext-link-type="DOI">10.1063/1.4921939</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx22"><?xmltex \def\ref@label{{Hakkarainen et~al.(2012)}}?><label>Hakkarainen et al.(2012)</label><?label npg-19-127-2012?><mixed-citation>Hakkarainen, J., Ilin, A., Solonen, A., Laine, M., Haario, H., Tamminen, J., Oja, E., and Järvinen, H.: On closure parameter estimation in chaotic systems, Nonlin. Processes Geophys., 19, 127–143, <ext-link xlink:href="https://doi.org/10.5194/npg-19-127-2012" ext-link-type="DOI">10.5194/npg-19-127-2012</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx23"><?xmltex \def\ref@label{{Hakkarainen et~al.(2013)}}?><label>Hakkarainen et al.(2013)</label><?label Dilemma?><mixed-citation>Hakkarainen, J., Solonen, A., Ilin, A., Susiluoto, J., Laine, M., Haario, H.,
and Järvinen, H.: A dilemma of the uniqueness of weather and climate model
closure parameters, Tellus A, 65,
20147, <ext-link xlink:href="https://doi.org/10.3402/tellusa.v65i0.20147" ext-link-type="DOI">10.3402/tellusa.v65i0.20147</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx24"><?xmltex \def\ref@label{{Houtekamer and Zhang(2016)}}?><label>Houtekamer and Zhang(2016)</label><?label Houtekamer16?><mixed-citation>Houtekamer, P. L. and Zhang, F.: Review of the Ensemble Kalman Filter for
Atmospheric Data Assimilation, Mon. Weather Rev., 144, 4489–4532,
<ext-link xlink:href="https://doi.org/10.1175/MWR-D-15-0440.1" ext-link-type="DOI">10.1175/MWR-D-15-0440.1</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx25"><?xmltex \def\ref@label{{Huttunen et~al.(2018)}}?><label>Huttunen et al.(2018)</label><?label Huttunen18?><mixed-citation>Huttunen, J., Kaipio, J., and Haario, H.: Approximation error approach in
spatiotemporally chaotic models with application to Kuramoto–Sivashinsky
equation, Comput. Stat. Data Anal., 123, 13–31,
<ext-link xlink:href="https://doi.org/10.1016/j.csda.2018.01.015" ext-link-type="DOI">10.1016/j.csda.2018.01.015</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx26"><?xmltex \def\ref@label{{J\"{a}rvinen et~al.(2010)}}?><label>Järvinen et al.(2010)</label><?label ACP10?><mixed-citation>Järvinen, H., Räisänen, P., Laine, M., Tamminen, J., Ilin, A., Oja, E., Solonen, A., and Haario, H.: Estimation of ECHAM5 climate model closure parameters with adaptive MCMC, Atmos. Chem. Phys., 10, 9993–10002, <ext-link xlink:href="https://doi.org/10.5194/acp-10-9993-2010" ext-link-type="DOI">10.5194/acp-10-9993-2010</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx27"><?xmltex \def\ref@label{{Jarvinen et~al.(2011)}}?><label>Jarvinen et al.(2011)</label><?label EPPES1?><mixed-citation>Järvinen, H., Laine, M., Solonen, A., and Haario, H.: Ensemble prediction and
parameter estimation system: the concept, Q. J. Roy.
Meteor. Soc., 138, 281–288, <ext-link xlink:href="https://doi.org/10.1002/qj.923" ext-link-type="DOI">10.1002/qj.923</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx28"><?xmltex \def\ref@label{{Kennedy and O'Hagan(2001)}}?><label>Kennedy and O'Hagan(2001)</label><?label Kennedy01?><mixed-citation>Kennedy, M. C. and O'Hagan, A.: Bayesian calibration of computer models,
J. R. Stat. Soc. B,
63, 425–464, <ext-link xlink:href="https://doi.org/10.1111/1467-9868.00294" ext-link-type="DOI">10.1111/1467-9868.00294</ext-link>, 2001.</mixed-citation></ref>
      <?pagebreak page4333?><ref id="bib1.bibx29"><?xmltex \def\ref@label{{Kohler(2002)}}?><label>Kohler(2002)</label><?label Kohler2002?><mixed-citation>
Kohler, M.: Universal consistency of local polynomial kernel regression
estimates, Ann. I. Stat. Math., 54, 879–899,
2002.</mixed-citation></ref>
      <ref id="bib1.bibx30"><?xmltex \def\ref@label{Kuramoto(1978)}?><label>Kuramoto(1978)</label><?label Kuramoto78?><mixed-citation>Kuramoto, Y.: Diffusion-Induced Chaos in Reaction Systems, Progress of Theoretical Physics Supplement, 64, 346–367, <ext-link xlink:href="https://doi.org/10.1143/PTPS.64.346" ext-link-type="DOI">10.1143/PTPS.64.346</ext-link>, 1978.</mixed-citation></ref>
      <ref id="bib1.bibx31"><?xmltex \def\ref@label{{Kuramoto and Yamada(1976)}}?><label>Kuramoto and Yamada(1976)</label><?label Kuramoto76?><mixed-citation>Kuramoto, Y. and Yamada, T.: Turbulent State in Chemical Reactions, Prog.
Theor. Phys., 56, 679–681, <ext-link xlink:href="https://doi.org/10.1143/PTP.56.679" ext-link-type="DOI">10.1143/PTP.56.679</ext-link>, 1976.</mixed-citation></ref>
      <ref id="bib1.bibx32"><?xmltex \def\ref@label{{Laine et~al.(2011)}}?><label>Laine et al.(2011)</label><?label EPPES2?><mixed-citation>Laine, M., Solonen, A., Haario, H., and Järvinen, H.: Ensemble prediction and
parameter estimation system: the method, Q. J. Roy.
Meteor. Soc., 138, 289–297, <ext-link xlink:href="https://doi.org/10.1002/qj.922" ext-link-type="DOI">10.1002/qj.922</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx33"><?xmltex \def\ref@label{{Law et~al.(2015)}}?><label>Law et al.(2015)</label><?label law2015?><mixed-citation>
Law, K., Stuart, A., and Zygalakis, K.: Data Assimilation, Springer
International Publishing, Cham, Switzerland, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx34"><?xmltex \def\ref@label{{Liu and West(2001)}}?><label>Liu and West(2001)</label><?label Liu2001?><mixed-citation>Liu, J. and West, M.: Combined Parameter and State Estimation in
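Simulation-Based Filtering, in: Sequential Monte Carlo Methods in Practice, edited by: Doucet, A., de Freitas, N., and Gordon, N., Springer New York, New York, NY, 197–223,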
<ext-link xlink:href="https://doi.org/10.1007/978-1-4757-3437-9_10" ext-link-type="DOI">10.1007/978-1-4757-3437-9_10</ext-link>, 2001.</mixed-citation></ref>
      <ref id="bib1.bibx35"><?xmltex \def\ref@label{{Lorenz(1963)}}?><label>Lorenz(1963)</label><?label Lorenz1963?><mixed-citation>Lorenz, E. N.: Deterministic Nonperiodic Flow, J. Atmos.
Sci., 20, 130–141, <ext-link xlink:href="https://doi.org/10.1175/1520-0469(1963)020&lt;0130:DNF&gt;2.0.CO;2" ext-link-type="DOI">10.1175/1520-0469(1963)020&lt;0130:DNF&gt;2.0.CO;2</ext-link>,
1963.</mixed-citation></ref>
      <ref id="bib1.bibx36"><?xmltex \def\ref@label{{Luengo et~al.(2020)}}?><label>Luengo et al.(2020)</label><?label Luengo_etal?><mixed-citation>Luengo, D., Martino, L., Elvira, V., and Särkkä, S.: A survey of Monte Carlo
methods for parameter estimation, EURASIP J. Adv. Sig.
Pr., 2020, 25, <ext-link xlink:href="https://doi.org/10.1186/s13634-020-00675-6" ext-link-type="DOI">10.1186/s13634-020-00675-6</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx37"><?xmltex \def\ref@label{{Morzfeld et~al.(2018)}}?><label>Morzfeld et al.(2018)</label><?label Morzfeld?><mixed-citation>Morzfeld, M., Adams, J., Lunderman, S., and Orozco, R.: Feature-based data assimilation in geophysics, Nonlin. Processes Geophys., 25, 355–374, <ext-link xlink:href="https://doi.org/10.5194/npg-25-355-2018" ext-link-type="DOI">10.5194/npg-25-355-2018</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx38"><?xmltex \def\ref@label{{Neumeyer(2004)}}?><label>Neumeyer(2004)</label><?label Neumeyer04?><mixed-citation>Neumeyer, N.: A central limit theorem for two-sample U-processes, Stat.
Probabil. Lett., 67, 73–85,
<ext-link xlink:href="https://doi.org/10.1016/j.spl.2002.12.001" ext-link-type="DOI">10.1016/j.spl.2002.12.001</ext-link>, 2004.</mixed-citation></ref>
      <ref id="bib1.bibx39"><?xmltex \def\ref@label{{Ollinaho et~al.(2012)}}?><label>Ollinaho et al.(2012)</label><?label EPPES3?><mixed-citation>Ollinaho, P., Laine, M., Solonen, A., Haario, H., and Järvinen, H.: NWP model
forecast skill optimization via closure parameter variations, Q.
J. Roy. Meteor. Soc., 139, 1520–1532,
<ext-link xlink:href="https://doi.org/10.1002/qj.2044" ext-link-type="DOI">10.1002/qj.2044</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx40"><?xmltex \def\ref@label{{Ollinaho et~al.(2013)}}?><label>Ollinaho et al.(2013)</label><?label EPPES4?><mixed-citation>Ollinaho, P., Bechtold, P., Leutbecher, M., Laine, M., Solonen, A., Haario, H., and Järvinen, H.: Parameter variations in prediction skill optimization at ECMWF, Nonlin. Processes Geophys., 20, 1001–1010, <ext-link xlink:href="https://doi.org/10.5194/npg-20-1001-2013" ext-link-type="DOI">10.5194/npg-20-1001-2013</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx41"><?xmltex \def\ref@label{{Ollinaho et~al.(2014)}}?><label>Ollinaho et al.(2014)</label><?label Ollinaho14?><mixed-citation>Ollinaho, P., Järvinen, H., Bauer, P., Laine, M., Bechtold, P., Susiluoto, J., and Haario, H.: Optimization of NWP model closure parameters using total energy norm of forecast error as a target, Geosci. Model Dev., 7, 1889–1900, <ext-link xlink:href="https://doi.org/10.5194/gmd-7-1889-2014" ext-link-type="DOI">10.5194/gmd-7-1889-2014</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx42"><?xmltex \def\ref@label{{Pedlosky(1987)}}?><label>Pedlosky(1987)</label><?label Pedlosky87?><mixed-citation>Pedlosky, J.: Geophysical Fluid Dynamics, Springer-Verlag, New York, 22–57,
1987.</mixed-citation></ref><?xmltex \hack{\newpage}?>
      <ref id="bib1.bibx43"><?xmltex \def\ref@label{{Price et~al.(2018)}}?><label>Price et al.(2018)</label><?label Price?><mixed-citation>Price, L. F., Drovandi, C. C., Lee, A., and Nott, D. J.: Bayesian Synthetic
Likelihood, J. Comput. Graph. Stat., 27, 1–11,
<ext-link xlink:href="https://doi.org/10.1080/10618600.2017.1302882" ext-link-type="DOI">10.1080/10618600.2017.1302882</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx44"><?xmltex \def\ref@label{{Robert and Casella(2004)}}?><label>Robert and Casella(2004)</label><?label RobCas040?><mixed-citation>
Robert, C. and Casella, G.: Monte Carlo Statistical Methods, Springer-Verlag, New York, 2004.</mixed-citation></ref>
      <ref id="bib1.bibx45"><?xmltex \def\ref@label{{Roeckner et~al.(2003)}}?><label>Roeckner et al.(2003)</label><?label Roeckner03?><mixed-citation>
Roeckner, E., Bäuml, G., Bonaventura, L., Brokopf, R., Esch, M., Giorgetta, M., Hagemann, S., Kirchner, I., Kornblueh, L., Manzini, E., Rhodin, A., Schlese, U., Schulzweida, U., and Tompkins, A.: The atmospheric general circulation model ECHAM 5. PART I: Model
description, Report, MPI für Meteorologie, Hamburg, 349, 2003.</mixed-citation></ref>
      <ref id="bib1.bibx46"><?xmltex \def\ref@label{{Rougier(2013)}}?><label>Rougier(2013)</label><?label Rougier20120297?><mixed-citation>Rougier, J.: “Intractable and unsolved”: some
thoughts on statistical data assimilation with uncertain static parameters,
Philos. T. R. Soc. S.-A, 371, 20120297, <ext-link xlink:href="https://doi.org/10.1098/rsta.2012.0297" ext-link-type="DOI">10.1098/rsta.2012.0297</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx47"><?xmltex \def\ref@label{{Sacks et~al.(1989)}}?><label>Sacks et al.(1989)</label><?label Sacks89?><mixed-citation>
Sacks, J., Welch, W. J., Mitchell, T. J., and Wynn, H. P.: Design and Analysis
of Computer Experiments, Stat. Sci., 4, 409–423, 1989.</mixed-citation></ref>
      <ref id="bib1.bibx48"><?xmltex \def\ref@label{{Sivashinsky(1977)}}?><label>Sivashinsky(1977)</label><?label SIVASHINSKY77?><mixed-citation>Sivashinsky, G.: Nonlinear analysis of hydrodynamic instability in laminar
flames – I. Derivation of basic equations, Acta Astronaut., 4, 1177–1206, <ext-link xlink:href="https://doi.org/10.1016/0094-5765(77)90096-0" ext-link-type="DOI">10.1016/0094-5765(77)90096-0</ext-link>, 1977.</mixed-citation></ref>
      <ref id="bib1.bibx49"><?xmltex \def\ref@label{{Solonen et~al.(2012)}}?><label>Solonen et al.(2012)</label><?label solonen2012?><mixed-citation>Solonen, A., Ollinaho, P., Laine, M., Haario, H., Tamminen, J., and Järvinen,
H.: Efficient MCMC for Climate Model Parameter Estimation: Parallel Adaptive
Chains and Early Rejection, Bayesian Anal., 7, 715–736,
<ext-link xlink:href="https://doi.org/10.1214/12-BA724" ext-link-type="DOI">10.1214/12-BA724</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx50"><?xmltex \def\ref@label{{Springer et~al.(2019)}}?><label>Springer et al.(2019)</label><?label Springer19?><mixed-citation>Springer, S., Haario, H., Shemyakin, V., Kalachev, L., and Shchepakin, D.:
Robust parameter estimation of chaotic systems, Inv. Prob. Imag., 13, 1189–1212, <ext-link xlink:href="https://doi.org/10.3934/ipi.2019053" ext-link-type="DOI">10.3934/ipi.2019053</ext-link>,
2019.</mixed-citation></ref>
      <ref id="bib1.bibx51"><?xmltex \def\ref@label{{Stevens et~al.(2013)}}?><label>Stevens et al.(2013)</label><?label Stevens13?><mixed-citation>Stevens, B., Giorgetta, M., Esch, M., Mauritsen, T., Crueger, T., Rast, S.,
Salzmann, M., Schmidt, H., Bader, J., Block, K., Brokopf, R., Fast, I.,
Kinne, S., Kornblueh, L., Lohmann, U., Pincus, R., Reichler, T., and
Roeckner, E.: Atmospheric component of the MPI-M Earth System Model:
ECHAM6, J. Adv. Model. Earth Sy., 5, 146–172,
<ext-link xlink:href="https://doi.org/10.1002/jame.20015" ext-link-type="DOI">10.1002/jame.20015</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx52"><?xmltex \def\ref@label{{Tarantola(2005)}}?><label>Tarantola(2005)</label><?label tarantola2005?><mixed-citation>
Tarantola, A.: Inverse Problem Theory and Methods for Model Parameter
Estimation, Society for Industrial and Applied Mathematics, Philadelphia, 2005.</mixed-citation></ref>
      <ref id="bib1.bibx53"><?xmltex \def\ref@label{{Wood(2010)}}?><label>Wood(2010)</label><?label Wood_nature?><mixed-citation>
Wood, S.: Statistical inference for noisy nonlinear ecological dynamic systems,
Nature, 466, 1102–1104, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx54"><?xmltex \def\ref@label{{Yiorgos~Smyrlis(1996)}}?><label>Yiorgos Smyrlis(1996)</label><?label smyrlis?><mixed-citation>
Smyrlis, Y. S. and Papageorgiou, D. T.: Computational study of chaotic and ordered solutions of
the Kuramoto–Sivashinsky equation, NASA Contractor Report 198283, 96-12, NASA reports, Hampton, Virginia,
1996.</mixed-citation></ref>

  </ref-list></back>
</article>
