<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" xml:lang="en" dtd-version="3.0" article-type="research-article">
  <front>
    <journal-meta><journal-id journal-id-type="publisher">GMD</journal-id><journal-title-group>
    <journal-title>Geoscientific Model Development</journal-title>
    <abbrev-journal-title abbrev-type="publisher">GMD</abbrev-journal-title><abbrev-journal-title abbrev-type="nlm-ta">Geosci. Model Dev.</abbrev-journal-title>
  </journal-title-group><issn pub-type="epub">1991-9603</issn><publisher>
    <publisher-name>Copernicus Publications</publisher-name>
    <publisher-loc>Göttingen, Germany</publisher-loc>
  </publisher></journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5194/gmd-14-7659-2021</article-id><title-group><article-title>Model calibration using ESEm v1.1.0 – an open, scalable<?xmltex \hack{\break}?> Earth system
emulator</article-title><alt-title>Model calibration using ESEm v1.1.0</alt-title>
      </title-group><?xmltex \runningtitle{Model calibration using ESEm v1.1.0}?><?xmltex \runningauthor{D. Watson-Parris et al.}?>
      <contrib-group>
        <contrib contrib-type="author" corresp="yes" rid="aff1">
          <name><surname>Watson-Parris</surname><given-names>Duncan</given-names></name>
          <email>duncan.watson-parris@physics.ox.uk</email>
        <ext-link>https://orcid.org/0000-0002-5312-4950</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Williams</surname><given-names>Andrew</given-names></name>
          
        <ext-link>https://orcid.org/0000-0002-5457-5075</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff2">
          <name><surname>Deaconu</surname><given-names>Lucia</given-names></name>
          
        <ext-link>https://orcid.org/0000-0003-2143-1523</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Stier</surname><given-names>Philip</given-names></name>
          
        <ext-link>https://orcid.org/0000-0002-1191-0128</ext-link></contrib>
        <aff id="aff1"><label>1</label><institution>Atmospheric, Oceanic and Planetary Physics, Department of Physics,
University of Oxford, Oxford, UK</institution>
        </aff>
        <aff id="aff2"><label>2</label><institution>Faculty of Environmental Science and Engineering, University of Babeș-Bolyai, Cluj-Napoca, Romania</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Duncan Watson-Parris (duncan.watson-parris@physics.ox.uk)</corresp></author-notes><pub-date><day>20</day><month>December</month><year>2021</year></pub-date>
      
      <volume>14</volume>
      <issue>12</issue>
      <fpage>7659</fpage><lpage>7672</lpage>
      <history>
        <date date-type="received"><day>2</day><month>August</month><year>2021</year></date>
           <date date-type="rev-request"><day>27</day><month>August</month><year>2021</year></date>
           <date date-type="rev-recd"><day>14</day><month>November</month><year>2021</year></date>
           <date date-type="accepted"><day>17</day><month>November</month><year>2021</year></date>
      </history>
      <permissions>
        <copyright-statement>Copyright: © 2021 Duncan Watson-Parris et al.</copyright-statement>
        <copyright-year>2021</copyright-year>
      <license license-type="open-access"><license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p></license></permissions><self-uri xlink:href="https://gmd.copernicus.org/articles/14/7659/2021/gmd-14-7659-2021.html">This article is available from https://gmd.copernicus.org/articles/14/7659/2021/gmd-14-7659-2021.html</self-uri><self-uri xlink:href="https://gmd.copernicus.org/articles/14/7659/2021/gmd-14-7659-2021.pdf">The full text article is available as a PDF file from https://gmd.copernicus.org/articles/14/7659/2021/gmd-14-7659-2021.pdf</self-uri>
      <abstract><title>Abstract</title>

      <p id="d1e116">Large computer models are ubiquitous in the Earth
sciences. These models often have tens or hundreds of tuneable parameters
and can take thousands of core hours to run to completion while generating
terabytes of output. It is becoming common practice to develop emulators as
fast approximations, or surrogates, of these models in order to explore the
relationships between these inputs and outputs, understand uncertainties, and
generate large ensemble datasets. While the purpose of these surrogates may
differ, their development is often very similar. Here we introduce ESEm: an
open-source tool providing a general workflow for emulating and validating a
wide variety of models and outputs. It includes efficient routines for
sampling these emulators for the purpose of uncertainty quantification and
model calibration. It is built on well-established, high-performance
libraries to ensure robustness, extensibility and scalability. We
demonstrate the flexibility of ESEm through three case studies using ESEm to
reduce parametric uncertainty in a general circulation model and explore
precipitation sensitivity in a cloud-resolving model and scenario
uncertainty in the CMIP6 multi-model ensemble.</p>
  </abstract>
    </article-meta>
  </front>
<body>
      

<sec id="Ch1.S1" sec-type="intro">
  <label>1</label><title>Introduction</title>
      <p id="d1e128">Computer models are crucial tools for their diagnostic and predictive power
and are applied to every aspect of the Earth sciences. These models have
tended to increase in complexity to match the increasing availability of
computational resources and are now routinely run on large supercomputers
producing terabytes of output at a time. While this added complexity can
bring new insights and improved accuracy, sometimes it can be useful to run
fast approximations of these models, often referred to as surrogates (Sacks
et al., 1989). These surrogates have been used for many years to allow
efficient exploration of the sensitivity of model output to its inputs (L. A. Lee
et al., 2011; Ryan et al., 2018), generation of large ensembles of model
realizations (Holden et al., 2014, 2019; Williamson et al., 2013), and
calibration of models (Holden et al., 2015a; Cleary et al., 2021; Couvreux et
al., 2021). Although relatively common, these workflows invariably use
custom emulators and bespoke analysis routines, limiting their
reproducibility and use by non-statisticians.</p>
      <p id="d1e131">Here we introduce ESEm, a general tool for emulating Earth system models
and a framework for using these emulators, with a focus on model
calibration, broadly defined as finding model parameters that produce model
outputs compatible with available observations. Unless otherwise stated,
model parameters in this context refer to constant, scalar model inputs
rather than, e.g. boundary conditions. This tool builds on the development
of emulators for uncertainty quantification and constraint in the aerosol
component of general circulation models (Regayre et al., 2018; L. A. Lee et al.,
2011; Johnson et al., 2020; Watson-Parris et al., 2020) but is applicable
much more broadly, as we will show.</p>
      <p id="d1e134">Figure 1 shows a schematic of a typical model
calibration workflow that ESEm enables, assuming a “one shot” design
for simplicity. Once the gridded model data have been generated they must be
co-located (resampled) onto the same temporal and spatial locations as the
observational data that will be used for calibration in order to minimize
sampling uncertainties (Schutgens et al., 2016a, b). The Community
Intercomparison Suite (CIS; Watson-Parris et al., 2016) is an open-source
Python library that makes this kind of<?pagebreak page7660?> operation very simple. The output is
an Iris (Met Office, 2020b) cube-like object, a representation of a Climate and
Forecast (CF)-compliant NetCDF file, which includes all of the necessary
coordinates and metadata to ensure traceability and allow easy combination
with other tools. ESEm uses the same representations throughout to allow
easy input and output of the emulated datasets, plotting and validation and
also allows chaining operations with other related tools such as Cartopy
(Met Office, 2020a) and Xarray (Hoyer and Hamman, 2016). Once the data have been
read and co-located, they are split into training and validation (and
optionally test) sets before performing emulation over the training data
using the ESEm interface. This emulator can then be validated and used for
inference and calibration.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F1" specific-use="star"><?xmltex \currentcnt{1}?><?xmltex \def\figurename{Figure}?><label>Figure 1</label><caption><p id="d1e140">A schematic of a typical workflow using CIS and ESEm to perform
model emulation and calibration. Note that only the locations of the observed
data are used for resampling the model data.</p></caption>
        <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/7659/2021/gmd-14-7659-2021-f01.png"/>

      </fig>

      <p id="d1e149">Emulation is essentially a multi-dimensional regression problem and ESEm
provides three main options for performing these fits – Gaussian processes
(GPs), convolutional neural networks (CNNs) and random forests (RFs). GPs are
based on Kriging, a technique for estimating the location of gold in South Africa from
sparse mining information that was later formalized by Matheron (1963). They have become a popular tool for non-parametric interpolation and
an important tool within the field of supervised machine learning. Kennedy
and O'Hagan (2001) first described the use of GPs for the calibration of
computer models, which forms the basis of current approaches. GPs are
particularly well suited to this task since they provide robust estimates
and uncertainties to non-linear responses, even in cases with limited
training data. Despite initial difficulties with their scalability as
compared to, e.g. neural networks, recent advances have allowed for deeper,
more expressive (Damianou and Lawrence, 2013) GPs that can be trained on
ever larger volumes of training data (Burt et al., 2019). Despite their
prevalent use in other areas of machine learning, CNNs and RFs have not been
widely used in model emulation. Here we include both as examples of
alternative approaches to demonstrate the flexible emulation interface and to motivate broader usage of the tool. For example, Sect. 5.1
shows the use of an RF emulator for exploring precipitation susceptibility in
a cloud-resolving model.</p>
      <p id="d1e152">One common use of an emulator is to perform model calibration. By
definition, any computer model has a number of inputs and outputs. The model
inputs can be high-dimensional boundary conditions or simple scalar
parameters, and while large uncertainties can exist in the boundary
conditions, our focus here is on the latter. These input parameters can often
be uncertain, either due to a lack of physical analogue or lack of
available data. Assuming that suitable observations of the model output are
available, one may ask which values of the input parameters give the best
output as measured against the observations. This model “tuning” is often
done by hand, leading to ambiguity and potentially sub-optimal configurations
(Mauritsen et al., 2012). The difficulty in this task arises because, while
the computer model is designed to calculate the output based on the inputs,
the inverse process is normally not possible directly. In some cases, this
inverse can be estimated and the process of generating an inverse of the
model, known as inverse modelling, has a long history in hydrological
modelling (e.g. Hou and Rubin, 2005). The inverse of individual atmospheric
model components can be determined using adjoint methods (Partridge et al.,
2011; Karydis et al., 2012; Henze et al., 2007), but these require bespoke
development and are not amenable to large multi-component models. Simple
approaches can be used to determine chemical and aerosol emissions based on
atmospheric composition, but these implicitly assume that the relationship
between emissions and atmospheric concentration is reasonably well predicted
by the model (C. Lee et al., 2011). More generally, attempting to infer the
best model inputs to match a given output is variously referred to as
“calibration”, “optimal parameter estimation” and “constraining”. In many
cases finding these optimum parameters requires many evaluations of the
model, which may not be feasible for large or complex models, and thus
emulators are used as a surrogate. ESEm provides a number of options for
performing this inference, from simple rejection sampling to more complex
Markov chain Monte Carlo (MCMC) techniques.</p>
      <p id="d1e155">Despite the increasing popularity of emulators, no general-purpose toolset exists for
model emulation in the Earth sciences. Each project must create and validate
its own emulators, with all of the associated data handling and
visualization code that necessarily accompanies them. Further, this code
remains closed source, discouraging replication and extension of the
published work. In this paper we aim not only to describe the ESEm tool but
also to elucidate the general process of emulation with a number of distinct
examples, including model calibration, in the hope of demonstrating its
usefulness to the field. A description of the pedagogical example used to
provide context for the framework description is provided in Sect. 2. The
emulation workflow and the three emulation engines included with ESEm are described in
Sect. 3. We then discuss the sampling of these emulators for inference in
Sect. 4, before providing two more specific example uses in Sect. 5 and
some concluding remarks in Sect. 6.</p>
</sec>
<sec id="Ch1.S2">
  <label>2</label><title>Exemplar problem</title>
      <p id="d1e166">While we endeavour to describe the technical implementation of ESEm in
general terms, we will refer back to a specific example use case throughout
in order to aid clarity. This example case concerns the estimation of
absorption aerosol optical depth (AAOD) due to anthropogenic black carbon
(BC), which is highly uncertain due to limited observations and estimates of
both pre-industrial and present-day biomass burning emissions, and large
uncertainties in key microphysical processes and parameters in climate
models (Bellouin et al., 2020).</p>
      <p id="d1e169">Briefly, the model considered here is ECHAM6.3-HAM2.3 (Tegen et al., 2018;
Neubauer et al., 2019), which<?pagebreak page7661?> calculates the distribution and evolution of
both internally and externally mixed aerosol species in the atmosphere and
their effect on both radiation and cloud processes. We generate an ensemble
of 39 model simulations for the year of 2017 over three uncertain input
parameters: (1) a scaling of the emissions flux of BC by between 0.5 and 2
times the baseline emissions, (2) a scaling on the removal rate of BC
through wet deposition (the main removal mechanism of BC) by between 0.33 and
3 times the baseline values, and (3) a scaling of the imaginary refractive
index of BC (which determines its absorptivity) between 0.2 and 0.8. The
parameter sets are created using maximin Latin hypercube sampling (Morris
and Mitchell, 1995) where the scaling parameters (1 and 2) are sampled from
log-uniform distributions, while the imaginary part of the refractive index
is sampled from a normal distribution centred around 0.7. These parameter
ranges were determined by expert elicitation and designed to cover the
broadest plausible range of uncertainty. Unless otherwise stated, five of the
simulations are retained for testing while the rest are used for training
the emulators (see Sect. 3.1 for more details). The model AAODs are
emulated at their native resolution of approximately 1.8<inline-formula><mml:math id="M1" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula> longitude at the Equator (<inline-formula><mml:math id="M2" display="inline"><mml:mrow><mml:mn mathvariant="normal">192</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">96</mml:mn></mml:mrow></mml:math></inline-formula> grid cells).</p>
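      <p>As a minimal sketch of how such a design could be generated (this is not the code used to produce the ensemble), the following draws a plain Latin hypercube with NumPy and maps the first two columns onto log-uniform ranges. The function names <monospace>latin_hypercube</monospace> and <monospace>log_uniform</monospace> are illustrative, the maximin optimization step is omitted, and the refractive index is drawn uniformly because the width of the normal distribution is not stated here.</p>
<preformat>
```python
import numpy as np


def latin_hypercube(n_samples, n_dims, rng):
    """Return an (n_samples, n_dims) Latin hypercube sample on the unit cube.

    Each column places exactly one point in each of n_samples equal-width
    strata. This is a plain (not maximin-optimised) design.
    """
    # One uniform draw inside every stratum, for every dimension
    strata = np.arange(n_samples)[:, None]
    points = (strata + rng.random((n_samples, n_dims))) / n_samples
    # Shuffle each column independently to decouple the dimensions
    for j in range(n_dims):
        points[:, j] = rng.permutation(points[:, j])
    return points


def log_uniform(u, low, high):
    """Map unit-interval samples onto a log-uniform distribution on [low, high]."""
    return np.exp(np.log(low) + u * (np.log(high) - np.log(low)))


rng = np.random.default_rng(0)
unit = latin_hypercube(39, 3, rng)

# Columns: BC emissions scaling, wet deposition scaling, imaginary refractive index
design = np.column_stack([
    log_uniform(unit[:, 0], 0.5, 2.0),
    log_uniform(unit[:, 1], 0.33, 3.0),
    0.2 + 0.6 * unit[:, 2],  # uniform stand-in; the text describes a normal distribution
])
```
</preformat>
      <p>Each row of <monospace>design</monospace> would then define the parameter values for one ensemble member.</p>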
      <p id="d1e193">For simplicity, in this paper we then compare the monthly mean model-simulated AAOD with observations of the same
quantity in order to better constrain the global radiative effect of these
perturbations. A full analysis including in situ compositional and
large-scale satellite observations, as well as an estimation of the effect
of the constrained parameter space on estimates of effective radiative
forcing, will be presented elsewhere.</p>
      <p id="d1e196">Here we step through each of the emulation and inference procedures used to
determine a reduced uncertainty in climate model parameters, and hence AAOD,
by maximally utilizing the available observations.</p>
</sec>
<sec id="Ch1.S3">
  <label>3</label><title>Emulation engines</title>
      <p id="d1e207">Given the huge variety of geophysical models and their applications and the
broad (and rapidly expanding) variety of statistical models available to
emulate them, ESEm uses an object-oriented (OO) approach to provide a
generic emulation interface. This interface is designed in such a way as to
encourage additional model engines, either in the core package through
pull requests or more informally as a community resource. The inputs
include an Iris cube or xarray DataArray with the leading dimension representing the stack of
training samples and any other keyword arguments the emulator may require
for training. Using either user-specified or default options for the model
hyper-parameters and optimization techniques, the model is then easily fit
to the training data and validated against the held-back validation data.</p>
      <p id="d1e210">In this section we describe the inputs expected by the emulator and the
three emulation engines provided by default in ESEm.</p>
<sec id="Ch1.S3.SS1">
  <label>3.1</label><title>Input data preparation</title>
      <p id="d1e220">In many circumstances the observations we would like to use to compare and
calibrate our model against are provided on a very different spatial and
temporal sampling than the model itself. A model usually uses a
discretized representation of space-time, whereas observations are typically
point-like measurements or retrievals. Naively comparing point observations
with gridded model output can lead to large sampling biases (Schutgens et
al., 2017). By collocating the models and observations (e.g. by using CIS), we can minimize this error. An <monospace>ensemble_collocate</monospace>
utility is provided in ESEm to use CIS to efficiently collocate multiple
ensemble members onto the same observations. Other sources of
observation–model error may still be present, and accounting for these
will be discussed in Sect. 4.</p>
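      <p>To illustrate the idea of resampling gridded output onto observation locations, the sketch below uses a simple nearest-neighbour lookup in NumPy. It is a stand-in for the concept only and does not reproduce what CIS or <monospace>ensemble_collocate</monospace> do internally; the grid, the random ensemble and the three observation locations are all hypothetical.</p>
<preformat>
```python
import numpy as np

rng = np.random.default_rng(0)

# Regular model grid (cell centres) with 192 x 96 cells, as in Sect. 2
lon = np.arange(192) * 360.0 / 192
lat = np.linspace(-89.0, 89.0, 96)  # evenly spaced stand-in for the model latitudes
ensemble = rng.random((39, 96, 192))  # stand-in for (member, lat, lon) model output

# Hypothetical point observations given as (lat, lon) pairs in degrees
obs_lat = np.array([51.75, -33.9, 0.2])
obs_lon = np.array([359.4, 18.4, 101.3])

# Nearest grid index along each axis; longitude distance wraps around 360 degrees
lat_idx = np.abs(lat[None, :] - obs_lat[:, None]).argmin(axis=1)
lon_diff = np.abs(lon[None, :] - obs_lon[:, None])
lon_idx = np.minimum(lon_diff, 360.0 - lon_diff).argmin(axis=1)

# Sample every ensemble member at the observation locations: shape (member, obs)
collocated = ensemble[:, lat_idx, lon_idx]
```
</preformat>
      <p>The result has one row per ensemble member and one column per observation, which is the layout assumed for the training data in the rest of this section.</p>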
      <?pagebreak page7662?><p id="d1e226">In Earth sciences these (resampled) model values are typically very large
datasets with many millions of values. With sufficient computing power these
can be emulated directly; however, there is often a lot of redundancy in the
data due to, e.g. strong spatial and temporal correlations, and this
brute-force approach is wasteful and can make calibration difficult (as
discussed in Sect. 4). The use of summary statistics to reduce this volume
while retaining most of the information content is a mature field (Prangle,
2018) and is already widely used (albeit informally) in many cases. The
summary statistic can be as simple as a global weighted average, or it could
be an empirical orthogonal function (EOF)-based approach (Ryan et al.,
2018). Although some techniques for automatically finding such statistics
are becoming available (Fearnhead and Prangle, 2012), this usually requires
knowledge of the underlying data, and we leave this step for the user to
perform using the standard tools available (e.g. Dawson, 2016) as required.</p>
      <p id="d1e229">Once the data have been resampled and summarized they should be split into
training, validation and test sets. The training data are used to fit the
models, while the validation portion of the data are used to measure their
accuracy while exploring and tuning hyper-parameters (including model
architectures). The test data are held back for final testing of the model.
Typically, a <inline-formula><mml:math id="M3" display="inline"><mml:mrow><mml:mn mathvariant="normal">70</mml:mn><mml:mo>:</mml:mo><mml:mn mathvariant="normal">20</mml:mn><mml:mo>:</mml:mo><mml:mn mathvariant="normal">10</mml:mn></mml:mrow></mml:math></inline-formula> split is used. Excellent tools exist for preparing
these splits, including for more advanced <inline-formula><mml:math id="M4" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-fold cross-validation, and we
include interfaces for such implementations in scikit-learn (Pedregosa et
al., 2011) and routines for generating simple qualitative validation
plots. Both scikit-learn and Keras (Chollet, 2015) include routines for
automating the process of hyper-parameter optimization, with more advanced
Bayesian optimization approaches available with the GPFlowOpt (Knudde et
al., 2017) package. These all share many of the same dependencies as ESEm,
making installation very simple.</p>
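      <p>A split over ensemble members can be sketched in a few lines of NumPy. The helper <monospace>split_members</monospace> and the random stand-in data below are hypothetical and are not part of ESEm; the fractions follow the 70:20:10 convention mentioned above rather than the split used for the exemplar problem.</p>
<preformat>
```python
import numpy as np


def split_members(n_members, val_fraction, test_fraction, rng):
    """Return disjoint (train, validation, test) index arrays over ensemble members."""
    order = rng.permutation(n_members)
    n_val = int(round(val_fraction * n_members))
    n_test = int(round(test_fraction * n_members))
    val_idx = order[:n_val]
    test_idx = order[n_val:n_val + n_test]
    train_idx = order[n_val + n_test:]
    return train_idx, val_idx, test_idx


rng = np.random.default_rng(42)
# Stand-in data: 39 members, 3 parameters, one 96 x 192 output field per member
params = rng.random((39, 3))
outputs = rng.random((39, 96, 192))

train_idx, val_idx, test_idx = split_members(39, 0.2, 0.1, rng)
X_train, Y_train = params[train_idx], outputs[train_idx]
X_val, Y_val = params[val_idx], outputs[val_idx]
X_test, Y_test = params[test_idx], outputs[test_idx]
```
</preformat>
      <p>Splitting along the leading (member) dimension keeps each simulation intact, so that no grid cells from a validation or test run leak into the training set.</p>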
      <p id="d1e255">The input parameter space can also be reduced to enable more interpretable
and robust emulation (also known as feature selection). ESEm provides a
utility for filtering parameters based on the Bayesian (or Akaike)
information criterion (BIC; Akaike, 1974) of the regression coefficients for
a lasso least-angle regression (LARS) model, using the scikit-learn
implementation. This provides an objective estimate of the importance of the
different input parameters and allows removing any parameters that do not
affect the output of interest. A complementary approach may be to apply
feature importance tests to trained emulators to determine their sensitivity
to particular input parameters.</p>
</sec>
<sec id="Ch1.S3.SS2">
  <label>3.2</label><title>Gaussian process engine</title>
      <p id="d1e266">Gaussian processes (GPs) are a popular choice for model emulation due to
their simple formulation and robust uncertainty estimates, particularly in
cases of relatively small amounts of training data. Many excellent texts are
available to describe their implementation and use (Rasmussen and Williams,
2005), and we only provide a short description here. Briefly, a GP is a
stochastic process (a distribution of continuous functions) and can be
thought of as an infinite-dimensional normal distribution (hence the name).
The statistical properties of the normal distributions and the tools of
Bayesian inference allow tractable estimation of the posterior distribution
of functions given a set of training data. For a given mean function, a GP
can be completely described by its second-order statistics, and thus the choice
of covariance function (or kernel) can be thought of as a prior over the
space of functions it can represent. Typical kernels include constant,
linear, radial basis function (RBF or squared exponential), and Matérn
<inline-formula><mml:math id="M5" display="inline"><mml:mrow><mml:mn mathvariant="normal">3</mml:mn><mml:mo>/</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M6" display="inline"><mml:mrow><mml:mn mathvariant="normal">5</mml:mn><mml:mo>/</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula> which are only differentiable once and twice, respectively.
Kernels can also be designed to represent any aspect of the functions of
interest, such as non-stationarity or periodicity. This choice can often be
informed by the physical setting and provides greater control and
interpretability to the resulting model compared to, e.g. neural networks.
Fitting a GP involves an optimization of the remaining hyper-parameters,
namely the kernel length scale and smoothness.</p>
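      <p>The posterior described above can be written down directly for a zero-mean GP with an RBF kernel. The following NumPy sketch is illustrative only: it uses fixed rather than optimized hyper-parameters, a toy one-parameter response, and hypothetical function names, and it is not the GPFlow-based implementation used by ESEm.</p>
<preformat>
```python
import numpy as np


def rbf_kernel(a, b, variance=1.0, length_scale=1.0):
    """Squared-exponential covariance between the rows of a and the rows of b."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dist / length_scale ** 2)


def gp_posterior(x_train, y_train, x_new, noise=1e-6, **kernel_args):
    """Posterior mean and variance of a zero-mean GP at the points x_new."""
    k_tt = rbf_kernel(x_train, x_train, **kernel_args) + noise * np.eye(len(x_train))
    k_tn = rbf_kernel(x_train, x_new, **kernel_args)
    k_nn = rbf_kernel(x_new, x_new, **kernel_args)
    # Cholesky factorisation gives a stable way to apply the inverse of k_tt
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    v = np.linalg.solve(chol, k_tn)
    mean = k_tn.T @ alpha
    variance = np.diag(k_nn) - np.sum(v ** 2, axis=0)
    return mean, variance


# Toy one-parameter "model": eight training runs of a smooth response
x_train = np.linspace(0.0, 1.0, 8)[:, None]
y_train = np.sin(2.0 * np.pi * x_train[:, 0])
x_new = np.array([[0.3], [5.0]])  # one point inside the data, one far outside

mean, variance = gp_posterior(x_train, y_train, x_new, length_scale=0.3)
```
</preformat>
      <p>Inside the range of the training data the posterior variance is small, whereas far from the data it returns to the prior variance and the mean returns to zero, which is the behaviour that makes the GP uncertainty estimates useful for calibration.</p>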
      <p id="d1e293">A number of libraries are available that provide GP fitting with varying
degrees of maturity and flexibility. By default, ESEm uses the open-source
GPFlow (Matthews et al., 2017) library for GP-based emulation. GPFlow builds
on the heritage of the GPy library (GPy, 2012) but is based on the
TensorFlow (Abadi et al., 2016) machine learning library with out-of-the-box
support for the use of graphical processing units (GPUs), which can
considerably speed up the training of GPs. It also provides support for
sparse and multi-output GPs. By default, ESEm uses a zero mean and a
combination of linear, RBF and polynomial kernels that are suitable for the
smooth and continuous parameter response expected for the examples used in
this paper and related problems. However, given the importance of the kernel
for determining the form of the functions generated by the GP, we have also
included the ability for users to specify combinations of other common
kernels and mean functions. For a clear
description of some common kernels and their combinations, as well as work
towards automated methods for choosing them, see, e.g. Duvenaud (2014). For stationary kernels, GPFlow
performs automatic relevance determination (ARD), allowing
length scales to be learnt independently for each input dimension. The user
is also able to specify which dimensions should be active for each kernel in
the case where the input dimension can be reduced (as discussed above).</p>
      <p id="d1e296">The framework provided by GPFlow also allows for multi-output GP regression,
and ESEm takes advantage of this to automatically provide regression over
each of the output features provided in the training data. Figure 2 shows
the emulated response from the ESEm-generated GP emulation of absorption
aerosol optical depth (AAOD) using a “Constant <inline-formula><mml:math id="M7" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula> Linear” kernel for one
specific set of the three parameters outlined in Sect. 2 chosen from the
test set (not shown during training). The emulator does an excellent job<?pagebreak page7663?> at
reproducing the spatial structure of the AAOD for these parameters and
exhibits errors that are more than an order of magnitude smaller than the predicted
values and significantly smaller than, e.g. typical model and observational
uncertainties.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F2" specific-use="star"><?xmltex \currentcnt{2}?><?xmltex \def\figurename{Figure}?><label>Figure 2</label><caption><p id="d1e309">Example emulation of absorption aerosol optical depth (AAOD) for a
given set of three model parameters (broadly scaling emissions of black
carbon, removal of black carbon and the absorptivity of black carbon) as
output by <bold>(a)</bold> the full ECHAM-HAM aerosol climate model, <bold>(b)</bold> a Gaussian
process emulation, <bold>(c)</bold> a random forest emulation, and <bold>(d)</bold> a convolutional
neural network emulator for parameter combinations that were not seen
during training. The differences between ECHAM-HAM and the
emulators are also shown <bold>(e–g)</bold>.</p></caption>
          <?xmltex \igopts{width=426.791339pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/7659/2021/gmd-14-7659-2021-f02.png"/>

        </fig>

</sec>
<sec id="Ch1.S3.SS3">
  <label>3.3</label><title>Neural network engine</title>
      <p id="d1e342">Through the development of automatic differentiation and batch gradient
descent it has become possible to efficiently train very large (millions of
parameters), deep (dozens of layers) neural networks (NNs) using large amounts
(terabytes) of training data. The price of this scalability is the risk of
overfitting and the lack of any information about the uncertainty of the
outputs. However, both of these shortcomings can be addressed using a
technique known as “dropout” whereby individual weights are randomly set to
zero and effectively “dropped” from the network. During training this has
the effect of forcing the network to learn redundant representations and
reducing the risk of overfitting (Srivastava et al., 2014). More recently it
was shown that applying the same technique during inference casts the NN as
approximating Bayesian inference in deep Gaussian processes and can provide
a well-calibrated uncertainty estimate on the outputs (Gal and Ghahramani,
2015). The convolutional layers within these networks also take into account
spatial correlations that cannot currently be directly modelled by GPs
(although dimension reduction in the input can have the same effect). The
main drawback of CNN-based emulators is that they typically need a much
larger amount of training data than GP-based emulators.</p>
      <p id="d1e345">While fully connected neural networks have been used for many years, even in
climate science (Knutti et al., 2006; Krasnopolsky et al., 2005), the recent
surge in popularity has been powered by the increases in expressibility
provided by deep, convolutional neural networks (CNNs) and the
regularization techniques (such as early stopping) that prevent these huge
models from over-fitting the large amounts of training data required to
train them. Many excellent introductions can be found elsewhere, but
(briefly) a neural network consists of a network of nodes connecting (through
a variety of architectures) the inputs to the target outputs via a series of
weighted activation functions. The network architecture and activation
functions are typically chosen a priori, and following this the model weights are
determined through a combination of back-propagation and (batch) gradient
descent until the outputs match (defined by a given loss function) the
provided training data. As previously discussed, the random dropping of
nodes (by setting the weights to zero), termed dropout, can provide
estimates of the prediction uncertainty of such networks. The computational
efficiency of such networks and the rich variety of architectures available
have made them the tool of choice in many machine learning settings, and
they are starting to be used in climate sciences for emulation (Dagon et
al., 2020), although the large amounts of training data required have so far
limited their use somewhat.</p>
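      <p>The use of dropout at prediction time can be sketched with a small fully connected network in NumPy. The network below is untrained (its weights are random), drops hidden units rather than individual weights, and uses a hypothetical input vector, so it only illustrates the mechanism of repeated stochastic forward passes and is not the Keras model provided with ESEm.</p>
<preformat>
```python
import numpy as np

rng = np.random.default_rng(1)

# A tiny fully connected network with fixed, untrained (random) weights
w1, b1 = rng.normal(size=(3, 64)), np.zeros(64)
w2, b2 = rng.normal(size=(64, 1)) / 8.0, np.zeros(1)


def forward(x, drop_rate, rng):
    """One stochastic forward pass with dropout applied to the hidden layer."""
    hidden = np.maximum(x @ w1 + b1, 0.0)  # ReLU activation
    keep = rng.random(hidden.shape) >= drop_rate
    # Inverted dropout: rescale kept units so the expected activation is unchanged
    hidden = hidden * keep / (1.0 - drop_rate)
    return hidden @ w2 + b2


x = np.array([[1.0, 0.5, 0.7]])  # one hypothetical set of three input parameters
samples = np.stack([forward(x, drop_rate=0.2, rng=rng) for _ in range(500)])

prediction = samples.mean(axis=0)   # Monte Carlo estimate of the output
uncertainty = samples.std(axis=0)   # spread across the stochastic passes
```
</preformat>
      <p>In a trained network the spread across passes would serve as the uncertainty estimate on each emulated output, at the cost of one forward pass per sample.</p>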
      <p id="d1e348">ESEm uses the Keras library (Chollet, 2015) with the TensorFlow back end to
provide a flexible interface for constructing and training CNN models, and a
simple, fairly shallow architecture is included as an example. This default
model takes the input parameters and passes them through an initial fully
connected layer before passing through two transposed convolutional layers
that perform an inverse convolution and act to “spread out” the parameter
information spatially. The results of this default model are shown in Fig. 2d, which shows the predicted AAOD from a specific set of three model
parameters. While the emulator clearly has some skill and produces the
large-scale structure of the AAOD, the error compared to the full ECHAM-HAM
output is larger than that of the GP emulator, at around 10 % of the absolute
values. This is primarily due to the limited training data available in this
example (34 simulations). In addition, this “simple” network still contains nearly
a million trainable parameters, and thus an even simpler network would probably
perform better given the linearity of the model response to these
parameters.</p>
</sec>
<sec id="Ch1.S3.SS4">
  <label>3.4</label><title>Random forests</title>
      <p id="d1e359">ESEm also provides the option for emulation with random forests using the
open-source implementation provided by scikit-learn. Random forest
estimators are composed of an ensemble of decision trees; each decision
tree is a recursive binary partition over the training data, and the
predictions are an average over the predictions of the decision trees
(Breiman, 2001). As a result of this architecture, random forests (along
with other algorithms built on decision trees) have three main attractions.
Firstly, they require very little pre-processing of the inputs as the binary
partitions are invariant to monotonic rescaling of the training data.
Secondly, and of particular importance for climate problems, they are unable
to extrapolate outside of their training data because the predictions are
averages over subsets of the training dataset. As a result of this, a random
forest trained on output from an idealized climate model was shown to automatically
conserve water and energy (O'Gorman and Dwyer, 2018). Finally, their
construction as a combination of binary partitions lends itself to model
responses that might be non-stationary or discontinuous.</p>
      <p id="d1e362">These features are of particular importance for problems involving the
parameterization of sub-grid processes in climate models (Beucler et al.,
2021), and as such, although parameterization is not the purpose of ESEm, we
include a simple random forest implementation and hope to build on this in the
future.</p>
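      <p>The first two attractions listed above can be illustrated with a minimal sketch. The following pure-Python regression tree on a single input is not the scikit-learn implementation that ESEm wraps; the function names (fit_tree, predict) and the toy data are hypothetical. It shows that the fitted partition of the training data is unchanged by a monotonic rescaling of the input, and that a prediction far outside the training range remains bounded by the training targets because every leaf is an average of them. A forest would average many such trees fitted to bootstrap resamples, which preserves both properties.</p>
<preformat>
```python
import math


def _sse(values):
    """Sum of squared deviations from the mean of a list of targets."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def _grow(pairs, min_leaf):
    """Recursively split sorted (x, y) pairs; a leaf is the mean target."""
    ys = [y for _, y in pairs]
    best = None  # (summed squared error, split index)
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical inputs
        sse = _sse(ys[:i]) + _sse(ys[i:])
        if best is None or best[0] > sse:
            best = (sse, i)
    if best is None:
        return sum(ys) / len(ys)  # too few points to split: make a leaf
    i = best[1]
    threshold = 0.5 * (pairs[i - 1][0] + pairs[i][0])
    return (threshold, _grow(pairs[:i], min_leaf), _grow(pairs[i:], min_leaf))


def fit_tree(xs, ys, min_leaf=2):
    """Fit a one-dimensional regression tree by binary partitioning."""
    return _grow(sorted(zip(xs, ys)), min_leaf)


def predict(tree, x):
    """Descend the tree until a leaf (a plain float) is reached."""
    node = tree
    while isinstance(node, tuple):
        threshold, left, right = node
        node = left if threshold >= x else right
    return node


# Toy training data: a quadratic response sampled on [0, 1.9].
xs = [0.1 * k for k in range(20)]
ys = [x ** 2 for x in xs]

tree_raw = fit_tree(xs, ys)
tree_exp = fit_tree([math.exp(x) for x in xs], ys)  # monotonic rescaling

# Far outside the training range the prediction is still a leaf average.
far_prediction = predict(tree_raw, 10.0)
```
</preformat>
      <p>In this sketch, far_prediction cannot exceed the largest training target, even though the underlying quadratic would give 100 at that input.</p>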
</sec>
</sec>
<sec id="Ch1.S4">
  <label>4</label><title>Calibration</title>
      <p id="d1e374">Having trained a fast, robust emulator, we can use it to calibrate our
model against available observations. Generally,<?pagebreak page7664?> this problem involves
estimating the model parameters that could give rise to (or best match) the
available observations. More formally, we can define a model as a function
<inline-formula><mml:math id="M8" display="inline"><mml:mi>F</mml:mi></mml:math></inline-formula> of input parameters <inline-formula><mml:math id="M9" display="inline"><mml:mi mathvariant="italic">θ</mml:mi></mml:math></inline-formula> and outputs <inline-formula><mml:math id="M10" display="inline"><mml:mi>Y</mml:mi></mml:math></inline-formula>: <inline-formula><mml:math id="M11" display="inline"><mml:mrow><mml:mi>F</mml:mi><mml:mfenced close=")" open="("><mml:mi mathvariant="italic">θ</mml:mi></mml:mfenced><mml:mo>=</mml:mo><mml:mi>Y</mml:mi></mml:mrow></mml:math></inline-formula>. Generally, both <inline-formula><mml:math id="M12" display="inline"><mml:mi mathvariant="italic">θ</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M13" display="inline"><mml:mi>Y</mml:mi></mml:math></inline-formula> are high dimensional and may
themselves be functions of space and time. Given a set of observations of
<inline-formula><mml:math id="M14" display="inline"><mml:mi>Y</mml:mi></mml:math></inline-formula>, denoted <inline-formula><mml:math id="M15" display="inline"><mml:mrow><mml:msup><mml:mi>Y</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>, we would like to calculate the inverse: <inline-formula><mml:math id="M16" display="inline"><mml:mrow><mml:msup><mml:mi>F</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup><mml:mfenced open="(" close=")"><mml:mi>Y</mml:mi></mml:mfenced><mml:mo>=</mml:mo><mml:mi mathvariant="italic">θ</mml:mi></mml:mrow></mml:math></inline-formula>.</p>
      <p id="d1e467">This inverse is unlikely to be well defined since many different
combinations of parameters could feasibly result in a given output, and thus we
take a probabilistic approach. In this framework we would like to know the
posterior probability distribution of the input parameters: <inline-formula><mml:math id="M17" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mfenced close=")" open="("><mml:mrow><mml:mi mathvariant="italic">θ</mml:mi><mml:mo>|</mml:mo><mml:msup><mml:mi>Y</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msup></mml:mrow></mml:mfenced></mml:mrow></mml:math></inline-formula>. Using Bayes'
theorem, we can write this as follows:
          <disp-formula id="Ch1.E1" content-type="numbered"><label>1</label><mml:math id="M18" display="block"><mml:mrow><mml:mi>p</mml:mi><mml:mfenced open="(" close=")"><mml:mrow><mml:mi mathvariant="italic">θ</mml:mi><mml:mo>|</mml:mo><mml:msup><mml:mi>Y</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msup></mml:mrow></mml:mfenced><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mi>p</mml:mi><mml:mfenced open="(" close=")"><mml:mrow><mml:msup><mml:mi>Y</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msup><mml:mo>|</mml:mo><mml:mi mathvariant="italic">θ</mml:mi></mml:mrow></mml:mfenced><mml:mi>p</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="italic">θ</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi>p</mml:mi><mml:mo>(</mml:mo><mml:msup><mml:mi>Y</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
        where the probability of an output given the input parameters, <inline-formula><mml:math id="M19" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mfenced open="(" close=")"><mml:mrow><mml:msup><mml:mi>Y</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msup><mml:mo>|</mml:mo><mml:mi mathvariant="italic">θ</mml:mi></mml:mrow></mml:mfenced></mml:mrow></mml:math></inline-formula>, is referred to as the
likelihood. While the model is capable of sampling this distribution,
generally the full distribution is unknown and intractable, and we must
approximate this likelihood.</p>
      <p id="d1e567">Depending on the purpose of the calibration and assumptions about the form
of <inline-formula><mml:math id="M20" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mfenced close=")" open="("><mml:mrow><mml:msup><mml:mi>Y</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msup><mml:mo>|</mml:mo><mml:mi mathvariant="italic">θ</mml:mi></mml:mrow></mml:mfenced></mml:mrow></mml:math></inline-formula>, different techniques
can be used. For example, in order to determine a (conservative) estimate of the
parametric uncertainty in the model, we can use approximate
Bayesian computation (ABC) to determine those parameters that are plausible
given a set of observations. Alternatively, we may wish to know the optimal
parameters to best match a set of observations, in which case techniques based on Markov chain Monte Carlo (MCMC) might be more appropriate. Both of these sampling
strategies are available in ESEm, and we introduce each of them here.</p>
<sec id="Ch1.S4.SS1">
  <label>4.1</label><title>Approximate Bayesian computation</title>
      <p id="d1e596">The simplest ABC approach seeks to approximate the likelihood using only
samples from the simulator and a discrepancy function <inline-formula><mml:math id="M21" display="inline"><mml:mi mathvariant="italic">ρ</mml:mi></mml:math></inline-formula>:
            <disp-formula id="Ch1.E2" content-type="numbered"><label>2</label><mml:math id="M22" display="block"><mml:mtable class="split" rowspacing="0.2ex" displaystyle="true" columnalign="right left"><mml:mtr><mml:mtd/><mml:mtd><mml:mrow><mml:mi>p</mml:mi><mml:mfenced close=")" open="("><mml:mrow><mml:mi mathvariant="italic">θ</mml:mi><mml:mo>|</mml:mo><mml:msup><mml:mi>Y</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msup></mml:mrow></mml:mfenced><mml:mo>∝</mml:mo><mml:mo movablelimits="false">∫</mml:mo><mml:mi>p</mml:mi><mml:mfenced close=")" open="("><mml:mrow><mml:msup><mml:mi>Y</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msup><mml:mo>|</mml:mo><mml:mi>Y</mml:mi></mml:mrow></mml:mfenced><mml:mi>p</mml:mi><mml:mfenced close=")" open="("><mml:mrow><mml:mi>Y</mml:mi><mml:mo>|</mml:mo><mml:mi mathvariant="italic">θ</mml:mi></mml:mrow></mml:mfenced><mml:mi>p</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="italic">θ</mml:mi><mml:mo>)</mml:mo><mml:mi mathvariant="normal">d</mml:mi><mml:mi>Y</mml:mi></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd/><mml:mtd><mml:mrow><mml:mspace linebreak="nobreak" width="1em"/><mml:mo>≈</mml:mo><mml:mo movablelimits="false">∫</mml:mo><mml:mi>I</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="italic">ρ</mml:mi><mml:mfenced close=")" open="("><mml:mrow><mml:msup><mml:mi>Y</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msup><mml:mo>,</mml:mo><mml:mi>Y</mml:mi></mml:mrow></mml:mfenced><mml:mo>≤</mml:mo><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mo>)</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>p</mml:mi><mml:mfenced close=")" open="("><mml:mrow><mml:mi>Y</mml:mi><mml:mo>|</mml:mo><mml:mi mathvariant="italic">θ</mml:mi></mml:mrow></mml:mfenced><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>p</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="italic">θ</mml:mi><mml:mo>)</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi 
mathvariant="normal">d</mml:mi><mml:mi>Y</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
          where the indicator function <inline-formula><mml:math id="M23" display="inline"><mml:mrow><mml:mi>I</mml:mi><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mfenced open="{" close=""><mml:mtable class="array" columnalign="left left"><mml:mtr><mml:mtd><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mi>x</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mspace linebreak="nobreak" width="0.25em"/><mml:mtext> is  true </mml:mtext></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mn mathvariant="normal">0</mml:mn><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mi>x</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mspace linebreak="nobreak" width="0.25em"/><mml:mtext> is  false</mml:mtext></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mfenced></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M24" display="inline"><mml:mi mathvariant="italic">ϵ</mml:mi></mml:math></inline-formula> is a small discrepancy. This can
then be integrated numerically using, e.g., Monte Carlo sampling of <inline-formula><mml:math id="M25" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="italic">θ</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>. Parameters for which <inline-formula><mml:math id="M26" display="inline"><mml:mrow><mml:mi mathvariant="italic">ρ</mml:mi><mml:mfenced close=")" open="("><mml:mrow><mml:msup><mml:mi>Y</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msup><mml:mo>,</mml:mo><mml:mi>Y</mml:mi></mml:mrow></mml:mfenced><mml:mo>≤</mml:mo><mml:mi mathvariant="italic">ϵ</mml:mi></mml:mrow></mml:math></inline-formula> are accepted, and all others are rejected. As <inline-formula><mml:math id="M27" display="inline"><mml:mrow><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mo>→</mml:mo><mml:mi mathvariant="normal">∞</mml:mi></mml:mrow></mml:math></inline-formula>, all parameters are accepted, and we recover <inline-formula><mml:math id="M28" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="italic">θ</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>. For <inline-formula><mml:math id="M29" display="inline"><mml:mrow><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula>, it can be shown that we generate samples from the
posterior <inline-formula><mml:math id="M30" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mfenced close=")" open="("><mml:mrow><mml:mi mathvariant="italic">θ</mml:mi><mml:mo>|</mml:mo><mml:msup><mml:mi>Y</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msup></mml:mrow></mml:mfenced></mml:mrow></mml:math></inline-formula> exactly
(Sisson et al., 2018).</p>
      <p id="d1e881">In practice, however, the simulator proposals will never exactly match the
observations and we must make a pragmatic choice for both <inline-formula><mml:math id="M31" display="inline"><mml:mi mathvariant="italic">ρ</mml:mi></mml:math></inline-formula> and
<inline-formula><mml:math id="M32" display="inline"><mml:mi mathvariant="italic">ϵ</mml:mi></mml:math></inline-formula>. ESEm includes an implementation of the “implausibility metric”
(Williamson et al., 2013; Craig et al., 1996; Vernon et al., 2010), which
defines the discrepancy in terms of the standardized Cartesian distance
between the observations and the emulator mean (<inline-formula><mml:math id="M33" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">μ</mml:mi><mml:mi mathvariant="normal">E</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>):
            <disp-formula id="Ch1.E3" content-type="numbered"><label>3</label><mml:math id="M34" display="block"><mml:mrow><mml:mi mathvariant="italic">ρ</mml:mi><mml:mfenced close=")" open="("><mml:mrow><mml:msup><mml:mi>Y</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msup><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="italic">μ</mml:mi><mml:mi mathvariant="normal">E</mml:mi></mml:msub><mml:mfenced close=")" open="("><mml:mi mathvariant="italic">θ</mml:mi></mml:mfenced></mml:mrow></mml:mfenced><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mfenced open="|" close="|"><mml:mrow><mml:msup><mml:mi>Y</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msup><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="italic">μ</mml:mi><mml:mi mathvariant="normal">E</mml:mi></mml:msub></mml:mrow></mml:mfenced></mml:mrow><mml:msqrt><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">E</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>Y</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">S</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:msqrt></mml:mfrac></mml:mstyle><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
          where the total standard deviation is taken to be the square root of the sum of the
emulator variance (<inline-formula><mml:math id="M35" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">E</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula>, where available) and the variances due to uncertainty
in the observations (<inline-formula><mml:math id="M36" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>Y</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula>), representation (<inline-formula><mml:math id="M37" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula>) and structural model uncertainties (<inline-formula><mml:math id="M38" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">S</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula>). As
described above, the representation uncertainty represents the degree to
which observations at a particular time and location can be expected to
match the (typically aggregate) model output (Schutgens et al., 2016a, b).
While reasonable approximations can often be made of this and the
observational uncertainties, the model structural uncertainties are
typically unknown. In<?pagebreak page7665?> some cases, a multi-model ensemble may be available,
which can provide an indication of the structural uncertainties for
particular observables (Sexton et al., 1995), but these are likely to
underestimate true structural uncertainties as models typically share many
key processes and assumptions (Knutti et al., 2013). Indeed, one benefit of
a comprehensive analysis of the parametric uncertainty of a model is that
this structural uncertainty can be explored and determined (Williamson et
al., 2015).</p>
      <p id="d1e1044">Framed in this way, <inline-formula><mml:math id="M39" display="inline"><mml:mi mathvariant="italic">ϵ</mml:mi></mml:math></inline-formula> can be thought of as representing the
number of standard deviations the (emulated) model value is from the
observations. While this can be treated as a free parameter and may be
specified in ESEm, it is common to choose <inline-formula><mml:math id="M40" display="inline"><mml:mrow><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:math></inline-formula> since it can be
shown that for unimodal distributions values of <inline-formula><mml:math id="M41" display="inline"><mml:mrow><mml:mn mathvariant="normal">3</mml:mn><mml:mi mathvariant="italic">σ</mml:mi></mml:mrow></mml:math></inline-formula> correspond to a
greater than 95 % confidence bound (Vysochanskij and Petunin, 1980).</p>
      <p id="d1e1076">This approach is closely related to the approach of “history matching”
(Williamson et al., 2013) and can be shown to be identical in the case of
fixed <inline-formula><mml:math id="M42" display="inline"><mml:mi mathvariant="italic">ϵ</mml:mi></mml:math></inline-formula> and uniform priors (Holden et al., 2015b). The key
difference is that history matching may result in an empty posterior
distribution; that is, it may find no plausible model configurations that
match the observations. In contrast, with ABC the epsilon is typically
treated as a hyper-parameter that can be tuned in order to return a
suitably large number of posterior samples. Both <inline-formula><mml:math id="M43" display="inline"><mml:mi mathvariant="italic">ϵ</mml:mi></mml:math></inline-formula> and the prior
distributions can be specified in ESEm and it can thus be used to perform
either analysis. The speed at which samples can typically be generated from
the emulator means we can keep <inline-formula><mml:math id="M44" display="inline"><mml:mi mathvariant="italic">ϵ</mml:mi></mml:math></inline-formula> fixed as in history matching and
generate as many samples as are required to estimate the posterior
distribution.</p>
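      <p>The rejection scheme of Eq. (2), combined with the implausibility of Eq. (3) and a fixed threshold of three, can be sketched as follows. This is an illustration under stated assumptions rather than the ESEm interface: the emulator function is a hypothetical stand-in returning a mean and a variance, the prior is uniform on the unit interval, and the representation and structural variances are set to zero. The helper vp_tail_bound evaluates the Vysochanskij-Petunin bound, 4/(9k^2), on the probability that a unimodal variable lies k or more standard deviations from its mean, which is below 5 % for k = 3.</p>
<preformat>
```python
import math
import random


def emulator(theta):
    """Hypothetical emulator: returns (mean, variance) for parameter theta."""
    return 2.0 * theta, 0.01 ** 2


def implausibility(y_obs, mean, emulator_var, obs_var):
    """Eq. (3) with representation and structural variances set to zero."""
    return abs(y_obs - mean) / math.sqrt(emulator_var + obs_var)


def abc_rejection(y_obs, obs_var, epsilon=3.0, n_samples=10000, seed=0):
    """Sample a uniform prior on [0, 1] and keep the parameters whose
    implausibility does not exceed epsilon."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_samples):
        theta = rng.uniform(0.0, 1.0)      # draw from the prior
        mean, var = emulator(theta)        # emulated model output
        if epsilon >= implausibility(y_obs, mean, var, obs_var):
            kept.append(theta)             # plausible: accept
    return kept


def vp_tail_bound(k):
    """Vysochanskij-Petunin bound on the probability that a unimodal
    variable lies k or more standard deviations from its mean."""
    if not k > math.sqrt(8.0 / 3.0):
        raise ValueError("bound only valid for k greater than sqrt(8/3)")
    return 4.0 / (9.0 * k ** 2)


# Illustrative observation of 0.6 with a standard deviation of 0.05.
kept = abc_rejection(y_obs=0.6, obs_var=0.05 ** 2)
acceptance_rate = len(kept) / 10000
```
</preformat>
      <p>Because the threshold is fixed, an empty list of accepted parameters is a possible outcome in this sketch, which corresponds to the history-matching case described above.</p>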
      <p id="d1e1101">When multiple (<inline-formula><mml:math id="M45" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula>) observations are used (as is often the case), <inline-formula><mml:math id="M46" display="inline"><mml:mi mathvariant="bold-italic">ρ</mml:mi></mml:math></inline-formula>
can be written as a vector of implausibilities, <inline-formula><mml:math id="M47" display="inline"><mml:mrow><mml:mi mathvariant="bold-italic">ρ</mml:mi><mml:mo>(</mml:mo><mml:msubsup><mml:mi>Y</mml:mi><mml:mi mathvariant="normal">i</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msubsup><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="italic">μ</mml:mi><mml:mi mathvariant="normal">E</mml:mi></mml:msub><mml:mfenced open="(" close=")"><mml:mi mathvariant="italic">θ</mml:mi></mml:mfenced><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> or simply <inline-formula><mml:math id="M48" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">ρ</mml:mi><mml:mi mathvariant="normal">i</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi mathvariant="italic">θ</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, and a
modified method of rejection or acceptance must be used. While the full
multivariate implausibility can be estimated, it requires careful
consideration of the covariance structure (Vernon et al., 2010). An obvious
choice is to require <inline-formula><mml:math id="M49" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">ρ</mml:mi><mml:mi mathvariant="normal">i</mml:mi></mml:msub><mml:mo>&lt;</mml:mo><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>∀</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>i</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>∈</mml:mo><mml:mi>N</mml:mi></mml:mrow></mml:math></inline-formula>; however,
this can become restrictive for large <inline-formula><mml:math id="M50" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> due to the curse of
dimensionality. The first step should be to reduce <inline-formula><mml:math id="M51" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> through the use of
summary statistics as described above. After that, the simplest solution is
to require that the maximum implausibility be below our threshold:
<inline-formula><mml:math id="M52" display="inline"><mml:mrow><mml:munder><mml:mo>max⁡</mml:mo><mml:mi>i</mml:mi></mml:munder><mml:mo mathvariant="italic">{</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">ρ</mml:mi><mml:mi mathvariant="normal">i</mml:mi></mml:msub><mml:mo mathvariant="italic">}</mml:mo><mml:mo>&lt;</mml:mo><mml:mi mathvariant="italic">ϵ</mml:mi></mml:mrow></mml:math></inline-formula> (e.g., Vernon et al., 2010). An alternative is
to introduce a tolerance (<inline-formula><mml:math id="M53" display="inline"><mml:mrow><mml:mi>T</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> such that only some proportion of <inline-formula><mml:math id="M54" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">ρ</mml:mi><mml:mi mathvariant="normal">i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>
need be smaller than <inline-formula><mml:math id="M55" display="inline"><mml:mi mathvariant="italic">ϵ</mml:mi></mml:math></inline-formula>: <inline-formula><mml:math id="M56" display="inline"><mml:mrow><mml:msubsup><mml:mo>∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:msubsup><mml:mi>H</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">ρ</mml:mi><mml:mi mathvariant="normal">i</mml:mi></mml:msub><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>-</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mo>)</mml:mo><mml:mo>&lt;</mml:mo><mml:mi>T</mml:mi></mml:mrow></mml:math></inline-formula>, where <inline-formula><mml:math id="M57" display="inline"><mml:mi>H</mml:mi></mml:math></inline-formula> is the Heaviside function (Johnson et al., 2020),
although this is a somewhat unsatisfactory approach that can hide potential
structural uncertainties. On the other hand, choosing <inline-formula><mml:math id="M58" display="inline"><mml:mrow><mml:mi>T</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula> as a first
approximation and then identifying any particular observations that
generate a very large implausibility provides a mechanism for identifying
potential structural (or observational) errors. These can then be removed
and noted for further investigation.</p>
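      <p>The two acceptance rules for a vector of implausibilities can be sketched as below. The function names are hypothetical, and two assumptions are made that the text does not fix: the tolerance is interpreted as a proportion of the observations, and the comparison against it is inclusive, so that a tolerance of zero reproduces the maximum-implausibility rule.</p>
<preformat>
```python
def accept_max(rhos, epsilon=3.0):
    """Accept only if the largest implausibility is below the threshold."""
    return epsilon > max(rhos)


def accept_with_tolerance(rhos, epsilon=3.0, tolerance=0.0):
    """Accept if the number of observations whose implausibility reaches
    epsilon is no more than the fraction `tolerance` of all observations
    (assumption: tolerance is a proportion between 0 and 1)."""
    n_exceed = sum(1 for rho in rhos if rho >= epsilon)
    return tolerance * len(rhos) >= n_exceed


# Ten illustrative implausibilities for one parameter set; one is large.
rhos = [0.5, 1.2, 4.0, 2.9, 0.1, 0.3, 1.1, 2.0, 0.7, 1.9]

strict = accept_max(rhos)                                # False
tolerant = accept_with_tolerance(rhos, tolerance=0.1)    # True
zero_tol = accept_with_tolerance(rhos, tolerance=0.0)    # False

# Index of the observation that drives the rejection, for follow-up.
worst = max(range(len(rhos)), key=lambda i: rhos[i])     # 2
```
</preformat>
      <p>Here the single observation with an implausibility of 4.0 causes rejection under the strict rule but is absorbed by a tolerance of 10 %, which is how a non-zero tolerance can mask a potential structural error.</p>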
      <p id="d1e1316">In order to illustrate this approach, we apply AERONET (AErosol RObotic
NETwork) observations of AAOD to the problem of constraining ECHAM-HAM model
parameters as described in Sect. 2. The AERONET sun photometers directly
measure solar irradiances at the surface in clear-sky conditions and, by
performing almucantar sky scans, are able to estimate the single-scattering
albedo, and hence AAOD, of the aerosol in their vicinity (Dubovik and King,
2000; Holben et al., 1998). Daily average observations are taken from all
available stations for 2017 and co-located with monthly model outputs using
linear interpolation. Figure 3 shows the posterior distribution for the
parameters described in Sect. 2 if uniform priors are assumed, and a
Gaussian process emulator is calibrated with these observations. Of the
million points sampled from this emulator, 729 474 (73 %) are retained as
being compatible with the observations with <inline-formula><mml:math id="M59" display="inline"><mml:mrow><mml:mi>T</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.1</mml:mn></mml:mrow></mml:math></inline-formula>. Lower values of both
the imaginary part of the refractive index (IRI500) and the emissions
scaling parameter (BCnumber) are shown to be more compatible with the
observations than higher values, while the rate of wet deposition (Wetdep)
is less constrained. Hence, higher values of IRI500 and BCnumber can be
ruled out as implausible given these observations (within the assumptions of
our prior GP model choices and observational and structural model
uncertainties).</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F3"><?xmltex \currentcnt{3}?><?xmltex \def\figurename{Figure}?><label>Figure 3</label><caption><p id="d1e1333">The posterior distribution of parameters representing the
plausible space of parameters for the example perturbed parameter ensemble
experiment having been calibrated with a GP against observed absorbing
aerosol optical depth measurements from AERONET. The diagonal histograms
represent marginal distributions of each parameter while the off-diagonal
scatterplots represent samples from the joint distributions. The colour
represents the (average) emulated AAOD for each parameter combination. </p></caption>
          <?xmltex \igopts{width=241.848425pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/7659/2021/gmd-14-7659-2021-f03.png"/>

        </fig>

      <p id="d1e1342">The matrix of implausibilities, <inline-formula><mml:math id="M60" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">ρ</mml:mi><mml:mi mathvariant="normal">i</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi mathvariant="italic">θ</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, can also provide very
useful information regarding the information content of each observation
with respect to the various parameter<?pagebreak page7666?> combinations. Observations with narrow
distributions of small implausibility provide little constraint value,
whereas observations with broad distributions of implausibility provide useful constraints
on the parameters of interest. Observations with narrow distributions of
high implausibility are useful indications of previously unknown structural
uncertainties in the model.</p>
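      <p>One possible way to read the implausibility matrix along these lines is sketched below. The classification thresholds (a spread of 0.5 to separate narrow from broad distributions, and a median above the acceptance threshold to flag high implausibility) and the labels are illustrative assumptions, not values used by ESEm.</p>
<preformat>
```python
import statistics


def summarise_observation(rhos, epsilon=3.0, narrow=0.5):
    """Classify one observation from its implausibilities over all sampled
    parameter sets. Both thresholds are illustrative assumptions."""
    centre = statistics.median(rhos)
    spread = statistics.pstdev(rhos)
    if spread > narrow:
        return "constraining"               # broad distribution
    if centre > epsilon:
        return "possible structural error"  # narrow and high
    return "uninformative"                  # narrow and low


# Rows are sampled parameter sets, columns are observations.
rho_matrix = [
    [0.2, 1.0, 5.1],
    [0.3, 4.0, 5.3],
    [0.1, 2.5, 5.0],
    [0.2, 0.4, 5.2],
]

# Transpose so that each entry holds one observation's implausibilities.
labels = [summarise_observation(list(col)) for col in zip(*rho_matrix)]
```
</preformat>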
</sec>
<sec id="Ch1.S4.SS2">
  <label>4.2</label><title>Markov chain Monte Carlo (MCMC)</title>
      <p id="d1e1371">The ABC method described above is simple and powerful but somewhat
inefficient as it repeatedly samples from the same prior. In reality, each
rejection or acceptance of a set of parameters provides us with extra
information about the “true” form of <inline-formula><mml:math id="M61" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mfenced close=")" open="("><mml:mrow><mml:mi mathvariant="italic">θ</mml:mi><mml:mo>|</mml:mo><mml:msup><mml:mi>Y</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msup></mml:mrow></mml:mfenced></mml:mrow></mml:math></inline-formula> so that the sampler could spend
more time in plausible regions of the parameter space. This can then allow
us to use smaller values of <inline-formula><mml:math id="M62" display="inline"><mml:mi mathvariant="italic">ϵ</mml:mi></mml:math></inline-formula> and hence find better approximations
of <inline-formula><mml:math id="M63" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mfenced close=")" open="("><mml:mrow><mml:mi mathvariant="italic">θ</mml:mi><mml:mo>|</mml:mo><mml:msup><mml:mi>Y</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msup></mml:mrow></mml:mfenced></mml:mrow></mml:math></inline-formula>.</p>
      <p id="d1e1419">Given the joint probability distribution described by Eq. (2) and an initial
choice of parameters <inline-formula><mml:math id="M64" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="italic">θ</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> and (emulated) output <inline-formula><mml:math id="M65" display="inline"><mml:mrow><mml:msup><mml:mi>Y</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula>, the acceptance
probability <inline-formula><mml:math id="M66" display="inline"><mml:mi>r</mml:mi></mml:math></inline-formula> of a new set of parameters (<inline-formula><mml:math id="M67" display="inline"><mml:mrow><mml:mi mathvariant="italic">θ</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is given by
            <disp-formula id="Ch1.E4" content-type="numbered"><label>4</label><mml:math id="M68" display="block"><mml:mrow><mml:mi>r</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mi>p</mml:mi><mml:mfenced open="(" close=")"><mml:mrow><mml:msup><mml:mi>Y</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msup><mml:mo>|</mml:mo><mml:msup><mml:mi>Y</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:mfenced><mml:mi>p</mml:mi><mml:mfenced close=")" open="("><mml:mrow><mml:msup><mml:mi mathvariant="italic">θ</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mo>|</mml:mo><mml:mi mathvariant="italic">θ</mml:mi></mml:mrow></mml:mfenced><mml:mi>p</mml:mi><mml:mo>(</mml:mo><mml:msup><mml:mi mathvariant="italic">θ</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi>p</mml:mi><mml:mfenced open="(" close=")"><mml:mrow><mml:msup><mml:mi>Y</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msup><mml:mo>|</mml:mo><mml:mi>Y</mml:mi></mml:mrow></mml:mfenced><mml:mi>p</mml:mi><mml:mfenced open="(" close=")"><mml:mrow><mml:mi mathvariant="italic">θ</mml:mi><mml:mo>|</mml:mo><mml:msup><mml:mi mathvariant="italic">θ</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:mfenced><mml:mi>p</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="italic">θ</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
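      <p>The role of such an acceptance ratio can be illustrated with a random-walk Metropolis sampler, which is a simpler technique than the gradient-based sampler used by ESEm and is shown only as a sketch. The emulator is a hypothetical linear stand-in, the prior is uniform on the unit interval, the likelihood is assumed to be Gaussian about the emulator mean, and the proposal is a symmetric Gaussian step, so the proposal terms cancel. The logarithm of the ratio is then the log-likelihood plus log-prior of the proposed parameters minus the same quantity for the current parameters.</p>
<preformat>
```python
import math
import random


def emulator_mean(theta):
    """Hypothetical stand-in for a trained emulator."""
    return 2.0 * theta


def log_prior(theta):
    """Uniform prior on [0, 1]; zero probability elsewhere."""
    return 0.0 if (theta >= 0.0 and 1.0 >= theta) else -math.inf


def log_likelihood(theta, y_obs, sigma_t):
    """Gaussian log density of the observation about the emulator mean."""
    z = (y_obs - emulator_mean(theta)) / sigma_t
    return -0.5 * z * z - math.log(sigma_t * math.sqrt(2.0 * math.pi))


def metropolis(y_obs, sigma_t, n_steps, step_size, rng):
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    current = 0.5
    current_lp = log_prior(current) + log_likelihood(current, y_obs, sigma_t)
    chain = []
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step_size)
        proposal_lp = log_prior(proposal)
        if proposal_lp > -math.inf:
            proposal_lp += log_likelihood(proposal, y_obs, sigma_t)
        log_r = proposal_lp - current_lp   # log of the acceptance ratio
        if log_r >= 0.0 or math.exp(log_r) > rng.random():
            current, current_lp = proposal, proposal_lp
        chain.append(current)              # a rejected step repeats the state
    return chain


chain = metropolis(y_obs=0.6, sigma_t=0.1, n_steps=20000, step_size=0.1,
                   rng=random.Random(1))
samples = chain[2000:]                     # discard burn-in
posterior_mean = sum(samples) / len(samples)
```
</preformat>
      <p>With these illustrative settings the chain concentrates around the parameter value of 0.3, for which the stand-in emulator reproduces the observation.</p>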
      <p id="d1e1550">In the default implementation of MCMC calibration, ESEm uses the
TensorFlow Probability implementation of Hamiltonian Monte Carlo (HMC)
(Neal, 2011), which uses the gradient information automatically calculated by
TensorFlow to inform the proposed new parameters <inline-formula><mml:math id="M69" display="inline"><mml:mi mathvariant="italic">θ</mml:mi></mml:math></inline-formula>. For simplicity,
we assume that the proposal distribution is symmetric, i.e. <inline-formula><mml:math id="M70" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mfenced open="(" close=")"><mml:mrow><mml:msup><mml:mi mathvariant="italic">θ</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mo>|</mml:mo><mml:mi mathvariant="italic">θ</mml:mi></mml:mrow></mml:mfenced><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>=</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>p</mml:mi><mml:mfenced open="(" close=")"><mml:mrow><mml:mi mathvariant="italic">θ</mml:mi><mml:mo>|</mml:mo><mml:msup><mml:mi mathvariant="italic">θ</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:mfenced></mml:mrow></mml:math></inline-formula>, which is implemented as a
zero log-acceptance correction in the initialization of the TensorFlow
target distribution. The target log probability provided to the TensorFlow
HMC algorithm is then
            <disp-formula id="Ch1.E5" content-type="numbered"><label>5</label><mml:math id="M71" display="block"><mml:mtable rowspacing="0.2ex" class="split" displaystyle="true" columnalign="right left"><mml:mtr><mml:mtd><mml:mrow><mml:mi>log⁡</mml:mi><mml:mo>(</mml:mo><mml:mi>r</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mo>=</mml:mo><mml:mi>log⁡</mml:mi><mml:mfenced open="(" close=")"><mml:mrow><mml:mi>p</mml:mi><mml:mfenced close=")" open="("><mml:mrow><mml:msup><mml:mi>Y</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msup><mml:mo>|</mml:mo><mml:msup><mml:mi>Y</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:mfenced></mml:mrow></mml:mfenced><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>+</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>log⁡</mml:mi><mml:mfenced open="(" close=")"><mml:mrow><mml:mi>p</mml:mi><mml:mfenced open="(" close=")"><mml:mrow><mml:msup><mml:mi mathvariant="italic">θ</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:mfenced></mml:mrow></mml:mfenced></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd/><mml:mtd><mml:mrow><mml:mo>-</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>log⁡</mml:mi><mml:mfenced close=")" open="("><mml:mrow><mml:mi>p</mml:mi><mml:mfenced close=")" open="("><mml:mrow><mml:msup><mml:mi>Y</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msup><mml:mo>|</mml:mo><mml:mi>Y</mml:mi></mml:mrow></mml:mfenced></mml:mrow></mml:mfenced><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>-</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>log⁡</mml:mi><mml:mfenced close=")" open="("><mml:mrow><mml:mi>p</mml:mi><mml:mfenced open="(" close=")"><mml:mi mathvariant="italic">θ</mml:mi></mml:mfenced></mml:mrow></mml:mfenced><mml:mo>.</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
      <p id="d1e1692">Note that for this implementation the distance metric <inline-formula><mml:math id="M72" display="inline"><mml:mi mathvariant="italic">ρ</mml:mi></mml:math></inline-formula> must be cast
as a probability distribution with values in [0, 1]. We therefore assume that
this discrepancy can be approximated as a normal distribution centred about
the emulator mean (<inline-formula><mml:math id="M73" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">μ</mml:mi><mml:mi mathvariant="normal">E</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>) with a standard deviation equal to the square root of
the sum of the variances described in Eq. (3):
            <disp-formula id="Ch1.E6" content-type="numbered"><label>6</label><mml:math id="M74" display="block"><mml:mrow><mml:mtable rowspacing="0.2ex" class="split" displaystyle="true" columnalign="right left"><mml:mtr><mml:mtd><mml:mrow><mml:mi>p</mml:mi><mml:mfenced open="(" close=")"><mml:mrow><mml:msup><mml:mi>Y</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msup><mml:mo>|</mml:mo><mml:msub><mml:mi mathvariant="italic">μ</mml:mi><mml:mi mathvariant="normal">E</mml:mi></mml:msub></mml:mrow></mml:mfenced></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mo>≈</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:msqrt><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mi mathvariant="italic">π</mml:mi></mml:mrow></mml:msqrt></mml:mrow></mml:mfrac></mml:mstyle><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mstyle scriptlevel="+1"><mml:mfrac><mml:mn mathvariant="normal">1</mml:mn><mml:mn mathvariant="normal">2</mml:mn></mml:mfrac></mml:mstyle><mml:msup><mml:mfenced open="(" close=")"><mml:mstyle scriptlevel="+1"><mml:mfrac><mml:mrow><mml:msup><mml:mi>Y</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msup><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="italic">μ</mml:mi><mml:mi mathvariant="normal">E</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mstyle></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:msup><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mo>=</mml:mo><mml:msqrt><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">E</mml:mi><mml:mn 
mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>Y</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">S</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:msqrt></mml:mrow></mml:mtd></mml:mtr></mml:mtable><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
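      <p>As a minimal sketch of Eqs. (5) and (6), the following pure-Python example evaluates the normal log-likelihood and the resulting log acceptance ratio. The function names, the variance values and the flat prior are illustrative assumptions and are not part of the ESEm API; for brevity the same combined standard deviation is assumed at the current and proposed parameters.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def total_sigma(var_emulator, var_obs, var_repr, var_struct):
    """Combined standard deviation sigma_t: square root of the summed variances (Eq. 6)."""
    return math.sqrt(var_emulator + var_obs + var_repr + var_struct)


def log_likelihood(y_obs, mu_emulator, sigma_t):
    """Log of the normal density p(Y0 | mu_E) given in Eq. (6)."""
    z = (y_obs - mu_emulator) / sigma_t
    return -0.5 * z * z - math.log(sigma_t * math.sqrt(2.0 * math.pi))


def log_acceptance_ratio(y_obs, mu_proposed, mu_current, sigma_t,
                         log_prior_proposed=0.0, log_prior_current=0.0):
    """log(r) from Eq. (5). The defaults assume a flat prior, so the prior terms cancel."""
    return (log_likelihood(y_obs, mu_proposed, sigma_t) + log_prior_proposed
            - log_likelihood(y_obs, mu_current, sigma_t) - log_prior_current)


# Illustrative variances only (not values from the paper)
sigma_t = total_sigma(0.04, 0.03, 0.01, 0.01)
log_r = log_acceptance_ratio(y_obs=1.0, mu_proposed=1.1, mu_current=1.5,
                             sigma_t=sigma_t)
```
]]></preformat>
      <p>In this toy case the proposed emulator mean lies closer to the observation than the current one, so log(r) is positive and the proposal would always be accepted.</p>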
      <p id="d1e1848">The implementation will then return the requested number of accepted samples
and report the acceptance rate, which provides a useful metric for
tuning the algorithm. It should be noted that MCMC algorithms can be
sensitive to a number of key parameters, including the number of burn-in
steps used (and discarded) before sampling occurs and the step size. Each of
these can be controlled via keyword arguments to the sampler.</p>
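      <p>The roles of the burn-in length and the step size can be illustrated with a short sketch. It uses a random-walk Metropolis sampler, which is a simpler algorithm than the HMC sampler described above, and a standard normal toy target. The function <monospace>metropolis</monospace> and its arguments are hypothetical names chosen for this example and are not the ESEm keyword arguments.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
import random


def metropolis(log_prob, start, n_samples, burn_in=200, step_size=0.5, seed=0):
    """Random-walk Metropolis sampler (a simpler stand-in for HMC).

    Returns the post-burn-in chain and the acceptance rate over all steps.
    """
    rng = random.Random(seed)
    current, current_lp = start, log_prob(start)
    samples, accepted = [], 0
    total_steps = burn_in + n_samples
    for step in range(total_steps):
        proposal = current + rng.gauss(0.0, step_size)
        proposal_lp = log_prob(proposal)
        # Accept with probability min(1, r), compared in log space.
        # 1 - random() lies in (0, 1], which avoids log(0).
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        if step >= burn_in:  # the burn-in steps are discarded
            samples.append(current)
    return samples, accepted / total_steps


# Toy target: standard normal log density (up to a constant)
samples, rate = metropolis(lambda x: -0.5 * x * x, start=3.0, n_samples=5000,
                           burn_in=200, step_size=2.0)
mean = sum(samples) / len(samples)
```
]]></preformat>
      <p>A very small step size gives a high acceptance rate but a slowly moving chain, while a very large one leads to most proposals being rejected, which is why the reported acceptance rate is a useful diagnostic when tuning.</p>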
      <p id="d1e1851">This approach can provide much more efficient sampling of the emulator and
provide improved parameter estimates, especially when used with informative
priors that can guide the sampler.</p>
</sec>
<sec id="Ch1.S4.SS3">
  <label>4.3</label><title>Extensions</title>
      <p id="d1e1862">While ABC and MCMC form the backbone of many parameter estimation
techniques, there has been a large amount of research on improved
techniques, particularly for complex simulators with high-dimensional
outputs. See Cranmer et al. (2020) for an excellent recent review of the
state-of-the-art techniques, including efforts to emulate the likelihood
directly utilizing the “likelihood ratio trick” and even including
information from the simulator itself (Brehmer et al., 2020). The sampling
interface for ESEm has been designed to decouple the emulation technique
from the sampler and enable easy implementation of additional samplers as
required.</p>
</sec>
</sec>
<sec id="Ch1.S5">
  <label>5</label><title>Other use cases</title>
      <p id="d1e1875">In order to demonstrate the generality of ESEm for performing emulation
and/or inference over a variety of Earth science datasets, here we introduce
two further examples.</p>
<sec id="Ch1.S5.SS1">
  <label>5.1</label><title>Cloud-resolving model sensitivity</title>
      <p id="d1e1885">In this example, we use an ensemble of large-domain simulations of realistic
shallow cloud fields to explore the sensitivity of shallow precipitation to
local changes in the environment. The simulation data we use for training
the emulator is taken from a recent study (Dagan and Stier, 2020a) that
performed ensemble daily simulations for a 1-month period during
December 2013 over the ocean to the east of Barbados, sampling the
variability associated with shallow convection. Each day of the month
consisted of two runs, both forced by realistic boundary conditions taken
from reanalysis but with different cloud droplet number concentrations
(CDNCs) to represent clean and polluted conditions. The altered CDNC was
found to have little impact on the precipitation rate in the simulations,
and thus we simply treat the CDNC change as a perturbation to the initial
conditions and combine the two CDNC runs from each day together to increase
the amount of data available for training the emulator. At hourly
resolution, this provides 1488 data points.</p>
      <p id="d1e1888">However, given that precipitation is strongly tied to the local cloud
regime, not fully controlling for cloud regime can introduce spurious
correlations when training the emulator. As such we also filter out all
hours that are not<?pagebreak page7667?> associated with shallow convective clouds. To do this,
we consider domain-mean vertical profiles of total cloud water content
(liquid <inline-formula><mml:math id="M75" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula> ice), <inline-formula><mml:math id="M76" display="inline"><mml:mrow><mml:msub><mml:mi>q</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, and filter out all hours where the vertical sum
of <inline-formula><mml:math id="M77" display="inline"><mml:mrow><mml:msub><mml:mi>q</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> at pressures below 600 hPa (i.e. above the 600 hPa level) exceeds 10<inline-formula><mml:math id="M78" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">6</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula> kg/kg. This condition allows us to
filter out hours associated with the onset and development of deep
convection in the domain and mask out hours with high cirrus
layers or hours dominated by transient mesoscale convective activity advected in by the boundary conditions. After this, we are left with 850
hourly data points that meet our criteria and can be used to train the
emulator.</p>
      <p id="d1e1932">As our predictors we choose five representative cloud-controlling factors
from the literature (Scott et al., 2020), namely, in-cloud liquid water path
(LWP), geopotential height at 700 hPa (<inline-formula><mml:math id="M79" display="inline"><mml:mrow><mml:msub><mml:mi>z</mml:mi><mml:mn mathvariant="normal">700</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>), estimated inversion
strength (EIS), sea surface temperature (SST) and the vertical pressure
velocity at 700 hPa (<inline-formula><mml:math id="M80" display="inline"><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mn mathvariant="normal">700</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>). All quantities are domain-mean
features, and the LWP is a column average.</p>
      <p id="d1e1957">We then develop a regression model to predict shallow precipitation as a
function of these five domain-mean features using the scikit-learn random
forest implementation within ESEm. After validating the model using
leave-one-out cross-validation, we then retrain the model using the full
dataset and use this model to predict the precipitation across a wide range
of environmental values.</p>
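      <p>The leave-one-out validation step can be sketched as follows. To keep the example dependency-free, a k-nearest-neighbour regressor stands in for the random forest, and the data are synthetic; all function names here are hypothetical and are neither the ESEm nor the scikit-learn API.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Stand-in regressor (NOT a random forest): mean target of the k nearest points."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def leave_one_out(xs, ys, predict):
    """Predict each point from a model that has seen every other point."""
    preds = []
    for i in range(len(xs)):
        preds.append(predict(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], xs[i]))
    return preds


def r2_and_rmse(truth, preds):
    """Coefficient of determination and root-mean-square error."""
    mean = sum(truth) / len(truth)
    ss_res = sum((t - p) ** 2 for t, p in zip(truth, preds))
    ss_tot = sum((t - mean) ** 2 for t in truth)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / len(truth))


# Synthetic data: two made-up predictors and a smooth response
xs = [(i / 10.0, j / 10.0) for i in range(8) for j in range(8)]
ys = [x0 + 0.5 * x1 for x0, x1 in xs]
preds = leave_one_out(xs, ys, knn_predict)
r2, rmse = r2_and_rmse(ys, preds)
```
]]></preformat>
      <p>Because every prediction is made for a point withheld from training, the resulting scores measure out-of-sample skill rather than goodness of fit.</p>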
      <p id="d1e1961">Finally, for the purpose of plotting, we reduce the dimensionality of our
final prediction by averaging over all features excluding LWP and <inline-formula><mml:math id="M81" display="inline"><mml:mrow><mml:msub><mml:mi>z</mml:mi><mml:mn mathvariant="normal">700</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>
and then plot in LWP-<inline-formula><mml:math id="M82" display="inline"><mml:mrow><mml:msub><mml:mi>z</mml:mi><mml:mn mathvariant="normal">700</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> space. This allows us to effectively account
for (or marginalize out) those other environmental factors and investigate
the sensitivity of precipitation to LWP for a given <inline-formula><mml:math id="M83" display="inline"><mml:mrow><mml:msub><mml:mi>z</mml:mi><mml:mn mathvariant="normal">700</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>, as shown in
Fig. 4. LWP and <inline-formula><mml:math id="M84" display="inline"><mml:mrow><mml:msub><mml:mi>z</mml:mi><mml:mn mathvariant="normal">700</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> were chosen for plotting purposes as they are
mutually uncorrelated and thus span the two-dimensional space effectively.</p>
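      <p>One way to carry out this averaging is sketched below: for each point in the two-dimensional plotting grid, the two retained features are held fixed and the prediction is averaged over the observed values of the remaining features. This is an assumption about the procedure rather than a description of the ESEm code; the toy emulator, the feature ordering and the sample values are purely illustrative.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def marginal_prediction(predict, samples, keep, grid_point):
    """Average predict() over the samples with the `keep` features fixed at grid_point."""
    total = 0.0
    for sample in samples:
        features = list(sample)
        for index, value in zip(keep, grid_point):
            features[index] = value  # overwrite the retained features
        total += predict(features)
    return total / len(samples)


def toy_emulator(x):
    """Hypothetical stand-in for a trained five-feature emulator."""
    return 2.0 * x[0] - x[1] + x[2] * x[3] + x[4]


# Two made-up samples; columns 0 and 1 play the role of the retained features
samples = [(0.1, 3.0, 1.0, 2.0, 0.5), (0.2, 3.1, 3.0, 2.0, -0.5)]
value = marginal_prediction(toy_emulator, samples, keep=(0, 1), grid_point=(1.0, 2.0))
```
]]></preformat>
      <p>Repeating this calculation over a grid of values for the two retained features gives a two-dimensional response surface of the kind used for plotting.</p>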

      <?xmltex \floatpos{t}?><fig id="Ch1.F4" specific-use="star"><?xmltex \currentcnt{4}?><?xmltex \def\figurename{Figure}?><label>Figure 4</label><caption><p id="d1e2010">The mean precipitation emulated by ESEm using a random forest
model trained on the five environmental factors diagnosed from an ensemble
of cloud-resolving models as described in the text. Panel <bold>(a)</bold> shows a
validation plot of the emulated precipitation values against the model
values using leave-one-out cross-validation. Panel <bold>(b)</bold> shows the emulated
precipitation plotted as a function of liquid water path and geopotential
height at 700 hPa (<inline-formula><mml:math id="M85" display="inline"><mml:mrow><mml:msub><mml:mi>z</mml:mi><mml:mn mathvariant="normal">700</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>) by averaging over the
remaining three dimensions corresponding to SST, EIS and
<inline-formula><mml:math id="M86" display="inline"><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mn mathvariant="normal">700</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>. A random subset of 140 of the training
points are also shown overlaid on the emulated precipitation as scatter
points, with the scatter outlines showing the relative error between the
emulator and the training data.</p></caption>
          <?xmltex \igopts{width=412.564961pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/7659/2021/gmd-14-7659-2021-f04.png"/>

        </fig>

      <p id="d1e2047">Figure 4a illustrates how the random forest regression model can capture
most of the variance in shallow precipitation from the cloud-resolving
simulations, with an <inline-formula><mml:math id="M87" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> of 0.81 and a root-mean-square error (RMSE) of
0.01 mm/h. Additionally, the model captures basic physical features
such as the non-negativity of precipitation without requiring additional
constraints. The coloured surface in Fig. 4b shows the two-dimensional
truncation of the model predictions after averaging over all features except
LWP and <inline-formula><mml:math id="M88" display="inline"><mml:mrow><mml:msub><mml:mi>z</mml:mi><mml:mn mathvariant="normal">700</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> and shows that the model is behaving physically by
predicting an increase in precipitation at larger LWP and lower <inline-formula><mml:math id="M89" display="inline"><mml:mrow><mml:msub><mml:mi>z</mml:mi><mml:mn mathvariant="normal">700</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>.</p>
      <p id="d1e2083">While emulators have previously been used to investigate the behaviour of
shallow cloud fields in high-resolution models (e.g. using GPs, Glassmeier
et al., 2019), this example demonstrates that random forests are another
promising approach, particularly because their predictions remain within the range of the training data when extrapolating.</p>
</sec>
<sec id="Ch1.S5.SS2">
  <label>5.2</label><title>Exploring CMIP6 scenario uncertainty</title>
      <p id="d1e2094">The sixth Coupled Model Intercomparison Project (CMIP6; Eyring et al.,
2016) coordinates a large number of formal model intercomparison projects
(MIPs), including ScenarioMIP (O'Neill et al., 2016), which explored the
climate response to a range of future emissions scenarios. While internal
variability and model uncertainty can dominate the uncertainties in future
temperature responses to these future emissions scenarios over the next
30–40 years, uncertainty in the scenarios themselves dominates the total
uncertainty by the end of the century (Hawkins and Sutton, 2009;
Watson-Parris, 2021). Efficiently exploring this uncertainty can be useful
for policy makers to understand the full range of temperature responses to
different mitigation policies. While simple climate models are typically
used for this purpose (e.g., Smith et al., 2018; Geoffroy et al., 2013),
statistical emulators can also be of use.</p>
      <p id="d1e2097">Here we provide a simple example of emulating the global mean surface
temperature response to a change in CO<inline-formula><mml:math id="M90" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> concentration and aerosol
loading. For these purposes we consider a change in aerosol optical depth
(AOD) and the cumulative emissions of CO<inline-formula><mml:math id="M91" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> as compared to the start of
the ScenarioMIP simulations (averaged over 2015–2020). We use the global
mean AOD and cumulative CO<inline-formula><mml:math id="M92" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> at 2050 and 2100 for each model (11
models were used in this example: CanESM5, ACCESS-ESM1-5, ACCESS-CM2,
MPI-ESM1-2-HR, MIROC-ES2L, HadGEM3-GC31-LL, UKESM1-0-LL, MPI-ESM1-2-LR,
CESM2, CESM2-WACCM and NorESM2-LM) across six scenarios (SSP119,
SSP126, SSP245, SSP370, SSP585 and SSP434). Where a model submitted
multiple ensemble members, their mean was taken to reduce
internal variability. As shown in Fig. 5, a simple Gaussian process
regression model is able to fit the resulting temperature change well across
the range of training data. We can see that the emulator uncertainty
increases away from the CMIP6 model values as expected and largely reflects
the inter-model spread within the range of scenarios explored here.</p>
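      <p>The growth of the emulator uncertainty away from the training data can be reproduced with a very small Gaussian process sketch. It uses a single input dimension, a fixed radial basis function kernel and made-up training values, whereas the example in the text uses two inputs; it is written in pure Python for illustration and does not use the ESEm or GPflow interfaces.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def rbf(a, b, length=1.0, variance=1.0):
    """Radial basis function (squared exponential) kernel."""
    return variance * math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def gp_predict(train_x, train_y, query, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at one query point."""
    k_train = [[rbf(a, b) + (noise if i == j else 0.0)
                for j, b in enumerate(train_x)] for i, a in enumerate(train_x)]
    k_query = [rbf(a, query) for a in train_x]
    alpha = solve(k_train, train_y)
    weights = solve(k_train, k_query)
    mean = sum(k * a for k, a in zip(k_query, alpha))
    var = rbf(query, query) - sum(k * w for k, w in zip(k_query, weights))
    return mean, math.sqrt(max(var, 0.0))


# Made-up training data (not CMIP6 values)
train_x = [0.0, 1.0, 2.0]
train_y = [0.0, 0.8, 0.9]
near_mean, near_std = gp_predict(train_x, train_y, 1.0)  # at a training point
far_mean, far_std = gp_predict(train_x, train_y, 6.0)    # far from the data
```
]]></preformat>
      <p>At the training point the predictive standard deviation collapses towards the assumed noise level, whereas far from the data it returns to the prior standard deviation of the kernel.</p>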

      <?xmltex \floatpos{t}?><fig id="Ch1.F5"><?xmltex \currentcnt{5}?><?xmltex \def\figurename{Figure}?><label>Figure 5</label><caption><p id="d1e2129">Global mean surface temperature response to a change in aerosol
optical depth (AOD) or cumulative atmospheric CO<inline-formula><mml:math id="M93" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> concentration
relative to the 2015–2020 average as emulated by ESEm using Gaussian process
regression trained on CMIP6 ScenarioMIP outputs (shown as circles; the
multi-model mean for each scenario is shown as a square). The contour
lines represent the 1<inline-formula><mml:math id="M94" display="inline"><mml:mi mathvariant="italic">σ</mml:mi></mml:math></inline-formula> uncertainty in the emulator values (in
Kelvin).</p></caption>
          <?xmltex \igopts{width=241.848425pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/7659/2021/gmd-14-7659-2021-f05.png"/>

        </fig>

      <p id="d1e2155">Using an MCMC sampler, we are able to generate a joint probability
distribution for the change in AOD and CO<inline-formula><mml:math id="M95" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> required to meet
a 2.0<inline-formula><mml:math id="M96" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C temperature rise since pre-industrial times, as shown in Fig. 6 (assuming the present-day simulations start at <inline-formula><mml:math id="M97" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula>0.8<inline-formula><mml:math id="M98" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C for
simplicity). The effect of a decrease in (cooling) aerosol on the remaining
carbon budget for a given temperature target is clear. It should be noted
though that the short lifetime of aerosol means that while aerosol emissions
can affect the year of crossing a certain temperature threshold, stabilizing
at that temperature requires net-zero emissions of CO<inline-formula><mml:math id="M99" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> regardless of
the aerosol.</p>
      <p id="d1e2201">While more physically interpretable emulators are appropriate for such
important estimates, the advantage these statistical emulators have over simple impulse response models, for example, is the ability to generalize to
high-dimensional outputs, such as those shown in Fig. 2 (see also, e.g.
Mansfield et al., 2020). They can also account for the full complexity of
Earth system models and the many processes they represent. This is
straightforward to achieve with ESEm.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F6"><?xmltex \currentcnt{6}?><?xmltex \def\figurename{Figure}?><label>Figure 6</label><caption><p id="d1e2206">The joint probability distribution for a change in aerosol optical
depth (AOD) or cumulative atmospheric CO<inline-formula><mml:math id="M100" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> concentration relative to the
2015–2020 average compatible with a change of 1.2 K global mean surface
temperature (approximately 2<inline-formula><mml:math id="M101" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C above pre-industrial
temperatures) as sampled from a Gaussian process emulator using MCMC
accounting for emulator uncertainties. The solid black line corresponds to a
change of 1.2 K by interpolating the emulator surface shown in Fig. 5.</p></caption>
          <?xmltex \igopts{width=241.848425pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/7659/2021/gmd-14-7659-2021-f06.png"/>

        </fig>

</sec>
</sec>
<?pagebreak page7668?><sec id="Ch1.S6" sec-type="conclusions">
  <label>6</label><title>Conclusions</title>
      <?pagebreak page7669?><p id="d1e2242">We present ESEm – a Python library for easily emulating and calibrating
Earth system models. Combined with the popular geospatial libraries Iris and
CIS, ESEm makes reading, collocating and emulating a variety of model and
Earth system data straightforward. The package includes Gaussian process,
neural network and random forest emulation engines, together with tools for
validating these emulators, and a minimal, clearly defined interface allows for
simple extension. ESEm also includes three popular techniques for calibration
(or inference), optimized using TensorFlow to enable efficient sampling of
the emulators. By building on fast and robust libraries in a modular way we
hope to provide a framework for a variety of common workflows.
<?xmltex \hack{\newpage}?>
We have demonstrated the use of ESEm for model parameter constraint and
optimal estimation with a simple perturbed parameter ensemble example. We
have also shown how ESEm can be used to fit high-dimensional response
surfaces over an ensemble of cloud-resolving model simulations in order to
determine the sensitivity of precipitation to environmental parameters in
these simulations. Such approaches can also be useful in marginalizing over
potentially confounding variables in observational data. Finally, we
presented the use of ESEm for the emulation of the multi-model CMIP6
ensemble in order to explore the global mean temperature response to changes
in aerosol loading and CO<inline-formula><mml:math id="M102" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> concentration between the handful of
prescribed scenarios available in ScenarioMIP.</p>
      <p id="d1e2256">There are many opportunities to build on this framework, to introduce the
latest inference techniques (Brehmer et al., 2020) and to bring
parameter estimation closer to the large body of work in data
assimilation. While this has historically focussed on improving estimates of
time-varying boundary conditions (the model “state”), recent work has
explored using these approaches to concurrently estimate constant model
parameters (Brajard et al., 2020). We hope this tool will provide a useful
framework with which to explore such ideas.</p>
      <p id="d1e2259">We strive to ensure reliability in the library through the use of automated
unit tests and coverage metrics. We also provide comprehensive documentation
and a number of example notebooks to ensure usability and accessibility.
Through the use of a number of worked examples we hope also to have shed
some light on this at times seemingly mysterious sub-field.</p>
</sec>

      
      </body>
    <back><notes notes-type="codeavailability"><title>Code availability</title>

      <p id="d1e2266">The ESEm code, including that used to generate the plots in this paper, is
available here: <ext-link xlink:href="https://doi.org/10.5281/zenodo.5466563" ext-link-type="DOI">10.5281/zenodo.5466563</ext-link> (Watson-Parris et al., 2021).</p>
  </notes><notes notes-type="dataavailability"><title>Data availability</title>

      <p id="d1e2275">The BC PPE data are available here: <ext-link xlink:href="https://doi.org/10.5281/zenodo.3856645" ext-link-type="DOI">10.5281/zenodo.3856645</ext-link> (Watson-Parris and Deaconu, 2020). The ensemble CRM data are available
here: <ext-link xlink:href="https://doi.org/10.5281/zenodo.3785603" ext-link-type="DOI">10.5281/zenodo.3785603</ext-link> (Dagan and Stier, 2020b). The raw CMIP6 data
used here are available through the Earth System Grid Federation and can be
accessed through different international nodes, e.g. <uri>https://esgf-index1.ceda.ac.uk/search/cmip6-ceda/</uri> (last access: 16 July 2021). The derived dataset is
available in the ESEm repository: <ext-link xlink:href="https://doi.org/10.5281/zenodo.5466563" ext-link-type="DOI">10.5281/zenodo.5466563</ext-link> (Watson-Parris et al., 2021).</p>
  </notes><notes notes-type="authorcontribution"><title>Author contributions</title>

      <p id="d1e2293">DWP designed the package and led its development. AW contributed the
precipitation example and random forest module. LD provided the AAOD example and
dataset. PS provided supervision and funding acquisition. DWP prepared the
manuscript with contributions from all co-authors.</p>
  </notes><notes notes-type="competinginterests"><title>Competing interests</title>

      <p id="d1e2299">The contact author has declared that neither they nor their co-authors have any competing interests.</p>
  </notes><notes notes-type="disclaimer"><title>Disclaimer</title>

      <p id="d1e2305">While every effort has been made to make this tool easy to use and
generally applicable, the example models provided make many implicit (and
explicit) assumptions about the functional form and statistical properties
of the data being modelled. Like any tool, the ESEm framework can be
misused. Users should familiarize themselves with the models being used and
consult the many excellent textbooks on this subject if in any doubt as to
their appropriateness for the task at hand. <?xmltex \hack{\newline}?><?xmltex \hack{\newline}?>
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.</p>
  </notes><ack><title>Acknowledgements</title><p id="d1e2314">This work has evolved through numerous projects in collaboration with Ken
Carslaw, Lindsay Lee, Leighton Regayre and Jill Johnson, and we thank them
for sharing their insights and their R scripts, which inspired this
package. Those previous collaborations were funded by Natural Environment
Research Council (NERC) grants NE/G006148/1 (AEROS), NE/J024252/1 (GASSP)
and NE/P013406/1 (A-CURE), which we gratefully acknowledge.</p><p id="d1e2316">For this work specifically, Duncan Watson-Parris and Philip Stier acknowledge funding from NERC
projects NE/P013406/1 (A-CURE) and NE/S005390/1 (ACRUISE), as well as from
the European Union's Horizon 2020 research and innovation programme iMIRACLI
under Marie Skłodowska-Curie grant agreement no. 860100. Philip Stier additionally
acknowledges support from the ERC project RECAP and the FORCeS project under
the European Union's Horizon 2020 research programme with grant agreement nos. 724602 and 821205. Andrew Williams acknowledges funding from the Natural Environment
Research Council, Oxford DTP, award NE/S007474/1. Lucia Deaconu acknowledges funding
from NERC project NE/P013406/1 (A-CURE).</p><p id="d1e2318">The authors also gratefully acknowledge useful discussions with Dino
Sejdinovic, Shahine Bouabid and Daniel Partridge, as well as the support of
Amazon Web Services through an AWS Machine Learning Research Award and
NVIDIA through a GPU research grant. We further thank Victoria Volodina and
one anonymous reviewer for their thorough and considered feedback that
helped improve this paper.</p></ack><notes notes-type="financialsupport"><title>Financial support</title>

      <p id="d1e2323">This research has been supported by the Natural Environment Research Council (grant nos. NE/P013406/1, NE/S005390/1, and NE/S007474/1), the European Research Council, H2020 European Research Council (RECAP, grant no.  724602; and FORCeS, grant no. 821205), and the H2020 Marie Skłodowska-Curie Action (grant no. 860100).</p>
  </notes><notes notes-type="reviewstatement"><title>Review statement</title>

      <p id="d1e2330">This paper was edited by Steven Phipps and reviewed by Victoria Volodina and one anonymous referee.</p>
  </notes><ref-list>
    <title>References</title>

      <ref id="bib1.bib1"><label>1</label><?label 1?><mixed-citation>Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C.,
Corrado, G. S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow,
I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L.,
Kudlur, M., Levenberg, J., Mane, D., Monga, R., Moore, S., Murray, D., Olah,
C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K.,
Tucker, P., Vanhoucke, V., Vasudevan, V., Viegas, F., Vinyals, O., Warden,
P., Wattenberg, M., Wicke, M., Yu, Y., and Zheng, X.: TensorFlow:
<?pagebreak page7670?>Large-Scale Machine Learning on Heterogeneous Distributed Systems, arXiv [preprint], <ext-link xlink:href="https://arxiv.org/abs/1603.04467">arXiv:1603.04467</ext-link>,
2016.</mixed-citation></ref>
      <ref id="bib1.bib2"><label>2</label><?label 1?><mixed-citation>Akaike, H.: A new look at the statistical model identification, IEEE T.
Automat. Contr., 19, 716–723, <ext-link xlink:href="https://doi.org/10.1109/tac.1974.1100705" ext-link-type="DOI">10.1109/tac.1974.1100705</ext-link>, 1974.</mixed-citation></ref>
      <ref id="bib1.bib3"><label>3</label><?label 1?><mixed-citation>Bellouin, N., Quaas, J., Gryspeerdt, E., Kinne, S., Stier, P.,
Watson-Parris, D., Boucher, O., Carslaw, K. S., Christensen, M., Daniau, A.-L.,
Dufresne, J.-L., Feingold, G., Fiedler, S., Forster, P., Gettelman,
A., Haywood, J. M., Lohmann, U., Malavelle, F., Mauritsen, T., McCoy, D. T.,
Myhre, G., Mülmenstädt, J., Neubauer, D., Possner, A., Rugenstein,
M., Sato, Y., Schulz, M., Schwartz, S. E., Sourdeval, O., Storelvmo, T.,
Toll, V., Winker, D., and Stevens, B.: Bounding Global Aerosol Radiative
Forcing of Climate Change, Rev. Geophys., 58, e2019RG000660, <ext-link xlink:href="https://doi.org/10.1029/2019RG000660" ext-link-type="DOI">10.1029/2019RG000660</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bib4"><label>4</label><?label 1?><mixed-citation>Beucler, T., Pritchard, M., Rasp, S., Ott, J., Baldi, P., and Gentine, P.:
Enforcing Analytic Constraints in Neural Networks Emulating Physical
Systems, Phys. Rev. Lett., 126, 098302,
<ext-link xlink:href="https://doi.org/10.1103/physrevlett.126.098302" ext-link-type="DOI">10.1103/physrevlett.126.098302</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib5"><label>5</label><?label 1?><mixed-citation>Brajard, J., Carrassi, A., Bocquet, M., and Bertino, L.: Combining data
assimilation and machine learning to emulate a dynamical model from sparse
and noisy observations: A case study with the Lorenz 96 model, J. Comput.
Sci.-Neth., 44, 101171, <ext-link xlink:href="https://doi.org/10.1016/j.jocs.2020.101171" ext-link-type="DOI">10.1016/j.jocs.2020.101171</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bib6"><label>6</label><?label 1?><mixed-citation>Brehmer, J., Louppe, G., Pavez, J., and Cranmer, K.: Mining gold from
implicit models to improve likelihood-free inference, P. Natl. Acad. Sci. USA, 117, 5242–5249,
<ext-link xlink:href="https://doi.org/10.1073/pnas.1915980117" ext-link-type="DOI">10.1073/pnas.1915980117</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bib7"><label>7</label><?label 1?><mixed-citation>Breiman, L.: Random Forests, Mach. Learn., 45, 5–32,
<ext-link xlink:href="https://doi.org/10.1023/a:1010933404324" ext-link-type="DOI">10.1023/a:1010933404324</ext-link>, 2001.</mixed-citation></ref>
      <ref id="bib1.bib8"><label>8</label><?label 1?><mixed-citation>Burt, D. R., Rasmussen, C. E., and  van der Wilk, M.: Rates of Convergence
for Sparse Variational Gaussian Process Regression, arXiv [preprint], <ext-link xlink:href="https://arxiv.org/abs/1903.03571">arXiv:1903.03571</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bib9"><label>9</label><?label 1?><mixed-citation>Chollet, F.: Keras, available at: <uri>https://keras.io</uri> (last access: 12 September 2021), 2015.</mixed-citation></ref>
      <ref id="bib1.bib10"><label>10</label><?label 1?><mixed-citation>Cleary, E., Garbuno-Inigo, A., Lan, S., Schneider, T., and Stuart, A. M.:
Calibrate, emulate, sample, J. Comput. Phys., 424, 109716,
<ext-link xlink:href="https://doi.org/10.1016/j.jcp.2020.109716" ext-link-type="DOI">10.1016/j.jcp.2020.109716</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib11"><label>11</label><?label 1?><mixed-citation>Couvreux, F., Hourdin, F., Williamson, D., Roehrig, R., Volodina, V.,
Villefranque, N., Rio, C., Audouin, O., Salter, J., Bazile, E., Brient, F.,
Favot, F., Honnert, R., Lefebvre, M., Madeleine, J., Rodier, Q., and Xu, W.:
Process-Based Climate Model Development Harnessing Machine Learning: I. A
Calibration Tool for Parameterization Improvement, J. Adv. Model. Earth Sy., 13, e2020MS002217,
<ext-link xlink:href="https://doi.org/10.1029/2020ms002217" ext-link-type="DOI">10.1029/2020ms002217</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib12"><label>12</label><?label 1?><mixed-citation>Craig, P. S., Goldstein, M., Seheult, A. H., and Smith, J. A.: Bayes linear
strategies for history matching of hydrocarbon reservoirs, in: Bayesian
Statistics, vol. 5, edited by: Bernardo, J. M., Berger, J. O., Dawid, A. P.,
and Smith, A. F. M., Clarendon Press, Oxford, UK, 69–95, 1996.</mixed-citation></ref>
      <ref id="bib1.bib13"><label>13</label><?label 1?><mixed-citation>Cranmer, K., Brehmer, J., and Louppe, G.: The frontier of simulation-based
inference, P. Natl. Acad. Sci. USA, 117, 30055–30062,
<ext-link xlink:href="https://doi.org/10.1073/pnas.1912789117" ext-link-type="DOI">10.1073/pnas.1912789117</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bib14"><label>14</label><?label 1?><mixed-citation>Dagan, G. and Stier, P.: Ensemble daily simulations for elucidating cloud–aerosol interactions under a large spread of realistic environmental conditions, Atmos. Chem. Phys., 20, 6291–6303, <ext-link xlink:href="https://doi.org/10.5194/acp-20-6291-2020" ext-link-type="DOI">10.5194/acp-20-6291-2020</ext-link>, 2020a.</mixed-citation></ref>
      <ref id="bib1.bib15"><label>15</label><?label 1?><mixed-citation>Dagan, G. and Stier, P.: Data of the paper: Ensemble daily simulations for elucidating cloud–aerosol interactions under a large spread of realistic environmental conditions, Zenodo [data set], <ext-link xlink:href="https://doi.org/10.5281/zenodo.3785603" ext-link-type="DOI">10.5281/zenodo.3785603</ext-link>, 2020b.</mixed-citation></ref>
      <ref id="bib1.bib16"><label>16</label><?label 1?><mixed-citation>Dagon, K., Sanderson, B. M., Fisher, R. A., and Lawrence, D. M.: A machine learning approach to emulation and biophysical parameter estimation with the Community Land Model, version 5, Adv. Stat. Clim. Meteorol. Oceanogr., 6, 223–244, <ext-link xlink:href="https://doi.org/10.5194/ascmo-6-223-2020" ext-link-type="DOI">10.5194/ascmo-6-223-2020</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bib17"><label>17</label><?label 1?><mixed-citation>Damianou, A. C. and Lawrence, N. D.: Deep Gaussian Processes, in: Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, PMLR Proceedings of Machine Learning Research, Scottsdale, Arizona, USA, 207–215, 2013.</mixed-citation></ref>
      <ref id="bib1.bib18"><label>18</label><?label 1?><mixed-citation>Dawson, A.: eofs: A Library for EOF Analysis of Meteorological,
Oceanographic, and Climate Data, J. Open Res. Softw., 4, e14,
<ext-link xlink:href="https://doi.org/10.5334/jors.122" ext-link-type="DOI">10.5334/jors.122</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bib19"><label>19</label><?label 1?><mixed-citation>Dubovik, O. and King, M. D.: A flexible inversion algorithm for retrieval of
aerosol optical properties from Sun and sky radiance measurements, J. Geophys.
Res.-Atmos., 105, 20673–20696, <ext-link xlink:href="https://doi.org/10.1029/2000jd900282" ext-link-type="DOI">10.1029/2000jd900282</ext-link>,
2000.</mixed-citation></ref>
      <ref id="bib1.bib20"><label>20</label><?label 1?><mixed-citation>Duvenaud, D.: Automatic model construction with Gaussian processes, Doctoral thesis, <ext-link xlink:href="https://doi.org/10.17863/CAM.14087" ext-link-type="DOI">10.17863/CAM.14087</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bib21"><label>21</label><?label 1?><mixed-citation>Eyring, V., Bony, S., Meehl, G. A., Senior, C. A., Stevens, B., Stouffer, R. J., and Taylor, K. E.: Overview of the Coupled Model Intercomparison Project Phase 6 (CMIP6) experimental design and organization, Geosci. Model Dev., 9, 1937–1958, <ext-link xlink:href="https://doi.org/10.5194/gmd-9-1937-2016" ext-link-type="DOI">10.5194/gmd-9-1937-2016</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bib22"><label>22</label><?label 1?><mixed-citation>Fearnhead, P. and Prangle, D.: Constructing summary statistics for
approximate Bayesian computation: semi-automatic approximate Bayesian
computation, J. Roy. Stat. Soc. Ser. B, 74,
419–474, <ext-link xlink:href="https://doi.org/10.1111/j.1467-9868.2011.01010.x" ext-link-type="DOI">10.1111/j.1467-9868.2011.01010.x</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bib23"><label>23</label><?label 1?><mixed-citation>Gal, Y. and Ghahramani, Z.: Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, in: Proceedings of The 33rd International Conference on Machine Learning, PMLR Proceedings of Machine Learning Research, New York, New York, USA, 1050–1059, 2015.</mixed-citation></ref>
      <ref id="bib1.bib24"><label>24</label><?label 1?><mixed-citation>Geoffroy, O., Saint-Martin, D., Olivié, D. J. L., Voldoire, A., Bellon,
G., and Tytéca, S.: Transient Climate Response in a Two-Layer
Energy-Balance Model. Part I: Analytical Solution and Parameter Calibration
Using CMIP5 AOGCM Experiments, J. Climate, 26, 1841–1857,
<ext-link xlink:href="https://doi.org/10.1175/jcli-d-12-00195.1" ext-link-type="DOI">10.1175/jcli-d-12-00195.1</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bib25"><label>25</label><?label 1?><mixed-citation>Glassmeier, F., Hoffmann, F., Johnson, J. S., Yamaguchi, T., Carslaw, K. S., and Feingold, G.: An emulator approach to stratocumulus susceptibility, Atmos. Chem. Phys., 19, 10191–10203, <ext-link xlink:href="https://doi.org/10.5194/acp-19-10191-2019" ext-link-type="DOI">10.5194/acp-19-10191-2019</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bib26"><label>26</label><?label 1?><mixed-citation>GPy: GPy: A Gaussian process framework in python, available at: <uri>http://github.com/SheffieldML/GPy</uri> (last access: 1 August 2021), 2012.</mixed-citation></ref>
      <ref id="bib1.bib27"><label>27</label><?label 1?><mixed-citation>Hawkins, E. and Sutton, R.: The Potential to Narrow Uncertainty in Regional
Climate Predictions, B. Am. Meteorol. Soc., 90, 1095–1108,
<ext-link xlink:href="https://doi.org/10.1175/2009bams2607.1" ext-link-type="DOI">10.1175/2009bams2607.1</ext-link>, 2009.</mixed-citation></ref>
      <ref id="bib1.bib28"><label>28</label><?label 1?><mixed-citation>Henze, D. K., Hakami, A., and Seinfeld, J. H.: Development of the adjoint of GEOS-Chem, Atmos. Chem. Phys., 7, 2413–2433, <ext-link xlink:href="https://doi.org/10.5194/acp-7-2413-2007" ext-link-type="DOI">10.5194/acp-7-2413-2007</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bib29"><label>29</label><?label 1?><mixed-citation>Holben, B. N., Eck, T. F., Slutsker, I., Tanré, D., Buis, J. P., Setzer,
A., Vermote, E., Reagan, J. A., Kaufman, Y. J., Nakajima, T., Lavenu, F.,
Jankowiak, I., and Smirnov, A.: AERONET – A Federated Instrument Network and
Data Archive <?pagebreak page7671?>for Aerosol Characterization, Remote Sens. Environ., 66, 1–16,
<ext-link xlink:href="https://doi.org/10.1016/s0034-4257(98)00031-5" ext-link-type="DOI">10.1016/s0034-4257(98)00031-5</ext-link>, 1998.</mixed-citation></ref>
      <ref id="bib1.bib30"><label>30</label><?label 1?><mixed-citation>Holden, P. B., Edwards, N. R., Garthwaite, P. H., Fraedrich, K., Lunkeit, F., Kirk, E., Labriet, M., Kanudia, A., and Babonneau, F.: PLASIM-ENTSem v1.0: a spatio-temporal emulator of future climate change for impacts assessment, Geosci. Model Dev., 7, 433–451, <ext-link xlink:href="https://doi.org/10.5194/gmd-7-433-2014" ext-link-type="DOI">10.5194/gmd-7-433-2014</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bib31"><label>31</label><?label 1?><mixed-citation>Holden, P. B., Edwards, N. R., Hensman, J., and Wilkinson, R. D.: ABC for
climate: dealing with expensive simulators, arXiv [preprint], <ext-link xlink:href="https://arxiv.org/abs/1511.03475">arXiv:1511.03475</ext-link>, 2015a.</mixed-citation></ref>
      <ref id="bib1.bib32"><label>32</label><?label 1?><mixed-citation>Holden, P. B., Edwards, N. R., Garthwaite, P. H., and Wilkinson, R. D.:
Emulation and interpretation of high-dimensional climate model outputs, J. Appl. Stat., 42, 2038–2055,
<ext-link xlink:href="https://doi.org/10.1080/02664763.2015.1016412" ext-link-type="DOI">10.1080/02664763.2015.1016412</ext-link>, 2015b.</mixed-citation></ref>
      <ref id="bib1.bib33"><label>33</label><?label 1?><mixed-citation>Holden, P. B., Edwards, N. R., Rangel, T. F., Pereira, E. B., Tran, G. T., and Wilkinson, R. D.: PALEO-PGEM v1.0: a statistical emulator of Pliocene–Pleistocene climate, Geosci. Model Dev., 12, 5137–5155, <ext-link xlink:href="https://doi.org/10.5194/gmd-12-5137-2019" ext-link-type="DOI">10.5194/gmd-12-5137-2019</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bib34"><label>34</label><?label 1?><mixed-citation>Hou, Z. and Rubin, Y.: On minimum relative entropy concepts and prior
compatibility issues in vadose zone inverse and forward modeling, Water
Resour. Res., 41, W12425, <ext-link xlink:href="https://doi.org/10.1029/2005wr004082" ext-link-type="DOI">10.1029/2005wr004082</ext-link>, 2005.</mixed-citation></ref>
      <ref id="bib1.bib35"><label>35</label><?label 1?><mixed-citation>Hoyer, S. and Hamman, J.: xarray: N-D labeled Arrays and Datasets in Python,
J. Open Res. Softw., 5, 10, <ext-link xlink:href="https://doi.org/10.5334/jors.148" ext-link-type="DOI">10.5334/jors.148</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bib36"><label>36</label><?label 1?><mixed-citation>Johnson, J. S., Regayre, L. A., Yoshioka, M., Pringle, K. J., Turnock, S. T., Browse, J., Sexton, D. M. H., Rostron, J. W., Schutgens, N. A. J., Partridge, D. G., Liu, D., Allan, J. D., Coe, H., Ding, A., Cohen, D. D., Atanacio, A., Vakkari, V., Asmi, E., and Carslaw, K. S.: Robust observational constraint of uncertain aerosol processes and emissions in a climate model and the effect on aerosol radiative forcing, Atmos. Chem. Phys., 20, 9491–9524, <ext-link xlink:href="https://doi.org/10.5194/acp-20-9491-2020" ext-link-type="DOI">10.5194/acp-20-9491-2020</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bib37"><label>37</label><?label 1?><mixed-citation>Karydis, V. A., Capps, S. L., Russell, A. G., and Nenes, A.: Adjoint sensitivity of global cloud droplet number to aerosol and dynamical parameters, Atmos. Chem. Phys., 12, 9041–9055, <ext-link xlink:href="https://doi.org/10.5194/acp-12-9041-2012" ext-link-type="DOI">10.5194/acp-12-9041-2012</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bib38"><label>38</label><?label 1?><mixed-citation>Kennedy, M. C. and O'Hagan, A.: Bayesian calibration of computer models, J.
Roy. Stat. Soc. Ser. B, 63, 425–464,
<ext-link xlink:href="https://doi.org/10.1111/1467-9868.00294" ext-link-type="DOI">10.1111/1467-9868.00294</ext-link>, 2001.</mixed-citation></ref>
      <ref id="bib1.bib39"><label>39</label><?label 1?><mixed-citation>Knudde, N., van der Herten, J., Dhaene, T., and Couckuyt, I.: GPflowOpt: A
Bayesian Optimization Library using TensorFlow, arXiv [preprint], <ext-link xlink:href="https://arxiv.org/abs/1711.03845">arXiv:1711.03845</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bib40"><label>40</label><?label 1?><mixed-citation>Knutti, R., Meehl, G. A., Allen, M. R., and Stainforth, D. A.: Constraining
Climate Sensitivity from the Seasonal Cycle in Surface Temperature, J.
Climate, 19, 4224–4233, <ext-link xlink:href="https://doi.org/10.1175/jcli3865.1" ext-link-type="DOI">10.1175/jcli3865.1</ext-link>, 2006.</mixed-citation></ref>
      <ref id="bib1.bib41"><label>41</label><?label 1?><mixed-citation>Knutti, R., Masson, D., and Gettelman, A.: Climate model genealogy:
Generation CMIP5 and how we got there, Geophys. Res. Lett., 40, 1194–1199,
<ext-link xlink:href="https://doi.org/10.1002/grl.50256" ext-link-type="DOI">10.1002/grl.50256</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bib42"><label>42</label><?label 1?><mixed-citation>Krasnopolsky, V. M., Fox-Rabinovitz, M. S., and Chalikov, D. V.: New
Approach to Calculation of Atmospheric Model Physics: Accurate and Fast
Neural Network Emulation of Longwave Radiation in a Climate Model, Mon.
Weather Rev., 133, 1370–1383, <ext-link xlink:href="https://doi.org/10.1175/mwr2923.1" ext-link-type="DOI">10.1175/mwr2923.1</ext-link>, 2005.</mixed-citation></ref>
      <ref id="bib1.bib43"><label>43</label><?label 1?><mixed-citation>Lee, C., Martin, R. V., Donkelaar, A. van, Lee, H., Dickerson, R. R., Hains,
J. C., Krotkov, N., Richter, A., Vinnikov, K., and Schwab, J. J.: SO<inline-formula><mml:math id="M103" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>
emissions and lifetimes: Estimates from inverse modeling using in situ and
global, space-based (SCIAMACHY and OMI) observations, J. Geophys. Res.-Atmos., 116, D06304, <ext-link xlink:href="https://doi.org/10.1029/2010jd014758" ext-link-type="DOI">10.1029/2010jd014758</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bib44"><label>44</label><?label 1?><mixed-citation>Lee, L. A., Carslaw, K. S., Pringle, K. J., Mann, G. W., and Spracklen, D. V.: Emulation of a complex global aerosol model to quantify sensitivity to uncertain parameters, Atmos. Chem. Phys., 11, 12253–12273, <ext-link xlink:href="https://doi.org/10.5194/acp-11-12253-2011" ext-link-type="DOI">10.5194/acp-11-12253-2011</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bib45"><label>45</label><?label 1?><mixed-citation>Mansfield, L. A., Nowack, P. J., Kasoar, M., Everitt, R. G., Collins, W. J.,
and Voulgarakis, A.: Predicting global patterns of long-term climate change
from short-term simulations using machine learning, Npj Clim. Atmos.
Sci., 3, 44, <ext-link xlink:href="https://doi.org/10.1038/s41612-020-00148-5" ext-link-type="DOI">10.1038/s41612-020-00148-5</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bib46"><label>46</label><?label 1?><mixed-citation>Matheron, G.: Principles of geostatistics, Econ. Geol., 58, 1246–1266,
<ext-link xlink:href="https://doi.org/10.2113/gsecongeo.58.8.1246" ext-link-type="DOI">10.2113/gsecongeo.58.8.1246</ext-link>, 1963.</mixed-citation></ref>
      <ref id="bib1.bib47"><label>47</label><?label 1?><mixed-citation>Matthews, A. G. de G., van der Wilk, M., Nickson, T., Fujii, K.,
Boukouvalas, A., Leon-Villagra, P., Ghahramani, Z., and Hensman, J.:
GPflow: A Gaussian process library using TensorFlow, J. Mach. Learn. Res., 18, 1–6, 2017.</mixed-citation></ref>
      <ref id="bib1.bib48"><label>48</label><?label 1?><mixed-citation>Mauritsen, T., Stevens, B., Roeckner, E., Crueger, T., Esch, M., Giorgetta,
M., Haak, H., Jungclaus, J., Klocke, D., Matei, D., Mikolajewicz, U., Notz,
D., Pincus, R., Schmidt, H., and Tomassini, L.: Tuning the climate of a
global model, J. Adv. Model. Earth Sy., 4, M00A01,
<ext-link xlink:href="https://doi.org/10.1029/2012ms000154" ext-link-type="DOI">10.1029/2012ms000154</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bib49"><label>49</label><?label 1?><mixed-citation>Met Office: Cartopy: a cartographic python library with a Matplotlib
interface, available at: <uri>http://scitools.org.uk/cartopy</uri> (last access: 1 August 2021), 2020a.</mixed-citation></ref>
      <ref id="bib1.bib50"><label>50</label><?label 1?><mixed-citation>Met Office: Iris: A Python library for analysing and visualising
meteorological and oceanographic data sets, available at: <uri>https://scitools.org.uk/iris/</uri> (last access: 1 October 2021), 2020b.</mixed-citation></ref>
      <ref id="bib1.bib51"><label>51</label><?label 1?><mixed-citation>Morris, M. D. and Mitchell, T. J.: Exploratory designs for computational
experiments, J. Stat. Plan. Infer., 43, 381–402,
<ext-link xlink:href="https://doi.org/10.1016/0378-3758(94)00035-t" ext-link-type="DOI">10.1016/0378-3758(94)00035-t</ext-link>, 1995.</mixed-citation></ref>
      <ref id="bib1.bib52"><label>52</label><?label 1?><mixed-citation>Neal, R.: MCMC Using Hamiltonian Dynamics, in: Handbook of Markov Chain Monte Carlo, edited by: Brooks, S., Gelman, A., Jones, G., and Meng, X.-L., Chapman and Hall/CRC, New York, <ext-link xlink:href="https://doi.org/10.1201/b10905-6" ext-link-type="DOI">10.1201/b10905-6</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bib53"><label>53</label><?label 1?><mixed-citation>Neubauer, D., Ferrachat, S., Siegenthaler-Le Drian, C., Stier, P., Partridge, D. G., Tegen, I., Bey, I., Stanelle, T., Kokkola, H., and Lohmann, U.: The global aerosol–climate model ECHAM6.3–HAM2.3 – Part 2: Cloud evaluation, aerosol radiative forcing, and climate sensitivity, Geosci. Model Dev., 12, 3609–3639, <ext-link xlink:href="https://doi.org/10.5194/gmd-12-3609-2019" ext-link-type="DOI">10.5194/gmd-12-3609-2019</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bib54"><label>54</label><?label 1?><mixed-citation>O'Gorman, P. A. and Dwyer, J. G.: Using Machine Learning to Parameterize
Moist Convection: Potential for Modeling of Climate, Climate Change, and
Extreme Events, J. Adv. Model. Earth Sy., 10, 2548–2563,
<ext-link xlink:href="https://doi.org/10.1029/2018ms001351" ext-link-type="DOI">10.1029/2018ms001351</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bib55"><label>55</label><?label 1?><mixed-citation>O'Neill, B. C., Tebaldi, C., van Vuuren, D. P., Eyring, V., Friedlingstein, P., Hurtt, G., Knutti, R., Kriegler, E., Lamarque, J.-F., Lowe, J., Meehl, G. A., Moss, R., Riahi, K., and Sanderson, B. M.: The Scenario Model Intercomparison Project (ScenarioMIP) for CMIP6, Geosci. Model Dev., 9, 3461–3482, <ext-link xlink:href="https://doi.org/10.5194/gmd-9-3461-2016" ext-link-type="DOI">10.5194/gmd-9-3461-2016</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bib56"><label>56</label><?label 1?><mixed-citation>Partridge, D. G., Vrugt, J. A., Tunved, P., Ekman, A. M. L., Gorea, D., and Sorooshian, A.: Inverse modeling of cloud-aerosol interactions – Part 1: Detailed response surface analysis, Atmos. Chem. Phys., 11, 7269–7287, <ext-link xlink:href="https://doi.org/10.5194/acp-11-7269-2011" ext-link-type="DOI">10.5194/acp-11-7269-2011</ext-link>, 2011.</mixed-citation></ref>
      <?pagebreak page7672?><ref id="bib1.bib57"><label>57</label><?label 1?><mixed-citation>Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel,
O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J.,
Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E.:
Scikit-learn: Machine Learning in Python, J. Mach. Learn. Res., 12, 2825–2830, 2011.</mixed-citation></ref>
      <ref id="bib1.bib58"><label>58</label><?label 1?><mixed-citation>Prangle, D.: Summary Statistics, in: Handbook of Approximate Bayesian Computation, edited by: Sisson, S. A., Fan, Y., and Beaumont, M. A., Chapman and Hall/CRC, New York,
2018.</mixed-citation></ref>
      <ref id="bib1.bib59"><label>59</label><?label 1?><mixed-citation>Rasmussen, C. E. and Williams, C. K. I.: Gaussian Processes for Machine
Learning, The MIT Press, <ext-link xlink:href="https://doi.org/10.7551/mitpress/3206.001.0001" ext-link-type="DOI">10.7551/mitpress/3206.001.0001</ext-link>, 2005.</mixed-citation></ref>
      <ref id="bib1.bib60"><label>60</label><?label 1?><mixed-citation>Regayre, L. A., Johnson, J. S., Yoshioka, M., Pringle, K. J., Sexton, D. M. H., Booth, B. B. B., Lee, L. A., Bellouin, N., and Carslaw, K. S.: Aerosol and physical atmosphere model parameters are both important sources of uncertainty in aerosol ERF, Atmos. Chem. Phys., 18, 9975–10006, <ext-link xlink:href="https://doi.org/10.5194/acp-18-9975-2018" ext-link-type="DOI">10.5194/acp-18-9975-2018</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bib61"><label>61</label><?label 1?><mixed-citation>Ryan, E., Wild, O., Voulgarakis, A., and Lee, L.: Fast sensitivity analysis methods for computationally expensive models with multi-dimensional output, Geosci. Model Dev., 11, 3131–3146, <ext-link xlink:href="https://doi.org/10.5194/gmd-11-3131-2018" ext-link-type="DOI">10.5194/gmd-11-3131-2018</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bib62"><label>62</label><?label 1?><mixed-citation>Sacks, J., Welch, W. J., Mitchell, T. J., and Wynn, H. P.: Design and
Analysis of Computer Experiments, Stat. Sci., 4, 409–423,
<ext-link xlink:href="https://doi.org/10.1214/ss/1177012413" ext-link-type="DOI">10.1214/ss/1177012413</ext-link>, 1989.</mixed-citation></ref>
      <ref id="bib1.bib63"><label>63</label><?label 1?><mixed-citation>Schutgens, N., Tsyro, S., Gryspeerdt, E., Goto, D., Weigum, N., Schulz, M., and Stier, P.: On the spatio-temporal representativeness of observations, Atmos. Chem. Phys., 17, 9761–9780, <ext-link xlink:href="https://doi.org/10.5194/acp-17-9761-2017" ext-link-type="DOI">10.5194/acp-17-9761-2017</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bib64"><label>64</label><?label 1?><mixed-citation>Schutgens, N. A. J., Partridge, D. G., and Stier, P.: The importance of temporal collocation for the evaluation of aerosol models with observations, Atmos. Chem. Phys., 16, 1065–1079, <ext-link xlink:href="https://doi.org/10.5194/acp-16-1065-2016" ext-link-type="DOI">10.5194/acp-16-1065-2016</ext-link>, 2016a.</mixed-citation></ref>
      <ref id="bib1.bib65"><label>65</label><?label 1?><mixed-citation>Schutgens, N. A. J., Gryspeerdt, E., Weigum, N., Tsyro, S., Goto, D., Schulz, M., and Stier, P.: Will a perfect model agree with perfect observations? The impact of spatial sampling, Atmos. Chem. Phys., 16, 6335–6353, <ext-link xlink:href="https://doi.org/10.5194/acp-16-6335-2016" ext-link-type="DOI">10.5194/acp-16-6335-2016</ext-link>, 2016b.</mixed-citation></ref>
      <ref id="bib1.bib66"><label>66</label><?label 1?><mixed-citation>
Scott, R. C., Myers, T. A., Norris, J. R., Zelinka, M. D., Klein, S. A., Sun, M., and Doelling, D. R.: Observed Sensitivity of Low-Cloud Radiative Effects to Meteorological Perturbations over the Global Oceans, J. Climate, 33, 7717–7734, 2020.</mixed-citation></ref>
      <ref id="bib1.bib67"><label>67</label><?label 1?><mixed-citation>Sexton, D. M. H., Murphy, J. M., Collins, M., and Webb, M. J.: Multivariate
probabilistic projections using imperfect climate models part I: outline of
methodology, Clim. Dynam., 38, 2513–2542,
<ext-link xlink:href="https://doi.org/10.1007/s00382-011-1208-9" ext-link-type="DOI">10.1007/s00382-011-1208-9</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bib68"><label>68</label><?label 1?><mixed-citation>Sisson, S. A., Fan, Y., and Beaumont, M. A. (Eds.): Handbook of Approximate
Bayesian Computation, Chapman and Hall/CRC, New York, 2018.
 </mixed-citation></ref><?xmltex \hack{\newpage}?>
      <ref id="bib1.bib69"><label>69</label><?label 1?><mixed-citation>Smith, C. J., Forster, P. M., Allen, M., Leach, N., Millar, R. J., Passerello, G. A., and Regayre, L. A.: FAIR v1.3: a simple emissions-based impulse response and carbon cycle model, Geosci. Model Dev., 11, 2273–2297, <ext-link xlink:href="https://doi.org/10.5194/gmd-11-2273-2018" ext-link-type="DOI">10.5194/gmd-11-2273-2018</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bib70"><label>70</label><?label 1?><mixed-citation>Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and
Salakhutdinov, R.: Dropout: A Simple Way to Prevent Neural Networks from
Overfitting, J. Mach. Learn. Res., 15, 1929–1958, 2014.</mixed-citation></ref>
      <ref id="bib1.bib71"><label>71</label><?label 1?><mixed-citation>Tegen, I., Neubauer, D., Ferrachat, S., Siegenthaler-Le Drian, C., Bey, I., Schutgens, N., Stier, P., Watson-Parris, D., Stanelle, T., Schmidt, H., Rast, S., Kokkola, H., Schultz, M., Schroeder, S., Daskalakis, N., Barthel, S., Heinold, B., and Lohmann, U.: The global aerosol–climate model ECHAM6.3–HAM2.3 – Part 1: Aerosol evaluation, Geosci. Model Dev., 12, 1643–1677, <ext-link xlink:href="https://doi.org/10.5194/gmd-12-1643-2019" ext-link-type="DOI">10.5194/gmd-12-1643-2019</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bib72"><label>72</label><?label 1?><mixed-citation>Vernon, I., Goldstein, M., and Bower, R. G.: Galaxy formation: a Bayesian
uncertainty analysis, Bayesian Anal., 5, 619–669,
<ext-link xlink:href="https://doi.org/10.1214/10-ba524" ext-link-type="DOI">10.1214/10-ba524</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bib73"><label>73</label><?label 1?><mixed-citation>Vysochanskij, D. F. and Petunin, Y. I.: Justification of the 3<inline-formula><mml:math id="M104" display="inline"><mml:mi mathvariant="italic">σ</mml:mi></mml:math></inline-formula> rule
for unimodal distributions, Theory of Probability and Mathematical Statistics, 21, 25–36, 1980.</mixed-citation></ref>
      <ref id="bib1.bib74"><label>74</label><?label 1?><mixed-citation>Watson-Parris, D.: Machine learning for weather and climate are worlds
apart, Philos. T. Roy. Soc. A, 379, 20200098,
<ext-link xlink:href="https://doi.org/10.1098/rsta.2020.0098" ext-link-type="DOI">10.1098/rsta.2020.0098</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib75"><label>75</label><?label 1?><mixed-citation>Watson-Parris, D. and Deaconu, L.: Example Perturbed Parameter Ensemble (Black Carbon) (1.0), Zenodo [data set], <ext-link xlink:href="https://doi.org/10.5281/zenodo.3856645" ext-link-type="DOI">10.5281/zenodo.3856645</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bib76"><label>76</label><?label 1?><mixed-citation>Watson-Parris, D., Schutgens, N., Cook, N., Kipling, Z., Kershaw, P., Gryspeerdt, E., Lawrence, B., and Stier, P.: Community Intercomparison Suite (CIS) v1.4.0: a tool for intercomparing models and observations, Geosci. Model Dev., 9, 3093–3110, <ext-link xlink:href="https://doi.org/10.5194/gmd-9-3093-2016" ext-link-type="DOI">10.5194/gmd-9-3093-2016</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bib77"><label>77</label><?label 1?><mixed-citation>Watson-Parris, D., Bellouin, N., Deaconu, L., Schutgens, N., Yoshioka, M.,
Regayre, L. A., Pringle, K. J., Johnson, J. S., Smith, C. J., Carslaw, K.
S., and Stier, P.: Constraining uncertainty in aerosol direct forcing,
Geophys. Res. Lett., 47, e2020GL087141, <ext-link xlink:href="https://doi.org/10.1029/2020gl087141" ext-link-type="DOI">10.1029/2020gl087141</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bib78"><label>78</label><?label 1?><mixed-citation>Watson-Parris, D., Williams, A., and Monticone, P.: duncanwp/ESEm: v1.1.0 (v1.1.0), Zenodo [data set], <ext-link xlink:href="https://doi.org/10.5281/zenodo.5466563" ext-link-type="DOI">10.5281/zenodo.5466563</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib79"><label>79</label><?label 1?><mixed-citation>Williamson, D., Goldstein, M., Allison, L., Blaker, A., Challenor, P.,
Jackson, L., and Yamazaki, K.: History matching for exploring and reducing
climate model parameter space using observations and a large perturbed
physics ensemble, Clim. Dynam., 41, 1703–1729,
<ext-link xlink:href="https://doi.org/10.1007/s00382-013-1896-4" ext-link-type="DOI">10.1007/s00382-013-1896-4</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bib80"><label>80</label><?label 1?><mixed-citation>Williamson, D., Blaker, A. T., Hampton, C., and Salter, J.: Identifying and
removing structural biases in climate models with history matching, Clim.
Dynam., 45, 1299–1324, <ext-link xlink:href="https://doi.org/10.1007/s00382-014-2378-z" ext-link-type="DOI">10.1007/s00382-014-2378-z</ext-link>, 2015.</mixed-citation></ref>

  </ref-list></back>
    <!--<article-title-html>Model calibration using ESEm v1.1.0 – an open, scalable Earth system emulator</article-title-html>
<abstract-html/>
<ref-html id="bib1.bib1"><label>1</label><mixed-citation>
Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C.,
Corrado, G. S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow,
I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L.,
Kudlur, M., Levenberg, J., Mane, D., Monga, R., Moore, S., Murray, D., Olah,
C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K.,
Tucker, P., Vanhoucke, V., Vasudevan, V., Viegas, F., Vinyals, O., Warden,
P., Wattenberg, M., Wicke, M., Yu, Y., and Zheng, X.: TensorFlow:
Large-Scale Machine Learning on Heterogeneous Distributed Systems, arXiv [preprint], <a href="https://arxiv.org/abs/1603.04467" target="_blank">arXiv:1603.04467</a>
2016.
</mixed-citation></ref-html>
<ref-html id="bib1.bib2"><label>2</label><mixed-citation>Akaike, H.: A new look at the statistical model identification, IEEE T.
Automat. Contr., 19, 716–723, <a href="https://doi.org/10.1109/tac.1974.1100705" target="_blank">https://doi.org/10.1109/tac.1974.1100705</a>, 1974.
</mixed-citation></ref-html>
<ref-html id="bib1.bib3"><label>3</label><mixed-citation>Bellouin, N., Quaas, J., Gryspeerdt, E., Kinne, S., Stier, P.,
Watson-Parris, D., Boucher, O., Carslaw, K. S., Christensen, M., Daniau, A.
-L., Dufresne, J. -L., Feingold, G., Fiedler, S., Forster, P., Gettelman,
A., Haywood, J. M., Lohmann, U., Malavelle, F., Mauritsen, T., McCoy, D. T.,
Myhre, G., Mülmenstädt, J., Neubauer, D., Possner, A., Rugenstein,
M., Sato, Y., Schulz, M., Schwartz, S. E., Sourdeval, O., Storelvmo, T.,
Toll, V., Winker, D., and Stevens, B.: Bounding Global Aerosol Radiative
Forcing of Climate Change, Rev. Geophys., 58, e2019RG000660, <a href="https://doi.org/10.1029/2019RG000660" target="_blank">https://doi.org/10.1029/2019RG000660</a>, 2020.
</mixed-citation></ref-html>
<ref-html id="bib1.bib4"><label>4</label><mixed-citation>Beucler, T., Pritchard, M., Rasp, S., Ott, J., Baldi, P., and Gentine, P.:
Enforcing Analytic Constraints in Neural Networks Emulating Physical
Systems, Phys. Rev. Lett., 126, 098302,
<a href="https://doi.org/10.1103/physrevlett.126.098302" target="_blank">https://doi.org/10.1103/physrevlett.126.098302</a>, 2021.
</mixed-citation></ref-html>
<ref-html id="bib1.bib5"><label>5</label><mixed-citation>Brajard, J., Carrassi, A., Bocquet, M., and Bertino, L.: Combining data
assimilation and machine learning to emulate a dynamical model from sparse
and noisy observations: A case study with the Lorenz 96 model, J. Comput.
Sci.-Neth., 44, 101171, <a href="https://doi.org/10.1016/j.jocs.2020.101171" target="_blank">https://doi.org/10.1016/j.jocs.2020.101171</a>, 2020.
</mixed-citation></ref-html>
<ref-html id="bib1.bib6"><label>6</label><mixed-citation>Brehmer, J., Louppe, G., Pavez, J., and Cranmer, K.: Mining gold from
implicit models to improve likelihood-free inference, P. Natl. Acad. Sci. USA.,  117, 5242–5249,
<a href="https://doi.org/10.1073/pnas.1915980117" target="_blank">https://doi.org/10.1073/pnas.1915980117</a>, 2020.
</mixed-citation></ref-html>
<ref-html id="bib1.bib7"><label>7</label><mixed-citation>Breiman, L.: Random Forests, Mach. Learn., 45, 5–32,
<a href="https://doi.org/10.1023/a:1010933404324" target="_blank">https://doi.org/10.1023/a:1010933404324</a>, 2001.
</mixed-citation></ref-html>
<ref-html id="bib1.bib8"><label>8</label><mixed-citation>Burt, D. R., Rasmussen, C. E., and  van der Wilk, M.: Rates of Convergence
for Sparse Variational Gaussian Process Regression, arXiv [preprint], <a href="https://arxiv.org/abs/1903.03571" target="_blank">arXiv:1903.03571</a>, 2019.
</mixed-citation></ref-html>
<ref-html id="bib1.bib9"><label>9</label><mixed-citation>Chollet, F.: Keras, available at: <a href="https://keras.io" target="_blank"/> (last access: 12 September 2021), 2015.
</mixed-citation></ref-html>
<ref-html id="bib1.bib10"><label>10</label><mixed-citation>Cleary, E., Garbuno-Inigo, A., Lan, S., Schneider, T., and Stuart, A. M.:
Calibrate, emulate, sample, J. Comput. Phys., 424, 109716,
<a href="https://doi.org/10.1016/j.jcp.2020.109716" target="_blank">https://doi.org/10.1016/j.jcp.2020.109716</a>, 2021.
</mixed-citation></ref-html>
<ref-html id="bib1.bib11"><label>11</label><mixed-citation>Couvreux, F., Hourdin, F., Williamson, D., Roehrig, R., Volodina, V.,
Villefranque, N., Rio, C., Audouin, O., Salter, J., Bazile, E., Brient, F.,
Favot, F., Honnert, R., Lefebvre, M., Madeleine, J., Rodier, Q., and Xu, W.:
Process-Based Climate Model Development Harnessing Machine Learning: I. A
Calibration Tool for Parameterization Improvement, J. Adv. Model Earth. Sy., 13, e2020MS002217,
<a href="https://doi.org/10.1029/2020ms002217" target="_blank">https://doi.org/10.1029/2020ms002217</a>, 2021.
</mixed-citation></ref-html>
<ref-html id="bib1.bib12"><label>12</label><mixed-citation>Craig, P. S., Goldstein, M., Seheult, A. H., and Smith, J. A.: Bayes linear
strategies for history matching of hydrocarbon reservoirs, in: Bayesian
Statistics, vol. 5, edited by: Bernardo, J. M., Berger, J. O., Dawid, A. P.,
and Smith, A. F. M., Clarendon Press, Oxford, UK, 69–95, 1996.
</mixed-citation></ref-html>
<ref-html id="bib1.bib13"><label>13</label><mixed-citation>Cranmer, K., Brehmer, J., and Louppe, G.: The frontier of simulation-based
inference, P. Natl. Acad. Sci. USA, 117, 30055–30062,
<a href="https://doi.org/10.1073/pnas.1912789117" target="_blank">https://doi.org/10.1073/pnas.1912789117</a>, 2020.
</mixed-citation></ref-html>
<ref-html id="bib1.bib14"><label>14</label><mixed-citation>Dagan, G. and Stier, P.: Ensemble daily simulations for elucidating cloud–aerosol interactions under a large spread of realistic environmental conditions, Atmos. Chem. Phys., 20, 6291–6303, <a href="https://doi.org/10.5194/acp-20-6291-2020" target="_blank">https://doi.org/10.5194/acp-20-6291-2020</a>, 2020a.
</mixed-citation></ref-html>
<ref-html id="bib1.bib15"><label>15</label><mixed-citation>
Dagan, G. and Stier, P.: Data of the paper: Ensemble daily simulations for elucidating cloud–aerosol interactions under a large spread of realistic environmental conditions, Zenodo [data set], <a href="https://doi.org/10.5281/zenodo.3785603" target="_blank">https://doi.org/10.5281/zenodo.3785603</a>, 2020b.
</mixed-citation></ref-html>
<ref-html id="bib1.bib16"><label>16</label><mixed-citation>
Dagon, K., Sanderson, B. M., Fisher, R. A., and Lawrence, D. M.: A machine learning approach to emulation and biophysical parameter estimation with the Community Land Model, version 5, Adv. Stat. Clim. Meteorol. Oceanogr., 6, 223–244, <a href="https://doi.org/10.5194/ascmo-6-223-2020" target="_blank">https://doi.org/10.5194/ascmo-6-223-2020</a>, 2020.
</mixed-citation></ref-html>
<ref-html id="bib1.bib17"><label>17</label><mixed-citation>Damianou, A. C. and Lawrence, N. D.: Deep Gaussian Processes, in: Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, PMLR Proceedings of Machine Learning Research, Scottsdale, Arizona, USA, 207–215, 2013.
</mixed-citation></ref-html>
<ref-html id="bib1.bib18"><label>18</label><mixed-citation>Dawson, A.: eofs: A Library for EOF Analysis of Meteorological,
Oceanographic, and Climate Data, J. Open Res. Softw., 4, e14,
<a href="https://doi.org/10.5334/jors.122" target="_blank">https://doi.org/10.5334/jors.122</a>, 2016.
</mixed-citation></ref-html>
<ref-html id="bib1.bib19"><label>19</label><mixed-citation>Dubovik, O. and King, M. D.: A flexible inversion algorithm for retrieval of
aerosol optical properties from Sun and sky radiance measurements, J. Geophys.
Res.-Atmos., 105, 20673–20696, <a href="https://doi.org/10.1029/2000jd900282" target="_blank">https://doi.org/10.1029/2000jd900282</a>,
2000.
</mixed-citation></ref-html>
<ref-html id="bib1.bib20"><label>20</label><mixed-citation>Duvenaud, D.: Automatic model construction with Gaussian processes, Doctoral thesis, <a href="https://doi.org/10.17863/CAM.14087" target="_blank">https://doi.org/10.17863/CAM.14087</a>, 2014.
</mixed-citation></ref-html>
<ref-html id="bib1.bib21"><label>21</label><mixed-citation>Eyring, V., Bony, S., Meehl, G. A., Senior, C. A., Stevens, B., Stouffer, R. J., and Taylor, K. E.: Overview of the Coupled Model Intercomparison Project Phase 6 (CMIP6) experimental design and organization, Geosci. Model Dev., 9, 1937–1958, <a href="https://doi.org/10.5194/gmd-9-1937-2016" target="_blank">https://doi.org/10.5194/gmd-9-1937-2016</a>, 2016.
</mixed-citation></ref-html>
<ref-html id="bib1.bib22"><label>22</label><mixed-citation>Fearnhead, P. and Prangle, D.: Constructing summary statistics for
approximate Bayesian computation: semi-automatic approximate Bayesian
computation, J. Roy. Stat. Soc. Ser. B, 74,
419–474, <a href="https://doi.org/10.1111/j.1467-9868.2011.01010.x" target="_blank">https://doi.org/10.1111/j.1467-9868.2011.01010.x</a>, 2012.
</mixed-citation></ref-html>
<ref-html id="bib1.bib23"><label>23</label><mixed-citation>Gal, Y. and Ghahramani, Z.: Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, in: Proceedings of The 33rd International Conference on Machine Learning, PMLR Proceedings of Machine Learning Research, New York, New York, USA, 1050–1059, 2015.
</mixed-citation></ref-html>
<ref-html id="bib1.bib24"><label>24</label><mixed-citation>Geoffroy, O., Saint-Martin, D., Olivié, D. J. L., Voldoire, A., Bellon,
G., and Tytéca, S.: Transient Climate Response in a Two-Layer
Energy-Balance Model. Part I: Analytical Solution and Parameter Calibration
Using CMIP5 AOGCM Experiments, J. Climate, 26, 1841–1857,
<a href="https://doi.org/10.1175/jcli-d-12-00195.1" target="_blank">https://doi.org/10.1175/jcli-d-12-00195.1</a>, 2013.
</mixed-citation></ref-html>
<ref-html id="bib1.bib25"><label>25</label><mixed-citation>
Glassmeier, F., Hoffmann, F., Johnson, J. S., Yamaguchi, T., Carslaw, K. S., and Feingold, G.: An emulator approach to stratocumulus susceptibility, Atmos. Chem. Phys., 19, 10191–10203, <a href="https://doi.org/10.5194/acp-19-10191-2019" target="_blank">https://doi.org/10.5194/acp-19-10191-2019</a>, 2019.
</mixed-citation></ref-html>
<ref-html id="bib1.bib26"><label>26</label><mixed-citation>GPy: GPy: A Gaussian process framework in python, available at: <a href="http://github.com/SheffieldML/GPy" target="_blank">http://github.com/SheffieldML/GPy</a> (last access: 1 August 2021), 2012.
</mixed-citation></ref-html>
<ref-html id="bib1.bib27"><label>27</label><mixed-citation>Hawkins, E. and Sutton, R.: The Potential to Narrow Uncertainty in Regional
Climate Predictions, B. Am. Meteorol. Soc., 90, 1095–1108,
<a href="https://doi.org/10.1175/2009bams2607.1" target="_blank">https://doi.org/10.1175/2009bams2607.1</a>, 2009.
</mixed-citation></ref-html>
<ref-html id="bib1.bib28"><label>28</label><mixed-citation> Henze, D. K., Hakami, A., and Seinfeld, J. H.: Development of the adjoint of GEOS-Chem, Atmos. Chem. Phys., 7, 2413–2433, <a href="https://doi.org/10.5194/acp-7-2413-2007" target="_blank">https://doi.org/10.5194/acp-7-2413-2007</a>, 2007.
</mixed-citation></ref-html>
<ref-html id="bib1.bib29"><label>29</label><mixed-citation>Holben, B. N., Eck, T. F., Slutsker, I., Tanré, D., Buis, J. P., Setzer,
A., Vermote, E., Reagan, J. A., Kaufman, Y. J., Nakajima, T., Lavenu, F.,
Jankowiak, I., and Smirnov, A.: AERONET – A Federated Instrument Network and
Data Archive for Aerosol Characterization, Remote Sens. Environ., 66, 1–16,
<a href="https://doi.org/10.1016/s0034-4257(98)00031-5" target="_blank">https://doi.org/10.1016/s0034-4257(98)00031-5</a>, 1998.
</mixed-citation></ref-html>
<ref-html id="bib1.bib30"><label>30</label><mixed-citation> Holden, P. B., Edwards, N. R., Garthwaite, P. H., Fraedrich, K., Lunkeit, F., Kirk, E., Labriet, M., Kanudia, A., and Babonneau, F.: PLASIM-ENTSem v1.0: a spatio-temporal emulator of future climate change for impacts assessment, Geosci. Model Dev., 7, 433–451, <a href="https://doi.org/10.5194/gmd-7-433-2014" target="_blank">https://doi.org/10.5194/gmd-7-433-2014</a>, 2014.
</mixed-citation></ref-html>
<ref-html id="bib1.bib31"><label>31</label><mixed-citation>Holden, P. B., Edwards, N. R., Hensman, J., and Wilkinson, R. D.: ABC for
climate: dealing with expensive simulators, arXiv [preprint], <a href="https://arxiv.org/abs/1511.03475" target="_blank">arXiv:1511.03475</a>, 2015a.
</mixed-citation></ref-html>
<ref-html id="bib1.bib32"><label>32</label><mixed-citation>Holden, P. B., Edwards, N. R., Garthwaite, P. H., and Wilkinson, R. D.:
Emulation and interpretation of high-dimensional climate model outputs, J. Appl. Stat., 42, 2038–2055,
<a href="https://doi.org/10.1080/02664763.2015.1016412" target="_blank">https://doi.org/10.1080/02664763.2015.1016412</a>, 2015b.
</mixed-citation></ref-html>
<ref-html id="bib1.bib33"><label>33</label><mixed-citation>Holden, P. B., Edwards, N. R., Rangel, T. F., Pereira, E. B., Tran, G. T., and Wilkinson, R. D.: PALEO-PGEM v1.0: a statistical emulator of Pliocene–Pleistocene climate, Geosci. Model Dev., 12, 5137–5155, <a href="https://doi.org/10.5194/gmd-12-5137-2019" target="_blank">https://doi.org/10.5194/gmd-12-5137-2019</a>, 2019.
</mixed-citation></ref-html>
<ref-html id="bib1.bib34"><label>34</label><mixed-citation>Hou, Z. and Rubin, Y.: On minimum relative entropy concepts and prior
compatibility issues in vadose zone inverse and forward modeling, Water
Resour. Res., 41, W12425, <a href="https://doi.org/10.1029/2005wr004082" target="_blank">https://doi.org/10.1029/2005wr004082</a>, 2005.
</mixed-citation></ref-html>
<ref-html id="bib1.bib35"><label>35</label><mixed-citation>Hoyer, S. and Hamman, J.: xarray: N-D labeled Arrays and Datasets in Python,
J. Open Res. Softw., 5, 10, <a href="https://doi.org/10.5334/jors.148" target="_blank">https://doi.org/10.5334/jors.148</a>, 2016.
</mixed-citation></ref-html>
<ref-html id="bib1.bib36"><label>36</label><mixed-citation>
Johnson, J. S., Regayre, L. A., Yoshioka, M., Pringle, K. J., Turnock, S. T., Browse, J., Sexton, D. M. H., Rostron, J. W., Schutgens, N. A. J., Partridge, D. G., Liu, D., Allan, J. D., Coe, H., Ding, A., Cohen, D. D., Atanacio, A., Vakkari, V., Asmi, E., and Carslaw, K. S.: Robust observational constraint of uncertain aerosol processes and emissions in a climate model and the effect on aerosol radiative forcing, Atmos. Chem. Phys., 20, 9491–9524, <a href="https://doi.org/10.5194/acp-20-9491-2020" target="_blank">https://doi.org/10.5194/acp-20-9491-2020</a>, 2020.
</mixed-citation></ref-html>
<ref-html id="bib1.bib37"><label>37</label><mixed-citation>Karydis, V. A., Capps, S. L., Russell, A. G., and Nenes, A.: Adjoint sensitivity of global cloud droplet number to aerosol and dynamical parameters, Atmos. Chem. Phys., 12, 9041–9055, <a href="https://doi.org/10.5194/acp-12-9041-2012" target="_blank">https://doi.org/10.5194/acp-12-9041-2012</a>, 2012.
</mixed-citation></ref-html>
<ref-html id="bib1.bib38"><label>38</label><mixed-citation>Kennedy, M. C. and O'Hagan, A.: Bayesian calibration of computer models, J.
Roy. Stat. Soc. Ser. B, 63, 425–464,
<a href="https://doi.org/10.1111/1467-9868.00294" target="_blank">https://doi.org/10.1111/1467-9868.00294</a>, 2001.
</mixed-citation></ref-html>
<ref-html id="bib1.bib39"><label>39</label><mixed-citation>Knudde, N., van der Herten, J., Dhaene, T., and Couckuyt, I.: GPflowOpt: A
Bayesian Optimization Library using TensorFlow, arXiv [preprint], <a href="https://arxiv.org/abs/1711.03845" target="_blank">arXiv:1711.03845</a>, 2017.
</mixed-citation></ref-html>
<ref-html id="bib1.bib40"><label>40</label><mixed-citation>Knutti, R., Meehl, G. A., Allen, M. R., and Stainforth, D. A.: Constraining
Climate Sensitivity from the Seasonal Cycle in Surface Temperature, J.
Climate, 19, 4224–4233, <a href="https://doi.org/10.1175/jcli3865.1" target="_blank">https://doi.org/10.1175/jcli3865.1</a>, 2006.
</mixed-citation></ref-html>
<ref-html id="bib1.bib41"><label>41</label><mixed-citation>Knutti, R., Masson, D., and Gettelman, A.: Climate model genealogy:
Generation CMIP5 and how we got there, Geophys. Res. Lett., 40, 1194–1199,
<a href="https://doi.org/10.1002/grl.50256" target="_blank">https://doi.org/10.1002/grl.50256</a>, 2013.
</mixed-citation></ref-html>
<ref-html id="bib1.bib42"><label>42</label><mixed-citation>Krasnopolsky, V. M., Fox-Rabinovitz, M. S., and Chalikov, D. V.: New
Approach to Calculation of Atmospheric Model Physics: Accurate and Fast
Neural Network Emulation of Longwave Radiation in a Climate Model, Mon.
Weather Rev., 133, 1370–1383, <a href="https://doi.org/10.1175/mwr2923.1" target="_blank">https://doi.org/10.1175/mwr2923.1</a>, 2005.
</mixed-citation></ref-html>
<ref-html id="bib1.bib43"><label>43</label><mixed-citation>Lee, C., Martin, R. V., van Donkelaar, A., Lee, H., Dickerson, R. R., Hains,
J. C., Krotkov, N., Richter, A., Vinnikov, K., and Schwab, J. J.: SO<sub>2</sub>
emissions and lifetimes: Estimates from inverse modeling using in situ and
global, space-based (SCIAMACHY and OMI) observations, J. Geophys. Res.-Atmos., 116, D06304, <a href="https://doi.org/10.1029/2010jd014758" target="_blank">https://doi.org/10.1029/2010jd014758</a>, 2011.
</mixed-citation></ref-html>
<ref-html id="bib1.bib44"><label>44</label><mixed-citation>Lee, L. A., Carslaw, K. S., Pringle, K. J., Mann, G. W., and Spracklen, D. V.: Emulation of a complex global aerosol model to quantify sensitivity to uncertain parameters, Atmos. Chem. Phys., 11, 12253–12273, <a href="https://doi.org/10.5194/acp-11-12253-2011" target="_blank">https://doi.org/10.5194/acp-11-12253-2011</a>, 2011.
</mixed-citation></ref-html>
<ref-html id="bib1.bib45"><label>45</label><mixed-citation>Mansfield, L. A., Nowack, P. J., Kasoar, M., Everitt, R. G., Collins, W. J.,
and Voulgarakis, A.: Predicting global patterns of long-term climate change
from short-term simulations using machine learning, Npj Clim. Atmos.
Sci., 3, 44, <a href="https://doi.org/10.1038/s41612-020-00148-5" target="_blank">https://doi.org/10.1038/s41612-020-00148-5</a>, 2020.
</mixed-citation></ref-html>
<ref-html id="bib1.bib46"><label>46</label><mixed-citation>Matheron, G.: Principles of geostatistics, Econ. Geol., 58, 1246–1266,
<a href="https://doi.org/10.2113/gsecongeo.58.8.1246" target="_blank">https://doi.org/10.2113/gsecongeo.58.8.1246</a>, 1963.
</mixed-citation></ref-html>
<ref-html id="bib1.bib47"><label>47</label><mixed-citation>Matthews, A. G. de G., van der Wilk, M., Nickson, T., Fujii, K.,
Boukouvalas, A., Leon-Villagra, P., Ghahramani, Z., and Hensman, J.:
GPflow: A Gaussian process library using TensorFlow, J. Mach. Learn. Res., 18, 1–6, 2017.
</mixed-citation></ref-html>
<ref-html id="bib1.bib48"><label>48</label><mixed-citation>Mauritsen, T., Stevens, B., Roeckner, E., Crueger, T., Esch, M., Giorgetta,
M., Haak, H., Jungclaus, J., Klocke, D., Matei, D., Mikolajewicz, U., Notz,
D., Pincus, R., Schmidt, H., and Tomassini, L.: Tuning the climate of a
global model, J. Adv. Model Earth. Sy., 4, M00A01,
<a href="https://doi.org/10.1029/2012ms000154" target="_blank">https://doi.org/10.1029/2012ms000154</a>, 2012.
</mixed-citation></ref-html>
<ref-html id="bib1.bib49"><label>49</label><mixed-citation>Met Office: Cartopy: a cartographic python library with a Matplotlib
interface, available at: <a href="http://scitools.org.uk/cartopy" target="_blank">http://scitools.org.uk/cartopy</a> (last access: 1 August 2021), 2020a.
</mixed-citation></ref-html>
<ref-html id="bib1.bib50"><label>50</label><mixed-citation>Met Office: Iris: A Python library for analysing and visualising
meteorological and oceanographic data sets, available at: <a href="https://scitools.org.uk/iris/" target="_blank">https://scitools.org.uk/iris/</a> (last access: 1 October 2021), 2020b.
</mixed-citation></ref-html>
<ref-html id="bib1.bib51"><label>51</label><mixed-citation>Morris, M. D. and Mitchell, T. J.: Exploratory designs for computational
experiments, J. Stat. Plan Infer., 43, 381–402,
<a href="https://doi.org/10.1016/0378-3758(94)00035-t" target="_blank">https://doi.org/10.1016/0378-3758(94)00035-t</a>, 1995.
</mixed-citation></ref-html>
<ref-html id="bib1.bib52"><label>52</label><mixed-citation>Neal, R.: MCMC Using Hamiltonian Dynamics, in: Handbook of Markov Chain Monte Carlo, edited by: Brooks, S., Gelman, A., Jones, G., and Meng, X.-L., Chapman and Hall/CRC, New York, <a href="https://doi.org/10.1201/b10905-6" target="_blank">https://doi.org/10.1201/b10905-6</a>, 2011.
</mixed-citation></ref-html>
<ref-html id="bib1.bib53"><label>53</label><mixed-citation>
Neubauer, D., Ferrachat, S., Siegenthaler-Le Drian, C., Stier, P., Partridge, D. G., Tegen, I., Bey, I., Stanelle, T., Kokkola, H., and Lohmann, U.: The global aerosol–climate model ECHAM6.3–HAM2.3 – Part 2: Cloud evaluation, aerosol radiative forcing, and climate sensitivity, Geosci. Model Dev., 12, 3609–3639, <a href="https://doi.org/10.5194/gmd-12-3609-2019" target="_blank">https://doi.org/10.5194/gmd-12-3609-2019</a>, 2019.
</mixed-citation></ref-html>
<ref-html id="bib1.bib54"><label>54</label><mixed-citation>O'Gorman, P. A. and Dwyer, J. G.: Using Machine Learning to Parameterize
Moist Convection: Potential for Modeling of Climate, Climate Change, and
Extreme Events, J. Adv. Model Earth. Sy., 10, 2548–2563,
<a href="https://doi.org/10.1029/2018ms001351" target="_blank">https://doi.org/10.1029/2018ms001351</a>, 2018.
</mixed-citation></ref-html>
<ref-html id="bib1.bib55"><label>55</label><mixed-citation>O'Neill, B. C., Tebaldi, C., van Vuuren, D. P., Eyring, V., Friedlingstein, P., Hurtt, G., Knutti, R., Kriegler, E., Lamarque, J.-F., Lowe, J., Meehl, G. A., Moss, R., Riahi, K., and Sanderson, B. M.: The Scenario Model Intercomparison Project (ScenarioMIP) for CMIP6, Geosci. Model Dev., 9, 3461–3482, <a href="https://doi.org/10.5194/gmd-9-3461-2016" target="_blank">https://doi.org/10.5194/gmd-9-3461-2016</a>, 2016.
</mixed-citation></ref-html>
<ref-html id="bib1.bib56"><label>56</label><mixed-citation>Partridge, D. G., Vrugt, J. A., Tunved, P., Ekman, A. M. L., Gorea, D., and Sorooshian, A.: Inverse modeling of cloud-aerosol interactions – Part 1: Detailed response surface analysis, Atmos. Chem. Phys., 11, 7269–7287, <a href="https://doi.org/10.5194/acp-11-7269-2011" target="_blank">https://doi.org/10.5194/acp-11-7269-2011</a>, 2011.
</mixed-citation></ref-html>
<ref-html id="bib1.bib57"><label>57</label><mixed-citation>Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel,
O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J.,
Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E.:
Scikit-learn: Machine Learning in Python, J. Mach. Learn. Res., 12, 2825–2830, 2011.
</mixed-citation></ref-html>
<ref-html id="bib1.bib58"><label>58</label><mixed-citation>Prangle, D.: Summary Statistics, in: Handbook of Approximate Bayesian Computation, edited by: Sisson, S. A., Fan, Y., and Beaumont, M. A., Chapman and Hall/CRC, New York,
2018.
</mixed-citation></ref-html>
<ref-html id="bib1.bib59"><label>59</label><mixed-citation>Rasmussen, C. E. and Williams, C. K. I.: Gaussian Processes for Machine
Learning, The MIT Press, <a href="https://doi.org/10.7551/mitpress/3206.001.0001" target="_blank">https://doi.org/10.7551/mitpress/3206.001.0001</a>, 2005.
</mixed-citation></ref-html>
<ref-html id="bib1.bib60"><label>60</label><mixed-citation>
Regayre, L. A., Johnson, J. S., Yoshioka, M., Pringle, K. J., Sexton, D. M. H., Booth, B. B. B., Lee, L. A., Bellouin, N., and Carslaw, K. S.: Aerosol and physical atmosphere model parameters are both important sources of uncertainty in aerosol ERF, Atmos. Chem. Phys., 18, 9975–10006, <a href="https://doi.org/10.5194/acp-18-9975-2018" target="_blank">https://doi.org/10.5194/acp-18-9975-2018</a>, 2018.
</mixed-citation></ref-html>
<ref-html id="bib1.bib61"><label>61</label><mixed-citation>Ryan, E., Wild, O., Voulgarakis, A., and Lee, L.: Fast sensitivity analysis methods for computationally expensive models with multi-dimensional output, Geosci. Model Dev., 11, 3131–3146, <a href="https://doi.org/10.5194/gmd-11-3131-2018" target="_blank">https://doi.org/10.5194/gmd-11-3131-2018</a>, 2018.
</mixed-citation></ref-html>
<ref-html id="bib1.bib62"><label>62</label><mixed-citation>Sacks, J., Welch, W. J., Mitchell, T. J., and Wynn, H. P.: Design and
Analysis of Computer Experiments, Stat. Sci., 4, 409–423,
<a href="https://doi.org/10.1214/ss/1177012413" target="_blank">https://doi.org/10.1214/ss/1177012413</a>, 1989.
</mixed-citation></ref-html>
<ref-html id="bib1.bib63"><label>63</label><mixed-citation>Schutgens, N., Tsyro, S., Gryspeerdt, E., Goto, D., Weigum, N., Schulz, M., and Stier, P.: On the spatio-temporal representativeness of observations, Atmos. Chem. Phys., 17, 9761–9780, <a href="https://doi.org/10.5194/acp-17-9761-2017" target="_blank">https://doi.org/10.5194/acp-17-9761-2017</a>, 2017.
</mixed-citation></ref-html>
<ref-html id="bib1.bib64"><label>64</label><mixed-citation> Schutgens, N. A. J., Partridge, D. G., and Stier, P.: The importance of temporal collocation for the evaluation of aerosol models with observations, Atmos. Chem. Phys., 16, 1065–1079, <a href="https://doi.org/10.5194/acp-16-1065-2016" target="_blank">https://doi.org/10.5194/acp-16-1065-2016</a>, 2016a.
</mixed-citation></ref-html>
<ref-html id="bib1.bib65"><label>65</label><mixed-citation>Schutgens, N. A. J., Gryspeerdt, E., Weigum, N., Tsyro, S., Goto, D., Schulz, M., and Stier, P.: Will a perfect model agree with perfect observations? The impact of spatial sampling, Atmos. Chem. Phys., 16, 6335–6353, <a href="https://doi.org/10.5194/acp-16-6335-2016" target="_blank">https://doi.org/10.5194/acp-16-6335-2016</a>, 2016b.
</mixed-citation></ref-html>
<ref-html id="bib1.bib66"><label>66</label><mixed-citation>
Scott, R. C., Myers, T. A., Norris, J. R., Zelinka, M. D., Klein, S. A., Sun, M., and Doelling, D. R.: Observed Sensitivity of Low-Cloud Radiative Effects to Meteorological Perturbations over the Global Oceans, J. Climate, 33, 7717–7734, 2020.
</mixed-citation></ref-html>
<ref-html id="bib1.bib67"><label>67</label><mixed-citation>Sexton, D. M. H., Murphy, J. M., Collins, M., and Webb, M. J.: Multivariate
probabilistic projections using imperfect climate models part I: outline of
methodology, Clim. Dynam., 38, 2513–2542,
<a href="https://doi.org/10.1007/s00382-011-1208-9" target="_blank">https://doi.org/10.1007/s00382-011-1208-9</a>, 2012.
</mixed-citation></ref-html>
<ref-html id="bib1.bib68"><label>68</label><mixed-citation>Sisson, S. A., Fan, Y., and Beaumont, M. A. (Eds.): Handbook of Approximate
Bayesian Computation, Chapman and Hall/CRC, New York, 2018.
</mixed-citation></ref-html>
<ref-html id="bib1.bib69"><label>69</label><mixed-citation>Smith, C. J., Forster, P. M., Allen, M., Leach, N., Millar, R. J., Passerello, G. A., and Regayre, L. A.: FAIR v1.3: a simple emissions-based impulse response and carbon cycle model, Geosci. Model Dev., 11, 2273–2297, <a href="https://doi.org/10.5194/gmd-11-2273-2018" target="_blank">https://doi.org/10.5194/gmd-11-2273-2018</a>, 2018.
</mixed-citation></ref-html>
<ref-html id="bib1.bib70"><label>70</label><mixed-citation>Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and
Salakhutdinov, R.: Dropout: A Simple Way to Prevent Neural Networks from
Overfitting, J. Mach. Learn. Res., 15, 1929–1958, 2014.
</mixed-citation></ref-html>
<ref-html id="bib1.bib71"><label>71</label><mixed-citation>Tegen, I., Neubauer, D., Ferrachat, S., Siegenthaler-Le Drian, C., Bey, I., Schutgens, N., Stier, P., Watson-Parris, D., Stanelle, T., Schmidt, H., Rast, S., Kokkola, H., Schultz, M., Schroeder, S., Daskalakis, N., Barthel, S., Heinold, B., and Lohmann, U.: The global aerosol–climate model ECHAM6.3–HAM2.3 – Part 1: Aerosol evaluation, Geosci. Model Dev., 12, 1643–1677, <a href="https://doi.org/10.5194/gmd-12-1643-2019" target="_blank">https://doi.org/10.5194/gmd-12-1643-2019</a>, 2019.
</mixed-citation></ref-html>
<ref-html id="bib1.bib72"><label>72</label><mixed-citation>Vernon, I., Goldstein, M., and Bower, R. G.: Galaxy formation: a Bayesian
uncertainty analysis, Bayesian Anal., 5, 619–669,
<a href="https://doi.org/10.1214/10-ba524" target="_blank">https://doi.org/10.1214/10-ba524</a>, 2010.
</mixed-citation></ref-html>
<ref-html id="bib1.bib73"><label>73</label><mixed-citation>Vysochanskij, D. F. and Petunin, Y. I.: Justification of the 3<i>σ</i> rule
for unimodal distributions, Theory of Probability and Mathematical Statistics, 21, 25–36, 1980.
</mixed-citation></ref-html>
<ref-html id="bib1.bib74"><label>74</label><mixed-citation>Watson-Parris, D.: Machine learning for weather and climate are worlds
apart, Philos. T. Roy. Soc. A, 379, 20200098,
<a href="https://doi.org/10.1098/rsta.2020.0098" target="_blank">https://doi.org/10.1098/rsta.2020.0098</a>, 2021.
</mixed-citation></ref-html>
<ref-html id="bib1.bib75"><label>75</label><mixed-citation>
Watson-Parris, D. and Deaconu, L.: Example Perturbed Parameter Ensemble (Black Carbon) (1.0), Zenodo [data set], <a href="https://doi.org/10.5281/zenodo.3856645" target="_blank">https://doi.org/10.5281/zenodo.3856645</a>, 2020.
</mixed-citation></ref-html>
<ref-html id="bib1.bib76"><label>76</label><mixed-citation>Watson-Parris, D., Schutgens, N., Cook, N., Kipling, Z., Kershaw, P., Gryspeerdt, E., Lawrence, B., and Stier, P.: Community Intercomparison Suite (CIS) v1.4.0: a tool for intercomparing models and observations, Geosci. Model Dev., 9, 3093–3110, <a href="https://doi.org/10.5194/gmd-9-3093-2016" target="_blank">https://doi.org/10.5194/gmd-9-3093-2016</a>, 2016.
</mixed-citation></ref-html>
<ref-html id="bib1.bib77"><label>77</label><mixed-citation>Watson-Parris, D., Bellouin, N., Deaconu, L., Schutgens, N., Yoshioka, M.,
Regayre, L. A., Pringle, K. J., Johnson, J. S., Smith, C. J., Carslaw, K.
S., and Stier, P.: Constraining uncertainty in aerosol direct forcing,
Geophys. Res. Lett., 47, e2020GL087141, <a href="https://doi.org/10.1029/2020gl087141" target="_blank">https://doi.org/10.1029/2020gl087141</a>, 2020.
</mixed-citation></ref-html>
<ref-html id="bib1.bib78"><label>78</label><mixed-citation>
Watson-Parris, D., Williams, A., and Monticone, P.: duncanwp/ESEm: v1.1.0 (v1.1.0), Zenodo [data set], <a href="https://doi.org/10.5281/zenodo.5466563" target="_blank">https://doi.org/10.5281/zenodo.5466563</a>, 2021.
</mixed-citation></ref-html>
<ref-html id="bib1.bib79"><label>79</label><mixed-citation>Williamson, D., Goldstein, M., Allison, L., Blaker, A., Challenor, P.,
Jackson, L., and Yamazaki, K.: History matching for exploring and reducing
climate model parameter space using observations and a large perturbed
physics ensemble, Clim. Dynam., 41, 1703–1729,
<a href="https://doi.org/10.1007/s00382-013-1896-4" target="_blank">https://doi.org/10.1007/s00382-013-1896-4</a>, 2013.
</mixed-citation></ref-html>
<ref-html id="bib1.bib80"><label>80</label><mixed-citation>Williamson, D., Blaker, A. T., Hampton, C., and Salter, J.: Identifying and
removing structural biases in climate models with history matching, Clim.
Dynam., 45, 1299–1324, <a href="https://doi.org/10.1007/s00382-014-2378-z" target="_blank">https://doi.org/10.1007/s00382-014-2378-z</a>, 2015.
</mixed-citation></ref-html>--></article>
