Global sensitivity analysis (GSA) is a powerful approach for identifying which
inputs or parameters most affect a model's output. This can help determine
which inputs to include when performing model calibration or uncertainty
analysis. GSA allows quantification of the sensitivity index (SI) of a
particular input – the percentage of the total variability in the output
attributed to the changes in that input – by averaging over the other inputs
rather than fixing them at specific values. Traditional methods of computing
the SIs using the Sobol and extended Fourier Amplitude
Sensitivity Test (eFAST) methods involve running a
model thousands of times, but this may not be feasible for computationally
expensive Earth system models. GSA methods that use a statistical emulator in
place of the expensive model are popular, as they require far fewer model
runs. We performed an eight-input GSA, using the Sobol and eFAST methods, on
two computationally expensive atmospheric chemical transport models using
emulators that were trained with 80 runs of the models. We considered two
methods to further reduce the computational cost of GSA: (1) a
dimension reduction approach and (2) an emulator-free approach. When
the output of a model is multi-dimensional, it is common practice to build a
separate emulator for each dimension of the output space. Here, we used
principal component analysis (PCA) to reduce the output dimension, built an
emulator for each of the transformed outputs, and then computed SIs of the
reconstructed output using the Sobol method. We considered the global
distribution of the annual column mean lifetime of atmospheric methane, which
requires

Sensitivity analysis is a powerful tool for understanding the behaviour of a numerical model. It allows quantification of the sensitivity in the model outputs to changes in each of the model inputs. If the inputs are fixed values such as model parameters, then sensitivity analysis allows study of how the uncertainty in the model outputs can be attributed to the uncertainty in these inputs. Sensitivity analysis is important for a number of reasons: (i) to identify which parameters contribute the largest uncertainty to the model outputs, (ii) to prioritise estimation of model parameters from observational data, (iii) to understand the potential of observations as a model constraint and (iv) to diagnose differences in behaviour between different models.

By far, the most common types of sensitivity analysis are those performed
one at a time (OAT) and locally. OAT sensitivity analysis involves running a
model a number of times, varying each input in turn, whilst fixing other
inputs at their nominal values. For example, Wild (2007) showed that the
tropospheric ozone budget was highly sensitive to differences in global

Global sensitivity analysis (GSA) overcomes this OAT issue by quantifying the sensitivity of each input variable by averaging over the other inputs rather than fixing them at nominal values. However, the number of sensitivity analysis studies using this global method remains small. Ferretti et al. (2016) found that out of around 1.75 million research articles surveyed up to 2014, only 1 in 20 studies mentioning “sensitivity analysis” also use or refer to “global sensitivity analysis”. A common type of GSA is the variance-based method, which apportions the variance of the model's output to different sources of variation in the inputs. More specifically, it quantifies the sensitivity of a particular input – the percentage of the total variability in the output attributed to the changes in that input – by averaging over the other inputs rather than fixing them at specific values. The Fourier Amplitude Sensitivity Test (FAST) was one of the first of these variance-based methods (Cukier et al., 1973). The classical FAST method uses spectral analysis to apportion the variance, after first exploring the input space using sinusoidal functions of different frequencies for each input factor or dimension (Saltelli et al., 2012). Modified versions of FAST include the extended FAST (eFAST) method, which improves computational efficiency (Saltelli et al., 1999), and the random balance design (RBD) FAST method, which samples from the input space more efficiently (Tarantola et al., 2006). Another widely used GSA method is the Sobol method (Homma and Saltelli, 1996; Saltelli, 2002; Sobol, 1990), which has been found to outperform FAST (Saltelli, 2002). Most applications of the Sobol and FAST methods involve a small number of input factors. 
However, Mara and Tarantola (2008) carried out a 100-input sensitivity analysis using the RBD version of FAST and a modified version of the Sobol method and found that both methods gave estimates of the sensitivity indices (SIs) that were close to the known analytical solutions. A downside to the Sobol method is that a large number of runs of the model typically need to be carried out. For the model used in Mara and Tarantola (2008), 10 000 runs were required for the Sobol method but only 1000 were needed for FAST.
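As a concrete illustration of the variance-based definition, the first-order SI of input X_i is S_i = Var[E(Y | X_i)] / Var(Y). The Python sketch below estimates S_i with a Monte Carlo "pick-freeze" estimator of the kind used in Saltelli-type Sobol schemes; it is not necessarily the exact set of estimator variants used in this study. It is applied to the Ishigami function, a standard analytical test case chosen here purely for illustration (not one of the chemistry models), and the function names are hypothetical. The sketch also makes the cost explicit: (k + 2)N model evaluations for k inputs and N base samples, which with N in the thousands is what makes a direct Sobol analysis of an expensive model impractical.

```python
import math
import random


def ishigami(x, a=7.0, b=0.1):
    """Ishigami test function; each input is uniform on [-pi, pi]."""
    x1, x2, x3 = x
    return math.sin(x1) + a * math.sin(x2) ** 2 + b * x3 ** 4 * math.sin(x1)


def sobol_first_order(model, k, n, low, high, seed=0):
    """Estimate first-order Sobol indices with a pick-freeze estimator.

    Draws two independent base samples A and B (n rows each). For input i,
    the hybrid sample A_B^i equals A except that column i is taken from B,
    so f(A_B^i) and f(B) share only x_i. Cost: (k + 2) * n model runs.
    """
    rng = random.Random(seed)
    A = [[rng.uniform(low, high) for _ in range(k)] for _ in range(n)]
    B = [[rng.uniform(low, high) for _ in range(k)] for _ in range(n)]
    fA = [model(row) for row in A]
    fB = [model(row) for row in B]

    # Total output variance, estimated from both base samples.
    pooled = fA + fB
    mean = sum(pooled) / len(pooled)
    var = sum((y - mean) ** 2 for y in pooled) / len(pooled)

    indices = []
    for i in range(k):
        # Take input i from B and keep every other input from A.
        fABi = [model(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        # Estimates Var[E(Y | X_i)] = E[f(B) f(A_B^i)] - E[f(B) f(A)].
        vi = sum(yb * (yab - ya) for ya, yb, yab in zip(fA, fB, fABi)) / n
        indices.append(vi / var)
    return indices


s = sobol_first_order(ishigami, k=3, n=20000, low=-math.pi, high=math.pi)
print([round(v, 3) for v in s])
```

For the Ishigami function the analytical first-order indices are about 0.314, 0.442 and 0, so the Monte Carlo estimates can be checked directly; with 20 000 base samples this already costs 100 000 model evaluations.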

If a model is computationally expensive, carrying out 1000 simulations may not be feasible. A solution is to use a surrogate function for the model called a meta-model that maps the same set of inputs to the same set of outputs but is computationally much faster. Thus, much less time is required to perform GSA using the meta-model than using the slow-running model. A meta-model can be any function that maps the inputs of a model to its outputs, e.g. linear or quadratic functions, splines, neural networks. A neural network, for example, works well if there are discontinuities in the input–output mapping, but such a method can require thousands of runs of the computationally expensive model to train it (particularly if the output is highly multi-dimensional) which will likely be too time-consuming. Here, we use a statistical emulator because it requires far fewer training runs and it has two useful properties. First, an emulator is an interpolating function which means that at inputs of the expensive model that are used to train the emulator, the resulting outputs of the emulator must exactly match those of the expensive model (Iooss and Lemaître, 2015). Secondly, for inputs that the emulator is not trained at, a probability distribution of the outputs that represents their uncertainty is given (O'Hagan, 2006). The vast majority of emulators are based on Gaussian process (GP) theory due to its attractive properties (Kennedy and O'Hagan, 2000; O'Hagan, 2006; Oakley and O'Hagan, 2004), which make GP emulators easy to implement while providing accurate representations of the computationally expensive model (e.g. Chang et al., 2015; Gómez-Dans et al., 2016; Kennedy et al., 2008; Lee et al., 2013). A GP is a multivariate normal distribution applied to a function rather than a set of variables. The original GP emulator in a Bayesian setting was developed by Currin et al. 
(1991) (for a basic overview, see also O'Hagan, 2006) and is mathematically equivalent to the kriging interpolation methods used in geostatistics (e.g. Cressie, 1990; Ripley, 2005). Kriging regression has been used as an emulator method since the 1990s (Koehler and Owen, 1996; Welch et al., 1992). More recently, there has been considerable interest in using this kriging emulator approach for practical purposes such as GSA or inverse modelling (Marrel et al., 2009; Roustant et al., 2012). Examples of its application can be found in atmospheric modelling (Carslaw et al., 2013; Lee et al., 2013), medicine (Degroote et al., 2012) and electrical engineering (Pistone and Vicario, 2013).
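The two emulator properties described above, exact interpolation at the training inputs and a predictive variance elsewhere, can be seen in a deliberately minimal one-dimensional sketch: a zero-mean Gaussian process with a squared-exponential covariance and fixed, hand-picked hyperparameters (variance 1, length scale 0.2), conditioned on five training points. These settings are illustrative assumptions, not the configuration used in this study, where hyperparameters are estimated from the training runs. A tiny nugget is added to the diagonal purely for numerical stability, so interpolation is exact only up to round-off. The class and function names are hypothetical.

```python
import math


def sq_exp_kernel(x1, x2, variance=1.0, lengthscale=0.2):
    """Squared-exponential covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((x1 - x2) / lengthscale) ** 2)


def cholesky(K):
    """Lower-triangular L with L L^T = K (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def chol_solve(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][m] * z[m] for m in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[m][i] * x[m] for m in range(i + 1, n))) / L[i][i]
    return x


class GPEmulator:
    """Zero-mean GP emulator with fixed hyperparameters (no fitting step)."""

    def __init__(self, xs, ys, nugget=1e-10):
        self.xs = list(xs)
        K = [[sq_exp_kernel(a, b) for b in self.xs] for a in self.xs]
        for i in range(len(K)):
            K[i][i] += nugget  # small jitter for numerical stability only
        self.L = cholesky(K)
        self.alpha = chol_solve(self.L, list(ys))

    def predict(self, x):
        """Return the predictive mean and variance at input x."""
        kstar = [sq_exp_kernel(x, xi) for xi in self.xs]
        mean = sum(k * a for k, a in zip(kstar, self.alpha))
        v = chol_solve(self.L, kstar)
        var = sq_exp_kernel(x, x) - sum(k * vi for k, vi in zip(kstar, v))
        return mean, max(var, 0.0)  # clip tiny negative round-off


xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [math.sin(2 * math.pi * x) for x in xs]
gp = GPEmulator(xs, ys)
print(gp.predict(0.25))   # training input: mean close to 1, variance close to 0
print(gp.predict(0.125))  # between training inputs: variance clearly > 0
```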

For GSA studies involving multi-dimensional output, a traditional approach is to apply a separate GP emulator for each dimension of the output space. However, if the output consists of many thousands of points on a spatial map or time series (Lee et al., 2013), then the need to use thousands of emulators can impose substantial computational constraints, even when using the FAST methods. A solution is to adopt a GSA method that does not rely on an emulator but is instead based on generalised additive modelling (Mara and Tarantola, 2008; Strong et al., 2014, 2015b) or on a partial least squares approach (Chang et al., 2015; Sobie, 2009). A separate generalised additive model (GAM) can be built for each input against the output of the expensive model, and the sensitivity of the output to changes in each input is then computed using these individual GAMs. Partial least squares (PLS) is an extension of the more traditional multivariate linear regression that can be used when the number of samples (i.e. model runs in this context) is small, and this number may even be smaller than the number of inputs (Sobie, 2009).
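The idea behind the emulator-free GAM approach can be sketched as follows: for each input separately, regress the simulator output on that input with a smooth function, then take the variance of the fitted values divided by the total output variance as an estimate of Var[E(Y | X_i)] / Var(Y). In this sketch a cubic polynomial fitted by ordinary least squares stands in for the penalised regression splines a GAM would normally use (Wood, 2017), so it is a simplified smoother-regression illustration rather than a full GAM. The function names and the toy test model are hypothetical.

```python
import random


def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def smoother_first_order(X, y, degree=3):
    """First-order SIs from one-input-at-a-time smooth regressions.

    For each input i, fit y ~ polynomial(x_i) by least squares and return
    Var(fitted) / Var(y), an estimate of Var(E[Y | X_i]) / Var(Y).
    """
    n = len(y)
    mean_y = sum(y) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    indices = []
    for i in range(len(X[0])):
        feats = [[X[j][i] ** p for p in range(degree + 1)] for j in range(n)]
        # Normal equations (F^T F) beta = F^T y for the polynomial basis F.
        FtF = [[sum(f[a] * f[b] for f in feats) for b in range(degree + 1)]
               for a in range(degree + 1)]
        Fty = [sum(f[a] * yj for f, yj in zip(feats, y)) for a in range(degree + 1)]
        beta = solve_linear(FtF, Fty)
        fitted = [sum(bk * fk for bk, fk in zip(beta, f)) for f in feats]
        mean_f = sum(fitted) / n
        indices.append(sum((v - mean_f) ** 2 for v in fitted) / n / var_y)
    return indices


rng = random.Random(1)
X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2000)]
y = [x[0] + 2 * x[1] ** 2 for x in X]  # the third input has no effect
print([round(s, 2) for s in smoother_first_order(X, y)])
```

For this toy model the analytical first-order indices are 15/31 ≈ 0.48, 16/31 ≈ 0.52 and 0. With a finite sample the estimates scatter around these values, and an inert input picks up a small positive bias from fitting noise.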

An alternative way of reducing the computational constraints is to use principal component analysis (PCA) to reduce the dimensionality of the output. This means that we require far fewer emulators to represent the outputs, reducing the GSA calculations by a large margin, although there is some loss of detail. This emulator–PCA hybrid approach has been successfully used in radiative transfer models (Gómez-Dans et al., 2016), a very simple chemical reaction model (Saltelli et al., 2012) and general circulation models (Sexton et al., 2012). While we hypothesise that both emulator-free and PCA-based methods are suited to large-scale GSA problems (e.g. those involving more than 20 input factors), a focus of our work is to determine the accuracy of these methods for a smaller-scale GSA study.

Recent research comparing different GSA methods based on Gaussian process emulators has been limited in application to relatively simple models and low-dimensional output (Mara and Tarantola, 2008). Using two computationally expensive models of global atmospheric chemistry and transport – namely the Frontier Research System for Global Change/University of California at Irvine (FRSGC/UCI) and Goddard Institute for Space Studies (GISS) models – we compare the accuracy and efficiency of global sensitivity analysis using emulators and emulator-free methods, and we investigate the benefits of using PCA to reduce the number of emulators needed. We compare and contrast a number of ways of computing the first-order sensitivity indices for the expensive atmospheric models: (i) the Sobol method using an emulator, (ii) the extended FAST method using an emulator, (iii) generalised additive modelling, (iv) a partial least squares approach and (v) an emulator–PCA hybrid approach. Hereafter, we refer to (i) and (ii) as emulator-based GSA methods and (iii) and (iv) as emulator-free GSA methods.

Global atmospheric chemistry and transport models simulate the composition
of trace gases in the atmosphere (e.g.

In this study, we performed GSA on two such
atmospheric models. We used the FRSGC/UCI chemistry transport model (CTM)
(Wild et al., 2004; Wild and Prather, 2000) and the GISS general circulation model
(GCM) (Schmidt et al., 2014; Shindell et al., 2006). We used results from 104
model runs carried out with both of these models from a comparative GSA
study (Wild et al., 2018). This involved varying eight inputs or
parameters over specified ranges using a maximin Latin hypercube design:
global surface

For brevity and generality, we hereafter refer to each of the atmospheric
chemical transport models as a simulator. A common way of conducting global
sensitivity analysis for each point in the output space of the simulator –
where the output consists of, for example, a spatial map or a time series –
is to compute the first-order sensitivity indices (SIs) using variance-based
decomposition; this apportions the variance in simulator output (a scalar)
to different sources of variation in the different model inputs. Assuming
the input variables are independent of one another – which they are for this study – the first-order SI, corresponding to the

Summary of algebraic terms that are common to all or most of the statistical methods described in this study. For brevity, terms that are specific to a particular method are not listed here.

The Sobol method, developed in the 1990s, is much faster than brute force at
computing the terms in Eq. (

An alternative and even faster way of estimating the terms in Eq. (

More specifically, we express the model

Using a domain of frequencies given by

With

Further details of eFAST are given in Saltelli et al. (1999). The differences between the original and the extended versions of the FAST method are given in Appendix A.
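The eFAST idea can be sketched in Python as follows. Each input traces a search curve x_j(s) = 1/2 + arcsin(sin(ω_j s + φ_j))/π over [0, 1]; the input of interest is given a high frequency ω_i and the others low frequencies no larger than ω_i/(2M), which limits interference with the first M harmonics of ω_i (the rule described by Saltelli et al., 1999). The first-order SI is then the spectral power at those M harmonics divided by the total output variance. The specific choices here (N = 257 samples per input, M = 4 harmonics, the way complementary frequencies are assigned, random phase shifts, inputs scaled to [0, 1]) are illustrative assumptions rather than the settings used in this study, and the function name is hypothetical.

```python
import math
import random


def efast_first_order(model, k, n=257, m=4, seed=0):
    """First-order SIs via a minimal eFAST sketch.

    Inputs are uniform on [0, 1] (rescale inside `model` if needed).
    n (odd) samples per input and m harmonics; costs k * n model runs.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd")
    rng = random.Random(seed)
    omega_max = (n - 1) // (2 * m)       # m * omega_max <= (n - 1) / 2
    cap = max(1, omega_max // (2 * m))   # complementary frequencies <= this
    s = [2.0 * math.pi * j / n for j in range(n)]
    indices = []
    for i in range(k):
        # Input i gets the high frequency; the others cycle through 1..cap.
        omegas, c = [], 0
        for j in range(k):
            if j == i:
                omegas.append(omega_max)
            else:
                omegas.append(c % cap + 1)
                c += 1
        phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(k)]
        # Search curve: each input traces a triangle wave over [0, 1].
        ys = [model([0.5 + math.asin(math.sin(w * sj + p)) / math.pi
                     for w, p in zip(omegas, phases)])
              for sj in s]
        mean = sum(ys) / n
        total_var = sum((y - mean) ** 2 for y in ys) / n

        def power(freq):
            """Squared modulus of the discrete Fourier coefficient at freq."""
            a = sum(y * math.cos(freq * sj) for y, sj in zip(ys, s)) / n
            b = sum(y * math.sin(freq * sj) for y, sj in zip(ys, s)) / n
            return a * a + b * b

        # Variance attributed to input i: power at its first m harmonics,
        # doubled to include the matching negative frequencies.
        d_i = 2.0 * sum(power(q * omega_max) for q in range(1, m + 1))
        indices.append(d_i / total_var)
    return indices


print([round(v, 3) for v in efast_first_order(lambda x: x[0] + 2 * x[1], k=2)])
```

For the additive test model y = x1 + 2 x2 with uniform inputs, the analytical first-order indices are 0.2 and 0.8. The sketch recovers them closely because almost all of the variance of a triangle-wave search curve lies in its first and third harmonics.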

When the simulator is computationally expensive to run – like the atmospheric chemical transport models used here – we replace it with an emulator, a surrogate of the expensive simulator that is much faster to run. If we are confident that the emulator is accurate, then we can compute the first-order SIs from the Sobol and eFAST methods using the outputs of the emulator rather than the simulator. Mathematically, an emulator is a statistical model that mimics the input–output relationship of a simulator. As stated in the introduction, an emulator reproduces the simulator outputs exactly at the inputs used to train it, and at other inputs it gives a probability distribution for the output that represents its uncertainty (O'Hagan, 2006).

An emulator is trained using

Developed around the same time, the kriging interpolation methods used in geostatistics are mathematically equivalent to the GP methods developed by Currin et al. (1991) (e.g. Cressie, 1990; Ripley, 2005). Kriging-based emulators have been used for 25 years (Koehler and Owen, 1996; Welch et al., 1992), with recent implementations including the DICE-Kriging R packages used for GSA and inverse modelling (Marrel et al., 2009; Roustant et al., 2012). Since the kriging-based implementation is computationally faster, we adopted the DICE-Kriging version of the GP emulator for this study. For the statistical theory behind both emulator versions and descriptions of related R packages, see Hankin (2005) and Roustant et al. (2012).

For GSA studies involving highly multi-dimensional output, the time to
compute the SIs can be significantly reduced by employing an emulator-free
GSA approach. In this study, we consider two such methods using (i) GAM and (ii) a PLS
regression approach. For both the GAM and PLS methods, we used

A GAM is a generalised linear model where the
predictor variables are represented by smooth functions (Wood, 2017). The
general form of a GAM is

The PLS method is the only one of the four GSA
methods considered here that is not variance-based (Chang et al., 2015).
Multivariate linear regression (MLR) is a commonly used tool to represent a
set of outputs or response variables (

As an alternative approach for speeding up the sensitivity analysis calculations, we computed the SIs from the Sobol GSA method using a hybrid approach involving PCA to reduce the dimensionality of the output space, and then used separate Gaussian process emulators for each of the transformed outputs (Gómez-Dans et al., 2016; Saltelli et al., 2012; Sexton et al., 2012). After performing the emulator runs, we then reconstruct the emulator output on the original output space, from which we compute the sensitivity indices.

PCA transforms the outputs onto a projected space with maximal variance.
Mathematically, we obtain the matrix of transformed outputs

This technique of reducing the dimension of the output space from

We use this formula to compute the

Flowchart of the tasks required to perform
GSA on a computationally expensive model. The
ranges of the inputs, on which the design is based, are determined by
expert elicitation. For approach 1, each dimension (dim.) of the output
consists of a different spatial or temporal point of the same variable
(

The sequence of tasks to complete when performing global sensitivity
analysis is shown schematically in Fig. 1. The choice of inputs (e.g.
parameters) to include in the sensitivity analysis will depend upon which
have the greatest effects, based on expert knowledge of the model and field
of study. Expert judgement is also needed to define the ranges of these
inputs. A space-filling design such as maximin Latin hypercube sampling or sliced Latin
hypercube sampling (Ba et al., 2015) is required in order to sample from
the input space with the minimum sufficient number of model runs. We used

If we are employing an emulator, the next stage is to build the emulator using
the training runs. The number of training runs (

The final stage is to compute the first-order SIs for all the inputs; these
quantify the sensitivity of the output to changes in each input. The SIs are
also known as the main effects. The eFAST, Sobol and GAM approaches can also
be used to compute the total effects, defined as the sum of the
sensitivities of the output to changes in input

Annual column mean

Since the emulators we employed are based on a scalar output, we built a
separate emulator for each of the

The sensitivity indices (percentage of the total variance in a
given output) for the four dominant inputs, with the output given as the annual column mean

The sensitivity indices (percentage of the total variance in a
given output) for the four dominant inputs, with the output given as the annual column mean

As expected, the two emulator-based GSA approaches
(eFAST and Sobol) produced almost identical global maps of first-order
SIs (%) of

Statistics (mean, 95th percentile and 99th percentile) of the
distribution of differences in sensitivity indices (SIs) between pairs of
methods. For each comparison, the

Our results show that the emulator-free GAM method produces estimates of the SIs very similar to those of the emulator-based methods (Figs. 3, 4; row a vs. c for Sobol versus GAM). The 95th and 99th percentiles of the differences between the emulator-based methods (e.g. Sobol) and GAM are 5 and 9 % for FRSGC, and 7 and 10 % for GISS (Fig. 5, M1 versus M3). For both models, the emulator-free PLS method produced SIs that were substantially different from those of the eFAST and Sobol methods (Figs. 3, 4; row a vs. d for Sobol vs. PLS). For FRSGC, the mean and 95th percentile of the differences in SIs for the Sobol versus PLS methods are around 21 and 31 %, while for GISS the corresponding values are around 14 and 23 % (Fig. 5, M1 versus M4). Thus, our results indicate that the PLS method is not suitable as an emulator-free approach to estimating the SIs.

The global map of SIs using the emulator–PCA hybrid approach compared well
to those from the emulator-only approach (Figs. 3, 4; row a vs. e). The
95th and 99th percentiles of differences between the two
approaches were 6 and 10 %, respectively, for FRSGC (Fig. 5a, M1 versus M5)
and 3 and 5 %, respectively, for GISS (Fig. 5b, M1 versus M5). These
are both higher than the corresponding values for the emulator-only methods
(Fig. 5, M1 versus M2;

Our results align with the consensus that the eFAST method or other modified
versions of the FAST method (e.g. RBD-FAST) produce very similar SIs to the
Sobol method. Mathematically, the two methods are equivalent (Saltelli
et al., 2012) and when the analytical (true) values of the SIs can be
computed, both methods are able to accurately estimate these values
(Iooss and Lemaître, 2015; Mara and Tarantola, 2008). However, many
studies have noted that the Sobol method requires more simulator (or emulator)
runs to compute the SIs. Saltelli et al. (2012) state that

Despite recent interest in applying GAMs to perform GSA (Strong et al., 2015a, b, 2014), only Stanfill et al. (2015) have compared how they perform against other variance-based approaches. The authors found that first-order SIs estimated from the original FAST method were very close to the true values using 600 executions of the model, whereas the GAM approach required only 90–150 model runs. This is roughly consistent with our results, as we estimated the SIs using 80 runs of the chemistry models for GAM and 1000 runs of the emulator for the eFAST method.

Few studies have compared the accuracy of GAM-based SIs across different models, as we do here. Stanfill et al. (2015) found that the GAM method was accurate at estimating SIs based on a simple model (three to four parameters) as well as a more complex one (10 parameters). However, if the GAM approach were applied to more models of varying complexity and type (e.g. process-based versus empirical), we expect that while it would work well for some models, for others the resulting SIs may differ substantially from those produced using the more traditional Sobol or eFAST methods. Saltelli et al. (1993) suggest that the performance of a GSA method can be model dependent, especially depending on whether the model is linear or non-linear, monotonic or non-monotonic, or whether transformations (e.g. logarithms) are applied to the output. This is particularly true for GSA methods based on correlation or regression coefficients (Saltelli et al., 1999), which might explain why the SIs calculated from the PLS method in our analysis disagreed with those of the eFAST/Sobol methods, and by different amounts for the FRSGC and GISS models. Not all GSA methods are model dependent; for example, the eFAST method is not (Saltelli et al., 1999).

For both chemistry models, using PCA to greatly reduce the number of emulators needed resulted in SIs very similar to those calculated using an emulator-only approach. For the GISS model, this was encouraging given that the spread of points and their bias in the emulator versus simulator scatter plot were noticeably larger than those of the FRSGC model (Fig. 2c, d). If we had increased the number of principal components so that 99.9 % rather than 99 % of the variance in the output was captured, following Verrelst et al. (2016), then we would expect less bias in the validation plot for GISS. However, the poor validation plots did not translate into poorly estimated SIs for the emulator–PCA approach. On the contrary, the estimated SIs for GISS are consistent with those estimated using either emulator-only approach (Fig. 5).

The use of PCA in variance-based global sensitivity analysis studies is
relatively new but has great potential for application in other settings. De
Lozzo and Marrel (2017) used an atmospheric gas dispersion model to
simulate the evolution and spatial distribution of a radioactive gas into
the atmosphere following a chemical leak. The authors used principal
component analysis to reduce the dimension of the spatio-temporal output map
of gas concentrations to speed up the computation of the Sobol sensitivity
indices for each of the

Our work extends the work of Wild et al. (2018) who used the same
training inputs and the same atmospheric chemical transport models (FRSGC
and GISS) but different outputs. Instead of using highly multi-dimensional
output of tropospheric methane lifetime values at different spatial
locations, Wild et al. (2018) used a one-dimensional output of
global tropospheric methane lifetime. Using the eFAST method, the authors
found that global methane lifetime was most sensitive to change in the
humidity input for the FRSGC model, while for the GISS model the surface

GSA studies for computationally expensive models involving a small number of inputs (e.g.

GSA is a powerful tool for understanding model
behaviour, for diagnosing differences between models and for determining
which parameters to choose for model calibration. In this study, we compared
different methods for computing first-order sensitivity indices for
computationally expensive models based on modelled spatial distributions of
CH

The R code to carry out global sensitivity analysis using the methods
described in this paper is available in Sects. S2–S7 of the
Supplement. This R code as well as the R code used to validate
the emulators can also be found via

The inputs and outputs of the FRSGC chemistry model that were used to train
the emulators in this paper can be found via

For the Sobol method, Saltelli (2002) and Tarantola et al. (2006) suggest using eight
variants of Eq. (

Thus, the

Using Eq. (A1b), Eq. (

The supplement related to this article is available online at:

ER and OW designed the study. ER conducted the analysis and wrote the manuscript, and OW gave feedback during the analysis and writing phases. OW, FO and AW provided output from the global atmospheric model runs needed to carry out the analysis. LL advised on statistical aspects of the analysis. All coauthors gave feedback on drafts of the manuscript.

The authors declare that they have no conflict of interest.

This work was supported by the Natural Environment Research Council (grant number NE/N003411/1).

Edited by: Andrea Stenke
Reviewed by: three anonymous referees