Calibrating a global atmospheric chemistry transport model using Gaussian process emulation and ground-level concentrations of ozone and carbon monoxide

Atmospheric chemistry transport models are important tools to investigate the local, regional and global controls on atmospheric composition and air quality. To ensure that these models represent the atmosphere adequately, it is important to compare their outputs with measurements. However, ground based measurements of atmospheric composition are typically sparsely distributed and representative of much smaller spatial scales than those resolved in models; thus, direct comparison incurs uncertainty. In this study, we investigate the feasibility of using observations of one or more atmospheric constituents to estimate parameters in chemistry transport models and to explore how these estimates and their uncertainties depend upon representation errors and the level of spatial coverage of the measurements. We apply Gaussian process emulation to explore the model parameter space and use monthly averaged ground-level concentrations of ozone (O3) and carbon monoxide (CO) from across Europe and the US. Using synthetic observations, we find that the estimates of parameters with greatest influence on O3 and CO are unbiased, and the associated parameter uncertainties are low even at low spatial coverage or with high representation error. Using reanalysis data, we find that estimates of the most influential parameter – corresponding to the dry deposition process – are closer to its expected value using both O3 and CO data than using O3 alone. This is remarkable because it shows that while CO is largely unaffected by dry deposition, the additional constraints it provides are valuable for achieving unbiased estimates of the dry deposition parameter. In summary, these findings identify the level of spatial representation error and coverage needed to achieve good parameter estimates and highlight the benefits of using multiple constraints to calibrate atmospheric chemistry transport models.

Changes in atmospheric composition due to human activities make an important contribution to 2 Earth's changing climate (Stocker et al., 2013) and to outdoor air pollution, which is currently 3 responsible for about 4.2 million deaths worldwide each year (Cohen et al., 2017). Chemistry 4 transport models (CTMs) simulate the production, transport, and removal of key atmospheric 5 constituents, and are important tools for understanding variations in atmospheric composition 6 across space and time. They permit investigation of future climate and emission scenarios that 7 fully account for the interactions and feedbacks that characterise physical, chemical and 8 dynamical processes in the atmosphere. For practical application, CTMs need to reproduce the 9 magnitude and variation in pollutant concentrations observed at a wide range of measurement 10 locations. Where biases occur, these can often be reduced by improving process representation 11 through adjusting model parameters so that the CTM matches the measurements to a sufficient 12 level of accuracy (e.g. Menut et al., 2014). While estimation of model parameters is common in 13 many fields of science, it is rarely attempted with atmospheric chemistry models because they 14 are computationally expensive to run and it is thus burdensome to perform the large number of 15 model runs required to explore model parameter space. Instead, data assimilation has become a 16 standard method for ensuring that model states are consistent with measurements, usually 17 treating model parameters as fixed (Khattatov et   In this study, we explore computationally efficient ways of estimating parameters in 20 chemistry transport models, focusing on two important tropospheric constituents, ozone (O3) and 21 carbon monoxide (CO). Ozone is a major pollutant that is produced in the troposphere by 22 oxidation of precursors such as CO and hydrocarbons, which are emitted during combustion showed that significant decreases of 28% and 6% have occurred in Eastern North America 7 and Europe, respectively, but increases of 20% and 45% in south-east and east Asia (Chang et 8 al., 2017). In recent decades, a similar pattern of decreases in CO in Europe and North America 9 and increases over parts of Asia has also been observed (Granier et al., 2011). To fully explain 10 and attribute these changes, a thorough understanding of the processes controlling these 11 pollutants is needed. 12 To assess the performance of CTMs, it is essential to compare simulations of 13 tropospheric chemical composition with measurements. A comprehensive evaluation of 15 14 global models found that they broadly matched measured O3, but that modelled O3 was biased 15 high in the northern hemisphere and biased low in the southern hemisphere (Young et al., 2018). 16 The models were unable to capture the long-term trends in tropospheric O3 observed at different 17 altitudes. Similar biases were found in an independent study of long-term trends involving three 18 chemistry climate models (Parrish et al., 2014). While identification of these model biases is 19 informative, correcting the deficiencies is challenging because it is often unclear why different 20 models perform well at certain times and for certain places, but poorly elsewhere (Young et al.,21 2018). A practical solution is to perform global sensitivity analysis to identify the parameters or 22 processes that influence the model results most and then to calibrate the model to estimate these that is typically unavailable from simpler approaches. 3 The principal challenge with performing global sensitivity analysis and model calibration 4 is that they require thousands of model runs, and this is infeasible for a typical global CTM that  (Salter et al., 2018). In this study, we apply these approaches to 14 models of tropospheric ozone for the first time to demonstrate the feasibility of parameter 15 estimation. 16 We identify three issues that need to be addressed for successful atmospheric model understood. The effect of representation errors was explored in simple terrestrial Carbon model 1 by Hill et al. (2012), who found that as these errors decreased, the accuracy of parameter 2 estimates improved. 3 Secondly, the spatial coverage of atmospheric composition measurements is typically 4 relatively poor, and this limits our ability to estimate parameters accurately. Thus, it is important 5 to explore how the spatial coverage of measurements affects estimates of model parameters and 6 their associated uncertainties. 7 Thirdly, evaluation of atmospheric chemistry models is typically performed for different NO to constrain a photochemical box model, and found estimates of column OH that were 12-20 40% higher than those from unconstrained CTMs. They also found that although the CTMs 21 simulated O3 well, they underestimated NOx by a factor of two, explaining the discrepancy in 22 column OH. To address these gaps in knowledge, we estimate the probability distributions of eight 1 parameters from a CTM, given surface O3 and CO concentrations from the USA and Europe. 2 We focus on model calibration with a limited number of parameters as a proof of concept, but 3 show how this could be expanded to a much wider range of parameters in future. To overcome 4 the excessive computational burden of running the model a large number of times, we replace the 5 model with a fast surrogate using Gaussian process emulation. After evaluation of the emulator 6 to ensure that it is an accurate representation of the input-output relationship of the CTM, we 7 investigate how well model parameters can be estimated from chemical measurement data. We 8 quantify the impacts of measurement representation error and spatial coverage on the bias and 9 uncertainty in the estimated model parameters and highlight the extent to which parameter 10 estimates can be improved using measurements of different variables simultaneously.   Table 1. These do not encompass all sources of uncertainty in the model, 6 but are broadly representative of major uncertainties across a range of different processes. To 7 provide a simple and easily interpretable approach to calibration, we define a scaling factor that 8 spans the range of uncertainty in each process, and these scaling factors form the parameters that 9 we aim to calibrate. The choice of parameters and uncertainty ranges are described in more detail 10 in Wild et al. (2020). For this study, we focus on monthly-mean surface O3 and CO distributions 11 at the model native grid resolution of 2.8°×2.8° and compare with observations over North 12 America and Europe for model calibration (Fig. 1)  using 4D-Var data assimilation (Flemming et al. 2017). This reanalysis data closely resembles 1 observed O3 and CO where measurements are available and has the benefit of complete global 2 coverage, allowing us to test the importance of measurement coverage directly. 3 Reanalysis data for O3 and CO are available for 2003-2015, and we average the data by 4 month across this period to provide a climatological comparison. The control run of the 5 FRSGC/UCI model matches CO from the reanalysis data reasonably well (Fig. 2), but 6 overestimates surface O3. Overestimation of O3 in continental regions has been noted in previous 7 studies and is partly a consequence of rapid photochemical formation from fresh emissions that 8 is magnified at coarse model resolution (Wild and Prather, 2006). For this exploratory study we 9 bias-correct the modelled surface O3 by reducing it by 25%, following the approach taken by 10 Shindell et al. (2018), so that it matches the reanalysis data (Fig. 2a). This adjustment accounts 11 for the effect of chemical processes and model resolution which are not explored in this study, 12 and provides a firmer foundation for investigating the effects of other processes. values. Synthetic O3 and CO data were generated by adding different levels of representation 1 error for each level of spatial coverage. In mathematical terms: where for the ith point in space or time, refers to the synthetic data for O3 or CO, 3 ( ) is the O3 or CO from the model control run, and is generated from a Normal 4 distribution with mean of zero and standard deviation that is directly proportional to the 5 magnitude of ( ). In this case, = × ( ) where p is a representation error 6 scaling factor that we varied. We included p as one of the parameters to estimate for the 7 reanalysis data and found values in the range 0.16-0.19. Thus, when using the synthetic data we 8 set the representation error scaling factor for these variables to p = 0.01, 0.1, 0.2 and 0.3.   3 We replace the CTM with a surrogate model that maps the inputs of the CTM (the eight 4 parameters listed in Table 1) with its outputs (surface O3 and CO). We employ a surrogate so predicts the output of the model with no uncertainty at the input points it is trained at. 10 Thirdly, it gives a complete probability distribution, as a measure of uncertainty, for estimates of 11 the model output at points it is not trained at. has a probability distribution represented by a mean function ( ) and a covariance function

Gaussian Process Emulation -theory
The mean function is given by: where ℎ( ) is a 1×(q+1) vector given by (1, ), is a vector of coefficients determined by   We overcome this by computing parts of equation (2) prior to these evaluations. Specifically, we 10 compute the vectors , and for all points in the output space, where denotes 11 − , the last part of m(x) from equation (2). We store these three objects as three  We estimate the eight model parameters using Bayesian statistics via the software package Just search is accepted on two conditions: (1) the set is consistent with the prior probability 10 distribution, which for our study was a set of Uniform distributions with the lower and upper 11 bounds given by the defined ranges in Table 1; and (2) the resulting modelled values using the 12 proposed set of parameters are consistent with measurements, which is assessed using the 13 following Gaussian likelihood function: where N is the number of measurements used, ( ) is the ith model output (1 ≤ i ≤ N) using the 15 proposed parameter set , is the measurement corresponding to the ith model output and is 16 the representation error for measurement . 17 We ran three parallel chains for 10,000 iterations each. After discarding the first half of 18 these iterations as 'burn in', we thinned the chains by a factor of five to reduce within-chain 19 autocorrelation. Convergence was assessed using the Brooks-Gelman-Rubin diagnostic tool 20 for each parameter, which we summarize using their posterior means and 95% credible intervals 1 (CIs) defined by the 2.5 th and 97.5 th percentiles (Gelman et al., 2013). We used the R language 2 to code up our configuration of the MCMC algorithm. 3 2.8 Experimental approach 4 We first perform a global sensitivity analysis to identify the parameters which have the greatest 5 influence on the two variables we consider. We then perform parameter estimation using the spatial coverage of these measurements over the regions considered over a wide range: 2.5, 5, 12 10, 20, 40 and 100%. We focus on surface O3 only, surface CO only and then both variables 13 together. We then use the reanalysis data to represent the measurements, focussing on the effects 14 of spatial coverage alone, and estimating the representation error p from this independent dataset.

15
The 90 different scenarios we consider are summarised in Table 2. 16  North America considered here, the simulated monthly mean concentrations of surface O3 are 20 most sensitive to dry deposition and, to a lesser extent, to isoprene emissions (Fig. 4). This is not  3.2 Estimation of scaling parameters using synthetic data 10 We next use synthetic observation data to calibrate the model and estimate scaling parameters.

11
For synthetic data we use the model control run with a specified level of representation error 12 (Table 2), and the default model parameters define the true scaling that we aim to retrieve.

13
Prescribing surface O3 with very little error (p = 0.01) gives an estimate of the dry deposition 14 scaling parameter, which has the largest influence on modelled surface O3, close to its true value 15 and the uncertainty is small even when the spatial coverage of measurements is only 2.5% ( Fig.   16 7, column 1). As the representation error is increased to p = 0.1, the parameter uncertainty is 17 larger at low spatial coverage but the mean estimate remains unbiased (Fig. 7, column 2). The 18 uncertainty at all levels of spatial coverage becomes larger as p increases to 0.2 and 0.3, but the 19 means remain very close to the true values (Fig. 7, columns 3 and 4). Surface CO is largely 20 unaffected by dry deposition, and thus provides very little constraint on the scaling parameter.

21
The effect of prescribing surface CO and O3 together is very similar to that of using surface O3 Using surface CO alone with very little representation error (p = 0.01), the mean estimate 1 of the isoprene emission scaling parameter is equal to the true value with very little uncertainty, 2 regardless of the spatial coverage (Fig. 8, column 1). When the representation error is increased 3 to p = 0.1, the estimate remains very close to the true value, but the uncertainty is substantially 4 higher at low spatial coverage (2.5% and 5%) than at higher coverage (40% and 100%) (Fig. 8,   5 column 2). The estimates deviate further from the truth at higher levels of representation error (p 6 = 0.2 and 0.3) and the uncertainty is greater (Fig. 8, columns 3 and 4). Estimates of the isoprene 7 scaling parameter are less accurate than those of the dry deposition scaling parameter as the 8 posterior means are further from the true value of the parameter and the uncertainty intervals are 9 wider (Fig. 8 vs Fig. 7). As with our findings for dry deposition, the posterior means and the 10 lengths of the uncertainty intervals for the isoprene scaling parameter remain relatively 11 unchanged when surface O3 data is prescribed at the same time. 12 Our findings for the boundary layer mixing scaling parameter follow a similar pattern to 13 the other two parameters (Fig. 9). In all combinations of representation error and spatial 14 coverage, we find that the mean estimates are unbiased. Furthermore, we find that the parameter 15 uncertainty is significantly smaller when the spatial coverage is 10% or higher when p = 0.1, 16 20% or higher when p = 0.2, and 40% or higher when p = 0.3 (Fig. 9, Table 2). It is clear from 17 these results that the scalings for these three model parameters can be successfully estimated 18 from synthetic data with low uncertainty when the representation error is low, and that the 19 estimates remain good, albeit with higher uncertainty, at higher representation error if the spatial 20 coverage is relatively good. We consider next the reanalysis data for surface O3 and CO which are based on assimilated 1 concentrations from the ECMWF model and are thus independent of the FRSGC/UCI model.  Using the reanalysis data for surface O3 alone, we find that the posterior means and the uncertainty is reduced (Fig. 10a). In contrast, using surface O3 and CO together results in an 16 estimate closer to 1 and an additional reduction in uncertainty (Fig. 10g). Inclusion of surface 17 CO measurements, as an additional constraint to surface O3, results in an estimate of the dry 18 deposition parameter closer to that modelled along with a reduction in the associated uncertainty.

19
Using surface CO alone, estimates of the isoprene scaling parameter lie in the central part 20 of the defined range, whilst estimates of the boundary layer mixing scaling parameter lie in the 21 upper half of the defined range (Fig 10e,f). For both parameters, increasing the spatial coverage estimating either of these parameters results in very little difference in the magnitude of the 1 estimate or in the associated uncertainty ( Fig. 10e vs 10h; Fig. 10f vs 10i).   For the reanalysis data, we treat the representation error as a parameter for the MCMC 10 algorithm to estimate along with the eight model parameters. This is possible because we 11 assume that the measured value of O3 is proportional to the simulated value from a forward run 12 of the FRSGC/UCI model, although such an assumption may not be possible in other situations.

13
An alternative approach to estimate the representation error would be to carry out an intensive  We find that as the volume of measurements increase, the estimates of the model parameters are 1 closer to the truth and the width of the credible intervals decrease. This is particularly clear for 2 the dry deposition and isoprene emission scaling parameters when using both O3 and CO 3 concentrations (Figs 8 and 9). While this highlights the value of good spatial coverage, we note 4 that the benefits are greatly reduced if the representation error is relatively high. For the 5 boundary layer mixing parameter, we find little decrease in the credible intervals using synthetic 6 CO data with the highest representation error (p = 0.3), where the spatial coverage is less than 7 20% (Fig. 9, row 2). In contrast, at the p = 0.1 level, a large decrease in uncertainty is seen 8 between the 2.5% and 20% coverage. Similar effects are seen, to a lesser extent, for the dry 9 deposition and isoprene scaling parameters as the spatial coverage increases. 10 Our results using synthetic data show that while the size of the uncertainty intervals vary 11 substantially depending on the spatial coverage or representation error, the posterior means are 12 for the most part very close to the true values. Deviation from these typically occurs when the 13 measurements contain less information either due to low spatial coverage or high representation 14 error. However, the uncertainty intervals include the true values of the parameters for all the 15 experimental scenarios considered here, unlike in Hill et al. (2012). This gives strong confidence 16 in the reliability of the MCMC method used to estimate the parameters. 17 4.3 Applying multiple constraints 18 The importance of multiple constraints was most apparent for scenarios involving the 19 reanalysis data. For the dry deposition scaling parameter, which explains much of the variance 20 in surface O3 (Fig. 4), we iund that using O3 data alone results in mean estimates that are in the 21 upper half of the range of possible values (Fig 10a). However, including CO data brought the ( Fig. 10g). This is remarkable given that dry deposition is not an important process for 1 controlling CO, and highlights the coupling between processes that permits constraints on one 2 process from one variable to influence those on another. However, it is consistent with previous intervals when using O3 and CO together rather than a single constraint. This reveals that the 7 importance of using multiple constraints is dependent on the process and on the variable 8 constrained. A judicious choice of these could allow a particular process to be targeted. 9 Overall, our estimates of the dry deposition and isoprene emission scaling parameters are close 10 to a priori values from the FRSGC/UCI CTM. In contrast, our estimates of the boundary layer 11 mixing scaling parameter are substantially larger than those from the model, suggesting that this 12 process is not represented well in the model. Our results have demonstrated the feasibility of using measurement data to constrain model 15 parameters under the right conditions. We have chosen to use synthetic data as they have allowed 16 us to vary the spatial coverage and to investigate the effects of representation error which is 17 poorly characterised when using real measurements data. Quantifying this type of error for real 18 measurements is difficult because measurement sites are relatively sparse and are often 19 representative of a limited area rather than the larger area typical of a model grid-square. 20 However, this study has allowed us to estimate the representation error associated with the 21 reanalysis data, and in the absence of more information these values could be used as a guide 22 when applying surface measurements as a constraint. values to be close to 1, but using surface O3 reanalysis data alone we found posterior mean 5 scaling parameters approaching 1.4, with credible intervals that did not include 1 (Fig. 10a). 6 This likely reflects overestimation of surface O3 in continental regions in the CTM and may 7 reflect uncertainties and biases in other processes not considered here, most notably in the 8 chemical formation and destruction of O3 and in model transport processes. In the absence of 9 consideration of the uncertainty in these processes in this feasibility study, the dry deposition 10 parameter is used as a proxy process to reduce O3 concentrations. This is an example of CO goes some way to addressing this, but does not remove the problem. Before applying real 14 surface measurements to constrain the CTM, we propose a more comprehensive assessment of 15 model uncertainties with a wider range of parameters so that the constraints can more directly 16 inform process understanding and model development. this feasibility study we have shown that surface O3 has a large sensitivity to dry deposition, and 1 that surface CO is most sensitive to isoprene emissions and boundary layer mixing processes, as 2 expected. We find that estimates of the scaling parameters for these processes are dependent on 3 the spatial coverage and representation error of the surface O3 and CO data. Our parameter 4 estimates become less uncertain as coverage increases and as the representation error decreases, 5 whilst remaining unbiased. Furthermore, we show that using two separate data constraints, in 6 this case surface O3 and CO, instead of a single one can result in mean parameter estimates that 7 are much closer to their likely true values. However, this is dependent on the processes 8 considered and constraints applied, and while it effective for dry deposition here, we find 9 relatively little improvement in the estimates or uncertainties for isoprene emission or boundary 10 layer mixing processes that are also considered here.

11
The approach we adopt here provides a means of constraining atmospheric models with 12 observations and identifying sources of model error at a process level. Our results suggest that 13 dry deposition and isoprene emissions are represented relatively well in the FRSGC/UCI CTM 14 but that boundary layer mixing processes may be somewhat underestimated. However, we have 15 explored the effect of only eight parameters in this study and consideration of a more complete 16 set of processes, including those governing photochemistry and dynamics, is needed to generate 17 more realistic constraints for key pollutants such as O3. We aim to expand this study to 18 investigate a more extensive range of parameters and processes and to constrain with a wider The formula for the covariance function ( , ′) from §2.2 is given by: Author contributions 10 ER and OW designed the study. ER carried out the statistical analyses, and OW ran the 11 FRSGC/UCI model and provided the outputs that were used to train and validate the emulators.

12
ER wrote the paper with input from OW.  corresponding to the other four parameters are shown in Figure S2 of the supplementary material. 5 6 7 Figure 6. Sensitivity indices representing the percentage of the variance in surface CO in the 8 FRSGC/UCI model output due to changes in each input parameter. Maps of sensitivity indices for the 9 other four parameters are shown in Figure S3 of the supplementary material. based on reanalysis datasets from scenarios 73-90 (table 1). The first and second rows show these 4 parameters estimated using one stream of data (O3 for the first row and CO for the second row), while the 5 third row shows estimates using two data streams (O3 and CO).  varying: (i) the type of data (synthetic or reanalysis); (ii) the representation error used for the synthetic 4 data (p) where ( ) is the control run output of the CTM and is the amount of statistical noise 5 added; (iii) the percentage coverage of grid-squares in the USA and Europe. For the synthetic data the 24 6 scenarios correspond to a full factorial combination of four levels of representation error and six levels of 7 spatial coverage, while for the reanalysis data the six scenarios correspond to the six levels of spatial 8 coverage. 9