Constraining the carbon cycle in JULES-ES-1.0

McNeall, Douglas; Robertson, Eddy; Wiltshire, Andy

doi:10.5194/gmd-17-1059-2024

Articles | Volume 17, issue 3

Special issue:

Joint UK Land Environment Simulator (JULES) – configurations,...

Model evaluation paper

08 Feb 2024

Abstract

Land surface models are an important tool in the study of climate change and its impacts, but their use can be hampered by uncertainties in input parameter settings and by errors in the models. We apply uncertainty quantification (UQ) techniques to constrain the input parameter space and corresponding historical simulations of JULES-ES-1.0 (Joint UK Land Environment Simulator Earth System), the land surface component of the UK Earth System Model, UKESM1.0. We use an ensemble of historical simulations of the land surface model to rule out ensemble members and corresponding input parameter settings that do not match modern observations of the land surface and carbon cycle. As JULES-ES-1.0 is computationally expensive, we use a cheap statistical proxy termed an emulator, trained on the ensemble of model runs, to rule out parts of the parameter space where the simulator has not yet been run. We use history matching, an iterated approach to constraining JULES-ES-1.0, running an initial ensemble and training the emulator, before choosing a second wave of ensemble members consistent with historical land surface observations. We successfully rule out 88 % of the initial input parameter space as being statistically inconsistent with observed land surface behaviour. The result is a set of historical simulations and a constrained input space that are statistically consistent with observations. Furthermore, we use sensitivity analysis to identify the most (and least) important input parameters for controlling the global output of JULES-ES-1.0 and provide information on how parameters might be varied to improve the performance of the model and eliminate model biases.

Received: 18 Nov 2022 – Discussion started: 03 Feb 2023 – Revised: 20 Sep 2023 – Accepted: 20 Nov 2023 – Published: 08 Feb 2024

The works published in this journal are distributed under the Creative Commons Attribution 4.0 License. This license does not affect the Crown copyright work, which is re-usable under the Open Government Licence (OGL). The Creative Commons Attribution 4.0 License and the OGL are interoperable and do not conflict with, reduce, or limit each other.

© Crown copyright 2024

1 Introduction

Land surface models are widely used for the study and projection of climate change and its impacts, but differences between the models and the systems they represent can limit their effectiveness (Fisher and Koven, 2020). These differences can be caused by fundamental errors in, or a lack of knowledge of, real-world processes, or by the simplifications and compromises required to build and run the computationally expensive computer models that we use to simulate those processes. Uncertainty quantification (UQ) methods have been developed to identify and quantify the differences between the models and the real world, to find ways of minimising those differences, and to assess their impact on the use of models in policy advice.

Complex land surface models contain a large number of tuneable input parameters – numbers that represent simplifications of processes that are either unnecessary or too computationally expensive to simulate explicitly. The value of a particular input parameter can materially affect the output of a model, but it is often unclear exactly how until the corresponding simulations are run. Input parameters may represent some real-world quantity, and so a modeller may have a belief about the correct value of that input parameter that they can represent with a probability distribution. As processes interact within a model, input parameters can rarely be tuned in isolation from each other. For many climate applications, a goal is to choose the ranges of input parameters so that the model behaves in a manner consistent with our understanding of the behaviour of the true system, and so that the parameters themselves are consistent with our knowledge of the values they take in the real world. The input parameters of such a model are said to be constrained, and any simulations of future behaviour will be constrained by our choice of input settings to be consistent with our best understanding of the system.

As we have uncertainty about the structure and the behaviour of the true system, through a lack of knowledge or through uncertain observations, the valid configurations of input parameters, and their associated probability distributions, are themselves uncertain. In practice, modellers often use constrained ranges of input parameters to run collections of model evaluations, termed ensembles, which crudely represent uncertainty about the behaviour of the true system.

Perturbed parameter ensembles (PPEs) are useful for the evaluation of complex and computationally expensive climate models, including land surface models. A PPE is a collection of simulator runs in which input parameters are systematically varied according to principles consistent with our knowledge and the objectives of the experiment. PPEs allow the relationship between the model input parameters and its output to be quantified. They can therefore be used to quantify and propagate uncertainty in parameter constraint and to assess the sensitivity of model outputs to parameter perturbations, and they can give hints as to the size and location of model structural errors.

PPEs are now standard practice in the study of climate models for uncertainty quantification, climate and impacts projection (e.g. Sexton et al., 2021; Edwards et al., 2019), parameterisation improvement (Couvreux et al., 2021), sensitivity analysis (SA) (Carslaw et al., 2013), or as part of a strategy for model development and bias correction (e.g. Williamson et al., 2015; McNeall et al., 2016, 2020; Hourdin et al., 2017).

The size of PPEs is often limited by the computational expense of the complex models that they are used for, and so there is a natural tension between model complexity, resolution and the length of simulations on the one hand, and the number of ensemble members available for parameter uncertainty quantification on the other. A cheap surrogate model (sometimes termed a metamodel or emulator) is useful to maximise the utility of an ensemble in these situations. Emulators are statistical models or machine learning algorithms, usually trained on a PPE that has been carefully designed to cover input space effectively. Gaussian process emulators, used in this study, have the advantage that they are naturally flexible and natively include estimates of their own prediction error, but simpler linear models and more complex machine learning approaches are possible and can be effective.
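To make the idea of a Gaussian process emulator concrete, here is a minimal regression sketch in plain Python. It is not the emulator used in this study: it assumes a squared-exponential kernel with hand-fixed variance and lengthscale, a constant mean estimated as the sample mean of the training outputs, and a small nugget for numerical stability, whereas practical emulators usually estimate these hyperparameters from the ensemble. Inputs are assumed to be normalised to [0, 1], and all class and function names are hypothetical.

```python
import math

def sq_exp_kernel(x1, x2, variance=1.0, lengthscale=0.3):
    """Squared-exponential covariance between two input vectors (assumed hyperparameters)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return variance * math.exp(-0.5 * d2 / lengthscale ** 2)

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = [0.0] * len(b)
    for i in range(len(b)):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    return y

def backward_sub_transpose(L, y):
    """Solve L^T x = y, using the lower-triangular L directly."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

class GPEmulator:
    """Hypothetical minimal GP emulator: predictive mean and variance at new inputs."""

    def __init__(self, X, y, variance=1.0, lengthscale=0.3, nugget=1e-8):
        self.X, self.variance, self.lengthscale = X, variance, lengthscale
        self.mean = sum(y) / len(y)  # constant prior mean
        K = [[sq_exp_kernel(a, b, variance, lengthscale) + (nugget if i == j else 0.0)
              for j, b in enumerate(X)] for i, a in enumerate(X)]
        self.L = cholesky(K)
        resid = [v - self.mean for v in y]
        # alpha = K^{-1} (y - mean), via two triangular solves
        self.alpha = backward_sub_transpose(self.L, forward_sub(self.L, resid))

    def predict(self, x):
        k = [sq_exp_kernel(x, xi, self.variance, self.lengthscale) for xi in self.X]
        mu = self.mean + sum(ki * ai for ki, ai in zip(k, self.alpha))
        v = forward_sub(self.L, k)
        var = self.variance - sum(vi * vi for vi in v)  # prior variance minus explained part
        return mu, max(var, 0.0)

# Usage: train on five toy "runs" of a one-input simulator.
X = [[0.0], [0.25], [0.5], [0.75], [1.0]]
y = [math.sin(3 * x[0]) for x in X]
emulator = GPEmulator(X, y)
```

At a training input the predicted variance collapses to roughly the nugget, while far from the design it returns to the prior variance; this is the built-in error estimate that makes Gaussian processes attractive for emulation.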

1.1 Experiment structure

In this paper, we develop a comprehensive PPE of JULES-ES-1.0 (Joint UK Land Environment Simulator Earth System) and compare it with observations to find a set of ensemble members that are broadly consistent with historical behaviour of the land surface. Our focus is on global totals of carbon-cycle-relevant quantities, which are of direct interest for carbon budget assessments such as the Global Carbon Budget project (Friedlingstein et al., 2022). The ensemble is designed to be a flexible basis for a number of initial analyses and to provide a foundation for further exploration in uncertainty quantification of both historical and future projections of the land surface and carbon cycle. We would like an ensemble with members that fully represent and sample the spread of uncertainty we have in historical land surface characteristics and behaviour. We would also like an ensemble large enough to effectively train an emulator, in order to perform a number of analyses.

Our approach could be regarded as a top-down initial exploration, in that we choose a large number (32) of parameters to vary simultaneously; we perturb them by a large margin because of the relatively weak beliefs we have about their true values and a desire to explore a broad range of model sensitivities; and we focus on model performance at global levels, averaged over long time periods. The approach is data-led, in that it relies on comparison with data to winnow out poor simulations and their corresponding parameter settings. An equally useful complementary approach might be bottom-up – focusing on smaller numbers of parameters from individual processes within the land surface, perturbing those parameters by smaller amounts, and focusing on regional or local details and seasonal time periods. We chose the exploratory top-down approach knowing that it risked a large number of ensemble members failing to produce a recognisable carbon cycle, in the view that it would most benefit our understanding of the model's behaviours and sensitivities.

We use iterated refocusing, or history matching, to sequentially constrain our model. We first run an exploratory ensemble (Sect. 1.2) and remove any members (and their corresponding input parameter configurations) that produce obviously bad output or no output at all (Sect. 2.1). We use the remaining members to build a Gaussian process emulator and then run a second-wave ensemble with members chosen from input parameter regions calculated to have a good chance of producing model output consistent with observations. The formal part of this process uses a Gaussian process emulator and history matching to find the regions of input space that stand a good chance of producing realistic model output. Details of this process are outlined in Sect. 2.4. The intention is that future ensemble-building work can start from model variants that already get the global overview correct and then further constrain the model by focusing on finer details. This avoids over-focusing on finer regional or temporal details at the expense of large-scale behaviour. We apply our observational constraints to the new set of simulations, leaving a set of historical model runs broadly consistent with modern observations of the carbon cycle.
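The core of history matching is an implausibility measure: the distance between an observation and the emulator prediction, standardised by all the variances involved. The sketch below is a generic illustration rather than this study's code (details of the procedure used here are given in Sect. 2.4). It assumes an emulator returning a mean and variance per output, observation-uncertainty and optional model-discrepancy variances, and the commonly used cutoff of 3 applied to every output; the function names, output names and numbers are hypothetical.

```python
import math
import random

def implausibility(em_mean, em_var, obs, obs_var, disc_var=0.0):
    """Standardised distance between an emulator prediction and an observation."""
    return abs(obs - em_mean) / math.sqrt(em_var + obs_var + disc_var)

def is_nroy(predictions, observations, threshold=3.0):
    """True if the point is 'not ruled out yet' for every output.

    predictions:  {output_name: (emulator_mean, emulator_variance)}
    observations: {output_name: (observed_value, observation_variance)}
    """
    return all(
        implausibility(mean, var, *observations[name]) <= threshold
        for name, (mean, var) in predictions.items()
    )

def nroy_fraction(emulate, observations, n_inputs, n_samples=10000, seed=1):
    """Monte Carlo estimate of the fraction of the unit input cube not ruled out."""
    rng = random.Random(seed)
    kept = 0
    for _ in range(n_samples):
        x = [rng.random() for _ in range(n_inputs)]
        if is_nroy(emulate(x), observations):
            kept += 1
    return kept / n_samples

# Usage with a hypothetical one-output emulator and observation.
def toy_emulate(x):
    return {"output_a": (10.0 * x[0], 0.1)}  # (mean, variance)

toy_obs = {"output_a": (5.0, 0.15)}  # (value, variance)
fraction_kept = nroy_fraction(toy_emulate, toy_obs, n_inputs=2)
```

Here the combined standard deviation is 0.5, so points survive only where |5 − 10x₁| ≤ 1.5, i.e. x₁ in [0.35, 0.65], and roughly 30 % of the toy input space is retained. Candidates for a second wave would then be drawn from the retained region.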

Next, we use all valid ensemble members from both waves to build a new set of more accurate Gaussian process emulators and find the best estimate of the input space where the model is likely to produce output that matches reality (Sect. 2.7.2).

Our next analysis uses Gaussian process emulators to produce several different types of sensitivity analysis, with the aim of quantifying the relationship between inputs and model outputs and robustly identifying the most (and least) important parameters for the outputs of interest. This analysis is found in Sect. 3.
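One of the simplest emulator-based sensitivity measures is a one-at-a-time sweep: move each normalised input across its full range while holding the others at a baseline, and record how much the emulated output changes. The sketch below accepts any callable returning an emulator mean for a point in [0, 1]^d; the baseline of 0.5, the number of steps and the toy function are illustrative assumptions, not settings from this study.

```python
def one_at_a_time(predict_mean, n_inputs, baseline=0.5, n_steps=21):
    """Range of the predicted output as each input sweeps [0, 1], others at baseline."""
    ranges = []
    for i in range(n_inputs):
        outputs = []
        for s in range(n_steps):
            x = [baseline] * n_inputs
            x[i] = s / (n_steps - 1)
            outputs.append(predict_mean(x))
        ranges.append(max(outputs) - min(outputs))
    return ranges

def rank_inputs(ranges):
    """Input indices ordered from most to least influential."""
    return sorted(range(len(ranges)), key=lambda i: ranges[i], reverse=True)

# Hypothetical stand-in for an emulator mean: input 2 has no effect at all.
def toy_emulator_mean(x):
    return 3.0 * x[0] + 0.5 * x[1] ** 2

ranges = one_at_a_time(toy_emulator_mean, 3)
```

A one-at-a-time sweep ignores interactions between inputs, which is why variance-based methods are often used alongside it; its advantage is that it is cheap and easy to read, and it can quickly flag inputs that have almost no effect.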

Finally, we discuss how these results help us learn about the model in Sect. 4 and offer some conclusions in Sect. 5.

1.2 Experiment set-up

The aim of the experiment is to iteratively constrain the land surface model by subjecting it to increasingly demanding comparisons with reality. The experiment consists of two cycles of running a simulator ensemble, followed by constraint. Each of these cycles is termed a wave. The first wave is exploratory, and the design of the second wave is informed by the outcome of the first. We outline some of the technical details of setting up the experiment in this section.

We wish to isolate the effects of uncertain parameters on the land surface, rather than evaluating the effects of climate biases or interactions with other components of the UK Earth System Model, UKESM1.0. We therefore run the first ensemble of JULES-ES-1.0 with each member driven by the same historical climate data – a reanalysis from the Global Soil Wetness Project Phase 3 (Kim, 2017). The simulations include land use change and rising atmospheric CO₂ and follow the LS3MIP protocol (van den Hurk et al., 2016). Each member is spun up to a pre-industrial state in 1850 by cycling the 1850–1869 climate, but with fixed 1850 CO₂ concentrations. A total of 1000 years of spin-up is performed, and then each member is run transiently through to 2014. Each ensemble member has a different configuration of 32 input parameters, identified as potentially important in influencing land surface dynamics (see Table A1 in the Appendix for a full list of perturbed parameters). The parameters were perturbed randomly, in a maximin Latin hypercube configuration (McKay et al., 1979), shown to be an effective space-filling design for building accurate emulators (Urban and Fricker, 2010). Parameter ranges were chosen by the model developers to be likely at least to produce output from the model. We identify model variants, and their associated input configurations, where the model produces output that is consistent – within uncertain limits – with modern observations of the carbon cycle. We use Gaussian process emulators trained individually on each type of model output of interest to allow us to visualise and explore these relationships as if we had a much larger ensemble.

1.3 Land surface model

We use JULES-ES-1.0, the current Earth system (ES) configuration of the Joint UK Land Environment Simulator (JULES). JULES-ES forms the terrestrial land surface component of the UK Earth System Model, UKESM1.0 (Sellar et al., 2019). JULES simulates the exchange of heat, water, and momentum between the land surface and the atmosphere, as well as biogeochemical feedbacks through carbon, methane, and biogenic volatile organic compounds (BVOCs). This JULES-ES configuration is based on JULES GL7 (Wiltshire et al., 2020), with interactive vegetation via the TRIFFID dynamic vegetation model (Cox, 2001), nutrient limitation via the nitrogen scheme (Wiltshire et al., 2021), and updated plant physiology (Harper et al., 2018). To represent land use change, JULES-ES includes four land classes: natural, non-vegetated, pasture, and cropland. It has 13 plant functional types (PFTs): 9 natural PFTs compete for space in the natural fraction, and 2 PFTs in each of the cropland and pasture classes compete for space within those classes. Non-vegetated surfaces are represented as urban, lake, ice, and bare-soil surface tiles. Urban, lake, and ice fractions remain constant, while bare soil is a function of vegetation dynamics. Vegetation coverage is a function of productivity, disturbance, and intra-PFT competition for space, and the bare-soil fraction is the remainder of the grid box after competition. Land use change is implemented via ancillary files of crop and pasture coverage, and TRIFFID dynamically removes PFT coverage to assign space to a new land cover class. During land abandonment, the space is allocated to bare soil, and TRIFFID colonises the available land. This configuration includes the nitrogen cycle (see Wiltshire et al., 2021), and the availability of nitrogen limits the assimilation of carbon and the turnover of soil carbon. In the cropland class, perfect fertiliser application is assumed, so that crop PFTs are not nitrogen-limited.

1.4 First-wave design

We set the budget for the initial exploratory ensemble (designated wave00) to 499 members (plus the standard member). This is a little over the 10 members per input parameter recommended as a rule of thumb by Loeppky et al. (2009), even allowing for a proportion of the ensemble to be held out for emulator validation.
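A quick check of the budget arithmetic, counting only the 499 designed members and the 32 perturbed inputs:

```latex
\frac{499}{32} \approx 15.6 \ \text{members per input}, \qquad
10 \times 32 = 320, \qquad
\frac{499 - 320}{499} = \frac{179}{499} \approx 36\,\%
```

So roughly a third of the design could be held out for validation while the remaining training set still meets the 10-per-input rule of thumb.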

The initial ensemble must explore the limits of parameter space, to ensure that any future constrained ensemble would be interpolating from the initial design rather than extrapolating. We therefore asked model developers to set the ranges of the parameter perturbations as wide as possible, while still giving the model a good chance of running and at least providing output. We elicited reasonable multiplication factors from the modeller, based on the uncertainty in each parameter and an estimate of the limits beyond which the model would not run at all (see Fig. 1). The multiplication factors were perturbed in a space-filling design, in order to best cover input space and help build Gaussian process emulators, which can have problems if design points are spaced too closely. We chose a maximin Latin hypercube design using the R package lhs (Latin Hypercube Samples; Carnell, 2021): the design chosen was the Latin hypercube with the largest minimum distance between design points, out of a set of 10 000 generated candidates.
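The candidate-selection step described above (generate many random Latin hypercubes and keep the one whose closest pair of points is farthest apart) can be sketched in plain Python. This illustrates the selection criterion only and is not the implementation in the R lhs package; the function names are hypothetical, and the toy sizes in the usage keep it fast, whereas the wave00 design had 499 points in 32 dimensions chosen from 10 000 candidates.

```python
import math
import random

def random_lhs(n, d, rng):
    """One random Latin hypercube on [0, 1]^d: each column has one point per stratum."""
    columns = []
    for _ in range(d):
        strata = list(range(n))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n for s in strata])
    return [list(row) for row in zip(*columns)]

def min_distance(design):
    """Smallest Euclidean distance between any two design points."""
    return min(math.dist(a, b) for i, a in enumerate(design) for b in design[i + 1:])

def best_of_candidates(n, d, n_candidates, seed=0):
    """Keep the candidate Latin hypercube with the largest minimum distance (maximin)."""
    rng = random.Random(seed)
    best, best_score = None, -1.0
    for _ in range(n_candidates):
        candidate = random_lhs(n, d, rng)
        score = min_distance(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

# Usage at toy scale: 10 points in 3 dimensions, best of 50 candidates.
design, score = best_of_candidates(n=10, d=3, n_candidates=50, seed=0)
```

Selecting the best of many random candidates is simple and embarrassingly parallel; dedicated maximin algorithms instead optimise a single design iteratively, which can reach larger minimum distances for the same cost.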

Many of the input parameters have different values for the 13 different PFTs in JULES. If each of these were perturbed independently, there would potentially be an intolerably large input space. Even accounting for the fact that many of the input settings would not be available (for example, perturbing some input values too far would effectively turn one PFT into another), the input space would still be of a very high dimension. In addition, each input setting would need careful thought from the modeller, and the resulting input space would be highly non-cuboid and complex. This would require a very large input of time and effort from both modellers and statisticians.

To combat this explosion of dimensions and effort, we chose the pragmatic option of perturbing all the PFT values of a given parameter together by a single multiplication factor. This choice was computationally convenient for a top-down, globally averaged experiment, but we see great potential for optimising the values of these parameters for individual PFTs in further work (a good example is Baker et al., 2022). Many of the multiplication factors varied the parameter between half (0.5) and double (2) its standard value. Some parameters were switches and so were set to zero or one: as the design was generated on a 0–1 scale, the design points for these parameters were simply rounded to zero or one.
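A minimal sketch of turning one design point on the 0–1 scale into parameter settings. It assumes a linear mapping between each parameter's lower and upper multiplication factor (the text does not state whether a linear or logarithmic scale was used), rounds switch parameters at 0.5, and applies one shared factor to every PFT value of a parameter. The parameter names, ranges and standard values are hypothetical.

```python
def to_factor(u, lower, upper, is_switch=False):
    """Map a design coordinate u in [0, 1] to a multiplication factor, or 0/1 for a switch."""
    if is_switch:
        return 1.0 if u >= 0.5 else 0.0  # simple rounding to zero or one
    return lower + u * (upper - lower)  # assumed linear scaling

def apply_factor(standard_values, factor):
    """Scale the per-PFT standard values of one parameter by a single shared factor."""
    return [factor * v for v in standard_values]

# Hypothetical specs: (name, lower factor, upper factor, is_switch)
SPECS = [("param_a", 0.5, 2.0, False), ("switch_b", 0.0, 1.0, True)]

def design_row_to_factors(row, specs=SPECS):
    """Convert one row of a 0-1 design into named factors (or switch settings)."""
    return {name: to_factor(u, lo, hi, sw) for u, (name, lo, hi, sw) in zip(row, specs)}

factors = design_row_to_factors([0.5, 0.7])
```

With this linear mapping the standard value (factor 1) sits at u = 1/3 of a 0.5 to 2 range; a log-scale mapping, lower · (upper/lower)^u, would place it at the centre instead, which treats halving and doubling symmetrically.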

Figure 1. Parameter multiplication factor ranges for the initial ensemble (wave00) design.