Emulation of high-resolution land surface models with

Land surface models are typically integrated into global climate projections, but as their spatial resolution increases, the prospect of using them to aid in local policy decisions becomes more appealing. If these complex models are to be used to make local decisions, then a full quantification of uncertainty is necessary, but the computational cost of running just one full simulation at high resolution can hinder proper analysis. Statistical emulation is an increasingly common technique for developing fast approximate models in a way that maintains accuracy but also provides comprehensive uncertainty bounds for the approximation. In this work, we develop a statistical emulation framework for land surface models which acknowledges the forcing data fed into the model, enabling fast predictions at a high resolution.


Introduction
Land surface models (LSMs) represent the terrestrial biosphere within weather and climate models, focusing on hydrometeorology and biogeophysical coupling with the atmosphere. The latter includes nutrient flows between vegetation and soils, and the turbulent exchange of CO2, heat, moisture, and momentum between the land surface and the atmosphere. The Joint UK Land Environment Simulator (JULES; Cox et al., 1998; Best et al., 2011; Clark et al., 2011) is an example of an LSM, used for a variety of applications and temporal/spatial scales as part of the UK Met Office's Unified Modelling system. LSMs can be used to further scientific understanding of land surface processes and to inform policy decisions. For both applications, increased confidence in simulated results and knowledge of model uncertainty is needed, which typically involves running the model many times with varied forcings and parameters (Booth et al., 2012; Murphy et al., 2004).
The computational cost of running these models limits the number of runs that can be obtained, constraining the resulting analysis.
An important factor in computational cost versus practical relevance is the resolution at which the model can be run. Whilst there is a general, and not always justifiable, push towards higher resolution across climate modelling, when using LSMs to support landscape decisions (whether these be local/national government policy decisions, or landowner investment decisions), ensuring that the model is able to inform at decision-relevant resolutions is critical. For example, if considering different policies to incentivise farmers to alter land use (by giving over land to tree planting, for example), LSM simulations run to help understand the efficacy and risks of different policies would need to be at a high enough resolution to capture areas the size of individual farms (say at 1 km). This can be incredibly costly: in Ritchie et al. (2019), at 1.5 km × 1.5 km resolution for Great Britain (77,980 grid cells), JULES took approximately 25.5 hours to simulate a decade (with 20 years of spin-up simulation to allow any input parameters to influence the present-day land surface, which takes approximately 19 additional hours) on 72 processors of the UK NERC/Met Office MONSooN supercomputer. Quantifying uncertainty for even a single policy option, let alone a diverse array of policies, would not be feasible using high-resolution JULES directly.
Statistical surrogate models, also known as emulators, have been developed to combat this issue (Sacks et al., 1989; Kennedy and O'Hagan, 2001). An emulator is a statistical model that, once built, facilitates fast predictions of the output of a computer model, with quantified uncertainty in the predictions, and without any further simulation. The resulting statistical model provides a powerful tool for exploring, understanding, and improving the process-based model from which it is built.
Fast predictions of land surface models could enable better decision-making, improve scientific understanding, and enable the effective linking of multiple models, all while quantifying the various uncertainties involved.
Emulators have been widely used by the climate community. They have been used to study the Met Office's coupled models (Williamson et al., 2013), including developing the UK Climate Projections in 2009 and 2018 (Sexton et al., 2012, 2019). Hemmings et al. (2015) build mechanistic emulators for specific locations for an ocean biogeochemical model; Petropoulos et al. (2014) conduct a comprehensive sensitivity analysis using an emulator for a land surface model; and Williamson et al. (2017) calibrate an ocean model using emulators. Such efforts typically only emulate a few key locations or summary statistics of interest; for example, McNeall et al. (2020) emulate (and calibrate) JULES considering only three averaged locations. These types of analysis can be useful in understanding the sensitivity of the model output to its different parameters, and for constraining parameter space, but cannot be used as surrogates to the full model when needed to support local decision-making.
Emulating the spatio-temporal output, in order to use an emulator as a surrogate to the full model, has also received a lot of attention. Lu and Ricciuto (2019) attempt to emulate an LSM at a higher resolution, where they reduce the dimensionality of the output via singular value decomposition. This is a well-known strategy (Higdon et al., 2008), but lowering the dimension in this way can lead to a loss of information and interpretability, with documented negative effects (Salter et al., 2019). Additionally, Lu and Ricciuto (2019) use a neural network to construct their emulator. Whilst neural networks can be capable tools, they do not provide a complete quantification of the various uncertainties, which can be an essential quality when dealing with complex LSMs.
Land surface models typically do not exchange information laterally between grid cells (LSMs with more sophisticated hydrology schemes can be an exception to this, where groundwater and rivers can flow between grid cells). Therefore, the structure of the spatio-temporal outputs is often controlled entirely by a set of pre-known forcing data (which could be observational data, outputs from an atmospheric model, or pre-selected by a practitioner). For example, JULES relies on a set of driving data, providing information about the weather on a sub-daily time step, and various soil properties. In many cases, the land surface model output can be treated as independent in space and time, conditional on the forcing data. With this framing, the land surface model only outputs a spatial-temporal map because it is input a spatial-temporal map. In other words, grid cells in LSMs often do not "talk" to each other: what happens in one grid cell has no bearing on what happens in a neighbouring grid cell (except in that their forcing data is likely to be similar, and that is why their outputs are likely to be similar).
In this paper we outline a framework for building emulators of LSMs, leveraging this interesting property of many LSMs to facilitate the emulation of the high-dimensional output. This framework is described in Section 2. We then demonstrate the capabilities of such an emulator by emulating JULES at 1 km resolution in Section 3.

Methods

Emulating a Land Surface Model
We consider the case of an LSM which outputs spatial maps that vary in time. Spatio-temporal maps can be very high dimensional; for example, our study area is Great Britain, where there are 230,615 grid cells (using a grid resolution of 1 km × 1 km). If Gross Primary Productivity (GPP) was output daily, then the simulation result for a single year would be an approximately 8.5 million dimensional output.
The spatio-temporal correlation structure in an LSM is often inherited through its input forcing data. For example, JULES solves the same differential equations independently in each grid cell, and no information is passed horizontally by the solver. As such, we choose to treat an S × T dimensional output as ST different 1D outputs, each with a different set of forcing data inputs (where S represents the number of spatial locations and T the number of timesteps).
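This reshaping can be sketched with a toy example (the array sizes and the stand-in "model" below are purely illustrative, not taken from JULES):

```python
import numpy as np

S, T, F = 4, 3, 2                     # toy sizes: 4 grid cells, 3 timesteps, 2 forcing variables
rng = np.random.default_rng(0)
forcing = rng.normal(size=(S, T, F))  # W_st: the forcing data for each cell and timestep
output = forcing.sum(axis=-1)         # stand-in for the S x T map the LSM would produce

# Treat the S x T map as ST independent scalar outputs, each indexed by its forcing.
X = forcing.reshape(S * T, F)         # one row of inputs per (s, t) pair
y = output.reshape(S * T)             # one scalar training target per (s, t) pair
```

Each row of X then feeds a single emulator alongside the tuning parameters θ, rather than building one emulator over the full S × T map.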
Whilst the assumption of spatial independence (conditional on the forcing data) can be sensible, temporal independence is often less so. In JULES, various internal state variables are stored and updated at each time step, which provides some temporal structure. For example, the soil moisture (which is modelled by JULES) depends not only on the precipitation at the current time step, but also on previous time steps. Another example is the leaf area index: future carbon assimilation and respiration depend on the leaf area index from the previous time step.

Data abundance
The formulation discussed previously, where each grid cell and each time step is treated as an independent (conditional on the forcing data) data point, results in tens of millions of data points per year for even a single simulation.
Theoretically, an abundance of data should greatly improve predictive capabilities. However, Gaussian processes are not designed for large data sets, as they require the inversion of an n × n covariance matrix, where n is the number of data points, and so computational time scales with n^3. In computer experiments, roughly 10 data points per input dimension is normally expected and recommended (Loeppky et al., 2009), and so tens of millions of data points is far beyond standard. We could use other statistical models here instead (such as a standard linear regression model, or a neural network), but the flexibility and the uncertainty estimates of a GP are desirable.
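Sparse (inducing point) approximations are one standard way around the n^3 cost. A minimal numpy sketch of a subset-of-regressors (Nyström-type) predictive mean is below; the kernel, lengthscale and data are all illustrative, and the full variational treatment used in this work adds uncertainty handling on top:

```python
import numpy as np

def rbf(A, B, ls=0.2):
    # squared exponential kernel
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

rng = np.random.default_rng(0)
n, m = 2000, 50                           # n data points, m << n inducing points
X = rng.uniform(0, 1, (n, 1))
y = np.sin(8 * X[:, 0]) + 0.05 * rng.normal(size=n)
Z = np.linspace(0, 1, m)[:, None]         # inducing inputs

noise = 0.05 ** 2
Kzz = rbf(Z, Z) + 1e-8 * np.eye(m)
Kzx = rbf(Z, X)
# Predictive mean costs O(n m^2) rather than the O(n^3) of an exact GP:
A = noise * Kzz + Kzx @ Kzx.T             # an m x m system instead of n x n
b = Kzx @ y
X_star = np.array([[0.3], [0.7]])
mu = rbf(X_star, Z) @ np.linalg.solve(A, b)
```

With the inducing points covering the input domain, the approximate predictive mean is close to the exact GP's at a fraction of the cost.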

Obtaining an Ensemble
To build an emulator of the type described previously, we need several simulations from the land surface model.
Standard practice would be to run the land surface model for the entire area of interest (for example, Great Britain), and then repeat this several times with different parameter values to obtain an initial ensemble (Murphy et al., 2004; Booth et al., 2012; Williamson et al., 2017). Our assumption of LSM grid cell independence means that we do not need to run the LSM for the entire area of interest. Instead, we select individual grid cells, each paired with its own parameter setting. To see the potential benefits of this approach, consider an example where the area of interest contains 200,000 grid cells, and we could afford to run the LSM for this area 10 times, each time with a different θ value. Alternatively, we could run 2,000,000 different θ values, each one run at only one coordinate. Collectively, this second option should still provide a good coverage of the different possible forcing data, but we would now have much better coverage in θ, without requiring any additional simulation effort.
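The difference in parameter coverage can be illustrated with a toy calculation (cell and parameter counts scaled down from the 200,000 cells and 53 parameters used later):

```python
import numpy as np

rng = np.random.default_rng(3)
n_cells, n_repeats, n_params = 1000, 10, 3   # toy stand-ins

# Standard ensemble: 10 full-domain runs, each sharing one theta across all cells.
theta_std = np.repeat(rng.uniform(0, 1, (n_repeats, n_params)), n_cells, axis=0)

# Alternative ensemble: the same budget of single-cell runs, each with its own theta.
theta_new = rng.uniform(0, 1, (n_cells * n_repeats, n_params))

distinct_std = len(np.unique(theta_std, axis=0))   # only 10 distinct settings seen
distinct_new = len(np.unique(theta_new, axis=0))   # one distinct setting per run
```

Both designs cost the same number of single-cell simulations, but the second explores three orders of magnitude more parameter settings.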

Emulating JULES
To demonstrate the outlined framework, we build an emulator for JULES. We narrow our focus to only investigate Gross Primary Productivity (GPP), which is a measure of plant photosynthesis. We begin with GPP as this is the entry point of carbon into the terrestrial carbon cycle. Further work could focus on emulating other aspects of the land carbon cycle (such as Net Primary Productivity, or vegetation and soil stores). The carbon cycle is relevant for studies of climate change, and for these applications JULES is typically run globally with a coarse spatial resolution (at least 0.5° × 0.5°). However, our study is motivated by an increasing need for detailed process models to inform decisions about land use and management at a much higher resolution. Therefore, we run and emulate JULES at a 1 km × 1 km resolution for Great Britain.
Grid cells in JULES are subdivided into tiles representing vegetated and non-vegetated surfaces. On vegetated tiles, JULES calculates the GPP for different plant functional types (PFTs). The grid cell GPP is a weighted average of the PFT-dependent GPP (depending on the fractional area covered by each PFT). The 5 PFTs we use are: deciduous broadleaf trees (BT), evergreen needleleaf trees (NT), C3 grasses (C3g), shrubs (SH) and cropland (Cr). The fractional area covered by each PFT is set based on land cover data (see Appendix C for details on the JULES simulations). We build 5 independent emulators, one for each PFT. These independent emulators can then be summed together, weighted by the PFT fractions, to provide the final emulator for overall GPP. In summary, the overall emulator is then

GPP = Σ_{j=1}^{5} γ_j GPP_j, with GPP_j = f_j(W, θ_j),

where γ_j is the fraction of PFT j in the grid cell, GPP_j is the PFT-j-specific GPP value, and θ_j is the collection of input parameters that are relevant for PFT j. This overall emulator is simply the sum of 5 distinct emulators, one per PFT. Building one emulator for each PFT and then summing them together makes use of more information known to JULES, which should improve the accuracy of the overall emulator. It also reduces the dimension of θ in each individual emulator.
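Combining the PFT-specific emulators can be sketched as below; the fractions, means and variances are made-up numbers, and the variance rule assumes the 5 emulators are independent (as in our construction):

```python
pfts = ["BT", "NT", "C3g", "SH", "Cr"]
frac = {"BT": 0.10, "NT": 0.05, "C3g": 0.60, "SH": 0.05, "Cr": 0.20}  # gamma_j (illustrative)
mean = {"BT": 6.0, "NT": 4.5, "C3g": 5.2, "SH": 3.1, "Cr": 5.8}       # emulator means, gC/m^2/day
var = {"BT": 0.20, "NT": 0.15, "C3g": 0.10, "SH": 0.25, "Cr": 0.18}   # emulator variances

# Overall emulator mean: weighted sum of the 5 PFT emulators.
gpp_mean = sum(frac[j] * mean[j] for j in pfts)
# Independent emulators => variances combine with squared weights.
gpp_var = sum(frac[j] ** 2 * var[j] for j in pfts)
```

The same two lines apply at every grid cell, with γ_j varying according to the land cover data.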

Emulator Performance
It is an essential part of the process to check the accuracy of an emulator, just as it is with the land surface model itself. When training the emulator(s), we hold 10% of the data points aside at random. These held-out data points can then be used to test the accuracy of the emulator(s). For each of the 5 PFT-specific emulators, we obtain emulator predictions for 1000 randomly chosen points from the held-out testing data set, and obtain the 2 standard deviation intervals. These 2 standard deviation intervals should approximately correspond to the 95.4% certainty interval, and so roughly 95.4% of the held-out data should lie within the intervals. Table 3 shows that each of the 5 emulators performs well, with accuracy rates between 94.8% and 95.6%.
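The coverage check itself is simple to express; a synthetic sketch (with a perfectly calibrated toy "emulator", so the empirical coverage should land near 95.4%):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
truth = rng.normal(size=n)                         # held-out simulator values (synthetic)
pred_sd = np.full(n, 0.3)                          # emulator predictive standard deviations
pred_mean = truth + pred_sd * rng.normal(size=n)   # emulator means with matching error scale

# Fraction of held-out truths inside the 2 standard deviation intervals.
inside = (truth >= pred_mean - 2 * pred_sd) & (truth <= pred_mean + 2 * pred_sd)
coverage = inside.mean()
```

Coverage well below 95.4% signals over-confidence; well above it, under-confidence.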
With this emulator we can then make fast predictions of 8-day average GPP without needing to run JULES again. We can do this for any time, location and scenario (whether historical or hypothetical), and for any tuning parameter settings (assuming reasonable ranges), rapidly on a personal laptop. As an example, Figure 2 shows two 1 km resolution maps of predicted GPP for Great Britain, for June 1st 2004, obtained from the emulator, using two different tuning parameter settings.
The emulator is given no information about location, and thus the spatial structure in the predictions is inherited entirely from the forcing data provided.Other maps like these can be produced for different scenarios representing a range in environmental data, parameter settings, land use or PFT fractions.
The ability to emulate a land surface model at high resolution opens many potential avenues of research.For the rest of this section we explore two such avenues: sensitivity analysis and calibration.

Sensitivity Analysis
With a Gaussian process emulator it is possible to obtain an automatic, preliminary, sensitivity analysis. If the covariance function, k, in Equation (1) is chosen to be a non-isotropic squared exponential covariance function (sometimes known as an automatic relevance determination kernel), as in this paper, the emulator will automatically obtain estimates of the relative importance of different inputs (i.e. how sensitive the outputs are to individual inputs) if a constant mean function is also chosen (Rasmussen and Williams, 2006). This choice of k is available in almost all Gaussian process software, often as the default. In training the emulator, lengthscale estimates are obtained, which provide a measure of how far away two points need to be in a given dimension before they become uncorrelated. As such, a smaller lengthscale suggests a stronger relationship between the parameter and the output, and thus greater importance. Figure 3 plots these estimates for each input, for each PFT.
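The link between fitted lengthscales and input relevance can be demonstrated with a small numpy sketch (toy data; a crude grid search over the two lengthscales stands in for the full hyperparameter optimisation):

```python
import numpy as np

def log_marglik(X, y, ls, noise=1e-2):
    # GP log marginal likelihood with an ARD squared exponential kernel
    d = (X[:, None, :] - X[None, :, :]) / ls
    K = np.exp(-0.5 * (d ** 2).sum(-1)) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * len(y) * np.log(2 * np.pi)

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (80, 2))
y = np.sin(6 * X[:, 0])                 # the output depends only on the first input

grid = np.logspace(-1, 1, 10)
best_score, best_ls = -np.inf, None
for g0 in grid:                         # joint grid search over both lengthscales
    for g1 in grid:
        s = log_marglik(X, y, np.array([g0, g1]))
        if s > best_score:
            best_score, best_ls = s, (g0, g1)
# best_ls[0] (the relevant input) should come out shorter than best_ls[1]
```

The fitted lengthscale of the irrelevant input drifts to the top of the grid, while the relevant input keeps a short lengthscale, which is the relevance signal read off in Figure 3.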
Clearly, each PFT has a different relationship between the inputs and GPP, but some overall patterns are visible. For one, the forcing data is, in general, more important than the parameter settings. The air surface temperature (tas), the gradient of a specific grid cell (slope), the amount of shortwave radiation (rsds), and humidity (huss) all appear important for all PFTs.

Tuning/Calibration
Formal tuning, or calibration, of the various input parameters in a land surface model can also be performed more easily using an emulator. Tuning is the process of choosing the parameters such that the resulting outputs match up with real-life observations. Without an emulator, an exhaustive search of the different possible parameter settings is prohibitively expensive, and so tuning in practice often involves some degree of arbitrariness, relying heavily on subjective experience and instinct. Alternatively, an optimisation procedure can be taken (Raoult et al., 2016; Peylin et al., 2016), but this can be computationally intensive, the results will not quantify uncertainties completely, and no alternative options are provided if the final result does not agree with scientific belief.
With an emulator, many different parameter settings can be directly tested, facilitating an efficient exploration of the parameter space. History matching is a straightforward method for testing different parameter settings, ruling out inputs as 'implausible' if observed data does not match with the model output (Craig et al., 1997), and has already been successfully applied to other climate models (Williamson et al., 2013; Couvreux et al., 2021; Hourdin et al., 2021). Given observed data y_obs, observational error σ²_obs, a tolerance to model error σ²_MD, a mean prediction for the output of the model E[y(θ)] and a predictive covariance of the model output Σ(θ) (the latter two are provided by an emulator), the implausibility of any given parameter setting θ can be calculated as

I(θ) = (y_obs − E[y(θ)])^T (σ²_obs I + σ²_MD I + Σ(θ))^{−1} (y_obs − E[y(θ)]),

where I is the identity matrix. Implausibility is similar to mean square error, but each grid cell is weighted according to its uncertainty (which has components due to observation error, model error and emulator variance). A larger implausibility indicates a greater mismatch between the observations and the output from the land surface model (indicating that the parameter setting, θ, can be ruled out). A conservative threshold for rejecting a parameter setting can be taken as the 99.5% quantile of the χ² distribution with l degrees of freedom (where l is the dimension of the observation) (Vernon et al., 2010).
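A sketch of the implausibility calculation and the χ² cut-off (all numbers are synthetic stand-ins for the emulator quantities):

```python
import numpy as np
from scipy.stats import chi2

def implausibility(y_obs, mu, Sigma, var_obs, var_md):
    # (y_obs - E[y(theta)])^T V^{-1} (y_obs - E[y(theta)]),
    # with V = Sigma(theta) + (sigma^2_obs + sigma^2_MD) I
    V = Sigma + (var_obs + var_md) * np.eye(len(y_obs))
    r = y_obs - mu
    return float(r @ np.linalg.solve(V, r))

rng = np.random.default_rng(1)
l = 5                                        # dimension of the observation
y_obs = rng.normal(size=l)
Sigma = 0.05 * np.eye(l)                     # emulator predictive covariance (stand-in)
mu_good = y_obs + 0.1 * rng.normal(size=l)   # emulator mean close to the observations
mu_bad = y_obs + 5.0                         # emulator mean far from the observations

threshold = chi2.ppf(0.995, df=l)            # conservative 99.5% chi-squared cut-off
I_good = implausibility(y_obs, mu_good, Sigma, 1.0, 0.5)
I_bad = implausibility(y_obs, mu_bad, Sigma, 1.0, 0.5)
```

Parameter settings with implausibility above the threshold are ruled out; the rest remain in the not-ruled-out-yet set.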
To demonstrate history matching for JULES, we consider a small subset of our grid cells (1000 points chosen to maintain a good coverage of the forcing data), and we randomly sample a time point for each coordinate. We then use the observed GPP data from MODIS (details in Appendix D) for these locations and times, taking the mean of the two separate observations as 'the' observation. We take the observational error standard deviation, σ_obs, as the standard deviation between the two individual observations plus an additional, conservative, 20% of the mean value (because of the inaccuracy of a standard deviation estimate obtained from only 2 samples).
The map on the left in Figure 2 was produced using a non-implausible parameter setting, and the map on the right used an implausible parameter setting. There is a clear difference between the two maps. Spatial gradients in the not-ruled-out-yet parameter setting are much more gentle, with no large extremes in GPP, but still a distinct spatial pattern (for example, lower GPP in the Scottish highlands, and a decrease in GPP going from west to east, corresponding to rainfall gradients). The GPP map corresponding to the ruled-out parameter setting appears very extreme, predicting larger changes over relatively small areas, as well as generally very large values for GPP, with extremes > 15 gC/m²/day.
As an illustrative example of the iterative tuning process, Figure 4 presents different possible trajectories from different possible parameter settings.

Discussion
In this work, we have outlined a framework for emulating land surface models using sparse Gaussian processes. This framework takes into account a unique feature of many land surface models, incorporating the information contained within the set of external forcing data. Under this framework, a substantially better coverage of the input parameters can be obtained in the initial simulation ensembles, without additional simulation effort and without compromising analytical capabilities. The use of sparse GPs for emulating a land surface model is the first we are aware of, and is natural here, where the ensembles are made artificially large relative to standard computer experiments. The resulting land surface emulators are incredibly flexible, facilitating fast prediction, parameter tuning and sensitivity analysis, among other experimental goals. Specific modifications were made to build an emulator for JULES, but the overall procedure can remain the same for many LSMs. However, every land surface model is different, and so various modifications to the procedure should be made depending on the specific priorities and interests of the modeller.
Along with extensions to practical use, there are a few methodological extensions to this work that can be envisioned. For the outlined emulator framework, the modification wherein the high-dimensional spatial-temporal output is converted to a large set of 1D outputs allows the information in the forcing data to be readily incorporated and the dimension of the output to be shrunk. However, removing the time structure (instead of only the spatial) is, in many cases, a simplification rather than an assumption. Re-adding the time structure may be an interesting direction for future work. This could be via a dynamic emulator as discussed previously, by using dimension reduction techniques over the time dimension, or by including previous time steps of the driving data W_(t−i)s as inputs to the emulator. The current modifications explained in Section 2.2 seem to provide an acceptably accurate emulator. Although the 95% certainty intervals produced by the PFT-specific emulators for JULES contain the truth roughly 95% of the time, the emulators can still be considerably erroneous on very rare occasions. Also, even though GPP is always greater than 0, sometimes the emulators will predict GPP to be less than 0 (which is easily rectified by converting any negative predictions to 0, but this does distort the uncertainty intervals when GPP is predicted small/negative). As such, it should also be noted that, just like a land surface model, an emulator can always be improved upon.
Regarding JULES itself, the preliminary sensitivity analysis in Section 3.2 identified that, in general, the forcing data has a greater influence on GPP than the parameter settings. This is perhaps obvious, as the amount of plant activity depends heavily on the environment, but it does suggest that improving the accuracy of environmental data should be a key priority for practitioners working with JULES. This agrees with the results from McNeall et al. (2020). The sensitivity analysis performed in Section 3.2 was only a preliminary, automatic, sensitivity analysis. A more comprehensive sensitivity analysis using this emulator (perhaps in the format of Oakley and O'Hagan (2004)) is left for future work.
Similarly, while the obtained tuning for JULES ensured that the remaining parameter settings match well with observed GPP, this does not necessarily mean that the parameters will match well with other outputs of JULES. Emulating and matching multiple distinct outputs is an interesting avenue for future experimentation with JULES, especially when it is entirely possible that certain parameter settings will be good for one output, but mutually exclusive parameter settings will be good for a different output (this effect is observed by McNeall et al. (2020)). This would mean that no parameter settings will provide an overall good match (the so-called terminal case (Salter et al., 2019)), and would suggest the degree of model discrepancy is larger than initially thought. Discovering which observations a land surface model can and cannot reproduce, and which parameter settings are better and which are worse, can be useful information for quantifying and rectifying the flaws in a land surface model.
As an additional note regarding tuning: reducing the set of non-implausible parameter settings does not necessarily impose any individual bounds on the various parameters. For example, if there were only 2 parameters, both promoting plant growth, then it is entirely reasonable to believe a large value for the first and a small value for the second could match with reality, while believing that a small value for the first and a large value for the second could also match with reality. What would be important here is to rule out small values for both and large values for both. With 53 parameters, rather than just 2, this situation is essentially guaranteed.
where the second term is the KL divergence between the prior p(u) and q(u). The first term can be expressed as a sum of univariate expectations over individual data points, so that

L = Σ_{n=1}^{N} E_{q(f_n)}[log p(y_n | f_n)] − KL(q(u) || p(u)).

As such, during optimisation of L for the various parameters (m, S, Z and the covariance hyperparameters), the data can be sub-sampled at each iteration (still obtaining an unbiased estimator for L). These smaller sub-samples are called minibatches, and provide a second computational speed-up (Hensman et al., 2013).
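The unbiasedness of the minibatch estimator is easy to check numerically (the per-datum terms below are synthetic stand-ins for the expectations appearing in L):

```python
import numpy as np

rng = np.random.default_rng(4)
N, B = 10_000, 256
terms = rng.normal(size=N)        # stand-ins for E_{q(f_n)}[log p(y_n | f_n)]

full_sum = terms.sum()
# Scaling each minibatch sum by N / B gives an unbiased estimate of the full sum.
estimates = np.array([(N / B) * terms[rng.choice(N, size=B, replace=False)].sum()
                      for _ in range(2000)])
```

Averaged over many minibatches, the scaled sums concentrate around the full-data value, which is why stochastic optimisation of L converges to the same target as full-batch optimisation.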
These two scores together constitute a type of 'maximin Latin hypercube score'. Normally this would be sufficient, but as mentioned before, we have had to work with the historical means Ŵ_s rather than the actual inputs W_ts. The intra-cell variation does still matter for the emulator, as a grid cell which has a very variable climate is more useful, providing more information about how JULES reacts to different forcing values. As such, our third score, C, is simply the difference between the historical 90th percentile and the historical 10th percentile of each grid cell in the potential set of grid cells, summed over each forcing dimension, providing a measure of how variable that grid cell is.
To select a set of grid cells to run JULES for, we then combine these individual scores as

Score = λ_A A + λ_B B + λ_C C.

A set of grid cells with a higher combined score is a better set of grid cells to run. We randomly obtain many potential sets of grid cells and score each of them; the highest-scoring set of grid cells is then the one chosen. The λ_i in this equation are weights to ensure the three component scores are on the same scale, and equal 1/(max(i) − min(i)).
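A sketch of the combined scoring rule (the component scores are random stand-ins for the two maximin-type scores and the variability score C, labelled A, B and C here):

```python
import numpy as np

rng = np.random.default_rng(5)
n_sets = 200
scores = {"A": rng.uniform(0, 3, n_sets),      # e.g. maximin spread in forcing space
          "B": rng.uniform(10, 50, n_sets),    # e.g. maximin spread in parameter space
          "C": rng.uniform(0.1, 0.9, n_sets)}  # intra-cell climate variability

# lambda_i = 1 / (max(i) - min(i)) puts the components on a common scale.
lam = {k: 1.0 / (v.max() - v.min()) for k, v in scores.items()}
total = sum(lam[k] * scores[k] for k in scores)
chosen = int(np.argmax(total))                 # highest-scoring candidate set wins
```

Without the λ_i, the score with the widest raw range (B here) would dominate the selection.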
To choose the set of parameters that go with the grid cells, many Latin hypercubes (of dimension 53) were generated (which is possible because the tuning parameters θ can take any value in their domain), and the combined design (Ŵ_s, θ) with the maximum minimum distance was chosen.
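The maximin selection over many random Latin hypercubes can be sketched as follows (toy dimensions; the θ space above is 53-dimensional):

```python
import numpy as np

rng = np.random.default_rng(6)

def latin_hypercube(n, d, rng):
    # one point in each of n equal strata, independently stratified per dimension
    return np.stack([(rng.permutation(n) + rng.uniform(size=n)) / n
                     for _ in range(d)], axis=1)

def min_dist(X):
    # smallest pairwise Euclidean distance within the design
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    return D[np.triu_indices(len(X), k=1)].min()

n, d, n_candidates = 30, 5, 50
designs = [latin_hypercube(n, d, rng) for _ in range(n_candidates)]
best = max(designs, key=min_dist)   # maximin: keep the design with the largest minimum distance
```

Generating more candidates (or running the search for longer, as in the schemes below) can only improve the chosen design.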
More potential designs can be obtained if these schemes are run for a longer period of time, thus providing a better final result. The wave 1 grid cell selection scheme, the wave 1 θ selection scheme and the calibration data grid cell selection scheme were all run for 5 hours. For the wave 2 design, the schemes were all run for 1 hour. These times are certainly excessive, especially given the large number of JULES runs obtained, and a much shorter run time would have been sufficient.

Appendix C: JULES Configuration
The JULES configuration used in this study is JULES version 5.6, based on Blyth et al. (2019). Full details are provided in that paper, but here we provide relevant background information for interpreting the emulator results.
We also note any changes from Blyth et al. (2019). The 5 PFTs used were chosen for their relevance to UK landscapes. Three non-vegetated surfaces can also be present in each grid cell: lakes, urban, and bare soil. The fractional coverage of these eight tiles is prescribed in each grid cell based on the CEH Land Cover Map 2000 (Fuller et al., 2002).
The model is run at a half-hour time step. The daily driving meteorology is interpolated to half-hourly using a disaggregation scheme (Blyth et al., 2019). Model fluxes such as GPP, NPP, respiration, latent heat, sensible heat, and runoff are calculated every half hour. At the end of each day, a phenology scheme updates leaf area index based on temperature functions of leaf growth and senescence. Every ten days, the dynamic vegetation component of JULES (TRIFFID) updates vegetation and soil carbon stores. Competition between vegetation types is turned off in this configuration of the model.

Table 3 .
A table of the accuracy rates of each of the 5 PFT-specific emulators. A perfect emulator would have an accuracy rate of 95.4%; larger values imply under-confidence, smaller values imply over-confidence.
Estimated length scales from the Gaussian process emulators; a smaller value implies a stronger relationship between the input and GPP. Orange points represent tuning parameters and blue dots represent environmental data. Descriptions for the tuning parameter abbreviations can be found in Table 1, and descriptions for the environmental data can be found in Table 2 (with 1 and 2 representing an upper and lower soil layer).