Description and Evaluation of the JULES-ES setup for ISIMIP2b

. Global studies of climate change impacts that use future climate model projections also require projections of land 10 surface changes. Simulated land surface performance in Earth System models is often affected by the atmospheric models’ climate biases, leading to errors in land surface projections. Here we run the Joint UK Land Environment Simulator Earth System configuration (JULES-ES) land surface model with Inter Sectoral Impacts Model Intercomparison Project 2 nd phase future projections (ISIMIP2b) bias-corrected climate model data from 4 global climate models (GCMs). The bias correction reduces the impact of the climate biases present in individual models. We evaluate JULES-ES performance against present-15 day observations to demonstrate its usefulness for providing required information for impacts such as fire and river flow. We include a standard JULES-ES configuration without fire as a contribution to ISIMIP2b and JULES-ES with fire as a potential future development. Simulations for gross primary productivity (GPP), evapotranspiration (ET) and albedo compare well against observations. Including fire improves the simulations, especially for ET and albedo and vegetation distribution, with some degradation in shrub cover and river flow. This configuration represents some of the most current earth system science 20 for land surface modelling. The suite associated with this configuration provides a basis for past and future phases of ISIMIP, providing a simulation setup, postprocessing and initial evaluation using the International Land Model Benchmarking (ILAMB) project. This suite ensures that it is as

cies. ISIMIP provides a consistent framework for assessing impacts, using a large ensemble of impacts models across various sectors (Warszawski et al., , 2014. ISIMIP has recently completed its second phase, having more than 60 modelling groups contributing simulations to ISIMIP2a (reanalysis-driven hindcasts) and ISIMIP2b (bias-corrected global-climate-model-driven historical and future scenarios). We present the JULES-ES configuration and experimental set-up that has contributed to ISIMIP2b and will be the basis for further development in subsequent ISIMIP phases. An advantage of using JULES-ES as an offline impact model is that it uses a prescribed atmosphere; without these feedbacks, JULES-ES is 10 000-20 000 times more computationally efficient than the closely aligned land surface scheme coupled to the atmosphere in UKESM1 (Sellar et al., 2019), and, in using a multi-model climate ensemble, it samples scientific uncertainty in our understanding of the climate system that would not be possible within a single climate model framework.
This paper briefly describes the changes to JULES-GL7  that form the JULES-ES configuration, the ISIMIP set-up and an evaluation of the arising simulations. JULES-ES has been widely evaluated and applied for global biogeochemical modelling (Sellar et al., 2019;Slevin et al., 2017), including in the Global Carbon Budget (Friedlingstein et al., 2020). Here we focus on using JULES for impact applications. Alongside this work, we provide a suite to run JULES-ES, following the ISIMIP2b modelling protocol  for tailored impact projections that are consistent across sectors such as water and biomes. The suite includes the code to postprocess output into ISIMIP formatted netCDF output and run the International Land Model Benchmarking (ILAMB) system to allow quick evaluation (see Sect. S1). Data from the JULES-ES ISIMIP2b suite have been submitted to the biomes and water ISIMIP2b sectors, are available via the ISIMIP model archive (https://data.isimip.org/, last access: 27 June 2023) and provide simulations of the historical and future land surface in the Popescu et al. (2022) wildfire report. The JULES ISIMIP2b simulations with fire provide the basis for the contribution of JULES to the next Fire Model Intercomparison Project (FireMIP), which will use the ISIMIP3 set-up. The historical simulations and their evaluation are shown in Sect. 3, with the discussion and conclusions in Sects. 4 and 5, respectively.

JULES-ES configuration
To better represent the variation in plant traits and managed land, we extend the standard representation of 5 plant functional types (PFTs) to 13 in JULES-ES, building on , with 4 managed and 9 natural PFTs. Natural PFTs are extended by splitting trees into deciduous and evergreen types and then distinguishing between temperate and tropical broadleaf evergreen trees. These additional PFTs represent a wider range of leaf lifespans and metabolic capacities. Evergreen trees typically have less access to nutrients, higher leaf mass per unit area, longer lifespans and lower carbon assimilation and respiration rates, whereas a deciduous PFT typically has leaves with a higher nutrient concentration, shorter lifespan and lower leaf mass per unit area. Tropical broadleaf evergreen trees have lower maximum carbon assimilation rates than temperate trees. The nine natural PFTs used are tropical broadleaf evergreen trees (BET-Tr), temperate broadleaf evergreen trees (BET-Te), broadleaf deciduous trees (BDT), needleleaf evergreen trees (NET), needleleaf deciduous trees (NDT), C 3 grasses (C3G), C 4 grasses (C4G), evergreen shrubs (ESh) and deciduous shrubs (DSh). Harper et al. (2016) also updated several parameters required for calculating photosynthesis and respiration using the TRY database (Kattge et al., 2011). They also reduced the bias in model simulations by tuning parameters relating to leaf dark respiration, canopy radiation, canopy nitrogen, stomatal conductance, root depth, and temperature sensitivities of the maximum carboxylation rate of RuBisCO (Vcmax) based on available observations. The four managed PFTs are C3 and C4 crop and pasture (C3Cr, C4Cr, C3Pa and C4Pa) PFTs and are functionally similar to natural grasses, but, in the case of crops, are assumed not to be nitrogen limited, and litter carbon is removed as a simple representation of crop harvest (Sellar et al., 2019;. This simple fertilisation of crops through lack of nitrogen limitation does not make a distinction between mineral and manure fertiliser. Both crop and pasture surface types undergo land use change according to externally forced time-varying land use, but the pasture is not grazed and is otherwise unmanaged. Within the respective crop and pasture fraction, only the C 3 and C 4 crop and pasture PFTs are allowed to grow, with the area of each determined by the TRIFFID dynamic vegetation module . Outside of the managed land area, the nine natural PFTs (including natural C 3 grasses and C 4 grasses) can grow in the remainder of the grid box once the non-vegetated surfaces have been accounted for (urban, ice and lakes, represented by corresponding non-vegetation surface types, and ocean, which is not simulated). As the net prescribed crop or pasture fraction increases with land use change, natural vegetation is removed from the portion of the grid box into which agriculture has expanded, representing anthropogenic land clearance. Conversely, when crop and pasture areas are reduced, the natural PFTs are allowed to recolonise the vacated grid box fraction. The vacated grid box fraction is initially bare soil, and the existing natural PFTs gradually expand their coverage into this area; the rate of expansion is determined by the TRIFFID vegetation dynamics scheme and will follow a succession of faster-growing grass PFTs, followed by shrubs and then trees. Bare soil occupies any remaining space, once the vegetation dynamics have been simulated. Simple representations of fertilisation and harvesting are applied to the crop PFTs, but otherwise, these are physiologically identical to the natural grasses. After accounting for land use, the fractional coverage and biomass of each PFT within a grid box is determined by the TRIFFID dynamic vegetation model. Inter-PFT competition is based on vegetation height, with the taller vegetation shading and therefore dominating other PFTs .
The other major change introduced in JULES-ES is a representation of nitrogen and nutrient limitation effects on ecosystem carbon assimilation. The nitrogen component of JULES is described in Wiltshire et al. (2021). In brief, JULES-ES represents all the key terrestrial nitrogen processes. Inputs to the land surface are via biological fixation, fertilisation and nitrogen deposition, with losses from the land surface occurring via leaching and gas loss, with nitrogen deposition being externally provided to the model. JULES simulates a nitrogen-limited ecosystem by reducing the carbon use efficiency if there is insufficient available nitrogen to satisfy plant nitrogen demand. This results in a reduced net primary production (NPP) when we include nitrogen limitation. The soil biogeochemistry is based on the representation of the four-pool RothC soil carbon , consisting of decomposable plant material (DPM), resistant plant material (RPM), microbial biomass (BIO), and humus (HUM). For each soil carbon pool, there is an equivalent soil nitrogen pool (Wiltshire et al., 2021). Nitrogen transfers between the organic and inorganic nitrogen pools depend on decomposition rates and the carbon to nitrogen ratio of the organic pool.
Another important change is the inclusion of a fire module. Fire is simulated in JULES by the fire model INFERNO (INteractive Fire and Emission algoRithm for Natural envirOnments; Mangeon et al., 2016). Burnt area is calculated from flammability and ignitions. Temperature, saturation vapour pressure, relative humidity and precipitation, together with the soil moisture and fuel load from JULES, give flammability by PFT and the prescribed population density  gives human ignitions, while the prescribed climatological lightning gives natural ignitions. Previous evaluation of JULES-INFERNO, which compared model output against observational constraints, showed that the burnt area is relatively insensitive to changes in lightning Burton et al., 2022). Here we use IN-FERNO coupled to the dynamic vegetation model TRIFFID , enabling carbon cycle feedbacks from fire onto the land surface via vegetation mortality, regrowth and burnt litter fluxes. Recent updates to INFERNO allow fire mortality to vary by PFT, and updates to the representation of land use and PFTs in JULES allow for reduced burning in C 3 crop and C 4 crop PFTs (Burton et al., 2020), based on global trends of agricultural fire suppression (Bistinas et al., 2014;Andela et al., 2017). Background mortality rates were reduced compared to the model without fire to account for the extra vegetation mortality from fire, as per . We use fire and background mortality in Burton et al. (2020). C 3 pasture and C 4 pasture burn at the same rate as natural grasses, as per Burton et al. (2019), which reflects observational constraints that show an increase in burnt area for pasture areas Bistinas et al., 2014).

Modifications for ISIMIP2b
In JULES with dynamical vegetation switched on, the TRIF-FID period is a number (in days) that defines the frequency at which the TRIFFID dynamic vegetation code is called. In the ISIMIP JULES-ES configuration, it has been reduced from a 10 to a 1 d period to allow for shorter restart periods, which were necessary to meet the diagnostic requirements of ISIMIP (a large number of variables on short temporal scales). A daily TRIFFID period also allows the vegetation dynamics to respond more realistically to variations in plant productivity on shorter timescales, although the effect of this change is minimal in test historical runs.
Another key difference to the standard set-up of JULES is the use of daily meteorological driving data. JULES needs a model time step of no more than 1 h to accurately simulate the diurnal cycle and exchange of heat, water and momentum and to avoid numerical instabilities. In the ISIMIP experimental set-up, we use the internal disaggregation (Williams and Clark, 2014) to calculate driving data values at the model time step of 1 h. The method uses the IMOGEN model disaggregation (Huntingford et al., 2010) to initially disaggregate to every 3 h, which the model linearly interpolates to 1 h. The diurnal cycle of downward shortwave radiation is calculated from the position of the Sun in the sky. Temperature is calculated from a sinusoidal function, with a maximum 0.15 of a day's length after local noon and normalised by the diurnal temperature range. Downward longwave radiation is a linear function of temperature, and specific humidity is kept below saturation at each time step. Precipitation is considered to occur in a single event, with a globally specified duration parameter (6 h for convective rainfall, 1 h for large-scale rainfall and convective snowfall and large-scale snowfall). Convective rainfall occurs when temperature exceeds 288.15 K. The model assumes convective rainfall is more intense and so leads to more runoff and less infiltration into the soil. Given that precipitation events do not, by construction, overlap with midnight (GMT) on average, this produces a spurious trapezoidal diurnal cycle, which is zero at midnight (GMT; Williams and Clark, 2014). Precipitation above 350 mm d −1 is redistributed. Note that convective precipitation occurs only on a fraction of the grid box , set to 30 % in the ISIMIP2b runs and, within this fraction, is modelled as a negative exponential distribution (Dolman and Gregory, 1992). Therefore, the grid box average intensity is not the same as the effective intensity at a point. Given the strong effect of intensity on canopy intercep-tion and runoff, the water cycle in the model is sensitive to the duration parameter choices (Williams and Clark, 2014).

ISIMIP2b protocol
The ISIMIP2b experiments focus on understanding different levels of mitigation. They use two Representative Concentration Pathway (RCP) scenarios to explore the international commitments made under the Paris Agreement to stabilise global warming at well below 2 • C, relative to preindustrial mean temperatures. ISIMIP2b uses simulations from the Coupled Model Intercomparison Project 5 (CMIP5), using an historical scenario  and the RCP2.6 and RCP6.0 concentration pathways (post-2005) to represent a higher ambition, lower temperature outcome and a low ambition pathway, respectively (Riahi et al., 2017). Land use data and population density are based on the Shared Socioeconomic Pathway (SSP2) scenario and applied to RCP2.6 and RCP6.0 simulations. Lightning ignitions for fire are from the LIS/OTD version 2.3.2015 climatology (Cecil, 2006) and applied for the historical reference period and the future, as per Rabin et al. (2017) and Hantson et al. (2020). To capture a range of climate sensitivities, the following four CMIP5 global climate model (GCM) driving models are chosen: GFDL-ESM2M, HadGEM2-ES, IPSL-CM5A-LR and MIOC5. The GCM driving data are global, bias-corrected, daily data at 0.5 • resolution (Hempel et al., 2013;Lange, 2018). Bias correction means we can compare the observed and simulated impacts during the historical reference period, with a smooth transition into the future period. The bias-correction methodology adjusts multiyear monthly mean distribution throughout the historical and RCP periods, based on comparison of GCM output against EWEMBI reanalysis data distributions (Dee et al., 2011), for the period 1979 to 2013, using CMIP5 RCP8.5 post-2005 (the end of the CMIP5 historic period), such that trends and interannual variability are preserved in absolute and relative terms for temperature and non-negative variables, respectively (Lange, 2018). EWEMBI combines data from multiple reanalysis sources to cover all the required variables. The variables bias-corrected for ISIMIP2b are listed in Table 1, as reproduced from Frieler et al. (2017). The ISIMIP2b bias correction includes humidity in addition to shortwave and longwave radiation, using quantile mapping. Transfer functions are used to adjust the distributions of daily anomalies from monthly mean values. ISIMIP2b bias-correction methods adjust distributions independently for each variable, grid cell and month, preserving the statistical dependencies between variables in space and changes over time. The biascorrection approach preserves the trends (and therefore sensitivities) from different GCMs but removes absolute biases found over the reference period from the historical and RCP periods. Each GCM therefore has a different variability and simulated climate outside of the reference period but with a smooth transition going from the historical period into the reference period or from the reference period into the future. Some small biases remain after bias correction, particularly in precipitation (Fig. S1), where biases are a small fraction of the total local rainfall but can affect precipitation particularly in the South America (Fig. S2). As part of the set-up provided here, we include code for preparing JULES data for submitting to ISIMIP and ensuring that it conforms to the strict protocols (see Sect. S1) and the ILAMB system for rapid evaluation of the simulations (see Sect. S2).

Model evaluation
We evaluate the model for key impacts sectors and use the International Land Model Benchmarking (ILAMB) tool (Collier et al., 2018) to assess model performance for gross primary productivity (GPP), evapotranspiration (ET), runoff and albedo. ILAMB evaluates performance against observations from remote sensing, reanalysis data and FLUXNET site measurements and produces graphical and statistical scores of model results. The model values and the observation datasets we use for the evaluation are given in Table 1. As ILAMB does not include vegetation cover evaluation, we also include the Manhattan metric (Kelley et al., 2013) comparison with ESA CCI (European Space Agency Climate Change Initiative) land cover for tree, shrub, wood and grass cover (Harper et al., 2022). Results and further details of the ILAMB and vegetation cover analysis are provided in Sect. S2. We evaluate the historical simulations separately for each GCM because the bias correction preserves interannual differences between GCMs. We conduct an evaluation over common time periods between observations and simulation, using the historical period and, for observational periods beyond the end of 2006, RCP6.0.

The experimental suite
To run JULES, we collect all the tasks and input files that are needed into what is known as a suite; this allow runs to be reproduced using exactly the same applications, options and commands as previously run, possibly by another user, scheduling them to run based on the dependencies between the tasks. Full instructions on how to run JULES in a suite are provided on GitHub (https://jules-lsm.github.io/tutorial/ bg_info/tutorial_julesrose/jr_structures.html#jrsuite, last access: 27 June 2023). We provide a full set-up for running the ISIMIP2b simulations using JULES-ES in the form of the suite u-cc669, which is available via the Met Office Science Repository Service (MOSRS; https://code.metoffice.gov.uk/ trac/roses-u, last access: 27 June 2023; see the data availability section for more information). The bias-corrected driving data are available from ISIMIP at https://www.isimip. org/gettingstarted/input-data-bias-adjustment/ (last access: 27 June 2023). We also use the following datasets from ISIMIP. In situations in which the preprocessing of these  (Harper et al., 2022) datasets requires the use of these data within JULES, this preprocessing code is also part of the suite.
-CO 2 concentration -Future land use patterns -Nitrogen deposition -Land-sea mask The Total Runoff Integrating Pathways (TRIP) river-routing model allows JULES to collect and route water through river channels, essentially converting runoff to river discharge or river flow. A TRIP 0.5 • river-routing ancillary file is also required for these runs, which is available from http://hydro. iis.u-tokyo.ac.jp/~taikan/TRIPDATA/ (last access: 27 June 2023).

Water
We evaluated the water cycle using runoff derived from Dai (2021) for the 50 largest river catchments. Estimated observed mean basin runoff combines river flow measurements at downstream gauge stations with a river flow model to estimate the flow at the river mouth. By assuming that there are no losses from the river, we calculated the long-term mean of the basin-averaged runoff by dividing the river flow at the river mouth by the basin area.
Particularly in temperate regions north and south of the Equator, simulations using all four sets of driving data show similar biases. However, differences between the driving datasets are the greatest in tropical or sub-tropical river catchments (Fig. 1). This is particularly evident in the Amazon basin, where mean runoff biases range from approximately 0 in the simulation driven by HADGEM2-ES to more than −0.6 mm d −1 in the simulation driven by IPSL-CM5A-LR. Strong variations between the simulations are also seen in the Brahmaputra basin, with some smaller variations in the Changjiang basin (Fig. 2). The general underestimate of runoff in the higher latitudes may be due to the treatment of moisture infiltration into partially frozen soils (see below) but could also be caused by biases in precipitation estimates due to the gauge undercatch of snow (Adam and Lettenmaier, 2003). In arid and semi-arid basins, river flow and runoff tend to be overestimated, which could be due to missing processes such as river channel evaporation and transmission losses (Haddeland et al., 2011;Döll and Siebert, 2002) and anthropogenic water extraction, which is primarily irrigation (Richey et al., 2015). Figure 2 compares the long-term monthly mean river flow (1980-2014 inclusive, where observations are available) over the six largest rivers to those of the downstream observations in Dai (2021). All the simulations reproduce the overall sea-  sonal river flow over the Amazon. After the wet season, the modelled river flows decline earlier than observed, and the simulated river flows at low water are too low. This could be due to too much evaporation in the drier months (see JJA in Fig. S3) and/or the simulated speed of flow through the soil and/or river channel being too fast, although GFDL-ESM2M and IPSL-CM5A-LR driving precipitation data also have a dry bias during the dry season (Fig. S2). All simulations overestimate river flow over the Democratic Republic of the Congo (hereafter abbreviated as Congo), mainly due to overestimates in the rainy months. This could be driven by too little evaporation from the vegetation canopy or from flooded areas (see ET in Figs. 3 and S3). The simulations reproduce the seasonal river flow over the Orinoco well. The timings of peak river flow for the Changjiang and Brahmaputra are well simulated, although the amplitudes are too low. Dams, which we do not model, are likely to affect the observed seasonal cycle. The simulated river flow over the Mississippi lags observations by several months. This lag is also evident over many high-latitude basins (not shown). The observed river flow peak is mainly driven by spring melt, whereas the simulated river flow peak is in line with that of precipitation. This is due to this configuration allowing significant soil infiltration of snowmelt when the soil surface is mainly frozen, rather than resulting in surface runoff. This may also result in the underestimate of annual river flow because once the water has infiltrated the soil then it may then be evaporated.

Surface fluxes
Global gross primary productivity (GPP) is 134-137 PgC yr −1 , depending on driving data, which is above the estimate of IPCC AR6 (Canadell et al., 2021) of 113 PgC yr −1 but agrees well with the estimates of 146 ± 21 PgC yr −1 in Cheng et al. (2017 ; Table S1). Net biome productivity (NBP) is 0.94-1.46 PgC yr −1 between 2011-2020, within the 1.0-2.8 PgC yr −1 range estimated by the Global Carbon Budget (Friedlingstein et al., 2022). All simulations show positive GPP biases in similar regions, such as central and southern Africa, south of the Himalaya and east towards Bangladesh and Myanmar, compared to the observations in Jung et al. (2011; Fig. 3). South America is a more complicated picture, with Brazil broadly split between negative GPP to the northeast and positive GPP to the country's southeast. For Brazil, the ET bias has a more northwest-southeast split, with the northwest having a slightly negative bias and the southeast more positive. The northwest bias in ET and the bias in GPP in South America is more prominent and widespread in early wet season (September-November) when driven by climate data from GFDL-ESM2M and IPSL-CM5A-LR (see Fig. S4) and is due to a longer dry season in both sets of driving data (Fig. S2), with rains starting in October rather than September. The areas in the far north of Columbia, Bolivia and Argentina also have a negative bias in both GPP and ET across the simulations. Australia also has a north-south split, with a slightly positive ET bias to the north and the inverse to the south. Albedo ( Fig. 3; right column) generally shows a positive bias across most regions and simulations. However, there are small regions with a negative bias, for example, south of the Sahel and small regions at higher northern latitudes. Eastern Siberia has a positive bias in all simulations (see Sect. 4).

Biomes
All simulations show similar vegetation cover patterns that largely follow observations, capturing high tree cover fractions in boreal and tropical forests, grass cover in tropical, temperate and boreal grasslands and bare ground in arid regions (Fig. 4). There are, however, some biases common to all simulations. The tree cover fraction is too high globally, with a simulated range of 4.97-5.31 Mkm 2 , which is higher than the observations and depends on driving data (Table S2). Shrub cover and grasses dominate eastern Siberian taiga in the model instead of the observed high tree cover (Fig. 4). There is also slightly too much shrub cover in tropical forests at the expense of tree cover, contributing to a global bias in shrub cover of 1.22-1.31 Mkm 2 . Conversely, simulated tropical tree cover is too high in savanna regions, giving the impression of more continuous and fewer fragmented forests across the tropics (Fig. 4). Boundaries between temperate and warm temperate forests and tropical forests are too sharp, suggesting that JULES-ES does not capture processes in temperate woodland transition. Savanna and grasslands tend to be too narrow, with more bare soil in the models in semi-arid regions such as southern Africa, the Mojave Desert and the Sahel.

Fire
In addition to the simulations without fire submitted to the archive, we performed additional simulations with fire and fire feedbacks switched on. These fire simulations provide burnt area and alter vegetation cover, carbon, fluxes, albedo and runoff . Burnt area is similar across all four simulations (Fig. 5). The model simulates the present-day burnt area well compared to satellite observations, with the global total burnt area average for 2000-2020 observed by MODIS CCI v5.1 as being 4.55 Mkm 2 , and the model simulating between 3.94 and 4.43 Mkm 2 , depending on the driving data (Table S3). The model captures the high burnt area in southern hemispheric Africa -a common area of low bias in global fire models (Hantson et al., 2020). The model also performs better than other FireMIP models at simulating the high burnt areas in northern hemispheric Africa, though fire is still lower than in the observations. This is partly due to very low simulated burnt areas in Nigeria's Guinean savanna, which Kelley et al. (2019) show  (Jung et al., 2011), GLEAM for ET (Miralles et al., 2011) and GEWEX SRB for albedo (Stackhouse et al., 2011). is a consequence of global parameterisations of population density and agricultural drivers of burnt area.
We also show an area in Australia that is burnt too little, which is a common problem across fire models (Hantson et al., 2020), even in models optimised to burnt area observations Bistinas et al., 2014). This may be due to the unique fire ecology in northern Australia (Kelley, 2014) and to high uncertainty in observations of burnt area (Giglio et al., 2010). High burning in South America occurs in areas where cropland fragmentation reduces the burnt area beyond the extent of agricultural areas Andela et al., 2017), which is hard to reproduce in tile-and PFT-based models (Hantson et al., 2016). We also simulate an area that is burnt too little in Eurasia. Some of the observed burnt areas at these high latitudes come from peatland fires, which, as in most other fire models (Rabin et al., 2017), are not simulated in INFERNO. Observational burnt area products tend to underestimate the levels of burning in forests (Randerson et al., 2012), which may explain the slight bias towards burnt areas that are simulated to be too high in tropical forests.
Fire is simulated well in savanna bands (15 • N and 15 • S), which improves the representation of tree cover by reduc-ing the positive bias compared to observations ( Fig. 6; Table S2). The inclusion of interactive fire has the overall effect of decreasing simulated global tree cover (from 38.66-39, depending on driving data to 34.32-35.65 Mkm 2 ) to be more in line with the observations (34.86 Mkm 2 ) and increasing grasses and bare soil. In the tropics, this tends to bring the modelled forest (areas dominated by trees) to being more in line with the observations, with high simulated tree cover restricted to observed forested areas. However, including fire does reduce tree cover in savanna areas (Fig. S5), particularly in Africa, which means the spatial distribution of trees in this region compared with observations is not as good as without fire (Table S2). There is an increase and improvement in tree cover in some high-latitude North American areas due to changes in background mortality in the with-fire simulation. Fire reduces shrub cover to well below observations (Table S2), though given the well-documented issues in distinguishing tall and short woody vegetation (Gerard et al., 2017;Adzhar et al., 2022), it is probably more meaningful to assess total woody cover. Here, including fire reduces model bias (from 6.19 to 6.59 to −3.60 to −2.25 Mkm 2 ) and improves the spatial pattern (Table S2). Including fire reduces the global bias of high GPP when compared against  (Harper et al., 2022). The second column is the difference between the observations and the simulated ensemble mean. Top to bottom rows show the tree, shrub, grass and unvegetated (bare) fraction. estimations by the Global Carbon Budget (Friedlingstein et al., 2020), bringing global total down by ∼ 2 PgC yr −1 across all simulations (Table S1). However, fire does slightly degrade the GPP spatial pattern (Table S1). Including fire also reduced global NBP (Table S1) by 0.12-0.38 PgC yr −1 , depending on driving data.
Fire alleviates some of the high bias in ET, improving the model's overall performance (Table S4). Without fire, in semi-arid areas we already overestimate river flow (probably due, in part, to human extraction). The addition of fire lowers ET, thereby increasing river flow bias in semiarid regions, which slightly degrades overall runoff performance (Fig. S6). Fire also improves spatial pattern of albedo (Table S5), though seasonal performance decreases. This is in line with well-documented biases in fire seasonal cycles across all global fire models, which tend to have longerthan-observed fire seasons in tropical savanna, and in humandominated fire regimes, the season timing shifts and is often late compared to observations (Hantson et al., 2016(Hantson et al., , 2020. Previous evaluations of JULES configurations incorporating INFERNO also show these biases (Burton et al., , 2022(Burton et al., , 2023Hantson et al., 2020).

Discussion
We have presented simulations of the JULES-ES land surface model, run according to the ISIMIP2b protocols using biascorrected climate model data from four GCMs for the historical period. The configuration will be used to perform simulations under two future scenarios (RCP2.6 and RCP6.0). The JULES-ES ISIMIP2b configuration simulates the surface fluxes (GPP, ET and albedo) reasonably well. Including fire improves the ET and albedo but not the GPP, which is biased to be high. Including fire in the simulations currently degrades the runoff (Table S6).  . Modelled vegetation cover without fire (left) and with fire (middle) compared to observations from ESA CCI land cover and fire (right). Dark green, light green, light brown and dark brown indicate the tree, shrub, grass and unvegetated fraction of the latitude band. The shaded transition between colours indicates the ensemble range, which is quite narrow, indicating agreement across ensemble members. Black hashing indicates the burnt area, with observations taken from MODIS CCI v5.1 (Chuvieco et al., 2018). In the middle column, the burnt area from the four driving models is shown by hatching at four different angles.
The configurations of JULES can capture the seasonal cycle of many of the largest rivers, although high-latitude rivers and managed rivers are generally not captured as well. Including irrigation and structural hydrological developments, such as dams and reservoirs, would likely improve the simulations of managed rivers. Previously, Falloon et al. (2011) found that GCM precipitation biases contribute to errors in TRIP river flows for some basins in both HadGEM1 and HadCM3. In this study, we use bias-corrected data, which reduces these errors, meaning that differences in JULES-ES results between the driving models are due to differences in inter-seasonal or interannual variability between driving models (e.g. Fig. S2). However, errors in evapotranspiration, runoff generation or other missing processes e.g. snow accumulation and snowmelt processes, could also contribute. Uncertainty in precipitation due to the sparsity of observation networks and the undercatch of solid precipitation for highlatitude (Falloon et al., 2011) and high-altitude rivers, for example, in the Himalaya (Mathison et al., 2015), means that it is difficult to interpret model performance in these basins.
Some basins show the same bias direction in runoff ( Fig. 1) and ET (Fig. 3); it is notable that the Amazon is too dry for both variables and the Nile too wet, whereas we would expect opposing biases if they were from land surface simulation. In these basins, there is a dry and wet bias (respectively) in the driving precipitation data (Fig. S1). HadGEM2-ES and IPSL-CM5A-LR have particularly dry driving data in the Amazon, and this results in the driest runoff and ET in the simulation. This translates to biases in GPP (Fig. 3) and vegetation cover (Fig. 4). So, while ISIMIP bias correction reduces climate model biases compared to those in an Earth system model (see evaluation in Sellar et al., 2019) or when run with a non-bias-corrected climate (Burton et al., 2022), they are not eliminated.
Land cover is an important factor for the surface fluxes. Grassy regions, for example, correlate with the regions of positive ET bias. The annual mean global GPP biases are small, but this is not the case for GPP on a seasonal timescale and over a smaller region such as South America. The albedo and land cover area bias are also closely related. For example, if JULES simulates a larger number of trees than observed, then this may lead to a lower albedo than observed, and vice versa. Conversely, higher grass cover just north of the Sahel corresponds to simulated low-albedo biases. Vegetation impacts on albedo are particularly important at high latitudes, where there is snow cover; for example, the positive albedo bias in eastern Siberia is because JULES simulates too few trees and too much grass there. Grasses are more readily buried below snow than trees, making these areas more reflective (Sellar et al., 2019), which in turn affects the albedo. JULES represents the bending and partial burying of vegetation by snow (Ménard et al., 2014); however, the settings controlling this interaction described in Sellar et al. (2019) have been tuned for the coupled UKESM1 model rather than the standalone JULES model. Eastern Siberia is a vulnerable region, which has experienced increased climaterelated impacts, including heatwaves (Ciavarella et al., 2021) and fires ; it is very likely that climate change will exacerbate these feedbacks by the end of the 21st century (Popescu et al., 2022). Developments by Mercado et al. (2018), which improve the representation of plant acclimation to thermal stress, may improve spatial variations across different vegetation types in JULES.
In general, the simulations with fire improve the spatial distribution of plant functional types and vegetation productivity, particularly for tropical forests, the boundary between forests and savannas, and in North America. Developing JULES to include fire processes will improve simulations for these areas and properly capture the climate impacts on vegetation cover and carbon fluxes. The results show that there are too few trees compared to observations for the western parts of Brazil. The simulations with fire improve tree cover in savanna, which is consistent with the findings of Staver et al. (2011) andLasslop et al. (2016); however, there is still ongoing discussion around how much impact fire really has on tree cover in the savanna compared to other dry disturbances such as wind throw, heat stress and rainfall distribution (Veenendaal et al., 2018;Brovkin et al., 2009).

Conclusions
We have presented a configuration of JULES-ES set-up to run and generate output following the ISIMIP protocols. We provide a suite for running the simulations that includes driving data, ancillaries, postprocessing and a firstlook evaluation (ILAMB) for any phase of ISIMIP. Outputs using this set-up were submitted to the biomes and water ISIMIP2b sectors, and our evaluation helps inform of any difference between JULES-ES and other models participating in ISIMIP2b. The suite also provides a starting point for further JULES-ES developments. We evaluate a set-up with the representation of fire, using the INFERNO fire model in anticipation of ISIMIP3, which will include a fire sector. We show that including fire has an impact on model results, and that it is important to include in simulations of climate impacts. While fire mostly improves model performance, it does degrade certain vegetation distributions (for example, by simulating too little larch forest) and runoff. However, fire has a substantial impact on both ecosystem composition and hydrological processes and should therefore still be included when studying impacts under changing climate and environmental conditions. Therefore, while documentation of the configuration without fire will be useful for anyone using previously submitted results, we recommend using the configuration with fire in any future JULES-ES development. Future work using this configuration and the new phases of ISIMIP will focus on using the full benefit and extent of the ISIMIP ensemble to enable more in-depth exploration of climate impacts, together with the quantification of Earth sys-tem uncertainties, co-benefits of mitigation and adaptation to climate change.
Code and data availability. The JULES-ES for ISIMIP configuration (based on JULES version 5.5) is preserved at https:// code.metoffice.gov.uk/trac/roses-u/browser/b/k/8/8/6 (last access: 27 June 2023; fire off) and https://code.metoffice.gov.uk/trac/ roses-u/browser/c/f/1/3/7 (last access: 27 June 2023; fire on). JULES and associated configurations are freely available for noncommercial research use, as set out in the JULES user terms and conditions (http://jules-lsm.github.io/access_req/JULES_Licence. pdf, last access: 27 June 2023). For a comprehensive guide on how to access, install and run the configurations used in this research, we direct the reader to Appendix A in Wiltshire et al. (2020), which is available at https://gmd.copernicus.org/articles/13/483/ 2020/#section6. Note that in order to view and use the JULES-ES source code, access will be required from the Met Office Science Repository Service (https://code.metoffice.gov.uk/trac/home, last access: 27 June 2023) and is available to those who have signed the JULES user agreement. The easiest way to access the repository is by completing the online form to register at http://jules-lsm.github. io/access_req/JULES_access.html (last access: 27 June 2023).
© British Crown Copyright 2022, the Met Office. All rights reserved. The software is provided by the Met Office to the topical editor at Geoscientific Model Development under the software licence for peer review (use, duplication or disclosure of this code is subject to the restrictions as set forth in the aforementioned software licence for peer review). The software is provided to facilitate the peer review of this paper, "Description and Evaluation of the JULES-ES set-up for ISIMIP2b", and should be used and distributed to authorised persons for this purpose only. The software is extracted from the Unified Model (UM) and JULES trunks, with the revisions of the MOSRS repositories corresponding to the stated version, having passed both science and code reviews according to the UM and JULES working practices.
Competing interests. The contact author has declared that none of the authors has any competing interests.
Disclaimer. Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Review statement. This paper was edited by Sam Rabin and reviewed by two anonymous referees.