Multi-site evaluation of the JULES land surface model using global and local data

This study evaluates the ability of the JULES land surface model (LSM) to simulate photosynthesis using local and global data sets at 12 FLUXNET sites. Model parameters include site-specific (local) values for each flux tower site and the default parameters used in the Hadley Centre Global Environmental Model (HadGEM) climate model. Firstly, gross primary productivity (GPP) estimates from driving JULES with data derived from local site measurements were compared to observations from the FLUXNET network. When using local data, the model is biased with total annual GPP underestimated by 16 % across all sites compared to observations. Secondly, GPP estimates from driving JULES with data derived from global parameter and atmospheric reanalysis (on scales of 100 km or so) were compared to FLUXNET observations. It was found that model performance decreases further, with total annual GPP underestimated by 30 % across all sites compared to observations. When JULES was driven using local parameters and global meteorological data, it was shown that global data could be used in place of FLUXNET data with a 7 % reduction in total annual simulated GPP. Thirdly, the global meteorological data sets, WFDEI and PRINCETON, were compared to local data to find that the WFDEI data set more closely matches the local meteorological measurements (FLUXNET). Finally, the JULES phenology model was tested by comparing results from simulations using the default phenology model to those forced with the remote sensing product MODIS leaf area index (LAI). Forcing the model with daily satellite LAI results in only small improvements in predicted GPP at a small number of sites, compared to using the default phenology model.


Introduction
The atmosphere and biosphere are closely coupled and carbon is transported between the two via the carbon cycle (Cao and Woodward, 1998).Although the carbon cycle is significantly affected by global warming, much still remains to be understood about its behaviour (Schimel, 2007).Atmospheric CO 2 represents only a small amount of carbon in the Earth system with the rest tied up in various reservoirs (Ciais et al., 2013).These reservoirs can be either sources (release more carbon than they absorb) or sinks (absorb more carbon than they release).Sources can be either man-made (combustion of fossil fuels, deforestation) or natural (plant and litter decomposition, soil respiration, ocean release) and sinks include land vegetation, soils, oceans and geological reservoirs, such as deep-sea carbonate sediments and the upper mantle (Ciais et al., 2013).Of the carbon dioxide emitted into the atmosphere from the burning of fossil fuels, roughly half remains in the atmosphere and the rest is absorbed by carbon sinks on land and in the oceans (Le Quéré et al., 2009).
Global warming can affect terrestrial ecosystems in two ways.Firstly, increasing atmospheric CO 2 concentrations have led to an increase in photosynthesis (Beck et al., 2011;Fensholt et al., 2012), which has increased both carbon uptake and storage by terrestrial ecosystems (Norby et al., 2005;Leakey et al., 2009).This is known as CO 2 fertilisation.This increase in atmospheric CO 2 has led to an increase in growing season leaf area index (LAI); see Piao et al. (2006).It also reduces plant transpiration and increases plant water use efficiency through the partial closure of stomata (Warren et al., 2011).Secondly, a warmer climate can accelerate the decomposition of litter and soil organic carbon, and increase plant respiration.
Published by Copernicus Publications on behalf of the European Geosciences Union.a The soil texture fractions (%) are used to compute the soil hydraulic and thermal characteristics.
b At some of the flux tower sites, the precipitation variable was separated into a rainfall rate (kg m −2 s −1 ) and snowfall rate (kg m −2 s −1 ).
Predictions of the future uptake of atmospheric CO 2 by the terrestrial biosphere are uncertain and this uncertainty comes from whether the terrestrial biosphere will continue to be a sink or source for CO 2 .The Coupled Climate-Carbon Cycle Model Intercomparison Project (C4MIP) was the first major study to examine the coupling between climate change and the carbon cycle (Friedlingstein et al., 2006).One of its main conclusions was the reduced efficiency of the earth system, in particular the land carbon sink, to absorb increased anthropogenic CO 2 .However, the magnitude of this effect depended on the model used.
Land surface models (LSMs) are an important component of climate models and simulate the interaction between the atmosphere and terrestrial biosphere.They represent the surface energy and water balance, climate effect of snow and carbon fluxes (Pitman, 2003) and are considered the lower boundary condition for global climate models (GCMs) (Best et al., 2011).GCMs require the carbon, water and energy fluxes between the land surface and atmosphere to be specified.Meteorological data, vegetation and soil characteristics are provided as inputs to LSMs, and using these, LSMs can predict fluxes, such as latent and sensible heat, upward longwave radiation and net ecosystem exchange of CO 2 , which is used to determine global atmospheric CO 2 concentrations.Various LSMs have been designed over the last 40 years to calculate these fluxes (Dai et al., 2003).
The earliest GCMs to include a representation of the land surface based it on the simple "bucket" model.In this model, the soil is assumed to have a fixed water capacity (like a bucket) and at each land grid box and time step, the bucket is filled with precipitation and emptied by evaporation (Carson, 1982).The excess above its capacity is termed runoff.This model does not take vegetation or soil types into account.The second generation of land surface schemes attempted to explicitly represent the effects of vegetation in surface energy balance calculations and include the Biosphere-Atmosphere Transfer Scheme (BATS) (three soil layers and one vegetation layer) (Dickinson, 1986) and the Simple Biosphere (SiB) Model (three soil layers and two vegetation layers) (Sellers et al., 1986).The current generation of models include the biological control of evapotranspiration with biochemical models of leaf photosynthesis linked to the biophysics of stomatal conductance (Farquhar et al., 1980;Bonan, 2008), and can respond to changes in atmospheric CO 2 in a more realistic way.
LSM components are designed using results from research literature, idealised laboratory experiments and observations from limited field campaigns (Stöckli et al., 2008;Williams et al., 2009).This can lead to sources of uncertainty in the parameterisation of processes; and as LSMs become more advanced, there is a need to understand their complexity and accuracy.LSMs can be tested in a variety of ways.Multimodel intercomparison projects provide a measure of how various LSMs behave under controlled conditions (Schaefer et al., 2012;Cadule et al., 2010;Randerson et al., 2009;Dirmeyer et al., 2006;Henderson-Sellers et al., 1996).Parameter perturbation experiments evaluate a single model and numerous simulations are performed where either one parameter is changed at a time within a given range (Knorr, 2000;Knorr and Heimann, 2001;El Maayar et al., 2002) or maximum and minimum values of parameters are used (Hallgren and Pitman, 2000).Recently, in the LSM community, there has been effort to create a more standardised form of model evaluation known as benchmarking, whereby publicly available data sets, at various temporal and spatial resolutions, along with metrics and areas of model performance to be evalu-ated, are used by different modelling groups to test model performance (Abramowitz, 2012;Luo et al., 2012).This has previously been carried out by Abramowitz et al. (2008) and Blyth et al. (2011).Blyth et al. (2011) evaluated JULES at 10 FLUXNET sites, representing a range of biomes and climatic conditions, where model parameter values were taken as if the model was embedded in a GCM, in order to assess the model's ability to predict observed water and carbon fluxes.We extended this work by performing model simulations whereby model parameters (Table 1) were set to observe local site conditions and were then compared to those using global and satellite data.Local site conditions were those relevant to a particular flux tower site and were obtained from the research literature, communications with site Primary Investigator and the Ameriflux data archive.Global data referred to the model parameters taken from data sets used by the global operational version of JULES and meteorological data from global gridded data sets extracted for each flux tower grid box.The satellite data referred to LAI data from the MODerate resolution Imaging Spectroradiometer (MODIS) instrument, aboard NASA's Earth Observing System (EOS) satellites, Terra and Aqua (http://modis.gsfc.nasa.gov).
In this study, we used 12 FLUXNET sites that cover a range of ecosystem types; temperate (6), boreal (2), mediterranean (2) and tropical (2) (Table 2), to investigate differences between using local, global and satellite-derived data sets when performing model simulations with JULES version 3.0 (Clark et al., 2011;Best et al., 2011).In particular, we wished to address the following research questions: -How well does JULES perform when using the best available local meteorological and parameter data sets?Can the model simulate interannual variability?
-How well does JULES perform when using global data?
-Of the global meteorological data sets used in this study, which one compares best to FLUXNET data?
-Are improvements in simulated gross primary productivity (GPP) observed when forcing JULES with daily satellite phenology compared to using the default phenology module?
2 Methods and model

Model description
The Joint UK Land Environment Simulator (JULES) is the land surface scheme of the  (Cox et al., 1999).JULES is a mechanistic model and is able to model such processes as photosynthesis, evapotranspiration, soil and snow physics, and soil microbial activity (Blyth et al., 2011).Each model grid box is composed of nine different surface types, five of which are vegetation, referred to as plant functional types (PFTs) (broadleaf trees, needleleaf trees, C3 (temperate) grass, C4 (tropical) grass and shrubs), and four non-vegetation types (urban, inland water, bare soil and land-ice).Each grid box can be made up of the first eight surface types or is land-ice.For single-point model simulations, as used in this study, each point is treated as a grid box with data such as surface type fractions, soil texture fractions and meteorological data used as input to the model.The surface fluxes of CO 2 associated with photosynthesis are computed on each time step for each PFT using a coupled photosynthesis-stomatal conductance model (Cox et al., 1998).These accumulated carbon fluxes are passed to TRIF-FID (Top-down Representation of Interactive Foliage and Flora Including Dynamics), JULES' dynamic global vegetation model and also its terrestrial carbon cycle component (Cox, 2001).TRIFFID updates the areal coverage, LAI and canopy height for each PFT on a longer time step (usually every 10 days), based on the net carbon available to it and competition with other vegetation types (Cox, 2001).For these model simulations, vegetation competition was disabled, which meant that the PFT fractions for each site were prescribed and did not vary with time.If vegetation competition was switched on during the spin-up process, this would have introduced error into the model simulations due to unrealistic vegetation fractions.
In JULES, phenology is, typically, updated once per day by multiplying the annual maximum LAI by a scaling factor, which is calculated by using temperature-dependent leaf turnover rates (Clark et al., 2011).When calculating GPP, a multi-layer canopy was used for the scaling up of leaflevel photosynthesis to canopy level.The option used took into account the vertical gradient of canopy photosynthetic capacity (decreasing leaf nitrogen from top to bottom of canopy) and included light inhibition of leaf respiration.LAI is calculated for each canopy level (default number is 10), with a maximum LAI prescribed for each PFT.Clark et al. (2011) contains more information on the options available for the calculation of canopy photosynthesis.Two versions of JULES were used in this study.JULES3.0 is the original and publicly available release code of JULES version 3.0.The source code can be downloaded from https://jules.jchmr.org/.In addition, JULES3.0 was modified in order to force it with daily MODIS LAI (JULESmod).The local (stand-alone) and global operational versions of JULES are quite similar.Since UM v8.1 (using JULES v3.0), the JULES code for both have been the same with some exceptions, such as the UM/standalone initialisation code.The science code (e.g.photosynthesis, hydrology and soil processes) remains the same between the two.A more detailed description of JULES can be found in Clark et al. (2011) and Best et al. (2011).

Experimental design
Offline single point simulations of GPP were performed at each of the 12 flux tower sites using various global and local data sets (Table 3).Correct simulation of GPP is important since errors in its calculation can propagate through the model and affect biomass and other flux calculations, such as net ecosystem exchange (NEE) (Schaefer et al., 2012).In JULES, NEE is not a model output and is calculated as total ecosystem respiration minus GPP.The correct representation of leaf level stomatal conductance has an influence on GPP and transpiration.Errors in GPP can also introduce errors into simulated latent and sensible heat fluxes.These study sites (Blyth et al., 2011;Abramowitz et al., 2008, Table 5) were chosen to validate model performance in carbon flux simulation since gap-filled meteorological data, local observations of vegetation and soil characteristics and observed GPP fluxes were available.
One year model simulations were performed and span a range of years due to limited availability of local gap-filled meteorological data, observations of GPP fluxes and vegetation characteristics (Table 2).Prior to performing the model simulations, the soil carbon pools at each site were brought to equilibrium using a 10 year spin-up by cycling 5 year averaged meteorological data (in equilibrium mode), followed by a 1000 year spin-up by cycling observed meteorological data (in dynamical mode).At Tumbarumba, Santarem Km67 and Santarem Km83, 3 year averaged meteorological data was used in the first part of the spin-up process due to limited data availability.More information on model spin-up can be found in Clark et al. (2011).

Data
JULES requires meteorological data at 6-hourly intervals or less in order to drive the model offline.In this study, half-hourly/hourly meteorological data was used for model runs using local data and 3-hourly data for simulations using global data.For offline simulations, the model requires downward short-wave and long-wave radiation (W m −2 ), rainfall and snowfall rate (kg m −2 s −1 ), air temperature (K), wind speed (m s −1 ), surface pressure (Pa) and specific humidity (kg kg −1 ) (Table 1).Gap-filled meteorological forcing data at the local scale was obtained from the FLUXNET network and data at the global scale was obtained from two gridded data sets: WFDEI (Weedon et al., 2014(Weedon et al., , 2011) ) and that developed by Sheffield et al. (2006) (referred to as PRINCE-TON).
Vegetation and soil parameters (Table 1) were adjusted to local or global values depending on the model simulations (Table 3) performed at the 12 flux tower sites.Local vegetation (Tables 5, 6) and soil parameters (not shown) were obtained from the research literature, communications with site Primary Investigator and the Ameriflux data archive.Global vegetation (Tables 5, 6) and soil parameters (not shown) were taken from data sets used in the global operational version of JULES as used in the Hadley Centre Global Environmental Model (HadGEM) climate model.These data sets include the Global Land Cover Characterization (version 2) database (http://edc2.usgs.gov/glcc/glcc.php)(PFT fractions), and the Harmonized World Soil Database (version 1.2) (Nachtergaele et al., 2012) (soil texture fractions).
There are several global LAI data sets available, such as ECOCLIMAP (1992) (Masson et al., 2003), CYCLOPES (1997CYCLOPES ( -2007) ) (Baret et al., 2007), GLOBCARBON (1998GLOBCARBON ( -2003) ) (Deng et al., 2006), MOD15 (2000-present) (Yang et al., 2006) and MISR LAI (2000-present) (Diner et al., 2008;Hu et al., 2007).For the majority of sites used in this study, gap-filled meteorological data and GPP flux observations are only available for the 2000s and therefore, a global data set of satellite LAI was required that covered this period.We used the MODIS LAI product because it is a high spatial and temporal resolution data set with global coverage.

Forcing data
FLUXNET, a "network of regional networks", is a global network of micrometeorological tower sites that measure the exchange of carbon dioxide, water vapour and energy between the biosphere and atmosphere across a range of biomes and timescales (Baldocchi et al., 2001).Data and site information are available online at: http://www.fluxnet.ornl.gov/.Over 500 tower sites are located worldwide on five continents and are used to study a range of vegetation types, such as temperate conifer and broadleaved (deciduous and evergreen) forests, tropical and boreal forests, crops, grasslands, wetlands, and tundra (Baldocchi et al., 2001).
The WATCH Forcing Data (WFD)  was created in the framework of the Water and Global Change (WATCH) project (http://www.eu-watch.org/),which sought to assess the terrestrial water cycle using land surface models and general hydrological models.WFD was derived using the 40 years ECMWF Re-Analysis (ERA-40) for 1958-2001 and data for 1901-1957 was obtained using random years extracted from the ERA-40 data (Weedon et al., 2011).WFD was extended by applying the WFD methodology to the ERA-Interim data for the 1979-2009 period (WFDEI) (Weedon et al., 2014).Within WFD and WFDEI, there are two precipitation products: the first corrected using the Climate Research Unit at the University of East Anglia (CRU) observations; and the second using Global Precipitation Climatology Centre (GPCC) observations.The WFDEI data sets incorporating the GPCC-and CRU-corrected precipitation products are referred to as WFDEI-GPCC and WFDEI-CRU, respectively.WFDEI is only available for land points including Antarctica, and consists of 3-hourly, regularly (latitudelongitude) gridded data at half-degree (0.5 • × 0.5 • ) resolution.This resolution produces a global grid of 360 × 720 grid cells and is equivalent to a surface resolution of about 56 km × 56 km at the Equator and 56 km × 32 km at 55 • N (temperate regions).
The Sheffield et al. (2006) data set (PRINCETON) is a global 60-year meteorological data set for driving land surface models developed by the Land Surface Hydrology Research Group at Princeton University.PRINCETON is only available for land points (excluding Antarctica), and consists of 3-hourly, 1 • resolution, meteorological data for the 1948-2008 period.This data set has a resolution half that of WFDEI with a global grid of 180 × 360 grid cells and is equivalent to a surface resolution of about 111 km × 111 km at the Equator and 111 km × 64 km at 55 • N. The resolution (both spatial and temporal) of the meteorological data can affect the output of land surface and atmospheric chemistry models (Pugh et al., 2013;Ashworth et al., 2010;Ito et al., 2009;Guenther et al., 2006) and may introduce a systematic bias.

Observational data
Local observations of GPP were obtained from the FLUXNET network.Flux tower sites use the eddy covariance method to measure net ecosystem exchange (NEE), which is defined as the net flux of CO 2 , and is separated into GPP and ecosystem respiration with a "flux-partitioning algorithm" (Reichstein et al., 2005).There are a number of approaches used to separate NEE into its two component fluxes, which include extrapolating night-time respiration measurements to the daytime and fitting light-response curves to daytime NEE measurements (Lasslop et al., 2010).In addition to flux-partitioning, the data must also be gap-filled due to unfavourable meteorological conditions and instrument failure (Reichstein et al., 2005).These processes carry with them some uncertainty, which must be quantified.Hagen et al. (2006) found that the uncertainty at the half-hourly timescale was of the order of the observations themselves (i.e.∼ 100 %), but only ∼ 10 % at the annual timescales for a temperate deciduous forest.

Ecological and soil data
The Global Land Cover Characterization (version 2) database, generated by the US Geological Survey, the University of Nebraska-Lincoln, and the European Commission's Joint Research Centre, is a 1 km resolution global land cover data set for use in environmental and modelling research (Loveland et al., 2000).Land cover is classified into 17 categories using the International Geosphere-Biosphere Programme (IGBP) scheme.The land cover category for each of the flux tower sites was extracted from the GLCC database (IGBP code in Table 5).These IGBP codes are then used to derive the annual maximum LAI and canopy height for each PFT from the look-up tables used in the global operational version of JULES.Further information on how these variables are derived can be found in Appendix A.
Global soil texture fractions (% of sand, silt and clay) for each of the 12 FLUXNET sites (not shown here) were extracted from the Harmonized World Soil Database (version 1.2) (HWSD) (Nachtergaele et al., 2012).The equations used to compute soil hydraulic and thermal characteristics were taken from the Unified Model Documentation Paper No. 70 (Jones, 2007).Note that the equations in Jones (2007) apply only to mineral soils, as organic soils behave differently (Gornall et al., 2007).In this study, the soils are classified as mineral at all 12 sites.Since the HWSD contains soil textures for two soil depths (0-30 and 30-100 cm) and JULES contains four soil layers (thicknesses of 0.1, 0.25, 0.65 and 2.0), the 0-30 cm soil textures were assigned to the top two model soil layers (thicknesses 0.1 and 0.25 m, respectively), and the 30-100 cm textures were assigned to the bottom two layers (thicknesses 0.65 and 2.0 m, respectively).The local soil textures are provided as site averages and therefore, each model soil layer (four in total) is assigned the same set of soil textures.

MODIS LAI products
The MODIS LAI product, computed from MODIS spectral reflectances, provides continuous and consistent LAI coverage for the entire global land surface at 1 km resolution (Yang et al., 2006).Some gaps and noise in the data are possible due to the presence of cloudiness, seasonal snow cover and instrument problems, and this can limit the usefulness of the product (Gao et al., 2008;Lawrence and Chase, 2007).In this study, we use the MODIS Land Product Subsets, created by the Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC), which provide summaries of selected MODIS Land Products for use in model validation and field site characterisation and include data for more than 1000 field sites and flux towers (http://daac.ornl.gov/MODIS/).
The MODIS Land Product Subsets (ASCII format) contain LAI data for a 7 km × 7 km grid of 49 pixels, with each pixel representing the 1 km × 1 km scale, at 8-day composite intervals.The average of the 3 × 3 pixel grid box centred on the flux tower is taken to be that day's LAI value.Only pixel values with an even quality control (QC) flag was used for the averaging and this produced a time-series of 8-day observations at each of the sites.Missing data were dealt with by using the previous good value in the time-series.The exception to this was Bondville, where missing data occurred in January 2000, since MODIS only started recording data in February 2000 (this year was used due to limited data availability at the site).To gap-fill the missing data, an 11-year average was computed and the missing data replaced with the average for January 2000.Finally, each time-series of 8-day composite values was linearly interpolated to obtain a daily LAI time-series.

Outline of experiments
This section describes the model simulations performed in the study.In the model simulation names, local and global refer to the parameter set and F, WEIG, WEIC and P refer to the meteorological forcing data set used (Table 3).Vegetation competition has been switched off for all model simulations.

Effect of local data on simulated GPP
Using JULES3.0,we compared model simulations using local parameter and meteorological data sets (local-F; Table 3) to observations of GPP from the FLUXNET network.For this set of model simulations, the default phenology model (used to update LAI) and TRIFFID were used.
The ability of the model to simulate interannual variability was also examined.Multi-year model simulations were performed for six of the sites using local data; one from each of the various climate zones (Harvard Forest, Vaira Ranch, Hyytiala, Santarem Km67), the Southern Hemisphere site (Tumbarumba) and the temperate site, Morgan Monroe.Since meteorological data was available for multiple years at these sites, but not model parameter data, the same parameter data sets used for the single-year runs (Table 2) would be used for the multi-year runs at specific sites.

Effect of global data on simulated GPP
Using JULES3.0,we compared model simulations using parameter sets from the HadGEM model and global meteorological data (global-WEIG, global-WEIC and global-P; Table 3) to observations of GPP from the FLUXNET network.
In addition to this, we quantified how much error is introduced into model simulations (using local model parameters) when using global (WFDEI-GPCC) instead of local meteorological data (local-WEIG and local-F in Table 3).In these model simulations, the default phenology model and TRIF-FID were used.

Comparison of global to local meteorological data
The WFDEI-GPCC, WFDEI-CRU and PRINCETON data sets were compared to FLUXNET to find out which one more closely captures the local meteorological conditions.

Daily satellite phenology
Using JULES3.0 and JULESmod, we tested the ability of the JULES phenology model to simulate the seasonal cycle of GPP by comparing model simulations, where JULES uses MODIS LAI data (local-FM and local-FNM; Table 3) to those using the default phenology model (local-F; Table 3).When using the default phenology module, LAI was computed internally by scaling the annual maximum LAI, which was then used to calculate GPP.When forcing JULES with daily MODIS data (local-FM), the phenology module was switched off and the MODIS LAI then used to compute GPP.For model simulations using MODIS data and the default phenology module (local-FNM), the annual maximum MODIS LAI is set to be the annual maximum LAI.Vegetation competition has been switched off and local parameters used for both sets of model simulations (local-FM, local-FNM).

Model analyses
To quantify differences between output from the various model simulations and observations, we used root mean squared error (RMSE) (Eq.1), which is a measure of the average error of the simulations, and bias (Eq.2), which is the average difference between model and observations (a measure of under-or overprediction) with the absolute (Eq. 3) and percentage differences (Eq.4): (2) x t and x o, t are model and observed daily GPP fluxes, respectively, which have been smoothed using a 7-day moving aver- age since we are interested in the long-term average and not daily variability.n is the number of paired values (number of days in year).The absolute difference ( GPP) between the model and observations is the absolute value of the difference in total annual GPP for each and the percentage difference ( %) is the absolute difference divided by the observed total annual GPP: In order to describe JULES' ability to reproduce simulated GPP, a simple, but subjective, ranking system using qualitative terms (Very well, Good and Poorly) was devised based on RMSE and bias (Table 4, Fig. 2a).These ranges were used as interannual variability was about ±1 g C m −2 day −1 in both RMSE and bias (Sect.3.1).

Effect of local data on simulated GPP
When driven with local meteorological and parameter data sets (local-F; Fig. 1), JULES has a negative bias with total annual GPP underestimated by 16 % (3049 g C m −2 year −1 ; Table 7) across all sites compared to observations.By using local data, JULES performs very well (see Fig. 2a and Table 4 for definition of qualitative terms used to describe model performance) at the temperate forest sites, Harvard Forest, Morgan Monroe, Hyytiala and Tharandt, where RMSEs range from 1.1-1.4g C m −2 day −1 , biases from −0.2 to +0.3 g C m −2 day −1 (Fig. 2a) and absolute differences from 40-211 g C m −2 year −1 (Table 7) and good at Vaira Ranch with an RMSE of 2.78 g C m −2 day −1 , bias of −0.19 g C m −2 day −1 and absolute difference of 71 g C m −2 year −1 .The model performs poorly at Tumbarumba, El Saler, Bondville and the tropical sites, Santarem Km67 and Santarem Km83, with RMSEs ranging from 1.8-4.1 g C m −2 day −1 , biases from −3.7 to −0.2 g C m −2 day −1 and absolute differences from 71-1340 g C m −2 year −1 .At the temperate forest sites, JULES simulates the summer carbon uptake and leaf onset and senescence very well.For example, at the needleleaf forests, Hyytiala and Tharandt, the model correctly captures the timing and magnitude of the seasonal cycle of GPP (Fig. 1).JULES is able to capture the beginning and ending of the growing season, but underestimates the summer carbon uptake at Tumbarumba, a temperate sclerophyll forest (forests dominated by plants that have hard leaves and are adapted to drought) (Fig. 1).At the tropical sites, Santarem Km67 and Santarem Km83, the seasonal cycle has been modelled poorly, with the total annual  2).The dashed lines on (a) show the regions defined by the qualitative terms (Table 4) used to describe model performance.
Table 7. Absolute and percentage differences between model simulated and observed (FLUXNET) total annual GPP (gC m −2 year −1 ) at the 12 flux tower sites.GPP obs is the observed total annual GPP, GPP is the absolute difference (Eq. 3) between the model and observed total annual GPP, and % is the percentage difference (Eq.4) between the model and observed total annual GPP.Values highlighted in bold mean that the difference is negative (i.e.GPP obs < GPP model ).The total value for each of the model simulations was computed using the differences and not the absolute differences.GPP being underestimated by 42 % (1340 g C m −2 year −1 ) and 21 % (583 g C m −2 year −1 ), respectively (Table 7).JULES can simulate interannual variability when using local data with average RMSEs across all six sites for all years being within 0.7 g C m −2 day −1 and average biases within 1.2 g C m −2 day −1 of model results from the corresponding single-site runs (Fig. 3).Interannual variability is captured very well at the temperate sites (Harvard Forest, Hyytiala  2) and represent data from model simulations performed for the year specified in Table 2, with results from other years plotted using the model simulation year and labels coloured the same as the original site label.

FLUXNET
and Morgan Monroe) and Vaira Ranch with RMSEs ranging from +1 to +3 g C m −2 day −1 and biases from +1 to −1 g C m −2 day −1 .As observed with the single-site model simulations, the model fails to capture interannual variability at Santarem Km67 and Tumbarumba (Fig. 3).
Overall, JULES performs very well with the use of local data (meteorological and parameter data sets) with negative biases observed at the tropical sites and the Southern Hemisphere site, Tumbarumba, with the same trend also observed when the model simulates interannual variability.

Effect of global data on simulated GPP
By replacing the local data with global parameter and meteorological data, JULES had a much greater negative bias with total annual GPP underestimated by 30 % (6703 g C m −2 year −1 ; Table 7) on average across all sites compared to observations (global-WEIG, global-WEIC and global-P; Fig. 1).This is also shown in the annual average GPP, which has been plotted for each of the model simulations and observations at the 12 sites (Fig. 1) and the percentage differences (Table 7), which are, in general, larger for simulations using global data than for those using local.This trend occurs at all sites, with the exception of the wetland site, Kaamanen, and Santarem Km83, where modelled total annual GPP (2684 g C m −2 year −1 and 492 g C m −2 year −1 , respectively) is overestimated (global-P; Table 7) compared to model runs using only local data (2141 g C m −2 year −1 and 119 g C m −2 year −1 , respectively; Table 7).
As well as quantifying differences in model simulations using either local or global data, it is useful to know how global meteorological data affects local model runs.Global meteorological data can be used in place of FLUXNET data in order to drive JULES (local-WEIG; Table 3).This is important for ecological research sites where there is limited or no local meteorological data available.Using the WFDEI-GPCC meteorological data set (local-WEIG; Table 3) to force the model increases the negative bias of model simulations using only local data (Fig. 2f), with a 7 % reduction in simulated total annual GPP (15 469 g C m −2 year −1 for local-F reduced to 14 193 g C m −2 year −1 for local-WEIG; Table 7).
Forcing the model with WFDEI-GPCC (local-WEIG) results in decreases in model performance (increases in bias and RMSE) at the majority of sites.The tropical sites, Santarem Km67 and Santarem Km83, are two exceptions and show a noticeable improvement in modelled yearly GPP (66 and 61 % reduction of bias, respectively) and changes to modelled seasonal cycle (25 % increase and 65 % reduction of RMSE, respectively).However, at some sites, such as Tharandt, Kaamanen and Hyytiala, forcing JULES with global meteorological data has not introduced large negative biases into GPP predictions (Table 7), with RMSEs ranging from 1.1-1.3g C m −2 year −1 (Fig. 2f).
In general, we found the meteorological data had a greater impact on modelled GPP fluxes than model parameters.Larger differences exist between local-WEIG and local-F (local WEIG-F ; Fig. 2d), which differ only in the atmospheric forcings data set used, compared to between global-WEIG and local-WEIG (global − local WEIG ; Fig. 2e), which differ only in the model parameter sets used.
The ability of JULES to capture yearly GPP (bias) and the seasonal cycle (RMSE) is affected at the majority of sites when using global meteorological data (Fig. 2d), with improvements observed at Santarem Km67 and Santarem Km83.However, model parameters were found to affect bias at all 12 sites (Fig. 2e) with the tropical sites being the most influenced.With the exception of Tumbarumba, biases associated with meteorological data compensate for those associated with model parameters at the tropical sites (global WEIG − local F ; Fig. 2c).
Overall, we found that with the use of global data (model parameter and meteorological data), model performance decreased from very well to good and poorly at most sites, with the exception of the tropical sites.Driving JULES with global meteorological data introduces biases into single site simulations.At the majority of sites, these biases are negative, but at tropical sites, the global meteorological data  2).Note that before computing bias and RMSE, the meteorological data was normalised against the annual mean for each site.improves model performance.We found the meteorological data to have a greater impact on GPP fluxes than model parameters.

Global vs. local meteorological data
As well as quantifying the error introduced into model simulations by using global meteorological data instead of local, we also compared the global meteorological data to local data.Only the downward short-wave and long-wave diation fluxes, precipitation and surface air temperature variables have been compared to FLUXNET values, since these variables play the most influential role of the meteorological forcings in canopy photosynthesis and light propagation in JULES (Alton et al., 2007).In order to compare the meteorological data sets, the data was normalised against the annual mean for each site before computing the RMSE and bias.
Of the two global meteorological data sets used in this study, the WFDEI data set compares best to FLUXNET (lower RMSEs and biases than PRINCETON) at the majority of sites (Fig. 4).Surface air temperatures compare best to local meteorological measurements with average RMSEs of 0.4 and 0.7 % (7 day filtered RMSE expressed as percentages of the annual mean value) (1.5 and 2.4 K) across all sites for the WFDEI and PRINCETON data sets, respec-tively (Fig. 4d), followed by the downward short-wave radiation fluxes with average RMSEs of 13 and 17 % (27.0 and 33.2 W m −2 ) for WFDEI and PRINCETON, respectively (Fig. 4a), and downward long-wave radiation fluxes with average RMSEs of 4 and 5 % (18.9 and 25.0 W m −2 ) for WFDEI and PRINCETON, respectively (Fig. 4b).Precipitation data from global data sets differ most from local values with RMSEs of 112-178 % (2.7-4.4 mm day −1 ) for WFDEI-GPCC, WFDEI-CRU and PRINCETON, respectively, which may be due to how the precipitation products of each global data set is corrected (Weedon et al., 2011;Sheffield et al., 2006).
In addition to comparing the global meteorological variables to their local values, we also examine the two precipitation products, WFDEI-GPCC (GPCC-corrected) and WFDEI-CRU (CRU-corrected), within the WFDEI data set.We found WFDEI-GPCC and WFDEI-CRU compare equally well at the 12 FLUXNET sites (Fig. 4c) with average RMSEs of 2.7 and 2.8 mm day −1 , respectively.Differences between GPCC-and CRU-corrected precipitation RMSEs are small (0.0-1.4 g C m −2 day −1 ) at individual flux tower sites.When forcing JULES with WFDEI, there is little difference when either WFDEI-GPCC or WFDEI-CRU is used as the precipitation product, with average RMSEs of 2.9 and 2.8 g C m −2 day −1 , respectively, across all sites, al-  2).data and vegetation parameters, such as V cmax , may have a greater impact on predicted GPP than LAI.
Overall, when JULES is forced with daily MODIS LAI small improvements (8 % reduction in average RMSE; local-FM) in predicted GPP are observed at a number of sites, though there exists a negative bias associated with using MODIS data.By setting the annual maximum MODIS LAI to be the annual maximum LAI at each site, the model performs equally well (0.04 % reduction in average RMSE; local-FNM) to local model simulations.We also observed improvements in simulated GPP at sites with low LAI values, such as grasslands, when JULES is forced with daily LAI.

How well does JULES perform when using the best available local meteorological and parameter data sets compared to those using global data?
At more than half of the sites, JULES performs very well when using local meteorological and parameter data sets with a negative bias observed for the remaining sites (Fig. 2a).
At the six sites where multi-year model simulations were performed, interannual variability is captured by the model using local data with the exception of Santarem Km67 and Tumbarumba.This trend is also observed with the singleyear runs.The use of global parameter and meteorological data sets introduces a negative bias into GPP simulations at all sites with the exception of the mediterranean site, El Saler, and the tropical sites (Fig. 2b).Using local parameter and global meteorological data to drive JULES (local-WEIG) increases the negative bias of local model simulations (local-F) (Fig. 2f).We observed decreases in model performance at the majority of sites, with the exceptions being the tropical sites (Santarem Km67/Km83).At some sites, such as Hyytiala and Kaamanen, using global meteorological data produced similar results (Fig. 2a, f) to using FLUXNET data.
Our results compare well with the evaluation of JULES by Blyth et al. (2011), where parameters were obtained as though the model was embedded in a GCM.Differences between the two studies include different model versions and global meteorological data sets used.Comparing our results with Fig. 3 of Blyth et al. (2011), we also found simulated photosynthesis to be underestimated for the temperate forests (Harvard Forest, Tharandt and Morgan Monroe), grasslands (Fort Peck), mediterranean sites (El Saler) and the tropical forests (Santarem Km67), and overestimated for the wetlands (Kaamanen).We observed that the use of local observations of site characteristics, such as PFT fractions and vegetation properties, lead to improvements in model performance at more than half of the sites (Fig. 2a), though errors still exist with percentage differences ranging from 2-12 %.
Differences between global and local data include PFT fractions (Table 5), soil texture fractions, vegetation parameters (Table 6) and meteorological data.At some sites, such as Bondville and Santarem Km67/Km83, the global and local values for LAI and V cmax were markedly different (Fig. 6), though for the majority of sites, global and local LAI values were quite close (Fig. 6a), whereas global V cmax values were underestimated compared to local values (below dashed line in Fig. 6b).Overall, the MODIS LAI values were closer to the local values and in general, lower than global values (Fig. 6a).In general, we found the meteorological data to play a more important role than model parameters in determining GPP fluxes at sites, such as Santarem Km67 and Santarem Km83.At these sites, the meteorological forcing data was the primary driver of productivity and biases associated with the global meteorological data compensated for incorrect parameter values.However, at Tumbarumba, incorrectly predicted GPP was due to model error rather than meteorological data or model parameters.We performed a temperature sensitivity study at Tumbarumba using local meteorological and parameter data sets (local-F; Table 3).The winter and spring surface air temperatures (May-October) of the FLUXNET data were increased by increments of 1 • C and the model was re-ran each time.Improvements in simulated seasonal cycle were observed, but only at high surface air temperatures (an increase in 7 • C).Since the model performed poorly when using both global and local data meteorological data, we can assume that this is due to the model itself rather than the forcing data.Tumbarumba is classified as a sclerophyll forest and JULES does not have this land cover type.We assigned the Needleleaf (NL) PFT to JULES at this site.The introduction of the correct PFT and associated parameters may improve the results at this site.

Of the global meteorological data sets used in this
study which one compares best to FLUXNET data?
At the majority of sites, the WFDEI data set compares better to local meteorological measurements (FLUXNET) than the PRINCETON data set does (Fig. 4).This is likely due to the WFDEI data set being derived from the ECMWF Re-analysis (ERA-Interim) data set (Dee et al., 2011).The ERA-Interim re-analysis is a higher resolution data set (∼ 0.75 • × 0.75 • ; equivalent to a surface resolution of about 83 km × 83 km at the Equator and 83 km × 48 km at 55 • N) than the NCEP-NCAR re-analysis (2.0 • ×2.0 • ; equivalent to a surface resolution of about 222 km × 222 km at the Equator and 222 km × 128 km at 55 • N), from which the PRINCE-TON data set is derived (Kistler et al., 2001).The ERA-Interim re-analysis also uses a more advanced data assimilation system than the NCEP-NCAR re-analysis (Kistler et al., 2001;Weedon et al., 2014).
At the sites considered, differences between global and local values for downward short-wave and long-wave radiation fluxes and surface air temperatures are quite small (Fig. 4a, b and d), with average percentage RMSEs ranging from 0.4-17 % (expressed as percentages of the annual mean value), while larger differences are observed for precipitation (Fig. 4c), with average percentage RMSEs ranging from 112-178 %.At the majority of sites, there is a negative bias associated with precipitation (Fig. 4c), but this will have little effect on GPP fluxes since JULES is relatively insensitive to precipitation (Galbraith et al., 2010).For the remaining meteorological variables, there is a positive surface air temperature bias, but no dominant bias associated with the radiation fluxes.However, at individual sites, such as the tropical sites, Santarem Km67 and Santarem Km83, biases in the meteorological data can affect model results.In general, we found that using MODIS data resulted in only small decreases in RMSE at a limited number of sites, compared to using locally observed LAI.At sites where model performance improved, improvements were a result of setting the annual maximum LAI to be the annual maximum MODIS LAI rather than forcing the model with daily MODIS LAI.The largest improvements in simulated GPP occur at sites with low annual LAI, such as the grassland (Vaira Ranch, Fort Peck, Kaamanen) and cropland (Bondville) sites and the tropical sites (Santarem Km67 and Santarem Km83).At the boreal sites, Tharandt and Hyytiala, the MODIS LAI tended to be quite noisy and this led to underestimated GPP (Fig. 5c).
We found that at sites where the MODIS LAI time-series was noisy (large day-to-day variations), this resulted in decreased model performance.At some of the flux tower sites, the MODIS data failed to capture aspects of the seasonal cycle of leaf phenology, such as the magnitude of the seasonal cycle (Tharandt, El Saler) and the beginning and end of the growing season (Bondville).For example, at Tumbarumba, the MODIS instrument estimated the annual maximum LAI to be 6.08 m −2 m −2 and the daily LAI to be quite noisy, whereas the ground level observations show it to be 2.5 m 2 m −2 (Table 6) and LAI to be constant for much of the year.
The MODIS instrument provides a valuable source of information that can be used by land surface models.However, in this study, the quality of the LAI data can affect model performance.At the tropical sites, MODIS was unable to capture the magnitude of seasonal variation in LAI with MODIS overestimating the locally observed annual maximum LAI at Santarem Km67 and Santarem Km83 by 28 and 10 %, respectively (Table 6).It was also unable to correctly capture LAI during the Amazonian rainy season, which runs from December to June, as a result of increased cloud cover.The MODIS LAI is very noisy in these regions, but should be constant throughout the year.
Overall, we found JULES' phenology module performed very well at the temperate sites and poorly at the tropical and cropland sites.The ability of the phenology model to simulate GPP fluxes very well at temperate sites, with slight underestimation of the summer carbon uptake and phase shift (leaf onset and senescence), may be due to its design; temperature-dependent for the BL/NL PFT classes, with model parameters tuned for temperate regions.Forcing the model with MODIS LAI only slightly improved model performance.However, setting the annual maximum LAI for each PFT to be the annual maximum MODIS LAI resulted in improved model performance, without the computational overhead of forcing JULES with daily satellite data.More accurate GPP predictions could be possible with the inclusion of tropical PFTs, such as tropical evergreen broadleaf and tropical deciduous broadleaf, with associated model parameters and a phenology model modified to take these tropical PFTs into account.

Conclusions
We performed a multi-site evaluation of the JULES LSM using local, global and satellite data.In general, we found that when using local meteorological and parameter data sets, JULES performed very well (Fig. 2a and Table 4) at temperate sites with a negative bias observed at the tropical and cropland sites.At a limited number of sites, the model was able to simulate interannual variability using local data, with the exception of the tropical site, Santarem Km67, and Tumbarumba.
The use of global data worsens model performance by introducing negative biases into model simulations of GPP at the majority of sites, with the exception of the tropical sites.The improvement in model simulated GPP when using local values of vegetation properties implies that global values may be incorrect.At sites where model performance improved using global data, this was due to biases associated with the meteorological data.We observed that the meteorological data had a greater impact on modelled GPP fluxes than model parameters.
The use of meteorological data extracted from global meteorological data sets was used to drive JULES.We found that global meteorological data increased the negative biases of local model simulations at all sites with the exception of the tropical sites, where GPP predictions were improved.Of the two global meteorological data sets used in this study, the WFDEI data set more closely captures the local meteorological conditions, though we found that the PRINCETON data set results in improved performance at some of the sites due to positive biases associated with the downward radiation fluxes and surface air temperature.This implies that there are compensating errors within the model, which need to be identified and addressed.
LAI is an important parameter used in the calculation of canopy photosynthesis.Model simulations using local and MODIS data displayed improvements in modelled GPP compared to using only local data.Improvements in modelled GPP were observed at the beginning and ending of the growing season.Using MODIS data for the annual maximum LAI allows for improved model performance without the complication of assimilating daily satellite data into the model.We found the default phenology module allowed JULES to perform very well at temperate sites, but not at the tropical sites.More realistic simulation of the seasonal cycle of GPP was observed at sites with low LAI values, such as grasslands.Even though we have described the MODIS data as being noisy at a number of sites, it provides a valuable source of information as it is a high spatial and temporal resolution data set.It allows a better understanding of plant response to climate and is a useful aid to modellers.
Although only a limited number of model parameters were modified at the 12 flux tower sites, due to limited data availability at FLUXNET sites, we showed that with more accurate information regarding flux tower sites, improved predictions of GPP are possible.However, negative biases still exist in this situation due to model error and incorrect modelling of tropical processes.We suggest that improved model performance with regards to the terrestrial carbon cycle could be achieved with the introduction of more PFT classes, such as tropical evergreen broadleaf and tropical deciduous broadleaf, and their associated model parameters and a phenology model that can properly simulate carbon fluxes in both temperate and tropical regions.

Figure 1 .
Figure 1.Seasonal cycle of model-predicted (local-F, global-WEIG, global-WEIC and global-P in Table3) and observed GPP fluxes, smoothed with a 7-day moving average window, at the 12 FLUXNET sites (HF: Harvard Forest; VA: Vaira Ranch; MM: Morgan Monroe; HY: Hyytiala; TH: Tharandt; TUM: Tumbarumba: ES: El Saler; FP: Fort Peck; KA: Kaamanen; S67: Santarem Km67; S83: Santarem Km83; BO: Bondville).Model simulation years are given in Table2.The thick lines refer to FLUXNET observations (blue) and simulated GPP from local-F model simulations (red).Annual averages for model simulations and observations are plotted as thick dots on the right of each plot in the same colours.

Figure 2 .
Figure 2. Comparison of modelled and observed GPP using bias and RMSE at the 12 FLUXNET sites (HF: Harvard Forest; VA: Vaira Ranch; MM: Morgan Monroe; HY: Hyytiala; TH: Tharandt; TUM: Tumbarumba; ES: El Saler; FP: Fort Peck; KA: Kaamanen; S67: Santarem Km67; S83: Santarem Km83; BO: Bondville) for three sets of model simulations: (a) local-F; (b) global-WEIG; and (f) local-WEIG (Table 3).(c) displays the differences between bias and RMSE for global-WEIG and local-F model simulations; (d) differences between local-WEIG and local-F model simulations; and (e) differences between global-WEIG and local-WEIG model simulations.Marked on (c), (d) and (e) next to the figure letter are how the sets of model simulations differ.The site labels are coloured according to their climate zone (Table2).The dashed lines on (a) show the regions defined by the qualitative terms (Table4) used to describe model performance.

Figure 3 .
Figure 3. Multi-year comparison of modelled and observed GPP using bias and RMSE at six FLUXNET sites (HF: Harvard Forest; VA: Vaira Ranch; MM: Morgan Monroe; HY: Hyytiala; TUM: Tumbarumba; S67: Santarem Km67) for model simulations using local parameter and meteorological data (local-F).The site labels are coloured according to their climate zone (Table2) and represent data from model simulations performed for the year specified in Table2, with results from other years plotted using the model simulation year and labels coloured the same as the original site label.

Figure 6 .
Figure 6.Comparison of (a) global, MODIS (site annual maximum) and local leaf area index (LAI) and (b) global and local maximum rate of Rubisco carboxylase activity (V cmax ) at the 12 FLUXNET sites (HF: Harvard Forest; VA: Vaira Ranch; MM: Morgan Monroe; HY: Hyytiala; TH: Tharandt; TUM: Tumbarumba; ES: El Saler; FP: Fort Peck; KA: Kaamanen; S67: Santarem Km67; S83: Santarem Km83; BO: Bondville).The LAI data displayed for each study site refer to the annual maximum LAI of the dominant PFT.The site labels are coloured according to their climate zone (Table 2) and in (a), the lighter shades are the MODIS data.The dashed grey lines represent LAI and V cmax , where global, MODIS and local values match, with overestimated global and MODIS values above the dashed line and underestimated values below it.

D.
Slevin et al.: Multi-site evaluation of the JULES land surface model using global and local data 4.3 Are improvements in simulated GPP observed when forcing JULES with daily satellite phenology compared to using the default phenology module?

Table 1 .
Model parameters and meteorological variables that are altered between global and local model simulations.

Table 3 .
Types of model simulations performed in this study.

Slevin et al.: Multi-site evaluation of the JULES land surface model using global and local dataTable 4 .
Definition of qualitative terms used to describe JULES' ability to simulate GPP when compared to observed FLUXNET GPP.Both RMSE and Bias have units of g C m −2 day −1 .Starting at "Very well", the term associated with the first condition satisfied is used to describe model performance.

Table 5 .
Vegetation (PFT) and non-vegetation land cover type (BL: broadleaf tree, NL: needleleaf tree, C3g: C3 grass, C4g: C4 grass, sh: shrubs, bs: bare soil) fractions at the 12 FLUXNET sites.For each site, the first row refers to global data and the second refers to local.

Table 6 .
Local and global biophysical parameters (site annual maximum LAI, canopy height and V cmax ) at the 12 FLUXNET sites.For each site, the first row refers to global data, the second refers to local and the third refers to satellite.Online data was accessed in April 2013.