Evaluation of a Global Vegetation Model using time series of satellite vegetation indices

Atmospheric CO2 drives most of the greenhouse effect increase. One major uncertainty on the future rate of increase of CO2 in the atmosphere is the impact of the anticipated climate change on the vegetation. Dynamic Global Vegetation Models (DGVM) are used to address this question. ORCHIDEE is such a DGVM that has proven useful for climate change studies. However, there is no objective and methodological way to accurately assess each new available version on the global scale. In this paper, we submit a methodological evaluation of ORCHIDEE by correlating satellite-derived Vegetation Index time series against those of the modeled Fraction of absorbed Photosynthetically Active Radiation (FPAR). A perfect correlation between the two is not expected, however an improvement of the model should lead to an increase of the overall performance. We detail two case studies in which model improvements are demonstrated, using our methodology. In the first one, a new phenology version in ORCHIDEE is shown to bring a significant impact on the simulated annual cycles, in particular for C3 Grasses and C3 Crops. In the second case study, we compare the simulations when using two different weather fields to drive ORCHIDEE. The ERA-Interim forcing leads to a better description of the FPAR interannual anomalies than the simulation forced by a mixed CRU-NCEP dataset. This work shows that long time series of satellite observations, despite their uncertainties, can identify weaknesses in global vegetation models, a necessary first step to improving them. Correspondence to: F. Maignan (maignan@lsce.ipsl.fr)


Introduction
Dynamic Global Vegetation Models (DGVM) quantify energy and mass fluxes between the surface and the atmosphere.They are either used as a component within meteorological forecasting schemes and climate models or they are used in a stand-alone mode, forced by weather and climate fields, to help to quantify and understand the variability of carbon, water or nutrients fluxes and pools.One major uncertainty for the future rate of increase of CO 2 in the atmosphere is the impact of the anticipated climate change on the vegetation (IPCC, 2007).An increase of CO 2 is expected to have beneficiary impacts on the vegetation photosynthesis and growth (leading to a net sink of carbon).However, changes in temperature, radiation and precipitation, as a result of CO 2 induced climate change, can have either positive or negative impacts on the carbon balance of ecosystems (Ciais et al., 2005).These impacts differ across seasons (Piao et al., 2008) and across regions (Running et al., 2006).As such, there is a need to assess the quality of the vegetation models at a global scale, especially as some of them are used for prominent predictions of the carbon cycleclimate feedbacks in the 21st century using coupled-models (see Cox et al., 2000;Friedlingstein et al., 2006).Recent efforts have been initiated to benchmark vegetation models at different scales (Abramowitz, 2005;Gulden et al., 2008;Cadule et al., 2010) and international projects are emerging such as the Carbon-Land Model intercomparison Project (C-LAMP, Randerson et al., 2009), the International Land-Atmosphere Model Benchmarking project (ILAMB, Blyth et al., 2011;Cadule et al., 2010) and the LandFlux-EVAL project (Seneviratne et al., 2009;Mueller et al., 2011).We F. Maignan et al.: Evaluation of a Global Vegetation Model using time series of satellite vegetation indices focus here on the global scale, using satellite measurements to evaluate model outputs.Even if a model modification is evaluated favorably when comparing model outputs with measurements at sites, it still has to be evaluated with dedicated global tools, such as the one we present here, when applied on a global scale.
The objectives of this paper are twofold.First, we present a robust method for a quantitative evaluation of any DGVM performance using properly calibrated and corrected satellite time series.Second, we apply this method to evaluate the performances of different versions of the ORCHIDEE vegetation model (different phenologies and different climate drivers).
The model and satellite data are described in Sect. 2. Section 3 details the methodology and Sects.4 and 5 present the results.Discussion and conclusions are given in Sect.6.

Model and data
2.1 The ORCHIDEE model ORCHIDEE (Krinner et al., 2005) is a DGVM developed at the Institut Pierre Simon Laplace (IPSL).It models carbon, water and energy fluxes as well as the dynamics of biomass, soil carbon and soil water pools.It has been used at the site level (Jung et al., 2007a) and on a global scale (Krinner et al., 2005;Piao et al., 2008).The current version (1.9.5) is available on request to the authors.
ORCHIDEE models the dynamics of 13 different Plant Functional Types (PFT) (Prentice et al., 1992).Each PFT contains different plant species which grow under similar soil properties and climatic factors (temperature and precipitation), and which share the same physiological processes such as cold tolerance, or water needs (Cramer, 2002).The dynamics of each PFT are controlled by a common set of equations but with different parameters.An exception is the phenology, for which each PFT is assigned a specific set of equations as described in Botta et al. (2000).The ORCHIDEE model can either use a fixed distribution of PFTs (with fractional coverage specified for each model grid) or it can calculate the PFT distribution dynamically, according to local climate and competition between different PFTs.As we are evaluating the model over a short recent period, we chose to use a fixed vegetation distribution, derived from recent observations, rather than a dynamic one.The state of the art models of vegetation dynamics are still rather crude and lead to biases in simulated vegetation types as compared to the observations (Krinner et al., 2005;Bonan and Levis, 2006).Table 1 shows the list of the 13 PFTs along with their acronym used in this paper.
The various versions of the ORCHIDEE model are due either to structural changes (i.e.changes in the processes or the controlling parameters) or to different input parameters, such as meteorological forcing fields.An objective methodology is needed for the evaluation of these versions, in order to further improve carbon fluxes estimates on a global scale.The primary carbon flux calculated by ORCHIDEE is the Gross Primary Productivity (GPP).In a study by Jung et al. (2007b), it was shown that the modeling of GPP was primarily sensitive to differences between models rather than to differences in meteorological forcing.We assume that these conclusions should also hold for the Leaf Area Index (LAI), as it is closely related to the GPP.We will therefore focus on this ORCHIDEE key-variable as it can be compared to actual satellite data.In this paper, we will thus evaluate the impact of structural changes in the phenology model on the modeled LAI, and the impact of two different global meteorological forcing fields.

The phenology models
ORCHIDEE implements a prognostic leaf cycle through climate driven leaf onset models, and leaf senescence processes governing turnover rates.The climate driven leaf onset phenological models in ORCHIDEE are mainly derived from the work of Botta et al. (2000).The authors used AVHRR satellite data to select and calibrate local phenology models (Chuine, 2000) for the global scale, for approximately ten biomes.In ORCHIDEE, the Summer-green PFTs leaf onset is driven by air temperature only.The two Broad-leaved Summer-green PFTs use a classical Growing-Degree-Day (GDD) model, where the GDD cut-off value depends on the Number of Chilling Days (NCD), following a decreasing exponential relationship.The Boreal Needleleaf Sumer-green PFT (mainly Larix decidua) model uses a simple Number of Growing Days (NGD) threshold model.The Tropical Broad-leaved Raingreen PFT leaf onset is driven only by moisture availability while the Grass and Crop PFTs use a model mixing a GDD threshold and moisture availability criteria.All constants appearing in these models are PFT-dependant.The four Evergreen PFTs have no associated onset models.
The leaf senescence is driven by two different processes: the first one depends on climatic conditions and the second one depends on a critical leaf life span.In the climate driven senescence models, the process for all Summer-green PFTs is based on conditions related to temperature decrease and the one for the Tropical Broad-leaved Raingreen PFT is based on a lesser moisture availability condition.For all Tree PFTs, when senescence is declared a predefined turnover rate is set.The senescence model for the Grass and Crop PFTs is a mixed one, with temperature and moisture availability conditions governing a climate dependent turnover rate.In the leaf age related senescence models, a mean leaf life span is set for each PFT, and the turnover rate of leaves increases sharply where mean leaf age reaches the leaf life span.Leaf senescence of Evergreen PFTs is only driven by leaf age.
In the current version, two drawbacks have been identified by users and the new phenology scheme is based on the corrections proposed below for these drawbacks.The first drawback is related to the choice of the reference date at which the GDD calculation begins.This date is variable and corresponds to the end of the former growing cycle.Although functionally realistic, this system generates chaotic behavior (due to non-linear onset and senescence phenological models using thresholds) and leads, for Grasses and Crops, to a seemingly erratic tendency of "grow and decay".This results in very irregular LAI time series, with unrealistically large interannual variations compared to the observations.On the other hand, a fixed date cannot be chosen as ORCHIDEE is designed to run for a wide range of time periods and climates, including paleo-climates.For these reasons, the new phenology scheme uses the winter solstice, calculated for each orbital condition, as the reference date for GDD calculations.Fixing the date for the start of GDD calculations for any given simulation leads to a more stable phenological cycle.
The second drawback in the current ORCHIDEE version is that crops share the same leaf onset and senescence climate driven phenology models as grasses.This is quite unrealistic, even with different PFT parameters, as the crop growing season length is generally much shorter than that of grasses because the harvest abruptly terminates the seasonal cycle.A complex approach for crop representation was developed by coupling ORCHIDEE to the STICS agronomic model (Gervois et al., 2008), but this option is not yet included in the current ORCHIDEE version.In parallel, a simpler approach has been adopted in the new phenology version, which now includes a specific climate driven senescence model for crops, based on the work done in Bondeau et al. (2007) for the DGVM LPJ managed Land (LPJmL).This crops senescence model is simply based on a GDD threshold.At present ORCHIDEE is able to simulate only one C3 Crop type and only one C4 Crop type.We thus selected, in the new phenology scheme, parameter values corresponding to winter wheat for the C3 Crop PFT and parameter values corresponding to maize for the C4 Crop PFT.

The meteorological forcings
The ORCHIDEE model generally uses 6-hourly meteorological fields of the following variables: surface pressure, 2-m air temperature, 2-m specific humidity, rainfall and snowfall precipitation, surface wind, downward solar and longwave radiation.We first evaluated the effect of driving ORCHIDEE by the latest ECMWF re-analysis, ERA-Interim (ERA-I), which is currently available from 1989 to the present (Berrisford et al., 2009).The variables are defined on a Gaussian grid with a spatial resolution of approximately 70 km.We also evaluated the effect of a mixed CRU-NCEP meteorological forcing dataset, based on the 6-hourly 2.5 • NCEP/NCAR re-analysis (Kalnay et al., 1996), combined with the CRU TS 2.1 monthly anomalies (Mitchell and Jones, 2005).Precipitation and radiation fields are known to be of a rather poor quality in re-analyses and are better represented in climatologies; moreover re-analyses only cover the second half of the 20th century up to the present.The fusion of climatologies and re-analyses combines the advantage of the two datasets, although there are inconsistencies between the parameters.The re-analyses datasets then need to be corrected using monthly mean fields derived from surface meteorological stations observations (Mitchell and Jones, 2005).The CRU climate dataset is available for the 1901-2002 period.We performed monthly linear regressions between CRU and NCEP during the common period  of the two datasets, and used the results of these regressions to correct the NCEP data over the whole period .It is expected that the resulting fields have reliable mean values and show realistic temporal variations.The mixed dataset is provided at a resolution of 0.5 • (see more details at http://dods.extra.cea.fr/data/p529viov/cruncep/).
For each simulation, the model is first run for several decades until all carbon reservoirs reach their steady state equilibrium, indicated by a close-to-zero decadal-mean value of the Net Ecosystem Productivity (NEP) over each point of the globe.Once equilibrium is reached, the model is run from 1901 to 2008 for a CRU-NCEP simulation, and from 1989 to 2008 for an ERA-Interim simulation.As is discussed below, the MODIS (Moderate Resolution Imaging Spectroradiometer) satellite data are only available after 2000.As such, the period of interest for our research is set from 2000 to 2008.
ORCHIDEE is used at the spatial grid of the meteorological forcings.This requires that the PFT distribution map, as well as the soil map that gives the proportions of silt, sand and clay (Zobler, 1986), are interpolated on the same grid.As a consequence, the impact of the spatial scale must be accounted for and, as such, is analyzed in Sect.5.3.

PFT spatial distribution
As explained above, we chose to use fixed distributions of PFTs.For the simulations that use the ERA-Interim meteorological forcings, the PFT distribution was derived from www.geosci-model-dev.net/4/1103/2011/Geosci.Model Dev., 4, 1103-1114, 2011 high-resolution vegetation maps such as CORINE over Europe (Heymann et al., 1993) and UMd (Hansen et al., 2000) for the rest of the world.The CRU-NCEP simulation that is used for comparison was produced earlier in another context and uses a different PFT distribution, derived from the standard Olson classification at 5 km (Vérant et al., 2004).Differences between the two PFT maps are detailed and the respective fractional coverages for each of the PFTs are illustrated in the Supplement.

Satellite data
For the evaluation of the model simulations, we use products from the MODIS instrument, on board the Terra satellite.MODIS is a key instrument of the NASA Earth Observing System and provides data for atmospheric, oceanic and land surfaces studies.The primary inputs of our processing are the daily surface reflectances, after correction for atmospheric absorption and scattering (Vermote et al., 2002).
We use reduced-resolution data products at the 5 km spatial resolution of the Climate Modeling Grid.The visible (620-670 nm) and near infrared (841-876 nm) reflectances are used to constrain a directional reflectance model, which is then applied to correct the measurements for directional effects (Vermote et al., 2009).This procedure retains the highest temporal resolution (daily, cloud cover permitting) without the noise generated by the day-to-day changes in observation geometry.From the corrected reflectances, the Normalized Difference Vegetation Index (NDVI) is calculated.Based on the irregular variation of the NDVI, Vermote et al. (2009) estimate that individual measurements of the NDVI have a noise of 0.02 or less for most tropical and midlatitudes pixels.This noise is approximately 0.03 for equatorial areas (due to a large cloud cover), some high-latitudes areas (due to snow, clouds and large zenithal angles), and south-east of Asia (due to a large aerosol load).The noise is further reduced by temporal interpolation of the individual measurements (see below).This satellite product is, in principle, available every day; however, due to cloud cover, the actual temporal resolution is decreased.To generate a consistent dataset with a daily resolution, we made a temporal interpolation using a polynomial fit on the 10 observations that are closest in time.The interpolated value is considered invalid if the difference in time between the day of interest and the closest observation is larger than 15 days.

Selecting variables
NDVI satellite data quantify the vegetation cover (Tucker, 1979) and may be logically used to evaluate the modeled Leaf Area index (LAI), which is a key-variable of DGVMs as it is directly related to GPP through the photosynthesis process.However, it is well known that the NDVI tends to saturate for large LAIs (Myneni et al., 1997) because, as leaf cover gets larger, fewer and fewer photons can penetrate and illuminate the lower leaf layers, so that the latter have a very limited impact on the measured reflectance.There is therefore no linear relationship between LAI and NDVI.However, a much more linear relationship exists between the NDVI and the Fraction of absorbed Photosynthetically Active Radiation (FPAR) (Knyazikhin et al., 1998).To a first approximation, FPAR can be estimated from LAI using a simple Beer's law with a general purpose extinction coefficient value of 0.5 (Monsi and Saeki, 1953): We will therefore compare the measured NDVI and the simulated FPAR time series.Although NDVI and FPAR are related, a number of variables other than FPAR, such as vegetation geometry, soil reflectance, fractional cover, mixture of grass understory with trees, measurement noise, or leaf spectral signatures, also impacts NDVI.Hence, there is no expectation of a robust correspondence between the two variables but we argue that these perturbing factors apply to all versions of the ORCHIDEE model and have a similar impact.Therefore, an improved version of the model should better fit the satellite observations.A quantitative evaluation is therefore obtained through the correlation between the modeled FPAR and satellite observed NDVI time series.As the correlation will be meaningful only for time series with an annual cycle, it has not been calculated if the standard deviations of both time series are lower than a threshold of 0.04, which is larger than the noise of a pixel NDVI time series.

Averaging satellite observations at the model resolutions
The satellite data and the model outputs are provided at different spatial and temporal resolutions.It is therefore necessary to apply some interpolation and averaging.In our analysis, the correlations are computed over the time series at a weekly or monthly scale and at the vegetation model grid scale.Satellite data are provided at a spatial resolution that is much higher than that of the model.It is then straightforward to degrade the resolution of satellite data to that of the model through a simple averaging of the pixels that fall into the model box.As for the temporal averaging, we use the interpolation procedure described above to generate consistent daily time series.A simple averaging is then applied to generate the weekly or monthly datasets.The temporal and spatial averaging is expected to reduce the noise of the NDVI time series even more, by decreasing the effect of random errors present in satellite observations.We also produce interannual anomaly time series of both NDVI and FPAR, by computing a mean annual cycle over the full observation period and removing it from each year of the corresponding time series.The comparison between the modeled FPAR anomaly time series and the satellite NDVI anomaly time series enables us to quantify the model's ability to reproduce interannual variations in the vegetation leaf cycle, in response to climate and weather interannual variability.

Correlation maps
For any grid box, the satellite and model time series are used to compute a correlation coefficient.The spatial distribution of these coefficients is shown in correlation maps, which are informative in identifying areas where the model reproduces the observed variations with various degree of accuracy.We analyze both the correlation maps based on the original time series, and those based on the anomalies.The former are indicative of the model ability to reproduce the seasonal cycle, whereas the latter are representative of the model performance for reproducing interannual anomalies, as a response to anomalous weather patterns.

Median correlation as a simple scoring value
Correlation maps are certainly needed to analyze the model performance.However, a single value for the scoring of a particular simulation is needed if we want to rank different ORCHIDEE versions.To reduce the model performance to a single parameter, several statistics are possible.Although there is very little difference between the median and the mean, we favor the former as it gives less weight to outliers: we thus select the weighted median value of the correlation map (Ratel, 2006), the "weight" being the surface area of each model box, to avoid over-representation of high latitudes in regular grids.

Scoring per PFT
Although a single scoring value is useful for ranking purposes, one may also want to evaluate the model for each PFT independently to get a more precise diagnostic and to see for which PFTs improvements have occurred and for which PFTs modifications are still needed.This is not an obvious task as most model boxes include a mix of several PFTs.We recall that the FPAR simulations are performed for each of the 12 PFTs independently (not for Bare Soil).We have used the same high-resolution vegetation maps, that were used to derive the PFT distribution for the ERA-Interim simulations, to identify the dominant PFT at the resolution of the satellite dataset (5 km).For each model box, we identified the dominant PFT.The box was not used further when the fraction of this dominating PFT was less than 50 %.We then identified the satellite pixels that were assigned the same PFT as the dominant PFT of the box, and averaged the satellite NDVI of these pixels.The same time series and correlation analysis could then be done as described above in Sects.3.2, 3.3 and 3.4, but using the dominant PFT of the box and averaging only the satellite pixels corresponding to this PFT.This procedure results in a scoring for each of the 12 PFTs.

Correlations at the model box level
An example of the methodology is shown at the model box level in Fig. 1.The plots are generated for a model box located in Botswana (see inset).The top figure shows the observed NDVI and the modeled FPAR for the ERA-I-based simulation with the new phenology version, over the nineyear period.Both time series show a clear annual cycle with vegetation growth towards the end of the year, and senescence around April (see the mean annual cycle).Although the agreement is not striking, there is some correspondence between the two time series.The correlation coefficient, (r), is 0.59.The bottom plot shows the same time series, but after the mean annual cycle has been subtracted.Again, the correspondence is still only moderate, but there is nevertheless a decent correlation, indicating that the model is able to reproduce some of the interannual signal observed by the satellite.

Correlation map
The corresponding map of correlation coefficients is shown in Fig. 2.This particular figure was obtained based on the simulation forced by the ERA-Interim meteorology and the new phenology version of ORCHIDEE.The highest correlations (>0.8) are found over high temperate latitudes, especially Europe and large parts of North America.There are also high correlations over tropical Raingreen Africa and Brazil.Approximately 10 % of the boxes do not exhibit a significant seasonal cycle, neither in the measurements nor in the model results.They are represented in grey in Fig. 2.These boxes are mostly located either over desert or in the tropical evergreen forest areas.Still, low correlations (close to zero) are found over equatorial forests (Amazonia, Central Africa, Indonesia), indicating a real model deficiency in these regions.A closer analysis of these latter regions shows that the satellite NDVI time series exhibits a significant annual cycle, while the modeled FPAR shows either no cycle or a phase-shifted one.Indeed, in Amazonia, a marked annual cycle has been observed over evergreen forests using MODIS Enhanced Vegetation Index (EVI; Huete et al., 2006;Moreau et al., 2010) or MODIS LAI (Myneni et al., 2007).LAI and EVI are preferred in these studies to FPAR and NDVI as they do not saturate over dense vegetation and then have larger amplitude (Huete et al., 2002).Hence, the extension of the grey zone could have been smaller if we have used EVI instead of NDVI.The failure of the ORCHIDEE vegetation model over these regions needs to be further investigated.As an example, the BIOME-BGC model has been shown to underestimate rooting depth in the Amazon, an important parameter because it controls water stress during the dry season, when plants have access to moisture only in deeper soil (Ichii et al., 2007).A similar deficiency in the ORCHIDEE model was recently shown when assimilating FLUXNET data to optimize the model parameters (Verbeeck et al., 2011).Furthermore, seasonal variations in leaf phenology for evergreen tropical forests are currently being introduced in ORCHIDEE (de Weirdt et al., 2010).

Application to the comparison of different simulations
We now discuss the application of the general methodology described in Sect. 3 to two sets of model versions.The first one focuses on a structural modification of the model phenology, while the second concerns the input meteorological fields.

Evaluation of the phenology modeling
Figure 3 shows the surface-weighted normalized histograms of two monthly correlation maps, in blue for the simulation with the current phenology and in red for the simulation with the new phenology modules.Both simulations are driven with the same ERA-Interim meteorology.A histogram shifted to the right is desirable.The comparison of the two histograms clearly indicates that the new modeling of the phenology performs better; the surface-weighted median value of the monthly correlations is 0.67 for the new phenology against 0.57 for the current one.
To better interpret the comparison, it is necessary to additionally identify the regions where the improvement is the most significant.Figure 4 shows the difference between the new phenology correlation map and the current version correlation map.Most of the mid and high latitude regions of the Northern Hemisphere are improved with the new phenology version.On the other hand, a large area in southern China is significantly degraded.This area corresponds to double and triple-cropping cultivation, where the current crop phenology of ORCHIDEE with an ill-defined seasonality is in better agreement with the NDVI observations (Piao et al., 2010).There are consistent patterns, which indicate that the  impact of the structural model modification is not random but rather biome-dependent.
Figure 5 shows the surface-weighted median correlations for each of the 12 PFTs together with their respective surfaces and for the two phenology schemes.Only the boxes where the specific PFT is dominant (>50 % fractional cover) are considered here.Figure 5 confirms that the ability of the model to reproduce the observed annual cycles strongly depends on the PFT.As already explained above, the Tropical Broad-leaved Evergreen PFT shows a poor correlation.The best score on the other hand is found for the Temperate Broad-leaved Summer-green PFT. Figure 5 shows that the improvement for the new phenology version is essentially for the C3 Grass PFT, with a weighted median correlation value of 0.72 rather than 0.48.The C4 Crop PFT also shows an improvement, but it describes a very small surface.There is no improvement for the C3 Crop PFT, which is surprising as some of the phenology modifications were targeted at this specific PFT.
This outcome thus needs to be analyzed more in detail.The correlation histograms for the C3 Crop PFT are shown in Fig. 6.A majority (53 %) of the model boxes show an improved correlation with the observation, but a large number show a significant degradation, with correlations that actually become negative after the modeling change.In the new modeling scheme, the C3 Crops phenology is based on the dynamic of the winter wheat in Europe.Hence it clearly improves results for boxes where this type of crop is dominant (Europe, Great Plains, Manchuria), such as that shown in Fig. 7, top.The current version of ORCHIDEE (blue curve) simulates, for the C3 Crop PFT, a large seasonal cycle with large interannual variations; on the other hand, the new phenology version (red curve) shows a cycle that is well in phase with the NDVI observation (black curve).However, there are a large number of C3 crops other than wheat, including rice, soybean, cotton, barley, each with multiple cultivars and thus multiple phenologies.A single model box containing any given percentage of the C3 Crop PFT may in reality include a mix of different crops with different phenologies.Hence other boxes behave less favorably with the new phenology version, as shown in Fig. 7, bottom, for a box in south-eastern China that clearly exhibits a double cropping NDVI cycle (black curve).The FPAR simulated by the new phenology (red curve) is in phase with the first observed crop cycle but, as only one crop type is simulated, there is no modeled signal corresponding to the second observed crop cycle, thus resulting in an absence of correlation.The FPAR simulated by the current version, similar to that of a grassland, has a large seasonal cycle that is somewhat closer to the succession of multiple crops.However this is not an acceptable solution.Although the changes in the phenology lead to a general improvement, there is a clear necessity to at least enlarge the number of crop types simulated by ORCHIDEE, in order to get realistic seasonal cycles on a global scale.Fig. 7. Time series for two C3 Crops boxes located respectively in the Great Plains of North America (upper plot) and south-east of China (lower plot); NDVI is in black, the FPAR for the current phenology is in blue and the FPAR for the new phenology is in red.The plots give only the signals related to the C3 Crops fraction of the boxes.For the upper plot, the new phenology delivers a C3 Crops FPAR whose phasing with the C3 Crops NDVI has largely been improved (r = 0.85), as compared to the C3 Crops FPAR of the current phenology (r = 0.43).For the lower plot box, the C3 Crops NDVI exhibits a regular double cropping signal, the C3 Crops FPAR delivered by the new phenology is in phase with the first observed crop cycle but, as only one crop type is simulated, there is no modeled signal corresponding to the second observed crop cycle, thus resulting in an absence of correlation (r = 0.03).

Evaluation of the meteorological forcings
We now evaluate the impact of using two different sets of global meteorological inputs.The procedure is similar as to the one above and we use the ORCHIDEE version with the new phenology scheme.When looking at the modeled time series over the same time frame against those of the satellite data, the weighted median values of the correlation are rather similar.The ERA-I forcing leads to a median value of r = 0.67 while CRU-NCEP leads to r = 0.66.The two different meteorology fields thus do not lead to significant global differences in the mean seasonal cycle generated by ORCHIDEE.We then compare the anomaly time series.Reproducing the interannual variations of leaf activity is more difficult than reproducing the mean annual cycle and, as expected, the mean correlations are much lower than those obtained with the original time series.Although the modelmeasurement correspondence is poor, it is nevertheless significantly better (given the large number of grid boxes, more than 20 000) for the ERA-I forcing (r = 0.25) than for the CRU-NCEP forcing (r = 0.19).This result may indicate the better quality of the ERA-I re-analysis interannual signal compared to CRU-NCEP, and the model ability to make use of it.Figure 8 represents the scoring per PFT of the NDVI versus FPAR anomaly time series.As the simulations have two different vegetation maps, we only compare the results where the vegetation maps are coherent.Hence a box is selected in each simulation only if the vegetation fraction of the dominant PFT is higher than 0.5 and if the dominant PFT of the nearest box of the other simulation is the same, and with a vegetation fraction higher than 0.5, even though Jung et al. (2007b) did not identify different land cover maps as a key factor for different outputs.This analysis per PFT confirms that the ERA-I-based simulation reproduces the interannual variability better than the CRU-NCEP simulation for a majority of cases.This holds true for the Tropical Broadleaved Raingreen PFT, for all three Temperate PFTs, and for the two C3 PFTs.There are no common boxes for the two C4 PFTs and the Boreal Broad-leaved Summer-green PFT.At face value CRU-NCEP leads to a better median correlation for the two Evergreen PFTs.The simulations perform similarly for the Boreal Needleleaf Summer-green.
We performed a similar study per season (December, January and February representing Winter in the Northern Hemisphere for example, and the seasons in the Southern Hemisphere with a six-month shift).We found that Spring gives the best scorings for the modeled FPAR anomaly time series, which is expected as the onset is very sensitive to climate modifications, at least in the northern mid-latitudes (Schwartz et al., 2006;Richardson et al., 2010).The best scorings are for the C3 Crop PFT (r = 0.54 for ERA-I) and the Boreal Needleleaf Summer-green (r = 0.37 for CRU-NCEP).There is no significant difference between seasons, so that no specific season drives the better scoring of ERA-I for anomaly time series.

Impact of the spatial resolution
We now study how the satellite NDVI versus modeled FPAR correlation is sensitive to the spatial resolution.For this objective, we used the CRU-NCEP simulation as its regular 0.5 • grid is easier to handle than the irregular ERA-I Gaussian grid, and can be degraded through a simple averaging.Figure 9 shows the median correlations for both the time series and their interannual anomalies for a spatial resolution of 0.5 • to 10 • ; the model-observation correspondence increases as the spatial resolution gets coarser.A simple explanation is that spatial averaging decreases the non systematic errors.Such random errors are certainly present in the satellite data.They may also be present in the model results, in particular due to random errors in the meteorological forcing.The significant improvement in the correlation with coarser resolution puts our earlier result on the better performance of the ERA-I forcing compared to that of CRU-NCEP into question.Indeed, the ERA-I analysis has a resolution of 0.7 • , against 0.5 • for that of CRU-NCEP, and one may therefore question whether the comparison is fair.From the data points shown in Fig. 9, we performed a simple polynomial fit and used the fit to extrapolate the CRU-NCEP results to a hypothetical resolution of 0.7 • (vertical dashed lines in Fig. 9).The interpolated correlations are 0.67 for the time series and 0.21 for the anomalies, which is still statistically significantly lower than the 0.25 value obtained with the ERA-I forcing.
Therefore, although the spatial resolution may explain some of the better results obtained with the ERA-I meteorological fields, it cannot explain them fully, and the data of the latter are most likely of better quality, in particular when considering the interannual variations.

Discussion and conclusions
This work demonstrates a global satellite database can be used to rank several versions of the ORCHIDEE model.Evaluations of DGVMs using satellite data have been published before, but not using the spatial and temporal resolution of the present study.For instance, Krinner et al. (2005) focused on the mean LAI over a 5 yr period at coarse scale (4 • × 2.5 • ).Other studies (e.g.Kim and Wang, 2005;Twine and Kucharik, 2008) focused over North America and did not analyze all biomes.To our knowledge, there is no previously published work that compares the annual cycle and the interannual anomalies of the model outputs and the satellite observations.
Although the analysis presented above uses the correlation as a metric, we also performed similar analysis but using the "relative Root Mean Square Error" (rRMSE), calculated between the NDVI and a scaled FPAR.We reached the same conclusion as with the median correlation, that the proposed new phenology is improved over the current one (with a lower rRMSE of 17 % against 20 %).The ERA-I and CRU-NCEP forcings reached similar scores (respectively 17 % and 17 % for the time series and 132 % and 134 % for the anomaly time series).

Fig. 1 .
Fig. 1.Top plot: Monthly time series of the NDVI (black thick curve) and the FPAR (red thick curve) over the MODIS period, for a model box located in Bostwana (21.0 • S, 22.7 • E, see cyan diamond on the map).The correlation coefficient of the two time series is given on the right under the top plot (r = 0.59).Using the nine years between 2000-2008, a mean annual cycle for both the NDVI and FPAR signals is derived (middle left plot).We then subtract each year these mean annual cycles from their respective time series and get NDVI and FPAR anomaly time series (bottom plot).The correlation coefficient of the two anomaly time series is given on the right under the bottom plot (r = 0.52).

Fig. 2 .
Fig. 2. Correlation map between the NDVI and the FPAR monthly time series for the 2000-2008 period and the ERA-I simulation with the new phenology scheme.The correlation is not calculated when no annual cycle in the NDVI and FPAR is detected (grey areas).

Fig. 3 .
Fig. 3. NDVI/FPAR monthly correlations weighted histograms for the current phenology (blue curve) and for the new phenology (red curve) simulations, both forced by ERA-I fields (binsize = 0.02).The vertical dashed lines indicate the weighted median values.

Fig. 4 .
Fig. 4. Difference between the correlation maps for the new phenology and the current phenology.

Fig. 5 .
Fig. 5.Surface-weighted median values per PFT, for the NDVI/FPAR time series correlations.The blue bars are for the current phenology and the red bars for the new phenology.The black thick line indicates the total surface of the boxes used to assess the PFT median correlation.

Fig. 6 .
Fig. 6.Weighted normalized histograms of the correlations between the partial NDVI and partial FPAR for the PFT C3 Crops, in blue for the current phenology and in red for the new phenology (binsize = 0.02).

Fig. 8 .
Fig. 8. Surface-weighted median values per PFT, for the anomaly NDVI/FPAR time series correlations.The green bars are for CRU-NCEP meteorological forcing and the red bars for ERA-Interim meteorological forcing.The black thick line indicates the total surface of the boxes used to assess the PFT median correlation.

Fig. 9 .
Fig. 9. Surface-weighted median values for the NDVI/FPAR time series (thick black diamonds) and anomaly time series correlations (thick red diamonds), as a function of the spatial resolution.The black and red curves are the results of degree 5 polynomial fit.The dashed lines indicate the extrapolation to the theoretical resolution of 0.7 • using the polynomial fit.

Table 1 .
Vegetation types and their abbreviations as used in ORCHIDEE.