Articles | Volume 14, issue 9
Geosci. Model Dev., 14, 5891–5913, 2021

Methods for assessment of models 29 Sep 2021


Using the International Tree-Ring Data Bank (ITRDB) records as century-long benchmarks for global land-surface models

Jina Jeong1, Jonathan Barichivich2,3, Philippe Peylin2, Vanessa Haverd4,†, Matthew Joseph McGrath2, Nicolas Vuichard2, Michael Neil Evans5, Flurin Babst6,7,8, and Sebastiaan Luyssaert1
  • 1Department of Ecological Sciences, VU University, 1081HV Amsterdam, the Netherlands
  • 2Laboratoire des Sciences du Climat et de l’Environnement, IPSL, CNRS/CEA/UVSQ, 91191 Gif-sur-Yvette, France
  • 3Instituto de Geografía, Pontificia Universidad Católica de Valparaíso, Valparaíso, Brasil 2950, Chile
  • 4CSIRO Oceans and Atmosphere, Canberra, ACT 2601, Australia
  • 5Department of Geology & ESSIC, University of Maryland, MD 20742-4211, USA
  • 6Dendro Sciences Group, Swiss Federal Research Institute WSL, 8903 Birmensdorf, Switzerland
  • 7School of Natural Resources and the Environment, University of Arizona, Tucson, AZ 85721, USA
  • 8Laboratory of Tree-Ring Research, University of Arizona, Tucson, AZ 85721, USA
  • †deceased, 29 January 2021

Correspondence: Jina Jeong (


The search for a long-term benchmark for land-surface models (LSMs) has brought tree-ring data to the attention of the land-surface modelling community, as tree-ring data have recorded growth well before human-induced environmental changes became important. We propose and evaluate an improved conceptual framework of when and how tree-ring data may, despite their sampling biases, be used as century-long hindcasting targets for evaluating LSMs. Four complementary benchmarks – size-related diameter growth, diameter increment of mature trees, diameter increment of young trees, and the response of tree growth to extreme events – were simulated using the ORCHIDEE version r5698 LSM and were verified against observations from 11 sites in the independent, unbiased European biomass network datasets. The potential for big-tree selection bias in the International Tree-Ring Data Bank (ITRDB) was investigated by subsampling the 11 sites from the European biomass network. We find that in about 95 % of the test cases, using ITRDB data would result in the same conclusions as using the European biomass network when the LSM is benchmarked against the annual radial growth during extreme climate years. The ITRDB data can be used with 70 % confidence when benchmarked against the annual radial growth of mature trees or the size-related trend in annual radial growth. Care should be taken when using the ITRDB data to benchmark the annual radial growth of young trees, as only 50 % of the test cases were consistent with the results from the European biomass network. The proposed maximum tree diameter and annual growth increment benchmarks may enable the use of ITRDB data for large-scale validation of the LSM-simulated response of forest ecosystems to the transition from pre-industrial to present-day environmental conditions over the past century.
The results also suggest ways in which tree-ring width observations may be collected and/or reprocessed to provide long-term validation tests for land-surface models.

1 Introduction

Earth system models integrate numerical sub-models of atmospheric circulation, ocean dynamics and biogeochemistry, sea ice dynamics, and biophysical and biogeochemical processes at the land surface. Climate projections made by Earth system models have been a cornerstone of all Assessment Reports of the Intergovernmental Panel on Climate Change (IPCC, 2013) and, as such, have made a tremendous impact on global environmental policy (Paris Agreements, 2015). The credibility of projections of the future climate from any Earth system model in part relies on the ability of each of its above-mentioned four sub-models to accurately reproduce the past (McGuffie and Henderson‐Sellers, 2005). Although long-term changes that date back to pre-industrial conditions (Luo et al., 2012) have been documented for vegetation distribution through pollen-based reconstructions (Cao et al., 2019), land-surface models (LSMs) currently lack a long-term benchmark for forest ecosystem functioning. The absence of long-term benchmarks is thought to contribute substantially to uncertainties in simulated future global carbon stocks in soil and vegetation (Friedlingstein et al., 2006, 2014) and, as such, to climate projections (Fig. S1a).

Tree-ring records provide annual information on historical tree growth and physiology in relation to environmental conditions, including during the time before human activities started to affect the atmospheric carbon dioxide (CO2) concentration (Fritts, 2012; Hemming et al., 2001). Even though trees grown in the absence of a clear annual rhythm of vegetative and dormant seasons may not develop distinct tree rings, as observed for many species from the humid tropics, tree-ring records have been proposed as a large-scale and long-term benchmark for the land-surface component of Earth system models (Fig. S1b; see Sect. 5.3 for more details) (Babst et al., 2014a, b, 2017, 2018; Zuidema et al., 2018).

Until now, tree-ring records have often been collected to reconstruct past climate and hydrological variability from sites where trees grow near the colder or drier fringes of their distribution (Briffa et al., 2004; D'Arrigo et al., 2008). The most comprehensive archive of publicly shared tree-ring data is the International Tree-Ring Data Bank (ITRDB), with more than 4000 locations from 226 species across most forested biomes (Fig. S2) (Grissino-Mayer and Fritts, 1997; Zhao et al., 2019). However, a shortage of site metadata and the prevailing geographical, species, and tree selection sampling biases resulting from targeting climate-sensitive trees have limited the use of the ITRDB archive to infer long-term changes in forest growth (Bowman et al., 2013; Briffa and Melvin, 2011; Klesse et al., 2018; Zhao et al., 2019). Compared with tree-ring records that were collected for the purpose of benchmarking LSMs, such as the European tree-ring network of biomass plots (hereafter called “European biomass network”; Klesse et al., 2018) that is available through the database of the BACI project (BACI, 2020), the aforementioned issues may limit the information content of the ITRDB records. This incomplete information content should, however, be balanced against the associated benefits in terms of time gain and resource savings when reusing the large ITRDB dataset.

If tree rings are to be used as benchmarks for LSMs, the models must demonstrate skilful simulations of tree-ring width (TRW). Over the past few decades, the major physiological and ecological processes that are responsible for annual tree-ring growth have become sufficiently well understood to be formalized in mathematical models with different levels of detail. The first TRW models (Wilson and Howard, 1968) described processes at the cell level: cell division, cell enlargement, and cell wall thickening. Later, the carbon and water balance of trees was added (Fritts et al., 1999) as well as a parameterization of the influences of climate on cambial activity (Vaganov et al., 2006). These models were capable of reproducing short-term radial growth at the tree level. Further developments introduced a notion of turgor and hormone regulation for cell growth (Drew et al., 2010; Hölttä et al., 2006; Leuzinger et al., 2013; De Schepper and Steppe, 2010; Steppe et al., 2006).

At the same time, the spatial scale of models simulating wood formation based on cell dynamics was extended to the stand level by simplifying the representation of processes. In these models, photosynthate availability, air temperature, and soil water content were used to constrain wood cell growth, and the models successfully reproduced observations (Deleuze and Houllier, 1998; Hayat et al., 2017; Wilkinson et al., 2015). Further simplifications were proposed by simulating the radial growth of trees based solely on carbon allocation (Deleuze et al., 2004; Merganičová et al., 2019) rather than cell dynamics, with the latter being computationally too expensive for large-scale vegetation models (Li et al., 2014; Misson, 2004; Sato et al., 2007). Hence, a variety of approaches are now available to describe TRW growth in forest models, dynamic vegetation models, and LSMs, but to the best of our knowledge, there is currently no land-surface component of any Earth system model with such capability.

This study articulates an improved conceptual framework for benchmarking simulated radial growth against ITRDB tree-ring data, addressing limitations in the models, the data, and the methods to compare models and data. The aims are to (1) use current understanding of tree-ring growth to derive the minimal requirements for benchmarking LSMs against tree-ring records archived in the ITRDB, (2) review potential issues of using the ITRDB to benchmark LSMs, (3) propose solutions for a meaningful comparison of LSMs against ITRDB records, and (4) verify the proposed solutions by benchmarking an LSM using data from a European biomass network (BACI, 2020) that is not prone to sampling biases related to dendroclimatic research. The organization of this paper follows these aims, and dependencies between these aims and the workflow of this study are detailed in Fig. 1.

Figure 1. Workflow of this study and the dependencies between the different sections of the paper. Each colour represents a different aim of the study. The arrow shows that the outcomes of the first three aims have to be combined to verify the proposed benchmarks. In this study, “virtual tree” refers to a tree that does not exist in the data but is constructed from the data in varying ways. Section 2.3 provides detailed descriptions of the different virtual trees used.


Figure 2. Main drivers of the linear aggregate conceptual model of tree-ring growth (Cook and Kairiukstis, 1990) and the equivalent processes in land-surface models. The dotted lines connect related components. Note that both the aggregation and the land-surface model come with errors, uncertainties, and unaccounted-for processes that are not explicitly modelled.


2 Background: model requirements, data limitations, and benchmarks

2.1 Minimal requirements for land-surface models to mechanistically simulate TRW

The conceptual linear aggregate model of tree growth (Cook and Kairiukstis, 1990) considers that the observed TRW at year t (in mm) consists of five additive growth contributions (Fig. 2, left column) and, as such, provides a framework for simulating tree-ring widths with (semi-)mechanistic model approaches (Fig. 2):

  • i.

    Size-dependent growth is the dominant signal in raw tree-ring measurements (Cook et al., 1995). Conceptually, an almost constant volume of wood, resulting from more or less constant primary production (Hirata et al., 2007), is added to the trunks year after year (Nash, 2011). The annual diameter increment of the trees decreases as the trunk grows wider because a given wood volume has to be distributed over an increasing surface area as both the circumference and height of the stem are increasing. In reality, however, self-thinning tends to reduce stand density and competition for resources. Thus, the remaining trees can increase their crown volume and their primary production (Oliver and Larson, 1996), which largely compensates for the size-dependent decrease in TRW and contributes to the observed almost constant TRW of tall trees. Several of the common allocation schemes used in LSMs account for size-dependent growth and stand self-thinning (Franklin et al., 2012; Wolf et al., 2011).

  • ii.

    Climate-dependent growth reflects the sensitivity of tree growth to radiation, temperature, phenology, and water availability (Fritts, 2012) and is accounted for in LSMs, as it represents the core purpose of this type of model. LSMs often rely on the Farquhar model for the radiation and temperature dependency of photosynthesis (Farquhar, 1989), and on the McCree–de Wit–Penning de Vries–Thornley approach for the temperature dependence of respiration (Amthor, 2000). They account for a decoupling of photosynthesis and growth by the use of a labile carbon pool (Friend et al., 2019; Naudts et al., 2015; Zaehle and Friend, 2010). Plant water availability is represented through either simple transfer functions or, more recently, by accounting for the hydraulic architecture of the simulated trees (Bonan et al., 2014; Naudts et al., 2015).

  • iii.

    Endogenous disturbances refer to within-stand resource competition and are being increasingly simulated in LSMs, albeit often by empirical approaches (Haverd et al., 2013; Moorcroft et al., 2001; Naudts et al., 2015). From a benchmarking point of view, simulating individuals of different size or cohorts within a single forest is essential to reproduce the sampling biases present in the ITRDB (see Sect. 2.2 and 2.3).

  • iv.

    Chronic exogenous disturbances such as increasing atmospheric CO2 concentration (LaMarche et al., 1984) and nitrogen (N) deposition (Magnani et al., 2007) are well represented in LSMs, as simulating their effects is among the main purposes of these models. The effect of CO2 fertilization on photosynthesis is accounted for in the photosynthetic sub-model, whereas nitrogen dynamics are accounted for through static or dynamic stoichiometric approaches (Vuichard et al., 2019; Zaehle and Friend, 2010). Although abrupt disturbances such as fires, pests, and storms are increasingly being simulated by LSMs (Chen et al., 2018; Yue et al., 2014) and leave marks in TRW (Bowman et al., 2009; Bräuning et al., 2016), such as fire scars and missing rings, they are of limited use for benchmarking against TRW data. The timing of such events largely depends on the simulated diagnostics, for example, fuel wood build-up, insect population dynamics, and soil moisture, which could strongly deviate from the observed timing in decadal to century-long simulation periods. Thus, simulated stand demographics should be the basis for benchmarking against observations rather than secular changes such as the infrequent disturbances described above.

  • v.

    The final term in the aggregate tree-growth model constitutes all processes and interactions between processes not previously accounted for in the LSM, and will make up the model error.
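The five contributions listed above can be summarized in a single expression. The symbols below are illustrative notation introduced here, not the notation of the original aggregate model, and the terms follow the order of items i to v:

```latex
% Illustrative notation (an assumption, not taken from the source):
%   G^{size}_t : size-dependent growth (item i)
%   G^{clim}_t : climate-dependent growth (item ii)
%   D^{endo}_t : endogenous disturbances, i.e. within-stand competition (item iii)
%   D^{exo}_t  : exogenous disturbances such as CO2 and N deposition (item iv)
%   \varepsilon_t : unexplained remainder, i.e. the model error (item v)
\mathrm{TRW}_t = G^{\mathrm{size}}_t + G^{\mathrm{clim}}_t + D^{\mathrm{endo}}_t + D^{\mathrm{exo}}_t + \varepsilon_t
```

Written this way, the additivity assumption is explicit: each driver contributes its own increment to the ring width of year t, and whatever the LSM does not represent ends up in the last term.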

This aggregate tree-growth model provides the conceptual basis for tree-ring standardization and climate signal extraction methods used in dendrochronology (Briffa and Melvin, 2011; Cook and Kairiukstis, 1990). These methods rely on the assumptions that the sampled trees capture the common growth variability of the stand (e.g. growth responses to climate variability and resource competition) and that the contribution of each major driver can be statistically identified as either signal or noise. Alternative approaches based on Liebig's law of the minimum (Stine, 2019) have been proposed to attribute TRW to its major drivers. In practice, observed TRW records cannot always be fully decomposed in the absence of metadata because several drivers might not leave a unique fingerprint in growth. However, size-dependent growth and climate sensitivity have been observed to comprise the primary contributions of variance in the linear aggregate model (Hughes et al., 2011).

In addition to accurate process representation, LSMs will need to be driven by historical climate, atmospheric CO2 concentrations, and N deposition. In general, commonly used century-long climate reanalyses such as NCEP (Kalnay et al., 1996), 20CR (Compo et al., 2011), and CERA-20C (Laloyaux et al., 2018) are based on the assimilation of instrumental observations in climate simulations and are, thus, independent of climate estimates derived from tree rings or other proxy data. Nevertheless, random and systematic errors in the reanalyses increase as data availability decreases, particularly in remote areas with a low density and temporal depth of meteorological stations. Given that local climate effects may have contributed to the TRW, it might be desirable to correct the bias in reanalysis with present-day site-specific climate observations where they exist (Ols et al., 2018). When LSMs are forced by actual climate observations, reproducing the observed climate sensitivity in tree rings would add credibility to the land-surface simulation – if the forcing data and the LSM and TRW models are all realistic and unbiased.

Given the above, LSMs that intend to use TRWs as a benchmark should at a minimum simulate the following: (1) size-dependent growth, (2) dynamic plant phenology, (3) differently sized trees within a stand, and (4) responses to chronic exogenous environmental changes (Fig. 2). Whereas responses to chronic exogenous environmental changes are the reason LSMs exist and are therefore to some extent accounted for by all current LSMs, size-dependent growth and size differentiation within a stand are, at present, only accounted for in a few LSMs, for example, CLM (ED) (Fisher et al., 2015), ORCHIDEE (Naudts et al., 2015), and LPJ-GUESS (Smith, 2001). The ORCHIDEE model (revision 5698) meets the aforementioned minimum requirements and, therefore, will be used in this study.

2.2 Challenges of using ITRDB data as a long-term benchmark

A typical record in the ITRDB consists of TRW measurements of incremental cores from tens of individual trees from the same site and species. Each record may have different starting and ending dates and, thus, a different length (Fig. 3). If a core reaches the pith of the trunk, the annual tree diameter can be reconstructed (Bakker, 2005); however, even then, diameter reconstruction may come with some uncertainty because trunks are not perfectly round. If the core does not contain the pith, which is often the case for large trees, the lack of information about the rings near the pith adds uncertainty to the diameter and age reconstruction (Briffa and Melvin, 2011). In this case, the diameter increment could still be reconstructed (by subtracting the measured TRW from the diameter of the tree) if the trunk diameter at the time of sampling is known, but these metadata are rarely recorded in dendroclimatic collections and are not stored in the ITRDB.
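The two reconstruction routes described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the trunk is treated as perfectly circular (each ring adds twice its width to the diameter), bark thickness is ignored, and the function names `diameter_from_pith` and `diameter_from_final` are hypothetical helpers, not part of any ITRDB tool.

```python
import numpy as np


def diameter_from_pith(trw_mm):
    """Diameter (mm) at the end of each year for a core that reaches the pith.

    Assumes a circular trunk, so each ring adds twice its width to the diameter.
    """
    return 2.0 * np.cumsum(np.asarray(trw_mm, dtype=float))


def diameter_from_final(trw_mm, final_diameter_mm):
    """Diameter (mm) at the end of each year when the pith is missing but the
    trunk diameter at sampling time is known (bark thickness ignored)."""
    trw = np.asarray(trw_mm, dtype=float)
    # Radial growth added after the end of each year = sum of all later rings.
    later_rings = np.cumsum(trw[::-1])[::-1] - trw
    # Walk backwards from the known final diameter.
    return final_diameter_mm - 2.0 * later_rings


# Example: three rings of 2, 1 and 1 mm.
print(diameter_from_pith([2.0, 1.0, 1.0]))
print(diameter_from_final([2.0, 1.0, 1.0], final_diameter_mm=100.0))
```

The second route shows why the missing metadata matter: without the diameter at sampling, the backward reconstruction has no starting point.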

Figure 3. Example of a typical data record in the ITRDB dataset. Each dataset in ITRDB consists of incremental cores from multiple trees (tens to several hundred, depending on the dataset) with varying ages and growth rates. In this figure, observations are from a pine forest in Germany, archived as germ214 (Neuwirth et al., 2007) (Table S3). The diameter of individual trees was reconstructed by summing the annual tree-ring widths from the dataset. Note the presence of both fast-growing relatively young trees (dark grey lines) and slow-growing older trees (light grey lines).


The predominant sampling design in the ITRDB targets the trees that are presumed to be the oldest; these trees should give the longest time series and are, therefore, the most useful to reconstruct the climate variability prior to instrumental records. Thus, the ITRDB is likely to over-represent large trees (Brienen et al., 2012; Nehrbass-Ahles et al., 2014) relative to the population demographics at the time of sampling. This big-tree selection bias makes the ITRDB unsuitable to upscale the growth of individual trees to larger spatial domains, i.e. the stand, forest, or region (Babst et al., 2014a; Nehrbass-Ahles et al., 2014), but does not affect the value of the ITRDB archive for documenting individual tree growth as long as tree size and dominance effects are explicitly considered.

Tree-ring datasets often contain cores of individuals from different cohorts. The presence of slow- and fast-growing trees within the same cohort (illustrated by the grey lines in Fig. 3) is another source of bias (Melvin, 2004; Zuidema et al., 2011). Slow-growing trees tend to live longer than fast-growing trees in the same cohort (Mencuccini et al., 2005; Schulman, 1954). Owing to survivorship being biased towards slow-growing individuals (Bowman et al., 2013), TRW records are likely to underestimate the mean tree growth of a stand in long-past centuries, as fast-growing trees would have died off before the samples were taken (Brienen et al., 2012).

2.3 Solutions for the challenges and virtual trees

Despite its known biases, poorly described sampling protocols, or protocols that were not rigorously enforced, information contained in TRW records of the ITRDB can still be used for benchmarking LSMs.

Without additional data, data–model comparison cannot correct for the big-tree selection bias in the ITRDB; however, this bias can be accommodated through models that simulate multiple tree diameter classes by comparing the largest simulated diameter class with the observed ITRDB tree-ring records (illustrated by the respective bold blue line and black dotted line in Fig. 4a).

Figure 4. Using virtual trees to account for challenges related to the use of ITRDB datasets when evaluating land-surface models. (a) Data–model comparison may overcome the big-tree selection bias by comparing only the simulated largest diameter class (bold blue line) for evaluation rather than all diameter classes (thin blue lines), with the compiled average virtual tree (black dotted line). Grey lines represent individual trees from observations. (b) The observed tree-ring records are a mixture of relatively slow-growing trees (light grey lines) and fast-growing trees (dark grey lines). Fast-growing trees do not attain the same age as slow-growing trees because they tend to die earlier. Using all trees without further consideration in the calculation of the average virtual tree (black dotted line) would lead to an underestimation of tree growth at the time of stand establishment, resulting in a flawed comparison with the simulated tree growth (blue lines). (c) However, aligning observations by the age of individual trees before compiling the largest virtual tree (black dotted line) results in a very different virtual tree compared with Fig. 4b. The largest virtual tree better reconstructs tree growth during stand establishment, thereby facilitating data–model comparison. Note the change in the x axis label between panels (b) and (c). Observations taken from a pine forest in Germany, archived as germ214 (Neuwirth et al., 2007) (Table S3). Simulations were run with ORCHIDEE r5698.


Likewise, data–model comparison cannot correct biases from slow-growing trees' survivorship, but we propose to enhance the consistency between modelled and observed TRWs by making use of site-specific virtual trees. Virtual trees are created from observations by combining data from different individuals to obtain a time series of TRW with the desired property (details on the desired properties are given below) that reflects stand-level characteristics. By definition, virtual trees are not observed as data sequences in the ITRDB observations but are rather extracted from site-level data. As tree-ring observations for a single site consist of samples from multiple individual trees (individual trees are shown by grey lines in Fig. 4), constructing a single virtual tree for a given site facilitates data–model comparisons. Because virtual trees are dependent on the chosen desired property, multiple data–model comparisons are possible for each site. In Sect. 2.4, we propose four different benchmarks based on the ITRDB data, which make use of three virtual-tree target properties:

  • Tree-age-aligned average virtual tree – the average virtual tree of a stand aligned by tree age is calculated as the time series for the average ring width after aligning the age of the individual trees (Fig. 4a). Age-aligned TRWs are widely used to calculate a statistic known as the mean regional curve of the sampled stand (Briffa and Melvin, 2011). This assumes that size-related growth is the dominant driver of tree growth (see Sect. 2.1 and 2.4, point i).

  • Calendar-year-aligned average virtual tree – the average virtual tree of a stand aligned by calendar year is calculated by ordering individual tree-ring series by calendar year (Fig. 4b), and the average observed diameter is calculated for each year. Thus, alignment by calendar year reflects the real temporal evolution of the stand. This virtual tree can be used to better cope with a difference in simulated and observed forest structure by compiling a representative and comparable tree with the simulated tree (see Sect. 2.4, point ii).

  • Tree-age-aligned largest virtual tree – the largest virtual tree of a stand is calculated after aligning individual trees by their age (Fig. 4c). The recommendation to remove the age trend from tree-ring records (Cook et al., 1995) confirms the assumption underlying the alignment by age, i.e. that the size-dependent growth trend exceeds the growth trends due to long-term environmental changes. Subsequently, the age-aligned TRWs can be used to compile a virtual fast-growing tree that has the maximum observed diameter of all trees for a given tree age (note the difference in the weight of the dark grey lines from virtual trees in Fig. 4b and c). Thus, the virtual fast-growing tree gives a better idea of the true mean tree growth in old stands (see Sect. 2.4, point iii).
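The three virtual trees listed above can be sketched with NumPy as follows. This is an illustrative sketch, not the processing code of the study: it assumes that every ring-width series starts at the pith (so the position in the series equals tree age), that diameter is twice the cumulative ring width, and that each site is given as a list of ring-width series plus the calendar year of each first ring. All function names are hypothetical.

```python
import numpy as np


def _stack(series, offsets):
    """Place each ring-width series in a NaN-padded array (tree x time step)."""
    length = max(off + len(s) for s, off in zip(series, offsets))
    out = np.full((len(series), length), np.nan)
    for i, (s, off) in enumerate(zip(series, offsets)):
        out[i, off:off + len(s)] = s
    return out


def _diameters(widths):
    """Diameter = 2 x cumulative ring width; NaN outside each tree's record."""
    diam = 2.0 * np.nancumsum(widths, axis=1)
    diam[np.isnan(widths)] = np.nan
    return diam


def age_aligned_average(series):
    """Average virtual tree: mean ring width per tree age."""
    return np.nanmean(_stack(series, [0] * len(series)), axis=0)


def calendar_aligned_average(series, first_years):
    """Average virtual tree: mean diameter per calendar year."""
    start = min(first_years)
    widths = _stack(series, [y - start for y in first_years])
    years = np.arange(start, start + widths.shape[1])
    return years, np.nanmean(_diameters(widths), axis=0)


def age_aligned_largest(series):
    """Largest virtual tree: maximum observed diameter per tree age."""
    return np.nanmax(_diameters(_stack(series, [0] * len(series))), axis=0)


# Two toy trees: a slow-growing older tree and a fast-growing younger one.
trees = [[1.0, 1.0, 1.0], [2.0, 2.0]]
print(age_aligned_average(trees))
print(calendar_aligned_average(trees, first_years=[2000, 2001]))
print(age_aligned_largest(trees))
```

Because the largest virtual tree takes the maximum over whichever trees reach a given age, it can drop once the fast-growing trees run out of record. This is consistent with restricting the young-tree comparison to the first decades of the series.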

The proposed data–model comparison largely relies on the concept of virtual trees, as these virtual trees can better account for known sampling biases of the ITRDB datasets and different aspects of TRW, and they facilitate the comparison of simulations and observations at the stand level. The proposed definitions and uses of virtual trees, which were partly customized to ORCHIDEE r5698, are evaluated in Sect. 4. Except for LSMs with an individual tree-based stand definition (Sato et al., 2007), benchmarking other models against ITRDB data will also have to consider the use of virtual trees and may have to adjust the proposed definitions to the peculiarities of the LSM under evaluation.

Figure 5. Example of the major steps for calculating the metrics of the benchmark for the size-related trend in diameter increment. The size-related trend in diameter increment can be assessed by calculating a time series for the average ring width after aligning the age of the individual trees (a, b). Observations are shown as grey lines, and the simulation is shown as blue lines. The largest diameter class from the simulation is presented by the bold line. The black dotted lines represent the virtual tree based on the observations. The TRWs of this virtual tree are then subtracted from the simulated TRWs of the largest diameter class (c). Subsequently, a linear regression is used to quantify the temporal trend in the residuals (d). The green line denotes the model residuals, and the green dotted line is the linear regression of the model residuals. Furthermore, the root mean square error (RMSE) between the simulations and observations is calculated (b; RMSE between blue line and black dotted line) and normalized by the length of the time series to calculate the difference in observed and simulated growth trends. Observations and the simulation are from a Pinus sylvestris site in Finland, archived as finl052 (Meriläinen et al., 2004). For this example, the calculated RMSE (b) is 0.39 mm, and the slope of residuals (d) is 0.002 mm yr−1.
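The two metrics named in the Fig. 5 caption, the RMSE and the slope of the residuals, can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function name `residual_metrics` is hypothetical, the inputs are assumed to be equally long series on a common yearly axis, and the normalization of the RMSE by series length mentioned in the caption is left out because its exact form is not specified here.

```python
import numpy as np


def residual_metrics(simulated, observed):
    """Return (rmse, slope) for simulated minus observed values.

    simulated: series for the largest simulated diameter class
    observed:  series for the virtual tree built from observations
    """
    sim = np.asarray(simulated, dtype=float)
    obs = np.asarray(observed, dtype=float)
    residuals = sim - obs
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    # Ordinary least-squares line through the residuals; index = years elapsed.
    years = np.arange(residuals.size)
    slope = float(np.polyfit(years, residuals, 1)[0])
    return rmse, slope


# Toy example: the simulation drifts away from the observations by 1 mm per year.
rmse, slope = residual_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0])
print(rmse, slope)
```

A skilled model in the sense used here would return a slope close to zero together with a small RMSE; either value alone can be misleading.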


Figure 6. Example of the major steps in calculating the metrics of the benchmark for the diameter increment in mature trees. Individual tree records are ordered by calendar year, and the average observed diameter is calculated for each year (a). Observations are shown as grey lines, and the simulation is shown as blue lines. The largest diameter class from the simulation is presented by the bold blue line. Black dotted lines represent the yearly average of observations. Note that the x axis in Fig. 6 is different from Fig. 5. Under the assumption that the observed trees are the biggest trees from a given site, the virtual tree is compared with the largest diameter class from the model (b, c). Given that both the fast- and slow-growing trees are still alive and could have been sampled for the most recent decades, only the recent decades (10 decades, in this example) of the virtual-tree growth are compared to the simulations (d). The RMSE (d; black arrows) and trend of the residuals between the virtual tree and the largest diameter class simulated are calculated (e, f). From the whole period shown in panels (a) to (d) (1828–1976), the first 50 years were excluded, and panels (e) and (f) zoom in on the last 10 decades. The green line denotes the residuals, and the green dotted line is the linear regression of the model residuals. Observations and simulation are from a Pinus sylvestris site in Scotland, archived as brit021 (Schweingruber, 1995). In this case, the RMSE (d) and the slope of residuals (f) were calculated as 33.65 mm and 0.68 mm yr−1 respectively.


Figure 7. Example of the major steps in calculating the metrics of the benchmark for diameter increment in young trees. After aligning the TRW records of the individual trees by their age, a virtual tree is constructed by taking the maximum observed diameter of all trees for each year (a). Observations are shown as grey lines, and the simulation is shown as blue lines. The largest diameter class from the simulation is presented by the bold line. Black dotted lines represent the yearly maximum of the observations. The growth of the virtual tree is then compared to the simulated growth of the largest diameter class (b) by calculating the RMSE (c) and trend of the residuals (e, f). The x axes of panels (e) and (f) zoom in on the selected period, the green line denotes the model residuals, and the green dotted line is the linear regression of the model residuals. These calculations are limited to the first decades of the time series (d) to compensate for the bias caused by the fact that the old fast-growing trees died well before sampling took place. By using different approaches to evaluate the growth of young (this benchmark) and mature trees (the previous benchmark), the comparison accounts for the observation that the drivers of ring growth change when the trees grow taller (Cook, 1985). Observations and simulation are from a Pinus sylvestris site in Scotland, archived as brit021 (Schweingruber, 1995). The calculated RMSE (d) was 21.86 mm, and the slope of residuals (f) was 0.88 mm yr−1 for this example.


Figure 8. Example of the major steps in calculating the metrics of the benchmark for extreme-growth events. In this benchmark, extreme growth is defined as the first and last quartiles in TRW ordered by calendar year and averaged over the individual trees' records (a). The red shaded area and ticks represent observations exceeding the 75th percentile, and the blue shaded area and ticks represent observations below the 25th percentile (a). The TRWs simulated for the largest diameter class are then extracted for the years identified in panels (a) and (b). Both observations and simulations were normalized to remove the difference in the range of values between configurations. These normalized values correspond to the x and y axis in panels (c) and (d) for the observations and simulations respectively. Subsequently, the similarity between simulations and observations was tested by calculating the distance from the 1 : 1 line (shown in green in panel d), which is equivalent to the RMSE for years with extreme growth (d). An additional metric is calculated in a similar way but by using both the 25th percentile and the 75th percentile of extreme values of the simulations and observations regardless of the year (e, f). This test identifies if the simulation can reproduce the amplitude of TRW. The observations and simulations were not normalized to assess the absolute amplitude. Possible uncertainties from using reconstructed climate forcing were avoided by limiting the calculations of both metrics to the past 5 decades for which climate observations are available. Observations and simulation are from a Pinus sylvestris site in Spain, archived as spai006 (Schweingruber, 2020). In this test case, the RMSE for extreme years (d) was 0.57 mm, and the RMSE for extreme growth (amplitude; f) was 0.03 (scaled).
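The first metric in the Fig. 8 caption, the RMSE for years with extreme observed growth, can be sketched as below. Several choices are assumptions made only for this illustration: the normalization is taken to be min–max scaling of each full series to the range 0 to 1, the quartile thresholds are applied to the calendar-year-averaged observations, and the function names are hypothetical. The amplitude metric of panels (e) and (f) is not covered.

```python
import numpy as np


def minmax_scale(values):
    """Scale a series to the range 0..1 (assumes the series is not constant)."""
    x = np.asarray(values, dtype=float)
    return (x - x.min()) / (x.max() - x.min())


def extreme_year_rmse(observed_trw, simulated_trw):
    """RMSE between scaled simulated and observed TRW in extreme years.

    Extreme years are those in which the observed TRW lies below the 25th
    or above the 75th percentile of the observed series.
    Returns (rmse, boolean mask of extreme years).
    """
    obs = np.asarray(observed_trw, dtype=float)
    sim = np.asarray(simulated_trw, dtype=float)
    low, high = np.percentile(obs, [25, 75])
    extreme = (obs < low) | (obs > high)
    # Distance from the 1:1 line in the scaled observation-simulation space.
    diff = minmax_scale(sim)[extreme] - minmax_scale(obs)[extreme]
    return float(np.sqrt(np.mean(diff ** 2))), extreme


# Toy example: the simulation has twice the amplitude but the same ranking,
# so the scaled values coincide in the extreme years.
obs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
sim = [2.0 * v for v in obs]
rmse, mask = extreme_year_rmse(obs, sim)
print(rmse, int(mask.sum()))
```

Scaling removes differences in absolute range, so this metric only tests whether the simulation places good and poor growth years correctly; reproducing the amplitude is a separate question.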


2.4 Benchmarks for comparing observed and simulated tree-ring widths

If a LSM explicitly accounts for the main contributors to TRW, i.e. size effects and climate sensitivity (Hughes et al., 2011), meaningful benchmarking against specific aspects of the observations becomes feasible in spite of the aforementioned biases in the ITRDB. Our technical framework considers four complementary aspects of the observations: (i) the size-related growth trend in tree rings, (ii) the diameter increment of mature trees, (iii) the diameter increment of young trees, and (iv) extreme-growth events. Each of these aspects formed the basis of a benchmark:

  • i.

    Size-related diameter growth – the size-related growth trend in the diameter increment can be assessed by calculating the average virtual tree for a stand aligned by tree age (examples shown in Figs. 4a and 5a and b) and subtracting this virtual tree from the simulated TRWs of the largest diameter class (Fig. 5c). Subsequently, linear regression is used to quantify the temporal trend in the residuals (examples shown in Fig. 5d). If the simulations and observations have similar size-related trends, the temporal trend in the residuals will be close to zero. Furthermore, the root mean square error (RMSE) between the simulations and observations is calculated and normalized by the length of the time series used to calculate the difference in observed and simulated growth trends. A skilled model is expected to simultaneously show no trend in the residuals and a low RMSE.

  • ii.

    Diameter increment of mature trees – in LSMs that account for within-stand competition, larger trees will consistently grow faster than smaller trees due to the way that competition is formalized (Bellassen et al., 2010; Haverd et al., 2013). In reality, growing conditions can suddenly become favourable for trees that have previously been suppressed, resulting in fluctuating growth rates (see dark grey lines in Fig. 4b). This discrepancy between simulated and observed competition can be accounted for in the benchmark by using the observations to compile a virtual tree of the stand aligned by calendar year, taking the average tree diameter of all samples to construct the virtual tree (Figs. 4b and 6a and b). Following the big-tree selection bias (Sect. 2.2), it can be assumed that the observed trees are representative of the biggest trees from a given site. Hence, the virtual tree can be compared with the largest diameter class from the model. The survivorship bias of slow-growing individuals has the strongest impact when assessing TRW in century-old trees (Sect. 2.2). When analysing recent decades, both the fast- and slow-growing trees are still alive and could have been sampled; therefore, the first 5 decades were excluded for better comparison (Fig. 6c, d). The 50-year threshold in this study is somewhat arbitrary but reflects the observation that the fastest changes in tree growth occur in the first few decades in most of the selected time series. When benchmarking against other TRW data, this threshold could be adjusted to better fit the observed growth dynamics for other tree species and/or other regions. The RMSE and trend of the residuals between the virtual tree and the largest simulated diameter class are calculated (Fig. 6d, e, f). A skilled model is expected to simultaneously show no trend in the residuals and a low RMSE.

  • iii.

    Diameter increment of young trees – the diameter increment of young trees can be assessed by calculating the largest virtual tree of the stand. The maximum age of a virtual tree equals the shortest observed individual TRW record for the stand, as it represents the age intersection between the TRW records for all individuals in the stand. The largest virtual tree is clearly biased towards higher observed diameters, compensating for the loss of observed high diameters in field sampling due to the fact that the old fast-growing trees died well before sampling took place (Figs. 4c and 7a and b). The first 3 decades of growth of the virtual tree are then compared to the simulated growth of the largest diameter class (Fig. 7c, d) by calculating the RMSE and trend of the residuals (Fig. 7d, e, f). As for the previous benchmark, the threshold is somewhat arbitrary and was set to focus the analysis on the first decades in which diameter growth is generally faster compared with later decades. Note that the thresholds for young and mature trees are separated by 20 years during which the observations are not considered in either benchmark. When needed, these thresholds could be adjusted to better match local tree growth, but it is recommended to keep the separation between the thresholds to account for the transition in diameter growth from young to mature trees. A skilled model is expected to simultaneously show no trend in the residuals and a low RMSE. By using different approaches to evaluate the growth of young (this benchmark) and mature trees (the previous benchmark), the comparison accounts for the observation that the drivers of ring growth change as the trees grow taller (Cook, 1985).

  • iv.

    Extreme-growth events – for this benchmark, extreme growth is defined coarsely as the first and last quartiles in TRW ordered by calendar year. For the purpose of estimating extremes, we seek the average virtual trees and LSM simulations to most accurately and precisely characterize the interannual variability. In the present study, use of the 1951–2000 interval produces LSM simulations forced by climate and nutrient loading estimates best constrained by dense, recent climate observations (see Sect. 3), thereby minimizing the contribution of forcing uncertainty to LSM simulation uncertainty; in the TRW observations, it also leverages non-juvenile tree-ring series of more than 50 years which do not require detrending and are expected to most accurately reflect the climate sensitivity. Subsequently, for each site, individual tree records are averaged to obtain a single time series (Fig. 8a). Model skill for estimating the distribution of growth arising from climate variability is evaluated by comparing the observed and simulated 25th and 75th percentiles of TRWs for the largest diameter class, which is the diameter class showing the strongest climate sensitivity (Fig. 8e, f). Additionally, model skill for reproducing the timing of individual extreme-growth events is evaluated by comparing simulated to observed virtual standardized TRW for the exact years during which extreme growth was observed (Fig. 8a, b, c, d; Rammig et al., 2015). For both the amplitude and timing of growth extremes, the similarity between simulations and observations was calculated as the RMSE of the distance from the 1 : 1 line (Fig. 8c, d, e, f). A skilled model is expected to simultaneously show low RMSE for both the amplitude and timing of extreme years.
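The construction shared by the first three benchmarks (a virtual tree, the RMSE, and the slope of the residuals) can be illustrated with a minimal sketch. It is not the code used in this study: the function names (`virtual_tree`, `rmse`, `residual_slope`) and the toy diameters are hypothetical, the records are assumed to be already aligned by tree age or calendar year, and the length normalization of the RMSE described for the size-related benchmark is omitted.

```python
from math import sqrt


def virtual_tree(records, how="mean"):
    """Combine aligned per-tree series into a single virtual tree.

    records: list of lists, one per tree, with index 0 referring to the
    same age (or calendar year) in every record. zip() stops at the
    shortest record, so the virtual tree spans the common period only.
    how: "mean" for the average virtual tree, "max" for the largest one.
    """
    if how == "max":
        return [max(values) for values in zip(*records)]
    return [sum(values) / len(values) for values in zip(*records)]


def rmse(simulated, observed):
    """Root mean square error between two equal-length series."""
    squared = [(s - o) ** 2 for s, o in zip(simulated, observed)]
    return sqrt(sum(squared) / len(squared))


def residual_slope(simulated, observed):
    """Ordinary least-squares slope of (simulated - observed) against the time index."""
    residuals = [s - o for s, o in zip(simulated, observed)]
    n = len(residuals)
    t_mean = (n - 1) / 2
    r_mean = sum(residuals) / n
    cov = sum((t - t_mean) * (r - r_mean) for t, r in enumerate(residuals))
    var = sum((t - t_mean) ** 2 for t in range(n))
    return cov / var


# Hypothetical toy data: diameters (mm) of three observed trees and the
# simulated largest diameter class over four years.
observed_trees = [
    [2.0, 4.0, 6.0, 8.0],
    [1.0, 3.0, 5.0, 7.0],
    [3.0, 5.0, 7.0, 9.0],
]
simulated_largest_class = [3.0, 5.5, 8.0, 10.5]

mean_tree = virtual_tree(observed_trees, "mean")  # mature-tree style virtual tree
max_tree = virtual_tree(observed_trees, "max")    # young-tree style virtual tree

print(rmse(simulated_largest_class, max_tree))
print(residual_slope(simulated_largest_class, max_tree))
```

In this toy case the simulated tree grows faster than the largest virtual tree, so the residuals carry a positive slope; a skilled model would show a slope close to zero together with a low RMSE.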

Figure 9. Schematic representation of the verification process for the RMSE metric. Before the verification, two types of datasets were prepared: big-tree data (limited to the 15 % biggest trees) and all-tree data. In this example, the simulated TRW was multiplied by a modifier chosen to minimize the RMSE between the simulated and observed TRW for the 15 % biggest trees (see Sect. 3.3 for details). The same multiplier was then applied to the all-tree data, and the RMSE was calculated. Finally, the decrease or increase in RMSE with the multiplier was compared to the RMSE obtained without the multiplier. The other two modifiers, which are detailed in Sect. 3.3, follow a similar approach.


3 Materials and methods

3.1 The land-surface model ORCHIDEE

ORCHIDEE (Ducoudré et al., 1993; Krinner et al., 2005) is the land-surface model of the IPSL (Institute Pierre Simon Laplace) Earth system model (Dufrêne et al., 2005). Hence, by design, it can be coupled to an atmospheric general circulation model or become a component in a fully coupled Earth system model. In a coupled set-up, the atmospheric conditions affect the land surface, and the land surface, in turn, affects the atmospheric conditions. However, when a study focuses on changes in the land surface rather than on the interactions with climate, it can also be run as a stand-alone land-surface model. In both configurations the model receives atmospheric conditions, such as precipitation, air temperature, air humidity, winds, incoming solar radiation, and CO2, as input; this combination of inputs is known as the climate forcing. Both configurations can cover any area ranging from global to regional domains and even down to a single grid point for the stand-alone case.

Although ORCHIDEE does not enforce a spatial or temporal resolution, the model does use a predefined spatial grid and equidistant time steps. The spatial resolution is an implicit user setting that is determined by the resolution of the climate forcing. Although the temporal resolution is not fixed, the processes were formalized at given time steps: half-hourly (i.e. photosynthesis and energy budget), daily (i.e. net primary production), and annually (i.e. vegetation dynamics). Hence, meaningful simulations have a temporal resolution between 1 min and 1 h for the energy balance, water balance, and photosynthesis calculations.

ORCHIDEE builds on the concept of meta-classes to describe vegetation distribution. By default, it distinguishes 13 meta-classes (one for bare soil, eight for forests, two for grasslands, and two for croplands). Each meta-class can be subdivided into an unlimited number of plant functional types (PFTs). When simulations make use of species-specific parameters and age classes, several PFTs belonging to a single meta-class will be defined. Biogeochemical and biophysical variables are calculated for each PFT or groups of PFTs (e.g. all tree PFTs in a pixel drawn from the same description of soil hydrology, known as a soil water column).

ORCHIDEE is not an individual-based model but instead currently represents forest stand complexity and stand dynamics with diameter and age classes. Each class contains a number of individuals that represent the mean state of the class. Therefore, each diameter class contains a single modelled tree that is replicated multiple times and distributed at random throughout the PFT area. At the start of a simulation, each PFT contains a user-defined number of stem diameter classes. This number is held constant throughout the simulation, whereas the diameter boundaries of the classes are adjusted to accommodate for temporal evolution in the stand structure. By using flexible class boundaries with a fixed number of diameter classes, different forest structures can be simulated. An even-aged forest, for example, is simulated with a small diameter range between the smallest and largest classes. All classes will then effectively belong to the same stratum. An uneven-aged forest is simulated by applying a large range between the diameter classes. Therefore, different diameter classes will effectively represent different strata. The limitations of this approach become apparent when the TRW data and simulations are compared by calendar year, as the model does not track individual trees. Although the dimensions of each model tree itself are well defined, the amount of radiation it receives (and therefore the amount of carbon produced) is determined by the statistical distribution of all model trees in that grid cell.

Vegetation structure is then used for the calculation of the biophysical and biogeochemical processes of the model such as photosynthesis, plant hydraulic stress, and the radiative transfer model. The r5698 version of ORCHIDEE (Sect. S1), used in this study, combines the dynamic nitrogen cycle of ORCHIDEE r4999 (Vuichard et al., 2019; Zaehle and Friend, 2010) and the explicit canopy representation of ORCHIDEE r4262 (Chen et al., 2016; Naudts et al., 2015; Ryder et al., 2016). It simulates carbon, water, energy, and nitrogen fluxes, allows for size-dependent allocation across three diameter classes within a forest stand, and has been parameterized and tested for the simulation of TRW series via radial tree growth estimation. A detailed overview of earlier developments (Krinner et al., 2005; Naudts et al., 2015; Vuichard et al., 2019) that resulted in the emerging capability of ORCHIDEE r5698 to match the aggregate tree growth model (Fig. 2) is given in the Supplement (Sect. S1).

3.2 Simulations: forcing and model parameterization

In this study, offline ORCHIDEE simulations for the 20th century are forced with the merged and homogenized gridded CRU-NCEP climate dataset (Viovy, 2016); the gridded nitrogen deposition product from the Chemistry-Climate Model Initiative (CCMI) (Eyring et al., 2013); a gridded nitrogen fertilization product for N2O (Lu and Tian, 2017); and an observation-based time series of global atmospheric CO2 concentrations (Keeling et al., 1996). Forest management followed the reported management status for each of the 11 simulated sites (Sect. S3), which were compared with observations via the virtual-tree benchmarks (Sect. 3.3). Simulations were started from a 300-year-long spin-up, which was required for equilibrium with respect to the slow carbon and nitrogen pools in the soil. The spin-up was concluded with a forest clear-cut, such that the start year and the length of each simulation matched the observed stand age for the validation dataset (Sect. 3.3).

During model development, two global (number of diameter classes and ratio for number of trees per each diameter class; see Sect. 3.1) and six PFT-specific parameters (fpower, fσ, kα_r, kα_s, kβ_r, and kβ_s; Table S1) were manually adjusted to jointly reproduce the TRW data of 11 ITRDB sites: aust112 (Strumia, 2005), cana106 (Archambault and Bergeron, 2002), chin037 (PAGES 2k Consortium, 2013), finl055 (Melvin, 2005), fran4 (Tessier, 1996), id007 (Briffa and Schweingruber, 2002), japa011 (Davi et al., 2011), mo009 (Bell et al., 2002), nepa003 (Krusic and Cook, 2005), spai055 (Briongos and Cerro-Barja, 2007), and turk027 (Griggs et al., 2006). Thus, sites belonging to the same PFT were simulated by making use of a single PFT-specific parameter set. In other words, no site-specific parameters were applied. All other parameters were set to default values. As configured, the model distinguished five diameter classes for simulated trees. The smallest and largest diameter classes each contained 15 % of the total number of simulated trees. The three intermediate classes contained 21 %, 28 %, and 21 % of simulated trees respectively, so that the five classes sum to 100 %. Agreement between simulated and observed TRW was assessed visually.

3.3 Verification

The European biomass network contains TRW samples from “fixed-plot sampling”. The database was established within multiple research projects and made publicly available through the EU Horizon 2020 project BACI (BACI, 2020). It currently archives 48 datasets covering temperate and hemi-boreal climates (Fig. S2) that have been collected from a variety of research efforts in Eurasia (Klesse et al., 2018). All trees larger than 5.6 cm in diameter at breast height had to be sampled in a 10–40 m radius plot, depending on stand density, in order to be archived in the European biomass network database (Babst et al., 2014b). The European biomass network is, therefore, considered to be free of the big-tree selection bias that has plagued the ITRDB, although other known biases (e.g. slow-growing tree survivorship bias) may still be present. Thus, the records from the European biomass network are suited to evaluate the validity of using virtual trees constructed from ITRDB records to cope with the aforementioned sampling biases.

We selected sites from the European biomass network based on the following criteria: (1) the site had to be dominated by a single species for enhanced compatibility with ORCHIDEE, which is monospecific by design; and (2) the stand age should exceed 50 years as a requirement to apply all four proposed benchmarks (Sect. 2.4). The benchmarks were applied to a common evergreen and a common deciduous species. Hence, within the filtered sites, only sites dominated by Picea abies or Fagus sylvatica were retained, resulting in 12 sites out of the total of 48 sites. CIM, a site dominated by Fagus sylvatica, was removed from the selection (decreasing the final number of sites to 11) because only one tree out of 61 trees was aged over 100 years, resulting in a diameter distribution that was not at all compatible with the default diameter distribution of the model. The details of the selected sites are given in Table S3 in the Supplement.

The European biomass network data were additionally used to verify whether the big-tree selection bias that is present in the ITRDB data invalidates its use for benchmarking LSMs. The verification checked whether changes in parameter values or model process representation that would be required to make the model output better match the ITRDB data would also result in a better match between the model output and the all-tree data from the European biomass network. If this were the case, benchmarking LSMs against ITRDB data would result in model changes that would enhance the model's capability to simulate tree growth, thereby justifying the conclusion that ITRDB data can be used for LSM benchmarking despite the known biases.

Therefore, the verification used the data from the European biomass network in two different ways. First, all trees in the European biomass network dataset were used (hereafter called “all-tree data”) to calculate the four proposed benchmarks at the site level. The results of these benchmarks were used as the reference in the verification. Second, only big trees were subsampled from the data (hereafter called “big-tree data”), and all four benchmarks were calculated against this subsample of data. Big trees were defined as the top 15 % of the trees based on their diameter, and the 15 % threshold was taken to match the diameter distribution in ORCHIDEE, where by definition the largest diameter class contains 15 % of the trees.
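The big-tree subsampling can be sketched as follows. This is a minimal illustration rather than the procedure used in the study: the function name `select_big_trees`, the tree identifiers, the toy diameters, and the rounding rule for the number of retained trees are all assumptions.

```python
def select_big_trees(diameters, fraction=0.15):
    """Return the IDs of the largest `fraction` of trees, ranked by diameter.

    diameters: dict mapping a tree ID to its diameter (cm).
    The number of retained trees is rounded to the nearest integer and
    is at least one (an assumption; the rounding rule is not specified).
    """
    n_big = max(1, round(fraction * len(diameters)))
    ranked = sorted(diameters, key=diameters.get, reverse=True)
    return ranked[:n_big]


# Hypothetical plot with 20 trees of increasing diameter (10 cm to 29 cm).
plot = {f"tree{i:02d}": 10.0 + i for i in range(20)}

big_tree_ids = select_big_trees(plot)  # 15 % of 20 trees -> 3 trees
all_tree_ids = list(plot)

print(big_tree_ids)
```

The four benchmarks would then be calculated twice: once from the records of `all_tree_ids` (the reference) and once from the records of `big_tree_ids` only.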

The “big-tree” verification required three additional steps (Fig. 9). In the first step, the simulated TRW values from the largest diameter class were transformed by modifiers to minimize the two metrics of each benchmark (Sect. 2.4; Table S2). The different benchmarks may use different metrics, i.e. the RMSE and slope of the residuals were used as the metrics for benchmarking size-related growth trend, growth of mature trees, and growth of young trees, whereas extreme growth and TRW amplitude were used as the metrics for benchmarking extreme growth (Table S2). In the second step, the same modifiers were then applied to all simulated diameter classes and all four benchmarks. Hence, two metrics were calculated using all-tree data for each benchmark. In the third and final step, the actual verification tested whether – for a given metric and a given benchmark – the modifier improved simulations for the big-tree sample and for the all-tree data. Improvement of a specific metric of a benchmark was quantified by subtracting the original value for that metric from its modified value for all-tree data. Thus, a negative value indicated an improvement. If both conditions were satisfied, the benchmarks of the big-tree and all-tree data were said to be consistent, implying that using this benchmark in combination with the ITRDB data would reveal the same model shortcomings as benchmarking ORCHIDEE against TRW data from all-tree networks. Across the 11 sites and for each of the four proposed benchmarks and both benchmark metrics, sites where the test improved for both datasets were counted to estimate the confidence in using ITRDB in benchmarking LSMs.

As this study aims to propose benchmarks making use of the ITRDB rather than improving the ORCHIDEE model, the modifiers were applied to the model output directly. This approach has the advantage of remaining conceptual, avoiding the need to optimize specific model parameters or rewrite or add processes in the model code. Different modifiers were used to accommodate the differences between the metrics: (1) the RMSE or amplitude of a benchmark was minimized by multiplying the simulated TRW with a modifier (Fig. 9); (2) the slope of the residuals of a benchmark was minimized by subtracting a trend modifier from the simulated growth trend; and (3) the years of the simulated TRW were rearranged such that they matched the ranked order of observed extreme TRWs.
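The logic of the multiplicative modifier and the consistency test can be sketched for the RMSE metric. The sketch rests on assumptions: the closed-form least-squares multiplier below is one way to minimize the RMSE and is not necessarily how the modifier was obtained in this study, and the function names and toy series are hypothetical.

```python
from math import sqrt


def rmse(simulated, observed):
    """Root mean square error between two equal-length series."""
    squared = [(s - o) ** 2 for s, o in zip(simulated, observed)]
    return sqrt(sum(squared) / len(squared))


def best_multiplier(simulated, observed):
    """Multiplier m that minimizes the RMSE between m * simulated and observed.

    Setting the derivative of sum((m * s - o)^2) to zero gives
    m = sum(s * o) / sum(s * s).
    """
    numerator = sum(s * o for s, o in zip(simulated, observed))
    denominator = sum(s * s for s in simulated)
    return numerator / denominator


def rmse_change(simulated, observed, multiplier):
    """Modified minus original RMSE; a negative value indicates an improvement."""
    modified = [multiplier * s for s in simulated]
    return rmse(modified, observed) - rmse(simulated, observed)


# Hypothetical toy series (mm): simulated largest class vs. big-tree data,
# and simulated stand average vs. all-tree data.
sim_largest, obs_big = [2.0, 2.0, 2.0, 2.0], [3.0, 3.2, 2.8, 3.0]
sim_all, obs_all = [1.5, 1.5, 1.5, 1.5], [2.0, 2.1, 1.9, 2.0]

# Step 1: fit the modifier on the big-tree data.
m = best_multiplier(sim_largest, obs_big)
# Step 2: apply the same modifier to the all-tree comparison.
change_big = rmse_change(sim_largest, obs_big, m)
change_all = rmse_change(sim_all, obs_all, m)
# Step 3: the test case is consistent if both comparisons improve.
consistent = change_big < 0 and change_all < 0

print(m, change_big, change_all, consistent)
```

Fitting the modifier on the model output rather than on model parameters keeps the test conceptual, as stated above; only the sign of the change in each metric enters the consistency count.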

Table 1. Verification of the benchmarks and their metrics. Each cell represents the result from a single site. The values show the difference for each metric before and after optimization. Bold cells show the cases where the optimization for the all-tree data was consistent with the optimization result of the big-tree data. (See Table S3 for more information.)


4 Results

Verification of ORCHIDEE-based TRW simulations was applied at the 11 sites selected from the European biomass network (Sect. 3.3), via estimation of the four benchmarks (Sect. 2.4) from simulations and observations as well as their evaluation via the two skill metrics per benchmark, for a total of 88 test case comparisons (Table 1, Fig. S3). We first describe the results of big-tree bias estimation and modifier estimation in general, and then detail each of the individual benchmarks in the remainder of this section.

4.1 Big-tree bias estimation

Despite its simplicity, the use of modifiers was found to be robust as it improved all metrics of the four proposed benchmarks at each of the 11 sites when verifying against the big-tree data. Applying the same modifiers to the all-tree data improved the match between the simulations and observations in 72 % of the test cases (63 out of the 88 test cases; Table 1). We note that this overall result hides large differences between tree species. The verification appeared to be more successful for beech with an overall confidence level of 84 % (27 out of 32 test cases) compared with spruce with an overall confidence level of 64 % (36 out of 56 test cases).

4.2 Verification by benchmark

When benchmarking size-related growth, conclusions would be similar in 16 out of 22 cases (73 %), regardless of whether ORCHIDEE was benchmarked against the big-tree data or the all-tree data. Some sites such as DEO and DVN showed positive differences close to zero, suggesting that simulations with ORCHIDEE r5698 matched the observed size-related growth trend reasonably well, leaving limited room for improvement. One site (SCH) showed a positive difference because it contained two slow-growing trees which lived roughly 40 years longer than the rest of trees but whose diameters were too small to be contained in the big-tree sample (Fig. S4). Except for this site, the other sites showed marginal inconsistencies or showed improved simulated output against the two datasets. Thus, the size-related trend in tree growth can be derived from either the big-tree or the all-tree data.

For the mature-tree benchmark, big-tree data can be used with 68 % (15 out of the 22) confidence for benchmarking against LSMs. At 5 out of 11 sites, the all-tree data and the big-tree data yield different results. Two sites (HD2 and TIC) for which inconsistencies between the big-tree and all-tree data were observed for both metrics have 36 %–44 % of small trees in their size distribution, compared with an average of 28 % at the other nine sites. The proportion of small trees in the observation was estimated by counting trees in the smallest bin when trees were divided into five size classes similar to the model. The site labelled as ZOF has a bimodal size distribution with the biggest number of trees in the 1st and 4th diameter classes (35 % and 32 % respectively). The default size distribution in ORCHIDEE has 15 % of its trees in the smallest size class and 21 % of its trees in the 4th size class. At two other sites, DEO and SOB, the growth rate for big trees was higher in the observations (0.95 and 0.50 for the slope of residuals respectively), as the difference in big trees and small trees was bigger in the observations (Fig. S5b); despite this, the average simulation at these sites matched well with the average diameter trend as shown by the calculated slope of residuals: 0.08 and 0.09 respectively (Fig. S5a). These results suggest that the mature-tree benchmark is sensitive to the stand structure.

With 50 % (11 out of 22) confidence in using the big-tree data in benchmarking LSMs, the young-tree benchmark appears to be the most demanding in terms of its data. At the DEO, HD2, and SOB sites, inconsistencies between benchmarking the big-tree data and the all-tree data stemmed from (1) the similarity between simulations and observations, with a RMSE of around 10 mm; and (2) the fact that the difference between big trees' and small trees' growth was larger in the observations (Fig. S6). The site labelled as SCH contained two extremely fast-growing young trees resulting in a very fast-growing virtual tree in the optimized model output (Fig. S7). For SOR, the difficulties may have come from the model itself, more specifically from difficulties in simulating the carbon allocation (Fig. S8). These results suggest that a variety of issues decreases the confidence in using big-tree data for young-tree benchmarking.

For the extreme-growth benchmark, big-tree data can be used with 95 % (21 out of the 22) confidence for benchmarking LSMs. The observed consistency between benchmarking the big-tree data and the all-tree data suggests that extreme growth happens in the same years, irrespective of which dataset is being used. The DVN site showed the smallest RMSE for amplitude when it was calculated with the all-tree data (0.02), but the site has the biggest ratio of big-trees to all-trees for amplitudes compared with the simulation (1.30, Fig. S9). In other words, if the simulation is adjusted to the big trees in the observation, as the difference between subsampled big-tree and all-tree is larger in the observations, the average simulation becomes bigger than the average observation (see Figs. S5 and S6). This result suggests that the extreme-growth benchmark is the least demanding benchmark in terms of sampling design.
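The confidence levels reported in this section follow directly from the counts of consistent test cases. The short sketch below reproduces that arithmetic from the counts given above; the variable names are illustrative only.

```python
# Number of consistent test cases per benchmark, as reported in Sect. 4.2.
consistent_cases = {
    "size-related growth": 16,
    "mature trees": 15,
    "young trees": 11,
    "extreme growth": 21,
}
cases_per_benchmark = 11 * 2  # 11 sites x 2 metrics per benchmark

# Confidence per benchmark, in percent, rounded to the nearest integer.
confidence = {
    name: round(100 * count / cases_per_benchmark)
    for name, count in consistent_cases.items()
}

# Overall confidence across all benchmarks.
total_consistent = sum(consistent_cases.values())
total_cases = cases_per_benchmark * len(consistent_cases)
overall_confidence = round(100 * total_consistent / total_cases)

print(confidence)
print(total_consistent, total_cases, overall_confidence)
```

The per-benchmark counts sum to the 63 consistent cases out of 88 reported in Sect. 4.1.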

5 Discussion

5.1 LSM verification: beyond tree-ring width

Until now, verification of LSMs against tree-ring width records has relied primarily on interannual variation in simulated net primary productivity as a proxy for site-level TRW chronologies (Klesse et al., 2018; Kolus et al., 2019; Rammig et al., 2015; Zhang et al., 2018). Although such an indirect approach is appropriate, to a certain extent, for validating the capability of LSMs to simulate interannual variability and for studying patterns and mechanisms of change over longer timescales, the observations will need to be detrended to remove the size-related growth signal, adding considerable uncertainty to the verification process (Bunde et al., 2013; Cedro, 2016; Nicklen et al., 2019; Stine, 2019). Recognizing Cook’s conceptual model of aggregate tree growth (Fig. 2), we propose to move beyond the net primary production proxy by explicitly simulating and validating stem radial growth demographics. By doing so, we enrich the verification by including the effects of potentially confounding factors such as forest structure, age and size trends (Alexander et al., 2018; Nickless et al., 2011; Jiang et al., 2018), phenology (Shen et al., 2020), and sampling biases (Babst et al., 2014a), in addition to climate and environmental forcing (Klesse et al., 2018; Zuidema et al., 2020; Li et al., 2014; Rollinson et al., 2017).

Targeting both size-structured and age-structured information in observations and simulations (Fig. 3), we have proposed the use of four verification benchmarks created from observations and potentially simulated by LSMs, with each of them defined by two complementary metrics (Fig. 2; Table S2):

  • i.

    The size-trend benchmark targets the long-term trend in TRW. This trend contains information about ontogenetic growth during establishment and endogenous competition from canopy closure (Cook and Kairiukstis, 1990). Although this trend is removed in many dendrochronological studies to amplify the climate signal contained in TRW (Briffa and Melvin, 2011), we suggest testing the skill of the model in reproducing it because it is important to constrain biomass production. Benchmarking a suitable LSM against observed size-related trends in TRW may help to develop, evaluate, or parameterize allometric relationships and changes in simulated stand density (Fig. 4a).

  • ii.

    The mature-tree benchmark tests the capability of the model in simulating annual growth of a mature forest. As this benchmark aligns the observations by calendar year (Fig. 4b), it may reflect the effects of long-term environmental changes, if there were any and if the observational record is long enough to express them (Hess et al., 2018; Panthi et al., 2020). As a skilled LSM is expected to reproduce plant responses to long-term environmental change, this benchmark could be used to develop, evaluate, and parameterize the processes that simulate endogenous disturbances and plant responses to factors such as increasing atmospheric CO2 concentrations, atmospheric N deposition, and warming.

  • iii.

    Tree growth during stand establishment can be tested with the young-tree benchmark. The growth of establishing trees differs from that of mature canopy trees, and this difference has been accounted for by using separate benchmarks for young and mature stands (Fig. 4c). This benchmark could be used to develop, evaluate, or parameterize allometric growth of young trees as well as tree mortality prior to canopy closure.

  • iv.

    The extreme-growth benchmark tests the occurrence and range of extreme-growth events. Previously, interannual variability in TRW has been used to evaluate the climate sensitivity of LSMs. Interannual variability has a limitation because we cannot expect the model to simulate the timing of endogenous or exogenous disturbances, such as fire, pest, and disease outbreak, or the death of big trees leading to sudden growth releases in adjacent trees. By forming a benchmark from the extrema of the empirical distribution of incremental growth in mature trees (e.g. as evident from Fig. 5a; see also Figs. 9, S8, and S9), we create a direct comparison with the simulated demographics of trees, as observed over a contemporaneous time interval. This benchmark could be used to develop, evaluate, or parameterize the plant water stress and the temperature dependency of plant growth in the model.

The metrics of the first three benchmarks are the root mean square error (RMSE) and slope (Figs. 5, 6, 7). The RMSE examines whether the model reproduces the absolute values of TRWs. However, even though a model might reproduce the value of TRWs well, it is still expected to simulate the long-term trend in TRW that comes from climate changes or endogenous competition. This latter aspect is quantified by the slope metric. For large-scale models such as a LSM and for sites with little high-quality site information, correctly simulating growth trends should be prioritized over matching the end points in tree diameter. Because the benchmark for extreme-growth events was not intended to test the capability of LSMs to simulate growth trends, the slope of residuals was not included. As a skilled model is expected to simulate not only the timing of extreme growth but also its magnitude, both metrics for this benchmark were designed to be evaluated using the RMSE (Fig. 8).
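The two RMSE-based metrics of the extreme-growth benchmark (timing and amplitude) can be sketched as follows. The sketch is illustrative and rests on assumptions: min–max scaling stands in for the normalization, percentiles are obtained by linear interpolation, and the function names and toy series are hypothetical rather than taken from the study.

```python
from math import sqrt


def quartiles(values):
    """Return the 25th and 75th percentiles using linear interpolation."""
    ordered = sorted(values)

    def percentile(p):
        position = p * (len(ordered) - 1)
        lower = int(position)
        upper = min(lower + 1, len(ordered) - 1)
        return ordered[lower] + (position - lower) * (ordered[upper] - ordered[lower])

    return percentile(0.25), percentile(0.75)


def min_max_scale(values):
    """Scale a series to the range 0-1 (an assumed normalization)."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


def timing_rmse(observed, simulated):
    """RMSE of normalized TRW for the years in which observed growth was extreme."""
    q25, q75 = quartiles(observed)
    obs_n, sim_n = min_max_scale(observed), min_max_scale(simulated)
    pairs = [
        (o, s)
        for raw, o, s in zip(observed, obs_n, sim_n)
        if raw < q25 or raw > q75  # years below the 25th or above the 75th percentile
    ]
    return sqrt(sum((s - o) ** 2 for o, s in pairs) / len(pairs))


def amplitude_rmse(observed, simulated):
    """RMSE between observed and simulated quartiles, regardless of the year."""
    obs_q, sim_q = quartiles(observed), quartiles(simulated)
    return sqrt(sum((s - o) ** 2 for o, s in zip(obs_q, sim_q)) / len(obs_q))


# Hypothetical toy series (mm): the simulation has the right timing but
# twice the observed amplitude.
observed_trw = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
simulated_trw = [2.0 * v for v in observed_trw]

print(timing_rmse(observed_trw, simulated_trw))
print(amplitude_rmse(observed_trw, simulated_trw))
```

In this toy case the timing metric is zero while the amplitude metric is not, which illustrates why the benchmark evaluates both aspects separately.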

5.2 Toward LSM verification using the ITRDB

Regardless of the approach to LSM verification, the largest publicly available archive of tree-ring records, the ITRDB, is prone to sampling biases (Klesse et al., 2018; Zhao et al., 2019). Although it may be difficult to correct the data for these biases, our benchmarks present two solutions for comparing LSM output to ITRDB observations of raw ring width. Simulating a size-structured population of trees enables the comparison of the observations relative to a benchmark for a tall simulated tree, which compensates for the tendency of dendroclimatic sampling to select the oldest trees in a stand, which may turn out to be the larger trees. Although the ITRDB does not contain the site metadata that would be required to make this comparison exact (i.e. the diameter and true age distribution of the sampled stand), the use of the tall-tree benchmark protects against comparing the observed mean of a biased sample to the mean of unbiased simulated demographics. The second solution relies on the observation that the variation due to size-related growth by far exceeds the variation due to environmental changes and helps to constrain the survivor bias, which is derived from the growth of young fast-growing trees that died a long time ago and are therefore absent from records made from present-day sampling of old-growth forests (Brienen et al., 2017). The benchmarks proposed here provide a tool to start using ITRDB TRWs as a much-needed large-scale constraint on the maximum tree diameter and annual growth for the transition from pre-industrial to present-day environmental conditions.

Our verification approach estimated the level of confidence for each benchmark from the fraction of cases for which scaling simulations to observational benchmarks for big-tree data would result in improved model performance for observational benchmarks for all-tree data. In other words, big-tree-biased verification data should not degrade model performance relative to all-tree verification data, and such tests were performed using the European biomass network dataset (BACI, 2020). The results, then, might inform the use of the ITRDB, which is suspected to contain large-tree bias, for large-scale verification of LSMs on decadal to centennial timescales (Fig. S1b).
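The level of confidence described above can be sketched as a simple fraction. The sketch assumes that each test case is summarized by the model error against all-tree data before and after scaling the simulation to the big-tree benchmark; the function `confidence_level` and the error values are hypothetical.

```python
def confidence_level(cases):
    """Fraction of cases in which the error against all-tree data decreased.

    Each case is a pair (error_before, error_after): the error against the
    all-tree benchmark before and after scaling the simulation to the
    big-tree benchmark. Strict improvement is required in this sketch.
    """
    if not cases:
        raise ValueError("at least one case is required")
    improved = sum(1 for before, after in cases if after < before)
    return improved / len(cases)


# Hypothetical errors for four test cases, not values from Table 1.
cases = [(0.8, 0.5), (0.6, 0.7), (1.2, 0.9), (0.4, 0.3)]
print(confidence_level(cases))  # 3 of 4 cases improved: 0.75
```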

Verification results (Table 1) show that if the output of ORCHIDEE is benchmarked against data with imposed big-tree bias, there is 70 % confidence that the benchmark will produce conclusions similar to those reached from the use of all-tree data (Table 1, columns 1 and 2). This level of confidence is perhaps sufficient to support benchmarking an LSM against tens to hundreds of ITRDB sites in aggregate. The same level of confidence is likely too low to benchmark an LSM, or any ecosystem model, against a single ITRDB data series, as there is a 30 % chance that parameter tuning or model improvements following the benchmarking will not verifiably improve the model. Given the limited spatial extent, species content, and environmental range of the European biomass network used in this study, the levels of confidence represent temperate and hemi-boreal forests, which are a subset of both the ITRDB range and the actual distribution of forests (Fig. S2). The validity of the use of the ITRDB data from boreal and tropical forests will need to be verified as suitable data become available.
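Why a 70 % level of confidence may be acceptable in aggregate but not for a single series can be illustrated with a binomial sketch. It assumes, purely for illustration, that sites are independent and that each site has the same 0.7 probability of leading to the same conclusion as all-tree data; neither assumption is tested in this study, and the function name is hypothetical.

```python
from math import comb


def prob_majority_consistent(n_sites, p_case=0.7):
    """Probability that more than half of n_sites give a consistent result.

    Assumes independent sites with an identical per-site probability p_case,
    which is an illustrative simplification.
    """
    threshold = n_sites // 2 + 1  # strict majority
    return sum(
        comb(n_sites, k) * p_case**k * (1.0 - p_case) ** (n_sites - k)
        for k in range(threshold, n_sites + 1)
    )


for n in (1, 3, 101):
    print(n, prob_majority_consistent(n))
```

Under these assumptions the probability of a consistent majority rises from 0.7 for a single site towards 1 as the number of sites grows. Real ITRDB sites are unlikely to be fully independent, so the gain from aggregation would be smaller in practice.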

Across all species, benchmarking against extreme events, mature trees, and the size-related growth appeared to be the least sensitive to big-tree bias (Table 1; 63 out of 88 cases in bold font). Benchmarking against young trees will benefit from using data free of the big-tree selection bias (Table 1, compare the bold values reported in columns 1–4 and 7–8 to those in columns 5–6). This finding suggests limited utility of ITRDB data to verify simulations using the young-tree benchmarks. Because the true diameter distribution is not contained in the ITRDB, it is possible neither to select only ITRDB sites for which the actual diameter distribution matches ORCHIDEE’s distribution nor to adjust the diameter distribution in ORCHIDEE to the observed distribution (Figs. S4–S7). Matching observed and simulated distributions appears to be essential when benchmarking the growth of young trees. The same finding suggests, however, that forest inventory data for which the diameter distribution is known but only a few big trees were cored would be a reliable data source for benchmarking LSMs. The higher fraction of modifier-improved ORCHIDEE simulations for beech (87 %) relative to spruce (64 %) (Table 1, compare the bold values in rows 1–7 to the bold values in rows 8–11) suggests that the validity of the assumptions underlying the use of ITRDB data partly depends on tree species. Unfortunately, in this study, the variety of species was too limited to generalize this result in terms of plant functional types.

Even when benchmarks calculated from observations and simulations are not in agreement, they may nevertheless be used to identify ways in which to improve the observational database or the simulation model. Considering the comparison of the observed to simulated interquartile range (Fig. S9), we see that the GIU site is amongst the most poorly simulated sites. However, we found that the simulation could be improved for both metrics of all four benchmarks through the use of modifiers (Table 1, third row), despite the difference in simulated and observed stand structure (Fig. S10). In addition, we also found that inconsistencies between big-tree data and all-tree-based benchmarks may appear even when the simulated mean TRWs approach observed means (Figs. S4–S7). This suggests that (1) ITRDB data can be used as a first approximation to benchmark the growth of young and mature trees in LSMs, and (2) as the model improves, the need for unbiased datasets will increase, as biases in observed stand structure and growth rates could hamper the use of young- and mature-tree benchmarks in particular.

5.3 Outlook

Tree-ring records of incremental growth that are suitably described in terms of benchmarks might complement well-established but short-term benchmarks for LSMs (Randerson et al., 2009), such as forest inventory data (Bellassen et al., 2010; Naudts et al., 2015), eddy covariance measurements (Blyth et al., 2010; Williams et al., 2009), free-air CO2 enrichment experiments (De Kauwe et al., 2013), and satellite observations of vegetation activity (Chen et al., 2011; Demarty et al., 2007). The novel benchmarks proposed here may also provide new targets for evaluating LSMs’ performance, as the metrics could be used in the objective function of any data assimilation technique (Peylin et al., 2016) to rigorously account for the information contained in TRW datasets. The value of tree-ring records for LSM verification might be further enhanced by (i) developing new, unbiased networks, such as the European biomass network, to both complement and identify biases in the ITRDB; (ii) adding their stable isotope ratios to verification benchmarks that may be simulated by isotope-enabled LSMs (Levesque et al., 2019; Barichivich et al., 2021); and (iii) combining their use with high-frequency but short-term eddy covariance measurements (Pappas et al., 2020; Teets et al., 2018), experimental data from plant growth under pre-industrial CO2 concentrations (Temme et al., 2015), and proxies of atmospheric composition (Campbell et al., 2017).

6 Conclusion

We have proposed and evaluated the use of four benchmarks and two metrics that leverage observed demographics to provide more nuanced verification targets for LSMs that simulate both demographic responses and their environmental forcing on decadal to centennial timescales. Using small but relatively unbiased European biomass network datasets, we identify the extent to which presumed biases in the much larger ITRDB might degrade the validation of LSMs. We find that size, mature-tree, and extreme-growth verification benchmarks are relatively insensitive to big-tree bias, but the use of young-tree benchmarks for verification of LSMs may require the development of new unbiased observational TRW datasets and/or the innovative use of independent verification data.

Code and data availability

In line with GMD requirements, the model code has been archived and made accessible (Luyssaert, 2019). The scripts required for reproducing the figures, the ORCHIDEE simulations, and the intermediate results are available from Zenodo (Jeong et al., 2021). The BACI dataset is freely available online (last access: 16 August 2021; BACI, 2020) but requires registration by email.

The ITRDB dataset can be accessed through NOAA (last access: 15 September 2021; National Oceanic and Atmospheric Administration, 2020).


The supplement related to this article is available online.

Author contributions

Proposed benchmarks are the outcome of discussions between JJ, JB, PP, VH, and SL. JJ ran the model, analysed the output, and prepared figures. FB collected the BACI data. JJ, SL, and MNE wrote the paper, and all authors contributed to revising and editing the different versions of the paper.

Competing interests

Author Philippe Peylin is a member of the editorial board of the journal.


Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Acknowledgements

Michael Neil Evans acknowledges insights arising from work with the PAGES/Data Assimilation and Proxy System Modeling Working Group. Stefan Klesse co-developed the European biomass network and provided management information. Sebastiaan Luyssaert would like to thank Antonio Lara (Universidad Austral de Chile) for early discussions on the study topic. Valerie Daux (Laboratoire des Sciences du Climat et de l’Environnement) is acknowledged for comments on a previous version of the paper, which improved its clarity.

Financial support

This research has primarily been supported by the Horizon 2020 VERIFY project (grant no. 776810). Additional support was received from the Centre National de la Recherche Scientifique (CNRS) of France through the “Make Our Planet Great Again” programme; the Earth Systems and Climate Change Hub, funded by the Australian Government’s National Environmental Science Program (grant no. NSF/AGS1903626); and the “Inside out” project (grant no. POIR.04.04.00-00-5F85/18-00), which is funded by the HOMING programme of the Foundation for Polish Science and the European Union under the European Regional Development Fund.

Review statement

This paper was edited by Christian Folberth and reviewed by three anonymous referees.

References


Alexander, M. R., Rollinson, C. R., Babst, F., Trouet, V., and Moore, D. J. P.: Relative influences of multiple sources of uncertainty on cumulative and incremental tree-ring-derived aboveground biomass estimates, Trees, 32, 265–276, 2018.

Amthor, J. S.: The McCree–de Wit–Penning de Vries–Thornley Respiration Paradigms: 30 Years Later, Ann. Botany, 86, 1–20, 2000.

Archambault, S. and Bergeron, Y.: Lac Duparquet – THOC – ITRDB CANA106 – RWL, NOAA National Centers for Environmental Information, 2002.

Babst, F., Alexander, M. R., Szejner, P., Bouriaud, O., Klesse, S., Roden, J., Ciais, P., Poulter, B., Frank, D., Moore, D. J., and Trouet, V.: A tree-ring perspective on the terrestrial carbon cycle, Oecologia, 176, 307–322, 2014a.

Babst, F., Bouriaud, O., Alexander, R., Trouet, V., and Frank, D.: Toward consistent measurements of carbon accumulation: A multi-site assessment of biomass and basal area increment across Europe, Dendrochronologia, 32, 153–161, 2014b.

Babst, F., Poulter, B., Bodesheim, P., Mahecha, M. D., and Frank, D. C.: Improved tree-ring archives will support earth-system science, Nat. Ecol. Evol., 1, 0008, 2017.

Babst, F., Bodesheim, P., Charney, N., Friend, A. D., Girardin, M. P., Klesse, S., Moore, D. J. P., Seftigen, K., Björklund, J., and Bouriaud, O.: When tree rings go global: challenges and opportunities for retro- and prospective insight, Quatern. Sci. Rev., 197, 1–20, 2018.

BACI: European biomass network (last access: 16 August 2021), 2020.

Bakker, J. D.: A new, proportional method for reconstructing historical tree diameters, Can. J. Forest Res., 35, 2515–2520, 2005.

Barichivich, J., Peylin, P., Launois, T., Daux, V., Risi, C., Jeong, J., and Luyssaert, S.: A triple tree-ring constraint for tree growth and physiology in a global land surface model, Biogeosciences, 18, 3781–3803, 2021.

Bell, R., Magre, F., and Senter, D.: Jefferson County Missouri – JUVI – ITRDB MO009 – RWL, NOAA National Centers for Environmental Information, 2002.

Bellassen, V., Le Maire, G., Dhôte, J., Ciais, P., and Viovy, N.: Modelling forest management within a global vegetation model – Part 1: Model structure and general behaviour, Ecol. Model., 221, 2458–2474, 2010.

Blyth, E., Gash, J., Lloyd, A., Pryor, M., Weedon, G. P., and Shuttleworth, J.: Evaluating the JULES Land Surface Model Energy Fluxes Using FLUXNET Data, J. Hydrometeorol., 11, 509–519, 2010.

Bonan, G. B., Williams, M., Fisher, R. A., and Oleson, K. W.: Modeling stomatal conductance in the earth system: linking leaf water-use efficiency and water transport along the soil–plant–atmosphere continuum, Geosci. Model Dev., 7, 2193–2222, 2014.

Bowman, D. M., Brienen, R. J., Gloor, E., Phillips, O. L., and Prior, L. D.: Detecting trends in tree growth: not so simple, Trends Plant Sci., 18, 11–17, 2013.

Bowman, D. M. J. S., Balch, J. K., Artaxo, P., Bond, W. J., Carlson, J. M., Cochrane, M. A., D'Antonio, C. M., DeFries, R. S., Doyle, J. C., Harrison, S. P., Johnston, F. H., Keeley, J. E., Krawchuk, M. A., Kull, C. A., Marston, J. B., Moritz, M. A., Prentice, I. C., Roos, C. I., Scott, A. C., Swetnam, T. W., van der Werf, G. R., and Pyne, S. J.: Fire in the Earth System, Science, 324, 481–484, 2009.

Bräuning, A., De Ridder, M., Zafirov, N., García-González, I., Petrov Dimitrov, D., and Gärtner, H.: Tree-Ring Features: Indicators Of Extreme Event Impacts, IAWA Journal, 37, 206–231, 2016.

Brienen, R. J. W., Gloor, E., and Zuidema, P. A.: Detecting evidence for CO2 fertilization from tree ring studies: The potential role of sampling biases, Global Biogeochem. Cy., 26, GB1025, 2012.

Brienen, R. J. W., Gloor, M., and Ziv, G.: Tree demography dominates long‐term growth trends inferred from tree rings, Glob. Change Biol., 23, 474–484, 2017.

Briffa, K. and Schweingruber, F.: Cascade Radar St. Payette – PCEN – ITRDB ID007 – RWL, NOAA National Centers for Environmental Information, 2002.

Briffa, K. R. and Melvin, T. M.: A Closer Look at Regional Curve Standardization of Tree-Ring Records: Justification of the Need, a Warning of Some Pitfalls, and Suggested Improvements in Its Application, Dendroclimatology, 11, 113–145, 2011.

Briffa, K. R., Osborn, T. J., and Schweingruber, F. H.: Large-scale temperature inferences from tree rings: a review, Global Planet. Change, 40, 11–26, 2004.

Briongos, J. and Cerro-Barja, A.: La Camarilla el Provencio Cuenca Undisturbed – PIPN – ITRDB SPAI055 – RWL, NOAA National Centers for Environmental Information, 2007.

Bunde, A., Büntgen, U., Ludescher, J., Luterbacher, J., and von Storch, H.: Is there memory in precipitation?, Nat. Clim. Change, 3, 174–175, 2013.

Campbell, J. E., Berry, J. A., Seibt, U., Smith, S. J., Montzka, S. A., Launois, T., Belviso, S., Bopp, L., and Laine, M.: Large historical growth in global terrestrial gross primary production, Nature, 544, 84–87, 2017.

Cao, X., Tian, F., Li, F., Gaillard, M.-J., Rudaya, N., Xu, Q., and Herzschuh, U.: Pollen-based quantitative land-cover reconstruction for northern Asia covering the last 40 ka cal BP, Clim. Past, 15, 1503–1536, 2019.

Cedro, A.: Growth-climate relationships of wild service trees on the easternmost range boundary in Poland, STR16/04, GFZ German Research Centre for Geosciences, Potsdam, p. 24, 2016.

Chen, Y., Yang, K., He, J., Qin, J., Shi, J., Du, J., and He, Q.: Improving land surface temperature modeling for dry land of China, J. Geophys. Res.-Atmos., 116, D20104, 2011.

Chen, Y., Ryder, J., Bastrikov, V., McGrath, M. J., Naudts, K., Otto, J., Ottlé, C., Peylin, P., Polcher, J., Valade, A., Black, A., Elbers, J. A., Moors, E., Foken, T., van Gorsel, E., Haverd, V., Heinesch, B., Tiedemann, F., Knohl, A., Launiainen, S., Loustau, D., Ogée, J., Vessala, T., and Luyssaert, S.: Evaluating the performance of land surface model ORCHIDEE-CAN v1.0 on water and energy flux estimation with a single- and multi-layer energy budget scheme, Geosci. Model Dev., 9, 2951–2972, 2016.

Chen, Y.-Y., Gardiner, B., Pasztor, F., Blennow, K., Ryder, J., Valade, A., Naudts, K., Otto, J., McGrath, M. J., Planque, C., and Luyssaert, S.: Simulating damage for wind storms in the land surface model ORCHIDEE-CAN (revision 4262), Geosci. Model Dev., 11, 771–791, 2018.

Compo, G. P., Whitaker, J. S., Sardeshmukh, P. D., Matsui, N., Allan, R. J., Yin, X., Gleason, B. E., Vose, R. S., Rutledge, G., Bessemoulin, P., Brönnimann, S., Brunet, M., Crouthamel, R. I., Grant, A. N., Groisman, P. Y., Jones, P. D., Kruk, M. C., Kruger, A. C., Marshall, G. J., Maugeri, M., Mok, H. Y., Nordli, Ø., Ross, T. F., Trigo, R. M., Wang, X. L., Woodruff, S. D., and Worley, S. J.: The Twentieth Century Reanalysis Project, Q. J. Roy. Meteorol. Soc., 137, 1–28, 2011.

Cook, E. R.: A time series analysis approach to tree ring standardization, PhD thesis, 1985.

Cook, E. R. and Kairiukstis, L. A.: Methods of Dendrochronology, Springer Netherlands, Dordrecht, 1990.

Cook, E. R., Briffa, K. R., Meko, D. M., Graybill, D. A., and Funkhouser, G.: The “segment length curse” in long tree-ring chronology development for palaeoclimatic studies, The Holocene, 5, 229–237, 1995.

D'Arrigo, R., Wilson, R., Liepert, B., and Cherubini, P.: On the “Divergence Problem” in Northern Forests: A review of the tree-ring evidence and possible causes, Global Planet. Change, 60, 289–305, 2008.

Davi, N., D'Arrigo, R., Jacoby, G. C., Buckley, B., and Kobayashi, O.: Shiretoko – PCGN – ITRDB JAPA011 – RWL, NOAA National Centers for Environmental Information, 2011.

De Kauwe, M. G., Medlyn, B. E., Zaehle, S., Walker, A. P., Dietze, M. C., Hickler, T., Jain, A. K., Luo, Y., Parton, W. J., Prentice, I. C., Smith, B., Thornton, P. E., Wang, S., Wang, Y.-P., Wårlind, D., Weng, E., Crous, K. Y., Ellsworth, D. S., Hanson, P. J., Seok Kim, H., Warren, J. M., Oren, R., and Norby, R. J.: Forest water use and water use efficiency at elevated CO2: a model-data intercomparison at two contrasting temperate forest FACE sites, Glob. Change Biol., 19, 1759–1779, 2013.

De Schepper, V. and Steppe, K.: Development and verification of a water and sugar transport model using measured stem diameter variations, J. Exp. Bot., 61, 2083–2099, 2010.

Deleuze, C. and Houllier, F.: Simple process-based xylem growth model for describing wood microdensitometric profiles, J. Theor. Biol., 193, 99–113, 1998.

Deleuze, C., Pain, O., Dhôte, J.-F., and Hervé, J.-C.: A flexible radial increment model for individual trees in pure even-aged stands, Ann. Forest Sci., 61, 327–335, 2004.

Demarty, J., Chevallier, F., Friend, A. D., Viovy, N., Piao, S., and Ciais, P.: Assimilation of global MODIS leaf area index retrievals within a terrestrial biosphere model, Geophys. Res. Lett., 34, L15402, 2007.

Drew, D. M., Downes, G. M., and Battaglia, M.: CAMBIUM, a process-based model of daily xylem development in Eucalyptus, J. Theor. Biol., 264, 395–406, 2010.

Ducoudré, N. I., Laval, K., and Perrier, A.: SECHIBA, a New Set of Parameterizations of the Hydrologic Exchanges at the Land-Atmosphere Interface within the LMD Atmospheric General Circulation Model, J. Climate, 6, 248–273, 1993.

Dufrêne, E., Davi, H., François, C., Le Maire, G., Le Dantec, V., and Granier, A.: Modelling carbon and water cycles in a beech forest. Part I: Model description and uncertainty analysis on modelled NEE, Ecol. Model., 185, 407–436, 2005.

Eyring, V., Lamarque, J.-F., Hess, P., Arfeuille, F., Bowman, K., Chipperfield, M. P., Duncan, B., Fiore, A., Gettelman, A., and Giorgetta, M. A.: Overview of IGAC/SPARC Chemistry-Climate Model Initiative (CCMI) community simulations in support of upcoming ozone and climate assessments, SPARC newsletter, 40, 48–66, 2013.

Farquhar, G. D.: Models of Integrated Photosynthesis of Cells and Leaves, Philos. T. R. Soc. B, 323, 357–367, 1989.

Fisher, R. A., Muszala, S., Verteinstein, M., Lawrence, P., Xu, C., McDowell, N. G., Knox, R. G., Koven, C., Holm, J., Rogers, B. M., Spessa, A., Lawrence, D., and Bonan, G.: Taking off the training wheels: the properties of a dynamic vegetation model without climate envelopes, CLM4.5(ED), Geosci. Model Dev., 8, 3593–3619, 2015.

Franklin, O., Johansson, J., Dewar, R. C., Dieckmann, U., McMurtrie, R. E., Brännström, Å., and Dybzinski, R.: Modeling carbon allocation in trees: a search for principles, Tree Physiol., 32, 648–666, 2012.

Friedlingstein, P., Betts, R., Bopp, L., Bloh, W. V., Brovkin, V., Doney, S., Eby, M., Fung, I., Govindasamy, B., John, J., Jones, C., Joos, F., Kato, T., Kawamiya, M., Knorr, W., Lindsay, K., Matthews, H. D., Raddatz, T., Rayner, P., Reick, C., Roeckner, E., Schnitzler, K. G., Schnurr, R., Strassmann, K., Thompson, S., Weaver, A. J., Yoshikawa, C., and Zeng, N.: Climate–carbon cycle feedback analysis, results from the C4MIP model intercomparison, J. Climate, 19, 3337–3353, 2006.

Friedlingstein, P., Meinshausen, M., Arora, V. K., Jones, C. D., Anav, A., Liddicoat, S. K., and Knutti, R.: Uncertainties in CMIP5 climate projections due to carbon cycle feedbacks, J. Climate, 27, 511–526, 2014.

Friend, A. D., Eckes-Shephard, A. H., Fonti, P., Rademacher, T. T., Rathgeber, C. B., Richardson, A. D., and Turton, R. H.: On the need to consider wood formation processes in global vegetation models and a suggested approach, Ann. Forest Sci., 76, 49, 2019.

Fritts, H. C.: Tree rings and climate, Elsevier, 2012.

Fritts, H. C., Shashkin, A., and Downes, G. M.: A simulation model of conifer ring growth and cell structure, in: Tree-ring analysis: biological, methodological and environmental aspects, CABI Publishing, Wallingford, UK, 3–32, 1999.

Griggs, C., Kuniholm, P., and Petrucci, A.: Devecikonak Forest – QUSP – ITRDB TURK027 – RWL, NOAA National Centers for Environmental Information, 2006.

Grissino-Mayer, H. D. and Fritts, H. C.: The International Tree-Ring Data Bank: an enhanced global database serving the global scientific community, The Holocene, 7, 235–238, 1997.

Haverd, V., Smith, B., Cook, G. D., Briggs, P. R., Nieradzik, L., Roxburgh, S. H., Liedloff, A., Meyer, C. P., and Canadell, J. G.: A stand-alone tree demography and landscape structure module for Earth system models, Geophys. Res. Lett., 40, 5234–5239, 2013.

Hayat, A., Hacket-Pain, A. J., Pretzsch, H., Rademacher, T. T., and Friend, A. D.: Modeling Tree Growth Taking into Account Carbon Source and Sink Limitations, Front. Plant Sci., 8, 182, 2017.

Hemming, D., Fritts, H., Leavitt, S. W., Wright, W., Long, A., and Shashkin, A.: Modelling tree-ring δ13C, Dendrochronologia, 19, 23–38, 2001.

Hess, C., Niemeyer, T., Fichtner, A., Jansen, K., Kunz, M., Maneke, M., von Wehrden, H., Quante, M., Walmsley, D., von Oheimb, G., and Härdtle, W.: Anthropogenic nitrogen deposition alters growth responses of European beech (Fagus sylvatica L.) to climate change, Environ. Pollut., 233, 92–98, 2018.

Hirata, R., Hirano, T., Saigusa, N., Fujinuma, Y., Inukai, K., Kitamori, Y., Takahashi, Y., and Yamamoto, S.: Seasonal and interannual variations in carbon dioxide exchange of a temperate larch forest, Agr. Forest Meteorol., 147, 110–124, 2007.

Hölttä, T., Vesala, T., Sevanto, S., Perämäki, M., and Nikinmaa, E.: Modeling xylem and phloem water flows in trees according to cohesion theory and Münch hypothesis, Trees, 20, 67–78, 2006.

Hughes, M. K., Swetnam, T. W., and Diaz, H. F.: Dendroclimatology, vol. 11, in: Developments in Paleoenvironmental Research, Springer Netherlands, Dordrecht, 2011.

IPCC: Climate Change 2013: The Physical Science Basis. Contribution of Working Group I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change, Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, 2013.

Jeong, J., Barichivich, J., Peylin, P., Haverd, V., McGrath, M. J., Vuichard, N., Evans, M. N., Babst, F., and Luyssaert, S.: Source codes for gmd-2020-29, Version 1.1, Zenodo [code], 2021.

Jiang, X., Huang, J.-G., Cheng, J., Dawson, A., Stadt, K. J., Comeau, P. G., and Chen, H. Y.: Interspecific variation in growth responses to tree size, competition and climate of western Canadian boreal mixed forests, Sci. Total Environ., 631–632, 1070–1078, 2018.

Kalnay, E., Kanamitsu, M., Kistler, R., Collins, W., Deaven, D., Gandin, L., Iredell, M., Saha, S., White, G., Woollen, J., Zhu, Y., Leetmaa, A., Reynolds, R., Chelliah, M., Ebisuzaki, W., Higgins, W., Janowiak, J., Mo, K. C., Ropelewski, C., Wang, J., Jenne, R., and Joseph, D.: The NCEP/NCAR 40-Year Reanalysis Project, B. Am. Meteorol. Soc., 77, 437–471, 1996.

Keeling, C. D., Chin, J. F. S., and Whorf, T. P.: Increased activity of northern vegetation inferred from atmospheric CO2 measurements, Nature, 382, 146–149, 1996.

Klesse, S., Babst, F., Lienert, S., Spahni, R., Joos, F., Bouriaud, O., Carrer, M., Di Filippo, A., Poulter, B., Trotsiuk, V., Wilson, R., and Frank, D. C.: A Combined Tree Ring and Vegetation Model Assessment of European Forest Growth Sensitivity to Interannual Climate Variability, Global Biogeochem. Cy., 32, 1226–1240, 2018.

Kolus, H. R., Huntzinger, D. N., Schwalm, C. R., Fisher, J. B., McKay, N., Fang, Y., Michalak, A. M., Schaefer, K., Wei, Y., Poulter, B., Mao, J., Parazoo, N. C., and Shi, X.: Land carbon models underestimate the severity and duration of drought's impact on plant productivity, Sci. Reports, 9, 2758, 2019.

Krinner, G., Viovy, N., de Noblet-Ducoudré, N., Ogée, J., Polcher, J., Friedlingstein, P., Ciais, P., Sitch, S., and Prentice, I. C.: A dynamic global vegetation model for studies of the coupled atmosphere-biosphere system, Global Biogeochem. Cy., 19, GB1015, 2005.

Krusic, P. and Cook, E.: Above Gheri – ABSB – ITRDB NEPA003 – RWL, NOAA National Centers for Environmental Information, 2005.

Laloyaux, P., de Boisseson, E., Balmaseda, M., Bidlot, J.-R., Broennimann, S., Buizza, R., Dalhgren, P., Dee, D., Haimberger, L., Hersbach, H., Kosaka, Y., Martin, M., Poli, P., Rayner, N., Rustemeier, E., and Schepers, D.: CERA-20C: A Coupled Reanalysis of the Twentieth Century, J. Adv. Model. Earth Sy., 10, 1172–1195, 2018.

LaMarche, V. C., Graybill, D. A., Fritts, H. C., and Rose, M. R.: Increasing Atmospheric Carbon Dioxide: Tree Ring Evidence for Growth Enhancement in Natural Vegetation, Science, 225, 1019–1021, 1984.

Leuzinger, S., Manusch, C., Bugmann, H., and Wolf, A.: A sink-limited growth model improves biomass estimation along boreal and alpine tree lines, Global Ecol. Biogeogr., 22, 924–932, 2013.

Levesque, M., Andreu-Hayles, L., Smith, W. K., Williams, A. P., Hobi, M. L., Allred, B. W., and Pederson, N.: Tree-ring isotopes capture interannual vegetation productivity dynamics at the biome scale, Nat. Commun., 10, 742, 2019.

Li, G., Harrison, S. P., Prentice, I. C., and Falster, D.: Simulation of tree-ring widths with a model for primary production, carbon allocation, and growth, Biogeosciences, 11, 6711–6724, 2014.

Lu, C. and Tian, H.: Global nitrogen and phosphorus fertilizer use for agriculture production in the past half century: shifted hot spots and nutrient imbalance, Earth Syst. Sci. Data, 9, 181–192, 2017.

Luo, Y. Q., Randerson, J. T., Abramowitz, G., Bacour, C., Blyth, E., Carvalhais, N., Ciais, P., Dalmonech, D., Fisher, J. B., Fisher, R., Friedlingstein, P., Hibbard, K., Hoffman, F., Huntzinger, D., Jones, C. D., Koven, C., Lawrence, D., Li, D. J., Mahecha, M., Niu, S. L., Norby, R., Piao, S. L., Qi, X., Peylin, P., Prentice, I. C., Riley, W., Reichstein, M., Schwalm, C., Wang, Y. P., Xia, J. Y., Zaehle, S., and Zhou, X. H.: A framework for benchmarking land models, Biogeosciences, 9, 3857–3874, 2012.

Luyssaert, S.: ORCHIDEE_CN_CAN_r5698, Institut Pierre Simon Laplace (IPSL) [code], 2019.

Magnani, F., Mencuccini, M., Borghetti, M., Berbigier, P., Berninger, F., Delzon, S., Grelle, A., Hari, P., Jarvis, P. G., Kolari, P., Kowalski, A. S., Lankreijer, H., Law, B. E., Lindroth, A., Loustau, D., Manca, G., Moncrieff, J. B., Rayment, M., Tedeschi, V., Valentini, R., and Grace, J.: The human footprint in the carbon cycle of temperate and boreal forests, Nature, 447, 849–851, 2007.

McGuffie, K. and Henderson‐Sellers, A.: Practical Climate Modelling, in: A Climate Modelling Primer, John Wiley & Sons, Ltd, 2005.

Melvin, T.: Historical growth rates and changing climatic sensitivity of boreal conifers, PhD thesis, University of East Anglia, 2004.

Melvin, T.: Vytamoselka – PISY – ITRDB FINL055 – RWL, NOAA National Centers for Environmental Information, 2005.

Mencuccini, M., Martínez‐Vilalta, J., Vanderklein, D., Hamid, H. A., Korakaki, E., Lee, S., and Michiels, B.: Size‐mediated ageing reduces vigour in trees, Ecol. Lett., 8, 1183–1190, 2005.

Merganičová, K., Merganič, J., Lehtonen, A., Vacchiano, G., Sever, M. Z. O., Augustynczik, A. L. D., Grote, R., Kyselová, I., Mäkelä, A., Yousefpour, R., Krejza, J., Collalti, A., and Reyer, C. P. O.: Forest carbon allocation modelling under climate change, Tree Physiol., 39, 1937–1960, 2019.

Meriläinen, J., Lindholm, M., Timonen, M., and Kolström, T.: Kukelo Ahmovaara Juuka – PISY – ITRDB FINL052 – RWL, NOAA National Centers for Environmental Information, 2004.

Misson, L.: MAIDEN: a model for analyzing ecosystem processes in dendroecology, Can. J. Forest Res., 34, 874–887, 2004.

Moorcroft, P. R., Hurtt, G. C., and Pacala, S. W.: A method for scaling vegetation dynamics: The ecosystem demography model (ED), Ecol. Monogr., 71, 557–586, 2001.

Nash, S. E.: Fundamentals of tree-ring research. James H. Speer., Geoarchaeology, 26, 453–455, 2011.

National Oceanic and Atmospheric Administration (NOAA): International Tree-Ring Data Bank (ITRDB), Version: 7.22, NOAA [data set] (last access: 15 September 2021), 2020.

Naudts, K., Ryder, J., McGrath, M. J., Otto, J., Chen, Y., Valade, A., Bellasen, V., Berhongaray, G., Bönisch, G., Campioli, M., Ghattas, J., De Groote, T., Haverd, V., Kattge, J., MacBean, N., Maignan, F., Merilä, P., Penuelas, J., Peylin, P., Pinty, B., Pretzsch, H., Schulze, E. D., Solyga, D., Vuichard, N., Yan, Y., and Luyssaert, S.: A vertically discretised canopy description for ORCHIDEE (SVN r2290) and the modifications to the energy, water and carbon fluxes, Geosci. Model Dev., 8, 2035–2065, 2015.

Nehrbass-Ahles, C., Babst, F., Klesse, S., Nötzli, M., Bouriaud, O., Neukom, R., Dobbertin, M., and Frank, D.: The influence of sampling design on tree-ring-based quantification of forest growth, Glob. Change Biol., 20, 2867–2885, 2014.

Neuwirth, B., Schweingruber, F. H., and Winiger, M.: Spatial patterns of central European pointer years from 1901 to 1971, Dendrochronologia, 24, 79–89, 2007.

Nicklen, E. F., Roland, C. A., Csank, A. Z., Wilmking, M., Ruess, R. W., and Muldoon, L. A.: Stand basal area and solar radiation amplify white spruce climate sensitivity in interior Alaska: Evidence from carbon isotopes and tree rings, Glob. Change Biol., 25, 911–926, 2019.

Nickless, A., Scholes, R. J., and Archibald, S.: A method for calculating the variance and confidence intervals for tree biomass estimates obtained from allometric equations, S. Afr. J. Sci., 107, 86–95, 2011.

Oliver, C. D. and Larson, B. C.: Forest stand dynamics, Wiley, New York, 1996.

Ols, C., Girardin, M. P., Hofgaard, A., Bergeron, Y., and Drobyshev, I.: Monitoring Climate Sensitivity Shifts in Tree-Rings of Eastern Boreal North America Using Model-Data Comparison, Ecosystems, 21, 1042–1057, 2018.

PAGES 2k Consortium: Continental-scale temperature variability during the past two millennia, Nat. Geosci., 6, 339–346, 2013.

Panthi, S., Fan, Z.-X., van der Sleen, P., and Zuidema, P. A.: Long-term physiological and growth responses of Himalayan fir to environmental change are mediated by mean climate, Glob. Change Biol., 26, 1778–1794, 2020.

Pappas, C., Maillet, J., Rakowski, S., Baltzer, J. L., Barr, A. G., Black, T. A., Fatichi, S., Laroque, C. P., Matheny, A. M., Roy, A., Sonnentag, O., and Zha, T.: Aboveground tree growth is a minor and decoupled fraction of boreal forest carbon input, Agr. Forest Meteorol., 290, 108030, 2020.

Paris Agreements: Paris agreement, in: Report of the Conference of the Parties to the United Nations Framework Convention on Climate Change, 21st Session, 2015, Paris, HeinOnline, vol. 4, p. 2017, 2015. a

Peylin, P., Bacour, C., MacBean, N., Leonard, S., Rayner, P., Kuppel, S., Koffi, E., Kane, A., Maignan, F., Chevallier, F., Ciais, P., and Prunet, P.: A new stepwise carbon cycle data assimilation system using multiple data streams to constrain the simulated land surface carbon cycle, Geosci. Model Dev., 9, 3321–3346,, 2016. a

Rammig, A., Wiedermann, M., Donges, J. F., Babst, F., von Bloh, W., Frank, D., Thonicke, K., and Mahecha, M. D.: Coincidences of climate extremes and anomalous vegetation responses: comparing tree ring patterns to simulated productivity, Biogeosciences, 12, 373–385, 2015.

Randerson, J. T., Hoffman, F. M., Thornton, P. E., Mahowald, N. M., Lindsay, K., Lee, Y., Nevison, C. D., Doney, S. C., Bonan, G., Stöckli, R., Covey, C., Running, S. W., and Fung, I. Y.: Systematic assessment of terrestrial biogeochemistry in coupled climate-carbon models, Glob. Change Biol., 15, 2462–2484, 2009.

Rollinson, C. R., Liu, Y., Raiho, A., Moore, D. J. P., McLachlan, J., Bishop, D. A., Dye, A., Matthes, J. H., Hessl, A., Hickler, T., Pederson, N., Poulter, B., Quaife, T., Schaefer, K., Steinkamp, J., and Dietze, M. C.: Emergent climate and CO2 sensitivities of net primary productivity in ecosystem models do not agree with empirical data in temperate forests of eastern North America, Glob. Change Biol., 23, 2755–2767, 2017.

Ryder, J., Polcher, J., Peylin, P., Ottlé, C., Chen, Y., van Gorsel, E., Haverd, V., McGrath, M. J., Naudts, K., Otto, J., Valade, A., and Luyssaert, S.: A multi-layer land surface energy budget model for implicit coupling with global atmospheric simulations, Geosci. Model Dev., 9, 223–245, 2016.

Sato, H., Itoh, A., and Kohyama, T.: SEIB–DGVM: A new Dynamic Global Vegetation Model using a spatially explicit individual-based approach, Ecol. Model., 200, 279–307, 2007.

Schulman, E.: Longevity under Adversity in Conifers, Science, 119, 396–399, 1954.

Schweingruber, F. H.: Drimmie Schottland – PISY – ITRDB BRIT021 – RWL, NOAA National Centers for Environmental Information, 1995.

Schweingruber, F. H.: Schweingruber – El Quintar – PISY – ITRDB SPAI006, 2020.

Shen, Y., Fukatsu, E., Muraoka, H., Saitoh, T. M., Hirano, Y., and Yasue, K.: Climate responses of ring widths and radial growth phenology of Betula ermanii, Fagus crenata and Quercus crispula in a cool temperate forest in central Japan, Trees, 34, 679–692, 2020.

Smith, B.: LPJ-GUESS-an ecosystem modelling framework, Department of Physical Geography and Ecosystems Analysis, INES, Sölvegatan, 12, 22362, 2001.

Steppe, K., De Pauw, D. J. W., Lemeur, R., and Vanrolleghem, P. A.: A mathematical model linking tree sap flow dynamics to daily stem diameter fluctuations and radial stem growth, Tree Physiol., 26, 257–273, 2006.

Stine, A. R.: Global demonstration of local Liebig's law behavior for tree‐ring reconstructions of climate, Paleoceanogr. Paleoclimatol., 34, 203–216, 2019.

Strumia, G.: Weinerwald – QUPE – ITRDB AUST112 – RWL, NOAA National Centers for Environmental Information, 2005.

Teets, A., Fraver, S., Weiskittel, A. R., and Hollinger, D. Y.: Quantifying climate–growth relationships at the stand level in a mature mixed‐species conifer forest, Glob. Change Biol., 24, 3587–3602, 2018.

Temme, A. A., Liu, J. C., Cornwell, W. K., Cornelissen, J. H. C., and Aerts, R.: Winners always win: growth of a wide range of plant species from low to future high CO2, Ecol. Evolut., 5, 4949–4961, 2015.

Tessier, L.: Mimet (mt. L' Eloile) – PISY – ITRDB FRAN4 – RWL, NOAA National Centers for Environmental Information, 1996.

Vaganov, E. A., Hughes, M. K., and Shashkin, A. V.: Growth dynamics of conifer tree rings: images of past and future environments, vol. 183, Springer, New York, 2006.

Viovy, N.: CRUNCEP data set Version 5.3.2, Research Data Archive at the National Center for Atmospheric Research, Computational and Information Systems Laboratory, 2016.

Vuichard, N., Messina, P., Luyssaert, S., Guenet, B., Zaehle, S., Ghattas, J., Bastrikov, V., and Peylin, P.: Accounting for carbon and nitrogen interactions in the global terrestrial ecosystem model ORCHIDEE (trunk version, rev 4999): multi-scale evaluation of gross primary production, Geosci. Model Dev., 12, 4751–4779, 2019.

Wilkinson, S., Ogée, J. J., Domec, J.-C. C., Rayment, M., and Wingate, L.: Biophysical modelling of intra-ring variations in tracheid features and wood density of Pinus pinaster trees exposed to seasonal droughts, Tree Physiol., 35, 305–318, 2015.

Williams, M., Richardson, A. D., Reichstein, M., Stoy, P. C., Peylin, P., Verbeeck, H., Carvalhais, N., Jung, M., Hollinger, D. Y., Kattge, J., Leuning, R., Luo, Y., Tomelleri, E., Trudinger, C. M., and Wang, Y.-P.: Improving land surface models with FLUXNET data, Biogeosciences, 6, 1341–1359, 2009.

Wilson, B. F. and Howard, R. A.: A computer model for cambial activity, Forest Sci., 14, 77–90, 1968.

Wolf, A., Ciais, P., Bellassen, V., Delbart, N., Field, C. B., and Berry, J. A.: Forest biomass allometry in global land surface models, Global Biogeochem. Cy., 25, GB3015, 2011.

Yue, C., Ciais, P., Cadule, P., Thonicke, K., Archibald, S., Poulter, B., Hao, W. M., Hantson, S., Mouillot, F., Friedlingstein, P., Maignan, F., and Viovy, N.: Modelling the role of fires in the terrestrial carbon balance by incorporating SPITFIRE into the global vegetation model ORCHIDEE – Part 1: simulating historical global burned area and fire regimes, Geosci. Model Dev., 7, 2747–2767, 2014.

Zaehle, S. and Friend, A. D.: Carbon and nitrogen cycle dynamics in the O‐CN land surface model: 1. Model description, site‐scale evaluation, and sensitivity to parameter estimates, Global Biogeochem. Cy., 24, GB1005, 2010.

Zhang, Z., Babst, F., Bellassen, V., Frank, D., Launois, T., Tan, K., Ciais, P., and Poulter, B.: Converging Climate Sensitivities of European Forests Between Observed Radial Tree Growth and Vegetation Models, Ecosystems, 21, 410–425, 2018.

Zhao, S., Pederson, N., D'Orangeville, L., HilleRisLambers, J., Boose, E., Penone, C., Bauer, B., Jiang, Y., and Manzanedo, R. D.: The International Tree-Ring Data Bank (ITRDB) revisited: Data availability and global ecological representativity, J. Biogeogr., 46, 355–368, 2019.

Zuidema, P. A., Vlam, M., and Chien, P. D.: Ages and long-term growth patterns of four threatened Vietnamese tree species, Trees, 25, 29–38, 2011.

Zuidema, P. A., Poulter, B., and Frank, D. C.: A Wood Biology Agenda to Support Global Vegetation Modelling, Trends Plant Sci., 23, 1006–1015, 2018.

Zuidema, P. A., Heinrich, I., Rahman, M., Vlam, M., Zwartsenberg, S. A., and van der Sleen, P.: Recent CO2 rise has modified the sensitivity of tropical tree growth to rainfall and temperature, Glob. Change Biol., 26, 4028–4041, 2020.

Short summary
We have proposed and evaluated the use of four benchmarks that leverage tree-ring width observations to provide more nuanced verification targets for land-surface models (LSMs), which currently lack a long-term benchmark for forest ecosystem functioning. Using relatively unbiased European biomass network datasets, we identify the extent to which presumed biases in the much larger International Tree-Ring Data Bank might degrade the validation of LSMs.