Inaccurate parameter estimation is a significant source
of uncertainty in complex terrestrial biosphere models. Model parameters may
have large spatial variability, even within a vegetation type. Model
uncertainty from parameters can be significantly reduced by model–data
fusion (MDF), which, however, is difficult to implement over a large region
with traditional methods due to the high computational cost. This study
proposed a hybrid modeling approach that couples a terrestrial biosphere
model with a data-driven machine learning method, which is able to consider
both satellite information and the physical mechanisms. We developed a
two-step framework to estimate the essential parameters of the revised
Integrated Biosphere Simulator (IBIS) pixel by pixel using the
satellite-derived leaf area index (LAI) and gross primary productivity (GPP)
products as “true values.” The first step was to estimate the optimal
parameters for each sample using a modified adaptive surrogate modeling
algorithm (MASM). We applied the Gaussian process regression algorithm (GPR)
as a surrogate model to learn the relationship between model parameters and
errors. In our second step, we built an extreme gradient boosting (XGBoost)
model between the optimized parameters and local environmental variables.
The trained XGBoost model was then used to predict optimal parameters
spatially across the deciduous forests in the eastern United States. The
results showed that the parameters were highly variable spatially and quite
different from the default values over forests, and the simulation errors of
the GPP and LAI could be markedly reduced with the optimized parameters. The
effectiveness of the optimized model in estimating GPP, ecosystem
respiration (ER), and net ecosystem exchange (NEE) was also tested through
site validation. The optimized model reduced the root mean square error
(RMSE) from 7.03 to 6.22 gC m

Accurate quantification of the terrestrial carbon budget is crucial for understanding the global carbon cycle and biosphere–atmosphere interactions, and for informing climate projections (Fernández-Martínez et al., 2018; Piao et al., 2020). Land surface models (LSMs) built on process-based mechanisms of atmosphere–biosphere interactions are often used to simulate the behavior of the terrestrial carbon cycle in response to a changing climate (Peaucelle et al., 2019). These models typically use a large number of parameters. Parameter values prescribed from theoretical assumptions, empirical analysis, or field measurements are prone to substantial uncertainties, which can lead to inaccurate model projections (Barman et al., 2014; Keenan et al., 2012; Kuppel et al., 2014). Parameter estimation is becoming more challenging, especially as greater detail is introduced to enhance the realism and interpretability of LSMs (Famiglietti et al., 2021).

Model–data fusion (MDF) is an increasingly used method that can be leveraged
to reduce the model–data misfit by calibrating parameters. Researchers have
mainly used site and sample measurements, raw reflectivity observations,
and satellite-based products to estimate parameters in complex terrestrial
biosphere models, and significant progress has been made in the integration
of univariate observations (MacBean et al., 2016). However, due to the
complex relationship between model variables, univariate assimilation is
often not enough to constrain the vegetation parameters involved in multiple surface processes, and it is
still necessary to introduce other data streams to increase the constraints
on model parameters (Fernández-Martínez et al., 2018; Liu et al.,
2015; Schürmann et al., 2016; Zobitz et al., 2014). For example, soil
moisture observations can be regarded as an additional constraint of a plant
functional type (PFT) due to the tight coupling of carbon and water cycles
in vegetation photosynthesis (Scholze et al., 2016), and some remote sensing
products, such as the fraction of absorbed photosynthetically active
radiation (FAPAR) and leaf area index (LAI), can provide phenology information
for constraining long-term vegetation dynamics. Joint constraints of LAI or FAPAR
products and in situ observations of carbon fluxes or atmospheric CO

The parameter estimation of complex models has been well studied at the site scale. Many researchers have used observations to optimize parameters at individual sites and then applied the optimized parameters to regions. However, when a PFT covers a broad area, ecosystem characteristics, density, or disturbance history can vary substantially across the region. In this case, site-scale parameters with a small footprint cannot be considered spatially representative, and the relationship between observations and models may not be spatially scalable (Raoult et al., 2016; Xiao et al., 2014; Zhou et al., 2020). Finding a globally applicable parameter scheme is difficult for LSMs because of the high computational cost (Gong and Duan, 2017). Machine learning (ML) approaches are flexible in adapting to an increasing stream of geospatial products, making it easy to extract patterns and combine them with physical models as an additional source of information. They offer an opportunity to improve or accelerate parameterizations by integrating model simulations with multiple observations or high-quality spatial products more closely. For example, Chaney et al. (2016) calibrated parameters at 85 eddy covariance flux sites and then obtained spatial parameters using extra trees combined with local environmental characteristics. Although the parameters showed significant spatial variability, their study did not verify the performance of the optimized model. Tao et al. (2020) constrained parameters at the site scale to improve the soil organic carbon simulation and then extended the optimized parameters to the United States using a neural network. Their results showed that the model error was significantly reduced when the spatial heterogeneity of optimal parameters was considered.
Calibrating parameters at flux sites can ensure parameter accuracy at each site, but the limited number of training samples is likely to cause overfitting, which makes it difficult to guarantee accuracy when the resulting optimal parameters are extended to a broad region using ML.

To date, few researchers have conducted pixel-level parameterization because
parameter optimization often depends on a large number of parameter
samples and model runs, especially when using the Markov chain Monte
Carlo (MCMC) method as the optimization algorithm. MCMC is usually applied
to model parameter calibration to obtain the optimal posterior probability
distribution of parameters (Yuan et al., 2012; Safta et al., 2015). It
typically requires thousands of model simulations, which are excessively
expensive for complex LSMs that may take hours for each model simulation
(Fer et al., 2018). ML can be an innovative method to conduct surrogate
modeling or emulation in parameter optimization (J. Li et al., 2018;
Reichstein et al., 2019). ML is regarded as an effective method to speed up
model parameterizations and can make better use of the abundant spatial
information provided by the multisource high-precision remote sensing
products. Researchers have demonstrated the feasibility of surrogate modeling and
indicated that once the ML emulator is trained well, the optimization speed
can be increased by an order of magnitude without much loss of accuracy
(Sawada and Koike, 2014; Fer et al., 2018). For example, Gong and Duan (2017)
pointed out that using a surrogate model can reduce the number of model runs
from 10

Some researchers have used an ML emulator to find unknown regional parameters, but only at a very coarse resolution (Dagon et al., 2020; J. Li et al., 2018). This is because a large number of model runs are still needed to train the surrogate model to ensure its accuracy, even though adaptive surrogate modeling-based optimization algorithms have been developed to reduce the number of initial training samples (J. Li et al., 2018; Gong and Duan, 2017). In this paper, we explored a two-step method that combines ML and a physical model to speed up the calibration of spatial parameters, made full use of high-quality remote sensing products to calibrate the model at each pixel, and carried out a study on the deciduous forests (DF) in the eastern United States. We first performed pixel-level parameter calibration for a set of sample pixels in our region, running a surrogate in place of the terrestrial model, and then extended the optimal parameters obtained from the samples to the whole region using an ML approach with several environmental variables. In this way, our first step provides more training samples for the ML model than would be available if ML were applied directly to site-calibrated parameters. Compared to studies that only use surrogate models to optimize parameters, our second-step ML extension makes it easier to obtain large-scale, high-resolution spatial distributions of parameters.

This paper is organized as follows. Section 2 briefly introduces the terrestrial model and related data sources. Section 3 describes the two-step pixel-by-pixel regional parameterization algorithm and the experimental setting. Section 4 presents the optimized parameters and a spatial analysis of carbon fluxes before and after the optimization. Uncertainty analysis and future work are described in Sect. 5, followed by a conclusion in Sect. 6.

The Integrated Biosphere Simulator (IBIS) model integrates many land-surface ecosystem processes into a complex physical mechanism model, which is divided into the land surface, vegetation phenology, carbon balance, and vegetation dynamic modules (Foley et al., 1996; Yuan et al., 2014). It has a hierarchical structure operating on 60 min or 1-year time steps. IBIS considers three snow layers and six soil layers in each pixel and determines PFTs based on different ecological characteristics. Detailed information about IBIS is available in Foley et al. (1996) and Kucharik et al. (2000).

A simple function is used in IBIS to describe the relationship between leaf
behaviors and phenological status, which forecasts the onset of budburst
when suitable temperatures are reached. It describes the defoliation when
the 10 d average daily air temperature is below 5

Biomass allocation in the IBIS model is updated annually, and therefore it
is difficult to accurately describe the influence of meteorological factors
on LAI and to capture the detailed dynamics of leaves (Kucharik et al.,
2006). Capturing the daily dynamics of LAI requires changes in the leaf
biomass pool each day. For this process, we incorporated into the original
IBIS model part of the carbon allocation scheme of the data assimilation
linked ecosystem carbon (DALEC) model (Chuter et al., 2015), a simple box
model that tracks six carbon pools daily for deciduous forests. The
photosynthesis rate of deciduous forests follows the Farquhar equation
(Farquhar et al., 1980) and is expressed in the IBIS model as the minimum of
the light-limited and Rubisco-limited rates of photosynthesis, from which we
obtain gross primary productivity (GPP) at each time step on a pixel basis.
Photosynthates stored in daily GPP will be allocated to the foliage
(

We simulated the LAI and carbon fluxes for the deciduous forests in the
eastern United States from 2000 to 2019 at a spatial resolution of
0.05

For model initialization, the improved IBIS was spun up for 50 years by cyclically repeating the meteorological forcing of 2000. The final soil and carbon states of the initialization run reached a quasi-equilibrium at each pixel and were saved as inputs for the subsequent transient simulation. Default values for the initial carbon pools (six pools for DF) used in DALEC modeling followed the scheme of Chuter et al. (2015).
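The spin-up logic can be illustrated with a toy example. The sketch below is not the IBIS code: it assumes a single hypothetical carbon pool with first-order turnover, driven by one year of invented monthly inputs that is repeated for 50 cycles, and it reports how much the pool still changes between the last two cycles. The function name, the turnover rate, and the forcing values are all assumptions made for illustration.

```python
def spin_up(forcing_year, turnover_per_year=0.2, n_cycles=50, pool0=0.0):
    """Repeat one year of forcing n_cycles times and record the pool size.

    forcing_year: hypothetical monthly carbon inputs for the repeated year.
    turnover_per_year: hypothetical first-order loss rate of the pool.
    Returns the pool size at the end of every cycle.
    """
    pool = pool0
    end_of_cycle = []
    k_step = turnover_per_year / len(forcing_year)  # loss rate per time step
    for _ in range(n_cycles):
        for carbon_in in forcing_year:  # the same forcing in every cycle
            pool += carbon_in - k_step * pool
        end_of_cycle.append(pool)
    return end_of_cycle


# Invented monthly inputs with a summer peak (arbitrary units).
forcing_2000 = [0, 0, 5, 20, 60, 90, 100, 90, 60, 20, 5, 0]
states = spin_up(forcing_2000)

# A small relative change between the last two cycles indicates quasi-equilibrium.
relative_drift = abs(states[-1] - states[-2]) / states[-1]
print(f"relative change in the last cycle: {relative_drift:.2e}")
```

In a real model the final state of each pool would be written out and used to initialize the transient run; the drift check is one simple way to confirm that the chosen spin-up length is sufficient.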

Initial soil and vegetation properties and daily climate data are required
in IBIS. The soil parameters include soil sand and clay content (%),
which were gathered from available comprehensive, gridded Global Soil
Datasets for use in Earth System Models (GSDE) with a resolution of 30 arcsec (Shangguan et al., 2014). Forcing meteorological datasets are
maximum and minimum temperature (

Description of the datasets used in this study

The primary vegetation type in each pixel was identified by the
high-resolution land cover maps from the National Land Cover Database (NLCD,
30 m) (Homer et al., 2020, available on

This study used global LAI and GPP products from the Global Land Surface
Satellite (GLASS) suite as “observations” (“true values”) for parameter
calibration on a spatial scale. This dataset has long-time coverage, high
spatial resolution (500 m and 0.05

We collected 14 eddy covariance flux sites of deciduous forests in the
eastern United States from FLUXNET2015 (“Tier 1”) (Pastorello et al.,
2020,

We employed adaptive surrogate modeling in this study to establish a fast
and practical framework for the iterative optimization of a complex physical
model. We slightly improved the adaptive surrogate modeling-based
optimization parameter optimization and distribution estimation (ASMO–PODE)
method proposed by Gong and Duan (2017). We applied Bayesian optimization
twice to reduce the time cost of Monte Carlo iterations. The modified
procedure (Fig. S1) is as follows.

At the end of each Monte Carlo iteration in Bayes1, five representative
samples were selected adaptively from the posterior distribution of
parameters and were then used as inputs to the IBIS model to obtain the
corresponding

Adding a new module inevitably introduces more parameters (Sect. 2.1.1),
resulting in higher computational costs. Before calibration, we needed to
screen parameters and select the ones with the most influence on the target
variables for calibration. A few parameters related to photosynthesis,
carbon allocation, respiration, and water stress were considered in our
revised IBIS model. All the parameters were restricted to a predefined range
according to several published studies (Cunha et al., 2013; Varejão et
al., 2013; Bloom et al., 2016; Chuter et al., 2015; Lu et al., 2017). Table 2
lists all parameters, together with their prior ranges and
descriptions. Some parameters are independent of vegetation types by default
(e.g., alpha3 and theta3), while those related to turnover time or
allocation are generally considered to vary by vegetation type (e.g.,
tauleaf, aleaf, and aroot). We randomly chose

Description and prior ranges of IBIS–DALEC parameters.

Sensitivity of

We randomly selected another sample (

A pixel-by-pixel optimization was performed stepwise for each sample using the MASM method, with the GPP and LAI from the GLASS products as the observed values. First, the daily GPP simulated by the model was aggregated to 8 d intervals, and the MASM method was used for iterative optimization to obtain the posterior probability distributions of the GPP-sensitive parameters. Because p5, p14, and p20 have more pronounced effects on LAI (Fig. 1), the mean values of alpha3, theta3, beta3, and vmax were then used as fixed inputs for the second optimization step, with GLASS LAI as the reference. Such a stepwise optimization relies on a “parameter block” approach, in which each data stream optimizes only the parameters most strongly correlated with it (Wutzler and Carvalhais, 2014). We tested the effect of optimization order on a small number of samples, and the results showed no significant differences. A multivariable stepwise optimization performed by Alton (2013) also showed that optimization order did not distinctly change the results.
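The idea of replacing expensive model runs with a surrogate during iterative optimization can be sketched in a few lines. The example below is a deliberately simplified illustration, not the MASM implementation: it tunes a single parameter, uses a small hand-written Gaussian process with a fixed RBF kernel as the surrogate, and picks the next model run by searching random candidates for the lowest surrogate prediction instead of sampling the posterior with MCMC. The cost function `toy_cost`, the kernel length scale, and all other settings are hypothetical.

```python
import math
import random


def rbf(a, b, length=0.3):
    """Squared-exponential kernel with an assumed, fixed length scale."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def gp_fit(xs, ys, jitter=1e-6):
    """Fit a GP mean predictor (parameter value -> cost) to the model runs."""
    mean = sum(ys) / len(ys)
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    alpha = solve(K, [y - mean for y in ys])
    return lambda x: mean + sum(w * rbf(x, xi) for w, xi in zip(alpha, xs))


def toy_cost(p):
    """Hypothetical stand-in for one expensive model run and its misfit."""
    return (p - 0.63) ** 2


def adaptive_surrogate_minimize(cost, n_init=6, n_iter=10, seed=0):
    rng = random.Random(seed)
    xs = [i / (n_init - 1) for i in range(n_init)]  # initial design on [0, 1]
    ys = [cost(x) for x in xs]
    for _ in range(n_iter):
        predict = gp_fit(xs, ys)  # retrain the surrogate on all runs so far
        candidates = [rng.random() for _ in range(200)]
        # Skip candidates that almost duplicate an existing run.
        candidates = [c for c in candidates
                      if min(abs(c - x) for x in xs) > 0.01]
        nxt = min(candidates, key=predict)  # cheap search on the surrogate
        xs.append(nxt)
        ys.append(cost(nxt))  # one real model run per iteration
    best = min(range(len(ys)), key=ys.__getitem__)
    return xs[best], ys[best], len(ys)


best_x, best_cost, n_runs = adaptive_surrogate_minimize(toy_cost)
print(f"best parameter {best_x:.3f} after {n_runs} model runs")
```

The point of the design is that the 200 candidate evaluations per iteration hit only the surrogate, so the number of real model runs stays at the initial design plus one run per iteration.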

The optimal parameter scheme of each sample was obtained after calibration. To account for spatial differences in environmental characteristics, a machine learning (ML) method was used to extend the optimal schemes of the samples to the whole study area. We used the eXtreme Gradient Boosting (XGBoost) algorithm to describe the nonlinear relationship between parameters and environmental variables. XGBoost has achieved great success in ML and has been widely used in remote sensing classification, surface variable inversion, and information extraction (Zhong et al., 2019; Pilaš et al., 2020; Liu et al., 2021).
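To make the second step concrete, the sketch below fits a regression from environmental variables to one calibrated parameter and then predicts the parameter for an unsampled pixel. It is a plain gradient-boosting regressor built from one-split decision stumps in pure Python, used here as a stand-in for XGBoost, which adds regularization, deeper trees, and many other refinements. The two features, the training table, and the target values are invented for illustration only.

```python
def fit_stump(X, residuals):
    """Find the single split (feature, threshold) that minimizes squared error.

    Assumes at least one feature takes more than one distinct value.
    """
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    return best[1:]


def fit_boosting(X, y, n_rounds=50, learning_rate=0.3):
    """Gradient boosting for squared error: each stump fits the residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, thr, lm, rm = fit_stump(X, residuals)
        stumps.append((j, thr, lm, rm))
        pred = [p + learning_rate * (lm if row[j] <= thr else rm)
                for row, p in zip(X, pred)]
    return base, learning_rate, stumps


def predict(model, row):
    base, learning_rate, stumps = model
    return base + sum(learning_rate * (lm if row[j] <= thr else rm)
                      for j, thr, lm, rm in stumps)


# Hypothetical sample pixels: [growing-season temperature (deg C), elevation (m)].
X = [[t, e] for t in (10, 14, 18, 22) for e in (100, 400, 700, 1000, 1300)]
# Hypothetical calibrated parameter values for those pixels.
y = [40 + 1.5 * t - 0.01 * e for t, e in X]

model = fit_boosting(X, y)
mean_y = sum(y) / len(y)
mse_base = sum((yi - mean_y) ** 2 for yi in y) / len(y)
mse_model = sum((yi - predict(model, row)) ** 2 for row, yi in zip(X, y)) / len(y)
print(f"training MSE: constant {mse_base:.2f}, boosted {mse_model:.2f}")
print(f"predicted parameter for a new pixel: {predict(model, [16, 550]):.2f}")
```

In practice one model would be trained per parameter, with a held-out testing set to check that the fitted relationship generalizes beyond the sample pixels.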

Similar to sample running, the regional simulation was also forced with
multisource driving datasets (Table 1) and adopted the same spin-up scheme
to reach equilibrium. We fed the spatial parameters predicted by
XGBoost back into the physical model and verified the final accuracy. After
obtaining a parameter distribution with a resolution of
0.05

The evaluation in this paper involves three aspects: (1) verification of the
accuracy of MASM when calibrating parameters; (2) verification of the
feasibility of the XGBoost method in terms of predicting parameters; and (3)
verification of the accuracy in spatial-scale and multi-product comparisons.
We used Pearson's correlation coefficient (

After parameter calibration of all samples, the posterior distributions of the 10 sensitive parameters under the joint constraint of GLASS LAI and GPP were obtained from the last 100 000 values. Here, we show only the posterior distributions of 10 randomly selected representative pixels (Fig. 2). Parameters whose posterior showed a clear unimodal pattern within the prior range were considered well constrained. Most sensitive parameters (except alpha3, theta3, and beta3) exhibited a well-constrained distribution, but with different degrees of concentration. GPP only weakly constrained alpha3, theta3, and beta3, as reflected in their scattered distributions and wide fluctuation ranges. Consistent with the Morris index shown in Fig. 1, GPP seemed insensitive to small perturbations of these parameters. In contrast, a strong constraint was obtained for vmax, the maximum carboxylation capacity of Rubisco, which had a pronounced effect on plant photosynthesis.

Posterior probability distribution of 10 representative samples.

The model structure also affected how strongly the observations constrained the parameters. The relationship between LAI and its sensitive parameters was more direct, without repeated transmission through complex processes. Thus, the posterior distributions of most samples were well constrained. Edge-hitting distributions appeared when the prior ranges of parameters were unsuitable for the selected samples, with retrieved values clustered near the upper or lower bounds of the prior ranges (Fig. 2, 1-3, and 6-6). The rationality of the prior parameter ranges is not discussed in this paper, but the edge-hitting distributions emphasize the importance of prior knowledge for optimization. The differences between samples were related to the spatial variation of parameter sensitivity and the quality of the GLASS GPP and LAI products.
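One simple way to flag an edge-hitting posterior automatically is to measure how much of the sample mass lies close to a prior bound. The check below is an illustrative heuristic rather than a criterion taken from this study; the 5 % margin and the 50 % mass threshold are arbitrary assumptions, as are the example samples.

```python
def is_edge_hitting(samples, lower, upper, margin=0.05, min_fraction=0.5):
    """Return True if most posterior samples sit near one prior bound.

    margin: width of the edge zone as a fraction of the prior range (assumed).
    min_fraction: share of samples in one edge zone that triggers the flag (assumed).
    """
    width = upper - lower
    n = len(samples)
    near_lower = sum(s <= lower + margin * width for s in samples) / n
    near_upper = sum(s >= upper - margin * width for s in samples) / n
    return max(near_lower, near_upper) >= min_fraction


prior = (0.0, 1.0)  # hypothetical prior range of one parameter
centered = [0.40, 0.45, 0.50, 0.55, 0.60, 0.50, 0.48, 0.52]
clustered = [0.97, 0.99, 0.98, 1.00, 0.96, 0.99, 0.90, 0.98]

print(is_edge_hitting(centered, *prior))   # well inside the prior range
print(is_edge_hitting(clustered, *prior))  # piled up against the upper bound
```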

The posterior mean was chosen as the best estimate of each parameter. We thus obtained the histogram of the optimal parameters for all samples (Fig. 3). The optimal values of each parameter varied noticeably among samples and differed significantly from the default values. Rubisco exhibited a stronger carboxylation capacity, shown as higher vmax after calibration (Fig. 3d), and the leaf turnover rate (p5) decreased by more than 30 % for most samples. For p14, the optimized value was less than half the default value, which indicated that the proportion of carbon in leaf loss transferred to litter was notably reduced. The results indicated that using site-scale optimal parameters to define regional optimal schemes could not fully capture the variation in ecophysiological characteristics within a PFT, especially when the PFT covers a large area.
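As a small worked example of this step, the snippet below takes the mean over the tail of a sampled chain as the best estimate and expresses it as a percentage change from a default value. The chain, the tail length, and the default value are made-up numbers chosen only to keep the arithmetic easy to follow.

```python
from statistics import fmean


def posterior_mean(chain, n_tail):
    """Mean of the last n_tail samples of a chain (earlier samples discarded)."""
    return fmean(chain[-n_tail:])


def percent_change(optimized, default):
    """Relative change of the optimized value with respect to the default, in %."""
    return 100.0 * (optimized - default) / default


# Hypothetical chain for one parameter; the first three values act as burn-in.
chain = [0.90, 0.20, 0.50, 0.62, 0.58, 0.60, 0.61, 0.59]
best = posterior_mean(chain, n_tail=5)
change = percent_change(best, default=1.0)  # default value is an assumption
print(f"best estimate {best:.2f}, change from default {change:.0f} %")
```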

Histogram distribution of optimal parameters from all samples using the MASM optimization.

With the constrained parameters resulting from the MASM optimization, we took the mean of each posterior distribution as the optimal value and re-entered it into the model to obtain the daily LAI and GPP simulations. Figure 4 shows the accuracy verification of LAI and GPP simulations against the GLASS products on the scale of every 8 d before and after parameter optimization.
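The comparison on an 8 d basis can be sketched as follows. The snippet aggregates a daily series into consecutive 8 d composites and then computes Pearson's correlation coefficient, RMSE, and DISO. Two assumptions are worth stating: whether the composites are means or sums is not specified in this section, so the reducer is left as an argument, and DISO is written in its commonly used three-term form (correlation, normalized mean error, and normalized RMSE, each normalized by the observed mean), which may differ in detail from the authors' implementation.

```python
import math
from statistics import fmean


def aggregate_8day(daily, reducer=fmean):
    """Group a daily series into consecutive 8-day composites.

    The last composite may contain fewer than 8 days.
    """
    return [reducer(daily[i:i + 8]) for i in range(0, len(daily), 8)]


def pearson_r(sim, obs):
    ms, mo = fmean(sim), fmean(obs)
    sxy = sum((s - ms) * (o - mo) for s, o in zip(sim, obs))
    sxx = sum((s - ms) ** 2 for s in sim)
    syy = sum((o - mo) ** 2 for o in obs)
    return sxy / math.sqrt(sxx * syy)


def rmse(sim, obs):
    return math.sqrt(fmean((s - o) ** 2 for s, o in zip(sim, obs)))


def diso(sim, obs):
    """Distance to the perfect point (r = 1, zero bias, zero RMSE); assumed form."""
    mo = fmean(obs)
    nae = fmean(s - o for s, o in zip(sim, obs)) / mo   # normalized mean error
    nrmse = rmse(sim, obs) / mo                         # normalized RMSE
    return math.sqrt((pearson_r(sim, obs) - 1) ** 2 + nae ** 2 + nrmse ** 2)


# Invented daily simulation (16 days) and two 8-day reference values.
daily_sim = [3.0] * 8 + [5.0] * 8
reference_8d = [2.0, 4.0]

sim_8d = aggregate_8day(daily_sim)
print(sim_8d, rmse(sim_8d, reference_8d), round(diso(sim_8d, reference_8d), 4))
```

A DISO of zero means the simulation matches the reference exactly; larger values combine the loss of correlation, the bias, and the spread of errors into a single distance.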

Validation of samples with optimal parameter schemes:

We calculated the DISO value, a comprehensive indicator used for accuracy
verification, for each sample. The histogram of DISO for all samples is
illustrated in Fig. 4. For LAI, the DISO of more than 98 % of the pixels was less
than 0.6, and the overall mean value was 0.33. For GPP, the DISO of more
than 96 % of the pixels was less than 0.4, and the overall mean was 0.27.
The calibrated IBIS model simulated LAI well on an 8 d basis (

The parameters calibrated in this study were divided into two categories: (1) parameters related to the photosynthesis process, including alpha3, theta3, beta3, and vmax; and (2) parameters related to carbon conversion, including p5, p14, p15, p17, p18, and p20 (Table 2). Considering the environmental heterogeneity of each pixel, we explored the relationship between these 10 parameters and local environmental characteristics in terms of climate, soil, location, and important surface variables (such as LAI, GPP, and photosynthetically active radiation (PAR), which are closely related to photosynthesis) (Table S2). The vapor pressure deficit (VPD), day length, and shortwave radiation (SRAD) were also considered as influencing factors. VPD affects the rate and intensity of evapotranspiration. When plants close their stomata to adapt to high VPD, both photosynthesis and growth slow down (Y. Li et al., 2018). Shortwave radiation provides the energy for photosynthesis; temperature and day length in the growing season also affect the gas exchange characteristics of leaves (Rogers, 2014). We also added elevation information, as high-altitude areas are less affected by human activities and may have a higher carbon sequestration capacity than low-altitude areas.

We used the annual mean values of the selected features and the growing
season statistics of temperature, VPD, SRAD, and PAR as the inputs of XGBoost
modeling. We set 10 optimal parameters obtained by MASM as the target
variables (i.e., the outputs of XGBoost). However, it was not guaranteed
that posterior values obtained through the MASM approach were the best
selection for each pixel due to observation (GLASS products) quality and
algorithm uncertainty. Large errors indicated that calibrated parameters
were not suitable for those samples, which would affect spatial prediction
if they were included in training data for machine learning. In this paper,
the screening indexes were defined according to the DISO values of LAI and
GPP. All the samples with DISO

Figure 5 shows the validation after running XGBoost methods. For parameters with strong effects (e.g., vmax, p18) on GPP and LAI, the fitting accuracy could reach more than 0.8, while parameters with lower sensitivity (like alpha3, theta3, beta3, p14, and p15) tended to have slightly lower accuracy. This is because, in the MASM optimization process, the optimal parameters were mainly constrained by LAI and GPP data, which also led to a more prominent order of LAI and GPP in feature importance ranking when the XGBoost model was trained.

Scatter plots of the results of the XGBoost model. The

When the parameters from the XGBoost prediction and the MASM posterior distributions were used in the IBIS model, the estimated LAI and GPP showed high correlations with those of the GLASS products (2001–2018). For LAI, the correlation was above 0.6, while that of GPP was mostly above 0.8. Parameters obtained by XGBoost and MASM yielded highly consistent correlations with the GLASS products. For the testing set, the errors (RMSE and DISO) obtained with XGBoost parameters were somewhat less strongly correlated with those of the MASM approach, but their ranges were similar. DISO was below 0.5 for LAI and below 0.3 for GPP. For the RMSE and DISO indexes, samples above the diagonal were better estimated using MASM-optimized parameters, and those below the diagonal were better estimated using XGBoost-predicted parameters. XGBoost performed better than MASM in the final validation of parameters. For example, for the LAI testing set (Fig. 6c-3), the parameters obtained by XGBoost were more accurate for 52 % of the pixels, and the ratio reached 60 % for the GPP testing set (Fig. 6d-3). A possible explanation is that XGBoost used more environmental variables related to the parameters; hence, the results may be more appropriate for each pixel. Moreover, the uncertainty of the MASM algorithm and the diversity of posterior probability distribution forms also affected the selection of the optimal parameters. In addition to better accuracy, another benefit of the XGBoost method was that the computational cost of parameter calibration was greatly reduced compared to surrogate modeling optimization.
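The share of pixels on each side of the diagonal can be computed directly from the paired error values. The helper below is a minimal illustration with invented error values; ties are counted as not better.

```python
def fraction_better(errors_ml, errors_masm):
    """Fraction of samples where the ML-predicted parameters give a lower error."""
    pairs = list(zip(errors_ml, errors_masm))
    return sum(ml < masm for ml, masm in pairs) / len(pairs)


# Hypothetical per-pixel RMSE values for the same five test pixels.
rmse_xgboost = [0.30, 0.42, 0.25, 0.51, 0.38]
rmse_masm = [0.35, 0.40, 0.31, 0.55, 0.38]

share = fraction_better(rmse_xgboost, rmse_masm)
print(f"XGBoost parameters more accurate for {100 * share:.0f} % of pixels")
```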

Accuracy indexes (

By combining the environmental characteristics, the optimal parameter
distribution of each pixel was estimated for the entire deciduous forest
area (0.05

Spatial distribution of posterior parameters. The inset in the lower right corner of each plot represents the spatial distribution histogram of each parameter.

Compared with the original parameter ranges, the posterior distributions were significantly more concentrated (see “Pars_Hist” in Fig. 7). The overall ranges were greatly reduced, especially for theta3, beta3, p15, p17, and p18. There was a great difference in the distribution between optimal parameters and default values in the whole study area. For theta3 and beta3, the default values were outside the whole statistical range, which would induce a large error when using defaults to estimate carbon fluxes. For vmax, the optimal values were higher than the default values and had evident spatial heterogeneity. Vmax is a key source of uncertainty in current ecosystem models, and adopting a fixed value may give rise to large systematic error (Croft et al., 2017; Bonan et al., 2011; Liu et al., 2014; Walker et al., 2014). Rogers (2014) surveyed the derivation of vmax in earth system models and found a wide range of variation among vegetation types, and Bonan et al. (2011) showed that model uncertainty resulting from this parameter was comparable with that from the model structure.

Using the parameters estimated by XGBoost, we obtained the spatiotemporal
distribution of optimized LAI and GPP. The estimated LAI and GPP before and
after optimization were compared with those of the GLASS products. Figure 8
shows the frequency distribution of the accuracy indexes (

Histogram error statistics of LAI and GPP by running prior and posterior parameter schemes.

Figure 9 shows the spatial distribution of the absolute error between
estimated variables and GLASS products. With the default parameters, the
errors of LAI and GPP had similar spatial distributions. Large discrepancies
between the default model and GLASS occurred mainly in the middle and
northwest parts of the study area. The absolute errors were markedly reduced
after optimization. For
LAI, the overall error was below 0.6 m

Spatial distribution of annual absolute error between
model results and GLASS products from 2001 to 2018.

Compared with the default estimates, our optimized fluxes improved, with
RMSE reduced by 12 % for GPP, 20.38 % for ER, and 1.57 % for NEE, while
the correlation coefficients decreased slightly for GPP and NEE (Fig. S2).
Using DISO as a comprehensive evaluation indicator, we verified the GPP
before and after optimization at 14 flux sites, and also summarized the
effect of parameter optimization on ER and NEE for every 8 d. When the
GLASS GPP product was regarded as the “true value”, the optimized model
successfully estimated the magnitudes and temporal variations of GPP, and
the DISO values of several sites improved by more than half throughout the
whole year (e.g., US-Bar, US-LPH) (Fig. 10, gray bars). This is because the
GLASS GPP product was used as the reference while calibrating the sensitive
parameters. Considering the uncertainties of the GLASS product, we evaluated
GPP, ER, and NEE using data from flux observation sites as a “true value”.
Collectively, the optimized model improved the flux estimates at most flux
towers compared with those from the original model (DISO

Performance of the optimized carbon fluxes at 14 flux sites
throughout the whole year (All), growing season (GS, May to September), and
non-growing season periods (NGS). The

We calculated the DISO indicators between GLASS GPP and flux observations (DGLA) for the three periods (annual, growing season, and non-growing season) (see DGLA values shown in Fig. 10). The values of DISO in the non-growing season were significantly higher than those in GS, especially at the US-Bar, US-Wi1, and US-LPH sites, which indicated that GLASS GPP had a lower performance in the non-growing season for these 14 sites. This also explains why the difference between the optimized carbon fluxes and the observation was larger in the non-growing season. For US-Wi8, the GLASS GPP differed greatly from the observed data (with DISO of 1.12), and therefore the optimized results showed lower accuracy than those based on the default parameters. The timing of the peak GPP generally closely matched that of the flux tower GPP, although there was still significant underestimation for several sites (e.g., US-LPH, US-MMS, and US-Oho).
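Splitting a dated series into the three evaluation periods is straightforward once the growing season is defined by calendar month. The sketch below uses the May to September definition given in the caption above and a placeholder series of 8 d time steps; the dates and values are invented, and any accuracy index can then be computed separately for each group.

```python
from datetime import date, timedelta

GROWING_MONTHS = {5, 6, 7, 8, 9}  # May to September


def split_by_season(dates, values):
    """Group values into whole year (All), growing season (GS), and the rest (NGS)."""
    gs = [v for d, v in zip(dates, values) if d.month in GROWING_MONTHS]
    ngs = [v for d, v in zip(dates, values) if d.month not in GROWING_MONTHS]
    return {"All": list(values), "GS": gs, "NGS": ngs}


# Placeholder 8-day time axis for one non-leap year and dummy values.
start = date(2010, 1, 1)
dates = [start + timedelta(days=8 * i) for i in range(46)]
values = [float(i) for i in range(46)]

groups = split_by_season(dates, values)
print({name: len(series) for name, series in groups.items()})
```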

We used our optimized model to estimate GPP, ER, and NEE for each
0.05

Regional patterns of mean annual carbon fluxes (GPP, NEE,
and ER) for the years 2000–2019. Panels

Figure 12 shows the interannual variability in carbon fluxes (Fig. 12a)
and the percentage changes after optimization for the whole year, the
growing season, and the non-growing season (Fig. 12b–c). The mean annual GPP, ER,
and NEE over the deciduous forests in the eastern United States for the
period from 2000 to 2019 were 5.79, 4.60, and

Temporal variations of each carbon flux (GPP, NEE, and
ER) over the deciduous forests in the eastern US (2000–2019).

Our pixel-level calibration was divided into two parts. The first part used
the MASM algorithm to calibrate each sample. We recorded the CF simulated by
the GPR model constructed in each adaptive resampling and the value simulated by
the real model, which was expressed by GPR

It was found that they were correlated and had unusually high values (Fig. S3). When the error of the GPR model was large (i.e., the GPR model cannot
accurately represent the IBIS model), it was difficult to produce a smaller
CF through iterative optimization. Therefore, the optimal parameter scheme
for such samples would cause large errors when simulating LAI and GPP. From
an algorithmic perspective, the number of iterations may have been
insufficient during MCMC sampling. However, it should be noted that
increasing the number of iterations would inevitably increase the computational cost.
Two reasons could explain MASM uncertainty: (1) different samples may have
had different sensitivity to parameters, and therefore we could not
guarantee that the sensitive parameters of every sample were highly
correlated with the model error; and (2) the prior range of parameters may
have been inappropriate, failing to obtain a suitable combination. To ensure
the accuracy of the input training set during the later training of the
XGBoost model, sample pixels with excessive GPR

The posterior distributions of the optimized parameters differed significantly among pixels in form (single peak or multipeak), mean value, and spread. Poorly constrained results meant that the model predictions were only weakly sensitive to changes in those parameters; a multipeak distribution meant that multiple parameter combinations for the sample met the requirement of minimum CF; and edge-hitting patterns, in which posterior values were skewed to one side of the prior ranges, reflected defects in the model structure or unsuitable prior parameter ranges (Liu et al., 2015; Mäkelä et al., 2019). The uncertainty bounds of these parameters were likely to be unrealistic and could lead to overconfidence in model predictions (Lu et al., 2017). The constraining effect of observations was strongly related to the sensitivity of the observed variables to the parameters, which indicated that the spatial variability of parameter sensitivity should also be considered in parameter optimization. The interaction between parameters should also be considered in parameter estimation (Fig. S4).

Through the calibration, we provided an optimal parameter scheme for each pixel of the eastern United States and a narrower recommended range for each parameter, which may help others find optimal parameters efficiently within this area. Although there have been many studies on parameter calibration, estimates of the same parameters remain markedly inconsistent among ecosystem models. This is mainly because errors in the model and input data are compensated by parameter adjustment, which makes it difficult to ensure that the estimated parameters can be explained theoretically. When expanding the spatial domain of parameters, the limited understanding of the factors influencing each parameter also prevented us from estimating the actual values of these parameters. For example, previous studies found that vmax is closely related to leaf nitrogen content, and that an increase in leaf phosphorus content significantly increases the sensitivity of vmax to leaf nitrogen (Walker et al., 2014). Leaf and labile carbon turnover rates (p5, p15) are key factors determining the carbon sequestration capacity of terrestrial ecosystems. Wang et al. (2017) studied the biotic and abiotic factors affecting forest carbon turnover time through quadrat observations and showed that multiple factors such as soil nutrients (e.g., carbon, nitrogen, and phosphorus), pH, and forest age could not be ignored. It has also been pointed out that carbon turnover time can vary with time, which was not considered in this study. In addition, the calculation of LAI in DALEC was simple enough that the sensitive parameters involved were clearly correlated with LAI values. However, the calculation of GPP in IBIS involved many complex biochemical processes and factors, and thus the constraint of GPP on the parameters was not apparent. We could not guarantee that the parameters of each pixel were well constrained in the sample calibration. For samples with poorly constrained or edge-hitting posteriors, we still used the optimal values as training samples for XGBoost if the simulated LAI and GPP showed good accuracy, which could negatively affect the model prediction.

Generally, the carbon fluxes estimated before optimization followed the distribution of several key environmental variables (e.g., temperature, day length, radiation, specific humidity) more closely (Fig. S5): the photosynthesis and respiration of deciduous forests gradually increased with decreasing latitude. The information provided by climate conditions was carried into the carbon flux simulation through the model. As the physiological parameters used in each pixel were uniform values in the original model and only a single vegetation type was considered, the spatial differences in carbon fluxes came mainly from differences in the driving data sets. The integration of GLASS products introduced the spatial pattern information of the LAI and GPP products into the model by adjusting the spatial pattern of key parameters, which made the optimized spatial patterns more reasonable. The optimized model with calibrated parameters can provide more accurate LAI and GPP information, and the temporal and spatial distributions of both LAI and GPP were closer to those of the GLASS products. Although the lower RMSE of the optimized fluxes indicated better agreement with the site observations, particularly around the period of peak fluxes, the results also indicated that where GLASS products and ground observations differ substantially, it is difficult to capture the variations at each flux site. For example, our results for most sites (e.g., US-Bar, US-WCr, and US-Wi3) showed that the optimized NEE overestimated net carbon uptake during the non-growing season. Overall, the accuracy of ER and NEE was not significantly improved based on the site-level validation (Fig. S2; Fig. 10). The reason for this may be that respiration is closely related to the size of carbon pools, while factors related to forest age and carbon pools were not considered in our current research. In future work, we will try to integrate forest age and biomass products to improve the key parameters of terrestrial carbon pools (such as allocation, turnover rate, and respiration rate) so that we can improve the simulation of respiration and vegetation carbon storage.
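The site-level comparison discussed above rests on the RMSE between simulated and observed fluxes. As a worked illustration only, the sketch below computes the RMSE of two hypothetical simulations against the same observations. All numbers are made up for the example and are not results from this study.

```python
import math


def rmse(simulated, observed):
    """Root mean square error between two equally long series."""
    if len(simulated) != len(observed):
        raise ValueError("series must have the same length")
    squared = [(s - o) ** 2 for s, o in zip(simulated, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up flux values (arbitrary units) for six time steps.
observed = [1.0, 3.0, 6.0, 8.0, 5.0, 2.0]
default_run = [2.0, 5.0, 9.0, 11.0, 7.0, 3.0]    # hypothetical default parameters
optimized_run = [1.5, 3.5, 7.0, 8.5, 5.5, 2.5]   # hypothetical optimized parameters

print(f"default RMSE:   {rmse(default_run, observed):.3f}")
print(f"optimized RMSE: {rmse(optimized_run, observed):.3f}")
```

Because the errors are squared before averaging, the RMSE is dominated by the largest deviations, which in flux time series tend to occur around the seasonal peak.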

It is well known that the accuracy of the spatial reference products (GLASS
products in this paper) is the key factor affecting the accuracy of carbon
fluxes during model–data fusion. The GLASS LAI and GPP products have
been validated against globally available ground measurements and compared
with several other products. The results showed
that the GLASS products performed well in terms of accuracy and
spatial–temporal variations (Zheng et al., 2020; Ma and Liang, 2022). Before
adopting the GLASS GPP product as the reference for parameter optimization,
we also compared its accuracy with that of several other GPP products in our
research region (BESS, the Breathing Earth System Simulator; VPM, the
Vegetation Photosynthesis Model; FLUXCOM, an upscaling approach based on
machine learning methods; and MODIS), and found that
GLASS estimated GPP fairly well. The

Multiple data streams and more relevant state variables should be used to mitigate the deviations introduced by a single product or variable. We considered both LAI and GPP in the optimization, but their contributions to the model improvement were not evaluated separately, which should be addressed in future work. ML provides a convenient approach for integrating more spatial products with physical models, and we expect further work on hybrid modeling frameworks that couple ML with physical models, or that explain and even compensate for model errors arising from a lack of prior knowledge. Such a combination can increase the credibility of future carbon budget estimates and strengthen the rationality and interpretability of ML. In addition, there is a need for higher-resolution and higher-precision satellite products, which are necessary for model improvement and for carrying out benchmarking tests to comprehensively evaluate the performance of different models.

It is generally accepted that model parameters are spatially heterogeneous, but directly calibrating a complex process-based model at the pixel level is not realistic, especially as spatial resolution increases. This paper proposed a two-step framework for estimating optimal parameters at the pixel level. We randomly sampled the study area, used the GPR model as the surrogate model, and then applied the MASM algorithm for iterative optimization to obtain the posterior parameter distributions of the samples. Next, we used XGBoost to describe the nonlinear relationship between the optimal parameters and local climate, soil, and surface variables, and extended the prediction to all deciduous forests in the eastern United States. Our method provided an optimal parameter scheme for each pixel and confirmed that the discrepancy between GLASS products and predicted values was significantly reduced with the optimized parameters. The results showed that there was significant spatial variability of parameters within a vegetation type, and that high-quality satellite products could be used to efficiently calibrate the parameters of terrestrial biosphere models at the pixel level. Although we tested our approach only for deciduous forests of the eastern United States, it provides a feasible scheme for the spatial calibration of other vegetation types, at higher resolutions and over larger areas.
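The idea behind the first step, replacing most expensive model runs with a cheap surrogate that is refined iteratively, can be illustrated with a deliberately simplified sketch. This is not the MASM implementation used in this study. It handles a single parameter, uses a small hand-written Gaussian process mean predictor in place of a GPR library, and picks each new point by a grid search over the surrogate rather than by MCMC sampling. The function `toy_cost` is a hypothetical stand-in for one run of the process model followed by evaluation of the cost function. The second step (XGBoost regression of the optimal parameters on environmental variables) is not shown.

```python
import math


def rbf(a, b, length=0.2):
    """Squared-exponential kernel between two scalar parameter values."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gp_mean(x_train, y_train, x_query, noise=1e-4):
    """Posterior mean of a GP with a constant mean and an RBF kernel."""
    y_bar = sum(y_train) / len(y_train)
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(x_train)] for i, a in enumerate(x_train)]
    alpha = solve(K, [y - y_bar for y in y_train])
    return [y_bar + sum(rbf(q, a) * w for a, w in zip(x_train, alpha))
            for q in x_query]


def toy_cost(p):
    """Hypothetical stand-in for an expensive model run; minimum at p = 0.3."""
    return (p - 0.3) ** 2


def adaptive_surrogate_search(cost, lo, hi, n_init=5, n_iter=8, n_grid=101):
    """Fit the surrogate, run the model at its predicted minimum, repeat."""
    xs = [lo + (hi - lo) * i / (n_init - 1) for i in range(n_init)]
    ys = [cost(x) for x in xs]                    # initial "expensive" runs
    grid = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    for _ in range(n_iter):
        # Only consider grid points that have not been evaluated yet.
        candidates = [g for g in grid if all(abs(g - x) > 1e-9 for x in xs)]
        mean = gp_mean(xs, ys, candidates)
        best_candidate = min(zip(mean, candidates))[1]
        xs.append(best_candidate)
        ys.append(cost(best_candidate))           # one more "expensive" run
    best_cost, best_param = min(zip(ys, xs))
    return best_param, best_cost, len(xs)


best_param, best_cost, n_runs = adaptive_surrogate_search(toy_cost, 0.0, 1.0)
print(f"best parameter {best_param:.2f}, cost {best_cost:.5f}, {n_runs} model runs")
```

The point of the design is that the number of true model evaluations stays small (13 here), while the surrogate is queried over the whole grid at every iteration at negligible cost.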

We provide an optimization script for several pixels, together with their
driving data sets, to support the implementation of the MASM
algorithm in calibrating the IBIS simulator. All source code and
additional information on the main codes used in our manuscript are packaged at

The supplement related to this article is available online at:

RM, SL, and JX conceived the study. RM performed the analysis. RM, SL, and JX interpreted the results. RM wrote the draft. JX, HM, and TH revised the article. DG and XL contributed to the preprocessing of all the model driving data and helped with running the machine learning approach. HL guided the operation of the IBIS model. All authors contributed to the preparation of the article.

The contact author has declared that none of the authors has any competing interests.

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We gratefully acknowledge the data support from the National Earth System Science Data Center, National Science & Technology Infrastructure of China (

This research has been supported by the National Key Research and Development Program of China (grant no. 2016YFA0600103) and the National Natural Science Foundation of China (grant no. 42090011). Jingfeng Xiao was supported by the University of New Hampshire.

This paper was edited by Tomomichi Kato and reviewed by two anonymous referees.