Geoscientific Model Development (Geosci. Model Dev., GMD), ISSN 1991-9603, Copernicus GmbH, Göttingen, Germany
doi:10.5194/gmd-8-3285-2015
CH4 parameter estimation in CLM4.5bgc using surrogate global optimization
J. Müller, R. Paudel, C. A. Shoemaker, J. Woodbury, Y. Wang, and N. Mahowald
Affiliations: Center for Computational Sciences and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Earth and Atmospheric Sciences, Cornell University, Ithaca, NY 14853, USA; School of Civil and Environmental Engineering, Cornell University, Ithaca, NY 14853, USA
Correspondence: J. Müller (juliane.mueller2901@gmail.com)
Published: 20 October 2015, volume 8, issue 10, pages 3285–3310
Manuscript history dates: 4 November 2014; 6 January 2015; 24 June 2015; 5 October 2015
This work is licensed under a Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/
This article is available from https://gmd.copernicus.org/articles/8/3285/2015/gmd-8-3285-2015.html
The full text article is available as a PDF file from https://gmd.copernicus.org/articles/8/3285/2015/gmd-8-3285-2015.pdf
Over the Anthropocene, atmospheric methane has increased dramatically. Wetlands are one of
the major sources of methane to the atmosphere, but the role of changes in
wetland emissions is not well understood. The Community Land Model (CLM) of
the Community Earth System Models contains a module to estimate methane
emissions from natural wetlands and rice paddies. Our comparison of CH4
emission observations at 16 sites around the planet reveals, however, that
there are large discrepancies between the CLM predictions and the
observations. The goal of our study is to adjust the model parameters in
order to minimize the root mean squared error (RMSE) between model
predictions and observations. These parameters have been selected based on a
sensitivity analysis. Because of the cost associated with running the CLM
simulation (15 to 30 min on the Yellowstone Supercomputing Facility), only
relatively few simulations can be allowed in order to find a near-optimal
solution within an acceptable time. Our results indicate that the parameter
estimation problem has multiple local minima. Hence, we use a computationally
efficient global optimization algorithm that uses a radial basis function
(RBF) surrogate model to approximate the objective function. We use the
information from the RBF to select parameter values that are most promising
with respect to improving the objective function value. We show with pseudo
data that our optimization algorithm is able to make excellent progress with
respect to decreasing the RMSE. Using the true CH4 emission observations
for optimizing the parameters, we are able to significantly reduce the
overall RMSE between observations and model predictions by about 50 %.
The methane emission predictions of the CLM using the optimized parameters agree better with the observed methane emission data in northern and tropical latitudes. With the optimized parameters, the methane emission predictions are higher in northern latitudes than when the default parameters are used. For the tropics,
the optimized parameters lead to lower emission predictions than the default parameters.
Introduction and motivation
Methane is the second most important greenhouse gas in terms of radiative
forcing and thus a major concern regarding climate change.
Natural wetlands as well as human activities such as agriculture (for
example, rice cultivation) contribute to methane emissions.
The role of wetlands in the total budget of methane, as
well as in driving inter-annual variability and changes in the methane growth
rate, is not well understood. The
Community Land Model (CLM), which is the land component of the Community
Earth System Model (CESM), is equipped with a methane module that models
methane emissions. There are several parameters
in CLM related to the methane emission computations. The methane emissions
estimated by the model are sensitive to the exact parameter values, although
these parameters are not well known. Previous studies reported significant
differences between model simulations and observations in both site-level methane
emissions and the global budget. One important source of uncertainty is
associated with the parametrization, since the methane module has numerous
parameters that have yet to be identified empirically because of the lack of
field data. In this study our goal is to use surrogate
model optimization techniques in order to adjust the methane-related
parameters of the CLM such that the differences between the simulated and
observed methane emissions at 16 sites around the globe are minimized.
For computing an objective function value, we have to do a computationally
expensive simulation with CLM4.5bgc in order to obtain the methane emission
predictions at each observation site. CLM4.5bgc and related codes are
deterministic models, i.e., the simulated CH4 emissions for a given
parameter set will always be the same whenever we run the model for the same
parameter set. In an optimization framework where the goal is to find the
best set of parameters to minimize the objective function, one obstacle is
the computation time that is needed to obtain a single objective function
value. Only a few hundred function evaluations can be allowed in order to
obtain a solution within a reasonable time. Moreover, the objective function
value must be computed by running a simulation model, and thus an analytic
description of the objective function is not available (black-box).
Therefore, gradient information, which is important for many optimization
algorithms, is not available. Due to the black-box nature of the objective
function, it is also not known whether or not the objective function is
convex and has only one local minimum (which corresponds to the global
minimum) or if there are several local and global minima in the objective
function landscape.
These characteristics of the objective function (computationally expensive,
black-box, possibly multi-modal) do not allow the application of a
gradient-based optimization algorithm because, on the one hand, the
derivatives would have to be computed numerically (which may be inaccurate
and requires many expensive function evaluations), and, on the other hand,
gradient-based algorithms generally stop at a local minimum if the initial
guess is not close to the global minimum.
For calibrating the parameters of other CLM modules, Markov Chain Monte Carlo
(MCMC) methods and Kalman filters have been used in the literature.
MCMC, however, generally requires thousands of
function evaluations and is thus not applicable for obtaining
solutions in an acceptable time for computationally expensive problems. When
using Ensemble Kalman Filters, assumptions about the underlying parameter
distributions must be made and generally a large number of observations is
necessary for the method to be effective. Furthermore, evolutionary
strategies such as simulated annealing, particle swarm, and differential
evolution methods have been used for parameter tuning in the climate area.
However, these methods generally require many
function evaluations in order to obtain good solutions.
Other methods that have recently gained interest for parameter tuning are
based on data assimilation.
In general, these methods require many observations in order to produce
good parameter estimates. In our optimization problem, however, the number
of observations at each site is very low (between 10 and 79 observations
distributed over 1 to 3 years), and thus data assimilation techniques
are not suitable.
A related study uses a computationally cheap surrogate for CLM, on which MCMC is run, to reduce
the number of costly simulations required during the optimization. In
contrast to that study, we apply an adaptive surrogate model during the
optimization. Instead of relying on a surrogate that is based only on a
limited number of initial sample points, we iteratively improve our surrogate
by incorporating new data (new objective function values) that become
available during the optimization.
We use surrogate model based global optimization algorithms because they have
been shown to find near-optimal solutions within a few hundred function
evaluations for computationally expensive multimodal black-box problems.
Surrogate models are
used as computationally cheap approximations of the objective function.
During the optimization, information from the surrogate model is used to
carefully select a new promising point in the variable domain at which the
computationally expensive objective function will be evaluated. The surrogate
model is updated throughout the optimization whenever new data are obtained.
Several surrogate model algorithms have been developed in the literature that
use different surrogate model types. The efficient global optimization
algorithm, for example, uses a kriging surrogate model
and selects a new sample point by maximizing an expected improvement
function. Another approach uses radial basis function (RBF) surrogate
models to approximate the expensive objective function and selects a new sample point
by minimizing a so-called bumpiness measure. Other algorithms also use RBF models but select new function evaluation points
by a stochastic method. Further work developed a framework for
automatically computing ensembles of various surrogate model types,
and a later study extended it to investigate the influence of
different sampling strategies on the solution quality. Here for the first
time we apply a state-of-the-art RBF surrogate optimization algorithm to the
problem of land surface emissions of methane and describe the results. As far
as we know, no other groups have applied optimization techniques to find
better parameters for methane emission models, and thus our work represents
an innovative approach to an important land–atmosphere interaction.
The remainder of this paper is organized as follows. We first
briefly describe the CLM and the configuration we used for predicting the
methane emissions, and we give information about the individual observation
sites. We also provide the mathematical description of the optimization
problem. We then summarize the methane-related parameters in
CLM4.5bgc and show the results of a sensitivity analysis with which we
determined the parameters that are most important for the optimization. Next, we
describe the surrogate optimization approach for solving the problem.
The section on numerical experiments contains information about the
setup of our experiments and a discussion of the results of the
optimization, followed by our conclusions. The Appendix
contains additional information about the methane equations and the
observation sites.
Model description, configuration, and mathematical problem description
Model description
We used the Community Land Model Version 4.5 (CLM4.5), a land component of
the Community Earth System Model (CESM), which contains a
detailed representation of biophysics, hydrology, and biogeochemistry.
CLM4.5 is fully prognostic with respect to the
carbon and nitrogen state variables in the vegetation, litter, and soil
organic matter, as well as methane emissions, and it is the most up-to-date version of the model available.
We selected the latest version of CLM with improved biogeochemistry
(CLM4.5bgc) over CLM4.0-CN. The major improvements in CLM4.5bgc include the
incorporation of vertically resolved soil carbon dynamics, an alternate
decomposition cascade from the Century soil model, and a more detailed
representation of nitrification and denitrification based on the Century
nitrogen model. The hydrology of CLM4.5 has been improved
to better represent the hydraulic properties of frozen soils, perched water
tables, snow cover fraction, and lakes.
In previous versions, simulated ecosystem productivity was too low in
high latitudes and perhaps too high in low latitudes. However, CLM4.5bgc has substantially increased the
productivity in high latitudes, which may now be overpredicted.
We used a mechanistic methane emission model, which is a module integrated in
CLM4.5bgc. The model simulates the physical and
biogeochemical processes regulating terrestrial methane fluxes, such as
methane production, methane oxidation, methane and oxygen transport through
the aerenchyma of wetland plants, ebullition, and methane and oxygen diffusion
through soil. Subsequent work added constraints on methane
emissions, such as the effects of redox potential and soil pH, to improve the
predictions of methane emissions, as well as the ability to simulate
satellite-derived inundation fraction.
The model has been compared to the limited site-level observations of methane
emissions (many of the sites have very sparse spatial and temporal data
coverage, and directly measured climate forcing was not available at any of the
sites). Additionally, the model was compared with
the results from three recent global atmospheric inversion estimates of
methane emissions. In these comparisons, simulated
emissions agreed relatively well with the observed emissions at some of the
sites. However, there are considerable differences in seasonality and
magnitude at other sites. The simulated patterns and magnitudes of
annual-average methane emissions are consistent with the results from
atmospheric inversion across most latitude bands. The limitations are
discussed in the literature.
Model configuration and data
Although the land model can be used interactively within CESM, we use it at
specific points driven by appropriate meteorology. At each
site, we forced the model with NCEP/NCAR reanalysis atmospheric forcing
data sets. These data sets include precipitation,
temperature, wind speeds, and solar radiation. We also forced the model with
transient atmospheric carbon dioxide concentrations, aerosol deposition data,
and nitrogen deposition data that are available in CLM4.5. Please note that
this model is a deterministic model, and thus will give the same answer every
time it is simulated when driven by observationally based data sets as done
here.
In this study we used a total of six natural wetland sites and ten rice paddy
sites (see the observation site tables in the
Appendix). We chose the wetland sites from varying geographical
regions such as the tropics, mid-latitudes, and high latitudes to account for
the zonal variability. We selected the rice paddy sites so as to cover the
major rice-growing regions with a focus on Asia.
The water table depth is one of the critical factors for methane emissions
from natural wetlands because it determines the extent of anoxic and oxic
soil zones where methane is produced and oxidized, respectively.
Methane is produced in the wetlands from
litter and dead vegetation remnants under anoxic conditions. The changes in the
water table position also influence the moisture conditions of the soil and
therefore affect the methane emissions. Here, we prescribed the measured
water table position at each wetland site (except Panama) based on previous
studies. Since measured water table depths at Panama were not available,
we used modeled water table positions there. For
the point simulations, the methane emissions were calculated only from the
saturated portion of the soil (i.e. below the water table) when the water
table is below the surface. The prescribed water table depth is used in the
methane model for calculating anaerobic conditions, production, and
oxidation.
Most of these wetland sites usually have peat soils with varying depths
underlain by mineral soil. We also forced each wetland site with measured pH
and a specific plant functional type (PFT). The PFT reflects the phenological
and physiological characteristics of the vegetation. Since
the wetland PFT was not available in CLM4.5, we chose PFTs that are
available in CLM4.5 and that closely match the specific vegetation types of
the individual sites. We used C3 arctic grass for Salmisuo, C3
non-arctic grass for Alberta, Michigan, and Minnesota, and C4 grass for
Florida and Panama. Other surface data required to perform the point
simulation include soil color and soil texture which we extracted from the
global grid data sets available in CLM4.5.
For the point simulations at the rice paddy sites we only considered the rice
growing season. The flooding and drainage dates are shown in
the rice paddy site table in the Appendix. We assumed that the
fields were submerged during the simulation period between initial flooding
and final drainage. A common feature of these sites during the growing season
is that the water was not drained until harvest. We prescribed the C3 crop
PFT for all rice paddy sites, and assumed an optimal pH for the methane
production whenever the pH value was not available. The dominant soil types
at these sites are loam and clay. Other soil-related information such as soil
color and texture are derived from the global grid data sets.
To bring the terrestrial carbon and nitrogen cycles close to steady-state
conditions, we spun up both wetland and rice paddy sites for 1850 conditions
(atmospheric CO2 concentrations, nitrogen deposition, aerosol deposition,
and land use) driven by a repeating 25-year subset (1948–1972) of the
meteorological forcing data for more than 2000 years. Then, we performed
transient simulations from 1850 to the simulation starting year of each site
to generate the initial conditions file.
Additionally, we conducted global simulations of methane emissions from
natural wetlands for 1993–2004. For these simulations, we considered the
grid-cell-averaged methane emissions, which account for methane
emissions from both the inundated and non-inundated portions of the grid cell.
Since the CLM4.5 simulated saturated fraction (an index of inundation) was
substantially greater than the estimates from satellite observations and did
not match the spatio-temporal pattern of variability, we
prescribed the inundation fraction derived from multi-satellite
observations for 1993–2004. Similar to the point
simulations, the global simulations were forced with NCEP/NCAR reanalysis
atmospheric forcing data from 1948 to 2004. The simulations
were also spun up to steady-state conditions driven by atmospheric CO2,
nitrogen deposition, aerosol deposition, and land use in the year 1850 and a
repeated 25-year (1948–1972) subset of the meteorological forcing.
CH4-related parameters in CLM4.5bgc and their upper and lower bounds $x_k^u$ and $x_k^l$, respectively, and the default parameter values $v_k$.
Parameter ID   Parameter name         x_k^l         x_k^u         v_k
1              q10ch4                 1             4             1.33
2              f_ch4                  0.1           0.4           0.2
3              redoxlag               15            45            30
4              oxinhib                200           600           400
5              pHmax                  8             10            9
6              pHmin                  2             4             2.2
7              vmax_ch4_oxid          1.25×10^-6    1.25×10^-4    1.25×10^-5
8              k_m                    0.0005        0.05          0.005
9              k_m_o2                 0.002         0.2           0.002
10             q10_ch4oxid            1             4             1.9
11             k_m_unsat              0.00005       0.005         0.0005
12             vmax_oxid_unsat        1.25×10^-7    1.25×10^-5    1.25×10^-6
13             scale_factor_aere      0.2           2             1
14             nongrassporosratio     0.2           0.5           0.33
15             porosmin               0.01          0.2           0.05
16             rob                    2             4             3
17             unsat_aere_ratio       0.1           0.25          0.1667
18             vgc_max                0.05          0.3           0.15
19             scale_factor_gasdiff   1             5             1
20             atmch4                 1.7×10^-7     1.7×10^-5     1.7×10^-6
21             mino2lim               0.1           0.3           0.2
Mathematical problem formulation
The goal of our study is to improve the methane emission predictions of
CLM4.5bgc by tuning the methane-related parameters such that the model better
fits the observations. We use the CH4 emission observation data for the
locations and observation periods shown in the observation site tables in the
Appendix. Given the observation data at the $M=16$ locations,
the goal is to minimize the root mean squared errors (RMSEs) between the
CLM4.5bgc methane emission predictions and the observations at each site
simultaneously. In order to tackle the problem, we formulate it such that we
minimize the weighted sum of the RMSEs as follows:
$$\min f(x)=\sum_{i=1}^{M} w_i\, r_i(x) \quad \text{s.t.} \quad -\infty<x_k^l\le x_k\le x_k^u<\infty,\quad k=1,\ldots,d,$$
where $d$ denotes the problem dimension (the number of optimization
parameters), and $x_k^l$ and $x_k^u$ are the lower and upper bounds of
variable $x_k$, respectively. The RMSE
$$r_i(x)=\sqrt{\frac{1}{N_i}\sum_{j=1}^{N_i}\left(O_{i,j}-S_{i,j}(x)\right)^2},\quad i=1,\ldots,M,$$
is computed for each location $i$. $N_i$ is the number of observations
available at location $i$, $O_{i,j}$ denotes the $j$th methane emission
observation at location $i$, and $S_{i,j}(x)$ denotes the corresponding methane
emission predicted by CLM4.5bgc. The weights $w_i$ are computed based on the
means of the CH4 emissions at the observation locations as follows. Denote by
$$a_i=\frac{1}{N_i}\sum_{j=1}^{N_i}O_{i,j}$$
the mean CH4 emission at location $i$, $i=1,\ldots,M$. The weight $w_i$
for the $i$th location is then defined by
$$w_i=\frac{g_i}{\sum_{m=1}^{M} g_m},$$
where
$$g_i=\frac{\max_{m=1,\ldots,M} a_m}{a_i},$$
and it is assumed that $a_i>0$ for all $i$. The goal is to give each
location approximately equal influence in the weighted sum of RMSEs, i.e., we
assign locations with large mean CH4 values small weights such that these
locations have approximately the same influence on the weighted sum as
locations with low emissions. Otherwise, locations with large emissions would
dominate the weighted sum because their RMSEs would accordingly
be larger. In that case the optimization would be driven by minimizing the
RMSE of the site(s) with the largest emissions. There are also other ways
to determine $w_i$. In the numerical experiments, we will
also investigate the possibilities of assigning equal weights to each
observation site and assigning weights derived from grouping the observation
sites into zones. Another possibility would be to apply clustering algorithms
in order to determine groups of observation sites with similar
characteristics. For this possibility, however, different clustering methods
and different numbers of desired clusters will lead to different groups and
different weight adjustments. Lastly, the problem could be formulated as a
multi-objective optimization problem, for example, with 16 objectives and the
goal of minimizing each observation site's RMSE individually, or as a
bi-objective optimization problem by minimizing the sums of the weighted RMSE
values of northern and southern locations at the same time. However, each
objective function evaluation is very expensive, and thus the number of
evaluations that can be done to obtain the Pareto front in a multi-objective
setting is limited. Our focus is on demonstrating that single objective
global optimization analysis is useful in identifying reasonable parameter
values.
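The weighted objective defined above can be illustrated with a minimal Python sketch. The function names (`rmse`, `site_weights`, `weighted_rmse`) and the two-site data are made up for illustration and are not part of CLM4.5bgc or of the study's code; in the study, the simulated values would come from a CLM4.5bgc run at each observation site.

```python
import math


def rmse(observed, simulated):
    """Root mean squared error r_i for one site."""
    n = len(observed)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)


def site_weights(observations):
    """Weights w_i = g_i / sum(g), with g_i = max(a) / a_i and a_i the site mean."""
    means = [sum(obs) / len(obs) for obs in observations]
    if min(means) <= 0:
        raise ValueError("all site means must be positive")
    largest = max(means)
    g = [largest / a for a in means]
    total = sum(g)
    return [g_i / total for g_i in g]


def weighted_rmse(observations, simulations):
    """Objective f(x): weighted sum of the per-site RMSEs."""
    weights = site_weights(observations)
    return sum(w * rmse(o, s)
               for w, o, s in zip(weights, observations, simulations))


# Hypothetical data: site 2 emits ten times as much as site 1.
observations = [[1.0, 3.0], [10.0, 30.0]]
simulations = [[2.0, 2.0], [20.0, 20.0]]

weights = site_weights(observations)
objective = weighted_rmse(observations, simulations)
```

In this made-up example the site RMSEs are 1 and 10, the weights are 10/11 and 1/11, and each site therefore contributes 10/11 to the objective, which is the equal-influence behavior the weighting is meant to achieve.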
Methane-related parameters in CLM4.5bgc and sensitivity analysis
CLM4.5bgc has 21 parameters related to the methane emission predictions. The
parameter names, their upper and lower bounds, and default values are shown
in the parameter table above. The upper and lower bounds have been derived from
values reported in the literature (see the Appendix). How these parameters
are used in the model is detailed in the model description literature, and we
repeat the important equations in the Appendix. The default parameter values
$v_k$ are those available in CLM4.5bgc.
Optimization problems become increasingly more complex and difficult to solve
as the number of parameters increases (curse of dimensionality). Thus, we
determine first which of these 21 parameters are the most sensitive and thus
the most important for the optimization. By sensitive we refer to parameters
that when changed slightly lead to a significant change in emission
predictions. Insensitive parameters, on the other hand, can be changed and do
not (or comparatively only very mildly) change the emission predictions and
can thus be excluded from the optimization, which decreases the problem
dimension.
Parameters that are sensitive for most observation sites (out of 16).
We conducted analyses for each observation site in which we investigated to
which of these 21 parameters the methane emission predictions of CLM4.5bgc
are the most sensitive. We altered the value of each parameter $k=1,\ldots,d$
by, respectively, adding and subtracting 20 % of the variable range, and
we recorded the absolute change in emission predictions, i.e., we ran
CLM4.5bgc with the perturbed parameter values
$$x_k=\min\{v_k+0.2\,(x_k^u-x_k^l),\;x_k^u\}$$
when increasing $v_k$ by 20 % of the range, and
$$x_k=\max\{v_k-0.2\,(x_k^u-x_k^l),\;x_k^l\}$$
when decreasing $v_k$ by 20 % of the range,
for each parameter $k=1,\ldots,d$ separately.
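The one-at-a-time perturbation described above can be sketched in Python as follows. This is only an illustration: `toy_model` is a hypothetical stand-in for a CLM4.5bgc run, and reporting the larger of the two absolute changes per parameter is an assumption of this sketch, not something the text specifies.

```python
def perturbed_values(default, lower, upper, fraction=0.2):
    """Default value shifted up and down by a fraction of the range, clipped to the bounds."""
    step = fraction * (upper - lower)
    return min(default + step, upper), max(default - step, lower)


def one_at_a_time(model, defaults, lowers, uppers, fraction=0.2):
    """Absolute change in model output when each parameter is perturbed separately."""
    base = model(defaults)
    changes = []
    for k in range(len(defaults)):
        up, down = perturbed_values(defaults[k], lowers[k], uppers[k], fraction)
        deltas = []
        for value in (up, down):
            x = list(defaults)
            x[k] = value  # perturb only parameter k
            deltas.append(abs(model(x) - base))
        changes.append(max(deltas))  # assumption: keep the larger of the two changes
    return changes


def toy_model(x):
    """Hypothetical stand-in for the simulation: sensitive to x[0], not to x[1]."""
    return 3.0 * x[0] + 0.0 * x[1]


# Bounds and defaults of q10ch4 and f_ch4 from the parameter table.
defaults = [1.33, 0.2]
lowers = [1.0, 0.1]
uppers = [4.0, 0.4]
changes = one_at_a_time(toy_model, defaults, lowers, uppers)
```

For q10ch4 the downward perturbation would fall below the lower bound (1.33 - 0.6 < 1), so it is clipped to 1, which is the role of the max and min in the formulas above.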
There are several parameters that are relatively important to the sensitivity
test for all 16 observation sites, but there are also parameters that are
important for some locations and less important for others.
The two sensitivity tables show the sensitive and
insensitive parameters together with the number of locations (out of 16) for
which these parameters are important and unimportant, respectively. Thus, in
the optimization we consider only the parameters in the table of sensitive parameters,
since these parameters are the most important at most locations. Please note
that, because of (nonlinear) relationships between the parameters, the effects of many
individual parameters are opposite in sign but otherwise act in a
similar manner, which indicates that some parameters may be difficult to
optimize. In order to limit the number of parameters we consider, while allowing
for the largest range in behavior, we combine information from the
sensitivity study with information about the methane flux equations
themselves (described in more detail in Appendix A). The most important
parameters from the sensitivity study come from the dominant three terms in
the methane flux equation, which are production (parameters 1, 2, and 21),
oxidation (parameters 7, 8, 9, and 10), and aerenchyma transport
(parameters 13, 15, 16, and 17). The first four parameters chosen are also
the most important parameters at all 16 sites (see the table of sensitive parameters).
Because production is the most important term, there are two parameters from
production that the sensitivity studies indicate are the most important,
namely one that controls globally the methane production flux
(f_ch4, parameter 2), and one term that controls the temperature
dependency of the methane production (q10ch4, parameter 1). Another
parameter that influences methane at all the sites comes from the oxidation
equation (vmax_ch4_oxid, parameter 7), and the final parameter
that is important at all 16 sites is the parameter controlling the aerenchyma
transport (scale_factor_aere, parameter 13). The above four
parameters are the most sensitive parameters, and thus are easy to choose, as
well as cover most of the important processes we want to investigate. For the
last parameter, we include one parameter that controls how inundation affects
methane production (mino2lim, parameter 21). Inundation is an
important process for controlling methane flux, since there is an order of
magnitude more methane coming from wet areas than dry, and thus having one
parameter which changes the model's sensitivity to inundation is appropriate.
Parameters that are least sensitive for the observation
sites (out of 16).
Parameter ID   Parameter name         # insensitive sites
3              redoxlag               16
4              oxinhib                16
5              pHmax                  16
6              pHmin                  16
14             nongrassporosratio     16
18             vgc_max                16
20             atmch4                 15
11             k_m_unsat              13
19             scale_factor_gasdiff   13
12             vmax_oxid_unsat        10
Surrogate models and surrogate model algorithms
Surrogate models
Surrogate models are used in optimization algorithms that aim to solve
computationally expensive black-box problems. Surrogate models serve as
computationally cheap approximations of the expensive simulation model,
i.e., $f(x)=s(x)+e(x)$, where
$f(\cdot)$ denotes the true expensive objective function, $s(\cdot)$ denotes
the computationally inexpensive surrogate model, and $e(\cdot)$ denotes the
difference between the two. Surrogate models are used throughout the
optimization to guide the search for promising solutions. The computationally
expensive objective function is evaluated only at few selected points, and
thus it is possible to find near-optimal solutions with only very few
expensive function evaluations.
There are different surrogate model types such as radial basis functions
(RBFs), kriging,
polynomial regression models, and multivariate adaptive
regression splines. There are also mixture models (also
known as ensemble models) that exploit information from several different
surrogate model types. In
general, any type of surrogate model may be used in a surrogate model
optimization algorithm. In this study, we use RBFs because they have been
shown to perform better than other surrogate model types.
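An RBF interpolant is defined as follows:
$$s(x)=\sum_{\iota=1}^{n}\lambda_\iota\,\phi(\|x-x_\iota\|)+p(x),$$
where $\phi(\tau)=\tau^3$ denotes the cubic radial basis function whose
corresponding polynomial tail is linear
($p(x)=b_0+b^Tx$), and $x_\iota$, $\iota=1,\ldots,n$, denotes the points at which the objective function has
already been evaluated. The parameters $\lambda_\iota\in\mathbb{R}$, $\iota=1,\ldots,n$, and the parameters $b_0\in\mathbb{R}$ and
$b=[b_1,\ldots,b_d]^T\in\mathbb{R}^d$ are determined by solving the
following linear system of equations:
$$\begin{bmatrix}\Phi & P\\ P^T & 0\end{bmatrix}\begin{bmatrix}\lambda\\ c\end{bmatrix}=\begin{bmatrix}F\\ 0\end{bmatrix},$$
where $\Phi_{\iota\nu}=\phi(\|x_\iota-x_\nu\|)$, $\iota,\nu=1,\ldots,n$, $0$ is a matrix with all entries 0 of appropriate dimension, and
$$P=\begin{bmatrix}x_1^T & 1\\ x_2^T & 1\\ \vdots & \vdots\\ x_n^T & 1\end{bmatrix},\quad \lambda=\begin{bmatrix}\lambda_1\\ \lambda_2\\ \vdots\\ \lambda_n\end{bmatrix},\quad c=\begin{bmatrix}b_1\\ b_2\\ \vdots\\ b_d\\ b_0\end{bmatrix},\quad F=\begin{bmatrix}f(x_1)\\ f(x_2)\\ \vdots\\ f(x_n)\end{bmatrix}.$$
The matrix of this linear system is invertible if and only if
$\mathrm{rank}(P)=d+1$.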
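As an illustration of the linear system above, the following sketch fits a cubic RBF interpolant with a linear polynomial tail using NumPy. The function name `fit_cubic_rbf` and the sample data are assumptions made for this example; this is not the code used in the study.

```python
import numpy as np


def fit_cubic_rbf(X, F):
    """Fit s(x) = sum_i lambda_i * ||x - x_i||^3 + b^T x + b_0 to the data (X, F)."""
    X = np.asarray(X, dtype=float)
    F = np.asarray(F, dtype=float)
    n, d = X.shape

    # Phi[i, j] = phi(||x_i - x_j||) with phi(t) = t^3
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    Phi = dist ** 3
    # P has rows [x_i^T, 1]
    P = np.hstack([X, np.ones((n, 1))])
    if np.linalg.matrix_rank(P) != d + 1:
        raise ValueError("rank(P) must equal d + 1 for a unique interpolant")

    # Assemble and solve [[Phi, P], [P^T, 0]] [lambda; c] = [F; 0]
    A = np.block([[Phi, P], [P.T, np.zeros((d + 1, d + 1))]])
    rhs = np.concatenate([F, np.zeros(d + 1)])
    coef = np.linalg.solve(A, rhs)
    lam, c = coef[:n], coef[n:]  # c = [b_1, ..., b_d, b_0]

    def s(x):
        x = np.asarray(x, dtype=float)
        r = np.linalg.norm(X - x, axis=1)
        return float(lam @ r ** 3 + c[:d] @ x + c[d])

    return s


# Hypothetical sample points in two dimensions and values of a linear test function.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.2], [0.2, 0.8]]
F = [1.0 + 2.0 * a - 3.0 * b for a, b in X]
s = fit_cubic_rbf(X, F)
```

Because the polynomial tail is linear, a linear test function is reproduced exactly (all $\lambda_\iota$ vanish), which gives a simple check of the assembled system.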
Surrogate global optimization algorithm
Surrogate global optimization algorithms generally follow the steps shown in the following algorithm.
General surrogate global optimization algorithm
Step 1: Select points from the variable domain to create an initial experimental design.
Step 2: Do the expensive objective function evaluations (here the CLM4.5bgc simulations) at the points selected in Step 1.
Step 3: Fit the surrogate model (here the RBF model) to the data from Steps 1 and 2.
Step 4: Use the information from the surrogate model to select the new evaluation point $x_{new}$.
Step 5: Do the expensive evaluation at $x_{new}$: $f_{new}=f(x_{new})$ (here, run CLM4.5bgc for the parameter input vector $x_{new}$).
Step 6: if the stopping criterion is not met (the maximum number of allowed function evaluations has not been reached) then
  Update the surrogate model and go to Step 4.
else
  Return the best solution found during the optimization.
endif
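The general loop above can be sketched in Python with pluggable components. Everything here is a simplified stand-in chosen for brevity: the initial design is uniform random, the surrogate is a nearest-neighbor predictor (not an RBF), the new point is the best of a few random candidates, and `toy_objective` replaces the expensive simulation. All function names are hypothetical.

```python
import math
import random


def surrogate_optimize(f, bounds, n_init, max_evals, fit, propose, seed=0):
    """Generic surrogate loop: initial design, then fit, propose, evaluate until the budget is spent."""
    rng = random.Random(seed)
    # Steps 1 and 2: initial design (uniform random here) and expensive evaluations.
    X = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_init)]
    y = [f(x) for x in X]
    while len(X) < max_evals:                      # Step 6: stopping criterion
        model = fit(X, y)                          # Step 3: fit or update the surrogate
        x_new = propose(model, X, y, bounds, rng)  # Step 4: select the new point
        X.append(x_new)
        y.append(f(x_new))                         # Step 5: expensive evaluation
    best = min(range(len(y)), key=y.__getitem__)
    return X[best], y[best]


def fit_nearest(X, y):
    """Stand-in surrogate: predict the value of the nearest evaluated point."""
    def predict(x):
        nearest = min(range(len(X)), key=lambda i: math.dist(x, X[i]))
        return y[nearest]
    return predict


def propose_best_of_random(model, X, y, bounds, rng, n_cand=50):
    """Stand-in selection rule: random candidate with the lowest surrogate prediction."""
    cands = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_cand)]
    return min(cands, key=model)


evaluations = []


def toy_objective(x):
    """Cheap stand-in for the expensive simulation; records every evaluation."""
    evaluations.append(list(x))
    return sum((xi - 0.3) ** 2 for xi in x)


x_best, f_best = surrogate_optimize(
    toy_objective, [(0.0, 1.0)] * 2, n_init=6, max_evals=30,
    fit=fit_nearest, propose=propose_best_of_random)
```

The point of the structure is that the expensive function is called exactly once per iteration, so the evaluation budget directly bounds the run time, while the surrogate and the selection rule can be exchanged without touching the loop.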
We use the DYCORS algorithm for the optimization of the
methane-related parameters of CLM4.5bgc. The reader is referred to the
original DYCORS publication for the details of the algorithm. Since the parameters have
significantly different ranges (see the parameter table), we scale all
parameters to the interval [0,1] when selecting new sample sites. When
doing the computationally expensive CLM4.5bgc simulations, we scale the
parameters back to their original ranges. Thus, the perturbation radius used
in DYCORS is the same for each variable.
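A minimal sketch of this scaling, assuming simple linear maps between the original ranges and the unit interval; the helper names `to_unit` and `from_unit` are hypothetical, and the bounds and defaults are those of parameters 1, 2, 7, 13, and 21 in the parameter table.

```python
def to_unit(x, lower, upper):
    """Map each parameter linearly from [lower, upper] to [0, 1]."""
    return [(xi - lo) / (hi - lo) for xi, lo, hi in zip(x, lower, upper)]


def from_unit(u, lower, upper):
    """Map each scaled parameter from [0, 1] back to its original range."""
    return [lo + ui * (hi - lo) for ui, lo, hi in zip(u, lower, upper)]


# q10ch4, f_ch4, vmax_ch4_oxid, scale_factor_aere, mino2lim
lower = [1.0, 0.1, 1.25e-6, 0.2, 0.1]
upper = [4.0, 0.4, 1.25e-4, 2.0, 0.3]
default = [1.33, 0.2, 1.25e-5, 1.0, 0.2]

unit_default = to_unit(default, lower, upper)
restored = from_unit(unit_default, lower, upper)
```

Without the scaling, a perturbation that is reasonable for q10ch4 (range 1 to 4) would be far outside the range of vmax_ch4_oxid, whose bounds are of the order of 10^-6 to 10^-4.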
We create a symmetric Latin hypercube initial experimental design with
2(d+1) points and run CLM4.5bgc at the selected parameter vectors in order
to compute the objective function values. We then fit the cubic RBF model to
the data and generate two sets of candidate points for the next expensive
function evaluation (the next CLM4.5bgc run at the 16 sites). The first set
of candidate points is generated as in DYCORS by randomly
perturbing the best point found so far. The second set of candidate points is
generated by uniformly selecting random points from the whole variable
domain. Thus, we create twice as many candidate points as DYCORS. The goal of
using uniformly random points from the whole variable domain is to obtain
candidates that are far away from the best point found so far, and hence if
selected as a new evaluation point, the search is more global (exploration by
function evaluation at points that are far away from already sampled points).
We use the same criteria as in DYCORS for determining the best candidate
point (using the RBF approximation to predict the objective function values
at the candidate points, computing the distance of the candidate points to the
set of already sampled points, and computing a weighted score of these two
measures, where the weights cycle through a predefined pattern). In order to
guarantee that the matrix of the RBF linear system is
well-conditioned, we ensure (as in earlier work) that the sample
points are sufficiently far away from previously evaluated points by
discarding candidate points that are closer than a given threshold distance
to previously evaluated points. We run CLM4.5bgc at each of the 16 observation sites using the one newly selected sample point as the input
parameter vector to obtain the corresponding objective function value. We
update the RBF model with the new data and iterate until we have reached the
maximum number of allowed function evaluations.
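The candidate generation and selection step described above can be sketched as follows, working in the unit cube. This is a simplified illustration, not the DYCORS implementation: the perturbation standard deviation, the fixed perturbation probability, the clipping to the bounds, the min-max scaling of the two criteria, and all function names are assumptions of this sketch (DYCORS itself adapts the perturbation probability during the optimization).

```python
import math
import random


def make_candidates(x_best, n_cand, sigma, p_perturb, rng):
    """Two candidate sets: random perturbations of the best point and uniform random points."""
    d = len(x_best)
    local = []
    for _ in range(n_cand):
        mask = [rng.random() < p_perturb for _ in range(d)]
        if not any(mask):
            mask[rng.randrange(d)] = True  # perturb at least one coordinate
        local.append([min(1.0, max(0.0, xi + rng.gauss(0.0, sigma))) if m else xi
                      for xi, m in zip(x_best, mask)])
    uniform = [[rng.random() for _ in range(d)] for _ in range(n_cand)]
    return local + uniform


def _scale(values):
    """Min-max scale a list to [0, 1]; all zeros if the values are identical."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def select_candidate(cands, sampled, predict, weight, min_dist):
    """Pick the candidate with the lowest weighted score of predicted value and distance."""
    dists = [min(math.dist(c, s) for s in sampled) for c in cands]
    keep = [i for i, dist in enumerate(dists) if dist >= min_dist]  # discard close points
    if not keep:
        return None
    value = _scale([predict(cands[i]) for i in keep])  # 0 = lowest predicted value
    far = _scale([dists[i] for i in keep])             # 1 = farthest from sampled points
    scores = [weight * v + (1.0 - weight) * (1.0 - f) for v, f in zip(value, far)]
    return cands[keep[scores.index(min(scores))]]


# Hypothetical usage with a hand-written surrogate prediction.
rng = random.Random(1)
pool = make_candidates([0.5, 0.5, 0.5], n_cand=10, sigma=0.2, p_perturb=0.5, rng=rng)

cands = [[0.1, 0.1], [0.5, 0.5], [0.9, 0.9]]
sampled = [[0.0, 0.0]]
predict = lambda c: (c[0] - 0.5) ** 2 + (c[1] - 0.5) ** 2
exploit = select_candidate(cands, sampled, predict, weight=1.0, min_dist=0.2)
explore = select_candidate(cands, sampled, predict, weight=0.0, min_dist=0.2)
```

A weight close to 1 favors candidates with low predicted objective values (local search), while a weight close to 0 favors candidates far from already sampled points (global search); cycling the weight through a pattern alternates between the two.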
Numerical experiments
In this section we discuss the setup and results of the numerical
experiments. In a first set of experiments (pseudo data case), we generate
synthetic (pseudo) data and treat it as if it were the real measurement data
in order to assess how well our optimization approach performs. For these
experiments we know the optimal solution. In the second set of experiments
(real data case), we use the measured methane emission data and apply the
optimization algorithm. The goal in the second set of experiments is to find
a parameter set that reduces the objective function value (the weighted RMSE
in Eq. ) from its default value (the RMSE when
using CLM4.5bgc's default parameter settings, see also
Table , column vk). Finally, we run CLM4.5bgc globally
with the best set of parameters found during the optimization of the real
data case and investigate how much the default model predictions and the
model predictions with the optimized parameter values differ from each other.
We conducted experiments with d=5 and d=11 parameters, respectively. For the
d=5 experiments, we used parameters 1, 2, 7, 13, and 21
(Table ). Thus, we have parameters related to three types of
CH4 emission, namely oxidation (parameter 7), aerenchyma (parameter 13),
and production (parameters 1, 2, 21). For the 11-parameter optimization, we
used all variables shown in Table .
For each set of experiments we ran the optimization algorithm three times in
order to examine the influence of the random component in the algorithm
(random initial experimental design and random generation of candidate
points). We allowed 800 function evaluations for the five-dimensional problem
and 1000 evaluations for the 11-dimensional problem. The question of how many
function evaluations need to be performed in order to obtain a fixed level of
solution accuracy is problem dependent. For computationally expensive
optimization problems, such as the one we consider here, the time needed to
evaluate the objective function and the total time available for
obtaining a solution usually determine how many evaluations can be done with
any algorithm. Results for many difficult computationally expensive
optimization problems (for example, problems with multiple local minima)
indicate that surrogate global optimization methods can usually obtain more
accurate results compared to non-surrogate methods with the same limited
number of evaluations (see, for example, ). It is a very
difficult problem to find the best values of the parameters for climate
models, and the more evaluations one does, in general the better the answer.
For the pseudo data case, the weights wi in Eq. () were
computed based on the pseudo observations (see Sect. ) at
each of the 16 sites at the same dates for which we also have real
measurements. For the real data case, the weights were computed based on the
actual measurements. The weights are given in Table in
Appendix .
Solving problem () requires running CLM4.5bgc for each
input vector x of parameter values and for each of the 16
observation sites. We run CLM4.5bgc on the Yellowstone Supercomputing
Facility . Each simulation at a single location takes between 15
and 30 min. We do the simulations for the 16 sites in parallel in order
to speed up the objective function evaluation time.
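One objective function evaluation can be sketched as below, assuming that the objective is a weighted sum of per-site RMSE values, as the text describes it. The names `site_rmse`, `evaluate_objective`, and `toy_run_site` are hypothetical, the toy site model is not CLM4.5bgc, and a thread pool only stands in for whatever job submission mechanism is used on the supercomputer.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def site_rmse(pred, obs):
    """Root mean squared error between predictions and observations at one site."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def evaluate_objective(x, sites, run_site, observations, weights):
    """Run all sites concurrently and return the weighted sum of site RMSEs."""
    sites = list(sites)
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        predictions = list(pool.map(lambda s: run_site(x, s), sites))
    return sum(w * site_rmse(p, o)
               for p, o, w in zip(predictions, observations, weights))

# Toy usage: 16 hypothetical sites with three observation dates each.
def toy_run_site(x, site):
    # Hypothetical stand-in for one CLM4.5bgc site simulation.
    return np.full(3, x[0] + site)

sites = range(16)
observations = [np.full(3, float(s)) for s in sites]
weights = [1.0 / 16] * 16
value = evaluate_objective(np.array([0.5]), sites, toy_run_site,
                           observations, weights)
```

Because the 16 site runs are independent of each other, the wall-clock time of one evaluation is governed by the slowest site rather than by the sum over all sites.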
Pseudo data case
Progress plot that shows the development of the best objective
function value found vs. the number of function evaluations for the pseudo
data case with d=5 parameters for optimization trials T1, T2, and T3. The
legend shows the lowest RMSE value found in each trial.
We assessed the performance of the optimization algorithm by investigating
how well the algorithm could find the model parameters that were used for
creating the pseudo data. For this purpose, we ran CLM4.5bgc with default
parameter values vk,k=1,…,d, at all 16 sites for the same time
span for which we also have observation data (see Tables
and in Appendix ) and we record the model's
predictions for the same dates at which the methane emissions were measured.
We use this as our pseudo observation data that we want to match in the
optimization, i.e., the goal of the optimization is to start from a set of
parameter vectors that is different from the default parameter values and to
recover the default parameter values by optimization. For the default
parameter values, the objective function value will be zero, which is the
global minimum of the pseudo data case.
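The construction of the pseudo data case can be illustrated with a toy model. In this sketch, `toy_model` is a hypothetical two-parameter stand-in for a CLM4.5bgc site run, and the parameter values are arbitrary. It only demonstrates why the objective is zero at the default parameters.

```python
import numpy as np

def toy_model(params, t):
    """Hypothetical stand-in for a site simulation (not the real model)."""
    a, b = params
    return a * np.exp(-b * t)

def rmse(pred, obs):
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

t_obs = np.linspace(0.0, 5.0, 20)         # dates at which measurements exist
default = np.array([2.0, 0.5])            # plays the role of the default values
pseudo_obs = toy_model(default, t_obs)    # pseudo observations

def objective(params):
    return rmse(toy_model(params, t_obs), pseudo_obs)

at_default = objective(default)                  # 0.0 by construction
elsewhere = objective(np.array([1.5, 0.8]))      # positive for this other point
```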
Results for d=5
Figure shows the progress plots of the three optimization
trials T1, T2, and T3. Illustrated is the development of the best objective
function value found within the given number of function evaluations
(horizontal axis). The fewer evaluations needed for reducing the objective
function value, the better. The plot shows that the objective function value
is reduced significantly in each of the three trials from a value of over 30
to about 5 within fewer than 150 function evaluations and to close to zero
towards the end of the optimization. Table
shows the best parameter values found during each of the three optimization
trials together with the default parameter values. The table shows that the
RMSE after 800 function evaluations is not exactly zero (which can be
expected from an approximation method), but the default parameter values are
matched closely.
Default and optimized parameter values of optimization
trials T1, T2, and T3 for the five-dimensional pseudo data case. We report four
decimal places because the model output is sensitive to very small changes
for some variables. Note that we scaled the numbers to the interval [0,1].
Param.   Default   T1       T2       T3
1        0.1100    0.1088   0.1099   0.1091
2        0.5333    0.5366   0.5385   0.5458
7        0.0909    0.0912   0.0943   0.0967
13       0.4444    0.4461   0.4454   0.4443
21       0.5000    0.4936   0.4934   0.4856
RMSE     0         0.28     0.46     0.40
Results for d=11
Figure shows the objective function value development as
the number of function evaluations increases for the 11-dimensional case for
the three trials T1, T2, and T3. The figure shows a rapid decrease of the
objective function value from over 50 to less than 10 within 100 evaluations,
which shows that the surrogate model algorithm is very efficient at finding
improved solutions. Although the objective function value improvement over
the following function evaluations is lower, we can see that the algorithm
still makes progress and if we allowed more than 1000 evaluations, the
objective function value would be further improved (which also follows from
the global convergence property of the DYCORS algorithm).
Table shows the parameter values of the best of
the three trials (T3) together with the default parameter values and the
variable vector CP that was evaluated during the optimization and that has a
worse objective function value than the best solution, but that is
closer to the default parameter values. This point has the same
parameter values as T3 for all but two parameters, namely, parameters 10
(q10_ch4oxid) and 21 (mino2lim), which we indicate by bold
numbers. For these two parameters, the point CP is closer to the global
optimum, but it has a worse objective function value. This indicates a
multimodality of the objective function (getting closer to the true global
minimum requires an increase in the objective function value, i.e., the
algorithm has to escape from a local basin of attraction). This multimodality
makes the search for the global optimum significantly more difficult.
Progress plot that shows the development of the best objective
function value found vs. the number of function evaluations for the pseudo
data case with d=11 parameters for optimization trials T1, T2, and T3. The
legend shows the lowest RMSE value found in each trial.
Default and optimized parameter values of optimization
trial T3 and parameter values for the point CP that was sampled during the
same optimization trial and that is closer to the default point, but that has
a worse objective function value (11-dimensional pseudo data case). Bold numbers indicate the parameters for which CP is closer
to the default value than T3 (but CP has a worse objective function value).
In order to examine the impact of the differences between default and
optimized parameter values on the model prediction, we use the best parameter
vector of each trial and plot the corresponding CH4 emission predictions
against the predictions when using the default parameter values in
Figure . We can see that although we do not exactly
match the default parameter values, the model's predictions when using the
optimized parameters are very close to the predictions when using the default
parameter values (all points in the scatter plot lie close to or on the
dashed line which represents agreement of default and optimized predictions).
As also reflected in the best RMSE value reported in the legend, T3 matches
the default data best and T2 has the largest differences.
CLM4.5bgc CH4 predictions when using the default parameter values
vs. the predictions when using the best solution found in each of the
three optimization trials T1, T2, and T3, respectively, for the pseudo data
case with d=11 parameters. The legend shows the lowest RMSE value found in
each trial.
This result indicates that the calibration problem is not “identifiable” for
all parameter sets, indicating that more than one parameter set can give a
very similar result in terms of the objective function value. For example,
for the model y=αβx+γ, there are many
combinations of values for α and β that lead to the same value
of y as long as α=κ/β (i.e., αβ=κ) for some constant κ. With only
five parameters as described in the previous section, the parameter values
obtained from the optimization did match very closely with those of the default
case used to create the pseudo data, and thus with this small set of
parameters the problem was identifiable. However, for 11 parameters, we did
encounter the identifiability problem. In some disciplines such parameters
are called “hidden”. For example, estimating α and γ in the
previous example with y=αβx+γ when β
is given would be identifiable. However, estimating α, β, and
γ is no longer identifiable.
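The identifiability issue in the example y=αβx+γ can be checked numerically. The sketch below uses arbitrary illustrative values: any two parameter sets with the same product αβ and the same γ give the same predictions, so no objective function based on y can distinguish them.

```python
import numpy as np

def model(alpha, beta, gamma, x):
    return alpha * beta * x + gamma

def rmse(pred, obs):
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

x = np.linspace(0.0, 10.0, 50)
obs = model(2.0, 3.0, 1.0, x)                    # data generated with alpha*beta = 6

rmse_same_product = rmse(model(4.0, 1.5, 1.0, x), obs)    # alpha*beta = 6 again
rmse_other_product = rmse(model(2.0, 2.0, 1.0, x), obs)   # alpha*beta = 4

print(rmse_same_product, rmse_other_product)
```

Fixing β turns the problem into a two-parameter fit of slope and intercept, which is why estimating only α and γ is identifiable.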
It would be desirable to have an identifiable model, but the CLM (and
probably other climate modules) has a number of interacting parameters and
multiplicative nonlinearities, and thus there is no guarantee that all
parameters are identifiable. This is reinforced by the data in
Table , which indicates that the surface over
which the optimization algorithm searches in the 11 parameter case is
multi-modal, i.e., there are multiple local minima and it is possible for two
(or more) parameter sets to yield the same objective function value (here
RMSE). Hence the inability of the optimization to find the exact set of
parameters that was used for generating the pseudo data is a problem caused
by the complexity and multiplicative nonlinearities of the CLM, not by
the choice of the optimization method. However, the optimization analysis for
both pseudo data cases (with 5 and 11 parameters, respectively) shows that
the chosen optimization method is able to find a set of parameter values that
has a low prediction error. The multi-modality in
Table does indicate the need for a global (not a
local) optimization method.
Progress plot that shows the development of the best objective
function value found vs. the number of function evaluations for the real
data case with d=5 parameters for optimization trials T1, T2, and T3. The
legend shows the lowest RMSE value found in each trial. The first function
evaluation (left side of the graphs) corresponds to the RMSE when using the
default parameters.
Real data case
In the real data case, we use the actual methane emission measurements at
each of the 16 observation sites for computing the objective function value.
Since we only have very few observations for each site and no information
about measurement errors, we did not exclude any of the measurements from the
optimization although there might be outliers. As in the pseudo data case, we
examine d=5 and d=11 parameters.
Results for d=5
The progress of the development of the objective function value for the three
trials T1, T2, and T3, respectively, is illustrated in
Fig. which also shows in the legend the lowest
RMSE value found in each of the three trials. The RMSE was efficiently
reduced from over 155 to below 115 within the first 150 function evaluations.
Thereafter the objective function value improvement was at a significantly
lower rate. All three trials return a solution with approximately the same
objective function value.
The parameter values of the best solutions found in the three trials are
shown in Table where also the default
parameter values are given for comparison. We can see that the three
optimized solutions are approximately the same and significantly different
from the default case. We can also see that three of the five optimized
parameter values are on or very close to the boundary of the variable domain
(shown in bold), indicating that improvements of the objective function value
may be possible by widening the parameter ranges. However, physical constraints
prevent us from doing so, and at this point we have no information about
admissible parameter ranges wider than the ones we used in this study.
Default and optimized parameter values of optimization
trials T1, T2, and T3 for the five-dimensional real data case. Bold indicates
optimized parameters that are on (or close to) the variable boundary (all
variables are scaled to [0,1]).
CH4 emission observations and predictions when using the
optimized parameters of optimization trials T1, T2, and T3, respectively, and
when using the default parameters for the wetland site Alberta, Canada, for
the real data case with d=5 parameters. The legend shows the lowest RMSE
value found in each trial.
Figures and show the CH4
emission predictions of CLM4.5bgc when using the default and the optimized
parameter values for two selected observation sites (one wetland and one rice
paddy site) together with the actual observation data. The legends show the
associated RMSE value before applying the weights for
computing the objective function value (Eq. ). We can see that the optimized solution
actually worsens the predictions for Alberta (the RMSE value with default
parameters is about 209 and with optimized parameters, the value is about
221, which is about 6 % worse). For Central Java, on the other hand, the
RMSE values of the optimized solutions are significantly better than for the
default values (the default RMSE is about 221 and the optimized RMSE values
are about 48, i.e., a reduction of about 78 %, so that the default RMSE is more than four times the optimized one). In both figures we
can also see that despite the large differences between optimized and default
parameter values, the trend in the predictions of CLM4.5bgc is the same,
i.e., when the predicted CH4 emissions with default parameters increase so
do the predicted emissions when using the optimized parameters and vice
versa.
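The relative changes quoted for the two sites depend on which RMSE serves as the baseline. A minimal sketch using the rounded RMSE values from the text (the helper `relative_change` is introduced here only for illustration):

```python
def relative_change(old, new, baseline):
    """Change from old to new in percent of the chosen baseline."""
    return 100.0 * (new - old) / baseline

# Alberta: default RMSE about 209, optimized RMSE about 221.
alberta = relative_change(209.0, 221.0, baseline=209.0)

# Central Java: default RMSE about 221, optimized RMSE about 48.
java_vs_default = relative_change(221.0, 48.0, baseline=221.0)
java_vs_optimized = relative_change(221.0, 48.0, baseline=48.0)

print(round(alberta, 1), round(java_vs_default, 1), round(java_vs_optimized, 1))
```

Measured against the default RMSE, the Central Java error drops by roughly 78 %. A figure above 350 % results only when the same difference is expressed relative to the optimized RMSE.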
CH4 emission observations and predictions when using the
optimized parameters of optimization trials T1, T2, and T3, respectively, and
when using the default parameters for the rice paddy site Central Java,
Indonesia, for the real data case with d=5 parameters. The legend shows the
lowest RMSE value found in each trial.
Progress plot that shows the development of the best objective
function value found vs. the number of function evaluations for the real
data case with d=11 parameters for optimization trials T1, T2, and T3. The
legend shows the lowest RMSE value found in each trial. The first function
evaluation (left side of the graphs) corresponds to the RMSE when using the
default parameters.
Results for d=11
Figure shows the progress plots for each of the three
trials together with the best objective function values found (legend) for
the 11-dimensional case. The best objective function value found is about
equal for each of the three trials. The figure shows that in each trial the
algorithm is able to efficiently reduce the objective function value within
the first 200 function evaluations. The improvement after 200 function
evaluations is significantly slower.
Default and optimized parameter values of optimization
trials T1, T2, and T3 for the 11-dimensional real data case. Bold indicates
optimized parameters that are on the variable bound (all variables are scaled
to [0,1]).
Table shows the parameter values of the best
solution found in each of the three trials and the default parameter values.
The table shows that for some parameters, for example, parameters 1, 7, and
8, all trials lead to approximately the same values (which are different from
the default parameter values). For the remaining parameters, the values
corresponding to the best solution found differ significantly for each trial
and differ also from the default parameter values. Also for the
11-dimensional problem, some parameter values corresponding to the best
solution found are on the upper or lower boundary of the parameter range (for
example, parameters 1, 8, 13, 15, indicated in bold).
Since all three solutions have approximately the same objective function
values, but the points differ greatly, this indicates that we either have
a multi-modal surface in which some minima assume approximately the same
objective function values, or we have a very flat valley in which many points
assume similar objective function values. Both possibilities make it very
difficult for gradient-based optimization algorithms to find the global
optimum. In the first case, the optimization algorithm will get trapped in a
local optimum if it is not started close to the global minimum. In the second
case, the gradient-based algorithm would require many function evaluations
because many steps and gradient computations are necessary due to a very
small step size. The surrogate optimization algorithm overcomes this problem.
CH4 emission observations and predictions when using the
optimized parameters of optimization trials T1, T2, and T3, respectively, and
when using the default parameters for the wetland site Alberta, Canada, for
the real data case with d=11 parameters. The legend shows the lowest RMSE
value found in each trial.
CH4 emission observations and predictions when using the
optimized parameters of optimization trials T1, T2, and T3, respectively, and
when using the default parameters for the rice paddy site Central Java,
Indonesia, for the real data case with d=11 parameters. The legend shows
the lowest RMSE value found in each trial.
Table shows the unweighted RMSE values (before applying
the weights in Eq. () for computing the objective function
value) between observations and simulations using the default parameters
(column 5), the best parameters of optimization trial T1 of the
11-dimensional case (column 4), and the best parameters of trial T2 of the
5-dimensional case, respectively. The table shows that with our optimization
we were able to decrease the default RMSE for four sites in the 5-dimensional
case and for six sites in the 11-dimensional case. The RMSE is lower at seven
sites for the 11-dimensional case than for the 5-dimensional case. Since we
minimized a weighted sum of all RMSE values, it can be expected that the RMSE
at some locations may be worse for the optimized case than for the default
case. We can see that for two of the improved sites (Java and Cuttack), the
improvement is very large, and thus the overall RMSE of the optimized
solution is lower than for the default parameters.
Unweighted RMSE values for each site using the best parameters found during optimization
trial T1 of the d=11 real data case and trial T2 of the d=5 real data case and with default parameter values.
CH4 emission observations and predictions when using the
optimized parameters of optimization trials T1, T2, and T3, respectively, and
when using the default parameters for the wetland site Salmisuo, Finland, for
the real data case with d=11 parameters. The legend shows the lowest RMSE
value found in each trial.
Figures and show the observed
CH4 emissions, the predictions with the default parameter values, and the
predictions using the optimized parameter values for Alberta (Canada) and
Central Java (Indonesia). For both sites we can see that the predictions with
the optimized parameters have lower RMSEs than when using the default
parameter values (note that the reported RMSEs in the legend are not weighted
as done in Eq. ). For Central Java, for example,
the optimized parameters greatly improved the model's predictions, but we can
also see that the temporal variability in the predictions stays the same
although not as pronounced. We noticed this “temporal variability
preserving” behavior for several sites such as Beijing, California, Cuttack, New Delhi,
Florida, Japan, Michigan, Minnesota, Salmisuo, Texas, and Vercelli. Compared
to the case where we optimized only five parameters, the solution for Alberta
has improved, and in the d=11 case the RMSE values of all three trials are
lower than the default RMSE value. On the other hand, the solution for
Central Java is worse for T1 in the d=11 case than in the d=5 case.
Scatterplot showing the mean values of the CH4 predictions using
the default and optimized parameter values of trials T1, T2, and T3,
respectively, vs. the mean values of the observations. The numbers in the
legend show the best RMSE value corresponding to each trial. The numbers
above/below the boxes indicate the observation site ID (1: Alberta, 2: Florida,
3: Michigan, 4: Minnesota, 5: Nanjing, 6: Vercelli, 7: Texas, 8: Japan,
9: California, 10: New Delhi, 11: Beijing, 12: Central Java, 13: Chengdu,
14: Cuttack, 15: Panama, 16: Salmisuo).
Average methane emissions (mg CH4 m-2 d-1) simulated by
CLM4.5bgc for (a) default parameters, (b) differences between default
parameters and 11-dimensional optimization trial T1, (c) differences between
default parameters and optimization trial with unweighted sum of RMSE, and
(d) differences between default parameters and optimization trial with
zonally weighted sum of RMSE. Zonal means are shown on the right side of each
spatial plot.
The temporal variability in the model's predictions does not necessarily
follow the temporal variability in the observation data (see, for example,
Fig. ). Note that in Fig.
the temporal variability is the same for each of the three trials although
the best solutions found in the three trials were very different (see
Table ). Thus, it seems that the improvement
of the model's predictions is restricted by an underlying model component
that enforces the temporal variability. This is likely to be associated with
structural errors either in the methane or in the carbon model. Notice that
the methane emission is dependent on the temporal variability predicted in
the carbon and land model, especially on the heterotrophic respiration rate,
which could have the wrong magnitude or temporal evolution.
Figure shows a scatter plot of the mean values
of the CH4 predictions using default and optimized parameter values vs.
the mean values of the observed CH4 emissions. Ideally, if the simulated
emissions agreed with the observations, all points would lie on the dashed
line. Thus, the closer a point is to the dashed line, the better simulation and
observation agree. The figure shows that with the optimized
parameters, we obtain better or similar results for Beijing, Cuttack,
Minnesota, Central Java, Nanjing, Japan, Salmisuo, Alberta, and Michigan.
Although not all sites have been strictly improved by the optimization, the
overall RMSE has been improved (indicated in the legend).
Figure also shows that with default parameters,
CLM4.5bgc predicts lower CH4 emissions than observed for both observation
sites in the northern latitudes (Alberta, ID = 1, and Salmisuo, ID = 16), which
is corrected by the optimization such that the mean emissions at these sites
are closer to the dashed line. Thus, based on the observation data, CLM4.5bgc
with default parameters does not predict enough emissions in the northern
latitudes. On the other hand, CLM4.5bgc over-predicts the emissions for four
locations, namely Cuttack (ID = 14), Central Java (ID = 12), Nanjing (ID = 5), and
Japan (ID = 8), which are located in the tropical and/or subtropical zone. For those
four locations, the predictions with the optimized parameters are in closer
agreement with the observations. Hence, the observation data force the model
predictions to increase in the northern latitudes and to decrease in the
tropics. This can also be seen in Figs. and
in the following section, where we ran the model globally and compared
default and optimized model predictions for the individual zones.
Global CH4 emission simulations
We ran CLM4.5bgc to obtain predictions for the CH4 emissions on a
global scale and compared the predictions when using the default parameter
values and the optimized parameter values from the 11-dimensional cases.
Figure shows spatial plots of the average methane emissions
(mg CH4 m-2 d-1) and the zonal means (right hand side of the
plots) when using the default parameters (panel a), and the difference
between the predictions when using the default and the optimized parameters
for trial T1 (panel b). The figure shows that with the optimized parameters,
the CH4 emission predictions in the northern regions are larger than for
the default parameters. For the tropics, the predictions with the optimized
parameters are lower than when using the default values.
Comparison of total methane emissions (Tg CH4 yr-1) between
CLM4.5bgc and other models from natural wetlands. 1: ,
2: , 3: , 4: ,
5: , 6: , 7: , 8: , 9: CLM4Me, , 10: CLM4Me', ,
11: this study, CLM4.5bgc with default parameters, 12: this study, CLM4.5bgc
with d=11 optimized parameters of T1, 13: this study, CLM4.5bgc with d=11
optimized parameters of unweighted sum of RMSE, and 14: this study, CLM4.5bgc
with d=11 optimized parameters of zonally weighted RMSE. Note that number 7
is a top-down approach and number 9 may include the rice paddy emissions. For
number 8, no data were available for the tropics and the temperate zone.
Figure shows a comparison of the CH4 emission predictions
from several different models (models 1–10). We can see that globally the
predictions with the optimized parameters (model 12) were only slightly
higher than with the default parameters (model 11). However, the predictions
of CH4 emissions in the tropics are significantly lower than for the
default model and the predictions are also lower in comparison to all other
models (1–10). On the other hand, for the northern latitudes, CLM4.5bgc with
optimized parameters predicts significantly more CH4 emissions than the
default model and models 1–10 in the comparison. Hence, even though the
global average of predicted emissions did not change much, the distribution
of the predicted emissions between the tropical and the northern latitudes
changed significantly.
As indicated in the previous section, the observation data drive the model
to predict more CH4 emissions in northern latitudes and fewer emissions in
the tropics. We investigated whether our weighting scheme in
Eq. () may give too much influence to individual
observation sites or zones. Thus, we did an additional optimization trial of
the parameters in Table where we give each observation site
the same weight wi=1, i=1,…,16 (“unweighted”). We also did a
second additional optimization trial of the parameters in
Table where we give each zone the same influence on the
total RMSE in order to account for the location of the various observation
sites (“zonally weighted”). Thus, each location in the temperate zone (12 sites in total) has wi=1/36, and each location in the northern (2 sites)
and tropical (2 sites) zone, respectively, has the weight wi=1/6.
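The zonal weights follow from requiring that each of the three zones contributes one third of the total weight, split equally among the sites of that zone. A short sketch (the function name `zonal_weights` and the ordering of the site list are assumptions made for illustration):

```python
from collections import Counter

def zonal_weights(site_zones):
    """Give every zone the same total weight, shared equally by its sites."""
    counts = Counter(site_zones)
    n_zones = len(counts)
    return [1.0 / (n_zones * counts[zone]) for zone in site_zones]

# 12 temperate, 2 northern, and 2 tropical sites, as in the text.
zones = ["temperate"] * 12 + ["northern"] * 2 + ["tropical"] * 2
w = zonal_weights(zones)

print(w[0], w[12], w[14], sum(w))
```

With 12 temperate sites this gives 1/(3×12)=1/36 per temperate site and 1/(3×2)=1/6 per northern or tropical site, and the weights sum to one.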
The spatial plots of the differences between the average methane emissions
when using default and optimized parameters for the unweighted trial are
shown in panel c of Fig. , and the spatial plots of the
differences when using the zonally weighted objective function are shown in
panel d of Fig. . The figures show that for both
additional trials, the CH4 emissions in the northern latitudes are even
further increased. Moreover, the bars for models 13 and 14 in
Fig. show the total methane emissions of the unweighted and
the zonally weighted trials, respectively. The zonally weighted trial
increases the global emissions, which is caused by larger emission
predictions in the temperate zone and the northern latitudes. In comparison
to the default CLM4.5bgc predictions, the unweighted trial shows a decrease
in the predicted emissions in the tropics and an increase in the predicted
emissions in the northern latitudes. Thus, even though it is suggested that
CLM4.5bgc with default parameter settings over-predicts the CH4 emissions
in high latitudes , the observation data argue that the
predictions should even be increased.
Conclusions
In this paper we used a surrogate optimization approach for calibrating the
parameters of the methane module of the Community Land Model (CLM4.5bgc).
Given only relatively few measurements at 16 observation sites (wetlands and
rice paddies) our goal was to explore the use of a surrogate optimization
method to improve the model prediction capability in a computationally
efficient way by minimizing the root mean squared error between the
measurements and the model's predictions. We identified important
methane-related parameters in CLM4.5bgc by doing a sensitivity analysis and
we were thus able to reduce the problem dimension from 21 to 11. We then used
a surrogate optimization approach for tuning the most important parameters in
order to solve the problem. We investigated two cases, namely a problem with
five of the most important parameters and a problem with all 11 parameters,
respectively.
We first used pseudo data in order to assess how well the surrogate
optimization performs and showed that we are able to closely match the pseudo
observations. We were able to reduce the RMSE to less than a fifth within the
first 150 function evaluations for both pseudo data cases. The objective
function was shown to have multiple local minima, which indicates that the
problem is probably not identifiable when 11 parameters are optimized.
Although the optimization greatly reduced the RMSE for the 11-parameter pseudo data case, it did not in all cases recover the
parameter values that were used to generate the
pseudo data. This is a problem with the model, not with the optimization
method used. The multiple local minima detected in
Table indicate that a global optimization method
was needed. We used a surrogate global optimization method because the
objective function was expensive to evaluate and has multiple local minima.
The surrogate has been shown to reduce the number of objective function
evaluations (e.g. climate model simulations) required to obtain accurate
approximations of the global minimum and so it is designed for
computationally expensive models like climate modules.
By conducting the simulations globally and comparing the average predicted
emissions with default and optimized parameters, we could show that the total
global CH4 emissions did not change significantly. However, the
distribution of the predicted emissions between latitudes changed
significantly. The observation data force the optimized model's CH4
emission predictions in the northern latitudes to increase and the predicted
emissions in the tropics to decrease. In comparison to other models,
CLM4.5bgc with both default and optimized parameters predicts significantly
more emissions in the northern latitudes and fewer emissions in the tropics.
Model equations
The methane biogeochemical model used in this study is integrated in the
Community Land Model version 4.5 (CLM4.5), which is the land component of the
Community Earth System Model (CESM, ). As discussed in more
detail in and , the model represents five
primary processes relevant to methane emission predictions. These processes
include methane production ($P$), oxidation ($R_{oxic}$), ebullition
($E$), transport through wetland plant aerenchyma ($A$), and diffusion
through soil ($F_{D_e}$) (described below). The methane gas and aqueous phase
concentrations ($RC$) in each soil layer of each grid box are calculated at
every time step using the following equation:
$$\frac{\partial RC}{\partial t} = \frac{\partial F_{D_e}}{\partial z} + P - E + A - R_{oxic}.$$
In the following sections we consider each of these terms in more detail.
Methane production
Methane production ($P$) in the anaerobic portion of the soil column is
related to the grid cell estimate of heterotrophic respiration from soil and
litter corrected for various factors:
$$P = R_H \times \mathrm{f\_ch4} \times \mathrm{q10ch4} \times f_{pH}\, f_{pE}\, S,$$
where $R_H$ is the heterotrophic respiration from soil and litter (mol C m$^{-2}$ s$^{-1}$), and f_ch4 is the baseline fraction of
anaerobically mineralized C atoms becoming CH4 (i.e., CO2/CH4).
$R_H$ is corrected for its soil temperature dependence through a Q10 factor
(q10ch4), pH ($f_{pH}$), redox potential ($f_{pE}$),
and a factor accounting for the seasonal inundation fraction ($S$).
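The production term is a plain product of the respiration rate and its correction factors. A minimal sketch is shown below; the function name and all numerical values are hypothetical and serve only to illustrate the arithmetic, with the temperature factor passed in as a single multiplier as in the equation above.

```python
def methane_production(r_h, f_ch4, q10_factor, f_ph, f_pe, s):
    """Return P = R_H * f_ch4 * q10ch4 * f_pH * f_pE * S.

    r_h        heterotrophic respiration (mol C m^-2 s^-1)
    f_ch4      baseline fraction of mineralized C becoming CH4
    q10_factor temperature correction factor (q10ch4 in the text)
    f_ph, f_pe pH and redox potential factors
    s          seasonal inundation factor
    """
    return r_h * f_ch4 * q10_factor * f_ph * f_pe * s


# Hypothetical inputs, chosen only to illustrate the calculation.
p = methane_production(r_h=2.0e-6, f_ch4=0.2, q10_factor=1.5,
                       f_ph=0.8, f_pe=0.9, s=1.0)
print(p)
```

Because every factor other than the temperature correction is at most 1 in this example, the factors can only reduce the baseline production `r_h * f_ch4 * q10_factor`.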
We adjusted the fractional inundation in each grid cell to account for a
changing redox potential:
$$f_{pE} = \frac{f_{i,lag}(t)}{f_i(t)},$$
where the redox potential factor $f_{pE}$ is computed based on the
fractional inundation $f_i(t)$ and the adjusted fractional inundation
$f_{i,lag}(t)$ that is producing methane.
The adjusted fractional inundation $f_{i,lag}(t)$ is computed as
$$f_{i,lag}(t) = f_i(t) - f_{redox}(t),$$
where
$$f_{redox}(t) = f_i(t) - f_i(t-1) + f_{redox}(t-1)\left(1 - \frac{\Delta t}{\mathrm{redoxlag}}\right)$$
is the fraction of the grid cell where alternative electron acceptors (such
as O$_2$, NO$_3^-$, Fe$^{3+}$, SO$_4^{2-}$, etc.) are consumed (methane
production is completely inhibited), $\Delta t$ is the time step, and
redoxlag is the time constant parameter.
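The recursion for the redox-inhibited fraction can be traced with a short time loop. The sketch below assumes the update reads f_redox(t) = f_i(t) - f_i(t-1) + f_redox(t-1)(1 - Δt/redoxlag) and that the redox factor is the ratio of the lagged to the actual inundated fraction. The function name, the zero-inundation guard, and the input series are illustrative assumptions, not part of CLM4.5bgc.

```python
def redox_factor_series(f_i, dt_days, redoxlag_days, f_redox0=0.0):
    """Return (f_redox, f_pE) lists for time steps 1..len(f_i)-1.

    f_i            fractional inundation at successive time steps
    dt_days        time step length (days)
    redoxlag_days  time constant parameter redoxlag (days)
    f_redox0       inhibited fraction at the first time step (assumed 0)
    """
    f_redox_prev = f_redox0
    f_redox, f_pe = [], []
    for t in range(1, len(f_i)):
        # Newly inundated area is inhibited; older inhibition decays.
        fr = f_i[t] - f_i[t - 1] + f_redox_prev * (1.0 - dt_days / redoxlag_days)
        f_lag = f_i[t] - fr  # adjusted fraction that produces methane
        # Guard against division by zero (an assumption of this sketch).
        f_pe.append(f_lag / f_i[t] if f_i[t] > 0.0 else 0.0)
        f_redox.append(fr)
        f_redox_prev = fr
    return f_redox, f_pe


# Inundation jumps from 0.2 to 0.5 and then stays constant.
f_redox, f_pe = redox_factor_series([0.2, 0.5, 0.5, 0.5],
                                    dt_days=1.0, redoxlag_days=30.0)
print(f_redox)
print(f_pe)
```

After the jump in inundation, the inhibited fraction shrinks by a factor (1 - Δt/redoxlag) per step, so the redox factor gradually recovers towards 1.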
In the non-inundated fraction of a grid cell, we estimated the delay in
methane production as the water table depth increases by estimating an
effective depth below which CH4 production can occur
($Z_{i,lag}$):
$$Z_{i,lag}(t) = Z_i(t) - Z_{redox}(t),$$
where
$$Z_{redox}(t) = Z_i(t) - Z_i(t-1) + Z_{redox}(t-1)\left(1 - \frac{\Delta t}{\mathrm{redoxlag}}\right)$$
is the depth of the saturated water layer where alternative electron
acceptors are consumed at time $t$ and $Z_i(t)$ is the actual water depth at
time $t$.
Additionally, we constrained the methane production using the soil pH
function $f_{pH}$, which is represented as
$$f_{pH} = 10^{-0.2335\,\mathrm{pH}^2 + 2.7727\,\mathrm{pH} - 8.6},$$
where pH represents the soil pH. $f_{pH}$ is bounded by two
parameters, namely pHmin and pHmax (i.e., pHmin < pH < pHmax). The maximum methane production occurs at
pH ≈ 6.2.
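A minimal sketch of the pH factor follows. It assumes that production is switched off (factor 0) outside the interval (pHmin, pHmax); the default bounds of 3 and 9 are illustrative values taken from within the parameter ranges of this study, and the function name is hypothetical. The coefficients are exposed as arguments because the location of the maximum of the quadratic exponent, -b/(2a), is sensitive to them: with the coefficients exactly as printed above it evaluates to about 5.94.

```python
def f_ph(ph, ph_min=3.0, ph_max=9.0, a=-0.2335, b=2.7727, c=-8.6):
    """pH factor 10**(a*pH**2 + b*pH + c) inside (ph_min, ph_max), else 0.

    ph_min and ph_max are illustrative defaults (assumption); the zero
    value outside the bounds is this sketch's reading of the text.
    """
    if ph <= ph_min or ph >= ph_max:
        return 0.0
    return 10.0 ** (a * ph**2 + b * ph + c)


# Location of the maximum of the exponent for the printed coefficients.
ph_opt = -2.7727 / (2.0 * -0.2335)

for ph in (2.0, 4.2, 6.0, 8.1):
    print(ph, f_ph(ph))
print(ph_opt)
```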
We used a scaling factor ($S$) to mimic the impacts of seasonal inundation on
methane production, which is represented as
$$S = \frac{\mathrm{mino2lim}\,(f - \bar{f}) + \bar{f}}{f}, \quad S \le 1,$$
where $f$ and $\bar{f}$ are the instantaneous inundation fraction and the annual
average inundation fraction weighted by heterotrophic respiration, respectively, and
mino2lim is the anoxia factor that relates the fully anoxic
decomposition rate to the fully oxygen-unlimited decomposition rate.
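The behaviour of the seasonal inundation factor is easiest to see with a few numbers. The sketch below assumes S = [mino2lim (f - f̄) + f̄] / f capped at 1 and uses the default mino2lim of 0.2; the function name and the inundation fractions are hypothetical.

```python
def seasonal_inundation_factor(f, f_bar, mino2lim=0.2):
    """Return S = (mino2lim * (f - f_bar) + f_bar) / f, capped at 1.

    f       instantaneous inundation fraction (must be positive)
    f_bar   annual average inundation fraction
    """
    if f <= 0.0:
        raise ValueError("instantaneous inundation fraction must be positive")
    s = (mino2lim * (f - f_bar) + f_bar) / f
    return min(s, 1.0)


s_equal = seasonal_inundation_factor(0.3, 0.3)   # f equal to the annual mean
s_flood = seasonal_inundation_factor(0.6, 0.3)   # f above the annual mean
s_dry = seasonal_inundation_factor(0.1, 0.3)     # f below the annual mean
print(s_equal, s_flood, s_dry)
```

When the instantaneous inundation exceeds its annual average, S falls below 1 and production in the temporarily flooded area is reduced; otherwise the cap keeps S at 1.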
Methane oxidation
Methane oxidation ($R_{oxic}$) is represented with double
Michaelis-Menten kinetics:
$$R_{oxic} = \mathrm{vmax\_ch4\_oxid} \left[\frac{C_{CH_4}}{\mathrm{k\_m} + C_{CH_4}}\right] \left[\frac{C_{O_2}}{\mathrm{k\_m\_o2} + C_{O_2}}\right] \mathrm{q10\_ch4oxid} \times F_{\vartheta},$$
where vmax_ch4_oxid is the maximum oxidation rate (mol m$^{-3}$ s$^{-1}$), q10_ch4oxid is the temperature dependence of the
reaction, k_m and k_m_o2 are the half saturation
coefficients with respect to CH4 and O2 concentrations (mol m$^{-3}$),
$C_{CH_4}$ and $C_{O_2}$ are the methane and oxygen
concentrations in the soil (mol m$^{-3}$), and $F_{\vartheta}$ is the soil
moisture limitation factor for oxidation applied above the water table to
represent water stress for methanotrophs.
$F_{\vartheta}$ is represented as:
$$F_{\vartheta} = \exp\left(-\frac{P}{P_C}\right),$$
where $P$ and $P_C$ are the soil moisture potential and the optimum water
potential ($-2.4 \times 10^{5}$ mm), respectively. If the soil layer is above the water
table, the soil moisture limitation factor $F_{\vartheta}$ is applied. To
account for high-CH4-affinity methanotrophs in upland soils, we used a
lower oxidation rate constant (vmax_oxid_unsat) and half
saturation coefficient with respect to CH4 concentrations
(k_m_unsat).
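The oxidation rate and its moisture limitation can be sketched as two small functions. This is an illustration under stated assumptions: the moisture factor is read as exp(-P/P_C) and set to 1 below the water table, the temperature dependence is passed in as a single multiplier, and the function names and concentrations are hypothetical. The rate constants in the example lie within the parameter ranges of this study.

```python
import math

P_C_MM = -2.4e5  # optimum water potential (mm), as given in the text


def moisture_limitation(psi_mm, above_water_table):
    """Return F_theta = exp(-P / P_C) above the water table, else 1."""
    if not above_water_table:
        return 1.0
    return math.exp(-psi_mm / P_C_MM)


def methane_oxidation(c_ch4, c_o2, vmax, k_m, k_m_o2, q10_factor, f_theta):
    """Double Michaelis-Menten oxidation rate (mol m^-3 s^-1)."""
    ch4_term = c_ch4 / (k_m + c_ch4)
    o2_term = c_o2 / (k_m_o2 + c_o2)
    return vmax * ch4_term * o2_term * q10_factor * f_theta


# Concentrations equal to the half saturation coefficients give
# one quarter of the maximum rate when the other factors are 1.
r_oxic = methane_oxidation(c_ch4=5.0e-3, c_o2=0.02, vmax=1.25e-5,
                           k_m=5.0e-3, k_m_o2=0.02,
                           q10_factor=1.0, f_theta=1.0)
f_dry = moisture_limitation(-2.4e5, above_water_table=True)
print(r_oxic, f_dry)
```

For upland soils the same function can be called with vmax_oxid_unsat and k_m_unsat in place of vmax and k_m.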
Methane transport through plant aerenchyma
The diffusive transport through aerenchyma $A$ (mol m$^{-2}$ s$^{-1}$) from
each soil layer is represented in the model as:
$$A = \frac{C(z) - C_a}{r_a + \dfrac{\mathrm{rob} \times z}{D\, p\, T\, \rho_f}},$$
where $D$ is the free-air gas diffusion coefficient (m$^2$ s$^{-1}$), $C(z)$
and $C_a$ are the gaseous concentrations at depth $z$ and in the atmosphere
(mol m$^{-3}$), $r_a$ is the aerodynamic resistance between the surface and
the atmospheric reference height (s m$^{-1}$), rob is the ratio of
root length to vertical depth (obliquity), $p$ is the porosity, $T$ is the
specific aerenchyma area (m$^2$ m$^{-2}$), and $\rho_f$ is the root density
as a function of depth. Oxygen can also diffuse into the soil
layer from the atmosphere via the reverse of the CH4 pathway.
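The aerenchyma flux can be read as a concentration difference divided by two resistances in series: the aerodynamic resistance and the resistance of the root pathway. The sketch below assumes that reading of the equation; the function name and all input values are hypothetical.

```python
def aerenchyma_flux(c_z, c_a, r_a, rob, z, d, p, t_area, rho_f):
    """Return A = (C(z) - C_a) / (r_a + rob * z / (D * p * T * rho_f)).

    c_z, c_a  gas concentration at depth z and in the atmosphere (mol m^-3)
    r_a       aerodynamic resistance (s m^-1)
    rob       root obliquity (unitless)
    z         depth (m)
    d         free-air gas diffusion coefficient (m^2 s^-1)
    p         aerenchyma porosity (unitless)
    t_area    specific aerenchyma area T (m^2 m^-2)
    rho_f     root density at depth z
    """
    root_resistance = rob * z / (d * p * t_area * rho_f)
    return (c_z - c_a) / (r_a + root_resistance)


# Hypothetical inputs: higher concentration in the soil than in the air.
flux_out = aerenchyma_flux(c_z=1.0, c_a=0.0, r_a=50.0, rob=3.0, z=0.5,
                           d=2.0e-5, p=0.3, t_area=0.01, rho_f=0.5)
# Reversed gradient, as for oxygen entering the soil.
flux_in = aerenchyma_flux(c_z=0.0, c_a=1.0, r_a=50.0, rob=3.0, z=0.5,
                          d=2.0e-5, p=0.3, t_area=0.01, rho_f=0.5)
print(flux_out, flux_in)
```

A positive value denotes transport from the soil layer to the atmosphere; reversing the gradient changes only the sign.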
Here, aerenchyma porosity is parameterized based on the plant functional
types (PFTs). For upland vegetation, the aerenchyma
porosity is multiplied by a ratio relative to inundated systems:
$$p = p \times \mathrm{unsat\_aere\_ratio}.$$
If the PFT is c3_arctic_grass, c3_nonarctic_grass, or c4_grass, then $p = 0.3$. For the remaining PFTs, the porosity is multiplied by
nongrassporosratio (ratio of root porosity in non-grass to grass):
$$p = p \times \mathrm{nongrassporosratio}.$$
A minimum aerenchyma porosity (porosmin) is set to 0.05. Therefore, $p$ is modified as:
$$p = \max\{p, \mathrm{porosmin}\}.$$
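The porosity rules can be combined into one function. The sketch below assumes that the grass value of 0.3 is the starting point for all PFTs before the non-grass and upland ratios are applied; the text does not state the base value for non-grass PFTs explicitly, so this is an interpretation. The function name and the ratio values in the example are illustrative choices from within the parameter ranges of this study.

```python
GRASS_PFTS = {"c3_arctic_grass", "c3_nonarctic_grass", "c4_grass"}


def aerenchyma_porosity(pft, inundated, nongrassporosratio,
                        unsat_aere_ratio, porosmin=0.05,
                        grass_porosity=0.3):
    """Return the aerenchyma porosity p for one PFT.

    Assumption of this sketch: every PFT starts from grass_porosity.
    """
    p = grass_porosity
    if pft not in GRASS_PFTS:
        p *= nongrassporosratio      # non-grass roots are less porous
    if not inundated:
        p *= unsat_aere_ratio        # upland vegetation
    return max(p, porosmin)          # enforce the minimum porosity


p_grass_wet = aerenchyma_porosity("c4_grass", True, 0.33, 0.2)
p_tree_wet = aerenchyma_porosity("some_tree_pft", True, 0.33, 0.2)
p_tree_dry = aerenchyma_porosity("some_tree_pft", False, 0.33, 0.2)
print(p_grass_wet, p_tree_wet, p_tree_dry)
```

Because the adjustments are multiplicative, the order in which the two ratios are applied does not affect the result; only the final floor at porosmin is order dependent.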
The aerenchyma area varies over the course of the growing season. Therefore,
it is parameterized using the simulated leaf area index as
$$T = \frac{f_N N_a L}{0.22}\, \pi R^2,$$
where $L$ is the leaf area index (m$^2$ m$^{-2}$) (taken from the CLM4.5 model
simulation), $N_a$ is the maximum annual net primary production (NPP, mol m$^{-2}$ s$^{-1}$), $R$ is the aerenchyma radius ($2.9 \times 10^{-3}$ m), and
$f_N$ is the below-ground fraction of the current NPP.
The aerenchyma area $T$ is multiplied by a scale factor to adjust it:
$$T = T \times \mathrm{scale\_factor\_aere}.$$
The default value of scale_factor_aere is 1.
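A short sketch of the aerenchyma area calculation, including the scale factor, is given below. It assumes T = (f_N N_a L / 0.22) π R² followed by multiplication with scale_factor_aere; the function name and the input values are hypothetical.

```python
import math

R_AERENCHYMA_M = 2.9e-3  # aerenchyma radius (m), as given in the text


def aerenchyma_area(f_n, n_a, lai, scale_factor_aere=1.0,
                    radius=R_AERENCHYMA_M):
    """Return T = (f_N * N_a * L / 0.22) * pi * R**2 * scale_factor_aere."""
    t_area = f_n * n_a * lai / 0.22 * math.pi * radius**2
    return t_area * scale_factor_aere


# Hypothetical inputs; the second call uses the upper bound of the
# scale factor range considered in this study (0.2 to 5).
t_default = aerenchyma_area(f_n=0.5, n_a=1.0, lai=2.0)
t_scaled = aerenchyma_area(f_n=0.5, n_a=1.0, lai=2.0, scale_factor_aere=5.0)
print(t_default, t_scaled)
```

Since the scale factor enters linearly, varying it between 0.2 and 5 changes the aerenchyma area by the same factor.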
Methane ebullition
The representation of the ebullition fluxes in the methane model is based
on . The simulated aqueous CH4 concentration in each soil
level is used to estimate the expected equilibrium gaseous partial pressure
as a function of temperature and pressure. When this partial pressure exceeds
vgc_max, bubbling occurs to remove CH4 to below this value,
modified by the fraction of CH4 in the bubbles (taken as 57 %). The
vgc_max parameter is the ratio of saturation pressure triggering
ebullition.
Aqueous and gaseous diffusion
Gaseous diffusivity in the soil depends on several factors such as molecular
diffusivity, soil structure, porosity, and organic matter content. The
relationship between effective diffusivity ($D_e$, m$^2$ s$^{-1}$) and soil
properties is represented as
$$D_e = D_0\, \theta_a^2 \left(\frac{\theta_a}{\theta_s}\right)^{3/b} \times \mathrm{scale\_factor\_gasdiff},$$
where $\theta_a$ and $\theta_s$ are the air-filled and saturated water-filled
porosity, respectively, $b$ is the slope of the water retention curve, and
scale_factor_gasdiff is the scale factor for the gas diffusion
(the default value is 1).
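The effective diffusivity can be evaluated directly from the soil properties. The sketch below assumes the relationship D_e = D_0 θ_a² (θ_a/θ_s)^(3/b) × scale_factor_gasdiff; the function name and the soil values are hypothetical.

```python
def effective_diffusivity(d0, theta_a, theta_s, b, scale_factor_gasdiff=1.0):
    """Return D_e = D_0 * theta_a**2 * (theta_a / theta_s)**(3 / b) * scale.

    d0       free-air diffusivity D_0 (m^2 s^-1)
    theta_a  air-filled porosity
    theta_s  saturated water-filled porosity
    b        slope of the water retention curve
    """
    return (d0 * theta_a**2 * (theta_a / theta_s) ** (3.0 / b)
            * scale_factor_gasdiff)


# Fully drained soil (theta_a equal to theta_s) versus a wetter soil.
d_drained = effective_diffusivity(d0=2.0e-5, theta_a=0.5, theta_s=0.5, b=5.0)
d_wet = effective_diffusivity(d0=2.0e-5, theta_a=0.1, theta_s=0.5, b=5.0)
print(d_drained, d_wet)
```

The diffusivity drops sharply as the air-filled porosity decreases, which is why gas transport through wet soil layers is much slower than through drained ones.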
Observation sites
Tables and show the information
about the wetland and rice paddy observation sites, respectively, where
methane emissions have been measured.
Wetland site data. P = precipitation, T = temperature.

| Site Name | Location | Time | Wetland type | Dominant vegetation | Mean P & T | Soil and climate characteristics | Meas. techniques | Forcing data sets∗ | References |
|---|---|---|---|---|---|---|---|---|---|
| Michigan, USA | 42.45° N, 84.00° W | 1991–1993 | Ombrotrophic peatland | Sphagnum, Scheuchzeria palustris, Vaccinium oxycoccos | P: 761 mm (1948–1980) | Soil pH: 4.2 | Static chamber | Measured WT positions | |
| Minnesota, USA | 47.53° N, 266.33° E | 1991–1992 | Poorly minerotrophic to ombrotrophic peatland | Sphagnum, Chamaedaphne calyculata, Scheuchzeria palustris | P: 553 mm, T: ≈ 13.6 °C for May–October period | Soil pH: 4.6 | Eddy correlation technique | Measured WT positions | |
| Alberta, Canada | 54.60° N, 246.60° E | 1994–1996 | Nutrient rich fen | Carex aquatilis and Carex rostrata | – | Soil pH: 7; the freeze-thaw cycle spans from May to Oct | Open chamber | Measured WT positions | |
| Salmisuo, Eastern Finland | 62.75° N, 30.93° E | 1993 | Minerogenic, oligotrophic pine fen | Sphagnum papillosum | T: ≈ 10 °C | Wet condition from Jul to Sept | Static chamber | Measured WT positions | |
| Florida, USA | 30.07° N, 275.80° E | 1993 | Swamp | Sagittaria lancifolia | Annual P: ≈ 1400 mm | Soil pH: 6.2 | Open chamber | Measured WT positions | |
| Panama | 9.00° N, 80.00° E | 1987 | Swamp | Palms | – | Soil pH: 6.2; Feb to May is the dry season | Static chamber | Modeled WT positions | |
∗ All sites use NCEP atmospheric forcing; P is precipitation; T is temperature; WT is water table.
Rice paddy site data.
| Site Name | Location | Year | Date of field flooded | Date of final drainage | pH | Measurement techniques | Soil type | References |
|---|---|---|---|---|---|---|---|---|
| Texas, USA | 29.95° N, 265.50° E | 1994 | 17 May | 11 Aug | N/A | Chamber | Bernard-Morey | |
| Vercelli, Italy | 45.30° N, 8.42° E | 1991 | 7 May | 30 Aug | 6 | Static (closed) chamber | Sandy loam | |
| Chengdu, China | 31.27° N, 105.45° E | 2003 | 9 May | 7 Sep | 8.1 | Chamber | Purplish | |
| Nanjing, China | 32.80° N, 118.75° E | 1999 | 18 Jun | 13 Oct | N/A | Chamber | Hydromorphic | |
| Beijing, China | 40.55° N, 116.78° E | 1995 | 4 Jun | 17 Oct | 7.99 | Automatic chamber | Silty clay loam | |
| California, USA | 40.20° N, 237.98° E | 1982 & 1983 | 11 May (1982); 21 May (1983) | 2 Oct (1982); 1 Oct (1983) | N/A | Static chamber | Capay silty clay | |
| Japan | 36.02° N, 140.22° E | 1991 & 1993 | 7 May | 12 Aug (1991); 2 Sept (1993) | 6.6–6.9 | Automatic chamber | Gley soil (Sandy clay loam) | |
| New Delhi, India | 28.63° N, 77.12° E | 1995 | 1 Jul | 1 Nov | 8.2 | Closed chamber, manual | Ustochrept (sandy loam) | |
| Cuttack, India | 20.42° N, 85.92° E | 1996 | 19 Jul | 30 Oct | 6.19 | Automatic chamber | Haplaquept (Alluvial) | |
| Central Java, Indonesia | 6.78° S, 110.15° E | 2001–2002 | 1 Nov | 28 Feb | 5.1 | Automatic closed chamber | Aeric Tropaquept (Silty loam) | |

Parameters and references for bounds
Table shows the CH4 related parameters in CLM4.5bgc
and their literature reference information.
Parameter names, descriptions, ranges, and literature references.
| Number | Parameter | Description | Units | Range | References |
|---|---|---|---|---|---|
| 1 | q10ch4 | Q10 for methane production | unitless | 1–10 | |
| 2 | f_ch4 | Ratio of CH4 production to total C mineralization | unitless | 0.05–0.5 | Effective value will depend on temperature, redox and pH but cannot exceed 50 % based on stoichiometry (Bill Riley, personal communication) |
| 3 | redoxlag | Number of days to lag for production | days | 15–45 | |
| 4 | oxinhib | Inhibition of methane production by oxygen | m$^3$ mol$^{-1}$ | 200–600 | |
| 5 | pHmax | Maximum pH for methane production | unitless | 8–10 | |
| 6 | pHmin | Minimum pH for methane production | unitless | 2–4 | |
| 7 | vmax_ch4_oxid | Oxidation rate constant | mol m$^{-3}$-w s$^{-1}$ | $1.25\times10^{-6}$–$1.25\times10^{-4}$ | |
| 8 | k_m | Michaelis-Menten oxidation rate constant for CH4 conc. | mol m$^{-3}$-w | $5\times10^{-4}$–$5\times10^{-2}$ | |
| 9 | k_m_o2 | Michaelis-Menten oxidation rate constant for O2 conc. | mol m$^{-3}$-w | 0.002–0.2 | |
| 10 | q10_ch4oxid | Q10 oxidation constant | unitless | 1–4 | |
| 11 | k_m_unsat | Michaelis-Menten oxidation rate constant for CH4 conc. in upland areas | mol m$^{-3}$-w | $5\times10^{-5}$–$5\times10^{-3}$ | |
| 12 | vmax_oxid_unsat | Oxidation rate constant in upland areas | mol m$^{-3}$-w s$^{-1}$ | $1.25\times10^{-7}$–$1.25\times10^{-5}$ | |
| 13 | scale_factor_aere | Scale factor on the aerenchyma area | unitless | 0.2–5 | |
| 14 | nongrassporosratio | Ratio of root porosity in non-grass to grass | unitless | 0.2–0.5 | |
| 15 | porosmin | Minimum aerenchyma porosity | unitless | 0.01–0.2 | |
| 16 | rob | Ratio of root length to vertical depth (“root obliquity”) | unitless | 2–4 | This parameter is poorly constrained. |
| 17 | unsat_aere_ratio | Ratio to multiply upland vegetation aerenchyma porosity by compared to inundated systems | unitless | 0.1–0.25 | Not available in literature. The reasonable range could be between 0.1 and 0.25. used this range for sensitivity. |
| 18 | vgc_max | Ratio of saturation pressure triggering ebullition | unitless | 0.05–0.3 | |
| 19 | scale_factor_gasdiff | Scale factor for gas diffusion | unitless | 1–5 | Range not available. Reasonable range is 1–5 for sensitivity analyses. |
| 20 | atmch4 | Atm. CH4 mixing ratio to prescribe | mol mol$^{-1}$ | $1.7\times10^{-7}$–$1.7\times10^{-5}$ | Range not available. Variable range; global average is ≈ $1.7\times10^{-6}$ |
| 21 | mino2lim | Min. anaerobic decomposition rate as a fraction of potential aerobic rate | unitless | 0.05–0.45 | Range not available in the literature. The default value (0.2) is from . The reasonable range could be between 0.05 and 0.45 to adjust effect of anoxia on decomposition rate (used to calculate seasonal inundation factor). The range is considered based on knowledge. |

Weights used for RMSE computation in Eq. (1) of the paper
Table contains information about the weights used for each
observation site when computing the objective function value.
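To illustrate how the site weights enter the objective function, the sketch below assumes that Eq. (1) is a weighted sum of per-site RMSE values; the exact form of Eq. (1) is given in the main document. The weights are the real data weights from the table in this appendix, while the function name and the per-site RMSE values are hypothetical.

```python
# Real data weights w_i by observation site, from the weights table.
REAL_DATA_WEIGHTS = {
    "Alberta": 0.0327, "Florida": 0.0078, "Michigan": 0.0280,
    "Minnesota": 0.0938, "Nanjing": 0.0566, "Vercelli": 0.0198,
    "Texas": 0.0267, "Japan": 0.0441, "California": 0.0421,
    "New Delhi": 0.2787, "Beijing": 0.1053, "Central Java": 0.0810,
    "Chengdu": 0.0283, "Cuttack": 0.0968, "Panama Swamp": 0.0177,
    "Salmisuo": 0.0405,
}


def weighted_objective(site_rmse, weights):
    """Return sum_i w_i * RMSE_i (assumed form of the objective)."""
    missing = set(weights) - set(site_rmse)
    if missing:
        raise ValueError(f"no RMSE given for sites: {sorted(missing)}")
    return sum(weights[site] * site_rmse[site] for site in weights)


total_weight = sum(REAL_DATA_WEIGHTS.values())

# Hypothetical case: the same RMSE of 1.0 at every site.
uniform_rmse = {site: 1.0 for site in REAL_DATA_WEIGHTS}
objective = weighted_objective(uniform_rmse, REAL_DATA_WEIGHTS)
print(total_weight, objective)
```

The tabulated weights sum to approximately 1, so under this assumed form the objective value is a weighted average of the site RMSEs.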
ID, name of observation sites, and associated weights for real data and pseudo data case (Eq. 1 of the main document).
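| ID | Location | $w_i$ real data | $w_i$ pseudo data |
|---|---|---|---|
| 1 | Alberta | 0.0327 | 0.0656 |
| 2 | Florida | 0.0078 | 0.0067 |
| 3 | Michigan | 0.0280 | 0.1599 |
| 4 | Minnesota | 0.0938 | 0.0783 |
| 5 | Nanjing | 0.0566 | 0.0149 |
| 6 | Vercelli | 0.0198 | 0.0382 |
| 7 | Texas | 0.0267 | 0.0189 |
| 8 | Japan | 0.0441 | 0.0153 |
| 9 | California | 0.0421 | 0.0684 |
| 10 | New Delhi | 0.2787 | 0.1707 |
| 11 | Beijing | 0.1053 | 0.1189 |
| 12 | Central Java | 0.0810 | 0.0143 |
| 13 | Chengdu | 0.0283 | 0.0571 |
| 14 | Cuttack | 0.0968 | 0.0104 |
| 15 | Panama Swamp | 0.0177 | 0.0795 |
| 16 | Salmisuo | 0.0405 | 0.0827 |

Acknowledgements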
The authors want to acknowledge the funding sources DOE SciDAC DE-SC0006791,
NSF 1049031, NSF 1049033, and NSF CISE 1116298. The first author also wants
to acknowledge partial support by the U.S. Department of Energy, Office of
Science, Office of Advanced Scientific Computing Research, Applied
Mathematics program under contract number DE-AC02005CH11231. We thank the
anonymous reviewers for their helpful comments and improvement suggestions.
Edited by: A. Sandu
References
Adhya, T., Bharati, K., Mohanty, S., Ramakrishnan, B., Rao, V., Sethunathan,
N., and Wassmann, R.: Methane emission from rice fields at Cuttack,
India, Nutr. Cycl. Agroecosys., 58, 95–105, 2000.
Aleman, D., Romeijn, H., and Dempsey, J.: A response surface approach to beam
orientation optimization in intensity modulated radiation therapy treatment
planning, INFORMS J. Comput., 21, 62–76, 2009.
Arah, J. and Stephen, K.: A model of the processes leading to methane emission
from peatland, Atmos. Environ., 32, 3257–3264, 1998.
Aselmann, I. and Crutzen, P.: Global distribution of natural fresh-water
wetlands and rice paddies, their net primary productivity, seasonality and
possible methane emissions, J. Atmos. Chem., 8, 307–358,
1989.
Baird, A., Beckwith, C., Waldron, S., and Waddington, J.: Ebullition of
methane-containing gas bubbles from near surface Sphagnum peat, Geophys.
Res. Lett., 31, L21505, 10.1029/2004GL021157, 2004.
Bartlett, K. and Harriss, R.: Review and assessment of methane emissions from
wetlands, Chemosphere, 26, 261–320, 1993.
Bartlett, K., Crill, P., Bonassi, J., Richey, J., and Harriss, R.: Methane flux
from the Amazon River floodplain: Emissions during rising water, J. Geophys. Res., 95, 16773–16788, 1990.
Bender, M. and Conrad, R.: Kinetics of CH4 oxidation in oxic soils exposed
to ambient air or high CH4 mixing ratios, FEMS Microbiol. Ecol., 101,
261–270, 1992.
Bloom, A., Palmer, P., Fraser, A., Reay, D., and Frankenberg, C.: Large-Scale
Controls of Methanogenesis Inferred from Methane and Gravity Spaceborne Data,
Science, 327, 322–325, 2010.
Booker, A., Dennis Jr., J., Frank, P., Serafini, D., Torczon, V., and
Trosset, M.: A rigorous framework for optimization of expensive functions by
surrogates, Struct. Multidiscip. O., 17, 1–13, 1999.
Bousquet, P., Ciais, P., Miller, J., Dlugokencky, E., Hauglustaine, D.,
Prigent, C., Van der Werf, G., Peylin, P., Brunke, E., Carouge, C.,
Langenfelds, R., Lathiere, J., Papa, F., Ramonet, M., Schmidt, M., Steele,
L., Tyler, S., and White, J.: Contribution of anthropogenic and natural
sources to atmospheric methane variability, Nature, 443, 439–443, 2006.
Butterbach-Bahl, K., Papen, H., and Rennenberg, H.: Impact of gas transport
through rice cultivars on methane emission from rice paddy fields, Plant Cell
Environ., 20, 1175–1183, 1997.
Cao, M., Marshall, S., and Gregson, K.: Global carbon exchange and methane
emissions from natural wetlands: Application of a process-based model,
J. Geophys. Res., 101, 14399–14414, 1996.
Cheng, W., Yagi, K., Akiyama, H., Nishimura, S., Sudo, S., Fumoto, T.,
Hasegawa, T., Hartley, A., and Megonigal, J.: An empirical model of soil
chemical properties that regulate methane production in Japanese rice paddy
soils, J. Environ. Qual., 36, 1920–1925, 2007.
Ciais, P., Gasser, T., Paris, J., Caldeira, K., Raupach, M., Canadell, J.,
Patwardhan, A., Friedlingstein, P., Piao, S., and Gitz, V.: Attributing the
increase in atmospheric CO2 to emitters and absorbers, Nature Clim.
Change, 3, 926–930, 2013.
Cicerone, R., Shetter, J., and Delwiche, C.: Seasonal-variation of methane flux
from a California rice paddy, J. Geophys. Res.-Oceans, 88,
1022–1024, 1983.
Cicerone, R., Delwiche, C., Tyler, S., and Zimmerman, P.: Methane emissions
from California rice paddies with varied treatments, Global Biogeochem.
Cy., 6, 233–248, 1992.
Colmer, T.: Long-distance transport of gases in plants: a perspective on
internal aeration and radial oxygen loss from roots, Plant Cell Environ., 26, 17–36, 2003.
Computational and Information Systems Laboratory: Yellowstone: IBM
iDataPlex System (Wyoming-NCAR Alliance), Boulder, CO, USA:
National Center for Atmospheric Research.
http://n2t.net/ark:/85065/d7wd3xhc (last access: 15 October 2015), 2012.
Conrad, R.: Control of microbial methane production in wetland rice fields,
Nutr. Cycl. Agroecosys., 64, 59–69, 2002.
Cronk, J. and Fennessy, M.: Wetland Plants: Biology and Ecology, Lewis
Publishers, Boca Raton, FL., USA, 2001.
Davis, E. and Ierapetritou, M.: Kriging based method for the solution of
mixed-integer nonlinear programs containing black-box functions, J. Global
Optim., 43, 191–205, 2009.
Dlugokencky, E., Nisbet, E., Fisher, R., and Lowry, D.: Global atmospheric
methane: budget, changes and dangers, Phil. T. R. Soc. A, 369, 2058–2072, 2011.
Dunfield, P., Knowles, R., Dumont, R., and Moore, T.: Methane production and
consumption in temperate and subarctic peat soils: response to temperature
and pH, Soil Biol. Biochem., 25, 321–326, 1993.
Forrester, A., Sóbester, A., and Keane, A.: Engineering Design via Surrogate
Modelling – A Practical Guide, John Wiley & Sons Ltd, Chichester, UK, 2008.
Friedman, J.: Multivariate Adaptive Regression Splines, The Annals of
Statistics, 19, 1–141, 1991.
Giunta, A., Balabanov, V., Haim, D., Grossman, B., Mason, W., Watson, L., and
Haftka, R.: Aircraft multidisciplinary design optimisation using design of
experiments theory and response surface modelling, Aeronaut. J., 101,
347–356, 1997.
Goel, T., Haftka, R. T., Shyy, W., and Queipo, N. V.: Ensemble of Surrogates,
Struct. Multidiscip. O., 33, 199–216, 2007.
Grunfeld, S. and Brix, H.: Methanogenesis and methane emissions: effects of
water table, substrate type and presence of Phragmites australis, Aquat.
Bot., 64, 63–75, 1999.
Gutmann, H.: A Radial Basis Function Method for Global Optimization, J.
Global Optim., 19, 201–227, 2001.
Han, X., Hendricks Franssen, H.-J., Montzka, C., and Vereecken, H.: Soil
moisture and soil properties estimation in the Community Land Model
with synthetic brightness temperature observations, Water Resources Research,
50, 6081–6105, 2014.
Huang, Y., Jaing, J., Zong, L., Sass, R., and Fisher, F.: Comparison of field
measurements of CH4 emission from rice cultivation in Nanjing, China
and in Texas, USA, Adv. Atmos. Sci., 18, 1121–1130,
2001.
Hurrell, J., Holland, M., Gent, P., Ghan, S., Kay, J., Kushner, P., Lamarque,
J.-F., Large, W., Lawrence, D., Lindsay, K., Lipscomb, W., Long, M.,
Mahowald, N., Marsh, D., Neale, R., Rasch, P., Vavrus, S., Vertenstein, M.,
Bader, D., Collins, W., Hack, J., Kiehl, J., and Marshall, S.: The Community
Earth System Model: A Framework for Collaborative Research, B. Am.
Meteorol. Soc., 94, 1339–1360, 2013.
Jain, M., Kumar, S., Wassmann, R., Mitra, S., Singh, S., Singh, J., Singh, R.,
Yadav, A., and Gupta, S.: Methane emissions from irrigated rice fields in
northern India (New Delhi), Nutr. Cycl. Agroecosys., 58,
75–83, 2000.
Jiang, C., Wang, Y., Zheng, X., Zhu, B., Huang, Y., and Hao, Q.: Methane and
nitrous oxide emissions from three paddy rice based cultivation systems in
southwest China, Adv. Atmos. Sci., 23, 415–424, 2006.
Jones, D., Schonlau, M., and Welch, W.: Efficient Global Optimization of
Expensive Black-Box Functions, J. Global Optim., 13, 455–492,
1998.
Keller, M. M.: Biological Sources and Sinks of Methane in Tropical Habitats and
Tropical Atmospheric Chemistry, PhD thesis, Princeton University, Princeton, USA,
1990.
Kellner, E., Baird, A., Oosterwoud, M., Harrison, K., and Waddington, J.:
Effect of temperature and atmospheric pressure on methane (CH4) ebullition
from near surface peats, Geophys. Res. Lett., 33, L18405,
10.1029/2006GL027509, 2006.
Knoblauch, C.: Bodenkundlich-mikrobiologische Bestandsaufnahme zur
Methanoxidation in einer Flussmarsch der Tide-Elbe, Master's thesis,
University of Hamburg, Hamburg, Germany, 1994.
Koven, C. D., Riley, W. J., Subin, Z. M., Tang, J. Y., Torn, M. S., Collins,
W. D., Bonan, G. B., Lawrence, D. M., and Swenson, S. C.: The effect of
vertically resolved soil biogeochemistry and alternate soil C and N models on
C dynamics of CLM4, Biogeosciences, 10, 7109–7131,
10.5194/bg-10-7109-2013, 2013.
Lo, M.-H., Famiglietti, J., Yeh, P.-F., and Syed, T.: Improving parameter
estimation and water table depth simulation in a land surface model using
GRACE water storage and estimated base flow data, Water Resour.
Res., 46, W05517, 10.1029/2009WR007855, 2010.
Lombardi, J., Epp, M., and Chanton, J.: Investigation of the methyl fluoride
technique for determining rhizospheric methane oxidation, Biogeochemistry,
36, 153–172, 1997.
Matthews, E. and Fung, I.: Methane emission from natural wetlands: global
distribution, area, and environmental characteristics of sources, Global
Biogeochem. Cy., 1, 61–86, 1987.
Meng, L., Hess, P. G. M., Mahowald, N. M., Yavitt, J. B., Riley, W. J.,
Subin, Z. M., Lawrence, D. M., Swenson, S. C., Jauhiainen, J., and Fuka, D.
R.: Sensitivity of wetland methane emissions to model assumptions:
application and model testing against site observations, Biogeosciences, 9,
2793–2819, 10.5194/bg-9-2793-2012, 2012.
Moore, D., Hub, J., Sacks, W. J., Schimel, D., and Monson, R.: Estimating
transpiration and the sensitivity of carbon uptake to water availability in a
subalpine forest using a simple ecosystem process model informed by measured
net CO2 and H2O fluxes, Agr. Forest Meteorol.,
148, 1467–1477, 2008.
Mugunthan, P., Shoemaker, C., and Regis, R.: Comparison of function
approximation, heuristic, and derivative-based methods for automatic
calibration of computationally expensive groundwater bioremediation models,
Water Resour. Res., 41, W11427, 10.1029/2005WR004134, 2005.
Müller, J. and Piché, R.: Mixture Surrogate Models Based on
Dempster-Shafer Theory for Global Optimization Problems, J.
Global Optim., 51, 79–104, 2011.
Müller, J. and Shoemaker, C.: Influence of ensemble surrogate models and
sampling strategy on the solution quality of algorithms for computationally
expensive black-box global optimization problems, J. Global
Optim., 60, 123–144, 10.1007/s10898-014-0184-0, 2014.
Müller, J., Shoemaker, C., and Piché, R.: SO-MI: A Surrogate Model
Algorithm for Computationally Expensive Nonlinear Mixed-Integer Black-Box
Global Optimization Problems, Comput. Oper. Res., 40,
1383–1400, 2013.
Myers, R. and Montgomery, D.: Response Surface Methodology, Process and Product
Optimization using Designed Experiments, Wiley-Interscience Publication, New Jersey, USA,
1995.
Myhre, G., Shindell, D., Bréon, F.-M., Collins, W., Fuglestvedt, J., Huang,
J., Koch, D., Lamarque, J.-F., Lee, D., Mendoza, B., Nakajima, T., Robock,
A., Stephens, G., Takemura, T., and Zhang, H.: Anthropogenic and Natural
Radiative Forcing, in: Climate Change 2013: The Physical Science Basis.
Contribution of Working Group I to the Fifth Assessment Report of the
Intergovernmental Panel on Climate Change, Cambridge University Press,
Cambridge, UK and New York, NY, USA, 2013.
Oleson, K., Lawrence, D., Bonan, G., Drewniak, B., Huang, M., Koven, C., Levis,
S., Li, F., Riley, W., Subin, Z., Swenson, S., Thornton, P., Bozbiyik, A.,
Fisher, R., Kluzek, E., Lamarque, J.-F., Lawrence, P., Leung, L., Lipscomb,
W., Muszala, S., Ricciuto, D., Sacks, W., Sun, Y., Tang, J., and Yang, Z.-L.:
Technical Description of Version 4.5 of the Community Land Model (CLM),
Tech. Rep. NCAR/TN-503+STR, National Center for Atmospheric Research,
Boulder, CO, USA, 10.5065/D6RR1W7M, 2013.
Popp, T. J., Chanton, J. P., Whiting, G. J., and Grant, N.: Evaluation of
Methane Oxidation in the Rhizosphere of Carex Dominated Fen in North Central
Alberta, Canada, Biogeochemistry, 51, 259–281, 2000.
Powell, M.: The Theory of Radial Basis Function Approximation in 1990, Advances
in Numerical Analysis, vol. 2: wavelets, subdivision algorithms and radial
basis functions, Oxford University Press, Oxford, UK, 105–210, 1992.
Prigent, C., Papa, F., Aires, F., Rossow, W., and Matthews, E.: Global
inundation dynamics inferred from multiple satellite observations, 1993–2000,
J. Geophys. Res., 112, D12107, 10.1029/2006JD007847, 2007.
Prihodko, L., Denning, A., Hanan, N., Baker, I., and Davis, K.: Sensitivity,
uncertainty and time dependence of parameters in a complex land surface
model, Agr. Forest Meteorol., 148, 268–287, 2008.
Qian, T., Dai, A., Trenberth, K., and Oleson, K.: Simulation of global land
surface conditions from 1948 to 2004. Part I: Forcing data and
evaluations, J. Hydrometeorol., 7, 953–975, 2006.
Ray, J. and Swiler, L.: Bayesian calibration of the Community Land Model
using surrogates, Tech. Rep. SAND2014-0867, Sandia National Laboratories,
Livermore, CA, USA, 2014.
Regis, R.: Stochastic radial basis function algorithms for large-scale
optimization involving expensive black-box objective and constraint
functions, Comput. Oper. Res., 38, 837–853, 2011.
Regis, R. and Shoemaker, C.: A Stochastic Radial Basis Function Method for
the Global Optimization of Expensive Functions, INFORMS J. Comput.,
19, 497–509, 2007.
Regis, R. and Shoemaker, C.: Parallel Stochastic Global Optimization Using
Radial Basis Functions, INFORMS J. Comput., 21, 411–426, 2009.
Regis, R. and Shoemaker, C.: Combining radial basis function surrogates and
dynamic coordinate search in high-dimensional expensive black-box
optimization, Eng. Optimiz., 45, 529–555, 2013.
Riley, W. J., Subin, Z. M., Lawrence, D. M., Swenson, S. C., Torn, M. S.,
Meng, L., Mahowald, N. M., and Hess, P.: Barriers to predicting changes in
global terrestrial methane fluxes: analyses using CLM4Me, a methane
biogeochemistry model integrated in CESM, Biogeosciences, 8, 1925–1953,
10.5194/bg-8-1925-2011, 2011.
Ringeval, B., de Noblet-Ducoudre, N., Ciais, P., Bousquet, P., Prigent, C.,
Papa, F., and Rossow, W.: An attempt to quantify the impact of changes in
wetland extent on methane emissions on the seasonal and interannual time
scales, Global Biogeochem. Cy., 24, GB2003, 10.1029/2008gb003354,
2010.
Saarnio, S., Alm, J., Silvola, J., Lohila, A., Nykänen, H., and Martikainen,
P.: Seasonal Variation in CH4 Emissions and Production and Oxidation
Potentials at Microsites on an Oligotrophic Pine Fen, Oecologia, 110,
414–422, 1997.
Schuh, A. E., Denning, A. S., Corbin, K. D., Baker, I. T., Uliasz, M.,
Parazoo, N., Andrews, A. E., and Worthy, D. E. J.: A regional high-resolution
carbon flux inversion of North America for 2004, Biogeosciences, 7,
1625–1644, 10.5194/bg-7-1625-2010, 2010.
Segers, R.: Methane production and methane consumption: a review of processes
underlying wetland methane fluxes, Biogeochemistry, 41, 23–51, 1998.
Segers, R. and Kengen, S.: Methane production as a function of anaerobic carbon
mineralization: A process model, Soil Biol. Biochem., 30,
1107–1117, 1998.
Setyanto, P., Rosenami, A., Boer, R., Fauziah, C., and Khanif, M.: The effect
of rice cultivars on methane emission from irrigated rice field, Indonesian
Journal of Agricultural Sciences, 5, 20–31, 2004.
Shannon, R. D. and White, J. R.: 3-Year Study of Controls on Methane Emissions
from 2 Michigan Peatlands, Biogeochemistry, 27, 35–60, 1994.
Shurpali, N. J. and Verma, S. B.: Micrometeorological measurements of methane
flux in a Minnesota peatland during two growing seasons, Biogeochemistry, 40,
1–15, 1998.
Sigren, L., Lewis, S., Fisher, F., and Sass, R. L.: Effects of field drainage
on soil parameters related to methane production and emission from rice
paddies, Global Biogeochem. Cy., 11, 151–162, 1997.
Simpson, T., Mauery, T., Korte, J., and Mistree, F.: Kriging metamodels for
global approximation in simulation-based multidisciplinary design
optimization, AIAA J., 39, 2233–2241, 2001.
Solonen, A., Ollinaho, P., Laine, M., Haario, H., Tamminen, J., and Järvinen,
H.: Efficient MCMC for climate model parameter estimation: Parallel
adaptive chains and early rejection, Bayesian Analysis, 7, 715–736, 2012.
Subin, Z., Riley, W., and Mironov, D.: Improved lake model for climate
simulations, J. Adv. Model. Earth Syst., 4, M02001, 10.1029/2011MS000072,
2012.
Sun, Y., Hou, Z., Huang, M., Tian, F., and Ruby Leung, L.: Inverse modeling
of hydrologic parameters using surface flux and runoff observations in the
Community Land Model, Hydrol. Earth Syst. Sci., 17, 4995–5011,
10.5194/hess-17-4995-2013, 2013.
Swenson, S. and Lawrence, D.: A New Fractional Snow Covered Area
Parameterization for the Community Land Model and its Effect on the Surface
Energy Balance, J. Geophys. Res., 117, D21107,
10.1029/2012JD018178, 2012.
Swenson, S., Lawrence, D., and Lee, H.: Improved Simulation of the
Terrestrial Hydrological Cycle in Permafrost Regions by the Community Land
Model, J. Adv. Model. Earth Syst., 4, M08002, 10.1029/2012MS000165,
2012.
Thornton, P., Lamarque, J., Rosenbloom, N., and Mahowald, N.: Influence of
carbon-nitrogen cycle coupling on land model response to CO2
fertilization and climate variability, Global Biogeochem. Cy., 21, GB4018, 10.1029/2006GB002868,
2007.
Thornton, P. E., Doney, S. C., Lindsay, K., Moore, J. K., Mahowald, N.,
Randerson, J. T., Fung, I., Lamarque, J.-F., Feddema, J. J., and Lee, Y.-H.:
Carbon-nitrogen interactions regulate climate-carbon cycle feedbacks: results
from an atmosphere-ocean general circulation model, Biogeosciences Discuss.,
6, 3303–3354, 10.5194/bgd-6-3303-2009, 2009.
Tian, X., Xie, Z., and Dai, A.: A land surface soil moisture data assimilation
system based on the dual-UKF method and the Community Land Model,
J. Geophys. Res.-Atmos., 113, D14127, 10.1029/2007JD009650, 2008.
Turner, D., Ritts, W., Wharton, S., Thomas, C., Monson, R., Black, T., and
Falk, M.: Assessing FPAR source and parameter optimization scheme in
application of a diagnostic carbon flux model, Remote Sens. Environ.,
113, 1529–1539, 2009.
Viana, F., Haftka, R., and Steffen Jr., V.: Multiple surrogates: how
cross-validation errors can help us to obtain the best predictor, Struct.
Multidiscip. O., 39, 439–457, 2009.
Walter, B. and Heimann, M.: A process-based, climate-sensitive model to derive
methane emissions from natural wetlands: Application to five wetland sites,
sensitivity to model parameters, and climate, Global Biogeochem. Cy.,
14, 745–765, 2000.
Walter, B., Heimann, M., and Matthews, E.: Modeling modern methane emissions
from natural wetlands 1. Model description and results, J. Geophys.
Res.-Atmos., 106, 34189–34206, 2001.
Wang, Z., Xu, Y., Li, Z., Guo, Y., Wassmann, R., Neue, H., Lantin, R.,
Buendia, L., Ding, Y., and Wang, Z.: A four-year record of methane emissions
from irrigated rice fields in the Beijing region of China, Nutr. Cycl.
Agroecosys., 58, 55–63, 2000.
Wania, R., Ross, I., and Prentice, I. C.: Implementation and evaluation of a
new methane model within a dynamic global vegetation model: LPJ-WHyMe v1.3.1,
Geosci. Model Dev., 3, 565–584, 10.5194/gmd-3-565-2010, 2010.
Whalen, S. and Reeburgh, W.: Moisture and temperature sensitivity of CH4
oxidation in boreal soils, Soil Biol. Biochem., 28, 1271–1281, 1996.
Wild, S. and Shoemaker, C.: Global convergence of radial basis function
trust-region algorithms for derivative-free optimization, SIAM Review, 55,
349–371, 2013.
Yagi, K., Tsuruta, H., Kanda, K., and Minami, K.: Effect of water management on
methane emission from a Japanese rice paddy field: Automated methane
monitoring, Global Biogeochem. Cy., 10, 255–267, 1996.
Yang, B., Qian, Y., Lin, G., Leung, R., and Zhang, Y.: Some issues in
uncertainty quantification and parameter tuning: a case study of convective
parameterization scheme in the WRF regional climate model, Atmos. Chem.
Phys., 12, 2409–2427, 10.5194/acp-12-2409-2012, 2012.
Yang, B., Qian, Y., Lin, G., Leung, L., Rasch, P., Zhang, G., McFarlane, S.,
Zhao, C., Zhang, Y., Wang, H., Wang, M., and Liu, X.: Uncertainty
quantification and parameter tuning in the CAM5 Zhang-McFarlane
convection scheme and impact of improved convection on the global circulation
and climate, J. Geophys. Res.-Atmos., 118, 395–415,
2013.
Zeng, X., Drewniak, B. A., and Constantinescu, E. M.: Calibration of the Crop
model in the Community Land Model, Geosci. Model Dev. Discuss., 6, 379–398,
10.5194/gmdd-6-379-2013, 2013.
Zhang, Y., Li, C., Trettin, C., Li, H., and Sun, G.: An integrated model of
soil, hydrology, and vegetation for carbon dynamics in wetland ecosystems,
Global Biogeochem. Cy., 16, 1–17, 2002.
Zhu, Q., Liu, J., Peng, C., Chen, H., Fang, X., Jiang, H., Yang, G., Zhu, D.,
Wang, W., and Zhou, X.: Modelling methane emissions from natural wetlands:
TRIPLEX-GHG model integration, sensitivity analysis, and calibration, Geosci.
Model Dev. Discuss., 6, 5423–5473, 10.5194/gmdd-6-5423-2013, 2013.
Zhuang, Q., Melillo, J., Kicklighter, D., Prinn, R., McGuire, A., Steudler, P.,
Felzer, B., and Hu, S.: Methane fluxes between terrestrial ecosystems and the
atmosphere at northern high latitudes during the past century: A
retrospective analysis with a process-based biogeochemistry model, Global
Biogeochem. Cy., 18, GB3010, 10.1029/2004GB002239, 2004.