Introduction and motivation

GMD

Geoscientific Model Development

Geosci. Model Dev.

1991-9603

Copernicus GmbH

Göttingen, Germany

10.5194/gmd-8-3285-2015

CH4 parameter estimation in CLM4.5bgc using surrogate global optimization

Müller (1), Paudel (2), C. A. Shoemaker (3), Woodbury, Wang, and Mahowald

(1) Center for Computational Sciences and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA

(2) Earth and Atmospheric Sciences, Cornell University, Ithaca, NY 14853, USA

(3) School of Civil and Environmental Engineering, Cornell University, Ithaca, NY 14853, USA

Correspondence: J. Müller (juliane.mueller2901@gmail.com)

Published: 20 October 2015

Volume 8, issue 10, pages 3285–3310. Manuscript history dates: 4 November 2014, 6 January 2015, 24 June 2015, 5 October 2015.

This work is licensed under a Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/

This article is available from https://gmd.copernicus.org/articles/8/3285/2015/gmd-8-3285-2015.html

The full text article is available as a PDF file from https://gmd.copernicus.org/articles/8/3285/2015/gmd-8-3285-2015.pdf

Over the Anthropocene, atmospheric methane has increased dramatically. Wetlands are one of the major sources of methane to the atmosphere, but the role of changes in wetland emissions is not well understood. The Community Land Model (CLM) of the Community Earth System Model contains a module to estimate methane emissions from natural wetlands and rice paddies. Our comparison with CH4 emission observations at 16 sites around the planet reveals, however, that there are large discrepancies between the CLM predictions and the observations. The goal of our study is to adjust the model parameters in order to minimize the root mean squared error (RMSE) between model predictions and observations. The parameters to be adjusted have been selected based on a sensitivity analysis. Because a single CLM simulation is costly (15 to 30 min on the Yellowstone Supercomputing Facility), only relatively few simulations can be afforded when searching for a near-optimal solution within an acceptable time. Our results indicate that the parameter estimation problem has multiple local minima. Hence, we use a computationally efficient global optimization algorithm that uses a radial basis function (RBF) surrogate model to approximate the objective function. We use the information from the RBF to select parameter values that are most promising with respect to improving the objective function value. We show with pseudo data that our optimization algorithm is able to make excellent progress with respect to decreasing the RMSE. Using the true CH4 emission observations for optimizing the parameters, we are able to reduce the overall RMSE between observations and model predictions by about 50 %. The methane emission predictions of the CLM using the optimized parameters agree better with the observed methane emission data in northern and tropical latitudes. With the optimized parameters, the methane emission predictions are higher in northern latitudes than when the default parameters are used. For the tropics, the optimized parameters lead to lower emission predictions than the default parameters.

Introduction and motivation

Methane is the second most important greenhouse gas in terms of radiative forcing and thus a major concern regarding climate change. Natural wetlands as well as human activities such as agriculture (for example, rice cultivation) contribute to methane emissions. The role of wetlands in the total methane budget, as well as in driving inter-annual variability and changes in the methane growth rate, is not well understood. The Community Land Model (CLM), which is the land component of the Community Earth System Model (CESM), is equipped with a methane module that models methane emissions. There are several parameters in CLM related to the methane emission computations. The methane emissions estimated by the model are sensitive to the exact parameter values, although these parameters are not well known. Previous studies reported significant differences between model simulations and observations in both site-level methane emissions and the global budget. One important source of uncertainty is associated with the parametrization, since the methane module has numerous parameters that have yet to be identified empirically owing to the lack of field data. In this study our goal is to use surrogate model optimization techniques in order to adjust the methane-related parameters of the CLM such that the differences between the simulated and observed methane emissions at 16 sites around the globe are minimized.

For computing an objective function value, we have to do a computationally expensive simulation with CLM4.5bgc in order to obtain the methane emission predictions at each observation site. CLM4.5bgc and related codes are deterministic models, i.e., the simulated CH4 emissions for a given parameter set will always be the same whenever we run the model for the same parameter set. In an optimization framework where the goal is to find the best set of parameters to minimize the objective function, one obstacle is the computation time that is needed to obtain a single objective function value. Only a few hundred function evaluations can be allowed in order to obtain a solution within a reasonable time. Moreover, the objective function value must be computed by running a simulation model, and thus an analytic description of the objective function is not available (black-box). Therefore, gradient information, which is important for many optimization algorithms, is not available. Due to the black-box nature of the objective function, it is also not known whether or not the objective function is convex and has only one local minimum (which corresponds to the global minimum) or if there are several local and global minima in the objective function landscape.

These characteristics of the objective function (computationally expensive, black-box, possibly multi-modal) do not allow the application of a gradient-based optimization algorithm because, on the one hand, the derivatives would have to be computed numerically (which may be inaccurate and requires many expensive function evaluations), and, on the other hand, gradient-based algorithms generally stop at a local minimum if the initial guess is not close to the global minimum.

For calibrating the parameters of other CLM modules, Markov chain Monte Carlo (MCMC) methods and Kalman filters have been used in the literature. MCMC, however, generally requires thousands of function evaluations and is thus not applicable for obtaining solutions in an acceptable time for computationally expensive problems. When using ensemble Kalman filters, assumptions about the underlying parameter distributions must be made, and generally a large number of observations is necessary for the method to be effective. Furthermore, stochastic heuristics such as simulated annealing, particle swarm, and differential evolution methods have been used for parameter tuning in the climate area. However, these methods generally require many function evaluations in order to obtain good solutions.

Other methods that have recently gained interest for parameter tuning are based on data assimilation. In order to produce good parameter estimates, these methods in general require many observations. In our optimization problem, however, the number of observations at each site is very low (between 10 and 79 observations distributed over 1 to 3 years), and thus data assimilation techniques are not suitable. An earlier study used a computationally cheap surrogate for CLM, on which MCMC is run, to reduce the number of costly simulations required during the optimization. In contrast to that approach, we apply an adaptive surrogate model during the optimization. Instead of relying on a surrogate that is based only on a limited number of initial sample points, we iteratively improve our surrogate by incorporating new data (new objective function values) that become available during the optimization.

We use surrogate-model-based global optimization algorithms because they have been shown to find near-optimal solutions within a few hundred function evaluations for computationally expensive multimodal black-box problems. Surrogate models are used as computationally cheap approximations of the objective function. During the optimization, information from the surrogate model is used to carefully select a new promising point in the variable domain at which the computationally expensive objective function will be evaluated. The surrogate model is updated throughout the optimization whenever new data are obtained.

Several surrogate model algorithms that use different surrogate model types have been developed in the literature. The efficient global optimization algorithm, for example, uses a kriging surrogate model and selects a new sample point by maximizing an expected improvement function. Another approach uses radial basis function (RBF) surrogate models to approximate the expensive objective function and selects a new sample point by minimizing a so-called bumpiness measure. Other algorithms also use RBF models and select new function evaluation points by a stochastic method. Further work developed a framework for automatically computing ensembles of various surrogate model types and extended the study to investigate the influence of different sampling strategies on the solution quality. Here, for the first time, we apply a state-of-the-art RBF surrogate optimization algorithm to the problem of land surface emissions of methane and describe the results. As far as we know, no other groups have applied optimization techniques to find better parameters for methane emission models, and thus our work represents an innovative approach to an important land–atmosphere interaction.

The remainder of this paper is organized as follows. In the section on the model description and configuration, we briefly describe the CLM and the configuration we used for predicting the methane emissions, give information about the individual observation sites, and provide the mathematical description of the optimization problem. In the section on methane-related parameters, we summarize these parameters in CLM4.5bgc and show the results of a sensitivity analysis with which we determined the parameters that are most important for the optimization. We then describe the surrogate optimization approach for solving the problem. The section on numerical experiments contains information about the setup of our experiments and a discussion of the optimization results. Finally, we draw conclusions. The Appendix contains additional information about the methane equations and the observation sites.

Model description, configuration, and mathematical problem description

Model description

We used the Community Land Model version 4.5 (CLM4.5), the land component of the Community Earth System Model (CESM), which contains a detailed representation of biophysics, hydrology, and biogeochemistry. CLM4.5 is fully prognostic with respect to the carbon and nitrogen state variables in the vegetation, litter, and soil organic matter, as well as methane emissions, and it is the most recent version of the model available.

We selected the latest version of CLM with improved biogeochemistry (CLM4.5bgc) over CLM4.0-CN. The major improvements in CLM4.5bgc include the incorporation of vertically resolved soil carbon dynamics, an alternate decomposition cascade from the Century soil model, and a more detailed representation of nitrification and denitrification based on the Century nitrogen model. The hydrology of CLM4.5 has been improved to better represent the hydraulic properties of frozen soils, perched water tables, snow cover fraction, and lakes.

In previous versions, the simulated ecosystem productivity was too low in high latitudes and perhaps too high in low latitudes. CLM4.5bgc, however, has substantially increased the productivity in high latitudes, where it may now be overpredicted.

We used a mechanistic methane emission model, which is a module integrated in CLM4.5bgc. The model simulates the physical and biogeochemical processes regulating terrestrial methane fluxes such as methane production, methane oxidation, methane and oxygen transport through aerenchyma of wetland plants, ebullition, and methane and oxygen diffusion through soil. Later work added constraints on methane emissions, such as the effects of redox potential and soil pH, to improve the predictions of methane emissions, as well as the ability to simulate the satellite-derived inundation fraction.

The model has been compared to the limited site-level observations of methane emissions (many of the sites have very sparse spatial and temporal data coverage, and directly measured climate forcing was unavailable at all of the sites). Additionally, the model was compared with the results from three recent global atmospheric inversion estimates of methane emissions. In these comparisons, simulated emissions agreed relatively well with the observed emissions at some of the sites. However, there are considerable differences in seasonality and magnitude at other sites. The simulated patterns and magnitudes of annual-average methane emissions are consistent with the results from atmospheric inversion across most latitude bands. The limitations of the model are discussed in the corresponding model evaluation studies.

Model configuration and data

Although the land model can be used interactively within CESM, we use it at specific points driven by appropriate meteorology. At each site, we forced the model with NCEP/NCAR reanalysis atmospheric forcing data sets. These data sets include precipitation, temperature, wind speeds, and solar radiation. We also forced the model with transient atmospheric carbon dioxide concentrations, aerosol deposition data, and nitrogen deposition data that are available in CLM4.5. Please note that this model is deterministic, and thus it gives the same answer every time it is run when driven by observationally based data sets as done here.

In this study we used a total of six natural wetland sites and ten rice paddy sites (see the site tables in the Appendix). We chose the wetland sites from varying geographical regions such as the tropics, mid-latitudes, and high latitudes to account for the zonal variability. We selected the rice paddy sites so as to cover the major rice-growing regions with a focus on Asia.

The water table depth is one of the critical factors for methane emissions from natural wetlands because it determines the extent of the anoxic and oxic soil zones where methane is produced and oxidized, respectively. Methane is produced in the wetlands from litter and dead vegetation remnants under anoxic conditions. Changes in the water table position also influence the moisture conditions of the soil and therefore affect the methane emissions. Here, we prescribed the measured water table position at each wetland site (except Panama) based on previous studies. Since measured water table depths were not available for Panama, we used modeled water table positions, as done in earlier work. For the point simulations, the methane emissions were calculated only from the saturated portion of the soil (i.e. below the water table) when the water table is below the surface. The prescribed water table depth is used in the methane model for calculating anaerobic conditions, production, and oxidation.

Most of these wetland sites have peat soils of varying depths underlain by mineral soil. We also forced each wetland site with measured pH and a specific plant functional type (PFT). The PFT reflects the phenological and physiological characteristics of the vegetation. Since a wetland PFT was not available in CLM4.5, we chose PFTs that are available in CLM4.5 and that closely match the specific vegetation types of the individual sites. We used C3 arctic grass for Salmisuo, C3 non-arctic grass for Alberta, Michigan, and Minnesota, and C4 grass for Florida and Panama. Other surface data required to perform the point simulation include soil color and soil texture, which we extracted from the global grid data sets available in CLM4.5.

For the point simulations at the rice paddy sites we only considered the rice growing season. The flooding and drainage dates are shown in the rice paddy site table in the Appendix. We assumed that the fields were submerged during the simulation period between initial flooding and final drainage. A common feature of these sites during the growing season is that the water was not drained until harvest. We prescribed the C3 crop PFT for all rice paddy sites, and assumed an optimal pH for the methane production whenever the pH value was not available. The dominant soil types at these sites are loam and clay. Other soil-related information such as soil color and texture is derived from the global grid data sets.

To bring the terrestrial carbon and nitrogen cycles close to steady-state conditions, we spun up both wetland and rice paddy sites for 1850 conditions (atmospheric CO2 concentrations, nitrogen deposition, aerosol deposition, and land use) driven by a repeating 25-year subset (1948–1972) of the meteorological forcing data for more than 2000 years. Then, we performed transient simulations from 1850 to the simulation starting year of each site to generate the initial conditions file.

Additionally, we conducted global simulations of methane emissions from natural wetlands for 1993–2004. For these simulations, we considered the grid-cell-averaged methane emissions, which account for methane emissions from both the inundated and non-inundated portions of the grid cell. Since the saturated fraction simulated by CLM4.5 (an index of inundation) was substantially greater than the estimates from satellite observations and did not match the spatio-temporal pattern of variability, we prescribed the inundation fraction derived from multi-satellite observations for 1993–2004. Similar to the point simulations, the global simulations were forced with NCEP/NCAR reanalysis atmospheric forcing data from 1948 to 2004. The simulations were also spun up to steady-state conditions driven by atmospheric CO2, nitrogen deposition, aerosol deposition, and land use in the year 1850 and a repeated 25-year (1948–1972) subset of the meteorological forcing.

CH4-related parameters in CLM4.5bgc with their lower and upper bounds x_k^l and x_k^u, respectively, and the default parameter values v_k.

ID  Parameter name         x_k^l        x_k^u        v_k
1   q10ch4                 1            4            1.33
2   f_ch4                  0.1          0.4          0.26
3   redoxlag               15           45           30
4   oxinhib                200          600          400
5   pHmax                  8            10           9
6   pHmin                  2            4            2.2
7   vmax_ch4_oxid          1.25×10^-6   1.25×10^-4   1.25×10^-5
8   k_m                    0.0005       0.05         0.005
9   k_m_o2                 0.002        0.2          0.002
10  q10_ch4oxid            1            4            1.9
11  k_m_unsat              0.00005      0.005        0.0005
12  vmax_oxid_unsat        1.25×10^-7   1.25×10^-5   1.25×10^-6
13  scale_factor_aere      0.2          2            1
14  nongrassporosratio     0.2          0.5          0.33
15  porosmin               0.01         0.2          0.05
16  rob                    2            4            3
17  unsat_aere_ratio       0.1          0.25         0.1667
18  vgc_max                0.05         0.3          0.15
19  scale_factor_gasdiff   1            5            1
20  atmch4                 1.7×10^-7    1.7×10^-5    1.7×10^-6
21  mino2lim               0.1          0.3          0.2

Mathematical problem formulation

The goal of our study is to improve the methane emission predictions of CLM4.5bgc by tuning the methane-related parameters such that the model better fits the observations. We use the CH4 emission observation data for the locations and observation periods shown in the observation site tables. Given the observation data at the $M=16$ locations, the goal is to minimize the root mean squared errors (RMSEs) between the CLM4.5bgc methane emission predictions and the observations at each site simultaneously. In order to tackle the problem, we formulate it such that we minimize the weighted sum of the RMSEs as follows:

$$\min f(x) = \sum_{i=1}^{M} w_i\, r_i(x) \quad \text{s.t.} \quad -\infty < x_k^{l} \le x_k \le x_k^{u} < \infty, \quad k=1,\dots,d,$$

where $d$ denotes the problem dimension (the number of optimization parameters), and $x_k^{l}$ and $x_k^{u}$ are the lower and upper bounds of variable $x_k$, respectively. The RMSE

$$r_i(x) = \sqrt{\frac{1}{N_i}\sum_{j=1}^{N_i}\left(O_{i,j} - S_{i,j}(x)\right)^2}, \quad i=1,\dots,M,$$

is computed for each location $i$. $N_i$ is the number of observations available at location $i$, $O_{i,j}$ denotes the $j$th methane emission observation at location $i$, and $S_{i,j}(x)$ denotes the corresponding methane emission predicted by CLM4.5bgc. The weights $w_i$ are computed based on the means of the CH4 emissions at the observation locations as follows. Denote by

$$a_i = \frac{1}{N_i}\sum_{j=1}^{N_i} O_{i,j}$$

the mean CH4 emission at location $i$, $i=1,\dots,M$. The weight $w_i$ for the $i$th location is then defined by

$$w_i = \frac{g_i}{\sum_{m=1}^{M} g_m}, \quad \text{where} \quad g_i = \frac{\max_{m=1,\dots,M} a_m}{a_i},$$

and it is assumed that $a_i>0$ for all $i$. The goal is to give each location approximately equal influence in the weighted sum of RMSEs, i.e., we assign locations with large mean CH4 values small weights such that these locations have approximately the same influence on the weighted sum as locations with low emissions. Otherwise, locations with large emissions would dominate the sum in the objective function because their RMSEs would accordingly be larger. In that case the optimization would be driven by minimizing the RMSE of the site(s) with the largest emissions.

There are also other ways in which $w_i$ could be determined. In the numerical experiments, we also investigate the possibilities of assigning equal weights to each observation site and of assigning weights derived from grouping the observation sites into zones. Another possibility would be to apply clustering algorithms in order to determine groups of observation sites with similar characteristics. For this possibility, however, different clustering methods and different numbers of desired clusters will lead to different groups and different weight adjustments. Lastly, the problem could be formulated as a multi-objective optimization problem, for example, with 16 objectives and the goal of minimizing each observation site's RMSE individually, or as a bi-objective optimization problem by minimizing the sums of the weighted RMSE values of northern and southern locations at the same time. However, each objective function evaluation is very expensive, and thus the number of evaluations that can be done to obtain the Pareto front in a multi-objective setting is limited. Our focus is on demonstrating that single-objective global optimization analysis is useful in identifying reasonable parameter values.
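The weighted objective can be illustrated with a minimal Python sketch. It assumes that the observations and the corresponding simulated emissions are already available as one array per site; the function names and the toy numbers are hypothetical and are not taken from CLM4.5bgc or from the observation data.

```python
import numpy as np

def site_rmse(obs, sim):
    """RMSE r_i between observed and simulated emissions at one site."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

def site_weights(observations):
    """Weights w_i = g_i / sum(g), with g_i = max(a) / a_i and a_i the site mean."""
    means = np.array([np.mean(o) for o in observations], dtype=float)
    if np.any(means <= 0):
        raise ValueError("mean emissions a_i must be positive at every site")
    g = means.max() / means
    return g / g.sum()

def weighted_rmse(observations, simulations):
    """Objective f(x): weighted sum of the per-site RMSEs."""
    w = site_weights(observations)
    r = np.array([site_rmse(o, s) for o, s in zip(observations, simulations)])
    return float(np.dot(w, r))

# Hypothetical toy data: two sites whose emissions differ by a factor of 10.
obs = [[1.0, 3.0], [10.0, 30.0]]
sim = [[2.0, 2.0], [20.0, 20.0]]
print(site_weights(obs), weighted_rmse(obs, sim))
```

In this toy case the second site has a ten times larger RMSE but a ten times smaller weight, so both sites contribute the same amount to the objective, which is the balancing effect the weights are meant to achieve.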

Methane-related parameters in CLM4.5bgc and sensitivity analysis

CLM4.5bgc has 21 parameters related to the methane emission predictions. The parameter names, their upper and lower bounds, and their default values are shown in the table of CH4-related parameters. The upper and lower bounds have been derived from values reported in the literature (see the corresponding table in the Appendix). How these parameters are used in the model is detailed in the model documentation, and we repeat the important equations in the Appendix. The default parameter values $v_k$ are those available in CLM4.5bgc.

Optimization problems become increasingly more complex and difficult to solve as the number of parameters increases (curse of dimensionality). Thus, we determine first which of these 21 parameters are the most sensitive and thus the most important for the optimization. By sensitive we refer to parameters that when changed slightly lead to a significant change in emission predictions. Insensitive parameters, on the other hand, can be changed and do not (or comparatively only very mildly) change the emission predictions and can thus be excluded from the optimization, which decreases the problem dimension.

Parameters that are sensitive for most observation sites (out of 16).

ID  Parameter name       # sensitive sites
1   q10ch4               16
2   f_ch4                16
7   vmax_ch4_oxid        16
13  scale_factor_aere    16
9   k_m_o2               15
15  porosmin             14
16  rob                  11
8   k_m                  10
17  unsat_aere_ratio     10
10  q10_ch4oxid          9
21  mino2lim             9

We conducted analyses for each observation site in which we investigated to which of these 21 parameters the methane emission predictions of CLM4.5bgc are the most sensitive. We altered the value of each parameter k=1,…,d by, respectively, adding and subtracting 20 % of the variable range and we recorded the absolute change in emission predictions, i.e. we ran CLM4.5bgc with perturbed parameter values

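(a) $x_k = \min\{v_k + 0.2\,(x_k^{u} - x_k^{l}),\; x_k^{u}\}$, $k=1,\dots,d$, when increasing $v_k$ by 20 % of the variable range, and

(b) $x_k = \max\{v_k - 0.2\,(x_k^{u} - x_k^{l}),\; x_k^{l}\}$, $k=1,\dots,d$, when decreasing $v_k$ by 20 % of the variable range,

for each parameter separately.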
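The perturbation scheme can be sketched in a few lines of Python. The sketch only builds the perturbed parameter vectors (one parameter changed at a time, all others left at their defaults); running CLM4.5bgc for each vector is outside its scope. The function names are hypothetical, and the example uses the bounds and defaults of q10ch4 and f_ch4 from the parameter table.

```python
import numpy as np

def perturbed_values(v, lower, upper, fraction=0.2):
    """Return the increased and decreased values of every parameter,
    shifted by `fraction` of the variable range and clipped to the bounds."""
    v, lower, upper = (np.asarray(a, dtype=float) for a in (v, lower, upper))
    step = fraction * (upper - lower)
    up = np.minimum(v + step, upper)
    down = np.maximum(v - step, lower)
    return up, down

def one_at_a_time_runs(v, lower, upper, fraction=0.2):
    """List of (parameter index, parameter vector) pairs in which exactly
    one parameter is perturbed and all others stay at their defaults."""
    up, down = perturbed_values(v, lower, upper, fraction)
    runs = []
    for k in range(len(v)):
        for value in (up[k], down[k]):
            x = np.array(v, dtype=float)  # fresh copy of the defaults
            x[k] = value
            runs.append((k, x))
    return runs

# Bounds and defaults of q10ch4 and f_ch4 from the parameter table.
defaults, lo, hi = [1.33, 0.26], [1.0, 0.1], [4.0, 0.4]
for k, x in one_at_a_time_runs(defaults, lo, hi):
    print(k, x)
```

Note that the downward perturbation of q10ch4 is clipped at its lower bound of 1, because 1.33 minus 20 % of the range would fall below it.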

Several parameters are sensitive at all 16 observation sites, whereas others are important at some locations and less important at others. The tables of sensitive and insensitive parameters list, for each parameter, the number of locations (out of 16) at which it is important and unimportant, respectively. In the optimization we therefore consider only the parameters in the table of sensitive parameters, since these are the most important at most locations. Please note that, owing to (nonlinear) relationships between the parameters, the effects of some parameters on the emissions oppose or closely resemble those of others, which indicates that some parameters may be difficult to optimize. In order to limit the number of parameters we consider while allowing for the largest range in behavior, we combine information from the sensitivity study with information about the methane flux equations themselves (described in more detail in Appendix A). The most important parameters from the sensitivity study come from the three dominant terms in the methane flux equation, which are production (parameters 1, 2, and 21), oxidation (parameters 7, 8, 9, and 10), and aerenchyma transport (parameters 13, 15, 16, and 17). The first four parameters chosen are the parameters that are sensitive at all 16 sites. Because production is the most important term, two of them come from production, namely one that globally controls the methane production flux (f_ch4, parameter 2) and one that controls the temperature dependency of the methane production (q10ch4, parameter 1). Another parameter that influences methane at all the sites comes from the oxidation equation (vmax_ch4_oxid, parameter 7), and the final parameter that is important at all 16 sites controls the aerenchyma transport (scale_factor_aere, parameter 13). These four parameters are the most sensitive ones and are thus an obvious choice, and they also cover most of the important processes we want to investigate. As the fifth parameter, we include one that controls how inundation affects methane production (mino2lim, parameter 21). Inundation is an important process for controlling methane flux, since there is an order of magnitude more methane coming from wet areas than from dry areas, and thus including one parameter that changes the model's sensitivity to inundation is appropriate.

Parameters that are least sensitive for observation sites (out of 16).

ID  Parameter name         # insensitive sites
3   redoxlag               16
4   oxinhib                16
5   pHmax                  16
6   pHmin                  16
14  nongrassporosratio     16
18  vgc_max                16
20  atmch4                 15
11  k_m_unsat              13
19  scale_factor_gasdiff   13
12  vmax_oxid_unsat        10

Surrogate models and surrogate model algorithms

Surrogate models

Surrogate models are used in optimization algorithms that aim to solve computationally expensive black-box problems. Surrogate models serve as computationally cheap approximations of the expensive simulation model, i.e., $f(x) = s(x) + e(x)$, where $f(\cdot)$ denotes the true expensive objective function, $s(\cdot)$ denotes the computationally inexpensive surrogate model, and $e(\cdot)$ denotes the difference between both. Surrogate models are used throughout the optimization to guide the search for promising solutions. The computationally expensive objective function is evaluated only at a few selected points, and thus it is possible to find near-optimal solutions with only very few expensive function evaluations.

There are different surrogate model types such as radial basis functions (RBFs), kriging, polynomial regression models, and multivariate adaptive regression splines. There are also mixture models (also known as ensemble models) that exploit information from several different surrogate model types. In general, any type of surrogate model may be used in a surrogate model optimization algorithm. In this study, we use RBFs because they have been shown in previous comparisons to perform better than other surrogate model types.

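An RBF interpolant is defined as follows:

$$s(x) = \sum_{\iota=1}^{n} \lambda_\iota\, \phi(\|x - x_\iota\|) + p(x),$$

where $\phi(\tau)=\tau^3$ denotes the cubic radial basis function whose corresponding polynomial tail is linear, $p(x) = b_0 + b^T x$, and $x_\iota$, $\iota=1,\dots,n$, denotes the points at which the objective function has already been evaluated. The parameters $\lambda_\iota\in\mathbb{R}$, $\iota=1,\dots,n$, and the parameters $b_0\in\mathbb{R}$ and $b=[b_1,\dots,b_d]^T\in\mathbb{R}^d$ are determined by solving the following linear system of equations:

$$\begin{bmatrix} \Phi & P \\ P^T & 0 \end{bmatrix} \begin{bmatrix} \lambda \\ c \end{bmatrix} = \begin{bmatrix} F \\ 0 \end{bmatrix},$$

where $\Phi_{\iota\nu} = \phi(\|x_\iota - x_\nu\|)$, $\iota,\nu = 1,\dots,n$, $0$ is a matrix with all entries 0 of appropriate dimension, and

$$P = \begin{bmatrix} x_1^T & 1 \\ x_2^T & 1 \\ \vdots & \vdots \\ x_n^T & 1 \end{bmatrix}, \quad \lambda = \begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_n \end{bmatrix}, \quad c = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_d \\ b_0 \end{bmatrix}, \quad F = \begin{bmatrix} f(x_1) \\ f(x_2) \\ \vdots \\ f(x_n) \end{bmatrix}.$$

The matrix of this linear system is invertible if and only if $\mathrm{rank}(P) = d+1$.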
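The following NumPy sketch shows one way to assemble and solve this linear system and to evaluate the resulting interpolant. It is meant as an illustration of the equations above, not as the implementation used in the study; the function names and the toy objective are hypothetical, and the sketch assumes that the sample points satisfy the rank condition on $P$.

```python
import numpy as np

def fit_cubic_rbf(X, F):
    """Solve [[Phi, P], [P^T, 0]] [lam; c] = [F; 0] for a cubic RBF
    with linear tail. X has shape (n, d), F has shape (n,)."""
    X = np.asarray(X, dtype=float)
    F = np.asarray(F, dtype=float)
    n, d = X.shape
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    Phi = dist ** 3                               # phi(tau) = tau^3
    P = np.hstack([X, np.ones((n, 1))])           # rows [x_i^T, 1]
    A = np.block([[Phi, P], [P.T, np.zeros((d + 1, d + 1))]])
    rhs = np.concatenate([F, np.zeros(d + 1)])
    coef = np.linalg.solve(A, rhs)
    return coef[:n], coef[n:]                     # lam, c = [b_1..b_d, b_0]

def eval_cubic_rbf(x, X, lam, c):
    """Evaluate s(x) at one or several points x of shape (m, d) or (d,)."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(x[:, None, :] - X[None, :, :], axis=2)
    return dist ** 3 @ lam + x @ c[:-1] + c[-1]

# Hypothetical cheap stand-in for the expensive objective function.
rng = np.random.default_rng(0)
X = rng.random((12, 2))
F = np.sum(X ** 2, axis=1)
lam, c = fit_cubic_rbf(X, F)
print(np.max(np.abs(eval_cubic_rbf(X, X, lam, c) - F)))  # interpolation error
```

Because the model is an interpolant, the surrogate reproduces the objective function values at the sampled points up to rounding error; between the points it provides the cheap prediction used to rank candidate points.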

Surrogate global optimization algorithm

Surrogate global optimization algorithms generally follow the steps shown in the algorithm below.

Algorithm: General surrogate global optimization algorithm

1. Select points from the variable domain to create an initial experimental design.
2. Do the expensive objective function evaluations (here the CLM4.5bgc simulations) at the points selected in Step 1.
3. Fit the surrogate model (here the RBF model) to the data from Steps 1 and 2.
4. Use the information from the surrogate model to select the new evaluation point $x_{\mathrm{new}}$.
5. Do the expensive evaluation at $x_{\mathrm{new}}$: $f_{\mathrm{new}} = f(x_{\mathrm{new}})$ (here, run CLM4.5bgc for the parameter input vector $x_{\mathrm{new}}$).
6. if the stopping criterion is not met (the maximum number of allowed function evaluations has not been reached) then
7.     Update the surrogate model and go to Step 4.
8. else
9.     Return the best solution found during the optimization.
10. end if
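The control flow of such an algorithm can be sketched as a short Python loop. The sketch is generic: the surrogate fitting and the point selection are passed in as callables. For the runnable example, a quadratic polynomial fitted with `numpy.polyfit` stands in for the surrogate and a one-dimensional toy function stands in for the expensive simulation; both are assumptions made for illustration only and are not the RBF model or CLM4.5bgc.

```python
import numpy as np

def surrogate_optimize(f, initial_points, fit, propose, max_evals):
    """Generic surrogate optimization loop with user-supplied callables."""
    X = [np.asarray(x, dtype=float) for x in initial_points]  # initial design
    F = [f(x) for x in X]                       # expensive evaluations
    while len(F) < max_evals:                   # stopping criterion
        model = fit(np.array(X), np.array(F))   # fit or update the surrogate
        x_new = np.asarray(propose(model, np.array(X), np.array(F)), dtype=float)
        X.append(x_new)
        F.append(f(x_new))                      # one new expensive evaluation
    best = int(np.argmin(F))
    return X[best], F[best]                     # best solution found

# Toy stand-ins (hypothetical): cheap objective, polynomial surrogate,
# and selection of the surrogate minimizer on a fixed grid.
def toy_objective(x):
    return float((x[0] - 0.3) ** 2)

grid = np.linspace(0.0, 1.0, 101)

def fit_quadratic(X, F):
    return np.polyfit(X[:, 0], F, deg=2)

def propose_on_grid(model, X, F):
    return np.array([grid[np.argmin(np.polyval(model, grid))]])

x_best, f_best = surrogate_optimize(
    toy_objective, [[0.0], [0.5], [1.0]], fit_quadratic, propose_on_grid, max_evals=6
)
print(x_best, f_best)
```

A practical algorithm additionally has to balance exploration and exploitation when proposing points; this toy version simply trusts the surrogate minimizer, which is sufficient only because the toy objective is itself a quadratic.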

We use the DYCORS algorithm for the optimization of the methane-related parameters of CLM4.5bgc. The reader is referred to the original DYCORS publication for the details of the algorithm. Since the parameters have significantly different ranges (see the table of CH4-related parameters), we scale all parameters to the interval [0,1] when selecting new sample sites. When doing the computationally expensive CLM4.5bgc simulations, we scale the parameters back to their original ranges. Thus, the perturbation radius used in DYCORS is the same for each variable.
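A minimal sketch of this scaling is given below, using the bounds and default values of the five parameters 1, 2, 7, 13, and 21 from the parameter table. The sketch assumes a linear mapping between the original ranges and [0,1]; the function names are hypothetical.

```python
import numpy as np

# Bounds and defaults copied from the parameter table (parameters 1, 2, 7, 13, 21).
NAMES = ["q10ch4", "f_ch4", "vmax_ch4_oxid", "scale_factor_aere", "mino2lim"]
LOWER = np.array([1.0, 0.1, 1.25e-6, 0.2, 0.1])
UPPER = np.array([4.0, 0.4, 1.25e-4, 2.0, 0.3])
DEFAULTS = np.array([1.33, 0.26, 1.25e-5, 1.0, 0.2])

def to_unit(x):
    """Map parameters from their original ranges to [0, 1] (assumed linear)."""
    return (np.asarray(x, dtype=float) - LOWER) / (UPPER - LOWER)

def from_unit(u):
    """Map points from [0, 1] back to the original ranges for a model run."""
    return LOWER + np.asarray(u, dtype=float) * (UPPER - LOWER)

u = to_unit(DEFAULTS)
print(dict(zip(NAMES, np.round(u, 4))))
```

Working in the unit cube means that a single perturbation radius has a comparable meaning for a parameter such as vmax_ch4_oxid, whose range is of the order of 10^-4, and for a parameter such as q10ch4, whose range is of order 1.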

We create a symmetric Latin hypercube initial experimental design with $2(d+1)$ points and run CLM4.5bgc at the selected parameter vectors in order to compute the objective function values. We then fit the cubic RBF model to the data and generate two sets of candidate points for the next expensive function evaluation (the next CLM4.5bgc run at the 16 sites). The first set of candidate points is generated as in DYCORS by randomly perturbing the best point found so far. The second set of candidate points is generated by uniformly selecting random points from the whole variable domain. Thus, we create twice as many candidate points as DYCORS. The goal of using uniformly random points from the whole variable domain is to obtain candidates that are far away from the best point found so far, so that, if one of them is selected as the new evaluation point, the search becomes more global (exploration by function evaluation at points that are far away from already sampled points).
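The two ingredients of this paragraph can be sketched as follows in the scaled domain [0,1]^d. The design routine builds a Latin hypercube in which every point has its mirror image about the center in the design. The candidate routine is a simplification and not the DYCORS rule itself: the fixed perturbation probability `p_perturb`, the standard deviation `sigma`, and the clipping to the bounds are assumptions chosen for illustration, and all function names are hypothetical.

```python
import numpy as np

def symmetric_latin_hypercube(d, rng):
    """Symmetric Latin hypercube with n = 2(d+1) points in [0, 1]^d.
    Rows i and n-1-i are mirror images about the center of the cube."""
    n = 2 * (d + 1)
    half = n // 2
    levels = np.empty((n, d), dtype=int)
    for j in range(d):
        base = rng.permutation(half)               # levels 0..half-1
        flip = rng.random(half) < 0.5              # random side of the center
        top = np.where(flip, n - 1 - base, base)
        levels[:half, j] = top
        levels[half:, j] = (n - 1 - top)[::-1]     # mirrored partner rows
    return (levels + 0.5) / n                      # cell midpoints

def candidate_points(x_best, n_cand, rng, sigma=0.2, p_perturb=0.5):
    """n_cand local candidates (random perturbations of x_best) followed by
    n_cand uniform candidates from the whole unit cube."""
    x_best = np.asarray(x_best, dtype=float)
    d = x_best.size
    mask = rng.random((n_cand, d)) < p_perturb
    rows = np.flatnonzero(~mask.any(axis=1))       # rows with nothing perturbed
    mask[rows, rng.integers(0, d, size=rows.size)] = True
    local = x_best + mask * rng.normal(0.0, sigma, size=(n_cand, d))
    local = np.clip(local, 0.0, 1.0)               # simplification: clip to bounds
    uniform = rng.random((n_cand, d))
    return np.vstack([local, uniform])

rng = np.random.default_rng(0)
design = symmetric_latin_hypercube(5, rng)         # 12 points for d = 5
cands = candidate_points(np.full(5, 0.5), 100, rng)
print(design.shape, cands.shape)
```

In practice the design should also be checked for the rank condition required by the RBF system, and regenerated if the condition fails.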

We use the same criteria as in DYCORS for determining the best candidate point: we use the RBF approximation to predict the objective function values at the candidate points, compute the distance of each candidate point to the set of already sampled points, and compute a weighted score of these two measures, where the weights cycle through a predefined pattern. In order to guarantee that the matrix of the RBF linear system is well-conditioned, we ensure (as done in DYCORS) that the sample points are sufficiently far away from previously evaluated points by discarding candidate points that are closer than a given threshold distance to previously evaluated points. We run CLM4.5bgc at each of the 16 observation sites using the one newly selected sample point as input parameter vector to obtain the corresponding objective function value. We update the RBF model with the new data and iterate until we have reached the maximum number of allowed function evaluations.
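A weighted score of this kind can be sketched as below. Both criteria are rescaled to [0,1] so that a low surrogate prediction and a large distance to the sampled points are each rewarded, and the candidate with the smallest score is selected. The threshold `min_dist`, the rescaling details, and the function names are assumptions made for illustration; the sketch does not prescribe the weight pattern, which is passed in as a single `weight` per iteration.

```python
import numpy as np

def _unit(v):
    """Rescale a vector to [0, 1]; a constant vector maps to zeros."""
    span = v.max() - v.min()
    return (v - v.min()) / span if span > 0 else np.zeros_like(v)

def select_candidate(cands, surrogate_vals, sampled, weight, min_dist=1e-3):
    """Pick the candidate with the lowest weighted score.
    weight near 1 favors low surrogate values (local search);
    weight near 0 favors large distance to sampled points (global search)."""
    cands = np.asarray(cands, dtype=float)
    sampled = np.asarray(sampled, dtype=float)
    vals = np.asarray(surrogate_vals, dtype=float)
    # Distance of every candidate to its nearest already sampled point.
    gaps = np.linalg.norm(cands[:, None, :] - sampled[None, :, :], axis=2).min(axis=1)
    keep = gaps >= min_dist                 # discard candidates that are too close
    if not keep.any():
        raise ValueError("all candidates are too close to sampled points")
    v_r = _unit(vals[keep])                 # 0 = best predicted value
    v_d = 1.0 - _unit(gaps[keep])           # 0 = farthest from sampled points
    score = weight * v_r + (1.0 - weight) * v_d
    return cands[keep][np.argmin(score)]

# Hypothetical example: one sampled point and three candidates.
sampled = [[0.0, 0.0]]
cands = [[0.0005, 0.0], [0.1, 0.0], [1.0, 1.0]]
vals = [0.0, 1.0, 2.0]
print(select_candidate(cands, vals, sampled, weight=0.95))
print(select_candidate(cands, vals, sampled, weight=0.3))
```

In the example the first candidate is discarded because it lies closer than the threshold to the sampled point; a large weight then selects the nearby candidate with the better predicted value, and a small weight selects the distant candidate.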

Numerical experiments

In this section we discuss the setup and results of the numerical experiments. In a first set of experiments (pseudo data case), we generate synthetic (pseudo) data and treat it as if it were the real measurement data in order to assess how well our optimization approach performs. For these experiments we know the optimal solution. In the second set of experiments (real data case), we use the measured methane emission data and apply the optimization algorithm. The goal in the second set of experiments is to find a parameter set that reduces the objective function value (the weighted RMSE in Eq. ) from its default value (the RMSE when using CLM4.5bgc's default parameter settings, see also Table , column vk). Finally, we run CLM4.5bgc globally with the best set of parameters found during the optimization of the real data case and investigate how much the default model predictions and the model predictions with the optimized parameter values differ from each other.

We conducted experiments with d=5 and d=11 parameters. For the d=5 experiments, we used parameters 1, 2, 7, 13, and 21 (Table ). Thus, we have parameters related to three processes of CH4 emission, namely oxidation (parameter 7), aerenchyma (parameter 13), and production (parameters 1, 2, 21). For the 11-parameter optimization, we used all variables shown in Table .

For each set of experiments we ran the optimization algorithm three times in order to examine the influence of the random component in the algorithm (random initial experimental design and random generation of candidate points). We allowed 800 function evaluations for the five-dimensional problem and 1000 evaluations for the 11-dimensional problem. The number of function evaluations needed to reach a fixed level of solution accuracy is problem dependent. For computationally expensive optimization problems, such as the one we consider here, the time for evaluating the objective function and the total time available for obtaining a solution usually determine how many evaluations can be done with any algorithm. Results for many difficult computationally expensive optimization problems (for example, problems with multiple local minima) indicate that surrogate global optimization methods can usually obtain more accurate results than non-surrogate methods with the same limited number of evaluations (see, for example, ). Finding the best parameter values for climate models is a very difficult problem, and in general the more evaluations one performs, the better the answer.

For the pseudo data case, the weights wi in Eq. () were computed from the pseudo observations (see Sect. ) at each of the 16 sites at the same dates for which we also have real measurements. For the real data case, the weights were computed from the actual measurements. The weights are given in Table in Appendix .

Solving problem () requires running CLM4.5bgc for each input vector x of parameter values and for each of the 16 observation sites. We run CLM4.5bgc on the Yellowstone Supercomputing Facility. Each simulation at a single location takes between 15 and 30 min. We run the simulations for the 16 sites in parallel in order to reduce the objective function evaluation time.
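One objective function evaluation can be sketched as below. This is a toy illustration, not the actual workflow: `run_site` is a hypothetical wrapper standing in for a single-site CLM4.5bgc run, the site data are made up, and the objective is assumed to be a weighted sum of the per-site RMSE values.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def rmse(pred, obs):
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def objective(x, sites, weights, run_site):
    """Weighted sum of per-site RMSEs; all sites are simulated concurrently.

    run_site(x, site) is a hypothetical wrapper that launches one single-site
    simulation and returns the predictions at the observation dates.
    """
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        preds = list(pool.map(lambda s: run_site(x, s), sites))
    errors = [rmse(p, s["obs"]) for p, s in zip(preds, sites)]
    return float(np.dot(weights, errors))

# Toy stand-in for the expensive model: emissions scale linearly with x[0].
def toy_run_site(x, site):
    return x[0] * np.asarray(site["driver"])

sites = [
    {"driver": [1.0, 2.0, 3.0], "obs": [2.0, 4.0, 6.0]},
    {"driver": [1.0, 1.0], "obs": [3.0, 3.0]},
]
weights = [0.5, 0.5]
f = objective(np.array([2.0]), sites, weights, toy_run_site)
```

When the sites run concurrently, the wall-clock time of one evaluation is roughly that of the slowest single-site run rather than the sum over all sites.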

Pseudo data case

Progress plot that shows the development of the best objective function value found vs. the number of function evaluations for the pseudo data case with d=5 parameters for optimization trials T1, T2, and T3. The legend shows the lowest RMSE value found in each trial.

We assessed the performance of the optimization algorithm by investigating how well the algorithm could find the model parameters that were used for creating the pseudo data. For this purpose, we ran CLM4.5bgc with default parameter values vk, k=1,…,d, at all 16 sites for the same time span for which we also have observation data (see Tables and in Appendix ) and recorded the model's predictions for the same dates at which the methane emissions were measured. We used these predictions as the pseudo observation data to be matched in the optimization, i.e., the goal of the optimization is to start from a set of parameter vectors that differ from the default parameter values and to recover the default parameter values. For the default parameter values, the objective function value is zero, which is the global minimum of the pseudo data case.

Results for d=5

Figure shows the progress plots of the three optimization trials T1, T2, and T3. Illustrated is the development of the best objective function value found within the given number of function evaluations (horizontal axis). The fewer evaluations needed for reducing the objective function value, the better. The plot shows that the objective function value is reduced significantly in each of the three trials, from a value of over 30 to about 5 within fewer than 150 function evaluations and to close to zero towards the end of the optimization. Table shows the best parameter values found during each of the three optimization trials together with the default parameter values. The table shows that the RMSE after 800 function evaluations is not exactly zero (which can be expected from an approximation method), but the default parameter values are matched closely.
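The curve in such a progress plot is the running minimum of the objective values in evaluation order. A minimal sketch, with made-up objective values and a hypothetical function name:

```python
import numpy as np

def best_so_far(values):
    """Running minimum of the objective values in evaluation order."""
    return np.minimum.accumulate(np.asarray(values, dtype=float))

# Made-up objective values from six consecutive evaluations.
history = [31.0, 35.2, 12.4, 18.0, 4.9, 6.3]
curve = best_so_far(history)
```

The resulting curve never increases, so flat segments correspond to evaluations that did not improve on the best point found so far.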

Default and optimized parameter values of optimization trials T1, T2, and T3 for the five-dimensional pseudo data case. We report four decimal places because the model output is sensitive to very small changes for some variables. Note that we scaled the numbers to the interval [0,1].

Param.  Default  T1      T2      T3
1       0.1100   0.1088  0.1099  0.1091
2       0.5333   0.5366  0.5385  0.5458
7       0.0909   0.0912  0.0943  0.0967
13      0.4444   0.4461  0.4454  0.4443
21      0.5000   0.4936  0.4934  0.4856
RMSE    0        0.28    0.46    0.40

Results for d=11

Figure shows the development of the objective function value as the number of function evaluations increases for the 11-dimensional case for the three trials T1, T2, and T3. The figure shows a rapid decrease of the objective function value from over 50 to less than 10 within 100 evaluations, which shows that the surrogate model algorithm is very efficient at finding improved solutions. Although the improvement over the following function evaluations is smaller, the algorithm still makes progress, and if we allowed more than 1000 evaluations, the objective function value would be further improved (which also follows from the global convergence property of the DYCORS algorithm).

Table shows the parameter values of the best of the three trials (T3) together with the default parameter values and the variable vector CP that was evaluated during the optimization and that has a worse objective function value than the best solution, but that is closer to the default parameter values. This point has the same parameter values as T3 for all but two parameters, namely, parameters 10 (q10_ch4oxid) and 21 (mino2lim), which we indicate by bold numbers. For these two parameters, the point CP is closer to the global optimum, but it has a worse objective function value. This indicates a multimodality of the objective function (getting closer to the true global minimum requires an increase in the objective function value, i.e., the algorithm has to escape from a local basin of attraction). This multimodality makes the search for the global optimum significantly more difficult.

Progress plot that shows the development of the best objective function value found vs. the number of function evaluations for the pseudo data case with d=11 parameters for optimization trials T1, T2, and T3. The legend shows the lowest RMSE value found in each trial.

Default and optimized parameter values of optimization trial T3 and parameter values for the point CP that was sampled during the same optimization trial and that is closer to the default point, but that has a worse objective function value (11-dimensional pseudo data case). Bold numbers indicate the parameters for which CP is closer to the default value than T3 (but CP has a worse objective function value).

Param.  Default  T3      CP
1       0.1100   0.1148  0.1148
2       0.5333   0.5806  0.5806
7       0.0909   0.1336  0.1336
8       0.0909   0.1785  0.1785
9       0.0909   0.1248  0.1248
10      0.3000   0.4375  0.4302
13      0.4444   0.7107  0.7107
15      0.2105   0.1778  0.1778
16      0.5000   0.9583  0.9583
17      0.4444   0.2740  0.2740
21      0.5000   0.4436  0.4583
RMSE    0        2.28    2.35

In order to examine the impact of the differences between default and optimized parameter values on the model prediction, we use the best parameter vector of each trial and plot the corresponding CH4 emission predictions against the predictions when using the default parameter values in Figure . We can see that although we do not exactly match the default parameter values, the model's predictions when using the optimized parameters are very close to the predictions when using the default parameter values (all points in the scatter plot lie close to or on the dashed line which represents agreement of default and optimized predictions). As also reflected in the best RMSE value reported in the legend, T3 matches the default data best and T2 has the largest differences.

CLM4.5bgc CH4 predictions when using the default parameter values vs. the predictions when using the best solution found in each of the three optimization trials T1, T2, and T3, respectively, for the pseudo data case with d=11 parameters. The legend shows the lowest RMSE value found in each trial.

This result indicates that the calibration problem is not “identifiable” for all parameter sets, i.e., more than one parameter set can give a very similar result in terms of the objective function value. For example, for the model y=αβx+γ, there are many combinations of values for α and β that lead to the same value of y as long as the product αβ=κ for some constant κ. With only five parameters as described in the previous section, the parameter values obtained from the optimization matched those of the default case used to create the pseudo data very closely, and thus with this small set of parameters the problem was identifiable. However, for 11 parameters, we did encounter the identifiability problem. In some disciplines such parameters are called “hidden”. For example, estimating α and γ in the previous example with y=αβx+γ when β is given would be identifiable. However, estimating α, β, and γ together is no longer identifiable.
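The toy model y=αβx+γ can be checked numerically. In the sketch below (the parameter values are arbitrary), only the product αβ enters the prediction, so two different parameter sets with the same product produce identical outputs and cannot be distinguished by any misfit measure.

```python
import numpy as np

def model(x, alpha, beta, gamma):
    return alpha * beta * x + gamma

x = np.linspace(0.0, 10.0, 5)
y_a = model(x, alpha=2.0, beta=3.0, gamma=1.0)
y_b = model(x, alpha=6.0, beta=1.0, gamma=1.0)   # same product alpha*beta = 6
y_c = model(x, alpha=2.0, beta=1.0, gamma=1.0)   # different product

same_fit = np.allclose(y_a, y_b)            # indistinguishable parameter sets
different_fit = not np.allclose(y_a, y_c)   # distinguishable once beta is fixed
```

Fixing β and estimating only α and γ removes the ambiguity, which mirrors the difference between the 5-parameter and the 11-parameter pseudo data cases.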

It would be desirable to have an identifiable model, but the CLM (and probably other climate modules) has a number of interacting parameters and multiplicative nonlinearities, and thus there is no guarantee that all parameters are identifiable. This is reinforced by the data in Table , which indicate that the surface over which the optimization algorithm searches in the 11-parameter case is multi-modal, i.e., there are multiple local minima and it is possible for two (or more) parameter sets to yield the same objective function value (here RMSE). Hence the inability of the optimization to find the exact set of parameters that was used for generating the pseudo data is caused by the complexity and multiplicative nonlinearities of the CLM, not by the choice of the optimization method. However, the optimization analysis for both pseudo data cases (with 5 and 11 parameters, respectively) shows that the chosen optimization method is able to find a set of parameter values that has a low prediction error. The multi-modality in Table does indicate the need for a global (not a local) optimization method.

Progress plot that shows the development of the best objective function value found vs. the number of function evaluations for the real data case with d=5 parameters for optimization trials T1, T2, and T3. The legend shows the lowest RMSE value found in each trial. The first function evaluation (left side of the graphs) corresponds to the RMSE when using the default parameters.

Real data case

In the real data case, we use the actual methane emission measurements at each of the 16 observation sites for computing the objective function value. Since we only have very few observations for each site and no information about measurement errors, we did not exclude any of the measurements from the optimization although there might be outliers. As in the pseudo data case, we examine d=5 and d=11 parameters.

Results for d=5

The development of the objective function value for the three trials T1, T2, and T3 is illustrated in Fig. , which also shows in the legend the lowest RMSE value found in each trial. The RMSE was reduced from over 155 to below 115 within the first 150 function evaluations. Thereafter, the objective function value improved at a significantly lower rate. All three trials return a solution with approximately the same objective function value.

The parameter values of the best solutions found in the three trials are shown in Table , where the default parameter values are also given for comparison. We can see that the three optimized solutions are approximately the same and significantly different from the default case. We can also see that three of the five optimized parameter values are on or very close to the boundary of the variable domain (shown in bold), indicating that improvements of the objective function value may be possible by widening the parameter range. However, widening the ranges is not possible because of physical constraints, and at this point we have no information about possible wider parameter ranges than the ones used in this study.

Default and optimized parameter values of optimization trials T1, T2, and T3 for the five-dimensional real data case. Bold indicates optimized parameters that are on (or close to) the variable boundary (all variables are scaled to [0,1]).

Param.  Default  T1      T2      T3
1       0.1100   0       0       0
2       0.5333   0.1705  0.1747  0.1699
7       0.0909   0.7878  0.7518  0.7865
13      0.4444   0       0       0.0267
21      0.5000   1       1       1
RMSE    156.40   114.24  114.11  114.24

CH4 emission observations and predictions when using the optimized parameters of optimization trials T1, T2, and T3, respectively, and when using the default parameters for the wetland site Alberta, Canada, for the real data case with d=5 parameters. The legend shows the lowest RMSE value found in each trial.

Figures and show the CH4 emission predictions of CLM4.5bgc when using the default and the optimized parameter values for two selected observation sites (one wetland and one rice paddy site) together with the actual observation data. The legends show the associated RMSE value before applying the weights for computing the objective function value (Eq. ). We can see that the optimized solution actually worsens the predictions for Alberta (the RMSE value is about 209 with default parameters and about 221 with optimized parameters, which is about 6 % worse). For Central Java, on the other hand, the RMSE values of the optimized solutions are significantly better than for the default values (the default RMSE is about 221 and the optimized RMSE values are about 48, i.e., the default RMSE is more than 4.5 times the optimized one, a reduction of about 78 %). In both figures we can also see that despite the large differences between optimized and default parameter values, the trend in the predictions of CLM4.5bgc is the same, i.e., when the predicted CH4 emissions with default parameters increase, so do the predicted emissions when using the optimized parameters, and vice versa.

CH4 emission observations and predictions when using the optimized parameters of optimization trials T1, T2, and T3, respectively, and when using the default parameters for the rice paddy site Central Java, Indonesia, for the real data case with d=5 parameters. The legend shows the lowest RMSE value found in each trial.

Progress plot that shows the development of the best objective function value found vs. the number of function evaluations for the real data case with d=11 parameters for optimization trials T1, T2, and T3. The legend shows the lowest RMSE value found in each trial. The first function evaluation (left side of the graphs) corresponds to the RMSE when using the default parameters.

Results for d=11

Figure shows the progress plots for each of the three trials together with the best objective function values found (legend) for the 11-dimensional case. The best objective function value found is about equal for each of the three trials. The figure shows that in each trial the algorithm is able to efficiently reduce the objective function value within the first 200 function evaluations. The improvement after 200 function evaluations is significantly slower.

Default and optimized parameter values of optimization trials T1, T2, and T3 for the 11-dimensional real data case. Bold indicates optimized parameters that are on the variable bound (all variables are scaled to [0,1]).

Param.  Default  T1      T2      T3
1       0.1100   0       0       0
2       0.5333   0.4220  0.3298  0.3813
7       0.0909   0.7093  0.6889  0.7260
8       0.0909   1       1       0.9754
9       0.0909   0       0.2335  0.6971
10      0.3000   0.7702  0.6195  0.6195
13      0.4444   1       1       0.8063
15      0.2105   0.6987  1       1
16      0.5000   0.0865  0.4274  0.2473
17      0.4444   0.8543  0.3113  0.5359
21      0.5000   0.5064  0.7449  0.5586
RMSE    164.46   107.24  107.58  107.41

Table shows the parameter values of the best solution found in each of the three trials and the default parameter values. The table shows that for some parameters, for example, parameters 1, 7, and 8, all trials lead to approximately the same values (which are different from the default parameter values). For the remaining parameters, the values corresponding to the best solution found differ significantly for each trial and differ also from the default parameter values. Also for the 11-dimensional problem, some parameter values corresponding to the best solution found are on the upper or lower boundary of the parameter range (for example, parameters 1, 8, 13, 15, indicated in bold).

Since all three solutions have approximately the same objective function value but the points differ greatly, either the surface is multi-modal with several minima that have approximately the same objective function value, or there is a very flat valley in which many points have similar objective function values. Both possibilities make it very difficult for gradient-based optimization algorithms to find the global optimum. In the first case, the optimization algorithm will get trapped in a local optimum if it is not started close to the global minimum. In the second case, the gradient-based algorithm would require many function evaluations because many steps and gradient computations are necessary due to a very small step size. The surrogate optimization algorithm overcomes this problem.

CH4 emission observations and predictions when using the optimized parameters of optimization trials T1, T2, and T3, respectively, and when using the default parameters for the wetland site Alberta, Canada, for the real data case with d=11 parameters. The legend shows the lowest RMSE value found in each trial.

CH4 emission observations and predictions when using the optimized parameters of optimization trials T1, T2, and T3, respectively, and when using the default parameters for the rice paddy site Central Java, Indonesia, for the real data case with d=11 parameters. The legend shows the lowest RMSE value found in each trial.

Table shows the unweighted RMSE values (before applying the weights in Eq. () for computing the objective function value) between observations and simulations using the default parameters (column 5), the best parameters of optimization trial T1 of the 11-dimensional case (column 4), and the best parameters of trial T2 of the 5-dimensional case (column 3). The table shows that with our optimization we were able to decrease the default RMSE for four sites in the 5-dimensional case and for six sites in the 11-dimensional case. The RMSE is lower at seven sites for the 11-dimensional case than for the 5-dimensional case. Since we minimized a weighted sum of all RMSE values, it can be expected that the RMSE at some locations may be worse for the optimized case than for the default case. We can see that for two of the improved sites (Java and Cuttack), the improvement is very large, and thus the overall RMSE of the optimized solution is lower than for the default parameters.

Unweighted RMSE values for each site using the best parameters found during optimization trial T1 of the d=11 real data case and trial T2 of the d=5 real data case and with default parameter values.

Site  Name        Unweighted RMSE  Unweighted RMSE  Unweighted RMSE
                  d=5              d=11             default
1     Alberta     220.34           203.82           209.25
2     Florida     1247.70          1280.29          1180.99
3     Michigan    334.01           337.51           328.10
4     Minnesota   41.05            35.16            34.31
5     Nanjing     97.88            96.14            212.18
6     Vercelli    325.34           326.04           293.36
7     Texas       179.21           139.09           116.85
8     Japan       132.31           161.22           184.88
9     California  372.71           374.59           360.37
10    New Delhi   18.67            19.96            14.21
11    Beijing     66.79            60.89            56.99
12    Java        49.09            54.61            221.52
13    Chengdu     231.93           241.91           198.42
14    Cuttack     72.01            63.75            364.75
15    Panama      446.83           464.59           422.86
16    Salmisuo    156.79           132.16           146.52
Total RMSE        3792.66          3991.73          4345.56
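The site counts quoted in the text can be reproduced directly from the per-site values in the table above. The short sketch below copies the table rows and counts the sites at which each optimized solution has a lower unweighted RMSE. The variable names are chosen here for illustration.

```python
# Unweighted per-site RMSE values copied from the table:
# (site, d=5 trial T2, d=11 trial T1, default)
rows = [
    ("Alberta", 220.34, 203.82, 209.25),
    ("Florida", 1247.70, 1280.29, 1180.99),
    ("Michigan", 334.01, 337.51, 328.10),
    ("Minnesota", 41.05, 35.16, 34.31),
    ("Nanjing", 97.88, 96.14, 212.18),
    ("Vercelli", 325.34, 326.04, 293.36),
    ("Texas", 179.21, 139.09, 116.85),
    ("Japan", 132.31, 161.22, 184.88),
    ("California", 372.71, 374.59, 360.37),
    ("New Delhi", 18.67, 19.96, 14.21),
    ("Beijing", 66.79, 60.89, 56.99),
    ("Java", 49.09, 54.61, 221.52),
    ("Chengdu", 231.93, 241.91, 198.42),
    ("Cuttack", 72.01, 63.75, 364.75),
    ("Panama", 446.83, 464.59, 422.86),
    ("Salmisuo", 156.79, 132.16, 146.52),
]

# Sites where the optimized parameters beat the default parameters.
improved_d5 = [s for s, r5, r11, r0 in rows if r5 < r0]
improved_d11 = [s for s, r5, r11, r0 in rows if r11 < r0]
# Sites where the 11-parameter solution beats the 5-parameter solution.
d11_beats_d5 = [s for s, r5, r11, r0 in rows if r11 < r5]
```

The lists contain four, six, and seven sites, respectively, in agreement with the counts stated in the text.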

CH4 emission observations and predictions when using the optimized parameters of optimization trials T1, T2, and T3, respectively, and when using the default parameters for the wetland site Salmisuo, Finland, for the real data case with d=11 parameters. The legend shows the lowest RMSE value found in each trial.

Figures and show the observed CH4 emissions, the predictions with the default parameter values, and the predictions using the optimized parameter values for Alberta (Canada) and Central Java (Indonesia). For both sites we can see that the predictions with the optimized parameters have lower RMSEs than when using the default parameter values (note that the reported RMSEs in the legend are not weighted as done in Eq. ). For Central Java, for example, the optimized parameters greatly improved the model's predictions, but we can also see that the temporal variability in the predictions stays the same, although it is not as pronounced. We noticed this “temporal variability preserving” behavior for several sites such as Beijing, California, Cuttack, New Delhi, Florida, Japan, Michigan, Minnesota, Salmisuo, Texas, and Vercelli. Compared to the case where we optimized only five parameters, the solution for Alberta has improved, and in the d=11 case the RMSE values of all three trials are better than the default RMSE value. On the other hand, the solution for Central Java is worse for T1 in the d=11 case than in the d=5 case.

Scatterplot showing the mean values of the CH4 predictions using the default and optimized parameter values of trials T1, T2, and T3, respectively, vs. the mean values of the observations. The numbers in the legend show the best RMSE value corresponding to each trial. The numbers above/below the boxes indicate the observation site ID (1: Alberta, 2: Florida, 3: Michigan, 4: Minnesota, 5: Nanjing, 6: Vercelli, 7: Texas, 8: Japan, 9: California, 10: New Delhi, 11: Beijing, 12: Central Java, 13: Chengdu, 14: Cuttack, 15: Panama, 16: Salmisuo).

Average methane emissions (mg CH4 m-2 d-1) simulated by CLM4.5bgc for (a) default parameters, (b) differences between default parameters and 11-dimensional optimization trial T1, (c) differences between default parameters and optimization trial with unweighted sum of RMSE, and (d) differences between default parameters and optimization trial with zonally weighted sum of RMSE. Zonal means are shown on the right side of each spatial plot.

The temporal variability in the model's predictions does not necessarily follow the temporal variability in the observation data (see, for example, Fig. ). Note that in Fig. the temporal variability is the same for each of the three trials although the best solutions found in the three trials were very different (see Table ). Thus, it seems that the improvement of the model's predictions is restricted by an underlying model component that enforces the temporal variability. This is likely to be associated with structural errors either in the methane or in the carbon model. Notice that the methane emission is dependent on the temporal variability predicted in the carbon and land model, especially on the heterotrophic respiration rate, which could have the wrong magnitude or temporal evolution.

Figure shows a scatter plot of the mean values of the CH4 predictions using default and optimized parameter values vs. the mean values of the observed CH4 emissions. Ideally, if the simulated emissions agreed with the observations, all points would lie on the dashed line. Thus, the closer a point is to the dashed line, the better simulation and observation agree. The figure shows that with the optimized parameters, we obtain better or similar results for Beijing, Cuttack, Minnesota, Central Java, Nanjing, Japan, Salmisuo, Alberta, and Michigan. Although not all sites have been strictly improved by the optimization, the overall RMSE has been improved (indicated in the legend).

Figure also shows that with default parameters, CLM4.5bgc predicts less CH4 emissions than observed for both observation sites in the northern latitudes (Alberta, ID = 1, and Salmisuo, ID = 16), which is corrected by the optimization such that the mean emissions at these sites are closer to the dashed line. Thus, based on the observation data, CLM4.5bgc with default parameters does not predict enough emissions in the northern latitudes. On the other hand, CLM4.5bgc over-predicts the emissions for four locations, namely Cuttack (ID = 14), Central Java (ID = 12), Nanjing (ID = 5), and Japan (ID = 8), which are located in the tropical and/or subtropical zone. For those four locations, the predictions with the optimized parameters are in closer agreement with the observations. Hence, the observation data force the model predictions to increase in the northern latitudes and to decrease in the tropics. This can also be seen in Figs. and in the following section, where we ran the model globally and compared default and optimized model predictions for the individual zones (discussed below).

Global CH4 emission simulations

We ran CLM4.5bgc globally to obtain predictions for the CH4 emissions on a global scale and compared the predictions when using the default parameter values and the optimized parameter values from the 11-dimensional case. Figure shows spatial plots of the average methane emissions (mg CH4 m-2 d-1) and the zonal means (right-hand side of the plots) when using the default parameters (panel a), and the difference between the predictions when using the default and the optimized parameters for trial T1 (panel b). The figure shows that with the optimized parameters, the CH4 emission predictions in the northern regions are larger than for the default parameters. For the tropics, the predictions with the optimized parameters are lower than when using the default values.

Comparison of total methane emissions (Tg CH4 yr-1) between CLM4.5bgc and other models from natural wetlands. 1: , 2: , 3: , 4: , 5: , 6: , 7: , 8: , 9: CLM4Me, , 10: CLM4Me', , 11: this study, CLM4.5bgc with default parameters, 12: this study, CLM4.5bgc with d=11 optimized parameters of T1, 13: this study, CLM4.5bgc with d=11 optimized parameters of unweighted sum of RMSE, and 14: this study, CLM4.5bgc with d=11 optimized parameters of zonally weighted RMSE. Note that number 7 is a top-down approach and number 9 may include the rice paddy emissions. For number 8, no data were available for the tropics and the temperate zone.

Figure shows a comparison of the CH4 emission predictions from several different models (models 1–10). We can see that globally the predictions with the optimized parameters (model 12) were only slightly higher than with the default parameters (model 11). However, the predictions of CH4 emissions in the tropics are significantly lower than for the default model and the predictions are also lower in comparison to all other models (1–10). On the other hand, for the northern latitudes, CLM4.5bgc with optimized parameters predicts significantly more CH4 emissions than the default model and models 1–10 in the comparison. Hence, even though the global average of predicted emissions did not change much, the distribution of the predicted emissions between the tropical and the northern latitudes changed significantly.

As indicated in the previous section, the observation data drive the model to predict more CH4 emissions in northern latitudes and fewer emissions in the tropics. We investigated whether our weighting scheme in Eq. () may give too much influence to individual observation sites or zones. Thus, we did an additional optimization trial of the parameters in Table where we give each observation site the same weight wi=1, i=1,…,16 (“unweighted”). We also did a second additional optimization trial of the parameters in Table where we give each zone the same influence on the total RMSE in order to account for the location of the various observation sites (“zonally weighted”). Thus, each location in the temperate zone (12 sites in total) has the weight wi=1/36, and each location in the northern (2 sites) and tropical (2 sites) zone, respectively, has the weight wi=1/6.
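The zonal weights follow from giving each of the three zones a total weight of 1/3 and splitting it evenly among the sites of that zone. A minimal sketch (the function name and the zone labels are chosen here for illustration):

```python
from collections import Counter
from fractions import Fraction

def zonal_weights(zones):
    """Give each zone the same total weight, split evenly among its sites."""
    counts = Counter(zones)
    share = Fraction(1, len(counts))     # total weight assigned to each zone
    return [share / counts[z] for z in zones]

zones = ["temperate"] * 12 + ["northern"] * 2 + ["tropical"] * 2
w = zonal_weights(zones)
```

With 12 temperate sites the per-site weight is (1/3)/12 = 1/36, and with 2 sites in each of the other zones it is (1/3)/2 = 1/6, so the weights sum to 1.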

The spatial plots of the differences between the average methane emissions when using default and optimized parameters for the unweighted trial are shown in panel c of Fig. , and the spatial plots of the differences when using the zonally weighted objective function are shown in panel d of Fig. . The figures show that for both additional trials, the CH4 emissions in the northern latitudes are increased even further. Moreover, the bars for models 13 and 14 in Fig. show the total methane emissions of the unweighted and the zonally weighted trials, respectively. The zonally weighted trial increases the global emissions, which is caused by larger emission predictions in the temperate zone and the northern latitudes. In comparison to the default CLM4.5bgc predictions, the unweighted trial shows a decrease in the predicted emissions in the tropics and an increase in the predicted emissions in the northern latitudes. Thus, even though it has been suggested that CLM4.5bgc with default parameter settings over-predicts the CH4 emissions in high latitudes, the observation data indicate that the predictions should be increased even further.

Conclusions

In this paper we used a surrogate optimization approach for calibrating the parameters of the methane module of the Community Land Model (CLM4.5bgc). Given only relatively few measurements at 16 observation sites (wetlands and rice paddies) our goal was to explore the use of a surrogate optimization method to improve the model prediction capability in a computationally efficient way by minimizing the root mean squared error between the measurements and the model's predictions. We identified important methane-related parameters in CLM4.5bgc by doing a sensitivity analysis and we were thus able to reduce the problem dimension from 21 to 11. We then used a surrogate optimization approach for tuning the most important parameters in order to solve the problem. We investigated two cases, namely a problem with five of the most important parameters and a problem with all 11 parameters, respectively.

We first used pseudo data in order to assess how well the surrogate optimization performs and showed that we are able to closely match the pseudo observations. We were able to reduce the RMSE to less than a fifth within the first 150 function evaluations for both pseudo data cases. The objective function was shown to have multiple local minima, which indicates that the problem is probably not identifiable when 11 parameters are optimized. Although the RMSE was greatly reduced by the optimization for the 11 parameter pseudo data case, in some cases the optimization did not recover the parameter values that were used to generate the pseudo data. This is a problem with the model, not with the optimization method used. The multiple local minima detected in Table indicate that a global optimization method was needed. We used a surrogate global optimization method because the objective function is expensive to evaluate and has multiple local minima. The surrogate has been shown to reduce the number of objective function evaluations (e.g. climate model simulations) required to obtain accurate approximations of the global minimum, and so it is designed for computationally expensive models like climate modules.

By conducting the simulations globally and comparing the average predicted emissions with default and optimized parameters, we could show that the total global CH4 emissions did not change significantly.

However, the distribution of the predicted emissions between latitudes changed significantly. The observation data force the optimized model's CH4 emission predictions in the northern latitudes to increase and the predicted emissions in the tropics to decrease. In comparison to other models, CLM4.5bgc with both default and optimized parameters predicts significantly higher emissions in the northern latitudes and lower emissions in the tropics.

Model equations

The methane biogeochemical model used in this study is integrated in the Community Land Model version 4.5 (CLM4.5), which is the land component of the Community Earth System Model (CESM, ). As discussed in more detail in and , the model represents five primary processes relevant to methane emission predictions. These processes are methane production ($P$), oxidation ($R_{\mathrm{oxic}}$), ebullition ($E$), transport through wetland plant aerenchyma ($A$), and diffusion through soil ($F_{D_e}$), each described below. The methane gas and aqueous phase concentrations ($RC$) in each soil layer of each grid box are calculated at every time point using the following equation:
$$\frac{\partial (RC)}{\partial t} = \frac{\partial F_{D_e}}{\partial z} + P - E + A - R_{\mathrm{oxic}}.$$

In the following sections we consider each of these terms in more detail.

Methane production

Methane production ($P$) in the anaerobic portion of the soil column is related to the grid cell estimate of heterotrophic respiration from soil and litter, corrected for various factors:
$$P = R_H \times \mathrm{f\_ch4} \times \mathrm{q10ch4} \times f_{pH}\, f_{pE}\, S,$$
where $R_H$ is the heterotrophic respiration from soil and litter (mol C m$^{-2}$ s$^{-1}$), and f_ch4 is the baseline fraction of anaerobically mineralized C atoms becoming CH4 (i.e., CO2/CH4). $R_H$ is corrected for its soil temperature dependence through a Q10 factor (q10ch4), pH ($f_{pH}$), redox potential ($f_{pE}$), and a factor accounting for the seasonal inundation fraction ($S$).
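As a minimal illustration of how the correction factors combine, the following Python sketch multiplies the terms of the production equation. The function and argument names are hypothetical, and the Q10 term is assumed to be passed in already evaluated as a dimensionless factor; this is a sketch, not the CLM4.5bgc implementation.

```python
def methane_production(r_h, f_ch4, q10_factor, f_ph, f_pe, s):
    """Return P = R_H * f_ch4 * q10ch4 * f_pH * f_pE * S.

    r_h        : heterotrophic respiration (mol C m-2 s-1)
    f_ch4      : baseline fraction of mineralized C becoming CH4
    q10_factor : temperature correction, assumed already evaluated
    f_ph, f_pe : pH and redox potential factors (dimensionless)
    s          : seasonal inundation factor (dimensionless)
    """
    return r_h * f_ch4 * q10_factor * f_ph * f_pe * s


# Hypothetical input values, chosen only to show the call.
p = methane_production(2e-6, 0.2, 1.5, 0.9, 0.5, 1.0)
print(p)
```

Because every factor enters multiplicatively, changing any single factor by a given ratio changes the production rate by the same ratio.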

We adjusted the fractional inundation in each grid cell to account for a changing redox potential:
$$f_{pE} = \frac{f_i^{\mathrm{lag}}(t)}{f_i(t)},$$
where the redox potential factor $f_{pE}$ is computed based on the fractional inundation $f_i(t)$ and the adjusted fractional inundation $f_i^{\mathrm{lag}}(t)$ that is producing methane.

The adjusted fractional inundation $f_i^{\mathrm{lag}}(t)$ is computed as
$$f_i^{\mathrm{lag}}(t) = f_i(t) - f_{\mathrm{redox}}(t),$$
where
$$f_{\mathrm{redox}}(t) = f_i(t) - f_i(t-1) + f_{\mathrm{redox}}(t-1)\left(1 - \frac{\Delta t}{\mathrm{redoxlag}}\right)$$
is the fraction of the grid cell where alternative electron acceptors (such as O$_2$, NO$_3^-$, Fe$^{3+}$, SO$_4^{2-}$, etc.) are consumed (methane production is completely inhibited), $\Delta t$ is the time step, and redoxlag is the time constant parameter.
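The lag recursion can be illustrated with a short Python sketch. It assumes the update reads f_redox(t) = f_i(t) - f_i(t-1) + f_redox(t-1) (1 - Δt/redoxlag), with Δt and redoxlag both in days. The function name and the clipping step are assumptions introduced here for illustration and are not stated in the text.

```python
def update_redox_fraction(f_i_now, f_i_prev, f_redox_prev, dt_days, redoxlag_days):
    """Advance the inhibited fraction by one time step.

    Returns (f_redox, f_i_lag), where f_i_lag = f_i - f_redox is the
    inundated fraction that is producing methane.
    """
    decay = 1.0 - dt_days / redoxlag_days
    f_redox = (f_i_now - f_i_prev) + f_redox_prev * decay
    # Assumption (not in the text): keep the inhibited fraction between
    # zero and the currently inundated fraction.
    f_redox = min(max(f_redox, 0.0), f_i_now)
    return f_redox, f_i_now - f_redox


# Inundation jumps from 0.1 to 0.4 and then stays constant.
step1 = update_redox_fraction(0.4, 0.1, 0.0, dt_days=1.0, redoxlag_days=30.0)
step2 = update_redox_fraction(0.4, 0.4, step1[0], dt_days=1.0, redoxlag_days=30.0)
print(step1, step2)
```

In this example the newly inundated area is fully inhibited at first and then begins to produce methane as the inhibited fraction decays at the rate set by redoxlag. The water table depth recursion for the non-inundated fraction has the same form.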

In the non-inundated fraction of a grid cell, we estimated the delay in methane production as the water table depth increases by estimating an effective depth below which CH4 production can occur ($Z_i^{\mathrm{lag}}$):
$$Z_i^{\mathrm{lag}}(t) = Z_i(t) - Z_{\mathrm{redox}}(t),$$
where
$$Z_{\mathrm{redox}}(t) = Z_i(t) - Z_i(t-1) + Z_{\mathrm{redox}}(t-1)\left(1 - \frac{\Delta t}{\mathrm{redoxlag}}\right)$$
is the depth of the saturated water layer where alternative electron acceptors are consumed at time $t$, and $Z_i(t)$ is the actual water depth at time $t$.

Additionally, we constrained the methane production using the soil pH function $f_{pH}$, which is represented as
$$f_{pH} = 10^{-0.2335\,\mathrm{pH}^2 + 2.7727\,\mathrm{pH} - 8.6},$$
where pH represents the soil pH. $f_{pH}$ is bounded by two parameters, namely pHmin and pHmax (i.e., pHmin < pH < pHmax). The maximum methane production occurs at pH ≈ 6.2.
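The pH dependence is a power of ten with a quadratic exponent, so its maximum lies at the vertex of the quadratic. The following Python sketch evaluates the function and the vertex with the coefficients passed as arguments. The function names are hypothetical, and the pHmin and pHmax bounds are not modeled here.

```python
def f_ph(ph, a=-0.2335, b=2.7727, c=-8.6):
    """pH factor 10**(a*pH**2 + b*pH + c), coefficients as printed above."""
    return 10.0 ** (a * ph ** 2 + b * ph + c)


def ph_at_maximum(a=-0.2335, b=2.7727):
    """Vertex of the quadratic exponent, where the pH factor peaks."""
    return -b / (2.0 * a)


print(ph_at_maximum(), f_ph(ph_at_maximum()))
print(ph_at_maximum(a=-0.2235))
```

With the coefficients as printed, the vertex evaluates to pH ≈ 5.94. A quadratic coefficient of 0.2235 would move it to pH ≈ 6.20, the optimum quoted in the text, so the printed coefficient may be worth checking against the model source.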

We used a scaling factor ($S$) to mimic the impacts of seasonal inundation on methane production, which is represented as
$$S = \frac{\mathrm{mino2lim}\,(f - \bar{f}) + \bar{f}}{f}, \quad S \le 1,$$
where $f$ and $\bar{f}$ are the instantaneous inundation fraction and the annual average inundation fraction weighted by heterotrophic respiration, respectively, and mino2lim is the anoxia factor that relates the fully anoxic decomposition rate to the fully oxygen-unlimited decomposition rate.
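A small Python sketch of the scaling factor, assuming S = (mino2lim (f - f̄) + f̄) / f capped at 1. The function name is hypothetical; the default mino2lim = 0.2 is the default value quoted in the parameter table, and the guard against a zero inundation fraction is an assumption added for illustration.

```python
def seasonal_inundation_factor(f, f_bar, mino2lim=0.2):
    """Return S = min(1, (mino2lim * (f - f_bar) + f_bar) / f)."""
    if f <= 0.0:
        # Assumption: S is undefined without inundation; fail loudly.
        raise ValueError("instantaneous inundation fraction must be positive")
    return min(1.0, (mino2lim * (f - f_bar) + f_bar) / f)


# Inundation above the annual average: production is scaled down.
print(seasonal_inundation_factor(0.5, 0.25))
# Inundation below the annual average: the cap S <= 1 applies.
print(seasonal_inundation_factor(0.2, 0.25))
```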

Methane oxidation

Methane oxidation ($R_{\mathrm{oxic}}$) is represented with double Michaelis-Menten kinetics:
$$R_{\mathrm{oxic}} = \mathrm{vmax\_ch4\_oxid}\,\frac{C_{\mathrm{CH_4}}}{\mathrm{k\_m} + C_{\mathrm{CH_4}}}\,\frac{C_{\mathrm{O_2}}}{\mathrm{k\_m\_o2} + C_{\mathrm{O_2}}}\,\mathrm{q10\_ch4oxid} \times F_\vartheta,$$
where vmax_ch4_oxid is the maximum oxidation rate (mol m$^{-3}$ s$^{-1}$), q10_ch4oxid is the temperature dependence of the reaction, k_m and k_m_o2 are the half saturation coefficients with respect to CH4 and O2 concentrations (mol m$^{-3}$), $C_{\mathrm{CH_4}}$ and $C_{\mathrm{O_2}}$ are the methane and oxygen concentrations in the soil (mol m$^{-3}$), and $F_\vartheta$ is the soil moisture limitation factor for oxidation applied above the water table to represent water stress for methanotrophs.

$F_\vartheta$ is represented as
$$F_\vartheta = \exp\left(-\frac{P}{P_C}\right),$$
where $P$ and $P_C$ are the soil moisture potential and the optimum water potential ($-2.4\times10^{5}$ mm), respectively. If the soil layer is above the water table, the soil moisture limitation factor $F_\vartheta$ is applied. To account for high-CH4-affinity methanotrophs in upland soils, we used a lower oxidation rate constant (vmax_oxid_unsat) and half saturation coefficient with respect to CH4 concentrations (k_m_unsat).
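The oxidation term can be sketched in Python as the product of two Michaelis-Menten saturation terms, a temperature factor, and the moisture limitation factor, assuming the latter reads exp(-P/P_C). Function and argument names are hypothetical, the Q10 term is assumed to be passed in already evaluated, and the numerical inputs are illustrative only.

```python
import math


def moisture_limitation(psi_mm, psi_c_mm=-2.4e5):
    """F_theta = exp(-P / P_C); both potentials in mm (negative when dry)."""
    return math.exp(-psi_mm / psi_c_mm)


def methane_oxidation(c_ch4, c_o2, vmax, k_m, k_m_o2, q10_factor=1.0, f_theta=1.0):
    """Double Michaelis-Menten oxidation rate (mol m-3 s-1)."""
    ch4_term = c_ch4 / (k_m + c_ch4)   # saturation with respect to CH4
    o2_term = c_o2 / (k_m_o2 + c_o2)   # saturation with respect to O2
    return vmax * ch4_term * o2_term * q10_factor * f_theta


# At both half saturation concentrations the rate is a quarter of vmax.
rate = methane_oxidation(c_ch4=5e-3, c_o2=0.02, vmax=1.25e-5, k_m=5e-3, k_m_o2=0.02)
print(rate, moisture_limitation(-2.4e5))
```

For the upland case, the same function could be called with vmax_oxid_unsat and k_m_unsat in place of vmax and k_m, and with f_theta taken from moisture_limitation.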

Methane transport through plant aerenchyma

The diffusive transport through aerenchyma $A$ (mol m$^{-2}$ s$^{-1}$) from each soil layer is represented in the model as
$$A = \frac{C(z) - C_a}{r_a + \dfrac{\mathrm{rob} \times z}{D\,p\,T\,\rho_f}},$$
where $D$ is the free-air gas diffusion coefficient (m$^2$ s$^{-1}$), $C(z)$ and $C_a$ are the gaseous concentrations at depth $z$ and in the atmosphere (mol m$^{-3}$), $r_a$ is the aerodynamic resistance between the surface and the atmospheric reference height (s m$^{-1}$), rob is the ratio of root length to vertical depth (obliquity), $p$ is the porosity, $T$ is the specific aerenchyma area (m$^2$ m$^{-2}$), and $\rho_f$ is the root density as a function of depth. Oxygen can also diffuse into the soil layer from the atmosphere via the reverse of the CH4 pathway.
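The flux has the form of a concentration difference divided by two resistances in series. The Python sketch below assumes the equation reads A = (C(z) - C_a) / (r_a + rob z / (D p T ρ_f)); the function and argument names are hypothetical and the input values are chosen only for easy arithmetic.

```python
def aerenchyma_flux(c_z, c_a, z, d, p, t_area, rho_f, r_a, rob):
    """Diffusive flux through aerenchyma from a layer at depth z.

    The plant pathway resistance rob * z / (d * p * t_area * rho_f)
    acts in series with the aerodynamic resistance r_a.
    """
    plant_resistance = rob * z / (d * p * t_area * rho_f)
    return (c_z - c_a) / (r_a + plant_resistance)


# Illustrative (not physical) values: plant resistance 10, r_a 10.
flux = aerenchyma_flux(c_z=1.0, c_a=0.0, z=0.1, d=0.2, p=0.3,
                       t_area=0.5, rho_f=1.0, r_a=10.0, rob=3.0)
print(flux)
```

A negative result means that the gas moves from the atmosphere into the soil, which is the direction relevant for oxygen.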

Here, aerenchyma porosity is parameterized based on the plant functional types (PFTs). For upland vegetation, the aerenchyma porosity is scaled relative to inundated systems by a ratio: p = p × unsat_aere_ratio.

If the PFT is c3_arctic_grass, c3_nonarctic_grass, or c4_grass, then p=0.3. For the remaining PFTs, the porosity is multiplied by nongrassporosratio (ratio of root porosity in non-grass to grass): p=p×nongrassporosratio.

A minimum aerenchyma porosity (porosmin) is set to 0.05. Therefore, p is modified as: p = max{p, porosmin}.
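The porosity rules above can be combined into one function. The Python sketch below assumes that non-grass PFTs start from the grass value of 0.3 before nongrassporosratio is applied and that the upland ratio is applied before the lower bound; neither ordering is stated explicitly in the text. The function name and the non-grass PFT label are hypothetical.

```python
GRASS_PFTS = {"c3_arctic_grass", "c3_nonarctic_grass", "c4_grass"}


def aerenchyma_porosity(pft, inundated, nongrassporosratio, unsat_aere_ratio,
                        porosmin=0.05, grass_porosity=0.3):
    """Return the aerenchyma porosity p for one PFT."""
    p = grass_porosity
    if pft not in GRASS_PFTS:
        p *= nongrassporosratio      # non-grass to grass porosity ratio
    if not inundated:
        p *= unsat_aere_ratio        # upland relative to inundated systems
    return max(p, porosmin)          # enforce the minimum porosity


print(aerenchyma_porosity("c4_grass", True, 0.5, 0.2))
print(aerenchyma_porosity("hypothetical_tree_pft", True, 0.5, 0.2))
print(aerenchyma_porosity("c4_grass", False, 0.5, 0.1))
```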

The aerenchyma area varies over the course of the growing season. Therefore, it is parameterized using the simulated leaf area index as
$$T = \frac{f_N N_a L}{0.22}\,\pi R^2,$$
where $L$ is the leaf area index (m$^2$ m$^{-2}$) (taken from the CLM4.5 model simulation), $N_a$ is the maximum annual net primary production (NPP, mol m$^{-2}$ s$^{-1}$), $R$ is the aerenchyma radius ($2.9\times10^{-3}$ m), and $f_N$ is the below-ground fraction of the current NPP.

The aerenchyma area T is multiplied by a scale factor to adjust it: T=T×scale_factor_aere. The default value is 1.
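A minimal Python sketch of the aerenchyma area, assuming T = (f_N N_a L / 0.22) π R² followed by the scale factor. The function and argument names are hypothetical, and the inputs in the example are chosen only so that the prefactor equals one.

```python
import math


def aerenchyma_area(f_n, n_a, lai, radius=2.9e-3, scale_factor_aere=1.0):
    """Specific aerenchyma area T (m2 m-2), including the scale factor."""
    area = f_n * n_a * lai / 0.22 * math.pi * radius ** 2
    return area * scale_factor_aere


# f_n * n_a * lai / 0.22 = 1, so T reduces to pi * R**2.
print(aerenchyma_area(f_n=0.5, n_a=0.44, lai=1.0))
```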

Methane ebullition

The representation of the ebullition fluxes in the methane model is based on . The simulated aqueous CH4 concentration in each soil level is used to estimate the expected equilibrium gaseous partial pressure as a function of temperature and pressure. When this partial pressure exceeds vgc_max, bubbling occurs to remove CH4 to below this value, modified by the fraction of CH4 in the bubbles (taken as 57 %). The vgc_max parameter is the ratio of saturation pressure triggering ebullition.

Aqueous and gaseous diffusion

Gaseous diffusivity in the soil depends on several factors such as molecular diffusivity, soil structure, porosity, and organic matter content. The relationship between effective diffusivity ($D_e$, m$^2$ s$^{-1}$) and soil properties is represented as
$$D_e = D_0\,\theta_a^2\left(\frac{\theta_a}{\theta_s}\right)^{3/b} \times \mathrm{scale\_factor\_gasdiff},$$
where $\theta_a$ and $\theta_s$ are the air-filled and saturated water-filled porosity, $b$ is the slope of the water retention curve, and scale_factor_gasdiff is the scale factor for the gas diffusion (the default value is 1).
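The diffusivity relation can be sketched in Python as follows, assuming it reads D_e = D_0 θ_a² (θ_a/θ_s)^(3/b) times the scale factor. The function and argument names are hypothetical, D_0 is taken to be a reference diffusivity supplied by the caller, and the input values are illustrative.

```python
def effective_diffusivity(d0, theta_a, theta_s, b, scale_factor_gasdiff=1.0):
    """Effective gas diffusivity D_e (m2 s-1) in a soil layer."""
    relative_air_content = theta_a / theta_s
    d_e = d0 * theta_a ** 2 * relative_air_content ** (3.0 / b)
    return d_e * scale_factor_gasdiff


# With theta_a equal to theta_s the power term is 1 and D_e = D_0 * theta_a**2.
print(effective_diffusivity(d0=2e-5, theta_a=0.5, theta_s=0.5, b=5.0))
# Less air-filled pore space lowers the diffusivity.
print(effective_diffusivity(d0=2e-5, theta_a=0.1, theta_s=0.5, b=5.0))
```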

Observation sites

Tables and show the information about the wetland and rice paddy observation sites, respectively, where methane emissions have been measured.

Wetland site data. P= precipitation, T= temperature.

| Site name | Location | Time | Wetland type | Dominant vegetation | Mean P & T | Soil and climate characteristics | Meas. techniques | Forcing data sets∗ | References |
|---|---|---|---|---|---|---|---|---|---|
| Michigan, USA | 42.45° N, 84.00° W | 1991–1993 | Ombrotrophic peatland | Sphagnum, Scheuchzeria palustris, Vaccinium oxycoccos | P: 761 mm (1948–1980) | Soil pH: 4.2 | Static chamber | Measured WT positions | |
| Minnesota, USA | 47.53° N, 266.33° E | 1991–1992 | Poorly minerotrophic to ombrotrophic peatland | Sphagnum, Chamaedaphne calyculata, Scheuchzeria palustris | P: 553 mm, T: ≈ 13.6 °C for May–October period | Soil pH: 4.6 | Eddy correlation technique | Measured WT positions | |
| Alberta, Canada | 54.60° N, 246.60° E | 1994–1996 | Nutrient rich fen | Carex aquatilis and Carex rostrata | – | Soil pH: 7; the freeze-thaw cycle spans from May to Oct | Open chamber | Measured WT positions | |
| Salmisuo, Eastern Finland | 62.75° N, 30.93° E | 1993 | Minerogenic, oligotrophic pine fen | Sphagnum papillosum | T: ≈ 10 °C | Wet condition from Jul to Sept | Static chamber | Measured WT positions | |
| Florida, USA | 30.07° N, 275.80° E | 1993 | Swamp | Sagittaria lancifolia | Annual P: ≈ 1400 mm | Soil pH: 6.2 | Open chamber | Measured WT positions | |
| Panama | 9.00° N, 80.00° E | 1987 | Swamp | Palms | – | Soil pH: 6.2; Feb to May is the dry season | Static chamber | Modeled WT positions | |

∗ All sites use NCEP atmospheric forcing; P is precipitation; T is temperature; WT is water table.

Rice paddy site data.

| Site name | Location | Year | Date of field flooding | Date of final drainage | pH | Measurement techniques | Soil type | References |
|---|---|---|---|---|---|---|---|---|
| Texas, USA | 29.95° N, 265.50° E | 1994 | 17 May | 11 Aug | N/A | Chamber | Bernard-Morey | |
| Vercelli, Italy | 45.30° N, 8.42° E | 1991 | 7 May | 30 Aug | 6 | Static (closed) chamber | Sandy loam | |
| Chengdu, China | 31.27° N, 105.45° E | 2003 | 9 May | 7 Sep | 8.1 | Chamber | Purplish | |
| Nanjing, China | 32.80° N, 118.75° E | 1999 | 18 Jun | 13 Oct | N/A | Chamber | Hydromorphic | |
| Beijing, China | 40.55° N, 116.78° E | 1995 | 4 Jun | 17 Oct | 7.99 | Automatic chamber | Silty clay loam | |
| California, USA | 40.20° N, 237.98° E | 1982 & 1983 | 11 May (1982); 21 May (1983) | 2 Oct (1982); 1 Oct (1983) | N/A | Static chamber | Capay silty clay | |
| Japan | 36.02° N, 140.22° E | 1991 & 1993 | 7 May | 12 Aug (1991); 2 Sept (1993) | 6.6–6.9 | Automatic chamber | Gley soil (Sandy clay loam) | |
| New Delhi, India | 28.63° N, 77.12° E | 1995 | 1 Jul | 1 Nov | 8.2 | Closed chamber, manual | Ustochrept (sandy loam) | |
| Cuttack, India | 20.42° N, 85.92° E | 1996 | 19 Jul | 30 Oct | 6.19 | Automatic chamber | Haplaquept (Alluvial) | |
| Central Java, Indonesia | 6.78° S, 110.15° E | 2001–2002 | 1 Nov | 28 Feb | 5.1 | Automatic closed chamber | Aeric Tropaquept (Silty loam) | |

Parameters and references for bounds

Table shows the CH4 related parameters in CLM4.5bgc and their literature reference information.

Parameter names, descriptions, ranges, and literature references.

| Number | Parameter | Description | Units | Range | References |
|---|---|---|---|---|---|
| 1 | q10ch4 | Q10 for methane production | unitless | 1–10 | |
| 2 | f_ch4 | Ratio of CH4 production to total C mineralization | unitless | 0.05–0.5 | Effective value will depend on temperature, redox and pH but cannot exceed 50 % based on stoichiometry (Bill Riley, personal communication) |
| 3 | redoxlag | Number of days to lag for production | days | 15–45 | |
| 4 | oxinhib | Inhibition of methane production by oxygen | m$^3$ mol$^{-1}$ | 200–600 | |
| 5 | pHmax | Maximum pH for methane production | unitless | 8–10 | |
| 6 | pHmin | Minimum pH for methane production | unitless | 2–4 | |
| 7 | vmax_ch4_oxid | Oxidation rate constant | mol m$^{-3}$-w s$^{-1}$ | $1.25\times10^{-6}$–$1.25\times10^{-4}$ | |
| 8 | k_m | Michaelis-Menten oxidation rate constant for CH4 conc. | mol m$^{-3}$-w | $5\times10^{-4}$–$5\times10^{-2}$ | |
| 9 | k_m_o2 | Michaelis-Menten oxidation rate constant for O2 conc. | mol m$^{-3}$-w | 0.002–0.2 | |
| 10 | q10_ch4oxid | Q10 oxidation constant | unitless | 1–4 | |
| 11 | k_m_unsat | Michaelis-Menten oxidation rate constant for CH4 conc. in upland areas | mol m$^{-3}$-w | $5\times10^{-5}$–$5\times10^{-3}$ | |
| 12 | vmax_oxid_unsat | Oxidation rate constant in upland areas | mol m$^{-3}$-w s$^{-1}$ | $1.25\times10^{-7}$–$1.25\times10^{-5}$ | |
| 13 | scale_factor_aere | Scale factor on the aerenchyma area | unitless | 0.2–5 | |
| 14 | nongrassporosratio | Ratio of root porosity in non-grass to grass | unitless | 0.2–0.5 | |
| 15 | porosmin | Minimum aerenchyma porosity | unitless | 0.01–0.2 | |
| 16 | rob | Ratio of root length to vertical depth (“root obliquity”) | unitless | 2–4 | This parameter is poorly constrained. |
| 17 | unsat_aere_ratio | Ratio to multiply upland vegetation aerenchyma porosity by compared to inundated systems | unitless | 0.1–0.25 | Not available in literature. The reasonable range could be between 0.1 and 0.25. used this range for sensitivity. |
| 18 | vgc_max | Ratio of saturation pressure triggering ebullition | unitless | 0.05–0.3 | |
| 19 | scale_factor_gasdiff | Scale factor for gas diffusion | unitless | 1–5 | Range not available. Reasonable range is 1–5 for sensitivity analyses. |
| 20 | atmch4 | Atm. CH4 mixing ratio to prescribe | mol mol$^{-1}$ | $1.7\times10^{-7}$–$1.7\times10^{-5}$ | Range not available. Variable range; global average is ≈ $1.7\times10^{-6}$ |
| 21 | mino2lim | Min. anaerobic decomposition rate as a fraction of potential aerobic rate | unitless | 0.05–0.45 | Range not available in the literature. The default value (0.2) is from . The reasonable range could be between 0.05 and 0.45 to adjust effect of anoxia on decomposition rate (used to calculate seasonal inundation factor). The range is considered based on knowledge. |

Weights used for RMSE computation in Eq. (1) of the paper

Table contains information about the weights used for each observation site when computing the objective function value.

ID, name of observation sites, and associated weights for real data and pseudo data case (Eq. 1 of the main document).

| ID | Location | $w_i$ real data | $w_i$ pseudo data |
|---|---|---|---|
| 1 | Alberta | 0.0327 | 0.0656 |
| 2 | Florida | 0.0078 | 0.0067 |
| 3 | Michigan | 0.0280 | 0.1599 |
| 4 | Minnesota | 0.0938 | 0.0783 |
| 5 | Nanjing | 0.0566 | 0.0149 |
| 6 | Vercelli | 0.0198 | 0.0382 |
| 7 | Texas | 0.0267 | 0.0189 |
| 8 | Japan | 0.0441 | 0.0153 |
| 9 | California | 0.0421 | 0.0684 |
| 10 | New Delhi | 0.2787 | 0.1707 |
| 11 | Beijing | 0.1053 | 0.1189 |
| 12 | Central Java | 0.0810 | 0.0143 |
| 13 | Chengdu | 0.0283 | 0.0571 |
| 14 | Cuttack | 0.0968 | 0.0104 |
| 15 | Panama Swamp | 0.0177 | 0.0795 |
| 16 | Salmisuo | 0.0405 | 0.0827 |
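The tabulated weights can be checked and applied with a few lines of Python. Eq. (1) is not reproduced in this appendix, so the sketch assumes, purely for illustration, that the objective is a weighted sum of per-site RMSE values. The dictionary, function name, and the RMSE inputs in the example are hypothetical; only the weights are taken from the table.

```python
# Site weights as (real data, pseudo data), copied from the table above.
WEIGHTS = {
    "Alberta": (0.0327, 0.0656), "Florida": (0.0078, 0.0067),
    "Michigan": (0.0280, 0.1599), "Minnesota": (0.0938, 0.0783),
    "Nanjing": (0.0566, 0.0149), "Vercelli": (0.0198, 0.0382),
    "Texas": (0.0267, 0.0189), "Japan": (0.0441, 0.0153),
    "California": (0.0421, 0.0684), "New Delhi": (0.2787, 0.1707),
    "Beijing": (0.1053, 0.1189), "Central Java": (0.0810, 0.0143),
    "Chengdu": (0.0283, 0.0571), "Cuttack": (0.0968, 0.0104),
    "Panama Swamp": (0.0177, 0.0795), "Salmisuo": (0.0405, 0.0827),
}


def weighted_rmse(site_rmse, pseudo=False):
    """Weighted sum of per-site RMSE values (assumed form of the objective)."""
    column = 1 if pseudo else 0
    return sum(WEIGHTS[site][column] * rmse for site, rmse in site_rmse.items())


# Both weight columns sum to one up to rounding in the fourth decimal.
print(sum(w[0] for w in WEIGHTS.values()), sum(w[1] for w in WEIGHTS.values()))
# Hypothetical per-site RMSE values, only to show the call.
print(weighted_rmse({"Alberta": 10.0, "Florida": 20.0}))
```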

Acknowledgements

The authors want to acknowledge the funding sources DOE SciDAC DE-SC0006791, NSF 1049031, NSF 1049033, and NSF CISE 1116298. The first author also wants to acknowledge partial support by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under contract number DE-AC02005CH11231. We thank the anonymous reviewers for their helpful comments and improvement suggestions.

Edited by: A. Sandu

References

Adhya, T., Bharati, K., Mohanty, S., Ramakrishnan, B., Rao, V., Sethunathan, N., and Wassmann, R.: Methane emission from rice fields at Cuttack, India, Nutr. Cycl. Agroecosys., 58, 95–105, 2000.

Aleman, D., Romeijn, H., and Dempsey, J.: A response surface approach to beam orientation optimization in intensity modulated radiation therapy treatment planning, INFORMS J. Comput., 21, 62–76, 2009.

Arah, J. and Stephen, K.: A model of the processes leading to methane emission from peatland, Atmos. Environ., 32, 3257–3264, 1998.

Aselmann, I. and Crutzen, P.: Global distribution of natural fresh-water wetlands and rice paddies, their net primary productivity, seasonality and possible methane emissions, J. Atmos. Chem., 8, 307–358, 1989.

Baird, A., Beckwith, C., Waldron, S., and Waddington, J.: Ebullition of methane-containing gas bubbles from near surface Sphagnum peat, Geophys. Res. Lett., 31, L21505, 10.1029/2004GL021157, 2004.

Bartlett, K. and Harriss, R.: Review and assessment of methane emissions from wetlands, Chemosphere, 26, 261–320, 1993.

Bartlett, K., Crill, P., Bonassi, J., Richey, J., and Harriss, R.: Methane flux from the Amazon River floodplain: Emissions during rising water, J. Geophys. Res., 95, 16773–16788, 1990.

Bender, M. and Conrad, R.: Kinetics of CH4 oxidation in oxic soils exposed to ambient air or high CH4 mixing ratios, FEMS Microbiol. Ecol., 101, 261–270, 1992.

Bloom, A., Palmer, P., Fraser, A., Reay, D., and Frankenberg, C.: Large-Scale Controls of Methanogenesis Inferred from Methane and Gravity Spaceborne Data, Science, 327, 322–325, 2010.

Booker, A., Dennis Jr., J., Frank, P., Serafini, D., Torczon, V., and Trosset, M.: A rigorous framework for optimization of expensive functions by surrogates, Struct. Multidiscip. O., 17, 1–13, 1999.

Bousquet, P., Ciais, P., Miller, J., Dlugokencky, E., Hauglustaine, D., Prigent, C., Van der Werf, G., Peylin, P., Brunke, E., Carouge, C., Langenfelds, R., Lathiere, J., Papa, F., Ramonet, M., Schmidt, M., Steele, L., Tyler, S., and White, J.: Contribution of anthropogenic and natural sources to atmospheric methane variability, Nature, 443, 439–443, 2006.

Butterbach-Bahl, K., Papen, H., and Rennenberg, H.: Impact of gas transport through rice cultivars on methane emission from rice paddy fields, Plant Cell Environ., 20, 1175–1183, 1997.

Cao, M., Marshall, S., and Gregson, K.: Global carbon exchange and methane emissions from natural wetlands: Application of a process-based model, J. Geophys. Res., 101, 14399–14414, 1996.

Cheng, W., Yagi, K., Akiyama, H., Nishimura, S., Sudo, S., Fumoto, T., Hasegawa, T., Hartley, A., and Megonigal, J.: An empirical model of soil chemical properties that regulate methane production in Japanese rice paddy soils, J. Environ. Qual., 36, 1920–1925, 2007.

Ciais, P., Gasser, T., Paris, J., Caldeira, K., Raupach, M., Canadell, J., Patwardhan, A., Friedlingstein, P., Piao, S., and Gitz, V.: Attributing the increase in atmospheric CO2 to emitters and absorbers, Nature Clim. Change, 3, 926–930, 2013.

Cicerone, R., Shetter, J., and Delwiche, C.: Seasonal-variation of methane flux from a California rice paddy, J. Geophys. Res.-Oceans, 88, 1022–1024, 1983.

Cicerone, R., Delwiche, C., Tyler, S., and Zimmerman, P.: Methane emissions from California rice paddies with varied treatments, Global Biogeochem. Cy., 6, 233–248, 1992.

Colmer, T.: Long-distance transport of gases in plants: a perspective on internal aeration and radial oxygen loss from roots, Plant Cell Environ., 26, 17–36, 2003.

Computational and Information Systems Laboratory: Yellowstone: IBM iDataPlex System (Wyoming-NCAR Alliance), Boulder, CO, USA: National Center for Atmospheric Research. http://n2t.net/ark:/85065/d7wd3xhc (last access: 15 October 2015), 2012.

Conrad, R.: Control of microbial methane production in wetland rice fields, Nutr. Cycl. Agroecosys., 64, 59–69, 2002.

Cronk, J. and Fennessy, M.: Wetland Plants: Biology and Ecology, Lewis Publishers, Boca Raton, FL., USA, 2001.

Davis, E. and Ierapetritou, M.: Kriging based method for the solution of mixed-integer nonlinear programs containing black-box functions, J. Global Optim., 43, 191–205, 2009.

Dlugokencky, E., Nisbet, E., Fisher, R., and Lowry, D.: Global atmospheric methane: budget, changes and dangers, Phil. T. R. Soc. A, 369, 2058–2072, 2011.

Dunfield, P., Knowles, R., Dumont, R., and Moore, T.: Methane production and consumption in temperate and subarctic peat soils: response to temperature and pH, Soil Biol. Biochem., 25, 321–326, 1993.

Forrester, A., Sóbester, A., and Keane, A.: Engineering Design via Surrogate Modelling – A Practical Guide, John Wiley & Sons Ltd, Chichester, UK, 2008.

Friedman, J.: Multivariate Adaptive Regression Splines, The Annals of Statistics, 19, 1–141, 1991.

Giunta, A., Balabanov, V., Haim, D., Grossman, B., Mason, W., Watson, L., and Haftka, R.: Aircraft multidisciplinary design optimisation using design of experiments theory and response surface modelling, Aeronaut. J., 101, 347–356, 1997.

Goel, T., Haftka, R. T., Shyy, W., and Queipo, N. V.: Ensemble of Surrogates, Struct. Multidiscip. O., 33, 199–216, 2007.

Grunfeld, S. and Brix, H.: Methanogenesis and methane emissions: effects of water table, substrate type and presence of Phragmites australis, Aquat. Bot., 64, 63–75, 1999.

Gutmann, H.: A Radial Basis Function Method for Global Optimization, J. Global Optim., 19, 201–227, 2001.

Han, X., Hendricks Franssen, H.-J., Montzka, C., and Vereecken, H.: Soil moisture and soil properties estimation in the Community Land Model with synthetic brightness temperature observations, Water Resources Research, 50, 6081–6105, 2014.

Huang, Y., Jaing, J., Zong, L., Sass, R., and Fisher, F.: Comparison of field measurements of CH4 emission from rice cultivation in Nanjing, China and in Texas, USA, Adv. Atmos. Sci., 18, 1121–1130, 2001.

Hurrell, J., Holland, M., Gent, P., Ghan, S., Kay, J., Kushner, P., Lamarque, J.-F., Large, W., Lawrence, D., Lindsay, K., Lipscomb, W., Long, M., Mahowald, N., Marsh, D., Neale, R., Rasch, P., Vavrus, S., Vertenstein, M., Bader, D., Collins, W., Hack, J., Kiehl, J., and Marshall, S.: The Community Earth System Model: A Framework for Collaborative Research, B. Am. Meteorol. Soc., 94, 1339–1360, 2013.

Jain, M., Kumar, S., Wassmann, R., Mitra, S., Singh, S., Singh, J., Singh, R., Yadav, A., and Gupta, S.: Methane emissions from irrigated rice fields in northern India (New Delhi), Nutr. Cycl. Agroecosys., 58, 75–83, 2000.

Jiang, C., Wang, Y., Zheng, X., Zhu, B., Huang, Y., and Hao, Q.: Methane and nitrous oxide emissions from three paddy rice based cultivation systems in southwest China, Adv. Atmos. Sci., 23, 415–424, 2006.

Jones, D., Schonlau, M., and Welch, W.: Efficient Global Optimization of Expensive Black-Box Functions, J. Global Optim., 13, 455–492, 1998.

Keller, M. M.: Biological Sources and Sinks of Methane in Tropical Habitats and Tropical Atmospheric Chemistry, PhD thesis, Princeton University, Princeton, USA, 1990.

Kellner, E., Baird, A., Oosterwoud, M., Harrison, K., and Waddington, J.: Effect of temperature and atmospheric pressure on methane (CH4) ebullition from near surface peats, Geophys. Res. Lett., 33, L18405, 10.1029/2006GL027509, 2006.

Knoblauch, C.: Bodenkundlich-mikrobiologische Bestandsaufnahme zur Methanoxidation in einer Flussmarsch der Tide-Elbe, Master's thesis, University of Hamburg, Hamburg, Germany, 1994.

Koven, C. D., Riley, W. J., Subin, Z. M., Tang, J. Y., Torn, M. S., Collins, W. D., Bonan, G. B., Lawrence, D. M., and Swenson, S. C.: The effect of vertically resolved soil biogeochemistry and alternate soil C and N models on C dynamics of CLM4, Biogeosciences, 10, 7109–7131, 10.5194/bg-10-7109-2013, 2013.

Lo, M.-H., Famiglietti, J., Yeh, P.-F., and Syed, T.: Improving parameter estimation and water table depth simulation in a land surface model using GRACE water storage and estimated base flow data, Water Resour. Res., 46, W05517, 10.1029/2009WR007855, 2010.

Lombardi, J., Epp, M., and Chanton, J.: Investigation of the methyl fluoride technique for determining rhizospheric methane oxidation, Biogeochemistry, 36, 153–172, 1997.

Matthews, E. and Fung, I.: Methane emission from natural wetlands: global distribution, area, and environmental characteristics of sources, Global Biogeochem. Cy., 1, 61–86, 1987.

Meng, L., Hess, P. G. M., Mahowald, N. M., Yavitt, J. B., Riley, W. J., Subin, Z. M., Lawrence, D. M., Swenson, S. C., Jauhiainen, J., and Fuka, D. R.: Sensitivity of wetland methane emissions to model assumptions: application and model testing against site observations, Biogeosciences, 9, 2793–2819, 10.5194/bg-9-2793-2012, 2012.

Moore, D., Hub, J., Sacks, W. J., Schimel, D., and Monson, R.: Estimating transpiration and the sensitivity of carbon uptake to water availability in a subalpine forest using a simple ecosystem process model informed by measured net CO2 and H2O fluxes, Agr. Forest Meteorol., 148, 1467–1477, 2008.

Mugunthan, P., Shoemaker, C., and Regis, R.: Comparison of function approximation, heuristic, and derivative-based methods for automatic calibration of computationally expensive groundwater bioremediation models, Water Resour. Res., 41, W11427, 10.1029/2005WR004134, 2005.

Müller, J. and Piché, R.: Mixture Surrogate Models Based on Dempster-Shafer Theory for Global Optimization Problems, J. Global Optim., 51, 79–104, 2011.

Müller, J. and Shoemaker, C.: Influence of ensemble surrogate models and sampling strategy on the solution quality of algorithms for computationally expensive black-box global optimization problems, J. Global Optim., 60, 123–144, 10.1007/s10898-014-0184-0, 2014.

Müller, J., Shoemaker, C., and Piché, R.: SO-MI: A Surrogate Model Algorithm for Computationally Expensive Nonlinear Mixed-Integer Black-Box Global Optimization Problems, Comput. Oper. Res., 40, 1383–1400, 2013.

Myers, R. and Montgomery, D.: Response Surface Methodology, Process and Product Optimization using Designed Experiments, Wiley-Interscience Publication, New Jersey, USA, 1995.

Myhre, G., Shindell, D., Bréon, F.-M., Collins, W., Fuglestvedt, J., Huang, J., Koch, D., Lamarque, J.-F., Lee, D., Mendoza, B., Nakajima, T., Robock, A., Stephens, G., Takemura, T., and Zhang, H.: Anthropogenic and Natural Radiative Forcing, in: Climate Change 2013: The Physical Science Basis. Contribution of Working Group I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change, Cambridge University Press, Cambridge, UK and New York, NY, USA, 2013.

Oleson, K., Lawrence, D., Bonan, G., Drewniak, B., Huang, M., Koven, C., Levis, S., Li, F., Riley, W., Subin, Z., Swenson, S., Thornton, P., Bozbiyik, A., Fisher, R., Kluzek, E., Lamarque, J.-F., Lawrence, P., Leung, L., Lipscomb, W., Muszala, S., Ricciuto, D., Sacks, W., Sun, Y., Tang, J., and Yang, Z.-L.: Technical Description of Version 4.5 of the Community Land Model (CLM), Tech. Rep. NCAR/TN-503+STR, National Center for Atmospheric Research, Boulder, CO, USA, 10.5065/D6RR1W7M, 2013.

Popp, T. J., Chanton, J. P., Whiting, G. J., and Grant, N.: Evaluation of Methane Oxidation in the Rhizosphere of Carex Dominated Fen in North Central Alberta, Canada, Biogeochemistry, 51, 259–281, 2000.

Powell, M.: The Theory of Radial Basis Function Approximation in 1990, Advances in Numerical Analysis, vol. 2: wavelets, subdivision algorithms and radial basis functions, Oxford University Press, Oxford, UK, 105–210, 1992.

Prigent, C., Papa, F., Aires, F., Rossow, W., and Matthews, E.: Global inundation dynamics inferred from multiple satellite observations, 1993–2000, J. Geophys. Res., 112, D12107, 10.1029/2006JD007847, 2007.

Prihodko, L., Denning, A., Hanan, N., Baker, I., and Davis, K.: Sensitivity, uncertainty and time dependence of parameters in a complex land surface model, Agr. Forest Meteorol., 148, 268–287, 2008.

Qian, T., Dai, A., Trenberth, K., and Oleson, K.: Simulation of global land surface conditions from 1948 to 2004. Part I: Forcing data and evaluations, J. Hydrometeorol., 7, 953–975, 2006.

Ray, J. and Swiler, L.: Bayesian calibration of the Community Land Model using surrogates, Tech. Rep. SAND2014-0867, Sandia National Laboratories, Livermore, CA, USA, 2014.

Regis, R.: Stochastic radial basis function algorithms for large-scale optimization involving expensive black-box objective and constraint functions, Comput. Oper. Res., 38, 837–853, 2011.

Regis, R. and Shoemaker, C.: A Stochastic Radial Basis Function Method for the Global Optimization of Expensive Functions, INFORMS J. Comput., 19, 497–509, 2007.

Regis, R. and Shoemaker, C.: Parallel Stochastic Global Optimization Using Radial Basis Functions, INFORMS J. Comput., 21, 411–426, 2009.

Regis, R. and Shoemaker, C.: Combining radial basis function surrogates and dynamic coordinate search in high-dimensional expensive black-box optimization, Eng. Optimiz., 45, 529–555, 2013.

Riley, W. J., Subin, Z. M., Lawrence, D. M., Swenson, S. C., Torn, M. S., Meng, L., Mahowald, N. M., and Hess, P.: Barriers to predicting changes in global terrestrial methane fluxes: analyses using CLM4Me, a methane biogeochemistry model integrated in CESM, Biogeosciences, 8, 1925–1953, 10.5194/bg-8-1925-2011, 2011.

Ringeval et al.(2010)Ringeval, de Noblet-Ducoudre, Ciais, Bousquet, Prigent, Papa, and Rossow

Ringeval, B., de Noblet-Ducoudre, N., Ciais, P., Bousquet, P., Prigent, C., Papa, F., and Rossow, W.: An attempt to quantify the impact of changes in wetland extent on methane emissions on the seasonal and interannual time scales, Global Biogeochem. Cy., 24, GB2003, 10.1029/2008gb003354, 2010.

Saarnio et al.(1997)Saarnio, Alm, Silvola, Lohila, Nykänen, and Martikainen

Saarnio, S., Alm, J., Silvola, J., Lohila, A., Nykänen, H., and Martikainen, P.: Seasonal Variation in CH4 Emissions and Production and Oxidation Potentials at Microsites on an Oligotrophic Pine Fen, Oecologia, 110, 414–422, 1997.

Schuh et al.(2010)Schuh, Denning, Corbin, Baker, Uliasz, Parazoo, Andrews, and Worthy

Schuh, A. E., Denning, A. S., Corbin, K. D., Baker, I. T., Uliasz, M., Parazoo, N., Andrews, A. E., and Worthy, D. E. J.: A regional high-resolution carbon flux inversion of North America for 2004, Biogeosciences, 7, 1625–1644, 10.5194/bg-7-1625-2010, 2010.

Segers(1998)

Segers, R.: Methane production and methane consumption: a review of processes underlying wetland methane fluxes, Biogeochemistry, 41, 23–51, 1998.

Segers and Kengen(1998)

Segers, R. and Kengen, S.: Methane production as a function of anaerobic carbon mineralization: A process model, Soil Biol. Biochem., 30, 1107–1117, 1998.

Setyanto et al.(2004)Setyanto, Rosenani, Boer, Fauziah, and Khanif

Setyanto, P., Rosenani, A., Boer, R., Fauziah, C., and Khanif, M.: The effect of rice cultivars on methane emission from irrigated rice field, Indonesian Journal of Agricultural Sciences, 5, 20–31, 2004.

Shannon and White(1994)

Shannon, R. D. and White, J. R.: 3-Year Study of Controls on Methane Emissions from 2 Michigan Peatlands, Biogeochemistry, 27, 35–60, 1994.

Shurpali and Verma(1998)

Shurpali, N. J. and Verma, S. B.: Micrometeorological measurements of methane flux in a Minnesota peatland during two growing seasons, Biogeochemistry, 40, 1–15, 1998.

Sigren et al.(1997)Sigren, Lewis, Fisher, and Sass

Sigren, L., Lewis, S., Fisher, F., and Sass, R. L.: Effects of field drainage on soil parameters related to methane production and emission from rice paddies, Global Biogeochem. Cy., 11, 151–162, 1997.

Simpson et al.(2001)Simpson, Mauery, Korte, and Mistree

Simpson, T., Mauery, T., Korte, J., and Mistree, F.: Kriging metamodels for global approximation in simulation-based multidisciplinary design optimization, AIAA J., 39, 2233–2241, 2001.

Solonen et al.(2012)Solonen, Ollinaho, Laine, Haario, Tamminen, and Järvinen

Solonen, A., Ollinaho, P., Laine, M., Haario, H., Tamminen, J., and Järvinen, H.: Efficient MCMC for climate model parameter estimation: Parallel adaptive chains and early rejection, Bayesian Analysis, 7, 715–736, 2012.

Subin et al.(2012)Subin, Riley, and Mironov

Subin, Z., Riley, W., and Mironov, D.: Improved lake model for climate simulations, J. Adv. Model. Earth Syst., 4, M02001, 10.1029/2011MS000072, 2012.

Sun et al.(2013)Sun, Hou, Huang, Tian, and Leung

Sun, Y., Hou, Z., Huang, M., Tian, F., and Leung, L. R.: Inverse modeling of hydrologic parameters using surface flux and runoff observations in the Community Land Model, Hydrol. Earth Syst. Sci., 17, 4995–5011, 10.5194/hess-17-4995-2013, 2013.

Swenson and Lawrence(2012)

Swenson, S. and Lawrence, D.: A New Fractional Snow Covered Area Parameterization for the Community Land Model and its Effect on the Surface Energy Balance, J. Geophys. Res., 117, D21107, 10.1029/2012JD018178, 2012.

Swenson et al.(2012)Swenson, Lawrence, and Lee

Swenson, S., Lawrence, D., and Lee, H.: Improved Simulation of the Terrestrial Hydrological Cycle in Permafrost Regions by the Community Land Model, J. Adv. Model. Earth Syst., 4, M08002, 10.1029/2012MS000165, 2012.

Thornton et al.(2007)Thornton, Lamarque, Rosenbloom, and Mahowald

Thornton, P., Lamarque, J., Rosenbloom, N., and Mahowald, N.: Influence of carbon-nitrogen cycle coupling on land model response to CO2 fertilization and climate variability, Global Biogeochem. Cy., 21, GB4018, 10.1029/2006GB002868, 2007.

Thornton et al.(2009)Thornton, Doney, Lindsay, Moore, Mahowald, Randerson, Fung, Lamarque, Feddema, and Lee

Thornton, P. E., Doney, S. C., Lindsay, K., Moore, J. K., Mahowald, N., Randerson, J. T., Fung, I., Lamarque, J.-F., Feddema, J. J., and Lee, Y.-H.: Carbon-nitrogen interactions regulate climate-carbon cycle feedbacks: results from an atmosphere-ocean general circulation model, Biogeosciences Discuss., 6, 3303–3354, 10.5194/bgd-6-3303-2009, 2009.

Tian et al.(2008)Tian, Xie, and Dai

Tian, X., Xie, Z., and Dai, A.: A land surface soil moisture data assimilation system based on the dual-UKF method and the Community Land Model, J. Geophys. Res.-Atmos., 113, D14127, 10.1029/2007JD009650, 2008.

Turner et al.(2009)Turner, Ritts, Wharton, Thomas, Monson, Black, and Falk

Turner, D., Ritts, W., Wharton, S., Thomas, C., Monson, R., Black, T., and Falk, M.: Assessing FPAR source and parameter optimization scheme in application of a diagnostic carbon flux model, Remote Sens. Environ., 113, 1529–1539, 2009.

Viana et al.(2009)Viana, Haftka, and Steffen Jr.

Viana, F., Haftka, R., and Steffen Jr., V.: Multiple surrogates: how cross-validation errors can help us to obtain the best predictor, Struct. Multidiscip. O., 39, 439–457, 2009.

Walter and Heimann(2000)

Walter, B. and Heimann, M.: A process-based, climate-sensitive model to derive methane emissions from natural wetlands: Application to five wetland sites, sensitivity to model parameters, and climate, Global Biogeochem. Cy., 14, 745–765, 2000.

Walter et al.(2001)Walter, Heimann, and Matthews

Walter, B., Heimann, M., and Matthews, E.: Modeling modern methane emissions from natural wetlands 1. Model description and results, J. Geophys. Res.-Atmos., 106, 34189–34206, 2001.

Wang et al.(2000)Wang, Xu, Li, Guo, Wassmann, Neue, Lantin, Buendia, Ding, and Wang

Wang, Z., Xu, Y., Li, Z., Guo, Y., Wassmann, R., Neue, H., Lantin, R., Buendia, L., Ding, Y., and Wang, Z.: A four-year record of methane emissions from irrigated rice fields in the Beijing region of China, Nutr. Cycl. Agroecosys., 58, 55–63, 2000.

Wania et al.(2010)Wania, Ross, and Prentice

Wania, R., Ross, I., and Prentice, I. C.: Implementation and evaluation of a new methane model within a dynamic global vegetation model: LPJ-WHyMe v1.3.1, Geosci. Model Dev., 3, 565–584, 10.5194/gmd-3-565-2010, 2010.

Whalen and Reeburgh(1996)

Whalen, S. and Reeburgh, W.: Moisture and temperature sensitivity of CH4 oxidation in boreal soils, Soil Biol. Biochem., 28, 1271–1281, 1996.

Wild and Shoemaker(2013)

Wild, S. and Shoemaker, C.: Global convergence of radial basis function trust-region algorithms for derivative-free optimization, SIAM Review, 55, 349–371, 2013.

Yagi et al.(1996)Yagi, Tsuruta, Kanda, and Minami

Yagi, K., Tsuruta, H., Kanda, K., and Minami, K.: Effect of water management on methane emission from a Japanese rice paddy field: Automated methane monitoring, Global Biogeochem. Cy., 10, 255–267, 1996.

Yang et al.(2012)Yang, Qian, Lin, Leung, and Zhang

Yang, B., Qian, Y., Lin, G., Leung, R., and Zhang, Y.: Some issues in uncertainty quantification and parameter tuning: a case study of convective parameterization scheme in the WRF regional climate model, Atmos. Chem. Phys., 12, 2409–2427, 10.5194/acp-12-2409-2012, 2012.

Yang et al.(2013)Yang, Qian, Lin, Leung, Rasch, Zhang, McFarlane, Zhao, Zhang, Wang, Wang, and Liu

Yang, B., Qian, Y., Lin, G., Leung, L., Rasch, P., Zhang, G., McFarlane, S., Zhao, C., Zhang, Y., Wang, H., Wang, M., and Liu, X.: Uncertainty quantification and parameter tuning in the CAM5 Zhang-McFarlane convection scheme and impact of improved convection on the global circulation and climate, J. Geophys. Res.-Atmos., 118, 395–415, 2013.

Zeng et al.(2013)Zeng, Drewniak, and Constantinescu

Zeng, X., Drewniak, B. A., and Constantinescu, E. M.: Calibration of the Crop model in the Community Land Model, Geosci. Model Dev. Discuss., 6, 379–398, 10.5194/gmdd-6-379-2013, 2013.

Zhang et al.(2002)Zhang, Li, Trettin, Li, and Sun

Zhang, Y., Li, C., Trettin, C., Li, H., and Sun, G.: An integrated model of soil, hydrology, and vegetation for carbon dynamics in wetland ecosystems, Global Biogeochem. Cy., 16, 1–17, 2002.

Zhu et al.(2013)Zhu, Liu, Peng, Chen, Fang, Jiang, Yang, Zhu, Wang, and Zhou

Zhu, Q., Liu, J., Peng, C., Chen, H., Fang, X., Jiang, H., Yang, G., Zhu, D., Wang, W., and Zhou, X.: Modelling methane emissions from natural wetlands: TRIPLEX-GHG model integration, sensitivity analysis, and calibration, Geosci. Model Dev. Discuss., 6, 5423–5473, 10.5194/gmdd-6-5423-2013, 2013.

Zhuang et al.(2004)Zhuang, Melillo, Kicklighter, Prinn, McGuire, Steudler, Felzer, and Hu

Zhuang, Q., Melillo, J., Kicklighter, D., Prinn, R., McGuire, A., Steudler, P., Felzer, B., and Hu, S.: Methane fluxes between terrestrial ecosystems and the atmosphere at northern high latitudes during the past century: A retrospective analysis with a process-based biogeochemistry model, Global Biogeochem. Cy., 18, GB3010, 10.1029/2004GB002239, 2004.

</app></app-group></back> </article>