the Creative Commons Attribution 4.0 License.
the Creative Commons Attribution 4.0 License.
Optimising CH_{4} simulations from the LPJGUESS model v4.1 using an adaptive Markov chain Monte Carlo algorithm
Jalisha T. Kallingal
Johan Lindström
Paul A. Miller
Janne Rinne
Maarit Raivonen
Marko Scholze
The processes responsible for methane (CH_{4}) emissions from boreal wetlands are complex; hence, their model representation is complicated by a large number of parameters and parameter uncertainties. The arcticenabled dynamic global vegetation model LPJGUESS (Lund–Potsdam–Jena General Ecosystem Simulator) is one such model that allows quantification and understanding of the natural wetland CH_{4} fluxes at various scales, ranging from local to regional and global, but with several uncertainties. The model contains detailed descriptions of the CH_{4} production, oxidation, and transport controlled by several process parameters.
Complexities in the underlying environmental processes, warmingdriven alternative paths of meteorological phenomena, and changes in hydrological and vegetation conditions highlight the need for a calibrated and optimised version of LPJGUESS. In this study, we formulated the parameter calibration as a Bayesian problem, using knowledge of reasonable parameters values as priors. We then used an adaptive Metropolis–Hastings (MH)based Markov chain Monte Carlo (MCMC) algorithm to improve predictions of CH_{4} emission by LPJGUESS and to quantify uncertainties. Application of this method on uncertain parameters allows for a greater search of their posterior distribution, leading to a more complete characterisation of the posterior distribution with a reduced risk of the sample impoverishment that can occur when using other optimisation methods. For assimilation, the analysis used flux measurement data gathered during the period from 2005 to 2014 from the Siikaneva wetlands in Southern Finland with an estimation of measurement uncertainties. The data are used to constrain the processes behind the CH_{4} dynamics, and the posterior covariance structures are used to explain how the parameters and the processes are related. To further support the conclusions, the CH_{4} flux and the other component fluxes associated with the flux are examined.
The results demonstrate the robustness of MCMC methods to quantitatively assess the interrelationship between objective function choices, parameter identifiability, and data support. The experiment using real observations from Siikaneva resulted in a reduction in the rootmeansquare error (RMSE), from 0.044 to 0.023 gC m^{−2} d^{−1}, and a 93.89 % reduction in the cost function value. As a part of this work, knowledge about how CH_{4} data can constrain the parameters and processes is derived. Although the optimisation is performed based on a single site's flux data from Siikaneva, the algorithm is useful for largerscale multisite studies for a more robust calibration of LPJGUESS and similar models, and the results can highlight where model improvements are needed.
 Article
(3987 KB)  Fulltext XML

Supplement
(2583 KB)  BibTeX
 EndNote
Methane (CH_{4}) is the second most important longlived greenhouse gas after carbon dioxide (CO_{2}) (Ciais et al., 2013; Kirschke et al., 2013). It has been reported that the global atmospheric CH_{4} concentration has been growing since the preindustrial period. In 2021, it reached a value of 1908 ppb (parts per billion), nearly 2.62 times greater than its estimated value in 1750 (Dlugokencky, 2021). This increase in the atmospheric concentration of CH_{4} is responsible for around 16.5 % of the total effective radiative forcing (in W m^{−2}) of the wellmixed greenhouse gases (IPCC AR6: Forster et al., 2021).
Among the biogenic sources, wetlands contribute around 19 %–33 % of current global terrestrial CH_{4} emissions and are the largest and the most uncertain (Kirschke et al., 2013; Saunois et al., 2020; Bousquet et al., 2006). Wetlands occupy around 3.8 % of the Earth's land surface and are mainly located in highlatitude regions. There is approximately 455 Pg of carbon stored in boreal and subarctic wetland peat/Histosol soils. Under longterm anaerobic soil situations, this carbon will be metabolised by the anaerobic microorganisms (called methanogens) and will eventually be emitted as CH_{4} back to the atmosphere (Aurela et al., 2009).
In the future, climate change may cause a positive feedback on CH_{4} emissions from wetlands due to a warmer and wetter climate (Johansson et al., 2006; Bridgham et al., 2008). According to Zhang et al. (2017), at the end of the 21st century, 38 %–56 % of the CH_{4} production from the wetlands will be climate change induced. Increased uncertainty in CH_{4} emission from boreal wetlands (Christensen et al., 2007) is also predicted, partly due to expected spatiotemporal changes in wetland extent (Saunois et al., 2016). Considering the fragility of boreal wetlands in a changing environment (Jacob et al., 2007), one way to quantify their carbon budget is to model their carbon dynamics, including CH_{4} emission. Realistic and optimised processbased vegetation models can be used to reach a more precise estimation of emission variability and trends. However, representation of the complex biogeochemical processes, including soil carbon turnover, vegetation dynamics, hydrology, soil thermal dynamics, and defining wetland boundaries are complex; therefore, estimating the contribution from multiple pathways for CH_{4} production, consumption, and release complicates wetlands CH_{4} modelling (Ahti et al., 1968; Wania et al., 2010, 2013; Susiluoto et al., 2018).
The Lund–Potsdam–Jena General Ecosystem Simulator (LPJGUESS) (Smith et al., 2014) is one of a few available processbased dynamic global vegetation models (DGVMs) that simulate local to global vegetation dynamics and soil biogeochemistry (Smith, 2001; Sitch et al., 2003). Using information about the climate and the concentration of CO_{2} in the atmosphere, it predicts the structural, compositional, and functional properties of the native ecosystems of major climate zones of the Earth. Considering the complexity of LPJGUESS, with its large number of uncertain process parameters, the model requires a mathematically robust framework for parameter optimisation (Wramneby et al., 2008). Data assimilation using Bayesian statistics can be seen as a way of combining observations with prior information (i.e. model process formulation and prior model parameter values) to derive posterior parameter and emission estimates (Susiluoto et al., 2018; Ghil and MalanotteRizzoli, 1991; Dee, 2005; Carrassi et al., 2018). The Markov chain Monte Carlo (MCMC) method (Metropolis et al., 1953b) is a powerful and convenient Bayesian framework (Tarantola, 1987) for data assimilation, as it can combine prior information with observations to sample from the posterior distributions in complex models. This study has developed an adaptive MCMC Metropolis–Hasting (AMCMCMH) framework (Hastings, 1970b; Tarantola, 1987) with “RaoBlackwellised” adaptation (transformation of estimators using the Rao–Blackwell theorem) of the multivariate Gaussian random walk proposals (Andrieu and Thoms, 2008). The algorithm minimises the model–data misfit (i.e. a cost function) by sampling from the probability density function (PDF) of the posterior parameters. The adaptation allows the algorithm to learn the shape of the posterior, thereby improving sampling efficiency. The main objective of this paper is to evaluate the capabilities and limitations of the AMCMCMH framework to optimise CH_{4} wetland emissions simulated by the LPJGUESS model by analysing the posterior parameter distributions, the parameter correlations, and the processes they control.
2.1 Siikaneva wetland and measurements
The Siikaneva wetland is located at 61°49^{′} N, 24°12^{′} E, and 160 m a.s.l., and it is the second largest undrained wetland complex in Southern Finland (Ahti et al., 1968; Rinne et al., 2007). This boreal wetland complex has an area of 12 km^{2}, including minerotrophic and ombrotrophic sites with over 6 m of peat deposition under the surface (Mathijssen et al., 2016; Aurela et al., 2007; Rinne et al., 2007). The estimated average annual total precipitation is about 707 mm. The average temperatures for January and July are approximately −7.2 and 17.1 °C, respectively. The estimated mean annual temperature is around 4.2 °C (Korrensalo et al., 2018). Total annual CH_{4} emissions from the Siikaneva wetland vary between 6.0 and 14 gC m^{−2}, while net CO_{2} fluxes vary between −96 and 27 gC m^{−2} (Rinne et al., 2018).
The CH_{4} flux data were obtained from https://doi.org/10.23729/bcc98726ead845d4ac391e4b1bf5e243 (Alekseychik et al., 2019a). Daily measurements of incoming shortwave radiation (swr), precipitation, and air temperature collected at the wetland were used as input to the model. As the meteorological data measured directly at the Siikaneva wetland have several significant gaps, which made them unsuitable as input for the model, we used precipitation and temperature data collected at a nearby station called Juupajoki – Hyytiälä (https://en.ilmatieteenlaitos.fi/downloadobservations, last access: 5 February 2024), located approximately 5.5 km from Siikaneva. The swr data collected at the Hyytiälä weather station, situated around 6 km from Siikaneva, were obtained from https://doi.org/10.23729/23dd00b2b9d7467a9ceeb4a122486039 (Aalto et al., 2022). Given the short distances between these sites and Siikaneva, we assumed that the meteorological variables were representative of Siikaneva. To verify this assumption, we analysed the available data from Siikaneva and the datasets collected from the Juupajoki and Hyytiälä sites. The air temperature and precipitation from the Juupajoki and Siikaneva sites showed Pearson correlation values of 0.998 and 0.706, respectively. The swr data collected at Hyytiälä and Siikaneva showed a correlation of 0.98. However, there were some minor gaps in the swr data collected at Hyytiälä that were gapfilled using the available data collected at Siikaneva, which were obtained from https://doi.org/10.23729/371cd3e426ae41c996d769acccc206f7 (Alekseychik et al., 2019b) for the corresponding periods.
Additional inputs to the model are the atmospheric CO_{2} concentration, as described by McGuire et al. (2001) and updated until recent years using data from https://gml.noaa.gov/ccgg/trends (last access: 5 February 2024). Daily water table depth (wtd) and soil temperature at a 5 cm depth collected from the Siikaneva site are obtained from https://doi.org/10.23729/371cd3e426ae41c996d769acccc206f7 (Alekseychik et al., 2019b) and are used to evaluate the modelled values.
2.2 CH_{4} model description in LPJGUESS
Compared with version 4 of LPJGUESS described by Smith et al. (2014), version 4.1, which we used for this study, has more detailed representations of plant functional type (PFT) characteristics and processes in wetlands. This includes improved descriptions of peatlandspecific PFTs, peatland hydrology, soil temperature estimation, and CH_{4} emissions. Brief descriptions of the important wetland processes in LPJGUESS version 4.1 are given below and in Sect. S1 in the Supplement. For a more detailed description, the reader is referred to Gustafson (2022).
The active wetland peat in the LPJGUESS is represented by a 1.5 m deep column that is further divided into 15 layers of 0.1 m thickness each (see Fig. 1). The uppermost three layers comprise the acrotelm, within which the water table can vary. The underlying 12 layers of catotelm are permanently saturated with water (Wania et al., 2009a). The decomposed organic carbon on each day (explained in Sect. S1.4) is distributed vertically in different peat soil layers in proportion to an assumed static root distribution (see Eq. 1). This carbon pool is considered to be the “potential carbon pool” for methanogenic archaea and is the basic concept behind the CH_{4} model in LPJGUESS. The total available carbon is decomposed into two components – CO_{2} and CH_{4} – depending on the availability of O_{2} in the soil. The dissolved CH_{4} concentration and the gaseous CH_{4} fraction are calculated based on the estimated CH_{4} content in each layer. A portion of the estimated CH_{4} is oxidised by soil O_{2}, whereas the remainder is transported to the atmosphere by diffusion, ebullition, or plantmediated transport. Apart from being the key factor in estimating the potential carbon pool, root biomass in each soil layer also plays a role in the transport of O_{2} and CH_{4} into and out of each layer, as this transport is mediated by the plants. From different studies of various wetland PFTs, Wania et al. (2010) observed an exponential decrease in the root biomass with depth that was proportional to the degree of anoxia, which is expressed by the following equation, also used in LPJGUESS:
where f_{root} is the fraction of root biomass at a certain depth z, λ_{root}=0.2517 m is the decay length, and C_{root}=0.025 is a normalisation constant. This distribution ensures that approximately 60 % of the roots are distributed within the acrotelm, and the root fraction in the lowest soil layer is adjusted to achieve a total root distribution of 1 across all 15 soil layers.
2.2.1 CH_{4} production
Due to its wide range, the ${\mathrm{CH}}_{\mathrm{4}}/{\mathrm{CO}}_{\mathrm{2}}$ ratio from decomposition is challenging to predict. For example, Segers (1998) observed high variation in the CH_{4}toCO_{2}production molar ratio of between 0.001 and 1.7 under anaerobic conditions. Hence, it is defined in the model as an adjustable parameter weighted by the degree of anoxia (α): $\mathit{\alpha}=\mathrm{1}({F}_{\mathrm{air}}+{f}_{\mathrm{air}})$, where F_{air} is the fraction of air in the soil layers and f_{air} is the fraction of air in peat (Wania et al., 2009a) (see Sect. S1 for details).
The production of CH_{4} on each day in each layer is determined as follows:
where α(z) is the degree of anoxia at depth z, f_{root}(z) is the fraction of roots in the peat at depth z, ${\mathrm{CH}}_{\mathrm{4}}/{\mathrm{CO}}_{\mathrm{2}}=\mathrm{0.085}$ (prior value in the model) is the tuning parameter for the CH_{4}toCO_{2}production ratio, and R_{h} is the daily heterotrophic respiration. Note that the model is set to ${{\mathrm{CH}}_{\mathrm{4}}}_{\mathrm{prod}}=\mathrm{0}$ when F_{water}<0.1, assuring no CH_{4} production in frozen and/or dry soils (i.e. the model assumes there is no water when the water is frozen and, hence, F_{water} is 0).
2.2.2 CH_{4} oxidation
As mentioned above, the CH_{4} fraction that is oxidised (represented by the parameter f_{oxid}=0.5 as the prior value in the model) depends on the availability of O_{2} in the soil. A part of the O_{2} transported to the soil will be consumed by the plant roots and nonmethanotrophic microorganisms. The remaining part is then used to oxidise CH_{4}. The oxidised CH_{4} is added to the CO_{2} pool, and the remainder stays in the CH_{4} pool.
2.2.3 Total CH_{4} flux
Diffusion, ebullition, and plantmediated transport are the three pathways via which CH_{4} is transported to the atmosphere. The total CH_{4} flux from highlatitude wetland patches in the model is represented as follows:
where ${{\mathrm{CH}}_{\mathrm{4}}}_{\mathrm{diff}}$ is the CH_{4} flux component from diffusion, ${{\mathrm{CH}}_{\mathrm{4}}}_{\mathrm{plant}}$ is the CH_{4} flux component from plantmediated transport, and ${{\mathrm{CH}}_{\mathrm{4}}}_{\mathrm{ebul}}$ is the CH_{4} flux component from ebullition. As the daily CH_{4} production in each layer is dependent on R_{h} (Eq. 2), ${F}_{{\mathrm{CH}}_{\mathrm{4}}}$ is subtracted from R_{h} before saving the daily heterotrophic respiration. Any CO_{2} generated, whether from heterotrophic respiration or CH_{4} oxidation, is released into the atmosphere.
Diffusion
The fractions of CH_{4}, CO_{2}, and O_{2} that are transported to the atmosphere and from the atmosphere through diffusion are calculated by solving the gas diffusion equation within the peat layers using a Crank–Nicolson numerical scheme with a time step of 15 min. The molecular diffusivities of these gases in soil depend on temperature, soil porosity, and the water and air content of the soil. Diffusivity in water is derived by fitting a quadratic curve to observed diffusivities at different temperatures, as described in Broecker and Peng (1974); diffusivity in the air and its temperature dependency are derived from the values taken from Lerman (1979); and diffusivity in soil and its temperature dependency are estimated from the Millington–Quirk model, as described in Millington and Quirk (1961). A detailed description can be found in Wania et al. (2010).
At the water–air surface, the gas diffusivities change by a minimum of 4 orders of magnitude; hence, at the water–air boundary, the flux is calculated by the following equation:
where C_{surf} is the surface water gas concentration; C_{eq} is the concentration of gas in equilibrium with the atmospheric partial pressure, estimated using Henry’s law; and ψ is the gas exchange coefficient, also called the piston velocity, which is usually difficult to estimate for different gases. In this case, the piston velocities of CH_{4}, CO_{2}, and O_{2} are calculated by relating them to the known piston velocity of SF6 via the following equation:
where ${\mathit{\psi}}_{\mathrm{600}}=\mathrm{2.07}+\mathrm{0.215}\times {U}_{\mathrm{10}}^{\mathrm{1.7}}$ is the piston velocity of SF6 normalised to a Schmidt number of 600 (subjected to the wind speed U_{10} at 10 m from the ground, which is considered to be 0 in the model), Sc* represents the Schmidt number of the gas under consideration, and $n=\mathrm{1}/\mathrm{2}$ (see Wania et al., 2010, for details).
As mentioned above the diffusion through the soil is affected by soil porosity, i.e. by the value of F_{air}(t,z). When F_{air}≤0.05 in a given soil layer, the diffusivity values of water are used; otherwise (F_{air}>0.05), the diffusivity values of air, which are 4 orders of magnitude larger than those of water, are used. At each daily time step, before diffusion is calculated, the gas flux J at the boundary is used to update the dissolved gas content. The surface concentration C_{surf} of CH_{4} will mostly be greater than C_{eq}; hence, J will be negative, denoting a flux to the atmosphere, although it is possible for CH_{4} to diffuse into the soil in small amounts if the concentrations at the surface are suitable. The resulting daily flux of CH_{4} is determined as the total ${{\mathrm{CH}}_{\mathrm{4}}}_{\mathrm{diff}}$.
Ebullition
Ebullition depends on the solubility of CH_{4} at a given temperature and pressure and occurs when the water table reaches the surface during periods of high CH_{4} emission. Following Wania et al. (2010), in LPJGUESS, the bestfitted curve is represented as follows:
where S_{B} is the Bunsen solubility coefficient (i.e. the volume of gas dissolved per volume of liquid at atmospheric pressure and a given temperature).
The CH_{4} in each layer is converted to a maximum allowable dissolved mass, and this limit is used to separate the CH_{4} in the form of dissolved and gaseous components. If there is any CH_{4} that exceeds the maximum solubility of a layer, it will be released into the atmosphere. The CH_{4}_{ebul} is calculated by adding the ebullition fluxes from all layers.
Plantmediated transport
Plantmediated transport of CH_{4} occurs via the aerenchyma (the gasfilled tissues) of vascular plants, either through concentration gradient or active pumping from the soil to the atmosphere. Only the passive mechanism (through a concentration gradient) is considered in the model, as it is the most dominant one (Cronk and Fennessy, 2016). Abundance, biomass, phenology, and the rooting depth of aerenchymatous plants are considered to calculate this. Only the floodtolerant C_{3} graminoids are considered for plantmediated gas transport in the model (see Table S2 in the Supplement); hence, plantmediated transport of O_{2} and CH_{4} can only occur when C_{3} graminoids are present in a simulated patch.
The transport depends on the crosssectional area of plant tillers (all of the secondary shoots produced by grasses like Poaceae or Gramineae) in each soil layer, assuming that a significantly high percentage of CH_{4} is oxidised in the highly oxic zone near the roots, where methanotrophs flourish, before they enter into the plant tissue.
The mass of the tiller is calculated as follows:
where b_{graminoid} is the leaf biomass of graminoids, and P represents the daily phenology, which is the fraction of potential leaf cover that has reached full development. To calculate the number of tillers (n_{tiller}), the total weight of tillers (m_{tiller}) is divided by the average weight of an individual tiller (w_{tiller}). The crosssectional area of tillers (A_{tiller}) can then be obtained as follows:
where r_{tiller} is the tiller radius and ϕ_{tiller} is the tiller porosity. Based on the optimisation of McGuire et al. (2012), Tang et al. (2015), and Zhang et al. (2013), the value of r_{tiller} is estimated as 0.0035 m, and based on Wania et al. (2010), the values of ϕ_{tiller} and W_{tiller} are estimated as 70 % and 0.22 gC per tiller, respectively. Each soil layer is allocated a fraction of the total crosssectional area of tillers based on the root fraction estimated in that layer. The CH_{4}_{plant} is estimated by adding the plantmediated CH_{4} fluxes from all layers.
2.3 Parameters selected for optimisation
Parameter values related to the processes of CH_{4} emission in LPJGUESS are mostly adopted from the parameter values described in Wania et al. (2010). As Wania et al. (2010) had difficulties finding the optimal parameter values for many of the parameters, they performed some preliminary analyses for seven uncertain parameters for which there were little or no data available. They performed a simple initial sensitivity test by taking four sets of values for each of the seven parameters, followed by a parameter fitting exercise with three sets of values for each of the seven parameters. They ran the model with all 2187 different combinations for 7 sites for 1 year. As a result, they got an RMSE range of between 226.4 and 18.3 (mg CH_{4} m^{−2} d^{−1}) for the different sites, which clearly indicates loosely fitted parameters with a high degree of uncertainty.
In this study, parameters for the optimisation are selected based on their sensitivity to the model output (CH_{4}) and expert opinion. We used a simple method to calculate the percentage difference in output (single simulation) when varying only one input parameter at a time from its permitted minimum value to its maximum (Hoffman and Miller, 1983; Bauer and Hamby, 1991). The “sensitivity index” (SI) is calculated using the following equation:
where D_{min} and D_{max} represent the model output values corresponding to the respective minima and maxima of the corresponding parameter range. The values are taken based on expert opinion.
We considered 5 of the 7 parameters that Wania et al. (2010) tested in their sensitivity analysis (2 parameters related to the root exudate decomposition are not used in LPJGUESS) as well as 11 other parameters used in LPJGUESS. Based on their high SI values, we chose 11 of them for the optimisation (Fig. 2, Table 1).
Among the five eliminated parameters, ag_{frac} is the fraction of annual net primary production (ANPP) used to calculate number of tillers, DT_{min} is the minimum temperature (°C) for heterotrophic decomposition, por_{org} is the porosity of organic material, C_{con} is the carbon content of biomass, and U_{10} is the possible constant value of wind speed at 10 m height. Among the selected parameters related to the CH_{4} production, Rmoist and Rmoist_{an} are the response of the soil moisture content to the soil organic matter decomposition in the acrotelm and catotelm, respectively (see Eq. 8 in Supplement); ${\mathrm{CH}}_{\mathrm{4}}/{\mathrm{CO}}_{\mathrm{2}}$ is the CH_{4}toCO_{2} ratio under the anaerobic conditions (Eq. 2); f_{air} is the fraction of air in peat (Sect. 2.2.1 and Eq. 2); por_{acro} and por_{cato} are the porosity in the acrotelm and catotelm, respectively (see Sect. S1); and λ_{root} is the decay length of root biomass in peat (Eq. 1). Among the selected parameters related to the CH_{4} transportation, f_{oxid} is the fraction of oxidised CH_{4} (Sect. 2.2.2) related to the CH_{4} oxidation, w_{tiller} is the average weight of an individual tiller, r_{tiller} is the tiller radius, and ϕ_{tiller} is the tiller porosity (Eq. 8).
2.4 Parameter optimisation framework
We assumed Gaussian PDFs to depict both the prior distributions of the parameters and the deviation between model and observations. The resulting model can be formulated as follows:
Here, Y represents the observations; M(x) is the LPJGUESS output given parameters x; x_{p} represents the prior values of the parameters; and R and B are error covariance matrices describing the uncertainty in observations and priors, respectively.
The prior uncertainties (B) are based on expert opinion and were kept relatively large to reduce the prior's influence on the posterior parameter estimates. We have assumed prior variance for each parameters as 40 % of their expected range (see Table 1). The parameters are also assumed to be a prior uncorrelated, due to the lack of good and consistent expert opinions regarding dependence.
2.4.1 Cost function
Using the Bayesian framework, the posterior for the parameters becomes
which, on a log scale, results in the following quadratic cost function (Tarantola, 1987):
where “const.” represents normalising constants not depending on the unknown parameters. The two terms in J(x) represent model–data misfit and the prior information on the parameters. A number of experiments aim to achieve the smallest cost function values to locate the optimal parameter set within the parameter space.
2.5 Adaptive Metropolis–Hastings (MH) algorithm
To search for the optimal posterior parameters, we used an MCMCMH algorithm (Metropolis et al., 1953a; Hastings, 1970a). A standard MH algorithm generates samples from a target distribution by, in each iteration, drawing from a proposal distribution and then either accepting the new state or copying the old state. For a Gaussian random walk, proposals ($\widehat{x}$) are generated by adding a meanzero normal random variable to the current value (x_{t}):
where λΣ is a scaling and covariance matrix describing the spread of the added random variable. The new value is accepted with a probability that compares the likelihood (or cost function) of the proposed and old sample:
If the value is accepted, set ${x}_{t+\mathrm{1}}=\widehat{x}$, otherwise keep the previous value, ${x}_{t+\mathrm{1}}={x}_{t}$. The resulting sequence of states will represent dependent samples from the target distribution.
The adaptive MH algorithm used here contains three key concepts that are explained in the following sections: transformed proposals, providing a natural way of including limits on the parameters; the adaptive random walk, allowing the algorithm to estimate λ and Σ from the target distribution; and tempering of the target distribution, to reduce the effects of local maxima and allow better exploration of the target.
2.5.1 Transformed proposals
The standard proposal in Eq. (13) does not include any restrictions on the parameters. To handle parameter limits, we transformed the parameters, resulting in an adjusted random walk proposal:
A list of possible limits and corresponding functions are given in Table 2. Note that different functions can be applied to each parameter in x.
Having a transformed proposal requires an adjustment of the acceptance probability (Hastings, 1970a) in Eq. (14) to
Here, ${x}_{t}^{\left(i\right)}$ denotes the ith parameter in the x_{t} vector, and the q terms are given in Table 2. Note that each transformation results in one adjustment for that parameter and that all adjustments have to be multiplied together.
2.5.2 Adaptive random walk
It has been shown that the optimal MH algorithm behaviour is obtained when about 20 %–30 % of samples are accepted (Gelman et al., 1996; Roberts et al., 1997). For the proposal in Eq. 13, this is achieved when Σ corresponds to the posterior covariance matrix of the target distribution, P(xY), and the scaling is $\mathit{\lambda}={\mathrm{2.38}}^{\mathrm{2}}/d$, where d is the number of parameters in x. The key idea of the adaptive algorithms suggested in Andrieu and Thoms (2008) is to recursively estimate both Σ and λ from previous samples. An important note for the transformed proposal in Eq. (15) is that Σ and λ relate to the transformed variable z_{t} and not x_{t}.
A RaoBlackwellised update of the covariance matrix will consider both the proposal, $\widehat{z}$, and the previous value, z_{t}, weighted according to the acceptance probability, α, computed in Eq. (16). The expectation and covariance matrix are updated recursively as follows:
where γ_{t} is an adaptation factor and we have used ${\mathit{\gamma}}_{t}={t}^{\mathrm{0.51}}$.
The global adaptive scaling then updates the scaling factor λ as follows:
The update is on a log scale to ensure that λ_{t} stays positive; the second part of the equation compares the current acceptance probability with a target probability, α_{target}=0.234, and adjusts λ_{t} to obtain this desired overall acceptance probability.
To limit the effect of initial values, we first ran 5000 steps of the chain without adaptation and with Σ as 10^{−3} times an identity matrix (i.e. very small initial steps). Thereafter, a covariance matrix Σ was estimated based on the initial samples, and the chain was run for a further 15 000 steps adapting only λ and not Σ. Finally, for the last 80 000 steps, both λ and Σ were updated as described in Andrieu and Thoms (2008). We call the resulting framework the global RaoBlackwellised adaptive metropolis (GRaBAM) algorithm.
2.5.3 Tempering the target distribution
For large numbers of data, the cost function J(x) can have very deep local minima, causing the MH algorithm to get stuck, even with the two already outlined adjustments. To reduce the scale of the cost function, we temper the target distribution (Jennison, 1993).
where T is a suitably large value.
Having run an MH algorithm for N samples, the first N_{b} samples are discarded as “burnin”, and expectations or variances can be computed as averages of the remaining samples.
However, with a tempered target distribution, we have samples from $\stackrel{\mathrm{\u0303}}{P}\left(x\rightY)$ and need to use importance sampling to adjust for the difference in distributions (Jennison, 1993), resulting in weighted averages
where the weights are given by
and ${J}_{\text{min}}={min}_{i}J\left({x}_{i}\right)$. The subtraction by J_{min} is for numerical stability to avoid cases of 0/0 when all J(x_{i}) are very large. For the case of no tempering, i.e. T=1, the weights simplify to ${w}_{i}=\mathrm{1}/(N{N}_{\mathrm{b}})$, resulting in unweighted averages.
2.6 Experimental design
We have designed a set of twin experiments and a realdata experiment. For both the twin and realdata experiments, we generated MCMC chains with a length of 100 000 samples, as mentioned in Sect. 2.5.2. MCMC approaches are computationally intensive and timeconsuming. In this study, each model simulation took approximately 9 s to complete (using an AMD Ryzen Threadripper processor). As a result, for the 100 000 iterations, it consumed nearly 250 computational hours. However, it should be noted that this study involves the model running for a single site, and the computational speed is highly dependent on the performance of the processors being used.
Twin experiment
Twin experiments are designed to assess the performance of the developed GRaBAM and its ability to recover the parameter values. The daily CH_{4} output simulated by LPJGUESS using randomly chosen true parameter values (Z_{true}) within their permitted range of variation is used as the synthetic observation. As the synthesised observation conforms completely to the model, any potential errors in the model have not influenced the parameter optimisation process, ensuring unbiased posteriors. The observations have been assigned uncertainties to resemble the realdata experiment, as described below. It is expected that the assimilated parameters will converge to the true values as the MCMC chain progresses in time. To freely recover the Z_{true} values, the prior parameter values (x_{p}) in the cost function (Eq. 12) are set as Z_{true}. Two scenarios are considered for the twin experiment to test the identifiability of the parameters under different conditions: scenario 1, which has a shorter temporal scale from 2005 to 2014 (10 years), and scenario 2, which has a longer temporal scale from 1901 to 2015 (115 years). Scenario 1 is more realistic and is chosen to mimic the real data at Siikaneva, whereas scenario 2 constitutes an ideal, hypothetical case with observations over the entire simulation period. In each set of the scenarios, the optimisation started from a different initial point in the parameter space that was randomly selected from their prescribed ranges.
Realdata experiment
To estimate the posterior parameter values, an experiment using the real observation from Siikaneva is designed. The observed daily averages are compared with the model simulation in the cost function only when more than 90 % of the hourly observation are available each day. When there are gaps in the daily observation, we eliminate them and their corresponding modelled values from the cost function calculation. In principle, the error covariance matrix R should include both observation uncertainties and their correlations. As the latter is difficult to estimate, we neglected them, and the observation uncertainties are estimated as 30 % for the daily observations greater than 0.01 gC m^{−2} d^{−1} and a floor value of 0.3 for the observations less than 0.01 gC m^{−2} d^{−1}.
2.7 Parameter value estimation
For all of the experiments conducted in this study, the first 75 % of the GRaBAM chains are discarded as the burnin. The PDFs generated after the burnin are used to estimate each parameter's maximum a posteriori probability (MAP), posterior mean, and standard deviation (SD). Following the idea used in Braswell et al. (2005), the parameter distributions are grouped into three categories: “wellconstrained”, “poorly constrained”, and “edgehitting” parameters. The wellconstrained parameters are those that exhibit a welldefined unimodal distribution with a low SD. The poorly constrained parameters are those that exhibit a relatively flat multimodal distribution with a large SD. To be more precise with the estimation, for posterior parameter distributions that appeared multimodel, if the SD of the distribution was greater than 20 % of the total range, we classified them as poorly constrained. The edgehitting parameters are those that cluster near one of the edges of their prior range.
2.8 Posterior resampling experiment
To examine the effect of parameter optimisation on flux components, we designed a resampling experiment from the posterior parameter distributions. From the experiment conducted using site observations, 1000 sets of parameters were randomly selected and used to run the model to simulate the CH_{4} flux components. The outputs from each simulation of the experiment were used to analyse the process correlations and process–parameter relationships.
3.1 Twin experiment using GRaBAM
We ran a set of four different twin experiments for each of the scenarios mentioned in Sect. 2.6. Each of them shows a reasonably good convergence, regardless of their chosen initial values. In scenario 1, all parameters except for ${\mathrm{CH}}_{\mathrm{4}}/{\mathrm{CO}}_{\mathrm{2}}$ and λ_{root} showed good convergence (see Fig. S1 in the Supplement; posterior parameter correlations of the first experiment shown in Fig. S1 are provided in Fig. S2). The results of scenario 2 are not shown, as they followed the same pattern. The resulting PDFs of experiment 1 in scenario 1 after the burnin are represented in Fig. 3, which shows the mean and MAP values as well as the SD of the parameters. The parameter values and related statistics are given in Table 3.
The parameter retrieval capacity of the GRaBAM algorithm is estimated as the “retrieval score” by dividing the posterior mean estimates of the parameters from all of the chains in each scenario by the Z_{true} parameter values. The idea behind the retrieval score is that, in an ideal case of complete recovery, the posterior parameter estimate and the Z_{true} value are the same; hence, the retrieval score would be 1. Figure 5 shows the retrieval scores obtained for each parameter and their 1σ value. In scenario 1, the ϕ_{tiller}, por_{acro}, por_{cato}, w_{tiller}, and λ_{root} are well retrieved with a low SD. Scenario 2 performed better in parameter retrieval compared with scenario 1. In scenario 2, the majority of the parameters except the ${\mathrm{CH}}_{\mathrm{4}}/{\mathrm{CO}}_{\mathrm{2}}$, r_{tiller}, w_{tiller}, and λ_{root} showed good retrieval scores but with comparatively high SDs (see Fig. 5). The overall mean retrieval score estimation is based on the ratio of the estimated and true values, given a value of 0.95 with an SD of 0.19 for scenario 1 and a value of 1 with an SD 0.21 for scenario 2 (see Fig. 5).
In general, the twin experiments have resulted in wellconstrained and poorly constrained parameter classes. Examples of the different classes of the distributions for experiment 1 of scenario 1 are shown in Fig. 3. Based on the posterior distributions estimated from all four GRaBAM chains, the parameters Rmoist, ${\mathrm{CH}}_{\mathrm{4}}/{\mathrm{CO}}_{\mathrm{2}}$, f_{oxid}, r_{tiller}, f_{air}, por_{acro}, por_{cato}, and λ_{root} are well constrained in scenario 1, whereas the parameters Rmoist, ${\mathrm{CH}}_{\mathrm{4}}/{\mathrm{CO}}_{\mathrm{2}}$, f_{oxid}, r_{tiller}, f_{air}, por_{acro}, por_{cato}, w_{tiller}, and λ_{root} are well constrained in scenario 2 (see Table 3).
The reduced posterior cost function values and their χ^{2} values are given in Table 4. Here, the reduced χ^{2} values are calculated by dividing twice the cost function by the number of observations used in the assimilation. Overall, the χ^{2} values indicate a statistically robust reduction in the cost function in all of the experiments, although a systematic behaviour of being below 1 is observed.
3.2 Realdata experiments and optimised parameters
For the experiment with the real data, the observations collected at the Siikaneva wetland are assimilated using the GRaBAM algorithm. The MCMC trace plots are exemplified in Fig. 6.
3.2.1 Optimised parameter values and distributions
The posterior parameter PDFs are shown in Fig. 7. In contrast to the twin experiments, the parameters fell into three categories: wellconstrained, poorly constrained, and edgehitting parameters. The classifications are given in Table 5. The PDFs for the Rmoist, ${\mathrm{CH}}_{\mathrm{4}}/{\mathrm{CO}}_{\mathrm{2}}$, ϕ_{tiller}, f_{air}, por_{acro}, w_{tiller}, and λ_{root} parameters are classified as wellconstrained distributions; the PDFs for r_{tiller}, por_{cato}, and Rmoist_{an} are classified as poorly constrained distributions; and that for f_{oxid} is classified as an edgehitting distribution. In both the wellconstrained and poorly constrained parameters, high kurtosis is observed. The values of f_{oxid}, which is the edgehitting parameter, lay near the higher bound of the edges of the prior range, and most of the retrieved values were clustered near this edge. This parameter also exhibited large positive kurtosis and negative skewness. The posterior parameter values and their 1σ SDs along with the prior values are shown in Table 5. The MAP and posterior mean estimates agree on the value for the ${\mathrm{CH}}_{\mathrm{4}}/{\mathrm{CO}}_{\mathrm{2}}$, f_{air}, and por_{acro} parameters. For f_{oxid}, ϕ_{tiller}, and r_{tiller}, both the MAP and posterior mean estimates stayed out of onethird of the 1σ range of the posterior distribution, which we consider a large difference. For the remaining parameters, the MAP and posterior mean estimates stayed within onethird of the 1σ of their posterior distribution; hence, we consider this a small difference. For the Rmoist and ${\mathrm{CH}}_{\mathrm{4}}/{\mathrm{CO}}_{\mathrm{2}}$ parameters, the posterior values appeared close to but below the prior values. The posterior values of Rmoist_{an} appeared very close to but above the prior values. For the ϕ_{tiller} parameter, the MAP estimate appeared very close to but above the prior value, whereas the posterior mean estimate appeared very close to but below the prior value. For these four parameters, the posterior mean stayed within onethird of the 1σ range of the assumed prior uncertainty. The posterior values of the f_{oxid}, r_{tiller}, and f_{air} parameters appeared slightly above the prior values but outside of onethird of the 1σ range of the prior uncertainty. The prior and posterior values of the por_{acro} parameter remained the same. In contrast, the por_{cato}, w_{tiller}, and λ_{root} parameters appeared very distant from and below the prior values, out of onethird of the 1σ range of prior uncertainty, but stayed within the prior range (see Sect. 4.2 for details).
3.2.2 Posterior parameter correlation
The 2D distributions of the posterior parameters and their correlations are illustrated in Fig. 9. Overall, the majority of the parameters showed weak positive or negative correlations with a few exceptions with extreme correlations (the values and corresponding colour code in the triangle above depict this). For example, Rmoist_{an} showed a high negative correlation with Rmoist, and por_{acro} showed a high positive correlation with f_{air}. The 2D marginal distributions (scatterplots), illustrated in the lower triangle, showed a general tendency toward high clustering within the 1σ range for all of the parameters; in general, the 1D histograms (on the diagonal; also shown in Fig. 7) appeared as wellconstrained unimodal distributions. For further details, the reader is referred to Sect. 4.2.1.
3.2.3 Cost function reduction
The prior and posterior cost function values corresponding to the prior and posterior parameter values are listed in Table 5. The prior cost function value calculated with the default model parameters showed a highcost value of 48424.4, with a model overestimation of around 4 times the observed flux. After the optimisation, the cost function value was reduced to 2959.8 (reduced χ^{2}=3.82) with the MAP estimate of parameters and to 3002.6 (reduced χ^{2}=3.88) with the posterior mean estimate of parameters.
3.2.4 Flux components of the CH_{4} simulation and parameter values
To understand how and the magnitude to which each optimised parameter influences the flux components and the total flux, the result of the resampling experiment (see Sect. 2.8) is examined using correlation and regression analyses. The Pearson correlation coefficients and regression slopes are calculated for all 1000 parameter sets and their corresponding sums of the flux components and total flux. Figure 8a shows a schematic summary of the correlation coefficients and regression slopes between the 11 parameters, the flux components, and the total flux. For both the total flux and diffusion, all parameters, except f_{oxid} and ϕ_{tiller}, showed a similar regression pattern, although with slight differences in magnitude. This similarity is not surprising, as diffusion is the most dominant among the process components. The total flux showed the highest correlations with ${\mathrm{CH}}_{\mathrm{4}}/{\mathrm{CO}}_{\mathrm{2}}$ and λ_{root} and the lowest correlation with f_{oxid}. A detailed discussion of the process–parameter relations can be found in Sect. 4.2.1.
Figure 8b shows the correlations between the sums of flux components resulting from the resampling experiment. The 2D distributions in the lower triangle show a strong positive relation between diffusion and total flux. Almost all of the parameter residuals are observed within 3σ deviation without many outliers. Except for the correlation between diffusion and total flux, the analysis showed no other strong positive or negative correlations between the components, as can be seen in the correlation plot illustrated in the top triangle of the aforementioned figure panel.
Yearly variations in the fractional contributions of flux components simulated using prior and posterior parameter estimates are examined to understand the impact of the optimisation on the composition of the interannual emissions. The time series of the annual sums of flux components as a function of their total flux (in %) are shown in Fig. 10a. The result shows that, among the flux components, diffusion contributes the most to the total CH_{4} flux both in prior and posterior estimates, with a slightly higher contribution in the posterior estimate, followed by plantmediated transport (see Sect. 4.2.3 for detailed discussion).
The time series model–observation mismatch of prior and posterior estimates for the annual total fluxes can be seen in Fig. 10b; the values are displayed as a percentage of the observed CH_{4} flux. The prior estimate showed a mismatch of around 600 % for the first 2 years. Furthermore, a considerably high mismatch is observed in the years 2011, 2012, and 2014. The MAP estimate remained near 0, while the posterior mean estimate exhibited slightly negative values, indicating an underestimation of the flux. Interestingly, the MAP followed the same pattern as the prior estimation by showing an increase whenever the prior increased and a decrease whenever the prior decreased; however, the posterior mean estimate did not show this relation.
The percentage fractions of the annual errors in the flux components are shown in Fig. 11. The effect of optimisation on the individual contributions of each component can be seen from the annual means (solid dots) of their fractional contribution to the total flux. Among the prior estimates of flux components, the prior plantmediated transport showed the largest error (22.5 %), whereas the ebullition showed the smallest error (9.1 %). In the MAP estimate, ebullition showed the highest error, with a value of 12.3 %, followed by diffusion and ebullition, with similar error values of 6.9 % and 6.8 %, respectively. For the estimate using posterior mean values, diffusion and plantmediated transport showed around the same errors, 7.5 % and 7.4 %, and ebullition showed the least error (2.6 %). On the righthand side of the aforementioned figure, the fourth column displays the mean and errors for the interannual variation in the total fluxes obtained by prior parameter values and posterior estimates. The prior total estimate showed an error of 4.2 %, while the mean and MAP showed an error of 0.66 % and 0.72 %, respectively.
3.3 Fit to the observation
Figure 10b illustrates the percentage model–data misfit, and Fig. 12 shows the time series of the assimilated observations as well as the model prior and posterior estimates with their uncertainties. The total RMSE estimated between the prior and observations was 0.044 gC m^{−2} d^{−1}, which was reduced to a value of 0.023 gC m^{−2} d^{−1} for the posterior case. This result indicates that most of the mismatch between the prior model estimates and observations was contributed by the large overestimation in the initial years. This overestimation disappeared in the posterior, showing a better agreement with the observation. There are years for which the observations show large peaks during the summer (such as 2010, 2012, and 2013), and the posterior estimates succeeded in capturing these peaks to a large extent, although not completely (see Sect. 4.5 for details).
4.1 Twin experiment
A common problem with the adaptive MH algorithm is its inability to widely explore the target distribution if the setup is not well tuned. This can then result in a poor approximation of the target distribution and, hence, poor adaptation. The resulting trace plots shown in Figs. 6 and S1 depict a set of wellexplored parameters in their permitted space ranges during the progression of the random walk, indicating a welltuned assimilation setup. The use of adaptive RaoBlackwellised learning of the posterior distribution appeared beneficial during the transients of the chains whenever the acceptance probability was dropped to low values in lowprobability regions of the parameter space.
Even though twin experiments have retrieved all true parameters in different experiments, individual experiments reveal instances where one or two parameters may not fully converge to their true values due to factors such as parameter correlation or equifinality (Figs. 5 and S1). Given the complexity and nonlinearity of the model, it is unsurprising that not all parameters converged completely. It is also unsurprising that different chains estimated slightly different posterior solutions for the parameters. However, most poorly retrieved parameters still have their true values within the 1σ range of the Gaussian PDFs of the optimised values. Even when the parameters are slightly off from the Z_{true} values, Fig. 4 shows the capability of the twin experiments with respect to capturing the structure of the observations, including the observed spikes.
The occasional lack of convergence mentioned above likely introduces a slight bias in the posterior estimates, i.e. compensating effects due to the equifinality in subsequent realdata analyses, leading to suboptimal parameter estimates. This equifinality is essentially based on the high nonlinearity of the model and a missing observational data constraint that would help distinguish between certain parameter combinations. Nevertheless, in contrast to other similar studies, like Santaren et al. (2014), which have encountered challenges in achieving complete recovery of many true parameter values, our GRaBAM algorithm showed better performance with respect to recovering true parameter values. The analysis of the cost function reduction (Table 4), the ability to constrain the parameters (Fig. 3), the ability to capture the structure of the model (Fig. 4), and the parameter retrieval ability of the twin experiments (Fig. 5) showed that the developed GRaBAM algorithm is still capable of optimising the process parameters related to CH_{4} emissions in LPJGUESS, given the caveats mentioned above on the equifinality.
Similarly, the systematic low χ^{2} values observed for the twin experiment do not necessarily affect the framework's ability to be set up for the realdata experiment (Sect. 3.1). As the twin experiments here are under the assumption of an “idealised model”, meaning that the model perfectly reproduces the observations without any errors or uncertainty, and “errorfree data”, where the data perfectly represent the environmental conditions without any random errors, it is expected to have χ^{2} values systematically below 1. Furthermore, the χ^{2} value is highly sensitive to the number of observations and parameters. Having 3650 observations in scenario 1, 41 610 observations in scenario 2, and only 11 parameters can lead to low χ^{2} values. However, the comparatively smaller χ^{2} values for sets 1 and 3 in scenario 1 and set 3 in scenario 2 indicate a tendency toward overfitting the results and being overconfident in the estimated posterior values and uncertainties.
4.2 Parameter estimation using real observations
As described in Sect. 3.2.1, the experiment using realdata resulted in three poorly constrained parameters and one edgehitting parameter. Poorly constrained or edgehitting parameters, however, are not uncommon in MH parameter searches and are somewhat expected with a complex and highly nonlinear model such as LPJGUESS. The correlation of parameters with other parameters can affect the result, i.e. the number of parameters that can be optimised within this data assimilation framework is limited. Although the twin experiments showed good parameter retrieval capabilities, the assimilation of realworld observations into a complex ecosystem model like LPJGUESS is expected to have parameter retrieval and equifinality problems. This is another reasons for selecting a small subset of the parameters associated with wetland CH_{4} flux simulations for this study. Considerable changes have occurred in the prior parameter values after optimisation. Here, it should be considered that, in this study, the assimilation aims to reduce the magnitude of the prior CH_{4} flux simulation to minimise its misfit with the observed data, which is nearly half of the prior model estimate (see Table 6).
4.2.1 Posterior correlation estimates
The following discusses the possible impact of the posterior parameters (shown in Table 5) on CH_{4} flux simulation, the interactions between the optimised parameters and the component fluxes (shown in Fig. 8), and the parameter–parameter correlations (shown in Fig. 9). (We distinguish between strong, >0.5, and weak, <0.2, parameter correlations but focus on the strong ones here.)
The very slight reduction, i.e. within onethird of the 1σ error, observed in the posterior mean estimate of Rmoist (Table 5) indicates a slight decrease in the moisture response under aerobic conditions, which would likely result in a slower soil carbon turnover time with a slight decrease in CH_{4} emission. The weak R^{2} value and the weak positive β value of Rmoist with all of the flux components indicate that a decrease in this parameter value decreases the emission and explains some of the variances in the flux components (Fig. 8).
In contrast, Rmoist_{an} obtained a higher posterior value compared with the prior (Table 5) with a slightly asymmetric multimodal distribution (see Fig. 7), indicating an increase in the moisture response in the anaerobic catotelm. Together with this, the strong negative correlation (−0.8) observed between Rmoist and Rmoist_{an} (Fig. 9) indicates reduced decomposition in the acrotelm and increased decomposition in the catotelm. Rmoist_{an} had a positive effect on diffusion and a negative effect on plantmediated transport (Fig. 8). An increase in Rmoist_{an} could enhance CH_{4} production in the catotelm. As the catotelm has a low plant root abundance, this increase would lead to more diffusion and a reduction in total plantmediated transport. The increase in Rmoist_{an} contributed very little to ebullition. This is most likely due to the negligible contribution of ebullition to the overall flux, with no contribution most of the time.
The posterior ${\mathrm{CH}}_{\mathrm{4}}/{\mathrm{CO}}_{\mathrm{2}}$ parameter, which is the CH_{4}toCO_{2} ratio in an anaerobic environment, was found to be lower compared with the prior (Table 5). This indicates a high fraction of CO_{2} production from the peat compared with CH_{4} production. The very high R^{2} value of ${\mathrm{CH}}_{\mathrm{4}}/{\mathrm{CO}}_{\mathrm{2}}$ for diffusion and plantmediated transport (which represent the two diffusive pathways) indicates that a significant portion of the variance in these pathways can be explained by this parameter (Fig. 8). Similarly, the high, positive β value for diffusion and plantmediated transport indicates a substantial linear increase in emissions via these pathways if the parameter is increased. The increase in ebullition is marginally less than the other fluxes, most likely because ebullition is limited by the availability of the gaseous fraction of CH_{4}. The dissolved CH_{4} will first emit via diffusive fluxes; hence, there could be very little CH_{4} left in the gaseous phase for ebullition. The ${\mathrm{CH}}_{\mathrm{4}}/{\mathrm{CO}}_{\mathrm{2}}$ ratio is negatively correlated with the Rmoist_{an} and λ_{root} parameters (Fig. 9). This indicates a lower CH_{4} fraction produced by decomposition in deep soil.
The prior parameter value for f_{air} was 0, which means that there is no “permanent” gas fraction in peat (Table 5). The posterior value of f_{air} was slightly positive (0.032), indicating a small air fraction in the peat. f_{air} showed a very high positive correlation with por_{acro}, which can simply be explained by the fact that more porous soil allows for more air in the soil (Fig. 9). An increase in the f_{air} value would increase all of the flux components, with a notably larger effect on diffusion (Fig. 8). As stated in Sect. 2.2.3, the diffusivity of CH_{4} in air is 4 orders of magnitude greater than that in water, indicating that a higher fraction of air in the soil results in the rapid and easy transport of CH_{4} to the atmosphere. The larger increase in diffusion can be directly attributed to f_{air}, as this parameter directly controls diffusion.
The fraction of available oxygen utilised for CH_{4} oxidation is determined by the f_{oxid} parameter, which has a higher value after optimisation (Table 5). The high values of f_{oxid} and f_{air}, indicating a high available air fraction and, hence, high O_{2} concentration in the soil, result in the conversion of most of the available carbon into CO_{2}. This could explain the abovementioned reduction in ${\mathrm{CH}}_{\mathrm{4}}/{\mathrm{CO}}_{\mathrm{2}}$ as a balancing effect (Eq. 2). f_{oxid} showed a negative β value for diffusion and ebullition and a slight positive β value with a comparatively high R^{2} value for plantmediated transport (Fig. 8). A decrease in ebullition can be explained by the increased availability of oxygen for CH_{4} oxidation, resulting in less CH_{4} being emitted via ebullition. A significant decrease occurs in diffusion because the diffusive flux cannot bypass the top layer, into which oxygen diffuses. Directly explaining the increase in plantmediated transport is difficult due to the complex process formulation in the model; however, it can be assumed that the aerenchyma could transport a part of the oxygen deep down to the soil layers where it plays less of a role in oxidation but contributes more to the total gas pressure, which can escalate the passive plantmediated transport to the atmosphere.
The optimisation of plantrelated parameters depends on the specific plant species present in the wetland. A slight reduction in the posterior mean estimate of ϕ_{tiller} suggests that the tillers may be slightly more compact, with reduced porosity for CH_{4} transport. A considerable reduction, more than onethird of the prior uncertainty, is observed in w_{tiller}, indicating lower leaf biomass (Table 5). This reduction in potential leaf cover would lead to less carbon added to the potential carbon pool for methanogens, resulting in lower CH_{4} emissions. A decrease in tiller weight would add less organic carbon to the soil, leading to a lesscompact peat accumulation in the bottom soil layers with more porosity for water. This could explain the negative correlation between w_{tiller} and por_{cato} (Fig. 9). In contrast to the values of the two abovementioned CH_{4} transportrelated parameters, r_{tiller}, which represents the tiller radius of plants, showed a value more than twice the prior value (Table 5). This increase indicates more crosssectional area of tiller for a given biomass, resulting in an increase in plantmediated CH_{4} transport (see Eq. 8). These three parameters related to plantmediated transport showed strong positive correlation with each other. They also exhibited positive β values in relation to plantmediated transport (Fig. 8). These parameters can have two effects on emissions: on the one hand, having aerenchyma cells with more porous space, radius, and biomass enhances CH_{4} transport to the atmosphere; on the other hand, via the same spacious aerenchyma cells, it is also possible for plants to transport more O_{2} to the soil. This enhanced O_{2} transport to the soil could explain the slight reduction in diffusion and ebullition observed in the cases of ϕ_{tiller} and w_{tiller}.
The posterior value for the porosity in the catotelm (por_{cato}) was significantly lower than the prior (Table 5), suggesting a more compact catotelm with less water (as it is assumed to be saturated). This change could have a dual effect on CH_{4}. Variations in water content can slightly affect soil temperature, potentially leading to an increase in the flux if the temperature rises or to a decrease in the flux due to the compact peat. As described in Sect. 3.2.1, por_{acro} remained unchanged, indicating no changes in acrotelm porosity (Table 5). The positive kurtosis observed in the PDF of this parameter indicates a wellconstrained single solution, while the negative skewness indicates a more probabilistic region below the posterior estimate. Similar to f_{air}, por_{acro} also exhibited positive β values for all flux components, although with a relatively low R^{2} value (Fig. 8). This positive relationship may be attributed to the increased presence of air in the acrotelm soil, which could facilitate CH_{4} emissions. In contrast, an increase in por_{cato} could lead to a slight reduction in ebullition. This could be because more water can potentially occupy the pores of permanently saturated catotelm which will indirectly affect ebullition through phase change and by affecting soil temperature.
The posterior value for λ_{root} is estimated to be much smaller than the prior (higher than the value reported in Wania et al., 2010, and in Susiluoto et al., 2018), i.e. more than onethird of 1σ of the prior estimate (Table 5). This small posterior value for λ_{root} indicates a low decay length of root biomass in the soil, meaning that more of the decomposition and CH_{4} production occurs in the acrotelm and that less occurs in the catotelm. The emission of CH_{4} produced mainly by peat decomposition in the acrotelm would be facilitated by a low posterior value for λ_{root}, with around 60 % in the first layer of the acrotelm followed by 22 % and 8 % in the respective second and third layers of the acrotelm. λ_{root} played a key role in this optimisation. Figure 8 shows that λ_{root} has a strong negative β value for diffusion and a weak positive β value for plantmediated transport, both with relatively strong R^{2} values (Fig. 8). As most of the peat decomposition activities are assumed to occur in the acrotelm, the reduction in the magnitude of λ_{root} could facilitate diffusion, especially as it is the largest component. On the other hand, plantmediated transport may be reduced, as it limits the distribution of roots in soil depths.
4.2.2 Posterior flux components
In Fig. 13, the time series of process components is shown for the posterior mean estimate. In general, the optimisation of the model parameters leads to an approximately 50 % decrease in the production of CH_{4} compared with the prior, with a significant reduction in the plantmediated and ebullition components, leaving diffusion as the dominant component. Diffusion is reduced by around 30 %, and plantmediated transport is reduced by approximately 86 %.
The low contribution of plant transport is mainly due to the reduced value of the rootdepthcontrolling parameter λ_{root}, which decreased from 25.17 to 10.58. This smaller proportion of plantmediated transport is somewhat surprising for a fen wetland site like Siikaneva, which features a significant aerenchymatous leaf area throughout the growing season. The result is contradictory to the results obtained from optimising the HIMMELI model (Susiluoto et al., 2018), in which the largest fraction of CH_{4} is contributed by the plantmediated transport. However, field experiments conducted to estimate plantmediated transport by Korrensalo et al. (2022) have observed a smaller proportion of the ecosystemscale CH_{4} flux attributable to plant CH_{4} transport at the Siikaneva fen site. This observation aligns well with the results that we obtained.
The largest reduction, however, was for ebullition (by around 92 %). This result is not surprising, as Wania et al. (2010), who provide the basic foundation of the CH_{4} model in LPJGUESS, also reported almost virtually no ebullition to the surface at several sites. Figure 13 shows that no ebullition is estimated during the years 2008, 2010, and 2012. Here, it should be considered that the representation of ebullition in LPJGUESS is somewhat simplified using a curvefitting equation to calculate the solubility and using the ideal gas law to convert the volume of CH_{4} per volume of water into the corresponding number of moles. Due to this lack of detail and its fast timescale occurrence (mostly depends on the physical parameters such as temperature and pressure) and with no relevant parameters in the control vector, the optimisation could not alter the ebullition component directly. However, ebullition is indirectly controlled by parameters related to CH_{4} production and transport when there is a high amount of saturated CH_{4} available in the soil water. Thus, the optimisation can indirectly affect the ebullition. The total observed CH_{4} flux from Siikaneva during the period of 2005 to 2014 was 56.0 gC m^{−2}, while the prior model estimate was 98.5 gC m^{−2} (Table 6). After the optimisation, with the posterior mean estimate of parameter values, the model estimated a flux of 53.5 gC m^{−2} with an estimated posterior uncertainty of ±4.82 gC m^{−2}. This shows a reduced model–data error after optimisation with a difference of only 2.5 gC m^{−2}.
4.2.3 Posterior process–process correlation
After the optimisation, the air fraction in the peat was increased, which is likely the cause of the enhanced diffusion. Diffusion is estimated in the model based on the soil porosity, water, temperature, and air fractions in the soil. Correlating the diffusion with the ebullition showed a negative result, i.e. illustrating the dominance of diffusion over ebullition (see Fig. 8b). A larger air fraction in the soil can also lead to an increase in plantmediated emission, as the passive diffusion of air through the plant tissues depends on the amount of air in the soil/peat water (see Sect. 2.2.3). This can be seen in Fig. 8b as a comparatively high correlation between diffusion and plantmediated transport. The increased tiller radius (r_{tiller}) in plants increases the A_{tiller} value (Eq. 8), thereby favouring faster diffusion through the aerenchyma cells. Ebullition is positively correlated with plantmediated transport, indicating the occurrence of both of these components when there is a high concentration of CH_{4} in the soil. This occurs when the water table is located close to the surface and when there is a higher density of graminoids. An increase in plantmediated transport of gases to the soil increases the net pressure imposed by the gases in soil/peat water, which could also lead to increased ebullition.
4.3 Model error and fit to the observation
Except for ebullition, all of the prior process components exhibited larger variances in the annual errors compared with the posterior estimates (Fig. 11). The plantmediated transport is the component with the largest error in the prior estimate. The posterior error estimates for this component showed nearly equal values, with a slightly higher value for the posterior mean estimate. A similar pattern can also be seen for diffusion. In contrast, the MAP error estimate for ebullition showed a higher value compared with the posterior mean error but interestingly also to the prior. The posterior mean error estimate for ebullition showed the lowest value.
The annual sums of flux components mentioned above are illustrated in Fig. 10a. It is clear from this figure that the prior process components had large interannual variance, especially for the first 3 years and the last year. A considerable reduction in variance is observed for both the MAP and posterior mean estimates. The reduction in the variance observed in posterior estimates is not proportional to the prior; however, the posterior estimates showed a comparatively high variance in the first and last years. In Fig. 10b, as described in Sect. 3.2.4, the posterior mean estimate shows a comparatively high variance (with respect to the MAP estimate) for the annual errors with a negative bias throughout the time period. In contrast, the MAP estimate showed a positive bias throughout the time period. Compared with the posterior mean estimate, the MAP estimate has considerably larger parameter values for ϕ_{tiller} and r_{tiller}; this could possibly be interpreted as slightly more CH_{4} emission through the increased tillers of plants and, hence, the reason for the positive bias in the MAP estimate. Figure 11 also indicates a high percentage of annual plantmediated emissions for the MAP estimate. The negative bias in the posterior mean estimate could be due to the additional wintertime emissions from the realworld wetlands, which are not captured in the model. In the model, the emissions start around early summer, once the soil is not frozen anymore. In addition, the large daily variability in the observations of the summertime fluxes is also not represented in the model. Overall, the posterior estimates of the annual fluxes are in good agreement with the observations, leading to a small model–data mismatch for both MAP and the posterior mean estimates.
4.4 Model inputs and uncertainty
As mentioned in Sect. 3.3, a somewhat pronounced systematic underestimation of emissions was observed in the years 2010, 2012, and 2013. None of the twin experiments exhibited these systematic errors, which indicates that the issue could be attributed to a structural model error (see Fig. 4). While the CH_{4} module within LPJGUESS is relatively comprehensive when compared with many other similar models, the model's process description and parameterisation remain incomplete. For instance, in the real world, wind plays a crucial role in CH_{4} emissions and its atmospheric concentration; however, wind speed is set to 0 for modelling convenience in LPJGUESS, thereby presenting a significant limitation. Similarly, the lack of the representation of air pressure, the simplified representation of ebullition (see Sect. 4.2.2), and the simplified representation of CH_{4} production (see Eq. 2) are also a major limitations. Another reason for this mismatch could be the variations in the input climate data. The correlation plots of the input environmental variables of LPJGUESS and the CH_{4} residuals (Fig. S3) indicate that, in these years, CH_{4} emissions showed a comparatively high correlation with swr and air temperature, although precipitation did not show any significant relation. The results of the sensitivity study indicate that both the prior and posterior model estimates are significantly sensitive to the input variables. However, the posterior model estimate exhibited a considerable reduction in sensitivity, especially for swr and precipitation (see Sect. S5 and Fig. S5).
As mentioned in Wania et al. (2010), the flux components are determined by complex processes that depend on changes in many environmental factors. The model is unable to represent peak emissions caused by these microenvironmental changes. For instance, ebullition (one of the more complex CH_{4} emissions processes in LPJGUESS, as explained in previous sections) depends on the volumetric content of wind and various gases as well as on hydrostatic and atmospheric pressure. However, the model does not use them as forcing variables. Ebullition is also affected by the concentration of CH_{4} and the density of nucleation sites, which are difficult to represent in the model.
It should be noted that the incomplete state vector used for optimisation might have affected the optimised model result. The process representations in LPJGUESS are complex and interconnected, with a multitude of parameters directly or indirectly linked to CH_{4} fluxes. Representing the indirect parameters can be intricate, as they may depend on other fluxes or model components. For instance, the soil module in LPJGUESS is intricately connected to the CENTURY model, featuring 10 soil compartments. Introducing a parameter related to soil temperature or the wtd into the framework would necessitate consideration of the intricacies of the CENTURY model. Furthermore, it might require the inclusion of additional flux species, such as net primary production (NPP), soil temperature profiles, or wtd. This would significantly increase the complexity of the problem, exceeding the scope of this paper. Given these caveats, the small negative biases obtained for the posterior mean estimates when compared against the observations (see Fig. 10b) are reasonable considering the quality and uncertainty in the input data used (see Sect. 2.1) and the complexity and structural issues of LPJGUESS.
4.5 Optimised simulation from LPJGUESS
A detailed time series distribution of prior and posterior model simulations plotted against the observations is shown in Fig. 12. The posterior model predictions were adjusted by the optimisation to fit the observations with considerable adjustment to the summer peaks. For example, the large peaks in the modelled emissions in 2005 and 2006, which largely contributed to the prior cost function, disappeared in the posterior emissions. In the following years, 2007 and 2008, the prior model simulations underestimated the observation, which also got corrected in the posterior. Furthermore, the posterior emissions largely capture the comparatively high peaks in the observations for the years 2010 and 2012, although the model still underestimated the observations. In 2013, the observations were high and the optimisation failed to capture this peak; rather, it tried to compensate for the underestimation by releasing a sudden high spike at the end of the summer that year. In winter months, the model simulated very low or almost zero fluxes (as discussed before), whereas, the observations showed a small emission (around 8.3 % of the assimilated total), often with some small spikes (possibly from the ebullition). This inability of the model to capture the wintertime emission has contributed to the posterior model uncertainty and model–data misfit.
A significant mismatch in soil temperature and wtd has been observed between the model and observations, especially during the wintertime (see Sect. S4 and Fig. S4). The model tends to overestimate wintertime temperatures and underestimate the wintertime wtd, indicating completely frozen soil with a very low wtd. This discrepancy could be a reason for the complete suspension of model CH_{4} emissions during the winter. Observations show many days with a wtd above ground level, both in summer and winter. As the wtd is a key factor that can affect all flux components, this error could contribute to the misfits observed after the optimisation, although it cannot explain the systematic under estimation observed in the posterior. It should be noted that there are no considerable differences observed between the prior and posterior soil temperature or wtd.
As discussed in Sect. 4.2.3, the contribution of ebullition to the posterior estimate is comparatively negligible. Compared with the posterior, there were many emissions spikes observed in the prior estimate, especially during the beginning and the end of the summer months. Apart from these spikes, the prior CH_{4} estimates during the summer were a bit low in most of the years. The posterior estimate has considerably reduced these high spikes and adjusted the summer peaks to match the observations better. On the other hand, while compromising for the summer peaks in the observations, the model with the optimised parameters often failed to capture the abrupt high fluxes in the daily observation and simulated them at slightly wrong times. The spike shown at the end of 2013 is an example of this mistiming. This is likely caused by errors in the meteorological input data and missing wind and pressure representations.
It can be seen from the Fig. 12 that the majority of the observations lie within the 95 % confidence interval of the posterior estimate. Often, the observation uncertainty overlaps the confidence interval, except for the summer peak times of 2010, 2012, and 2013 in which the observations showed strong peaks compared with the average values. The few outliers in the observations are not captured by the model; these could likely be measurement artefacts and/or due to environmental forcing not considered here, such as wind speed or air pressure.
4.6 Merits and shortcomings of the GRaBAM framework
The advantages of the developed GRaBAM framework include the ability of the MCMC method to escape local maxima or minima, making it more robust in this respect than gradientdescentbased methods. Furthermore, the MCMC method is derivativefree, thereby avoiding issues with computing gradients for very rough (noncontinuous) functions. The adaptive part of the MCMC “learns” about parameter correlations and utilises these dependencies in the proposal, allowing it to better explore the parameter space. Potential issues for the MCMC are that very uneven cost functions can lead to the chain getting stuck (essentially, the local minima is too deep for the algorithm to escape). Here, we have alleviated this issue by tapering the cost function. We also acknowledge that the cost function could be improved, as the squared cost function essentially assumes Gaussian errors with equal variance for all values and ignores any temporal correlations. The observations are measured as time series, and concentration/flux observations are known to have spatiotemporal error correlations. Adapting the cost function to account for these factors would be of interest and is likely to have a larger impact on the posterior parameter uncertainties than on the estimates of the parameters.
In principle, the twin experiment should be able to recover all of the parameters completely. However, the twin experiment in this study did not completely recover the true values of some parameters, especially the ${\mathrm{CH}}_{\mathrm{4}}/{\mathrm{CO}}_{\mathrm{2}}$ parameter. This could be attributed to one or more of the following issues: (1) due to the complex and extremely nonlinear nature of LPJGUESS, there is a high possibility of equifinality; (2) the high dimensionality of the problem and limited variability in the data could be a reason for the poor convergence of some parameters; and (3) the model is not fully constrained by the limited dimensions in both the data and the parameter space. Achieving complete convergence may require incorporating (a) additional complementary observations capable of constraining additional model processes and (b) additional parameters from different modules of the model that represent different model processes.
The failure observed in convergence is expected to yield suboptimal parameter estimates with overly optimistic posterior uncertainty estimates due to the equifinality problems in the realdata experiments. It may also result in slightly corrupted relationships between variables, leading to changes in the covariance structure of the posterior parameters estimated from the real data. Considering the high degree of nonlinearity in the model, a single type of twin flux from a single site might not be sufficient to identify all of the parameters. Therefore, additional sites spanning a large climatic variability and/or different species of twin fluxes from the same site may be necessary. For instance, regarding the ${\mathrm{CH}}_{\mathrm{4}}/{\mathrm{CO}}_{\mathrm{2}}$ parameter, one reason for the poor recovery could be that the study only optimises the CH_{4} component, leaving the CO_{2} fluxes significantly off. Incorporating both CH_{4} and CO_{2} fluxes to address both sides of the ratio would lead to a more accurate convergence. However, a challenge in this approach would involve formulating cost functions for different observations in a manner that properly weights them to equally represent the information without diminishing their significance. These considerations are beyond the scope of this paper.
This study represents an initial effort to optimise the model process parameters controlling the simulation of wetland CH_{4} fluxes within the LPJGUESS model using a RaoBlackwellised adaptive MCMC technique based on Bayesian statistics. The assimilation framework has been shown to be able to retrieve true parameter values within the uncertainty limits discussed above by performing a set of twin experiments. Furthermore, we used eddycovariance flux measurement data from a boreal wetland to calibrate the LPJGUESS model parameters for a sitespecific simulation. The results demonstrate that the fit to the observation of the CH_{4} simulation of a complex terrestrial DGVM like LPJGUESS can be systematically enhanced with a Bayesian parameter calibration. The results also show that the modelled processes and the estimated parameters were well constrained by the observations, leading to a substantial reduction in the posterior uncertainty in the simulated CH_{4} emissions. However, we note here that, due to the equifinality problems identified in the identical twin experiments, the posterior parameter values contain compensating effects caused by the nonlinearity of the model and the lack of an additional observational data constraint to uniquely identify each individual parameter. The results of the resampling experiment and the parameter and process correlations indicate that there are no redundant processes in the model description.
The robustness of the assimilation framework developed in this study calls for further application of the framework using observations from multiple sites in a simultaneous assimilation. Further validation of the framework's performance is necessary to confirm its applicability to other sites with diverse PFTs and climatic conditions. The relatively strong roughness in the shape of the cost function observed in this study is expected to be reduced in a multisite assimilation experiment, as has been observed by Kuppel et al. (2012); this would provide additional observational data constraints to allow the retrieval of the global minimum of the cost function more easily and reduce the equifinality problem. These further applications are beyond the scope of this paper and will be investigated in future studies.
The AMCMC code and CH_{4} flux data used for this article are available at https://doi.org/10.5281/zenodo.7339240 (Kallingal et al., 2022). The LPJGUESS model code can be obtained at https://doi.org/10.5281/zenodo.8065737 (Nord et al., 2021).
The supplement related to this article is available online at: https://doi.org/10.5194/gmd1722992024supplement.
JTK designed the GRaBAM assimilation framework with help from MS and JL. PAM developed the CH_{4} model in LPJGUESS based on the work of Wania et al. (2010). MS and PAM provided knowledge and advice about peatland CH_{4} processes as well as model parameters. JR and MR helped with collecting the CH_{4} observations. MS and JL provided supervision and technical advice with respect to programming the algorithm and analysing results. JTK prepared the manuscript with contributions from all coauthors.
At least one of the (co)authors is a member of the editorial board of Geoscientific Model Development. The peerreview process was guided by an independent editor, and the authors also have no other competing interests to declare.
The views and opinions expressed are those of the author(s) only and do not necessarily reflect those of the European Union. Neither the European Union nor the granting authority can be held responsible for them.
Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.
The authors are grateful to Guillaume Monteil, Adrian Gustafson, Stefan Olin, and Johan Nord for discussions on the computational setup and the manuscript. The Institute for Atmospheric and Earth System Research (SMEARINAR) at the University of Helsinki is acknowledged for the CH_{4} flux data collected at Siikaneva. We would also like to acknowledge the Finnish Meteorological Institute (FMI), Finland; the Research Council of Finland (grant nos. 336573, 341753, and 325169); and ICOSFinland for opendata availability.
This research has been partly supported by the Strategic Research Area “Biodiversity and Ecosystem services in a Changing Climate” (BECC), Lund University (grant no. DnrV 2018/467), and is a contribution to the Strategic Research Area “ModElling the Regional and Global Earth system” (MERGE). BECC and MERGE are funded by the Swedish government. This research has also been partly supported by the “Greenhouse Gas Fluxes and Earth System Feedbacks” (GreenFeedBack) project from the European Union’s Horizon Europe Framework Programme for Research and Innovation (project no. 101056921), funded by the European Union.
This paper was edited by Julia Hargreaves and reviewed by two anonymous referees.
Aalto, J., Aalto, P., Keronen, P., Kolari, P., Rantala, P., Taipale, R., Kajos, M., Patokoski, J., Rinne, J., Ruuskanen, T., Leskinen, M., Laakso, H., Levula, J., Pohja, T., Siivola, E., Kulmala, M., and Ylivinkka, I.: SMEAR II Hyytiälä forest meteorology, greenhouse gases, air quality and soil, University of Helsinki, Institute for Atmospheric and Earth System Research, https://doi.org/10.23729/23dd00b2b9d7467a9ceeb4a122486039, 2022. a
Ahti, T., HämetAhti, L., and Jalas, J.: Vegetation zones and their sections in northwestern Europe, in: Annales Botanici Fennici, pp. 169–211, JSTOR, 1968. a, b
Alekseychik, P., Peltola, O., Li, X., Aurela, M., Hatakka, J., Pihlatie, M., Rinne, J., Haapanala, S., Laakso, H., Taipale, R., Matilainen, T., Salminen, T., and Levula, J.: SMEAR II Siikaneva 1 wetland eddy covariance, University of Helsinki, Institute for Atmospheric and Earth System Research, https://doi.org/10.23729/bcc98726ead845d4ac391e4b1bf5e243, 2019a. a
Alekseychik, P., Kolari, P., Rinne, J., Haapanala, S., Laakso, H., Taipale, R., Matilainen, T., Salminen, T., Levula, J., and Tuittila, E.S.: SMEAR II Siikaneva 1 wetland meteorology and soil, Universiy of Helsinki, Institute for Atmospheric and Earth System Research, https://doi.org/10.23729/371cd3e426ae41c996d769acccc206f7, 2019b. a, b
Andrieu, C. and Thoms, J.: A tutorial on adaptive MCMC, Stat. Comput., 18, 343–373, 2008. a, b, c
Aurela, M., Riutta, T., Laurila, T., Tuovinen, J.P., Vesala, T., Tuittila, E.S., Rinne, J., Haapanala, S., and Laine, J.: CO_{2} exchange of a sedge fen in southern FinlandThe impact of a drought period, Tellus B, 59, 826–837, 2007. a
Aurela, M., Lohila, A., Tuovinen, J.P., Hatakka, J., Riutta, T., and Laurila, T.: Carbon dioxide exchange on a northern boreal fen, Boreal Environ. Res., 14, 699, 2009. a
Bauer, L. and Hamby, D.: Relative sensitivities of existing and novel model parameters in atmospheric tritium dose estimates, Radiat. Prot. Dosim., 37, 253–260, 1991. a
Bousquet, P., Ciais, P., Miller, J., Dlugokencky, E. J., Hauglustaine, D., Prigent, C., Van der Werf, G., Peylin, P., Brunke, E.G., Carouge, C., and Langenfelds, R.: Contribution of anthropogenic and natural sources to atmospheric methane variability, Nature, 443, 439–443, 2006. a
Braswell, B. H., Sacks, W. J., Linder, E., and Schimel, D. S.: Estimating diurnal to annual ecosystem parameters by synthesis of a carbon flux model with eddy covariance net ecosystem exchange observations, Glob. Change Biol., 11, 335–355, 2005. a
Bridgham, S. D., Pastor, J., Dewey, B., Weltzin, J. F., and Updegraff, K.: Rapid carbon response of peatlands to climate change, Ecology, 89, 3041–3048, 2008. a
Broecker, W. S. and Peng, T.H.: Gas exchange rates between air and sea, Tellus, 26, 21–35, 1974. a
Carrassi, A., Bocquet, M., Bertino, L., and Evensen, G.: Data assimilation in the geosciences: An overview of methods, issues, and perspectives, Wiley Interdisciplinary Reviews: Climate Change, 9, e535, https://doi.org/10.1002/wcc.535, 2018. a
Christensen, J. H., Hewitson, B., Busuioc, A., Chen, A., Gao, X., Held, I., Jones, R., Kolli, R. K., Kwon, W.T., Laprise, R., and Magaña Rueda, V.: Regional climate projections, in: Climate Change 2007: the physical science basis Chapter 11, Cambridge University Press, 847–940, 2007. a
Ciais, P., Sabine, C., Bala, G., Bopp, L., Brovkin, V., Canadell, J., Chhabra, A., DeFries, R., Galloway, J., Heimann, M., and Jones, C.: Carbon and other biogeochemical cycles. Climate change 2013: the physical science basis. Contribution of Working Group I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change, Comput. Geom., 18, 95–123, 2013. a
Cronk, J. K. and Fennessy, M. S.: Wetland plants: biology and ecology, CRC press, https://doi.org/10.1201/9781420032925, 2016. a
Dee, D. P.: Bias and data assimilation, Quarterly Journal of the Royal Meteorological Society: A journal of the atmospheric sciences, Applied Meteorology and Physical Oceanography, 131, 3323–3343, 2005. a
Dlugokencky, E.: NOAA/GML (gml. noaa. gov/ccgg/trends_ch4/), NOAA/GML, https://doi.org/10.15138/P8XGAA10, 2021. a
Forster, P., Storelvmo, T., Armour, K., Collins, W., Dufresne, J.L., Frame, D., Lunt, D., Mauritsen, T., Palmer, M., Watanabe, M., Wild, M., and Zhang, H.: The Earth’s Energy Budget, Climate Feedbacks, and Climate Sensitivity, Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, 923–1054, https://doi.org/10.1017/9781009157896.009, 2021. a
Gelman, A., Roberts, G., and Gilks, W.: Efficient Metropolis jumping rules, in: Bayesian Statistics 5, edited by: Bernardo, J. M., Berger, J. O., Dawid, A. P., and Smith, A. F. M., 599–607, Oxford University Press, https://doi.org/10.1093/oso/9780198523567.003.0038, 1996. a
Ghil, M. and MalanotteRizzoli, P.: Data assimilation in meteorology and oceanography, in: Advances in geophysics, 33, 141–266, Elsevier, https://doi.org/10.1016/S00652687(08)604422, 1991. a
Gustafson, A.: On the role of terrestrial ecosystems in a changing Arctic, MediaTryck, Lund University, Sweden, 2022. a
Hastings, W.: Monte Carlo sampling methods using Markov chains and their applications, Biometrika, 57, 97–109, 1970a. a, b
Hastings, W. K.: Monte Carlo sampling methods using Markov chains and their applications, Biometrika, 57, 97–97, 1970b. a
Hoffman, F. O. and Miller, C. W.: Uncertainties in environmental radiological assessment models and their implications, Tech. rep., Oak Ridge National Lab., https://doi.org/10.1093/biomet/57.1.97, 1983. a
Jacob, D., Bärring, L., Christensen, O. B., Christensen, J. H., de Castro, M., Deque, M., Giorgi, F., Hagemann, S., Hirschi, M., Jones, R., and Kjellström, E.: An intercomparison of regional climate models for Europe: model performance in presentday climate, Clim. Change, 81, 31–52, 2007. a
Jennison, C.: Discussion on the meeting on the Gibbs sampler and other Markov chain Monte Carlo methods, J. Roy. Stat. Soc. B, 55, 54–56, 1993. a, b
Johansson, T., Malmer, N., Crill, P. M., Friborg, T., Åkerman, J. H., Mastepanov, M., and Christensen, T. R.: Decadal vegetation changes in a northern peatland, greenhouse gas fluxes and net radiative forcing, Glob. Change biol., 12, 2352–2369, 2006. a
Kallingal, J. T., Scholze, M., A. Miller, P., Lindstrom, J., Rinne, J., and Raivonen, M.: Optimising the CH_{4} simulations from the LPJGUESS model using Adaptive MCMC, Zenodo [code, data set], https://doi.org/10.5281/zenodo.7339240, 2022. a
Kirschke, S., Bousquet, P., Ciais, P., Saunois, M., Canadell, J. G., Dlugokencky, E. J., Bergamaschi, P., Bergmann, D., Blake, D. R., Bruhwiler, L., and CameronSmith, P.: Three decades of global methane sources and sinks, Nat. Geosci., 6, 813–823, 2013. a, b
Korrensalo, A., Männistö, E., Alekseychik, P., Mammarella, I., Rinne, J., Vesala, T., and Tuittila, E.S.: Small spatial variability in methane emission measured from a wet patterned boreal bog, Biogeosciences, 15, 1749–1761, https://doi.org/10.5194/bg1517492018, 2018. a
Korrensalo, A., Mammarella, I., Alekseychik, P., Vesala, T., and Tuittila, E.: Plant mediated methane efflux from a boreal peatland complex, Plant Soil, 471, 375–392, 2022. a
Kuppel, S., Peylin, P., Chevallier, F., Bacour, C., Maignan, F., and Richardson, A. D.: Constraining a global ecosystem model with multisite eddycovariance data, Biogeosciences, 9, 3757–3776, https://doi.org/10.5194/bg937572012, 2012. a
Lerman, A.: Geochemical processes. Water and sediment environments, John Wiley and Sons, Inc., 1979. a
Mathijssen, P. J., Väliranta, M., Korrensalo, A., Alekseychik, P., Vesala, T., Rinne, J., and Tuittila, E.S.: Reconstruction of Holocene carbon dynamics in a large boreal peatland complex, southern Finland, Quaternary Sci. Rev., 142, 1–15, 2016. a
McGuire, A., Sitch, S., Clein, J. S., Dargaville, R., Esser, G., Foley, J., Heimann, M., Joos, F., Kaplan, J., Kicklighter, D., and Meier, R.: Carbon balance of the terrestrial biosphere in the twentieth century: Analyses of CO_{2}, climate and land use effects with four processbased ecosystem models, Global Biogeochem. Cy., 15, 183–206, 2001. a
McGuire, A. D., Christensen, T. R., Hayes, D., Heroult, A., Euskirchen, E., Kimball, J. S., Koven, C., Lafleur, P., Miller, P. A., Oechel, W., Peylin, P., Williams, M., and Yi, Y.: An assessment of the carbon balance of Arctic tundra: comparisons among observations, process models, and atmospheric inversions, Biogeosciences, 9, 3185–3204, https://doi.org/10.5194/bg931852012, 2012. a
Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., and Teller, E.: Equations of state calculations by fast computing machines, J. Chem. Phys., 21, 1087–1092, 1953a. a
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E.: Equation of state calculations by fast computing machines, J. Chem. Phys., 21, 1087–1092, 1953b. a
Millington, R. and Quirk, J.: Permeability of porous solids, T. Faraday Soc., 57, 1200–1207, 1961. a
Nord, J., Anthoni, P., Gregor, K., Gustafson, A., Hantson, S., Lindeskog, M., Meyer, B., Miller, P., Nieradzik, L., Olin, S., Papastefanou, P., Smith, B., Tang, J., Wårlind, D., and and past LPJGUESS contributors: LPJGUESS Release v4.1.1 model code (4.1.1), Zenodo [code], https://doi.org/10.5281/zenodo.8065737, 2021. a
Rinne, J., Riutta, T., Pihlatie, M., Aurela, M., Haapanala, S., Tuovinen, J.P., Tuittila, E.S., and Vesala, T.: Annual cycle of methane emission from a boreal fen measured by the eddy covariance technique, Tellus B, 59, 449–457, 2007. a, b
Rinne, J., Tuittila, E.S., Peltola, O., Li, X., Raivonen, M., Alekseychik, P., Haapanala, S., Pihlatie, M., Aurela, M., Mammarella, I., and Vesala, T.: Temporal variation of ecosystem scale methane emission from a boreal fen in relation to temperature, water table position, and carbon dioxide fluxes, Global Biogeochem. Cy., 32, 1087–1106, 2018. a
Roberts, G., Gelman, A., and Gilks, W.: Weak convergence and optimal scaling of random walk Metropolis algorithms, Ann. Probab., 7, 110–120, 1997. a
Santaren, D., Peylin, P., Bacour, C., Ciais, P., and Longdoz, B.: Ecosystem model optimization using in situ flux observations: benefit of Monte Carlo versus variational schemes and analyses of the yeartoyear model performances, Biogeosciences, 11, 7137–7158, https://doi.org/10.5194/bg1171372014, 2014. a
Saunois, M., Jackson, R., Bousquet, P., Poulter, B., and Canadell, J.: The growing role of methane in anthropogenic climate change, Environ. Res. Lett., 11, 120207, https://doi.org/10.1088/17489326/11/12/120207, 2016. a
Saunois, M., Stavert, A. R., Poulter, B., Bousquet, P., Canadell, J. G., Jackson, R. B., Raymond, P. A., Dlugokencky, E. J., Houweling, S., Patra, P. K., Ciais, P., Arora, V. K., Bastviken, D., Bergamaschi, P., Blake, D. R., Brailsford, G., Bruhwiler, L., Carlson, K. M., Carrol, M., Castaldi, S., Chandra, N., Crevoisier, C., Crill, P. M., Covey, K., Curry, C. L., Etiope, G., Frankenberg, C., Gedney, N., Hegglin, M. I., HöglundIsaksson, L., Hugelius, G., Ishizawa, M., Ito, A., JanssensMaenhout, G., Jensen, K. M., Joos, F., Kleinen, T., Krummel, P. B., Langenfelds, R. L., Laruelle, G. G., Liu, L., Machida, T., Maksyutov, S., McDonald, K. C., McNorton, J., Miller, P. A., Melton, J. R., Morino, I., Müller, J., MurguiaFlores, F., Naik, V., Niwa, Y., Noce, S., O'Doherty, S., Parker, R. J., Peng, C., Peng, S., Peters, G. P., Prigent, C., Prinn, R., Ramonet, M., Regnier, P., Riley, W. J., Rosentreter, J. A., Segers, A., Simpson, I. J., Shi, H., Smith, S. J., Steele, L. P., Thornton, B. F., Tian, H., Tohjima, Y., Tubiello, F. N., Tsuruta, A., Viovy, N., Voulgarakis, A., Weber, T. S., van Weele, M., van der Werf, G. R., Weiss, R. F., Worthy, D., Wunch, D., Yin, Y., Yoshida, Y., Zhang, W., Zhang, Z., Zhao, Y., Zheng, B., Zhu, Q., Zhu, Q., and Zhuang, Q.: The Global Methane Budget 2000–2017, Earth Syst. Sci. Data, 12, 1561–1623, https://doi.org/10.5194/essd1215612020, 2020. a
Segers, R.: Methane production and methane consumption: a review of processes underlying wetland methane fluxes, Biogeochemistry, 41, 23–51, 1998. a
Sitch, S., Smith, B., Prentice, I. C., Arneth, A., Bondeau, A., Cramer, W., Kaplan, J. O., Levis, S., Lucht, W., Sykes, M. T., and Thonicke, K.: Evaluation of ecosystem dynamics, plant geography and terrestrial carbon cycling in the LPJ dynamic global vegetation model, Glob. Change Biol., 9, 161–185, 2003. a
Smith, B.: LPJGUESSan ecosystem modelling framework, Department of Physical Geography and Ecosystems Analysis, INES, Sölvegatan, 12, 22362, 2001. a
Smith, B., Wårlind, D., Arneth, A., Hickler, T., Leadley, P., Siltberg, J., and Zaehle, S.: Implications of incorporating N cycling and N limitations on primary production in an individualbased dynamic vegetation model, Biogeosciences, 11, 2027–2054, https://doi.org/10.5194/bg1120272014, 2014. a, b
Susiluoto, J., Raivonen, M., Backman, L., Laine, M., Makela, J., Peltola, O., Vesala, T., and Aalto, T.: Calibrating the sqHIMMELI v1.0 wetland methane emission model with hierarchical modeling and adaptive MCMC, Geosci. Model Dev., 11, 1199–1228, https://doi.org/10.5194/gmd1111992018, 2018. a, b, c, d
Tang, J., Miller, P. A., Persson, A., Olefeldt, D., Pilesjö, P., Heliasz, M., JackowiczKorczynski, M., Yang, Z., Smith, B., Callaghan, T. V., and Christensen, T. R.: Carbon budget estimation of a subarctic catchment using a dynamic ecosystem model at high spatial resolution, Biogeosciences, 12, 2791–2808, https://doi.org/10.5194/bg1227912015, 2015. a
Tarantola, A.: Inversion of travel times and seismic waveforms, Seismic tomography, 135–157,https://doi.org/10.1007/9789400938991_6, 1987. a, b, c
Wania, R., Ross, I., and Prentice, I. C.: Integrating peatlands and permafrost into a dynamic global vegetation model: 1. Evaluation and sensitivity of physical land surface processes, Global Biogeochem. Cy., 23, https://doi.org/10.1029/2008GB003412, 2009a. a, b
Wania, R., Ross, I., and Prentice, I. C.: Implementation and evaluation of a new methane model within a dynamic global vegetation model: LPJWHyMe v1.3.1, Geosci. Model Dev., 3, 565–584, https://doi.org/10.5194/gmd35652010, 2010. a, b, c, d, e, f, g, h, i, j, k, l, m
Wania, R., Melton, J. R., Hodson, E. L., Poulter, B., Ringeval, B., Spahni, R., Bohn, T., Avis, C. A., Chen, G., Eliseev, A. V., Hopcroft, P. O., Riley, W. J., Subin, Z. M., Tian, H., van Bodegom, P. M., Kleinen, T., Yu, Z. C., Singarayer, J. S., Zürcher, S., Lettenmaier, D. P., Beerling, D. J., Denisov, S. N., Prigent, C., Papa, F., and Kaplan, J. O.: Present state of global wetland extent and wetland methane modelling: methodology of a model intercomparison project (WETCHIMP), Geosci. Model Dev., 6, 617–641, https://doi.org/10.5194/gmd66172013, 2013. a
Wramneby, A., Smith, B., Zaehle, S., and Sykes, M. T.: Parameter uncertainties in the modelling of vegetation dynamics–effects on tree community structure and ecosystem functioning in European forest biomes, Ecol. Model., 216, 277–290, 2008. a
Zhang, W., Miller, P. A., Smith, B., Wania, R., Koenigk, T., and Döscher, R.: Tundra shrubification and treeline advance amplify arctic climate warming: results from an individualbased dynamic vegetation model, Environ. Res. Lett., 8, 034023, https://doi.org/10.1088/17489326/8/3/034023, 2013. a
Zhang, Z., Zimmermann, N. E., Stenke, A., Li, X., Hodson, E. L., Zhu, G., Huang, C., and Poulter, B.: Emerging role of wetland methane emissions in driving 21st century climate change, P. Natl. Acad. Sci. USA, 114, 9647–9652, 2017. a