Bayesian Inference and Predictive Performance of Soil Respiration Models in the Presence

Bayesian inference of microbial soil respiration models is often based on the assumptions that the residuals are independent (i.e. no temporal or spatial correlation), identically distributed (i.e. Gaussian noise) and with constant variance (i.e. homoscedastic). In the presence of model discrepancy, since no model is perfect, this study shows that these assumptions are generally invalid in soil respiration modeling such that residuals have high temporal correlation, an increasing variance with increasing magnitude of CO2 efflux, and non-Gaussian distribution. Relaxing these three assumptions stepwise results in eight data models. Data models are the basis of formulating likelihood functions of Bayesian inference. This study presents a systematic and comprehensive investigation of the impacts of data model selection on Bayesian inference and predictive performance. We use three mechanistic soil respiration models with different levels of model fidelity (i.e. model discrepancy) with respect to number of carbon pools and explicit representations of soil moisture controls on carbon degradation, and accordingly have different levels of model complexity with respect to the number of model parameters. The study shows data models have substantial impacts on Bayesian inference and predictive performance of the soil respiration models such that: (i) the level of complexity of the best model is generally justified by the cross-validation results for different data models; (ii) not accounting for heteroscedasticity and autocorrelation might not necessarily result in biased parameter estimates or predictions, but will definitely underestimate uncertainty; (iii) using a non-Gaussian data model improves the parameter estimates and the predictive performance; and (iv) separate accounting for autocorrelation or joint inversion of correlation and heteroscedasticity can be problematic and requires special treatment. Although the conclusions of this study are empirical, the analysis may provide insights for selecting appropriate data models for soil respiration models.

Geosci. Model Dev. Discuss., https://doi.org/10.5194/gmd-2018-272 Manuscript under review for journal Geosci. Model Dev. Discussion started: 26 November 2018 © Author(s) 2018. CC BY 4.0 License.

1 Introduction

Developing accurate soil respiration models is important for realistic projection of the global carbon [C] cycle, as global soils store 2,300 Pg carbon, an amount more than 3 times that of the [...] modeling is a vital tool for the model-data integration (Luo et al., 2011, 2014; Wieder et al., 2015). In addition, use of state-of-the-art statistical methods is necessary to accurately quantify [...]

This study evaluates the above assumptions by considering eight data models which relax these three assumptions stepwise as shown in Section 2. For example, combining the assumptions of independent, homoscedastic, and Gaussian residuals leads to the standard least squares data model. This model is the simplest one among the eight data models, since it requires only one parameter, i.e., the constant variance of the Gaussian distribution. Note that there is a difference [...] homoscedasticity, and normality) are relaxed, the resulting data models become more complex and require more parameters. This systematic way of formulating data models is similar to that of Smith et al. (2010b, 2015), and it is necessary to evaluate appropriateness of the three basic assumptions and their impacts on Bayesian inference.

The assumptions of heteroscedastic, correlated, and non-Gaussian residuals are accounted for using the method of Schoups and Vrugt (2010) in the following procedure: (i) the correlation is [...]

After investigating the impacts of the data models on Bayesian inference, this study evaluates the impacts of the data models on predictive performance of the three soil respiration models.

Using random samples generated during the Bayesian inference, a prediction ensemble is produced [...] predictive performance given by two data models (weighted least squares and skew exponential power distribution after removing heteroscedasticity and autocorrelation) are dramatically different, and a definitive conclusion was drawn that one data model is better than the other. The evaluation of predictive analysis is conducted for the following two cases: (1) the prediction ensemble is generated by random samples of the soil respiration models only (i.e. credible interval), and (2) the prediction ensemble is generated by random samples of not only the soil respiration models but also the data models (i.e. predictive interval). The two cases lead to different conclusions about the predictive performance. It is expected that the evaluation of predictive performance conducted in this study can help select the most appropriate data model to achieve optimal model predictions.
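
The distinction between the two cases can be illustrated with a minimal Python sketch. It assumes that the posterior samples have already been run through a soil respiration model to give an array of simulated effluxes, and that the data model is Gaussian with a known standard deviation for each sample and time step. The function names `prediction_intervals` and `coverage`, and the Gaussian noise assumption, are hypothetical choices for illustration and are not taken from the study.

```python
import numpy as np

def prediction_intervals(sim, sigma, level=0.95, seed=0):
    """Return (credible, predictive) interval bounds, each of shape (2, n_times).

    sim   : (n_samples, n_times) model outputs, one row per posterior sample
    sigma : (n_samples, n_times) residual standard deviations
            (hypothetical Gaussian data model, an assumption of this sketch)
    """
    rng = np.random.default_rng(seed)
    lo = 100.0 * (1.0 - level) / 2.0
    hi = 100.0 * (1.0 + level) / 2.0
    # Case (1): parametric uncertainty only, giving the credible interval.
    credible = np.percentile(sim, [lo, hi], axis=0)
    # Case (2): add residual noise drawn from the data model to every
    # simulated value, giving the predictive interval.
    noisy = sim + rng.normal(0.0, sigma)
    predictive = np.percentile(noisy, [lo, hi], axis=0)
    return credible, predictive

def coverage(data, interval):
    """Fraction of observations falling inside the interval bounds."""
    lower, upper = interval
    return float(np.mean((data >= lower) & (data <= upper)))
```

Because the predictive interval adds the residual noise on top of the parametric spread, it is wider than the credible interval whenever the residual standard deviation is non-negligible, which is one reason the two cases can rank data models differently.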

The remainder of the paper is organized as follows. Section 2 starts with a description of the [...] followed by a brief summary of the three soil respiration models. The results of Bayesian inference are discussed in Section 3 and Section 4, addressing the data model implications on parameter estimation and predictive performance, respectively. Section 5 summarizes the key findings and limitations of this study, and provides recommendations for approaching data model selection. [...]

This study considers eight evolving data models starting from a data model that assumes independent, homoscedastic, and Gaussian residuals to a data model that relaxes all the three assumptions. The eight data models are based on the generic normalized residual, [...] (3)

The two unknown parameters $\sigma_0$ and $\sigma_1$ are estimated jointly with unknown physical model [...] The unknown parameters of $\sigma_0$, $\sigma_1$, and $\phi_i$ are estimated jointly with physical model parameters. Equations (2)-(5) assume that the residuals are Gaussian.
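
As a minimal sketch of how such a Gaussian data model might be evaluated, the Python snippet below assumes a linear heteroscedasticity model $\sigma_t = \sigma_0 + \sigma_1 y_t$, with $y_t$ the simulated CO2 efflux, and a first-order autoregressive filter with coefficient $\phi_1$, in the spirit of Schoups and Vrugt (2010). The function names, the AR(1) order, and the order in which correlation and heteroscedasticity are removed are illustrative assumptions and may differ from equations (2)-(5).

```python
import numpy as np

def normalized_residuals(obs, sim, sigma0, sigma1, phi=0.0):
    """Return (a_t, sigma_t) under an assumed AR(1) + linear-variance model."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    resid = obs - sim
    # Remove lag-1 autocorrelation; the first residual is left unfiltered
    # (i.e. conditioned on a zero residual before the record starts).
    innov = resid.copy()
    innov[1:] = resid[1:] - phi * resid[:-1]
    # Assumed heteroscedasticity: standard deviation grows linearly with
    # the simulated efflux.
    sigma_t = sigma0 + sigma1 * sim
    return innov / sigma_t, sigma_t

def gaussian_log_likelihood(obs, sim, sigma0, sigma1, phi=0.0):
    """Gaussian log-likelihood of the normalized residuals a_t."""
    a_t, sigma_t = normalized_residuals(obs, sim, sigma0, sigma1, phi)
    return float(np.sum(-0.5 * np.log(2.0 * np.pi)
                        - np.log(sigma_t)
                        - 0.5 * a_t ** 2))
```

Setting `sigma1 = 0` and `phi = 0` in this sketch recovers a constant-variance, independent Gaussian model with a single error parameter, which corresponds to the simplest of the data models described above.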

The next four data models are similar to the previous four models except that the standard normal distribution of $a_t$ is replaced by the skew exponential power distribution, $\mathrm{SEP}(0, 1, \xi, \beta)$,

where zero is the mean, one is the standard deviation, $\xi$ is the skewness, and $\beta$ is the kurtosis, [...] which is the data model of SLS in equation (2). [...]

In comparison with the Gaussian data models, the SEP-based data models have two more parameters ($\xi$ and $\beta$) to be estimated jointly with physical model parameters. [...] of previous studies or expert judgment. When prior information is lacking, a common practice is to assume uniform distributions with relatively large parameter ranges so that the prior distributions do not affect the estimation of posterior distributions.

The data models above can be used to construct the likelihood functions. For the Gaussian data models given in equations (2)-(5), the corresponding Gaussian likelihood functions are straightforward, and an example is equation (7). For the SEP data models, the corresponding likelihood, called the generalized likelihood function, is (Schoups and Vrugt, 2010) [...] where n is the dimension of d. The Gaussian likelihood functions are special cases of the generalized likelihood functions. For example, by setting [...] is the volumetric soil moisture, and $\theta_s$ [-] is the porosity.

In addition to using the new rate equations, models 5C and 6C have more carbon pools. In model 5C, DOC is split into two sub-pools for the wet zone and dry zone of soil pores, and only the wet DOC is used by MIC, as shown in Figure 1. The moisture-controlled microbial uptake rate [...] [gC m$^{-3}$] is the DOC pool size in the wet soil pores. Model 6C is more complex in that ENZ is further split into two sub-pools for wet and dry pores, and both the wet and dry ENZ are subject to degradation, as shown in Figure 1. The moisture-controlled SOC degradation rate for the wet ENZ and [...] analyzer.

The parameters estimated in this study include the parameters of the soil respiration models (4C-6C) and the parameters of the data models described in Section 2. [...] (1992) R-statistic is used as the convergence diagnostic, and it approaches one in fewer than $4 \times 10^4$ samples. The initial 50% of the samples are discarded as burn-in.
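
A convergence check of this kind can be sketched in Python as follows. The sketch assumes that the R-statistic referred to above is the potential scale reduction factor commonly attributed to Gelman and Rubin (1992), computed for one parameter from several parallel chains after discarding the first half of each chain. The function name `gelman_rubin` and the array layout are hypothetical, and the sampler used in the study may compute the diagnostic differently.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for one parameter.

    chains : (m_chains, n_samples) array of sampled values
    """
    chains = np.asarray(chains, dtype=float)
    # Discard the initial 50% of each chain as burn-in.
    kept = chains[:, chains.shape[1] // 2:]
    m, n = kept.shape
    chain_means = kept.mean(axis=1)
    # W: mean within-chain variance; B: between-chain variance scaled by n.
    W = kept.var(axis=1, ddof=1).mean()
    B = n * chain_means.var(ddof=1)
    # Pooled estimate of the posterior variance.
    var_hat = (n - 1) / n * W + B / n
    return float(np.sqrt(var_hat / W))
```

Values close to one indicate that the between-chain spread is small relative to the within-chain spread. Chains that sample different regions of parameter space give values well above one.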
In summary, Figure 3 shows that it is important to examine the residuals and to determine whether a data model is adequate for characterizing the residuals. Although WSEP-AC still cannot perfectly characterize $\varepsilon_t$, it is significantly better than SLS.

Although the Gaussian assumption used in SLS is violated for model 4C (Figure 3c), this is not generally the case for other data models and physical models. This is shown in Figure 4, which presents the quantile-quantile (Q-Q) plot for the eight data models and the three soil respiration [...] This is often undesirable, if we seek to make the models more mechanistically descriptive. We focus our discussion on carbon use efficiency (CUE) for microbial growth since CUE is a [...]

Based on the last one third of the CO2 efflux observations, a cross-validation test was conducted for all 24 models, i.e., the combinations of three soil respiration models and eight data models.

Given the cross-validation data, the predictive performance is examined using the four statistical [...] Three criteria are used to evaluate the predictive performance of the soil respiration models and data models: central mean tendency, dispersion, and reliability. Each criterion is measured by a single metric. In addition, a newly defined metric is also used for simultaneously measuring the three criteria. The central mean tendency is measured in this study using the Nash-[...]

where n is the number of cross-validation data, $d_i$ is the i-th data, $\bar{d}$ is the mean of the data, and $\bar{X}_i$ is the mean of the prediction ensemble, $X_i$, for $d_i$. NSME ranges from -∞ to 1, with NSME = 1 corresponding to a perfect match between data and mean prediction, i.e., the ensemble is centered on the data. NSME = 0 indicates that the model predictions are only as accurate as the mean of the data, while an efficiency NSME < 0 indicates that the mean of the data is a better prediction than the mean prediction.
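
For illustration, the metric can be computed with the short Python sketch below. It assumes the standard Nash-Sutcliffe form, $\mathrm{NSME} = 1 - \sum_{i=1}^{n} (d_i - \bar{X}_i)^2 / \sum_{i=1}^{n} (d_i - \bar{d})^2$, which matches the symbol definitions and limiting values described above. The function name `nash_sutcliffe` and the array layout are hypothetical.

```python
import numpy as np

def nash_sutcliffe(data, ensemble):
    """Nash-Sutcliffe efficiency of the ensemble-mean prediction.

    data     : (n,) cross-validation observations d_i
    ensemble : (n_samples, n) prediction ensemble, one row per sample
    """
    data = np.asarray(data, dtype=float)
    # Ensemble mean at each observation time (X-bar_i in the text).
    mean_pred = np.asarray(ensemble, dtype=float).mean(axis=0)
    sse = np.sum((data - mean_pred) ** 2)
    sst = np.sum((data - data.mean()) ** 2)
    return float(1.0 - sse / sst)
```

With data `[1, 2, 3]`, an ensemble whose mean reproduces the data gives 1, an ensemble whose mean equals the data mean (2 at every point) gives 0, and an ensemble whose mean is `[3, 2, 1]` gives a negative value.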

The four statistics above (i.e. NSME, sharpness, coverage, and RMS) are calculated for the three soil respiration models and the eight data models. Taking data models SLS and WSEP-AC as an example, Figure 6 plots the data (for the calibration and cross-validation periods separately) along with the mean and 95% credible intervals of the prediction ensemble for the three models. [...] With respect to the overall predictive performance, the same variation pattern and exception are also observed in the RMS plots in Figures 7g and 7h. This is not surprising because RMS is the metric that can be used to measure all the three criteria (central mean tendency, sharpness, and reliability). Since the prediction ensemble is not centered on the data, the sharpness and reliability are the decisive factors for evaluating the predictive performance.

In summary, while it is necessary to account for heteroscedasticity in a data model, caution [...] Section 4.2.

We start with the visual assessment of the predictive performance. Figure 8 is similar to Figure 6 with the exception that Figure 8 considers the overall predictive uncertainty (i.e. parametric and output uncertainty), while Figure 6 considers the parametric uncertainty only.