Articles | Volume 15, issue 12
Geosci. Model Dev., 15, 5021–5043, 2022
Geosci. Model Dev., 15, 5021–5043, 2022
Model evaluation paper
01 Jul 2022
Model evaluation paper | 01 Jul 2022

Using a surrogate-assisted Bayesian framework to calibrate the runoff-generation scheme in the Energy Exascale Earth System Model (E3SM) v1

Using a surrogate-assisted Bayesian framework to calibrate the runoff-generation scheme in the Energy Exascale Earth System Model (E3SM) v1
Donghui Xu1, Gautam Bisht1, Khachik Sargsyan2, Chang Liao1, and L. Ruby Leung1 Donghui Xu et al.
  • 1Atmospheric Sciences and Global Change Division, Pacific Northwest National Laboratory, Richland, WA, USA
  • 2Chemistry, Combustion, and Materials Science Center, Sandia National Laboratories, Livermore, CA, USA

Correspondence: Donghui Xu (


Runoff is a critical component of the terrestrial water cycle, and Earth system models (ESMs) are essential tools to study its spatiotemporal variability. Runoff schemes in ESMs typically include many parameters so that model calibration is necessary to improve the accuracy of simulated runoff. However, runoff calibration at a global scale is challenging because of the high computational cost and the lack of reliable observational datasets. In this study, we calibrated 11 runoff relevant parameters in the Energy Exascale Earth System Model (E3SM) Land Model (ELM) using a surrogate-assisted Bayesian framework. First, the polynomial chaos expansion machinery with Bayesian compressed sensing is used to construct computationally inexpensive surrogate models for ELM-simulated runoff at 0.5× 0.5 for 1991–2010. The error metric between the ELM simulations and the benchmark data is selected to construct the surrogates, which facilitates efficient calibration and avoids the more conventional, but challenging, construction of high-dimensional surrogates for the ELM simulated runoff. Second, the Sobol' index sensitivity analysis is performed using the surrogate models to identify the most sensitive parameters, and our results show that, in most regions, ELM-simulated runoff is strongly sensitive to 3 of the 11 uncertain parameters. Third, a Bayesian method is used to infer the optimal values of the most sensitive parameters using an observation-based global runoff dataset as the benchmark. Our results show that model performance is significantly improved with the inferred parameter values. Although the parametric uncertainty of simulated runoff is reduced after the parameter inference, it remains comparable to the multimodel ensemble uncertainty represented by the global hydrological models in ISMIP2a. Additionally, the annual global runoff trend during the simulation period is not well constrained by the inferred parameter values, suggesting the importance of including parametric uncertainty in future runoff projections.

1 Introduction

Runoff is an essential source of freshwater, and its variability has profound socioeconomic impacts (Hall et al., 2014; Vörösmarty et al., 2000). Flooding in wet regions during peak streamflow is among the most impactful natural hazards of all weather-related events in terms of fatalities and material costs (Doocy et al., 2013). However, higher streamflow replenishes reservoirs that help provide water for agriculture and hydropower generation and transports nutrients to the floodplain. Drought is a form of hydrological extreme that can also result in immense damages to the ecosystem and agriculture (Mishra and Singh, 2010). It is associated with abnormally low runoff, especially in arid and semi-arid regions. Therefore, understanding the spatial and temporal patterns of runoff is crucial for flood control, water management, crop yield, ecosystem services, etc. The runoff variability has been impacted by human-induced land use and climate change (Milly et al., 2008; Fischer and Knutti, 2016; Bosmans et al., 2017; Dai, 2013; Xu et al., 2021a), and the changes are projected to be more significant towards the end of this century (Xu et al., 2021a).

The spatial and temporal patterns of runoff and its response to climate change for water security assessments and water management are commonly studied using Earth system models (ESMs; Milly et al., 2002; Hirabayashi et al., 2013; Schewe et al., 2014). Current generation ESMs have large uncertainty in the simulation of runoff and its changes under future scenarios. However, statistical methods have been applied recently to reduce uncertainty in model predictions (Yang et al., 2017; Gosling and Arnell, 2011; Lehner et al., 2019; Xu et al., 2021a). Uncertainties in ESM simulations of runoff stem from uncertain model inputs, model structural uncertainty, and parametric uncertainty (Sun et al., 2013; Giuntoli et al., 2018). Input uncertainties consist of uncertainties in atmospheric forcing and land surface cover data that can be reduced by improving observation quality as more data become available. Model structural uncertainty is due to knowledge gaps or simplifications of the physical processes of the Earth system. Specifically, the typical coarse resolution ( 100 km) of ESMs cannot capture a few of the key physical factors that control runoff-generation processes such as terrain and soil variations. Downscaling methods have been developed to reduce model bias when projecting the changes in hydrological variables from the coarse resolution ESM simulation to a fine resolution (Tebaldi et al., 2005; Knutti et al., 2010; Xu et al., 2019). Recent development in the Energy Exascale Earth System Model (E3SM) has introduced a subgrid topography-based downscaling of precipitation (Tesfa et al., 2020) to understand the role of topography in hydrological processes. Over the past few decades, the land component of ESMs has continuously been improved by developing new representations of physical processes, such as implementing variable soil thickness (Brunke et al., 2016), solving the variably saturated flow in groundwater dynamics (Bisht et al., 2018), including land–river interactions (Decharme et al., 2019; Xu et al., 2021b), representing lateral subsurface flow (Swenson et al., 2019), and increasing spatial resolution (Haarsma et al., 2016). While these advances improve our understanding of the Earth system, they may not lead to reduced uncertainties in future projections (Knutti and Sedláèek, 2012; Lehner et al., 2020). This is because parametric uncertainty may increase as new processes are included in the model. The uncertainty in ESM simulated runoff must be reduced before reliable conclusions can be drawn regarding ESM projections of future changes in the runoff characteristics.

The parametric uncertainties in simulated runoff can be reduced by model calibration (Gupta et al., 1998). Previous studies have shown that it is possible to constrain the uncertainty of runoff by calibrating the relevant model parameters at a regional scale (Ray et al., 2015; Sun et al., 2013; Sheng et al., 2017; Xie et al., 2007; Troy et al., 2008; Hou et al., 2012; Huang et al., 2013). Hou et al. (2012) and Huang et al. (2013) identified the most sensitive hydrologic parameters of the Community Land Model (CLM) for simulating runoff and surface energy fluxes at a few selected watersheds and flux tower sites in the USA. They found that reducing the dimensionality of uncertain parameters using sensitivity analysis speeds up the calibration processes (Huang et al., 2013). Consequently, Sun et al. (2013) successfully applied a Bayesian inversion approach to estimate the optimal parameters to improve the performance of runoff generation in CLM. Troy et al. (2008) proposed an efficient framework to calibrate the Variable Infiltration Capacity (VIC) model for the contiguous USA by interpolating the calibrated parameters from small gauged basins. While previous studies performed comprehensive model calibration of runoff at regional scales, it remains challenging to calibrate the land components of ESM at global scales due to (1) the lack of runoff observations and (2) the high computational cost of running a large ensemble of global land model simulations. For (1), it is common to validate land models with streamflow (i.e., flow rate accumulated from runoff within a drainage area) observation (Li et al., 2015; Krysanova et al., 2020; Beck et al., 2017; Zhang et al., 2016), as runoff is not directly measured. However, routing the runoff to simulate streamflow at a coarse resolution introduces additional uncertainties due to the representation of the stream network (Wu et al., 2011; Liao et al., 2022) and river channel geometry (Andreadis et al., 2013). A recent observation-based global runoff dataset (GRUN; Ghiggi et al., 2019a) provides a good benchmark for calibrating runoff-generation-related parameters without the need of coupling the land model with a river-routing model. For (2), tens of thousands of simulations are typically needed for parameter calibration when the parameter dimension is high, but it is not computationally feasible to run a large ensemble of ESM simulations at the global scale.

The computational cost of model calibration can be significantly reduced by using an Uncertainty Quantification (UQ) framework that develops surrogate models of complex physical models. UQ frameworks include the following steps: (1) construction of a surrogate model that can mimic the behavior of a physical model, (2) identification of sensitive parameters to reduce the dimensionality of uncertain parameters, and (3) use of the parameter inference process to constrain the parametric uncertainty by comparing surrogate model prediction against observation. The surrogate modeling approach has received wide attention in hydrological applications (Razavi et al., 2012; Ivanov et al., 2021; Wang et al., 2014) to calibrate large-scale land models in terms of different hydrological processes (Gong et al., 2015; Lu et al., 2018; Müller et al., 2015; Huang et al., 2016; Ray et al., 2015; Sargsyan et al., 2014; Ricciuto et al., 2018). Multiple methods falling into the class of surrogate models include Gaussian process models, artificial neural networks, support vector machines, and polynomial chaos expansions (PCEs). In this study, we rely on PCEs as convenient machinery for uncertain input parameter representation and surrogate construction. The PCE surrogate captures the complex, nonlinear behavior of the physical model through a learned polynomial expansion. This method also provides convenient global sensitivity analysis (Dwelle et al., 2019). Furthermore, we employ Bayesian compressive sensing (BCS) to arrive at sparse PCEs that include only polynomial terms relevant to the model, thus facilitating PCE surrogate construction in the presence of a large number of uncertain inputs and a relatively small number of model simulations (Sargsyan et al., 2014). Once the surrogate model is constructed, it replaces the expensive physical model in simulation-intensive studies such as global sensitivity analysis and parameter inference.

The objective of this work is to use the UQ framework to improve the performance of runoff generation at a monthly scale and quantify the associated parametric uncertainty in the E3SM Land Model version 1 (ELM v1; E3SM; Golaz et al., 2019). This study is organized in the following manner. We briefly describe the runoff-generation process in ELM v1, the UQ framework, and the data used in this study in Sects. 2, 3, and 4, respectively. In Sect. 5, we first present the validation of the surrogate models, sensitivity of simulated runoff to the uncertain parameters, dimensional reduction of uncertain parameters, and estimation of optimal parameters. Then we evaluate the performance of ELM-simulated runoff with the optimal parameters, the runoff sensitivity to precipitation, and the changes due to the use of optimal parameters on ELM-simulated water- and energy-related variables against various benchmarks using the International Land Model Benchmarking (ILAMB) package (Collier et al., 2018). Last, we present the simulated runoff uncertainty associated with parameters and theirs impacts on runoff trends at global scale. Section 6 discusses the limitations of this work, followed by the conclusions in Sect. 7.

2 E3SM Land Model

2.1 Runoff-generation scheme in ELM v1

The ELM v1(hereafter, v1 is omitted) was developed based on the Community Land Model 4.5 (CLM4.5; Oleson et al., 2013) to understand the water availability and water cycle extremes (Leung et al., 2020). The new physical processes added in ELM to better represent the terrestrial water cycle include a variably saturated flow model (Bisht et al., 2018), a soil erosion model (Tan et al., 2020), dynamic roots (Drewniak, 2019), and a two-way coupled irrigation scheme (Zhou et al., 2020). The runoff generation in ELM is based on the simple TOPMODEL-based runoff parameterization (SIMTOP; Niu et al., 2005) in which the total runoff (Rtotal) consists of the following three components: surface runoff (Rover; e.g., saturation excess runoff), surface water runoff (Rh2osfc; e.g., surface water drainage from depressions/wetlands), and subsurface runoff (Rdrai), as seen in the following:

(1) R total = R over + R h 2 osfc + R drai .

A fraction of the flux of water reaching the soil surface (qliq) generates surface runoff, and the fraction is determined by the saturation fraction (fsat) of the grid cell as follows:


where fmax represents the maximum saturation fraction for a given grid cell that is calculated with high-resolution compound topographic indices, fover is a decay factor, and z is the water table depth.

ELM includes surface water storage to represent inland/wetland surface water dynamics (Ekici et al., 2019). When the surface water storage is fully filled, surface water runoff is generated as follows:

(4) R h 2 osfc = k h 2 osfc f connected W sfc - W c 1 Δ t ,

where kh2osfc represents the linear storage coefficient, fconnected is the interconnected fraction of the inundated areas, Wsfc is the mass of surface water, Wc is the mass of surface water when the storage is full, and Δt is the model time step. Wsfc is formulated as follows:

(5) W sfc = d 2 1 + erf d σ micro 2 + σ micro 2 π e - d 2 2 σ micro 2 ,

where erf represents the error function, d is the height of the surface water relative to the cell averaged elevation, and σmicro is the standard deviation of the microtopographic distribution that characterizes subgrid elevation variation. Given the surface water height from the previous equation, the surface water fraction (fh2osfc) of a cell is estimated with the following:

(6) f h 2 osfc = 1 2 1 + erf d σ micro 2 .

The inundation areas are assumed to be randomly distributed within the grid cell, and the interconnected fraction of the inundated areas can be estimated based on percolation theory, as follows:

(7) f connected = f h 2 osfc - f c μ if f h 2 osfc > f c 0 , if f h 2 osfc f c ,

where fc is the threshold below which the inundated areas are not connected, and μ is a scaling exponent. The default parameter values in ELM of fc and μ are 0.4 and 0.14 for all the global cells, respectively.

The subsurface runoff is parameterized as an exponential function of water table depth and includes an ice impedance factor (Θice) to account for the reduction in the bottom drainage when ice is present in the soil (Swenson et al., 2012), as follows:


where qdrai,max is the maximum drainage rate, fdrai is the decay factor, θiceθsat represents the ice-filled fraction of the pore space for the soil under the water table, and Ω is an adjustable parameter.

We follow the work of Huang et al. (2013) in selecting uncertain parameters and their corresponding ranges (Table 1). There are three additional parameters included in this study for surface water storage drainage and impacts of ice to subsurface runoff and soil water dynamics, which represent new features in ELM compared to CLM4.0 used in Huang et al. (2013). All the parameter prior distributions are assumed to be a uniform distribution.

Table 1Uncertain parameters' information. Note: DEM is for digital elevation model.

Download Print Version | Download XLSX

2.2 Model configuration

We ran ELM globally at a spatial resolution of 0.5× 0.5, driven by the Global Soil Wetness Project forcing dataset (GSWP3v1) from 1991 to 2010, featuring 3 h, 0.5× 0.5 global atmosphere forcing. GSWP3v1 has been dynamically downscaled and bias corrected based on the reanalysis data of Compo et al. (2011). The default configuration of ELM was used with a 30 min time step. With the default configuration, the hydrologic representations of ELM are the same as those in CLM4.5, as new model features such as the variably saturated flow model and subgrid topography are not included. Except the uncertain parameters listed in Table 1, the default values of all other ELM parameters were used in this study.

3 Uncertainty Quantification framework

A detailed derivation of the PCE-based Uncertainty Quantification framework and BCS method used in this work is presented in Sargsyan et al. (2014) and Debusschere et al. (2016). In this study, we used the Uncertainty Quantification Toolkit (UQTk; Debusschere et al., 2004, 2016) that includes implementations of PCE construction with BCS and subsequent global sensitivity analysis. Only a brief description of the construction of the PCE-based surrogate for the ELM simulations is summarized below.

3.1 Polynomial chaos expansion

Let M denote a physical model (e.g., ELM) with uncertain parameters X, where X=[X1,X2,,XD], and D represents the total number of uncertain parameters. In this study, the uncertain parameters X are listed in Table 1 and D is 11. A scalar quantity of interest (QoI), y^ (e.g., runoff at a specified time from a specified location), obtained using a sample of random parameters, x, can be expressed as a polynomial expansion, as follows:

(10) y ^ = M x = α c α Ψ α ( x ) ,

where Ψα is a polynomial, and cα is the corresponding coefficient. In practice, x is scaled to [−1, 1] from the original uncertainty input range. The polynomial expansion in Eq. (10) is written, with respect to multivariate orthogonal polynomials, as follows:

(11) Ψ α ( x ) = i = 1 D Ψ α i x i ,

where Ψαixi is a univariate polynomial, whose form is associated with the prior distribution of uncertain input variable Xi (e.g., Legendre polynomials are used when the input variable follows a uniform distribution), and αi is a member of the multi-index α=α1,α2,,αD, which represents the degrees of the univariate polynomial terms. Readers should refer to Dwelle et al. (2019) for details about the selection of polynomial terms and an illustration of how the multi-index is used to construct a PCE-based surrogate. In practice, Eq. (11) is approximated with a truncated PCE by only selecting terms with a total degree of polynomials smaller than a certain value p (Xiu and Karniadakis, 2002; Lin and Karniadakis, 2009). This leads to a finite set Ap=(α:i=1Dαip) for the multi-index α to take the following:

(12) M x M PC x = α A p c α Ψ α x = j = 0 P c j Ψ j x ,

where j represents the counter-index of any possible multi-index α in Ap in a predefined order (see details in Appendix B of Dwelle et al., 2019). The coefficients (cj) for the P+1 polynomial bases are computed using training simulations of M(x) (e.g., ELM) to construct the truncated PCE approximation in Eq. (12). The number of the polynomial basis is determined by both the input dimension D and the total degree for truncation p (Xiu and Karniadakis, 2002) is as follows:

(13) P + 1 = D + p ! D ! p ! .

The value P increases rapidly as the number of uncertainty input variables increases. For example, 11 uncertain parameters (e.g., D=11) with a truncated PCE order of p=4 lead to 1365 coefficients to solve in Eq. (12). It is computationally prohibitive to run 1365 global ELM simulations, so we adopted the BCS method of Sargsyan et al. (2014) that requires a much smaller number of ELM simulations to construct a PCE-based surrogate. The BCS method computes only a sparse set of cj to construct the surrogate of a form given by Eq. (12) because not all Ψj(x) are relevant for the given QoI (Sargsyan et al., 2014).

3.2 Global sensitivity analysis

In this study, we performed variance-based, global sensitivity analysis using Sobol' indices (Sobol', 2001). For a PCE-based surrogate model, the main Sobol' index, Si, for the uncertain parameter Xi can be estimated as follows:

(14) S i = j Π i c j 2 | | Ψ j | | 2 j = 0 P c j 2 | | Ψ j | | 2 ,

where Πi denotes all the indices of polynomial basis terms in Eq. (12) that only involve parameter Xi, and ||Ψj|| is the norm of the polynomial Ψj(x). The main Sobol' index Si can be interpreted as the fraction of variance in the output that is associated with the uncertainty model parameter Xi only when other parameters are fixed at constant values. Similarly, one can estimate the Sobol' index for any pair of parameters Xi and Xi to represent parameter interaction sensitivity with the coefficients cj (Sargsyan et al., 2014).

3.3 Parameter inference

Parameter inference is used to determine a set of model parameters that reduces the error between observation and model prediction. The model inverse problem can be solved with the Bayes' theorem, as follows:

(15) p X | y = L y | X p ( X ) p ( y ) ,

where p(X|y) is the posterior distribution of parameter X given observation y, L(y|X) is the likelihood function, p(X) represents the prior distribution of X, and p(y) is merely a normalizing constant for the purposes of parameter calibration. The discrepancy between the model and observations, ϵ=y-MX, should be included in the likelihood function. It is common to assume the error term (e.g., ϵ) follows a Gaussian distribution with a vanishing mean:

(16) ϵ i N 0 , σ 2 , i = 1 , 2 , , N ,

where N is the number of observations used to infer the parameters (e.g., time series of monthly runoff), and the standard deviation, σ, can be inferred from the data (see Sect. 3.4). Then, the likelihood function can be written as follows:

(17) L y | X = i = 1 N 1 2 π σ 2 exp - y i - M i X 2 2 σ 2 .

The logarithm of Eq. (17) leads to the least squares objective function that is used for deterministic parameter estimation in practice (Sargsyan et al., 2015), as follows:

(18) log L y | X = - i = 1 N y i - M i X 2 2 σ 2 - N 2 log ( 2 π σ 2 ) .

The posterior distribution in Eq. (18) is difficult to compute in practice; hence, we estimate it through samples obtained by the Markov Chain Monte Carlo (MCMC) method. Specifically, 1000 iterations are used as the burn-in period in this study, and the sampling of the posterior distribution is saved every 10 iterations. We run MCMC for 10 000 steps, resulting in 1800 samples to construct the posterior distribution. We have employed adaptive MCMC method of Haario et al. (2001), in which the parameter space is searched according to proposal steps with a covariance that is updated spontaneously.

3.4 Quantity of interest

In this study, the physical model M and the QoI y^ correspond to ELM and runoff, respectively. The development of a surrogate model for the simulated runoff for each grid cell for each month of a 20-year simulation would require 240 (= 12 months × 20 years) PCE-based surrogates. Although developing a PCE-based surrogate is not expensive, it is computationally expensive to train 240 PCEs for each of the 70 302 grid cells in the global domain. The parameter inference process for 240 PCEs for each grid cell will further increase the computational cost. We reduce the number of QoIs by training the surrogate model for the root mean square error (RMSE) between the simulated runoff and observations instead of training the surrogate model to predict monthly runoff. The RMSE is given as follows:

(19) RMSE = 1 N i = 1 N R i sim - R i obs 2 ,

where Risim and Riobs represent the grid-level simulated total runoff and observed total runoff, respectively, for ith month in the simulated period, and N represents the number of simulation months. Consequently, only one surrogate model is needed for each grid cell to quantify the performance of ELM in capturing the monthly runoff variation for a given uncertain parameter set. The selection of RMSE as QoI in constructing surrogate models significantly reduces the computational burden of the surrogates' construction and parameter inference. We performed ELM simulations using 200 parameter sets that were randomly sampled from the range specified in Table 1. A set of 175 ELM simulations were used for training the surrogate models, and the other 25 simulations were used for validating their performances. The performance of the PCE-based surrogate model can be affected by the truncated order (Dwelle et al., 2019). For each grid cell, we train the surrogate with p=1,2,,7 separately and picked the order that minimizes the relative norm 2 error (RE) of validation simulations as follows:

(20) RE = | RMSE val PC - RMSE val M | 2 | RMSE val M | 2 ,

where RMSEvalPC and RMSEvalM represent the PCE-simulated and ELM-simulated vector of RMSE of the 25 validation simulations, respectively. Then, the trained surrogate models, RMSEPC, can be plugged into the likelihood function of Eq. (18) seamlessly, as follows:

(21) log L y | X = - N 0 - RMSE PC 2 2 σ 2 - N 2 log ( 2 π σ 2 ) .

The standard deviation of error between model simulated runoff and observation exhibits a significant monthly variation. To provide a reasonable value of σ, we further assume σ in Eq. (21) has a different meaning than that in Eq. (18) by taking RMSE as model simulation, and 0 as the target. Therefore, σ is approximated as the standard deviation of the difference between 0 and RMSEs, where each RMSE was calculated using simulated runoff and observation for a given training simulation. Our estimation of σ leads to a reasonable posterior (see Sect. 5.4), though other methods can also be used to estimate σ. We acknowledge that the value of σ may have an impact on the parameter posteriors, but investigating the sensitivity of σ on the posteriors is beyond the scope of this study.

3.5 Calibration procedure

In summary, the following procedures were implemented to determine the optimal parameter values and their joint probability distribution:

  1. Run ELM with 200 parameter sets randomly sampled with the range specified in Table 1.

  2. Construct PCE-based surrogate models to mimic the RMSE between the ELM and GRUN runoff dataset with 175 simulations and validate the performance of the surrogate models with the other 25 simulations.

  3. Implement sensitivity analysis with the surrogate models to reduce parameter dimensionality for calibration by ignoring the parameters with negligible Sobol' index (e.g., less than 0.05).

  4. Estimate the Bayesian posterior of the most sensitive parameters for each grid through MCMC process with the runoff dataset of Ghiggi et al. (2019a).

It has been shown that small surrogate error can result in significant deviation of the inferred parameter (Laloy and Jacques, 2019). To further search the optimal parameters and construct the runoff posterior uncertainty, we ran ELM simulations with additional 100 samples from the posteriors of the three most sensitive parameters for all global grid cells, and default values were used for less sensitive parameters.

The parameters with the minimum RMSE between simulations and reference runoff data from the 100 ELM simulations were determined as the optimal parameter value for each grid cell.

4 Data

4.1 Observation-based runoff data

The 0.5× 0.5 observation-based global runoff dataset (GRUN) dataset of Ghiggi et al. (2019a) was used in this study as the observation within the calibration framework for parameter inference. The GRUN dataset was generated from a trained random forest (RF) model (Breiman, 2001) that used precipitation and near-surface temperature to predict monthly runoff. The training runoff data were derived from the Global Streamflow Indices and Metadata Archive (GSIM; Gudmundsson et al., 2018; Do et al., 2018), and only the gauges with contributing area comparable to cell area of 0.5× 0.5 were used. GSWP3 atmospheric forcing was used for training and reconstruction of the monthly global runoff.

4.2 Model benchmarks

The ILAMB package (Collier et al., 2018) was used to evaluate the simulated water and energy cycles from the calibrated ELM against various benchmarks. Specifically, a gridded energy flux data (FLUXCOM; Jung et al., 2019) that was generated by machine learning with flux tower measurements was used to evaluate latent and sensible heat fluxes, the Global Land Evaporation Amsterdam Model version 3 (GLEAMv3; Martens et al., 2017) product was used to evaluate global evapotranspiration (ET), and Gravity Recovery And Climate Experiment (GRACE; Kim et al., 2009) data were used to evaluate a terrestrial water storage anomaly (TWSA). Details about ILAMB can be found at (last access: 23 June 2021).

The Inter-Sectoral Impact Model Intercomparison Project (ISIMIP) archived simulations from multiple global hydrological models and land surface model forced by the same atmosphere forcings (Warszawski et al., 2014). We used 13 available models from the second-phase water sector (ISIMIP2a; Gosling et al., 2019) to provide a benchmark for the uncertainty of annual runoff magnitude and trend. Only the models in ISIMPI2a that were driven by the GSWP3 forcing without accounting for human activity impacts were selected here to be consistent with ELM's configuration.

4.3 Evaluation metrics

There were two metrics used to evaluate ELM's performance of simulating runoff at a monthly scale with calibrated parameters, including the Nash–Sutcliffe efficiency (NSE; Nash and Sutcliffe, 1970) and the Kling–Gupta efficiency (KGE; Gupta et al., 2009) which are computed as follows:


where Risim and Riobs represent cell-level simulated total runoff and observed total runoff, respectively, for the ith month, μobs is the corresponding averaged observation, ρ is the correlation coefficient between simulation and observation, σsim is the standard deviation in simulations, σobs is the standard deviation in observations, and μsim is the simulation mean. Both NSE and KGE vary from −∞ to 1, and a perfect model performance is indicated by NSE = 1 and KGE = 1. NSE < 0 and KGE <−0.41 mean the simulations are worse estimates than the mean of observations, indicating a bad model performance (Knoben et al., 2019).

The sensitivity of runoff to precipitation is a critical aspect for runoff simulation evaluation, considering changes in precipitation will continue in the future (Trenberth, 2011). Therefore, we evaluated the sensitivity of runoff to the precipitation anomalies with the calibrated parameters. The sensitivity was quantified by the slope of linear regression (β) between runoff anomalies (ΔR) and precipitation anomalies (ΔP) as follows:

(24) Δ R = β Δ P + ϵ .

The interception ϵ ≈0, implies the mean runoff is related to mean precipitation.

We also evaluated the impacts of parameters on the runoff trend. Specifically, the magnitude of runoff trend was calculated with Sen's slope (Sen, 1968), which is nonparametric and not sensitive to the outliers. Then, Mann–Kendall test was used to determine if the trend is significant or not at confidence level α=0.05.

Figure 1Validation of surrogate performance for an example grid cell from an arid region. Panel (a) shows runoff seasonality from all 200 simulations with samples from parameter priors. Panel (b) shows the validation of the surrogate model trained with original ranges of fdrai given in Table 1 in the main text. Panel (c) shows the validation of the surrogate model trained with constrained fdrai.


5 Results

5.1 Refinement of fdrai for arid regions

The proposed prior for fdrai is not suitable for all the climate regions, such as the simulations with the full the range of fdrai defined in Table 1 results in unrealistic runoff for arid regions. For example, the simulated runoff from an example grid cell with fdrai<0.4 shows higher magnitudes and lower variabilities compared to simulations with fdrai≥0.4 (Fig. 1a). Lower fdrai can lead to unrealistically high subsurface runoff according to the exponential function of baseflow drainage (Eq. 8) for the arid regions, where the precipitation is not enough to maintain the water table at a reasonable level. Such simulations with fdrai<0.4 result in high nonlinearity in the simulated runoff, and hence, the PCE-based surrogate model cannot capture the model behaviors (Fig. 1b). The performance of surrogate models is improved by constraining the lower bound of fdrai to 0.4 (Fig. 1c). Therefore, fdrai is refined as [0.4, 5] for areas that are identified as arid climate in the Köppen climate classification (Fig. S1 in the Supplement), and [0.1, 5] is used in other regions.

5.2 Validation of surrogate models

The PCE-based surrogate models can mimic the variations in RMSE between ELM-simulated runoff and the GRUN runoff with the truncated order determined in Fig. S2. Specifically, the surrogate models exhibit good performance for the validation simulations with RE<0.1 for 70 % of the global domain (Fig. 2a). The global averaged RE of surrogate models for the validation simulations is around 0.1, with the largest error over the arid regions (Fig. 2b). While 41 % of the arid region shows an acceptable performance in the surrogate models when narrowing the range of fdrai with RE less than 0.15, the RE of other arid areas remain high (Fig. 2a). Additional simulations were performed to investigate if the lower performance of surrogate models for arid regions is due to insufficient number of training simulations. We randomly selected 20 grid cells from the arid region and ran 2000 ELM simulations with random samples from the parameter priors, as summarized in Table 1. The RE of surrogate models for the 20 grid cells remained large (e.g., RE > 0.2), even as the number of training simulations were increased (Fig. S3). Thus, the lower performance of surrogate models over the arid regions is not dependent on the number of training simulations.

Figure 2Relative norm 2 error of the surrogate models for the validation simulations. Panel (a) shows the spatial distribution of the errors, and panel (b) shows the average errors for the grid cells in each climate defined by Köppen climate classification.

Figure 3Plot of relative norm 2 error (RE) of surrogate models for the validation simulations vs. averaged annual runoff magnitude with all the grid cells from the arid region.


Most surrogate models with large RE are in extremely dry arid regions; for example, RE > 0.2 are mainly from grid cells with annual runoff <0.05 mm d−1 (Fig. 3). The RE of surrogate models tends to decrease for areas with relatively higher annual runoff that are still from arid regions (annual runoff < 0.5 mm d−1 in Fig. 3). However, the runoff uncertainty in extremely dry areas will have negligible impact on the global water cycle. Surrogate model with RE > 0.15 is considered as not sufficiently accurate, and such grid cells are excluded in the sensitivity analysis presented next.

Figure 4Spatial distribution of the main Sobol' index for the sensitivity parameters.

Figure 5Averaged main Sobol' index and joint Sobol' index for different climates defined by the Köppen climate classification. Only the cells with the relative norm 2 errors of PCE-based surrogate models for validating simulations of less than 0.15 are used for estimating the averaged sensitivity for each climate region. The size of the circles and thickness of the lines are proportional to main Sobol' index and joint Sobol' index, respectively. The legend in the right bottom subplot shows the Sobol' index for the corresponding size of circle and thickness of line.


5.3 Global sensitivity analysis

The most significant ELM parameters identified for runoff generation are fover, fdrai, ψs, fc, and Ω, based on the spatial distribution of the main Sobol' indices (Fig. 4), while the other six parameters have negligible contributions to the runoff variations (Fig. S4). In equatorial regions, fdrai and fover are equally sensitive and account for 39 % and 36 % of the average runoff variations, respectively, as indicated by the size of circles in Fig. 5a, while ψs is the secondary sensitive parameter. For the arid regions, fdrai is the most sensitive parameter, and fover, fc, Ks, and ψs are secondary sensitive parameters with a similar value for the main Sobol' indices (Fig. 5b). Although other parameters show negligible main Sobol' indices for arid regions, they have shown sensitivities when interacting with each other, as denoted by the thickness of the lines between each pair of parameters in Fig. 5b. The complex joint sensitivity results in high nonlinearity in the runoff variations, representing a possible reason for the poor performance of PCE for arid regions. The most significant uncertain parameters for the warm temperate region are the same as those for the equatorial region (Fig. 5c). Snow and polar climates have similar sensitivity pattern, with fc and Ω being the two most important uncertain parameters (Fig. 5d and e). In colder regions, the contribution of surface water storage drainage, which is controlled by fc, is large to the total runoff because of prominent surface water areas (Pekel et al., 2016). The hydraulic conductivity and groundwater drainage when ice is present in the soil is controlled by Ω, which has a significant impact on runoff-generation process when the soil is partial or fully frozen. The surface water storage and ice impedance factor, which were not included in the version of the model used in previous study (Hou et al., 2012; Huang et al., 2013), are found to be the most sensitive parameters in cold regions. Besides the arid region, other regions show smaller sensitivities to parameter interactions.

5.4 Parameter dimensionality reduction

The ELM simulated runoff is significantly sensitive to three or fewer parameters with Sobol' index >0.05 for 81.3 % of the total grid cells (Fig. S5). Therefore, we sampled only the three most sensitive parameters in each grid cell in the Markov chain Monte Carlo (MCMC) process to perform parameter inference, as mentioned in Sect. 3.3. The posteriors of the three calibrated parameters (fdrai, fover, ψs) at an example grid cell (56.75 W, 11.25 S) are much more constrained than the priors after the MCMC simulation with the surrogate model (Fig. 6a, b, and c). The third parameter, ψs, has a relatively wider posterior than the first two parameters because its sensitivity is much smaller (e.g., Sobol' index = 0.08). The Gelman–Rubin R statistic of Gelman and Rubin (1992) computed with five MCMC chains (after the burn-in period) is 1.002, 1.004, and 1.003 for fdrai, fover, and ψs, respectively, suggesting that our MCMC simulation has converged (see the convergence curve in Fig. S6). ELM simulations with a large number of samples from parameter priors are needed to identify the optimal parameter that minimizes RMSE; for example, 10 000 surrogate simulations are used to find the parameters that yield RMSE=1 (Fig. 6d). In contrast, due to the reduced parameter dimensionality and narrowed range, much fewer samples (e.g., 100) are needed to find the better parameter values (e.g., corresponding to RMSE<1) when they are sampled from the parameter posteriors (Fig. 6d). The spatial distribution of the parameter values at 5 % and 95 % of the posteriors is shown in Figs. S7 and S8, respectively.

Figure 6Posteriors of (a) fdrai, (b) fover, and (c) ψs from parameter inference process at an example grid cell. Panel (d) shows the probability density function (PDF) of RMSE evaluated with surrogate models forced by 100 samples from parameter posteriors and 10 000 samples from parameter priors.


5.5 Optimal parameter values

The procedure described in Sect. 3.5 is used to find the optimal parameter values for the three most sensitive parameters for each grid cell. For the grid cells with RE > 0.15 for surrogate models, the optimal parameter value is determined from the training and validation simulations (e.g., 200 simulations with random parameter values from priors) that yield minimum RMSE. The optimal parameter values show clear regional patterns (Fig. 7). Specifically, the optimal fover tends to be lower than the default value for the equatorial and partial snow areas (Fig. 7a). The optimal fover is found to be higher than the default value for the arid areas, while it is around the default value on average for the warm temperate areas (Fig. 7a). For the same water table depth, lower fover leads to higher saturation fraction (Eq. 3), that in turn leads to larger surface runoff (Eq. 2). The calibrated fdrai is lower than the default value for both equatorial and arid regions (Fig. 7b). The optimal fdrai for warm temperate areas show different patterns, with higher values over eastern USA and Europe but lower values over southeastern China. The generation of subsurface runoff depends on fdrai (Eq. 8), with lower fdrai leading to larger subsurface runoff. ψs affects the runoff generation through its impact on soil water movement, such as the soil water flux being larger at saturation with higher ψs. As shown in Fig. 7c, higher ψs are needed to minimize the RMSE for all regions that show sensitivity to this parameter, except for some grid cells from polar area. Over the high latitudes of Northern Hemisphere, higher fc and lower Ω are found in the optimal parameters (Fig. 7d, e). The surface water storage can store more water at higher fc by reducing surface water runoff (Eqs. 4, 7), thus leading to a lower and delay peak runoff than the default values. Furthermore, the lower Ω values have fewer impacts of ice on hydraulic conductivity (Eq. 7.89 in Olson et al., 2016) and drainage (Eq. 9), leading to higher runoff for the winter seasons.

Figure 7Optimal values for the sensitive parameters. The default values for the parameters are defined at the midpoint of the color map. There are no certainty bounds for ψs from different grid cells because it is determined by the soil properties. Therefore, the values of ψs are scaled to [−1, 1] in panel (c) for each grid cell with the corresponding upper bound (ψs,max) and lower bound (ψs,min): 2ψs,max-ψs,minψs-ψs,max+ψs,minψs,max-ψs,min.

5.6 Evaluation of ELM with the optimal parameters

The ELM-simulated runoff with the optimal parameter values shows improved skills of capturing the spatiotemporal variation in monthly runoff at a global scale with higher NSE and KGE compared to the simulation with default parameter values (Fig. 8). Specifically, the median of NSE and KGE from all global grid cells increases from −0.88 and −0.05 to 0.06 and 0.31, respectively. Over the western USA coast, southeast and midwest of USA, western Europe, and equatorial areas, the performance of the calibrated ELM is better with NSE > 0.5 and KGE > 0.7. While the performance of other areas (e.g., western USA, Sahara and Arabian desert, central and eastern Asia, and partial high latitude regions) is improved compared to simulations with the default parameter values, the NSE and KGE still have negative values. The higher model errors in those regions cannot be resolved by calibration as (1) the simulation resolution is too coarse to resolve the topographic impacts (Chegwidden et al., 2020), (2) the snow-melting processes are not calibrated in this study, and the onset of snowmelt in ELM is poorly represented (Toure et al., 2018), and (3) the hydrology of arid areas is not well understood (Pilgrim et al., 1988). Except for the calibration period, ELM with the optimal parameters also shows an improved performance in runoff simulation for another period (2011–2013; Fig. S9).

Figure 8Evaluation of simulated monthly runoff for 1991–2010 at grid level with default and optimal parameters. Panels (a) and (b) show the NSE metrics between the GRUN runoff and simulated runoff with default and optimal parameters, respectively. Panel (c) shows the comparison of the probability density function (PDF) of NSE metrics from all the global grid cells. Panels (d), (e), and (f) illustrate the evolution with the KGE metric.

Figure 9Sensitivity of runoff to precipitation (β) estimated from the (a) GRUN runoff dataset, (b) ELM simulation with default parameter, and (c) ELM simulation with optimal parameter. The inserts show the scatterplots with density for cell-to-cell comparison of β between GRUN and ELM simulations.

Table 2ILAMB benchmark scores for latent heat flux, sensible heat flux, evapotranspiration, and terrestrial water storage anomaly with default and optimized parameters in ELM. A description of each score metric can be found at (last access: 23 June 2021).

Download Print Version | Download XLSX

Compared to the reference runoff (Fig. 9a), the ELM simulation with default parameter values tends to overestimate the sensitivity of runoff to precipitation (β in Eq. 24) for the equatorial and arid regions but underestimates β in the warm temperate regions, such as the eastern USA, China, and eastern coast of Australia (Fig. 9b). The simulation with optimal parameter values is able to more accurately estimate β than the simulation with default parameter values with improved spatial correlation coefficient from 0.22 to 0.56 and lower RMSE from 1.22 to 0.65 (Fig. 9c). However, some significant discrepancy of β still exists in the simulation with optimal parameter values (e.g., eastern China), implying that the sensitivity is not well constrained in ELM for certain regions even after model calibration.

According to the evaluation with the ILAMB package, ELM shows a similar performance in simulating other variables (e.g., latent heat flux, sensible heat flux, ET, and TWSA) with optimal parameter values compared to the use of the default parameter values (Table 2). However, both the default and optimal simulations fail to capture the spatial variation in TWSA with a spatial distribution score of less than 0.05. This is because the coarse resolution (e.g., several hundred kilometers) of GRACE product (Seyoum et al., 2019) cannot resolve the spatial variability in TWSA for our model resolution.

5.7 Parametric uncertainty

The parameter priors listed in Table 1 result in significant uncertainties in the total runoff, with the global average annual runoff for 1991–2010 varying from 30 999–76 496 km3 yr−1 (Fig. 10a). After parameter inference, the uncertainty of the runoff constructed using simulations with parameter posteriors is constrained to 35 389–49 741 km3 yr−1. The constrained annual runoff uncertainty captures the reference runoff (38 443 km3 yr−1) and is consistent with previous global runoff studies (Schellekens et al., 2017; Rodell et al., 2015; Clark et al., 2015; Haddeland et al., 2011). The simulation with the optimal parameter values yields an averaged global annual runoff of 42 156 km3 yr−1, overestimating the reference runoff by 9.6 %. The overestimation is mainly from the Amazon, Asia, and eastern Europe (Fig. 10b), and Ghiggi et al. (2019a) reported a similar spatial bias pattern between global hydrological model simulations in ISIMP2a. The simulation with the default parameter values shows smaller biases in terms of the annual runoff magnitude as compared to the reference runoff data, with an overestimation of 5.3 % on average. However, the smaller biases of annual runoff with the default parameters are because of the canceling out of the monthly errors to some extent. For example, the default parameters tend to overestimate the runoff during the wet periods but underestimate the runoff during the dry periods in the Amazon basin (Fig. S10a). While the default simulation shows higher RMSE and lower NSE at a monthly scale, it yields smaller biases at an annual scale than the optimal parameter (Fig. S10b). Therefore, the calibrated simulation shows better performance in capturing the spatiotemporal variability (higher NSE and KGE in Fig. 8b and e), but it does not lead to a reduced bias at an annual scale. We further acknowledge that 200 simulations with 11 random parameters may not be sufficient to capture the full variations in simulated runoff.

Figure 10(a) Annual global runoff from default ELM simulation, optimal ELM simulation, and GRUN runoff dataset for the simulation period (1991–2010). The red and blue shade areas represent the uncertainties constructed from the simulations with the parameter sampled on priors and posteriors, respectively. Panel (b) shows the absolute difference in annual average runoff between ELM simulation with optimal parameter and GRUN runoff data.

Figure 11Annual runoff at basin scale from the default ELM simulation, optimal ELM simulation, and GRUN runoff dataset for the simulation period (1991–2010). The red and blue shaded areas represent the uncertainties constructed from the simulations with parameter sampled on priors and posteriors, respectively.


Figure 12(a) Annual global runoff from 13 global hydrological models participated in ISIMIP2a and ELM simulated runoff uncertainty constructed using simulations with parameter posteriors. (b) Sen's slope for the global annual runoff for the GRUN runoff dataset and simulations. The violin plots (Hintze and Nelson, 1998) are generated with Sen's slope of ELM simulations with parameter posteriors. The white point is the median value, and the gray line represents range of the 25 %–75 % percentile. The MATLAB function of Bechtold (2016) was used to create the violin plot. The cross signs are the Sen's slopes estimated from the ISIMIP2a model simulations.


The runoff uncertainties associated with parameters are constrained significantly with the parameter posteriors at basin scale as well (Fig. 11). Noticeably, the posterior uncertainty of annual runoff is larger over the equatorial regions (e.g., Paraná, the Amazon, Godavari River, and Congo River) than other regions. The simulation with optimal parameter values yields larger overestimation of total runoff compared to the simulation using the default parameter values for the selected basins, except for the Mississippi River, Godavari River, and Loire basin (Table S1 in the Supplement). The reason for the overestimations is that the optimal parameters are determined by maximizing NSE at a monthly scale, which cannot ensure that the annual runoff is appropriately constrained. There exist significant discrepancies between simulations and GRUN for basins located at high latitudes (e.g., Mackenzie, Volga, Ob, Yenisei, and Lena rivers) even when the posterior uncertainties are considered (Fig. 11), highlighting the importance of snow-melting processes in snow-dominated regions. However, the large difference between ELM and the reference runoff in Yangtze River basin may be caused by the bias of the reference runoff since a previous study reported annual discharge to be around 900 km3 yr−1 (Yang et al., 2015).

Despite being constrained by the parameter inference process, the parametric uncertainty of ELM-simulated annual runoff is considerable. Specifically, the posterior uncertainty of global runoff simulated by ELM is comparable to that of the multimodel ensemble constructed with the 13 global hydrological models from ISIMIP2a (Fig. 12a). The parametric uncertainty affects not only the magnitude of global runoff but also the trend for the simulation period during which a rapid increase in temperature has occurred (Fig. S11a). The Sen's slope (Sen, 1968) for the reference runoff data is found to be 54.7 km3 yr−1, but this increasing trend is not significant according to the Mann–Kendall test (Fig. 12b). Other studies also reported no significant changes in the global runoff with observed streamflow data (Alkama et al., 2013; Dai et al., 2009; Milliman et al., 2008; Alkama et al., 2011). However, the default and calibrated ELM simulations yielded the Sen's slope to be 188.9 and 133.8 km3 yr−1, respectively. Although the Sen's slope is reduced with the optimal parameters, the increasing trend remains significant. Likewise, all the other global hydrological models of ISIMIP2a exhibit significant increasing trend in the annual runoff, with the Sen's slope varying from 93 to 272 km3 yr−1 (Fig. 12b). Considering that the GRUN dataset and all model simulations are forced by the same atmosphere forcing (i.e., GSWP3), the differences in the global runoff trends can be attributed to the model structural/parametric uncertainty. We note that there exists a significant trend in GSWP3 precipitation at a global scale, with an increase of 246.1 km3 yr−1 during the simulation period (Fig. S11b). But it remains unclear how the runoff responds to the increase in precipitation at a global scale because the concurrent increased temperature (Fig. S11a) leads to more ET, which can potentially balance the increased precipitation to some extent. The inconsistency of the global runoff trend between the model simulations and observation-based data can be caused by uncertainties of different sources. For example, the accuracy of GRUN is limited by the coverage of the streamflow gauges, as over half of the global areas are ungauged (Alkama et al., 2013). The model parametric uncertainty is significant, as ELM simulations with parameters posteriors show a wide range of annual runoff trend, from no trend to a significant increasing trend (Fig. 12b). This highlights the necessity of including parametric uncertainty in future runoff projections, since runoff trend is not well constrained even if the model performance in the control period is improved.

6 Limitations

We note there can be other better choices of priors for the parameter whose range covers several orders of magnitude. For example, sampling qdrai,max on a uniform distribution (e.g., U[10-610-1]) results in fewer prior samples with values less than 10−2. A log-transformed uniform distribution can be a good alternative to guarantee enough samples over each range of the desired values. Using a log-transformed uniform distribution for qdrai,max prior does not impact our results significantly because the simulated runoff is not sensitive to qdrai,max (Fig. S4), and more samples over smaller values of qdrai,max will not lead to more variation in runoff. However, careful selection of prior distributions can be important for sensitive parameters in a future application of surrogate-assisted calibration framework.

By using RMSE instead of the simulated runoff as the QoI, only one PCE-based surrogate model is constructed for each grid cell to represent the ELM performance of simulating a monthly runoff time series. Although selecting RMSE as QoI significantly reduces the computational burden of the surrogates' construction and parameter inference, the corresponding surrogates cannot be used to estimate posterior uncertainty of physical model outputs. For example, we still need to run ELM simulations after the parameter inference to construct the runoff posterior uncertainty (Sect. 3.5). Additionally, the objective of this study is to minimize RMSE at a monthly scale; hence, an improved model performance at an annual scale is not guaranteed. Including both monthly and annual performance metrics in objective function may balance the performance at different temporal scales. However, only one objective is accepted in the Uncertainty Quantification framework used in this study.

We further acknowledge the poor performance of PCE-based surrogate model in capturing the ELM-simulated runoff over extremely arid regions (Figs. 2 and 3). This can be attributed to the limitation of polynomial-based surrogate models in capturing highly non-smooth or strongly nonlinear relationships. Machine learning algorithms (Dagon et al., 2020) and deep neural networks (Tsai et al., 2021) are alternative techniques for surrogate modeling, which are better at capturing non-smooth or nonlinear functions, but future research is needed to investigate the capability.

The calibrated parameters have a significant impact on baseflow index, which is the ratio between subsurface runoff and total runoff. For example, the baseflow of Amazon basin with default and optimal parameters are 0.53 and 0.70, respectively (Fig. S12). Mortatti et al. (1997) reported the baseflow index of the Amazon basin to be 0.70 with the isotopic tracer method, which is consistent with our simulation with optimal parameter values. However, accurate separation of surface runoff and subsurface runoff over other regions is not guaranteed, though the total runoff has been calibrated to match with the reference runoff dataset. The global baseflow index dataset of Beck et al. (2013) that derived from observed streamflow provides us the benchmark for evaluating the baseflow index simulated in ELM. Constraining the baseflow index during the ELM validation and calibration study will be investigated in the future.

We further note that uncertainty in the reference runoff data of GRUN used in the parameter inference is inevitable. While Ghiggi et al. (2019a) found that GRUN outperformed other global hydrological models and multimodel ensemble, lower accuracy over mountainous regions due to the coarse resolution has been reported. Additionally, the irrigation and water management impacts on streamflow was included for some regions during the training process of GRUN (Ghiggi et al., 2019a), but irrigation and water management are not active in the ELM configuration used in this study. This inconsistency may explain the significant overestimation of ELM simulated runoff compared to GRUN for certain regions, for example, the Yangtze River basin (Fig. 10).

Another limitation of this study is that the snow-melting processes were not calibrated. A poor representation of the snow-melting process can result in the poor skill of runoff generation in snow-dominant regions, where snowmelt is an important contribution to runoff (Jenicek and Ledvinka, 2020). This could explain the low performance (i.e., negative NSE) of calibrated ELM over the Northern Hemisphere's high latitudes and mountainous regions. However, including parameterizations of snow processes such as snow albedo, solar absorption, and snow aging (Lawrence et al., 2011) can introduce more uncertain parameters, which will make calibration more challenging (Huang et al., 2013). In the future, a dedicated calibration on the snow-melting process is needed to improve the runoff generation in snow-dominated regions.

7 Conclusion

In this study, we applied an UQ framework to calibrate the runoff-generation-relevant parameters in the ELM v1 using an observation-based runoff dataset as the benchmark. The parameters with higher sensitivity are identified through the sensitivity analysis with the PCE-based surrogate models. While different sensitivity patterns are found for different regions, 81.3 % of the global cells show significant sensitivities to 3 or fewer parameters of the 11 selected parameters. The results of our sensitivity analysis are consistent with those of previous studies over the North American continent (Huang et al., 2013; Sun et al., 2013), with runoff showing the largest sensitivity to the subsurface runoff parameter. The Bayesian posterior distribution of the highly sensitive parameters at each grid cell is estimated with MCMC simulations, using the surrogate model to construct the likelihood function. Additional ELM simulations with parameter samples from the posterior run to estimate the optimal parameter values and construct the parametric uncertainty for the simulated runoff. While the optimal parameter values improve the model performance of runoff significantly, the parametric uncertainty is comparable to the uncertainty in a multimodel ensemble in ISIMIP2a, which is appreciable. Furthermore, the parameters are found to impact the annual global runoff trend for our simulation period. Specifically, the simulations with parameter posteriors show a wide range of the annual runoff trends at a global scale, from no trend to a significant increasing trend. In summary, parameter calibration is necessary to improve model performance, and parametric uncertainties should be considered for comprehensive analysis of runoff and its projections.

Code and data availability

The current version of ELM is available from E3SM project (, last access: 24 June 2022). The UQTk code and documentation are available from (last access: 24 June 2022). The exact version of ELM, exact version of UQTk source code, and scripts to produce the plots in this study are archived on Zenodo (, Xu, 2022a). MATLAB version R2019b update 4 was used to run the processing and plotting scripts. ILAMB version 2 was used in this study, and the package can be accessed at (Collier et al., 2018). The domain file and surface data file that used to run ELMv1, and processed ISIMP2a runoff data are archived on Zenodo (, Xu, 2022b). The GRUN runoff dataset was downloaded from (Ghiggi et al., 2019b).


The supplement related to this article is available online at:

Author contributions

DX and GB designed the study. DX ran the simulations, performed the analysis, visualized the results, and prepared the first draft of the paper. GB mentored DX through this study. KS helped with the uncertainty quantification methodology. CL and LRL investigated the results. All authors contributed to the discussion and review of the results and writing the paper.

Competing interests

The contact author has declared that none of the authors has any competing interests.


Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


This research was conducted at the Pacific Northwest National Laboratory, operated by Battelle for the U.S. Department of Energy (contract no. DE-AC05-76RLO1830). Sandia National Laboratories is a multi-mission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration (contract no. DE-NA-0003525). We thank the modelling groups listed in Fig. 12, and the Inter-Sectoral Impact Model Intercomparison Project (ISIMP) for making the ISIMP2a multi-model dataset available. We also thank two anonymous reviewers for their constructive comments that improved the manuscript.

Financial support

This work was supported by the Earth System Model Development program area of the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research, as part of the multi-program, collaborative integrated Coastal Modeling (ICoM) project (grant no. KP1703110/75415). Chang Liao was supported through Next-Generation Ecosystem Experiments-Tropics, funded by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research, at Pacific Northwest National Laboratory (grant no. 830403000/71073).

Review statement

This paper was edited by Dan Lu and reviewed by two anonymous referees.


Alkama, R., Decharme, B., Douville, H., and Ribes, A.: Trends in Global and Basin-Scale Runoff over the Late Twentieth Century: Methodological Issues and Sources of Uncertainty, J. Climate, 24, 3000–3014,, 2011. 

Alkama, R., Marchand, L., Ribes, A., and Decharme, B.: Detection of global runoff changes: results from observations and CMIP5 experiments, Hydrol. Earth Syst. Sci., 17, 2967–2979,, 2013. 

Andreadis, K. M., Schumann, G. J.-P., and Pavelsky, T.: A simple global river bankfull width and depth database, Water Resour. Res., 49, 7164–7168,, 2013. 

Bechtold, B.: Violin Plots for Matlab, Github Project, Zenodo [code],, 2016. 

Beck, H. E., van Dijk, A. I. J. M., Miralles, D. G., de Jeu, R. A. M., Bruijnzeel, L. A., McVicar, T. R., and Schellekens, J.: Global patterns in base flow index and recession based on streamflow observations from 3394 catchments, Water Resour. Res., 49, 7843–7863,, 2013. 

Beck, H. E., van Dijk, A. I. J. M., de Roo, A., Dutra, E., Fink, G., Orth, R., and Schellekens, J.: Global evaluation of runoff from 10 state-of-the-art hydrological models, Hydrol. Earth Syst. Sci., 21, 2881–2903,, 2017. 

Bisht, G., Riley, W. J., Hammond, G. E., and Lorenzetti, D. M.: Development and evaluation of a variably saturated flow model in the global E3SM Land Model (ELM) version 1.0, Geosci. Model Dev., 11, 4085–4102,, 2018. 

Bosmans, J. H. C., van Beek, L. P. H., Sutanudjaja, E. H., and Bierkens, M. F. P.: Hydrological impacts of global land cover change and human water use, Hydrol. Earth Syst. Sci., 21, 5603–5626,, 2017. 

Breiman, L.: Random Forests, Mach. Learn., 45, 5–32,, 2001. 

Brunke, M. A., Broxton, P., Pelletier, J., Gochis, D., Hazenberg, P., Lawrence, D. M., Leung, L. R., Niu, G.-Y., Troch, P. A., and Zeng, X.: Implementing and evaluating variable soil thickness in the Community Land Model, version 4.5 (CLM4. 5), J. Climate, 29, 3441–3461, 2016. 

Chegwidden, O. S., Rupp, D. E., and Nijssen, B.: Climate change alters flood magnitudes and mechanisms in climatically-diverse headwaters across the northwestern United States, Environ. Res. Lett., 15, 094048,, 2020. 

Clark, E. A., Sheffield, J., van Vliet, M. T., Nijssen, B., and Lettenmaier, D. P.: Continental runoff into the oceans (1950–2008), J. Hydrometeorol., 16, 1502–1520, 2015. 

Collier, N., Hoffman, F. M., Lawrence, D. M., Keppel-Aleks, G., Koven, C. D., Riley, W. J., Mu, M., and Randerson, J. T.: The International Land Model Benchmarking (ILAMB) System: Design, Theory, and Implementation, J. Adv. Model. Earth Sy., 10, 2731–2754,, 2018 (code available at: 

Cosby, B. J., Hornberger, G. M., Clapp, R. B., and Ginn, T. R.: A Statistical Exploration of the Relationships of Soil Moisture Characteristics to the Physical Properties of Soils, Water Resour. Res., 20, 682–690,, 1984. 

Dagon, K., Sanderson, B. M., Fisher, R. A., and Lawrence, D. M.: A machine learning approach to emulation and biophysical parameter estimation with the Community Land Model, version 5, Adv. Stat. Clim. Meteorol. Oceanogr., 6, 223–244,, 2020. 

Dai, A.: Increasing drought under global warming in observations and models, Nat. Clim. Change, 3, 52–58,, 2013. 

Dai, A., Qian, T., Trenberth, K. E., and Milliman, J. D.: Changes in Continental Freshwater Discharge from 1948 to 2004, J. Climate, 22, 2773–2792,, 2009. 

Debusschere, B., Sargsyan, K., Safta, C., and Chowdhary, K.: Uncertainty Quantification Toolkit (UQTk), in: Handbook of Uncertainty Quantification, edited by: Ghanem, R., Higdon, D., and Owhadi, H., Springer International Publishing, Cham, 1–21,, 2016. 

Debusschere, B. J., Najm, H. N., Pébay, P. P., Knio, O. M., Ghanem, R. G., and Maître, O. P. L.: Numerical Challenges in the Use of Polynomial Chaos Representations for Stochastic Processes, SIAM J. Sci. Comput., 26, 698–719,, 2004. 

Decharme, B., Delire, C., Minvielle, M., Colin, J., Vergnes, J.-P., Alias, A., Saint-Martin, D., Séférian, R., Sénési, S., and Voldoire, A.: Recent Changes in the ISBA-CTRIP Land Surface System for Use in the CNRM-CM6 Climate Model and in Global Off-Line Hydrological Applications, J. Adv. Model. Earth Sy., 11, 1207–1252,, 2019. 

Do, H. X., Gudmundsson, L., Leonard, M., and Westra, S.: The Global Streamflow Indices and Metadata Archive (GSIM) – Part 1: The production of a daily streamflow archive and metadata, Earth Syst. Sci. Data, 10, 765–785,, 2018. 

Doocy, S., Daniels, A., Murray, S., and Kirsch, T. D.: The human impact of floods: a historical review of events 1980–2009 and systematic literature review, PLoS Curr., 5,, 2013. 

Drewniak, B. A.: Simulating Dynamic Roots in the Energy Exascale Earth System Land Model, J. Adv. Model. Earth Sy., 11, 338–359,, 2019. 

Dwelle, M. C., Kim, J., Sargsyan, K., and Ivanov, V. Y.: Streamflow, stomata, and soil pits: Sources of inference for complex models with fast, robust uncertainty quantification, Adv. Water Resour., 125, 13–31,, 2019. 

Ekici, A., Lee, H., Lawrence, D. M., Swenson, S. C., and Prigent, C.: Ground subsidence effects on simulating dynamic high-latitude surface inundation under permafrost thaw using CLM5, Geosci. Model Dev., 12, 5291–5300,, 2019. 

Fischer, E. M. and Knutti, R.: Observed heavy precipitation increase confirms theory and early models, Nat. Clim. Change, 6, 986–991,, 2016. 

Gelman, A. and Rubin, D. B.: Inference from Iterative Simulation Using Multiple Sequences, Stat. Sci., 7, 457–472,, 1992. 

Ghiggi, G., Humphrey, V., Seneviratne, S. I., and Gudmundsson, L.: GRUN: an observation-based global gridded runoff dataset from 1902 to 2014, Earth Syst. Sci. Data, 11, 1655–1674,, 2019a. 

Ghiggi, G., Gudmundsson, L., and Humphrey, V.: G-RUN: Global Runoff Reconstruction, figshare [data set],, 2019b. 

Giuntoli, I., Villarini, G., Prudhomme, C., and Hannah, D. M.: Uncertainties in projected runoff over the conterminous United States, Climatic Change, 150, 149–162,, 2018. 

Golaz, J.-C., Caldwell, P. M., Van Roekel, L. P., Petersen, M. R., Tang, Q., Wolfe, J. D., Abeshu, G., Anantharaj, V., Asay-Davis, X. S., Bader, D. C., Baldwin, S. A., Bisht, G., Bogenschutz, P. A., Branstetter, M., Brunke, M. A., Brus, S. R., Burrows, S. M., Cameron-Smith, P. J., Donahue, A. S., Deakin, M., Easter, R. C., Evans, K. J., Feng, Y., Flanner, M., Foucar, J. G., Fyke, J. G., Griffin, B. M., Hannay, C., Harrop, B. E., Hoffman, M. J., Hunke, E. C., Jacob, R. L., Jacobsen, D. W., Jeffery, N., Jones, P. W., Keen, N. D., Klein, S. A., Larson, V. E., Leung, L. R., Li, H.-Y., Lin, W., Lipscomb, W. H., Ma, P.-L., Mahajan, S., Maltrud, M. E., Mametjanov, A., McClean, J. L., McCoy, R. B., Neale, R. B., Price, S. F., Qian, Y., Rasch, P. J., Reeves Eyre, J. E. J., Riley, W. J., Ringler, T. D., Roberts, A. F., Roesler, E. L., Salinger, A. G., Shaheen, Z., Shi, X., Singh, B., Tang, J., Taylor, M. A., Thornton, P. E., Turner, A. K., Veneziani, M., Wan, H., Wang, H., Wang, S., Williams, D. N., Wolfram, P. J., Worley, P. H., Xie, S., Yang, Y., Yoon, J.-H., Zelinka, M. D., Zender, C. S., Zeng, X., Zhang, C., Zhang, K., Zhang, Y., Zheng, X., Zhou, T., and Zhu, Q.: The DOE E3SM Coupled Model Version 1: Overview and Evaluation at Standard Resolution, J. Adv. Model. Earth Sy., 11, 2089–2129,, 2019. 

Gong, W., Duan, Q., Li, J., Wang, C., Di, Z., Dai, Y., Ye, A., and Miao, C.: Multi-objective parameter optimization of common land model using adaptive surrogate modeling, Hydrol. Earth Syst. Sci., 19, 2409–2425,, 2015. 

Gosling, S., Müller Schmied, H., Betts, R. A., Chang, J., Ciais, P., Dankers, R., Döll, P., Eisner, S., Flörke, M., Gerten, D., Grillakis, M., Hanasaki, N., Hagemann, S., Huang, M., Huang, Z., Jerez, S., Kim, H., Koutroulis, A., Leng, G., Liu, X., Masaki, Y., Montavez, P., Morfopoulos, C., Oki, T., Papadimitriou, L., Pokhrel, Y., Portmann, F. T., Orth, R., Ostberg, S., Satoh, Y., Seneviratne, S., Sommer, P., Stacke, T., Tang, Q., Tsanis, I., Wada, Y., Zhou, T., Büchner, M., Schewe, J., and Zhao, F.: ISIMIP2a Simulation Data from Water (global) Sector (V. 1.1), GFZ Data Services [data set],, 2019. 

Gosling, S. N. and Arnell, N. W.: Simulating current global river runoff with a global hydrological model: model revisions, validation, and sensitivity analysis, Hydrol. Process., 25, 1129–1145,, 2011. 

Gudmundsson, L., Do, H. X., Leonard, M., and Westra, S.: The Global Streamflow Indices and Metadata Archive (GSIM) – Part 2: Quality control, time-series indices and homogeneity assessment, Earth Syst. Sci. Data, 10, 787–804,, 2018. 

Gupta, H. V., Sorooshian, S., and Yapo, P. O.: Toward improved calibration of hydrologic models: Multiple and noncommensurable measures of information, Water Resour. Res., 34, 751–763,, 1998. 

Gupta, H. V., Kling, H., Yilmaz, K. K., and Martinez, G. F.: Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling, J. Hydrol., 377, 80–91,, 2009. 

Haario, H., Saksman, E., and Tamminen, J.: An adaptive Metropolis algorithm, Bernoulli, 7, 223–242, 2001. 

Haarsma, R. J., Roberts, M. J., Vidale, P. L., Senior, C. A., Bellucci, A., Bao, Q., Chang, P., Corti, S., Fučkar, N. S., Guemas, V., von Hardenberg, J., Hazeleger, W., Kodama, C., Koenigk, T., Leung, L. R., Lu, J., Luo, J.-J., Mao, J., Mizielinski, M. S., Mizuta, R., Nobre, P., Satoh, M., Scoccimarro, E., Semmler, T., Small, J., and von Storch, J.-S.: High Resolution Model Intercomparison Project (HighResMIP v1.0) for CMIP6, Geosci. Model Dev., 9, 4185–4208,, 2016. 

Haddeland, I., Clark, D. B., Franssen, W., Ludwig, F., Voß, F., Arnell, N. W., Bertrand, N., Best, M., Folwell, S., and Gerten, D.: Multimodel estimate of the global terrestrial water balance: setup and first results, J. Hydrometeorol., 12, 869–884, 2011. 

Hall, J. W., Grey, D., Garrick, D., Fung, F., Brown, C., Dadson, S. J., and Sadoff, C. W.: Coping with the curse of freshwater variability, Science, 346, 429–430,, 2014. 

Hintze, J. and Nelson, R.: Violin plots: A box plot-density trace synergism, Am. Stat., 52, 181–184, 1998. 

Hirabayashi, Y., Mahendran, R., Koirala, S., Konoshima, L., Yamazaki, D., Watanabe, S., Kim, H., and Kanae, S.: Global flood risk under climate change, Nat. Clim. Change, 3, 816–821,, 2013. 

Hou, Z., Huang, M., Leung, L. R., Lin, G., and Ricciuto, D. M.: Sensitivity of surface flux simulations to hydrologic parameters based on an uncertainty quantification framework applied to the Community Land Model, J. Geophys. Res.-Atmos., 117, D15108,, 2012. 

Huang, M., Hou, Z., Leung, L. R., Ke, Y., Liu, Y., Fang, Z., and Sun, Y.: Uncertainty Analysis of Runoff Simulations and Parameter Identifiability in the Community Land Model: Evidence from MOPEX Basins, J. Hydrometeorol., 14, 1754–1772,, 2013. 

Huang, M., Ray, J., Hou, Z., Ren, H., Liu, Y., and Swiler, L.: On the applicability of surrogate-based Markov chain Monte Carlo-Bayesian inversion to the Community Land Model: Case studies at flux tower sites, J. Geophys. Res.-Atmos., 121, 7548–7563,, 2016. 

Ivanov, V. Y., Xu, D., Dwelle, M. C., Sargsyan, K., Wright, D. B., Katopodes, N., Kim, J., Tran, V. N., Warnock, A., Fatichi, S., Burlando, P., Caporali, E., Restrepo, P., Sanders, B. F., Chaney, M. M., Nunes, A. M. B., Nardi, F., Vivoni, E. R., Istanbulluoglu, E., Bisht, G., and Bras, R. L.: Breaking Down the Computational Barriers to Real-Time Urban Flood Forecasting, Geophys. Res. Lett., 48, e2021GL093585,, 2021. 

Jenicek, M. and Ledvinka, O.: Importance of snowmelt contribution to seasonal runoff and summer low flows in Czechia, Hydrol. Earth Syst. Sci., 24, 3475–3491,, 2020. 

Jung, M., Koirala, S., Weber, U., Ichii, K., Gans, F., Camps-Valls, G., Papale, D., Schwalm, C., Tramontana, G., and Reichstein, M.: The FLUXCOM ensemble of global land-atmosphere energy fluxes, Scientific Data, 6, 1–14, 2019. 

Kim, H., Yeh, P. J. F., Oki, T., and Kanae, S.: Role of rivers in the seasonal variations of terrestrial water storage over global basins, Geophys. Res. Lett., 36, L17402,, 2009. 

Knoben, W. J. M., Freer, J. E., and Woods, R. A.: Technical note: Inherent benchmark or not? Comparing Nash–Sutcliffe and Kling–Gupta efficiency scores, Hydrol. Earth Syst. Sci., 23, 4323–4331,, 2019. 

Knutti, R. and Sedláèek, J.: Robustness and uncertainties in the new CMIP5 climate model projections, Nat. Clim. Change, 3, 369–373,, 2012. 

Knutti, R., Furrer, R., Tebaldi, C., Cermak, J., and Meehl, G. A.: Challenges in Combining Projections from Multiple Climate Models, J. Climate, 23, 2739–2758, 2010. 

Krysanova, V., Zaherpour, J., Didovets, I., Gosling, S. N., Gerten, D., Hanasaki, N., Müller Schmied, H., Pokhrel, Y., Satoh, Y., Tang, Q., and Wada, Y.: How evaluation of global hydrological models can help to improve credibility of river discharge projections under climate change, Climatic Change, 163, 1353–1377,, 2020. 

Laloy, E. and Jacques, D.: Emulation of CPU-demanding reactive transport models: a comparison of Gaussian processes, polynomial chaos expansion, and deep neural networks, Comput. Geosci., 23, 1193–1215, 2019. 

Lawrence, D. M., Oleson, K. W., Flanner, M. G., Thornton, P. E., Swenson, S. C., Lawrence, P. J., Zeng, X., Yang, Z.-L., Levis, S., Sakaguchi, K., Bonan, G. B., and Slater, A. G.: Parameterization improvements and functional and structural advances in Version 4 of the Community Land Model, J. Adv. Model. Earth Sy., 3, M03001,, 2011. 

Lehner, F., Wood, A. W., Vano, J. A., Lawrence, D. M., Clark, M. P., and Mankin, J. S.: The potential to reduce uncertainty in regional runoff projections from climate models, Nat. Clim. Change, 9, 926–933,, 2019. 

Lehner, F., Deser, C., Maher, N., Marotzke, J., Fischer, E. M., Brunner, L., Knutti, R., and Hawkins, E.: Partitioning climate projection uncertainty with multiple large ensembles and CMIP5/6, Earth Syst. Dynam., 11, 491–508,, 2020. 

Leung, L. R., Bader, D. C., Taylor, M. A., and McCoy, R. B.: An Introduction to the E3SM Special Collection: Goals, Science Drivers, Development, and Analysis, J. Adv. Model. Earth Sy., 12, e2019MS001821,, 2020. 

Li, H.-Y., Leung, L. R., Getirana, A., Huang, M., Wu, H., Xu, Y., Guo, J., and Voisin, N.: Evaluating Global Streamflow Simulations by a Physically Based Routing Model Coupled with the Community Land Model, J. Hydrometeorol., 16, 948–971,, 2015. 

Liao, C., Zhou, T., Xu, D., Barnes, R., Bisht, G., Li, H.-Y., Tan, Z., Tesfa, T., Duan, Z., Engwirda, D., and Leung, L. R.: Advances in hexagon mesh-based flow direction modeling, Adv. Water Resour., 160, 104099,, 2022. 

Lin, G. and Karniadakis, G. E.: Sensitivity analysis and stochastic simulations of non-equilibrium plasma flow, Int. J. Numer. Meth. Eng., 80, 738–766,, 2009. 

Lu, D., Ricciuto, D., Stoyanov, M., and Gu, L.: Calibration of the E3SM Land Model Using Surrogate-Based Global Optimization, J. Adv. Model. Earth Sy., 10, 1337–1356,, 2018. 

Martens, B., Miralles, D. G., Lievens, H., van der Schalie, R., de Jeu, R. A. M., Fernández-Prieto, D., Beck, H. E., Dorigo, W. A., and Verhoest, N. E. C.: GLEAM v3: satellite-based land evaporation and root-zone soil moisture, Geosci. Model Dev., 10, 1903–1925,, 2017. 

Milliman, J. D., Farnsworth, K. L., Jones, P. D., Xu, K. H., and Smith, L. C.: Climatic and anthropogenic factors affecting river discharge to the global ocean, 1951–2000, Global Planet. Change, 62, 187–194,, 2008. 

Milly, P. C. D., Wetherald, R. T., Dunne, K. A., and Delworth, T. L.: Increasing risk of great floods in a changing climate, Nature, 415, 514–517,, 2002. 

Milly, P. C. D., Betancourt, J., Falkenmark, M., Hirsch, R. M., Kundzewicz, Z. W., Lettenmaier, D. P., and Stouffer, R. J.: Stationarity Is Dead: Whither Water Management?, Science, 319, 573–574,, 2008. 

Mishra, A. K. and Singh, V. P.: A review of drought concepts, J. Hydrol., 391, 202–216,, 2010. 

Mortatti, J., Moraes, J., Rodrigues, J., Victoria, R., and Martinelli, L.: Hydrograph separation of the Amazon River using 18O as an isotopic tracer, Sci. Agr., 54, 167–173, 1997. 

Müller, J., Paudel, R., Shoemaker, C. A., Woodbury, J., Wang, Y., and Mahowald, N.: CH4 parameter estimation in CLM4.5bgc using surrogate global optimization, Geosci. Model Dev., 8, 3285–3310,, 2015. 

Nash, J. E. and Sutcliffe, J. V.: River flow forecasting through conceptual models part I – A discussion of principles, J. Hydrol., 10, 282–290,, 1970. 

Niu, G.-Y., Yang, Z.-L., Dickinson, R. E., and Gulden, L. E.: A simple TOPMODEL-based runoff parameterization (SIMTOP) for use in global climate models, J. Geophys. Res.-Atmos., 110, D21106,, 2005. 

Oleson, K., Lawrence, D. M., Bonan, G. B., Drewniak, B., Huang, M., Koven, C. D., Levis, S., Li, F., Riley, W. J., Subin, Z. M., Swenson, S., Thornton, P. E., Bozbiyik, A., Fisher, R., Heald, C. L., Kluzek, E., Lamarque, J.-F., Lawrence, P. J., Leung, L. R., Lipscomb, W., Muszala, S. P., Ricciuto, D. M., Sacks, W. J., Sun, Y., Tang, J., and Yang, Z.-L.: Technical description of version 4.5 of the Community Land Model (CLM) (No. NCAR/TN-503+STR), UCAR,, 2013. 

Olson, R., Fan, Y. A., and Evans, J. P.: A simple method for Bayesian model averaging of regional climate model projections: Application to southeast Australian temperatures, Geophys. Res. Lett., 43, 7661–7669,, 2016. 

Pekel, J.-F., Cottam, A., Gorelick, N., and Belward, A. S.: High-resolution mapping of global surface water and its long-term changes, Nature, 540, 418–422,, 2016. 

Pilgrim, D. H., Chapman, T. G., and Doran, D. G.: Problems of rainfall-runoff modelling in arid and semiarid regions, Hydrolog. Sci. J., 33, 379–400,, 1988. 

Ray, J., Hou, Z., Huang, M., Sargsyan, K., and Swiler, L.: Bayesian Calibration of the Community Land Model Using Surrogates, SIAM/ASA Journal on Uncertainty Quantification, 3, 199–233,, 2015. 

Razavi, S., Tolson, B. A., and Burn, D. H.: Review of surrogate modeling in water resources, Water Resour. Res., 48, W07401,, 2012. 

Ricciuto, D., Sargsyan, K., and Thornton, P.: The Impact of Parametric Uncertainties on Biogeochemistry in the E3SM Land Model, J. Adv. Model. Earth Sy., 10, 297–319,, 2018. 

Rodell, M., Beaudoing, H. K., L'Ecuyer, T., Olson, W. S., Famiglietti, J. S., Houser, P. R., Adler, R., Bosilovich, M. G., Clayson, C. A., and Chambers, D.: The observed state of the water cycle in the early twenty-first century, J. Climate, 28, 8289–8318, 2015. 

Sargsyan, K., Safta, C., Najm, H. N., Debusschere, B. J., Ricciuto, D., and Thornton, P.: Dimensionality Reduction for Complex Models Via Bayesian Compressive Sensing, Int. J. Uncertain. Quan., 4, 63–93, 2014. 

Sargsyan, K., Najm, H. N., and Ghanem, R.: On the Statistical Calibration of Physical Models, Int. J. Chem. Kinet., 47, 246–276,, 2015. 

Schellekens, J., Dutra, E., Martínez-de la Torre, A., Balsamo, G., van Dijk, A., Sperna Weiland, F., Minvielle, M., Calvet, J.-C., Decharme, B., Eisner, S., Fink, G., Flörke, M., Peßenteiner, S., van Beek, R., Polcher, J., Beck, H., Orth, R., Calton, B., Burke, S., Dorigo, W., and Weedon, G. P.: A global water resources ensemble of hydrological models: the eartH2Observe Tier-1 dataset, Earth Syst. Sci. Data, 9, 389–413,, 2017. 

Schewe, J., Heinke, J., Gerten, D., Haddeland, I., Arnell, N. W., Clark, D. B., Dankers, R., Eisner, S., Fekete, B. M., Colón-González, F. J., Gosling, S. N., Kim, H., Liu, X., Masaki, Y., Portmann, F. T., Satoh, Y., Stacke, T., Tang, Q., Wada, Y., Wisser, D., Albrecht, T., Frieler, K., Piontek, F., Warszawski, L., and Kabat, P.: Multimodel assessment of water scarcity under climate change, P. Natl. Acad. Sci. USA, 111, 3245–3250,, 2014. 

Sen, P. K.: Estimates of the Regression Coefficient Based on Kendall's Tau, J. Am. Stat. Assoc., 63, 1379–1389,, 1968. 

Seyoum, W. M., Kwon, D., and Milewski, A. M.: Downscaling GRACE TWSA Data into High-Resolution Groundwater Level Anomaly Using Machine Learning-Based Models in a Glacial Aquifer System, Remote Sens., 11, 824,, 2019. 

Sheng, M., Lei, H., Jiao, Y., and Yang, D.: Evaluation of the Runoff and River Routing Schemes in the Community Land Model of the Yellow River Basin, J. Adv. Model. Earth Sy., 9, 2993–3018,, 2017. 

Sobol', I. M.: Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates, Math. Comput. Simulat., 55, 271–280,, 2001. 

Sun, Y., Hou, Z., Huang, M., Tian, F., and Ruby Leung, L.: Inverse modeling of hydrologic parameters using surface flux and runoff observations in the Community Land Model, Hydrol. Earth Syst. Sci., 17, 4995–5011,, 2013. 

Swenson, S. C., Lawrence, D. M., and Lee, H.: Improved simulation of the terrestrial hydrological cycle in permafrost regions by the Community Land Model, J. Adv. Model. Earth Sy., 4, M08002,, 2012. 

Swenson, S. C., Clark, M., Fan, Y., Lawrence, D. M., and Perket, J.: Representing Intrahillslope Lateral Subsurface Flow in the Community Land Model, J. Adv. Model. Earth Sy., 11, 4044–4065,, 2019. 

Tan, Z., Leung, L. R., Li, H.-Y., Tesfa, T., Zhu, Q., and Huang, M.: A substantial role of soil erosion in the land carbon sink and its future changes, Glob. Change Biol., 26, 2642–2655,, 2020. 

Tebaldi, C., Smith, R. L., Nychka, D., and Mearns, L. O.: Quantifying uncertainty in projections of regional climate change: A Bayesian approach to the analysis of multimodel ensembles, J. Climate, 18, 1524–1540, 2005. 

Tesfa, T. K., Leung, L. R., and Ghan, S. J.: Exploring Topography-Based Methods for Downscaling Subgrid Precipitation for Use in Earth System Models, J. Geophys. Res.-Atmos., 125, e2019JD031456,, 2020. 

Toure, A. M., Luojus, K., Rodell, M., Beaudoing, H., and Getirana, A.: Evaluation of Simulated Snow and Snowmelt Timing in the Community Land Model Using Satellite-Based Products and Streamflow Observations, J. Adv. Model. Earth Sy., 10, 2933–2951,, 2018. 

Trenberth, K. E.: Changes in precipitation with climate change, Clim. Res., 47, 123–138, 2011. 

Troy, T. J., Wood, E. F., and Sheffield, J.: An efficient calibration method for continental-scale land surface modeling, Water Resour. Res., 44, W09411,, 2008. 

Tsai, W.-P., Feng, D., Pan, M., Beck, H., Lawson, K., Yang, Y., Liu, J., and Shen, C.: From calibration to parameter learning: Harnessing the scaling effects of big data in geoscientific modeling, Nat. Commun., 12, 5988,, 2021. 

Vörösmarty, C. J., Green, P., Salisbury, J., and Lammers, R. B.: Global Water Resources: Vulnerability from Climate Change and Population Growth, Science, 289, 284–288,, 2000. 

Wang, C., Duan, Q., Gong, W., Ye, A., Di, Z., and Miao, C.: An evaluation of adaptive surrogate modeling based optimization with two benchmark problems, Environ. Modell. Softw., 60, 167–179,, 2014. 

Warszawski, L., Frieler, K., Huber, V., Piontek, F., Serdeczny, O., and Schewe, J.: The Inter-Sectoral Impact Model Intercomparison Project (ISI–MIP): Project framework, P. Natl. Acad. Sci. USA, 111, 3228–3232,, 2014. 

Wu, H., Kimball, J. S., Mantua, N., and Stanford, J.: Automated upscaling of river networks for macroscale hydrological modeling, Water Resour. Res., 47, W03517,, 2011. 

Xie, Z., Yuan, F., Duan, Q., Zheng, J., Liang, M., and Chen, F.: Regional Parameter Estimation of the VIC Land Surface Model: Methodology and Application to River Basins in China, J. Hydrometeorol., 8, 447–468,, 2007. 

Xiu, D. and Karniadakis, G. E.: The Wiener–Askey Polynomial Chaos for Stochastic Differential Equations, SIAM J. Sci. Comput., 24, 619–644,, 2002. 

Xu, D.: Code for “Using an Uncertainty Quantification Framework to Calibrate the Runoff Generation Scheme in E3SM Land Model V1”, Zenodo [code],, 2022a. 

Xu, D.: Data for “Using an Uncertainty Quantification Framework to Calibrate the Runoff Generation Scheme in E3SM Land Model V1”, Zenodo [data set],, 2022b. 

Xu, D., Ivanov, V. Y., Kim, J., and Fatichi, S.: On the use of observations in assessment of multi-model climate ensemble, Stoch. Env. Res. Risk A., 33, 1923–1937,, 2019. 

Xu, D., Ivanov, V. Y., Li, X., and Troy, T. J.: Peak Runoff Timing is Linked to Global Warming Trajectories, Earths Future, 9, e2021EF002083,, 2021a. 

Xu, D., Bisht, G., Zhou, T., Leung, L. R., and Pan, M.: Development of Land-River Two-Way Coupling in the Energy Exascale Earth System Model, Earth and Space Science Open Archive [preprint],, 2021b. 

Yang, H., Zhou, F., Piao, S. L., Huang, M. T., Chen, A. P., Ciais, P., Li, Y., Lian, X., Peng, S. S., and Zeng, Z. Z.: Regional patterns of future runoff changes from Earth system models constrained by observation, Geophys. Res. Lett., 44, 5540–5549,, 2017. 

Yang, S. L., Xu, K. H., Milliman, J. D., Yang, H. F., and Wu, C. S.: Decline of Yangtze River water and sediment discharge: Impact from natural and anthropogenic changes, Sci. Rep.-UK, 5, 12581,, 2015. 

Zhang, Y., Zheng, H., Chiew, F. H. S., Arancibia, J. P. A., and Zhou, X.: Evaluating Regional and Global Hydrological Models against Streamflow and Evapotranspiration Measurements, J. Hydrometeorol., 17, 995–1010,, 2016. 

Zhou, T., Leung, L. R., Leng, G., Voisin, N., Li, H.-Y., Craig, A. P., Tesfa, T., and Mao, Y.: Global Irrigation Characteristics and Effects Simulated by Fully Coupled Land Surface, River, and Water Management Models in E3SM, J. Adv. Model. Earth Sy., 12, e2020MS002069,, 2020. 

Short summary
The runoff outputs in Earth system model simulations involve high uncertainty, which needs to be constrained by parameter calibration. In this work, we used a surrogate-assisted Bayesian framework to efficiently calibrate the runoff-generation processes in the Energy Exascale Earth System Model v1 at a global scale. The model performance was improved compared to the default parameter after calibration, and the associated parametric uncertainty was significantly constrained.