Benchmarking soil moisture and its relationship to ecohydrologic variables in Earth System Models

Massoud, Elias C.; Collier, Nathan; Wang, Yaoping; Mao, Jiafu; Harpold, Adrian; Kannenberg, Steven A.; Koren, Gerbrand; Kumar, Mukesh; Raghav, Pushpendra; Ray, Pallav; Shi, Mingjie; Tao, Jing; Vasu, Sreedevi P.; Wang, Huiqi; Zhu, Qing; Hoffman, Forrest M.

doi:10.5194/gmd-19-3427-2026

Articles | Volume 19, issue 8

https://doi.org/10.5194/gmd-19-3427-2026

Articles | Volume 19, issue 8

Model evaluation paper

27 Apr 2026

Model evaluation paper |

| 27 Apr 2026

Benchmarking soil moisture and its relationship to ecohydrologic variables in Earth System Models

Elias C. Massoud, Nathan Collier, Yaoping Wang, Jiafu Mao, Adrian Harpold, Steven A. Kannenberg, Gerbrand Koren, Mukesh Kumar, Pushpendra Raghav, Pallav Ray, Mingjie Shi, Jing Tao, Sreedevi P. Vasu, Huiqi Wang, Qing Zhu, and Forrest M. Hoffman

Abstract

Soil moisture (SM) is a key regulator of ecosystem biogeophysics, influencing plant water relations and land-atmosphere energy exchanges. We evaluate the representation of SM in 16 Earth System Models from the Coupled Model Intercomparison Project Phase 6 (CMIP6) using the International Land Model Benchmarking (ILAMB) framework, focusing on surface (0–5, 0–10 cm) and rootzone (0–100 cm) depths, as well as key ecohydrological variables like gross primary productivity (GPP), leaf area index (LAI), and evapotranspiration (ET), and their coupling. Models are benchmarked against multiple observational and assimilated datasets to assess both state variables and cross-variable relationships. Surface SM is generally well represented (r > 0.87), while rootzone SM variability is systematically overestimated (normalized standard deviation > 1). ET shows strong agreement with observations (r > 0.9), whereas GPP and LAI exhibit larger inter-model spread. Skill in individual variables does not guarantee realistic SM–ecohydrology coupling, which varies strongly across models and depends on the reference dataset. Köppen-based regional analyses reveal strong regime dependence, with several models performing well in Tropical and Temperate regions but degrading in Continental (high-latitude) zones. Across both global and regional benchmarks, models cluster by land surface framework, indicating that structural choices in soil hydrology and soil–plant coupling exert a first-order control on performance. These results provide process-relevant benchmarks and suggest that improving the representation of vertical soil structure, rooting depth distributions, and soil–plant hydraulic coupling will be central to advancing soil moisture realism in next-generation Earth system models.

Download & links

Article (PDF, 5737 KB)

Download & links

How to cite.

Received: 21 Jul 2025 – Discussion started: 10 Sep 2025 – Revised: 12 Mar 2026 – Accepted: 13 Apr 2026 – Published: 27 Apr 2026

This paper has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the U.S. Department of Energy (DOE). The United States government retains and the publisher, by accepting the article for publication, acknowledges that the United States government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this paper, or allow others to do so, for United States government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan, last access: 20 April 2026).

1 Introduction

Soil moisture (SM) plays a central role in regulating Earth system processes by controlling the storage and exchange of water, carbon, and energy between the land surface and atmosphere (Clark et al., 2015; Trugman et al., 2018; Green et al., 2019; Massoud et al., 2020). Accurately representing SM in Earth System Models (ESMs) is essential for improving predictions of the Earth system (Seneviratne et al., 2010; Hauser et al., 2016; Humphrey et al., 2021). ESMs simulate these processes through coupled biogeochemical and hydrological cycles, both of which are strongly influenced by SM. However, accurately modeling SM at ESM grid scales remains challenging due to heterogeneity in soil properties, scale mismatches between physical processes and model resolution, and limited knowledge of subsurface boundary conditions (e.g., groundwater depth). As a result, ESMs adopt a range of approaches to simulate SM. The majority rely on “bucket-type” models that route water through discrete soil layers using threshold-based parameters (e.g., field capacity) and rate-dependent functions (e.g., hydraulic conductivity).

This work seeks to evaluate the representation of SM in ESMs by using the International Land Model Benchmarking (ILAMB) framework (Collier et al., 2018), a tool extensively used to assess the performance of land models. ILAMB provides a standardized, reproducible evaluation framework that compares model output against observational benchmarks using a suite of complementary metrics, including measures of bias, variability, seasonal cycle, and spatial pattern agreement. Its modular design and consistent scoring methodology make it well suited for systematic, multi-model intercomparison and for diagnosing strengths and weaknesses across different components of the land surface system. Although ILAMB has been applied to various ecohydrologic processes and properties such as gross primary production (GPP) (Caen et al., 2022), evapotranspiration (ET) (Wu et al., 2020), and leaf area index (LAI) (Yang et al., 2023), SM has remained mostly underrepresented in ILAMB-based model evaluations until now.

Despite advances in ESMs, the accurate simulation of SM in Coupled Model Intercomparison Project Phase 6 (CMIP6) remains a persistent challenge due to structural biases and uncertainties in land surface processes. Several studies have highlighted both improvements and limitations in how CMIP6 models represent SM. For example, Yuan et al. (2021) showed that CMIP6 models better capture historical surface SM trends over the contiguous United States (CONUS) compared to CMIP5, particularly in regions like the Northwest and Midwest. However, considerable inter-model variability remains, suggesting a need for further refinement in future model generations. Similarly, Wang et al. (2022) conducted a comprehensive evaluation of CMIP6 SM simulations over China and found that while the multi-model mean (MME) generally captured observed spatial patterns and seasonal cycles of both near-surface and rootzone SM, substantial inter-model spread persisted, particularly in trends and interannual variability. Their findings also emphasized the dominant role of land surface schemes in driving model behavior, as models developed by the same institution often exhibited similar performance. Moreover, Purdy et al. (2018) illustrated the potential for significant improvement in model performance through better SM representation, showing that integrating SM information from satellite data into model simulations reduced global ET errors by up to 23 % in dry regions. These studies point to the critical need to identify and correct structural biases that limit current model skill in simulating SM.

At the same time, the importance of accurately simulating SM extends beyond model performance metrics. Zuo et al. (2024) highlighted that maintaining current global SM levels could reduce nearly a third of projected land warming under low-emission scenarios. Their results, based on outputs from historical CMIP6 experiments and other model intercomparison projects, highlight the central role SM plays in climate feedbacks and the reliance on models like CMIP6 to inform future projections. Given both the challenges and stakes involved, this study benchmarks SM and its coupling with ecohydrologic processes such as GPP, LAI, and ET, with the goal of identifying model limitations and guiding improvements as we look toward CMIP7 and beyond.

The evaluation of SM in global models has a long history, with early efforts dating back to Robock et al. (1998) as part of the Atmospheric Model Intercomparison Project (AMIP). In their analysis, they found significant discrepancies in how SM was represented and simulated across different models, an issue that was prominent 25 years ago and remains a challenge today. More recently, Qiao et al. (2022) conducted a detailed evaluation of SM using a suite of CMIP6 models, examining both surface and deeper SM (up to 2 m) across various subregions around the globe. They found that the multimodel ensemble mean generally produces reasonable representations for overall climatology. However, their study relied on reanalysis products and data assimilation systems as the reference for benchmarking (Qiao et al., 2022), which, while widely used, are not true SM observational products but rather model-based estimates. Furthermore, Qiao et al. (2022) assumed that the model outputs mrsos (surface SM) and mrsol (layered SM) represent the same variable, despite their structural differences (Massoud et al., 2026). In reality, mrsol provides moisture values at multiple soil layers, from which mrsos, representing moisture in the top 10 cm of soil, is derived. Other studies used a combination of observation and model-based benchmarks, but are limited to regional domains and similarly do not distinguish the different nature of mrsos and mrsol (Yuan et al., 2021; Wang et al., 2022). In this current study, we distinguish between these two variables and apply depth-specific derivations to more accurately evaluate SM representation across different layers in the models. Here, we not only benchmark SM at various depths, but also evaluate key ecohydrologic variables such as GPP, LAI, and ET, both individually and in relation to SM, to provide a more comprehensive assessment of land surface processes in CMIP6 models.

One of the persistent challenges in benchmarking SM is the limited availability of high-quality datasets. To address this, we utilize two key datasets for global-scale benchmarking in this study. The first is the Wang et al. (2021) product, which is a weighted average of multiple sources, including offline land surface model simulations, remote sensing data, and reanalysis products, that was found to outperform the original sources in that study. This dataset provides estimates for both surface SM (top 10 cm) and rootzone SM (up to 1 m), offering a view of SM at different depths. The second dataset is the European Space Agency Climate Change Initiative SM (ESA-CCI SM) product (Dorigo et al., 2015, 2017; Gruber et al., 2019; Preimesberger et al., 2021, 2025), which is derived from a blend of passive and active satellite sensors. ESA-CCI SM represents surface SM down to 5 cm, and its combination of different satellite platforms helps mitigate the limitations of individual sensors, providing an observational estimate for surface SM that is robust. Together, these datasets allow for a more thorough evaluation of SM in CMIP6 models. We emphasize that neither of these products represents a definitive “truth,” as both incorporate model-based information and are subject to their own structural assumptions and uncertainties. In contrast to studies that rely on a single reanalysis-based reference (e.g., Qiao et al., 2022), our approach intentionally combines complementary datasets with differing methodologies, including satellite-based observations (ESA-CCI) and a blended product integrating offline models, reanalysis, and satellite constraints (Wang et al., 2021). By using multiple reference products rather than a single benchmark, our goal is to provide a more robust and balanced assessment of model performance and to reduce reliance on any one dataset. We note that soil moisture below 1 m can also be important for ecosystem functioning (Stocker et al., 2023), particularly in regions with deep rooting systems (Kühnhammer et al., 2023), but evaluating deeper soil layers is beyond the scope of this study and we therefore focus on surface and rootzone soil moisture.

While benchmarking SM alone can offer insights into model performance, understanding how SM interacts with other ecosystem processes such as GPP, ET, and LAI can yield additional clues into the strengths and weaknesses of these models (Guswa et al., 2002; Wang et al., 2019). By examining how models' skills are related to certain model processes, we aim to uncover patterns that point to specific model limitations, whether they stem from structural design, input data, or parameterizations. Ultimately, this work seeks to guide improvements in model development and reduce uncertainties in global SM simulations. The goals of this study are threefold. First, we benchmark CMIP6 models in their simulation of SM at multiple depths, specifically at 5, 10, and 100 cm, using various datasets. Second, we assess model performance in simulating key ecohydrologic variables, including GPP, LAI, ET, as well as the relationships between SM and each of these variables. Third, we aim to identify specific areas for improvement, whether in individual models or as systematic issues across the CMIP6 ensemble. To support these objectives, we also implement a Köppen climate region analysis (Peel et al., 2007) within the ILAMB framework to evaluate model performance across distinct climate zones.

2 Materials and Methods

2.1 Models and Variables in CMIP6 Simulations

The model data used in this study come from CMIP6 (Eyring et al., 2016), an international effort to standardize ESM simulations, enabling direct comparison across models. We use a suite of CMIP6 models (detailed in Table 1) that differ in their spatial resolutions and land surface model components. Table 1 lists each model's horizontal grid spacing and the land surface scheme used to simulate SM, GPP, LAI, and ET.

The SM variables analyzed are mrsol and mrsos, which represent layered and surface soil moisture, respectively. Both variables are provided in units of mass per unit area [kg m⁻²]. Other simulated variables include gpp (units g m² d⁻¹), lai (unitless), and evspsbl (units mm d⁻¹) corresponding to GPP, LAI, and ET. All variables are analyzed at monthly temporal resolution.

To derive SM at specific depths (e.g., 0–5, 0–10, or 0–100 cm), we calculate depth-integrated soil moisture estimates from the mrsol variable, which provides total SM contained in each discrete model soil layer. Since the vertical layering varies across models, we apply a depth-weighted integration approach that also converts units from mass to volumetric SM (i.e., from [kg m⁻²] to [m³ m⁻³]). This conversion and integration is expressed as:

\begin{matrix} (1) & \begin{aligned} {SM}_{integrated} & = \{[\sum_{layer i = 1}^{layer n - 1} \frac{mrsol (i)}{ρ_{w} \cdot d z (i)} \cdot d z (i)] \\ + [\frac{mrsol (n)}{ρ_{w} \cdot d z (i)} \cdot z_{remaining}]\} / z_{total} \end{aligned} \end{matrix}

In this equation, mrsol(i) represents the mass of SM in the ith model-defined soil layer, reported in units of [kg m⁻²]. The variable ρ_w is the density of liquid water, assumed to be a constant value of 1000 kg m⁻³. The quantity dz(i) refers to the thickness of the ith soil layer in meters [m], which is used to compute the volumetric contribution of each layer. The term z_remaining represents the portion in [m] of the final layer that partially overlaps with the target integration depth (e.g., if the target is 10 cm and the final layer spans from 8–15 cm, then z_remaining=2 cm = 0.02 m. The total integration depth, z_total, is the sum of all full-layer thicknesses plus z_remaining, defining the vertical extent over which SM is integrated.

This formula first converts each layer's SM from mass per unit area [kg m⁻²] to volumetric SM [m³ m⁻³] by dividing by the product of water density and layer thickness. Then, it multiplies by the layer thickness to compute the volume per unit area. Summing over all layers and dividing by the total soil depth yields the average volumetric SM over the target depth. This method standardizes SM across models with differing vertical discretizations and unit conventions, enabling accurate and consistent comparisons with benchmarking datasets, which report SM as a volume fraction. For surface SM (mrsos), which represents a shallow fixed-depth layer (i.e., 0.1 m), we apply the same conversion logic by assuming that fixed depth during volumetric transformation. This approach is consistent with prior studies (e.g., Qiao et al., 2022; Wang et al., 2022; Massoud et al., 2026).

Table 1CMIP6 models used in this study, along with their latitude and longitude grid sizes and the land models with their soil depths and number of soil layers used in each model.

Download Print Version | Download XLSX

2.2 Soil Moisture Datasets Used for Benchmarking

In this study, we evaluate the SM performance of CMIP6 models using two primary benchmark datasets. The first is the ESA-CCI SM product (Dorigo et al., 2017; Gruber et al., 2019; Preimesberger et al., 2021), which provides a global, long-term record of surface SM (∼ 2–5 cm) spanning over 40 years (1978–2023) at a daily temporal resolution that is aggregated to reflect monthly values and spatial resolution of approximately 25 km (Fig. 1A). The ESA-CCI SM product is updated annually through an algorithmic process that incorporates data from both passive and active satellite sensors, producing a blended dataset that integrates multiple sensor sources while extending the time series with each update. This dataset has been extensively used in hydrological and climatological research (e.g., An et al., 2016; McNally et al., 2016; Ciabatta et al., 2018; Massoud et al., 2023; Li et al., 2025a), including the Bulletin of the American Meteorological Society's annual “State of the Climate” reports. Its long-term, global coverage makes it a valuable resource for evaluating surface SM in ESMs. To facilitate direct comparison with the ESA-CCI SM product that typically represents SM at a depth of 5 cm, the mrsol variable in each CMIP6 model is integrated to the 5 cm layer using Eq. (1). Given that some models have first layers that are deeper than 5 cm, this integration may introduce additional uncertainty in the estimated SM.

The second SM dataset used in this study is the Wang et al. (2021) product, which provides a global, gap-free, long-term record of SM across four depths (0–10, 10–30, 30–50, and 50–100 cm) from 1970 to 2016, with a monthly temporal resolution and spatial resolution of 0.5°. This dataset synthesizes SM information from diverse sources, including in situ observations, satellite data, reanalysis products, and offline land surface model simulations. It employs three statistical approaches, unweighted averaging, optimal linear combination (OLC), and emergent constraint (EC), to produce a merged product that outperforms individual source datasets in terms of bias, root mean square error (RMSE), and correlation when compared to in situ observations. For this study, we utilize the OLC version of the Wang et al. (2021) product, which we hereafter refer to as Wang2021OLC, because it is constrained by in situ observational values and performs among the best of the paper's reported method-data source combinations. This hybrid dataset offers harmonized spatial, temporal, and vertical coverage, making it highly suitable for large-scale ESM benchmarking of both surface (Fig. 1B) and rootzone (Fig. 1C) SM. Since this dataset provides SM estimates at both 10 and 100 cm depths, we integrate the mrsol variable from each CMIP6 model to these depths using Eq. (1) to enable direct comparison. In addition, because the mrsos variable in CMIP6 models also represents surface SM at approximately 10 cm, it is separately benchmarked against the 10 cm layer from the Wang2021OLC product. This dual use of mrsol and mrsos allows us to assess the internal consistency and depth representation of SM across different model variables and observational references.

While these datasets are widely used and provide valuable long-term, global-scale estimates, each has inherent limitations, including retrieval uncertainties, differences in spatial and temporal resolution, and dependence on model-based or algorithmic assumptions. Because no single dataset can fully capture the complexity of SM dynamics, the use of multiple, complementary observational and assimilated products helps quantify uncertainty in the benchmark results presented here.

https://gmd.copernicus.org/articles/19/3427/2026/gmd-19-3427-2026-f01

Figure 1Long-term mean soil moisture (SM) from observational datasets. (A) ESA-CCI Surface SM [m³ m⁻³] (top 5 cm) (1978–2023), providing global estimates based on satellite data. (B) Wang et al. (2021) OLC Surface SM [m³ m⁻³] (top 10 cm) (1970–2016), derived from a merged dataset of satellite and in-situ observations. (C) Wang et al. (2021) OLC Rootzone SM [m³ m⁻³] (top 100 cm), representing long-term mean soil moisture across the rootzone. Each subplot shows the global distribution of soil moisture at the depths represented in each product.

2.3 Other Benchmark Datasets

For the ecohydrologic relationship variables, we incorporate observational datasets specific to each variable that exist in the ILAMB data library. GPP observations are derived from the Water, Energy, and Carbon with Artificial Neural Networks (WECANN) dataset, which provides globally gridded estimates based on advanced machine learning approaches that integrate remote sensing and meteorological inputs (Alemohammad et al., 2017) from 2007 to 2016. LAI data is sourced from the NOAA Climate Data Record (CDR) of AVHRR Leaf Area Index (AVH15C1) from 1981 to 2019 (Claverie et al., 2016). ET data comes from the Global Land Evaporation Amsterdam Model (GLEAM) v3.3a, which provides daily estimates (aggregated here to reflect monthly values) from 1980 to 2018 (Miralles et al., 2011; Martens et al., 2017). The spatial resolution of the WECANN GPP product (Fig. 2A) is 0.5°, that of the AVH15C1 LAI data (Fig. 2B) is 0.05°, and that of the GLEAMv3.3a ET data (Fig. 2C) is 0.25°.

https://gmd.copernicus.org/articles/19/3427/2026/gmd-19-3427-2026-f02

Figure 2Long-term mean ecohydrological variables from observational datasets. (A) WECANN GPP [g m⁻² d⁻¹ ] (2007–2016), providing global estimates of gross primary productivity derived from machine learning techniques. (B) AVH15C1 LAI [unitless] (1981–2019), representing global leaf area index values from the NOAA Climate Data Record. (C) GLEAMv3.3a ET [mm d⁻¹] (1980–2018), showing global evapotranspiration estimates derived from satellite observations and meteorological data. Each subplot displays the global distribution of these ecohydrological variables.

While ILAMB supports multiple observational datasets for each variable, we utilize a single benchmark dataset per variable in this study to maintain consistency and simplicity in our analysis. GPP observations are taken from the WECANN product (Alemohammad et al., 2017), which provides globally gridded estimates derived from machine-learning approaches that integrate satellite and meteorological data. LAI is evaluated using the NOAA Climate Data Record AVH15C1 product (Claverie et al., 2016; NOAA CDR AVH15C1), which offers a long-term, internally consistent record based on AVHRR observations. ET is benchmarked against GLEAM v3.3a (Miralles et al., 2011; Martens et al., 2017), a widely used dataset that combines satellite observations with a physically based modeling framework.

It is worth noting that although these products are widely used as observational references, they are themselves derived from models or statistical algorithms informed by observational inputs, and therefore should be interpreted as observationally informed estimates rather than direct measurements. Each dataset also carries known limitations, including uncertainties related to retrieval algorithms, input data quality, and assumptions embedded in the underlying models. Nonetheless, these datasets have been extensively evaluated and applied in previous studies and provide reliable and widely accepted benchmarks for evaluating modeled GPP, LAI, and ET (represented by the gpp, lai, and evspsbl variables), supporting a comprehensive evaluation of ecohydrologic processes and their relationships with SM.

2.4 ILAMB Framework

The ILAMB framework is an open-source software package used to assess the performance of ESMs by comparing model outputs to a suite of observational datasets. ILAMB's scoring system integrates several key performance metrics: bias, RMSE, seasonal cycle representation, and spatial distribution (Fig. A1). These metrics are synthesized into an overall score using the method detailed in Collier et al. (2018), offering a quantitative view of model fidelity. Bias measures the average deviation from observational data, RMSE quantifies error magnitude, and the seasonal cycle and spatial metrics assess the temporal and geographic accuracy of model outputs. By combining these metrics, ILAMB generates diagnostic graphics and scores to help identify strengths and weaknesses in model simulations. The ILAMB framework has been widely adopted for model evaluation and intercomparison, aiding in the continuous development of more accurate land model components (Collier et al., 2018, 2023).

In this study, we extend the ILAMB analysis by incorporating Köppen climate classifications, which allows for a more detailed evaluation of model performance across diverse climate zones. These regions, which include tropical, desert and semi-arid, temperate, and continental climates, reflect varying environmental conditions that significantly influence SM and vegetation dynamics. By examining model skill within these distinct climate zones, we gain deeper insights into region-specific strengths and weaknesses, allowing for targeted improvements in land models across climate types.

3 Results from ILAMB

3.1 Overall benchmark scores in ILAMB

In this section, we summarize the global performance of CMIP6 models using the ILAMB framework across surface and rootzone soil moisture (SM) and key ecohydrological variables. The combined results are shown in the ILAMB portrait plot (Fig. 3), where colors indicate relative performance across models and metrics, with purple denoting higher skill and orange lower skill. The Overall Score is a composite metric that combines normalized measures of bias, error magnitude (RMSE), seasonal cycle representation, and spatial pattern agreement following the ILAMB methodology (Collier et al., 2018). The Seasonal Cycle Score evaluates how well models capture the timing and amplitude of the observed annual cycle, while the Spatial Distribution Score measures agreement in spatial patterns across grid cells.

Rather than showing uniform behavior across variables, the models exhibit distinct and structured performance patterns. Some models display relatively strong skill for ecohydrological variables (GPP, LAI, and ET) but weaker performance for soil moisture itself, while others perform well for soil moisture states but less well for vegetation and flux variables. As a result, the coupling metrics between SM and ecohydrological variables often differ markedly from the skill in the individual variables alone. This indicates that reproducing realistic mean states or variability does not necessarily imply that models capture the correct sensitivity of ecosystem processes to soil moisture.

Notably, models based on the same land surface framework tend to exhibit similar performance signatures (e.g., CESM2 and NorESM2-LM using CLM5, and MPI-ESM1-2-LR and AWI-ESM-1-1-LR using JSBACH; see Table A1). This structural coherence suggests that differences in land model formulation play an important role in shaping benchmarking outcomes, motivating a more detailed examination of soil moisture, ecohydrological variables, and their coupling in the following subsections. We note that comparable process-level documentation is not always consistently available across all CMIP6 models; therefore, Table A1 summarizes key characteristics only for those models for which this information is clearly documented in the literature, and the absence of entries for some models reflects limitations in publicly available documentation rather than an exhaustive comparison.

Figures A1–A3 provide additional context for these patterns. Figure A1 shows the individual ILAMB component scores for global surface SM (mrsol integrated to 10 cm) against Wang2021OLC, Fig. A2 presents the quantitative Overall Scores underlying Fig. 3, and Fig. A3 shows global maps of model bias. Together, these diagnostics illustrate that the overall rankings emerge from a combination of biases, variability errors, and spatial pattern mismatches, rather than from any single performance metric alone.

https://gmd.copernicus.org/articles/19/3427/2026/gmd-19-3427-2026-f03

Figure 3Benchmarking results from the ILAMB global run. Each model is compared against observational datasets: ESA-CCI Surface SM [m³ m⁻³] (top 5 cm), Wang2021OLC Surface SM [m³ m⁻³] (top 10 cm), Wang2021OLC Rootzone SM [m³ m⁻³] (top 100 cm), WECANN GPP [g m⁻² d⁻¹ ], AVH15C1 LAI [unitless], and GLEAMv3.3a ET [mm d⁻¹ ]. Additionally, the relationships between soil moisture and ecohydrological variables (GPP, LAI, and ET) are also benchmarked. Purple represents better relative skill, while orange indicates relatively worse performance.

Download

3.2 Surface and Rootzone Soil Moisture

The evaluation of surface and rootzone SM highlights sensitivity both to the observational reference and to the soil depth considered. When surface SM is evaluated using ESA-CCI (top 5 cm) and Wang2021OLC (top 10 cm), roughly half of the models change their relative skill ranking while others remain similar (Fig. 3), indicating that benchmarking outcomes at shallow depths depend in part on the choice of observational dataset. In contrast, when different model representations of shallow SM (mrsos and mrsol integrated to 10 cm) are evaluated against Wang2021OLC, most models retain similar rankings, suggesting relatively consistent performance across comparable near-surface depths within a common benchmark. Rootzone SM (0–100 cm), also evaluated against Wang2021OLC, shows a mixture of stable and shifting rankings relative to the surface evaluation: several models change skill category or show modest shifts, while many remain broadly similar. Together, these patterns indicate that differences among observational products influence surface benchmarking outcomes, while depth-dependent differences in model performance emerge more clearly when extending the evaluation to the rootzone.

As noted in Sect. 2.2, differences in model vertical discretization introduce an additional source of uncertainty, particularly for surface SM. Some models have first soil layers that are deeper than 5 cm, so integrating mrsol to 5 cm to match ESA-CCI can introduce representational errors, which likely contributes to part of the spread in surface SM performance. This structural mismatch between model layers and observational depth definitions complicates direct comparisons and underscores the challenges of benchmarking near-surface SM consistently across models.

The Taylor diagram analysis (Fig. 4), based on monthly mean fields from both models and observational datasets, further illustrates systematic differences between surface and rootzone performance. Across models, correlations with observed surface SM are generally higher than agreement in variability, indicating that models tend to better capture relative wetness and dryness patterns than absolute soil moisture magnitudes. In contrast, rootzone SM exhibits a systematic overestimation of variability across nearly all models. This suggests that ESMs tend to simulate larger fluctuations in deeper soil moisture than indicated by available reference data, pointing to a persistent challenge in representing subsurface water storage and buffering processes. At the same time, the limited availability and greater uncertainty of deep soil moisture observations likely contribute to the reduced observed variability at depth, implying that part of the model–data discrepancy may reflect observational limitations as well as model deficiencies. Together, these results indicate that while many models achieve reasonable performance for surface SM patterns, substantial challenges remain in representing the magnitude and variability of rootzone SM. The contrast between relatively robust surface correlations and systematically inflated deep-soil variability highlights the importance of improving subsurface hydrological processes and their vertical coupling in land surface models.

https://gmd.copernicus.org/articles/19/3427/2026/gmd-19-3427-2026-f04

Figure 4Taylor diagrams evaluating the performance of CMIP6 model SM simulations compared to different observational datasets: (A) Surface SM from ESA-CCI (top 5 cm), (B) Wang2021OLC Surface SM using mrsos from the models (top 10 cm), (C) Wang2021OLC Surface SM using mrsol from the models (top 10 cm), and (D) Wang2021OLC Rootzone SM (top 100 cm).

Download

3.3 Ecohydrological Variables

The evaluation of ecohydrological variables (GPP, LAI, and ET) reveals substantial diversity in model performance across variables and across models (Fig. 3). Unlike soil moisture, where some coherence in relative ranking is evident, models that perform well for one ecohydrological variable often perform poorly for others, resulting in a wide spread of rankings and few consistently high-performing models across all three benchmarks. This lack of coherence highlights the difficulty of achieving balanced realism in vegetation productivity, canopy structure, and surface fluxes within a single model framework.

The Taylor diagram analysis (Fig. 5), based on monthly mean fields from models and observational datasets, further illustrates systematic differences in how these variables are represented. Across most models, both GPP and LAI exhibit a pronounced high bias in variability, indicating that models tend to overestimate fluctuations in vegetation productivity and canopy structure relative to observations. In contrast, ET generally shows stronger agreement with observations, both in terms of spatial correlation and variability. This pattern suggests that while models capture the large-scale controls on surface energy and water fluxes relatively well, they tend to exaggerate the amplitude of vegetation responses.

The comparatively stronger performance in ET likely reflects, at least in part, tighter physical constraints through the surface energy balance and the availability of multiple independent observational products used to evaluate and calibrate this variable. Although transpiration is fundamentally a vegetation-controlled process and is represented in land models through the same canopy and stomatal conductance formulations that regulate carbon uptake, ET can nonetheless appear better constrained because it is subject to additional physical and observational constraints.

In addition, this relatively strong ET skill may also arise from compensating parameter choices that mask deficiencies in soil moisture representation. For example, biases in soil hydraulic properties or effective soil water holding capacity may compensate for missing or simplified processes such as groundwater interactions or root water uptake, allowing models to reproduce realistic surface fluxes despite underlying errors in subsurface moisture dynamics. By comparison, GPP and LAI are influenced by a broader set of interacting biological and phenological processes, which are represented differently across models and remain more weakly constrained by observations (Massoud et al., 2019; Li et al., 2025b). As a result, larger inter-model spread and systematic variability biases in GPP and LAI are not unexpected, even in cases where ET appears relatively well simulated.

Together, these results indicate that skill in simulating surface fluxes does not necessarily imply comparable skill in representing vegetation structure or productivity, and that improvements in ecohydrological realism will likely require more consistent treatment of vegetation dynamics in addition to improvements in hydrological processes.

https://gmd.copernicus.org/articles/19/3427/2026/gmd-19-3427-2026-f05

Figure 5Taylor diagrams evaluating the performance of CMIP6 models in simulating ecohydrologic variables: (A) Gross Primary Productivity (GPP) from WECANN, (B) Leaf Area Index (LAI) from AVH15C1, and (C) Evapotranspiration (ET) from GLEAMv3.3a.

Download

3.4 Relationship of Soil Moisture to Ecohydrological Variables

The relationship between soil moisture and ecohydrological variables provides a more stringent test of model behavior than evaluation of the individual variables alone. Figures 6 and 7 illustrate these relationships for the ACCESS-ESM1-5 model using two different observational surface SM products, based on monthly mean fields consistent with the temporal resolution of the analysis. The comparisons reveal that apparent agreement in individual variables does not necessarily translate into agreement in their coupled behavior, with notable differences emerging depending on the benchmark dataset and the depth of soil moisture considered.

When evaluated against the ESA-CCI surface SM product (0–5 cm; Fig. 6), models exhibit substantial discrepancies in the diagnosed relationships between SM and GPP, LAI, and ET, indicating that vegetation sensitivity to near-surface moisture is not consistently represented. In contrast, when the Wang2021OLC product is used with SM integrated to 10 cm (Fig. 7), agreement between models and observations improves for most variables, suggesting that part of the mismatch in surface-based comparisons arises from differences in observational products and soil depth representation. This highlights the strong sensitivity of coupling diagnostics to both the choice of reference dataset and the effective depth over which soil moisture is evaluated. We note that the climatological fields shown in Figs. 6 and 7 are computed within ILAMB from the available gridded data over the evaluation period. Because the ESA-CCI product contains spatial and temporal gaps in coverage, the resulting inferred climatological means and relationships may be influenced by incomplete sampling, potentially affecting the model–data comparison.

Extending this analysis across the full CMIP6 ensemble (Fig. 8) confirms that coupling skill varies widely among models and does not simply mirror performance in the individual SM or ecohydrological variables. Some models that perform well for SM or for GPP, LAI, and ET individually nevertheless show weak performance in the coupling metrics, indicating that reproducing realistic mean states or variability is insufficient to guarantee realistic dynamic sensitivity between soil moisture and ecosystem processes. Conversely, other models exhibit relatively stronger coupling skill despite more modest performance in the individual variables, underscoring that different aspects of model behavior are being tested by these diagnostics.

Together, these results demonstrate that benchmarking SM–ecosystem relationships provides complementary information to traditional state-based evaluations. The systematic differences in coupling performance across models and datasets point to persistent challenges in representing how soil moisture anomalies propagate into vegetation and surface flux responses, and motivate further examination of the underlying structural differences among land models.

https://gmd.copernicus.org/articles/19/3427/2026/gmd-19-3427-2026-f06

Figure 6Relationship of SM (y-axis) versus ecohydrological variables (x-axis) shown through dot-and-whisker plots for the ACCESS-ESM1-5 model (blue) and observations (black). The ESA-CCI SM product (typically to 5 cm) is used here for SM, compared to the ACCESS-ESM1-5 model's SM integrated to the same depth (mrsol to 5 cm). The relationship of these SM estimates are shown for (a) WECANN GPP, (b) AVH15C1 LAI, and (c) GLEAMv3.3a ET at the global scale. Whiskers indicate interquartile ranges across land grid cells. These plots reveal a significant discrepancy between the model and observations when using ESA-CCI as the SM benchmark.

Download

https://gmd.copernicus.org/articles/19/3427/2026/gmd-19-3427-2026-f07

Figure 7Similar to Fig. 6, but using the Wang2021OLC SM product (integrated to 10 cm) and the corresponding ACCESS-ESM1-5 model SM at the same depth (mrsol to 10 cm). Dot-and-whisker plots show the relationship of soil moisture (y-axis) versus (a) WECANN GPP, (b) AVH15C1 LAI, and (c) GLEAMv3.3a ET for both model (blue) and observations (black) at the global scale. Whiskers represent interquartile ranges. Compared to Fig. 6, these plots demonstrate a marked reduction in discrepancies between modeled and observed relationships.

Download

3.5 Other ILAMB capabilities: Köppen Classification

While the preceding analyses focus on global-scale benchmarking, one of ILAMB's key strengths is its ability to evaluate model performance within specific biogeographic regimes. Using Köppen classifications, we assess model behavior across Tropical, Desert and Semi-arid, Temperate, and Continental climate zones (Fig. A4). The regional results from ILAMB (Figs. A5–A9) reveal that model performance is strongly dependent on the climate zone and that global scores can obscure substantial regional contrasts.

In the Tropical region (Figs. A5–A6), CLM5-based models (CESM2 and NorESM2-LM) exhibit consistently high skill across soil moisture, ecohydrological variables, and their relationships, indicating coherent performance in energy- and water-rich environments. In contrast, JSBACH-based models (MPI-ESM1-2-LR and AWI-ESM-1-1-LR) show systematically weaker performance across many of these diagnostics, while UKESM1-0-LL (JULES) exhibits mixed behavior, with relatively stronger performance for rootzone SM and surface fluxes but weaker performance for surface SM and GPP. These patterns suggest that model differences that are muted in global averages can become more pronounced when evaluated in climate regimes dominated by deep rooting, strong transpiration control, and tight soil–vegetation coupling.

In Desert and Semi-arid regions (Fig. A7), CLM5-based models again show generally strong performance across most variables, although skill in rootzone SM is reduced, leading to weaker coupling metrics involving deep soil moisture. JSBACH-based models tend to perform relatively well for soil moisture states but less well for ecohydrological variables, resulting in degraded coupling diagnostics. UKESM1-0-LL again shows mixed behavior, with relatively strong performance for rootzone SM and ET but weaker performance for surface SM and GPP, and correspondingly weaker coupling metrics. These results highlight that in water-limited environments, differences in how soil moisture anomalies propagate into vegetation and flux responses become particularly important.

In Temperate regions (Fig. A8), CLM5-based models maintain generally high performance across most benchmarks, although some degradation is evident for LAI. JSBACH-based models show relatively good performance for surface SM and LAI but weaker performance for rootzone SM, GPP, and ET, leading to poor coupling metrics. UKESM1-0-LL again exhibits stronger performance for rootzone SM and ecohydrological variables than for surface SM, with coupling metrics remaining comparatively weak. This intermediate regime underscores the growing importance of vertical soil moisture structure and rooting depth in shaping both state and coupling performance.

The Continental (high-latitude) region (Fig. A9) shows the most pronounced divergence among models. Here, CLM5-based models exhibit markedly degraded performance across soil moisture, ecohydrological variables, and their relationships, likely reflecting the strong influence of cold-region processes such as organic soils, freeze–thaw dynamics, and high soil porosity, which lead to substantially higher simulated soil moisture than indicated by current observational products. In contrast, JSBACH-based models show relatively good performance for surface SM and ecohydrological variables but weaker performance for rootzone SM, resulting in depth-dependent differences in coupling skill. These results demonstrate that high-latitude regions pose distinct challenges for both observations and models, and that process representations that improve performance in one climate regime may degrade it in another.

Taken together, the regional diagnostics show that no single model performs uniformly well across all climate regimes and benchmarks. The strong regime dependence of model skill highlights the importance of regional evaluation for identifying both strengths and weaknesses that are not evident from global metrics alone, and reinforces the need for process-based interpretation of benchmarking results.

https://gmd.copernicus.org/articles/19/3427/2026/gmd-19-3427-2026-f08

Figure 8Similar to the bias plots shown in Fig. 7A, but for all models. These plots highlight the observed relationships (black curves) between the surface SM (from the Wang2021OLC top 10 cm SM product) and GPP (from WECANN), as well as the relationship shown in each model (colored curves) using the simulated mrsol to 10 cm and gpp variables.

Download

4 Discussion

4.1 Potential Drivers for Divergences in Model Simulations

The substantial spread in SM estimates among CMIP6 models reflects differences in how land surface models (LSMs) represent key processes and input data, as evidenced by both the global and Köppen-region ILAMB diagnostics (Sect. 3.1 and 3.5). Although precipitation is a major driver of SM, it does not explain much of the mean inter-model variability in this study. Most CMIP6 models simulate similar global mean precipitation rates of approximately 2.5 mm d⁻¹ (Tapiador et al., 2018), suggesting that differences in total precipitation alone are insufficient to account for the spread in simulated SM (see Fig. A1, column “Model Period Mean”). However, the timing and intensity of precipitation events, rather than total amounts, may contribute significantly to model divergence, especially in regions with highly seasonal precipitation regimes where the timing of precipitation governs the length of dry-down periods. This effect is particularly pronounced in seasonally snowmelt-dominated systems (Harpold and Molotch, 2015), where variations in precipitation and snowmelt seasonality strongly influence the partitioning between infiltration, runoff, and ET. Differences in how models represent storm frequency, sub-daily rainfall variability, and snowmelt dynamics can therefore propagate into systematic differences in SM variability and ecohydrological coupling, as seen in the regional results. Similarly, the treatment of soil ice is unlikely to contribute significantly to SM variability in a direct sense, as the CMIP6 definition of SM includes both liquid water and ice, which should be handled consistently across models and is only relevant in cold regions. This implies that other factors, such as soil properties, hydrologic parameterizations, and land–atmosphere feedbacks, are likely more important contributors to the inter-model spread highlighted in Sect. 3.2–3.5.

One important driver is the treatment of soil properties, particularly in high-latitude regions. Models that include representations of organic soils, such as CESM2 and NorESM2-LM, tend to simulate higher SM levels, likely due to the greater water-holding capacity of organic matter. Both models use the CLM5 land surface scheme, which explicitly accounts for organic soil processes in northern latitudes (Lu et al., 2020). This feature provides a plausible explanation for the large positive SM biases seen for these models in the Continental region (Figs. A9 and A3), while also being consistent with their comparatively strong performance in Tropical and Temperate regions (Figs. A5 and A8), where organic-rich soils and strong vegetation–soil coupling are widespread. It remains unclear whether all CMIP6 models have updated their treatment of organic soils to a similar degree, which could contribute to persistent inter-model differences in both SM states and SM–vegetation relationships.

Soil porosity is another critical input affecting SM simulations. Previous work (Dai et al., 2019) has shown that many CMIP5 models used outdated soil maps (e.g., FAO, 2003), and some CMIP6 models may still rely on these legacy datasets. More modern global soil datasets, such as the Harmonized World Soil Database v2.0 (Nachtergaele et al., 2023), the Global Soil Dataset for Earth System Modeling (Shangguan et al., 2014), and SoilGrids (Hengl et al., 2014), offer improved estimates of soil properties, including porosity. Differences in the underlying soil maps and parameter choices may therefore help explain why some models perform well for surface SM but poorly for rootzone SM, or vice versa, as seen in several Köppen regions (Sect. 3.5).

Model-specific representations of hydrologic processes further influence SM estimates and coupling behavior. For example, EC-Earth3-CC consistently simulates high SM values, potentially due to its inclusion of groundwater–soil interactions not present in many other models. With a few exceptions (Oleson et al., 2013; Lawrence et al., 2019), ESMs generally rely on “bucket-type” soil moisture representations that drain water between layers based on thresholds (e.g., field capacity) and rates (e.g., hydraulic conductivity), rather than fully pressure-driven flow as in Richards' equation. Across the CMIP6 ensemble, differences in the number and spacing of soil layers, rooting profiles, ET parameterizations, infiltration schemes, soil hydraulic properties, total soil depth, and lower boundary conditions (e.g., free drainage versus groundwater influence; see Table A1) likely contribute to the contrasting behaviors observed among model families, such as the different surface-versus-rootzone and state-versus-coupling skill patterns seen for CLM5-, JSBACH-, and JULES-based models.

Finally, land–atmosphere feedbacks may also contribute to the inter-model spread. CMIP models have been shown to overestimate the strength of feedbacks between SM and atmospheric variables, including temperature, ET, and surface fluxes (Levine et al., 2016). These exaggerated feedbacks may amplify SM variability and introduce additional divergence across models. For instance, models that strongly couple soil moisture to surface heat fluxes may simulate more aggressive drying or wetting cycles, depending on local conditions (Schumacher et al., 2022), potentially contributing to the widespread pattern of relatively good ET skill but weaker SM–ET coupling seen in several models (Sect. 3.4). The degree of inter-model variability in feedback strength is probably quite high, but further studies are needed to quantify how this affects inter-model spread in SM (Vogel et al., 2018; Talib et al., 2023).

In summary, the observed variability in SM simulations across CMIP6 models likely stems from a combination of factors, including differences in soil properties, legacy input datasets, precipitation event timing, hydrologic parameterizations, vertical structure and boundary conditions, and land–atmosphere feedbacks. The regime-dependent patterns revealed by the Köppen analysis suggest that these factors operate differently across climate zones, reinforcing the need for both global and regional diagnostics when evaluating and improving SM representation in ESMs.

4.2 Shared characteristics of models with similar performance

The global and regional ILAMB diagnostics reveal clear clustering of model behavior by land surface framework (Sect. 3.1 and 3.5; Table A1), indicating that structural choices in land model formulation exert a first-order control on both state and coupling performance. Models that share a common land surface model tend to exhibit similar strengths and weaknesses across soil moisture (SM), ecohydrological variables, and their interrelationships, suggesting that performance differences are not purely idiosyncratic but are linked to shared process representations.

CESM2 and NorESM2-LM, which both use CLM5, provide a clear example of this structural coherence. These models show relatively strong and internally consistent performance for ecohydrological variables (GPP, LAI, and ET) and their coupling to SM across several regions, particularly in the Tropical and Temperate zones (Sect. 3.5, Figs. A5 and A8). This behavior is consistent with CLM5's more detailed treatment of soil hydrology and explicit soil–plant hydraulic stress, which promotes coherent interactions between soil water availability, transpiration, and photosynthesis (Table A1). At the same time, both models exhibit degraded performance in the Continental (high-latitude) region, where their inclusion of organic soils and cold-region processes leads to systematically higher simulated SM than indicated by current observational products (Fig. A9). This combination of strong ecohydrological coupling but biased SM states highlights how a given structural choice can improve process coherence while simultaneously introducing regional biases.

In contrast, the JSBACH-based models (MPI-ESM1-2-LR and AWI-ESM-1-1-LR) tend to show relatively good performance for some SM state variables, particularly near the surface in several regions, but weaker performance for ecohydrological variables and, notably, for SM–vegetation coupling metrics. This pattern is especially evident in arid and temperate regimes (Sect. 3.5), where these models often reproduce mean SM reasonably well but fail to translate SM variability into realistic vegetation or flux responses. A plausible explanation is that JSBACH relies on more implicit vegetation water-stress formulations and simplified soil–plant coupling (Table A1), which may limit the model's ability to propagate soil moisture anomalies consistently into photosynthesis and transpiration, even when the soil moisture state itself is reasonably represented.

UKESM1-0-LL (JULES) exhibits a different, mixed behavior. This model often shows weaker performance for surface SM but comparatively better performance for rootzone SM and surface fluxes, along with moderate skill in some coupling metrics. This depth-dependent pattern is consistent with JULES's prescribed root distributions and implicit soil moisture stress functions, which can yield reasonable integrated flux responses while still allowing biases in shallow soil layers (Table A1). The recurring finding that UKESM1-0-LL captures some aspects of ecohydrological behavior despite weaker SM states reinforces the idea that realistic fluxes do not necessarily guarantee realistic subsurface moisture dynamics.

Taken together, these results demonstrate that similarities in land surface model architecture translate into recognizable performance “signatures” across both global and regional benchmarks. They also show that models can achieve good performance in individual variables while still failing to capture the correct coupling between soil moisture and ecosystem processes. This underscores the importance of evaluating not only state variables but also cross-variable coherence, and suggests that targeted improvements to soil hydrology, rooting depth distributions, and soil–plant coupling schemes could yield broad benefits across multiple aspects of model performance.

4.3 Consideration of uncertainty and limitations in the benchmark results

Evaluating model performance inherently involves uncertainty, both from observational datasets and from model representations. To better characterize this uncertainty, we employed multiple observational products for surface and rootzone SM, including ESA-CCI and Wang2021OLC surface SM, as well as Wang2021OLC rootzone SM. By comparing models against more than one observational reference, we aimed to capture a broader envelope of observational variability, rather than relying on any single product that may contain its own structural or systematic biases. We reiterate that none of the benchmark datasets used here represents a definitive truth, and that our multi-product approach is intended to reduce dependence on any single reference and to better characterize observational uncertainty. Similarly, we assessed different model representations of SM, including the mrsos and mrsol variables integrated to 5, 10, and 100 cm depths, to examine how model configuration affects evaluation outcomes.

We note that GPP and LAI are integrative ecosystem variables influenced by multiple interacting processes beyond soil moisture alone, and therefore discrepancies in SM–GPP or SM–LAI relationships cannot be interpreted mechanistically as arising solely from deficiencies in soil hydrology. In this study, these relationships are instead used to assess whether models reproduce the emergent ecosystem-scale coupling between soil moisture variability and vegetation functioning. More process-level hydrological quantities, such as runoff, drainage, or evaporation–transpiration partitioning, would be valuable for mechanistic diagnosis of soil moisture controls, but their use as global benchmarking targets is currently limited by observational availability and consistency. We therefore focus on GPP, LAI, and ET as comparatively well-constrained, widely evaluated ecosystem-scale diagnostics, while acknowledging the associated limitations in process-level attribution.

Soil moisture presents unique challenges in benchmarking due to the diversity and limitations of observational data sources. Satellite-derived products offer broad spatial coverage but are limited to shallow depths and may have coarser resolution for passive microwave technology, whereas active microwave retrievals can achieve higher resolution but are subject to greater uncertainty from surface roughness and vegetation structure (Zeng et al., 2023). In contrast, in-situ observations provide deeper and more accurate point measurements but are geographically sparse and face substantial scaling challenges when compared against ESM grid cells. While our study did not include in-situ datasets, their inclusion could strengthen future benchmarks, particularly for deeper soil layers and for regions where satellite coverage is limited. The variation in spatial resolution, vertical depth coverage, and methodological differences between observational datasets contributes to uncertainty in benchmarking, and any conclusions drawn from SM comparisons, especially in data-sparse regions such as high latitudes and arid zones, should be interpreted within this context.

To address these complexities, we adopted a multi-product comparison approach. However, additional strategies may further enhance benchmarking efforts. In atmospheric science, for example, it is common to evaluate models not only against individual datasets but also against an ensemble of observations (Yamaguchi et al., 2016), which helps define a consensus observational baseline. While this strategy is less frequently used in the SM modeling community, it may offer a valuable pathway forward, particularly when observational datasets diverge or when no single product can be considered definitively superior.

Beyond observational uncertainty, ensemble modeling (using a collection of model runs or parameter sets) can also help characterize structural and parametric uncertainty within the models themselves. While not implemented in this study, future benchmarking efforts could benefit from integrating probabilistic or ensemble-based approaches to better understand model sensitivity (cf. Massoud et al., 2019) and confidence in simulated soil moisture.

5 Conclusions

This study evaluated the performance of 16 CMIP6 models in simulating key land surface and ecohydrological variables, including surface and rootzone soil moisture (SM), gross primary productivity (GPP), leaf area index (LAI), and evapotranspiration (ET), using the ILAMB framework. By combining state-variable benchmarks with diagnostics of SM–ecohydrology coupling and regional (Köppen) analyses, we provide a multi-faceted assessment of land model behavior. The main findings are summarized as follows:

i.
Model performance in simulating soil moisture (SM) varies substantially depending on the observational benchmark used. For example, some models (e.g., EC-Earth3-CC) perform well against ESA-CCI surface SM but perform substantially worse relative to the Wang2021OLC product. This sensitivity highlights the importance of using multiple observational references to better characterize uncertainty and avoid conclusions based on a single dataset.
ii.
Model skill differs markedly across variables and reflects structural similarities among land surface model frameworks. For instance, CESM2 and NorESM2-LM exhibit relatively strong performance for ecohydrological variables (GPP, LAI, and ET) but show persistent SM biases in high-latitude regions. These shared behaviors reflect their common use of the CLM5 land surface model and illustrate how structural choices shape both strengths and weaknesses in simulated land surface states and fluxes. In particular, models with more explicit soil–plant hydraulic coupling, such as CLM5-based models, tend to reproduce SM–vegetation relationships more consistently across regions, whereas models relying on more implicit vegetation water-stress formulations (e.g., JSBACH- and JULES-based models) more often reproduce soil moisture states without fully capturing the associated vegetation or flux responses.
iii.
Evaluating SM–ecohydrology relationships further shows that models more consistently reproduce observed coupling patterns when assessed against the Wang2021OLC dataset than against ESA-CCI. This suggests that deeper soil moisture estimates may provide a more stable reference for diagnosing vegetation–water interactions, while also reinforcing that good performance in individual variables does not necessarily imply realistic cross-variable sensitivity.
iv.
Model skill also varies strongly by climate regime. For example, CESM2 and NorESM2-LM perform relatively well in Tropical and Temperate regions but degrade substantially in Continental (high-latitude) zones, where cold-region processes and organic soils play a larger role. These results highlight the limitations of relying on global mean scores alone and demonstrate the value of regional benchmarking for diagnosing where and why models succeed or fail.

Taken together, these results indicate that no single CMIP6 model performs best across all variables, coupling metrics, or regions. Instead, performance patterns often cluster by land surface model family, suggesting that structural choices in soil hydrology, rooting representation, and soil–plant coupling influence both soil moisture states and ecohydrological interactions. The ILAMB framework therefore provides a useful approach not only for benchmarking individual variables but also for diagnosing cross-variable coherence and regional dependencies in model behavior.

Future work should expand these benchmarking efforts by incorporating in-situ observations, applying ensemble-based and probabilistic approaches to both models and observations, and further refining regional analyses. Such developments will be important for improving constraints on soil moisture dynamics, strengthening understanding of land–atmosphere coupling, and guiding the development of next-generation Earth system models, including those participating in CMIP7.

Appendix A

Table A1Overview of key land surface process representations for some of the selected CMIP6 models used in this study, including hydrology scheme, root distribution, lower boundary condition, and treatment of plant hydraulics. This table provides structural context for interpreting differences in soil moisture, ecohydrological variables, and their coupling across models. Sources for each entry are listed in the table. Because process-level documentation is not consistently available for all CMIP6 models, this table summarizes key characteristics only for models with clearly documented information in the literature.

Download Print Version | Download XLSX

https://gmd.copernicus.org/articles/19/3427/2026/gmd-19-3427-2026-f09

Figure A1Evaluation of global simulations of SM using the mrsol variable to 10 cm, compared to the Wang et al. (2021) dataset at the same depth. The various metrics shown are the different benchmark scores used in the ILAMB evaluation, including Bias Score, RMSE Score, Seasonal Cycle Score, and Spatial Distribution Score. These scores contribute to the final benchmark, represented here as the Overall Score, and are the results that are presented in Fig. 3 (and Figs. S5 and S7–9).

Download

https://gmd.copernicus.org/articles/19/3427/2026/gmd-19-3427-2026-f10

Figure A2Overall scores that provide the ILAMB results in Fig. 3 are depicted numerically here.

Download

https://gmd.copernicus.org/articles/19/3427/2026/gmd-19-3427-2026-f11

Figure A3Bias plots (in m³ m⁻³) comparing Wang2021OLC top 10 cm SM product to the mrsol variable up to 10 cm for all models. The bias is calculated as the difference between model simulations and observations in each grid cell, providing insight into simulated SM performance in various regions around the globe.

https://gmd.copernicus.org/articles/19/3427/2026/gmd-19-3427-2026-f12

Figure A4Köppen climate regions evaluated in this study through ILAMB. The default region is “global”, while other regions include (A) Tropical, (B) Desert and Semi-arid, (C) Temperate, and (D) Continental. Regional mean values of surface SM derived from the ESA-CCI product are shown. The ESA-CCI dataset includes some data gaps, particularly in densely forested regions (e.g., the Amazon and Congo), ice-covered areas, and urban zones, due to limitations in microwave satellite observations. These gaps are more prevalent in earlier years and become less frequent over time. However, the primary purpose of this figure is to broadly illustrate the Köppen climate regions rather than serve as a detailed analysis of SM.

https://gmd.copernicus.org/articles/19/3427/2026/gmd-19-3427-2026-f13

Figure A5ILAMB results similar to Fig. 3, but for the “Tropical” Köppen region.

Download

https://gmd.copernicus.org/articles/19/3427/2026/gmd-19-3427-2026-f14

Figure A6Relationships between surface SM and ecohydrologic variables, similar to Fig. 7, but for the “Tropical” Köppen region. The model results shown here are for the ACCESS-ESM1-5 model. Panels (A), (D), and (G) show the heat maps of the observed relationships, panels (B), (E), and (H) show the heat maps of simulated relationships in the ACCESS-ESM1-5 model, and panels (C), (F), and (I) portray both the observed and simulated relationships in the form of dot and whisker plots for easy comparison.

Download

https://gmd.copernicus.org/articles/19/3427/2026/gmd-19-3427-2026-f15

Figure A7ILAMB results similar to Figs. 3 and A5, but for the “Desert and semi-arid” Köppen region.

Download

https://gmd.copernicus.org/articles/19/3427/2026/gmd-19-3427-2026-f16

Figure A8ILAMB results similar to Figs. 3 and A5, but for the “Temperate” Köppen region.

Download

https://gmd.copernicus.org/articles/19/3427/2026/gmd-19-3427-2026-f17

Figure A9ILAMB results similar to Figs. 3 and A5, but for the “Continental” Köppen region.

Download

Code and data availability

The ILAMB software used in this study is publicly available at https://github.com/rubisco-sfa/ILAMB (last access: 2 January 2026) and archived with DOI: https://doi.org/10.18139/ILAMB.v002.00/1251621 (Collier et al., 2023). The specific assets related to the RUBISCO Soil Moisture Working Group benchmarking effort are available at https://github.com/rubisco-sfa/soil-moisture (last access: 2 January 2026).

Author contributions

This work arises from collaborative efforts and discussions within the RUBISCO Soil Moisture Working Group, which inspired the ideas behind this paper. E.C.M. performed the analysis, wrote the original draft, and led the editing process. N.C., Y.W., and J.M. helped with the analysis, contributed to discussions that shaped the paper, and assisted with editing. A.H., S.A.K., G.K., M.K., Pu.R., Pa.R., M.S., J.T., S.P.V., H.W., Q.Z., and F.M.H. contributed to the development of ideas through discussions and assisted with manuscript editing.

Competing interests

The contact author has declared that none of the authors has any competing interests.

Disclaimer

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. The authors bear the ultimate responsibility for providing appropriate place names. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Acknowledgements

This research was sponsored by the Regional and Global Model Analysis (RGMA) activity of the Earth and Environmental Systems Modeling (EESM) Program in the Earth and Environmental Systems Sciences Division (EESSD) of the Office of Biological and Environmental Research (BER) in the U.S. Department of Energy Office of Science. This work is the result of the RUBISCO project Soil Moisture Working Group effort on benchmarking soil moisture in Earth System Models, supported by the RUBISCO SFA. P.R. acknowledges a grant from the Climate Program Office (CPO) of the National Oceanic and Atmospheric Administration (NOAA, NA22OAR4310612). M.S. is supported by the Earth and Biological Sciences Directorate's Laboratory Directed Research and Development (LDRD) Program at Pacific Northwest National Laboratory (PNNL). PNNL is a multi-program national laboratory operated for the US DOE by Battelle Memorial Institute under contract no. DE-AC05-76RL01830.

Financial support

This research has been supported by the Biological and Environmental Research (grant no. RUBISCO SFA KP1703)).

Review statement

This paper was edited by Amos Tai and reviewed by two anonymous referees.

References

Alemohammad, S. H., Fang, B., Konings, A. G., Aires, F., Green, J. K., Kolassa, J., Miralles, D., Prigent, C., and Gentine, P.: Water, Energy, and Carbon with Artificial Neural Networks (WECANN): a statistically based estimate of global surface turbulent fluxes and gross primary productivity using solar-induced fluorescence, Biogeosciences, 14, 4101–4124, https://doi.org/10.5194/bg-14-4101-2017, 2017.

An, R., Zhang, L., Wang, Z., Quaye-Ballard, J. A., You, J., Shen, X., Gao, W., Huang, L., Zhao, Y., and Ke, Z.: Validation of the ESA CCI soil moisture product in China, Int. J. Appl. Earth Obs. Geoinf., 48, 28–36, https://doi.org/10.1016/j.jag.2015.09.009, 2016.

Best, M. J., Pryor, M., Clark, D. B., Rooney, G. G., Essery, R. L. H., Ménard, C. B., Edwards, J. M., Hendry, M. A., Porson, A., Gedney, N., Mercado, L. M., Sitch, S., Blyth, E., Boucher, O., Cox, P. M., Grimmond, C. S. B., and Harding, R. J.: The Joint UK Land Environment Simulator (JULES), model description – Part 1: Energy and water fluxes, Geosci. Model Dev., 4, 677–699, https://doi.org/10.5194/gmd-4-677-2011, 2011.

Clark, D. B., Mercado, L. M., Sitch, S., Jones, C. D., Gedney, N., Best, M. J., Pryor, M., Rooney, G. G., Essery, R. L. H., Blyth, E., Boucher, O., Harding, R. J., Huntingford, C., and Cox, P. M.: The Joint UK Land Environment Simulator (JULES), model description – Part 2: Carbon fluxes and vegetation dynamics, Geosci. Model Dev., 4, 701–722, https://doi.org/10.5194/gmd-4-701-2011, 2011.

Clark, M. P., Fan, Y., Lawrence, D. M., Adam, J. C., Bolster, D., Gochis, D. J., and Hooper, R. P.: Improving the representation of hydrologic processes in Earth System Models, Water Resour. Res., 51, 5929–5956, https://doi.org/10.1002/2015WR017096, 2015.

Caen, A., Smallman, T. L., de Castro, A. A., Robertson, E., von Randow, C., Cardoso, M., and Williams, M.: Evaluating two land surface models for Brazil using a full carbon cycle benchmark with uncertainties, Climate Res. Sustain., 1, e10, https://doi.org/10.1002/cli2.10, 2022.

Claverie, M., Matthews, J., Vermote, E., and Justice, C.: A 30+ Year AVHRR LAI and FAPAR Climate Data Record: Algorithm Description and Validation, Remote Sens., 8, 263, https://doi.org/10.3390/rs8030263, 2016.

Ciabatta, L., Massari, C., Brocca, L., Gruber, A., Reimer, C., Hahn, S., Paulik, C., Dorigo, W., Kidd, R., and Wagner, W.: SM2RAIN-CCI: a new global long-term rainfall data set derived from ESA CCI soil moisture, Earth Syst. Sci. Data, 10, 267–280, https://doi.org/10.5194/essd-10-267-2018, 2018.

Collier, N., Hoffman, F. M., Lawrence, D. M., Keppel-Aleks, G., Koven, C. D., Riley, W. J., Mu, M., and Randerson, J. T.: The International Land Model Benchmarking (ILAMB) system: design, theory, and implementation, J. Adv. Model. Earth Sy., 10, 2731–2754, https://doi.org/10.1029/2018MS001354, 2018.

Collier, N., Hoffman, F. M., Mu, M., Randerson, J. T., Riley, W. J., Koven, C. D., Keppel-Aleks, G., and Lawrence, D. M.: International Land Model Benchmarking (ILAMB) Package (v002.00), RUBISCO Science Focus Area [code], https://doi.org/10.18139/ILAMB.v002.00/1251621, 2023.

Dai, Y., Shangguan, W., Wei, N., Xin, Q., Yuan, H., Zhang, S., Liu, S., Lu, X., Wang, D., and Yan, F.: A review of the global soil property maps for Earth system models, SOIL, 5, 137–158, https://doi.org/10.5194/soil-5-137-2019, 2019.

Dorigo, W. A., Gruber, A., De Jeu, R. A. M., Wagner, W., Stacke, T., Loew, A., Albergel, C., Brocca, L., Chung, D., Parinussa, R. M., and Kidd, R.: Evaluation of the ESA CCI soil moisture product using ground-based observations, Remote Sens. Environ., 162, 380–395, https://doi.org/10.1016/j.rse.2014.07.023, 2015.

Dorigo, W., Wagner, W., Albergel, C., Albrecht, F., Balsamo, G., Brocca, L., Chung, D., Ertl, M., Forkel, M., Gruber, A., and Haas, E.: ESA CCI Soil Moisture for improved Earth system understanding: State-of-the art and future directions, Remote Sens. Environ., 203, 185–215, https://doi.org/10.1016/j.rse.2017.07.001, 2017.

Eyring, V., Bony, S., Meehl, G. A., Senior, C. A., Stevens, B., Stouffer, R. J., and Taylor, K. E.: Overview of the Coupled Model Intercomparison Project Phase 6 (CMIP6) experimental design and organization, Geosci. Model Dev., 9, 1937–1958, https://doi.org/10.5194/gmd-9-1937-2016, 2016.

FAO: The Digitized Soil Map of the World Including Derived Soil Properties (version 3.6), FAO Land and Water Digital Media Series #1, FAO, Rome, Italy, ISBN 978-92-3-103889-1, 2003.

Green, J. K., Seneviratne, S. I., Berg, A. M., Findell, K. L., Hagemann, S., Lawrence, D. M., and Gentine, P.: Large influence of soil moisture on long-term terrestrial carbon uptake, Nature, 565, 476–479, https://doi.org/10.1038/s41586-018-0848-x, 2019.

Gruber, A., Scanlon, T., van der Schalie, R., Wagner, W., and Dorigo, W.: Evolution of the ESA CCI Soil Moisture climate data records and their underlying merging methodology, Earth Syst. Sci. Data, 11, 717–739, https://doi.org/10.5194/essd-11-717-2019, 2019.

Guswa, A. J., Celia, M. A., and Rodriguez-Iturbe, I.: Models of soil moisture dynamics in ecohydrology: A comparative study, Water Resour. Res., 38, 5-1, https://doi.org/10.1029/2001WR000826, 2002.

Harpold, A. A. and Molotch, N. P.: Sensitivity of soil water availability to changing snowmelt timing in the western US, Geophys. Res. Lett., 42, 8011–8020, https://doi.org/10.1002/2015GL065855, 2015.

Hauser, M., Orth, R., and Seneviratne, S. I.: Role of soil moisture versus recent climate change for the 2010 heat wave in western Russia, Geophys. Res. Lett., 43, 2819–2826, https://doi.org/10.1002/2016GL068036, 2016.

Hengl, T., De Jesus, J. M., MacMillan, R. A., Batjes, N. H., Heuvelink, G. B., Ribeiro, E., Samuel-Rosa, A., Kempen, B., Leenaars, J. G., Walsh, M. G., and Gonzalez, M. R.: SoilGrids1km – global soil information based on automated mapping, PloS one, 9, e105992, https://doi.org/10.1371/journal.pone.0105992, 2014.

Humphrey, V., Berg, A., Ciais, P., Gentine, P., Jung, M., Reichstein, M., Seneviratne, S. I., and Frankenberg, C.: Soil moisture–atmosphere feedback dominates land carbon uptake variability, Nature, 592, 65–69, https://doi.org/10.1038/s41586-021-03325-5, 2021.

Kühnhammer, K., van Haren, J., Kübert, A., Bailey, K., Dubbert, M., Hu, J., and Beyer, M.: Deep roots mitigate drought impacts on tropical trees despite limited quantitative contribution to transpiration, Sci. Total Environ., 893, 164763, https://doi.org/10.1016/j.scitotenv.2023.164763, 2023.

Lawrence, D. M., Fisher, R. A., Koven, C. D., Oleson, K. W., Swenson, S. C., Bonan, G., and Zeng, X.: The Community Land Model version 5: Description of new features, benchmarking, and impact of forcing uncertainty, J. Adv. Model. Earth Sy.s, 11, 4245–4287, https://doi.org/10.1029/2018MS001583, 2019.

Levine, P. A., Randerson, J. T., Swenson, S. C., and Lawrence, D. M.: Evaluating the strength of the land–atmosphere moisture feedback in Earth system models using satellite observations, Hydrol. Earth Syst. Sci., 20, 4837–4856, https://doi.org/10.5194/hess-20-4837-2016, 2016.

Li, L., Lin, X., Fang, Y., Hou, Z. J., Leung, L. R., Wang, Y., Mao, J., Xu, Y., Massoud, E., and Shi, M.: A unified ensemble soil moisture dataset across the continental United States, Sci. Data, 12, 546, https://doi.org/10.1038/s41597-025-04657-x, 2025a.

Li, W., Migliavacca, M., Miralles, D. G., Reichstein, M., Anderegg, W. R., Yang, H., and Orth, R.: Disentangling Effects of Vegetation Structure and Physiology on Land–Atmosphere Coupling, Glob. Change Biol., 31, e70035, https://doi.org/10.1111/gcb.70035, 2025b.

Lu, X., Du, Z., Huang, Y., Lawrence, D., Kluzek, E., Collier, N., Lombardozzi, D., Sobhani, N., Schuur, E. A., and Luo, Y.: Full implementation of matrix approach to biogeochemistry module of CLM5, J. Adv. Model. Earth Sy., 12, e2020MS002105, https://doi.org/10.1029/2020MS002105, 2020.

Martens, B., Miralles, D. G., Lievens, H., van der Schalie, R., de Jeu, R. A. M., Fernández-Prieto, D., Beck, H. E., Dorigo, W. A., and Verhoest, N. E. C.: GLEAM v3: satellite-based land evaporation and root-zone soil moisture, Geosci. Model Dev., 10, 1903–1925, https://doi.org/10.5194/gmd-10-1903-2017, 2017.

Massoud, E., Turmon, M., Reager, J., Hobbs, J., Liu, Z., and David, C. H.: Cascading dynamics of the hydrologic cycle in California explored through observations and model simulations, Geosciences, 10, 71, https://doi.org/10.3390/geosciences10020071, 2020.

Massoud, E. C., Xu, C., Fisher, R. A., Knox, R. G., Walker, A. P., Serbin, S. P., Christoffersen, B. O., Holm, J. A., Kueppers, L. M., Ricciuto, D. M., Wei, L., Johnson, D. J., Chambers, J. Q., Koven, C. D., McDowell, N. G., and Vrugt, J. A.: Identification of key parameters controlling demographically structured vegetation dynamics in a land surface model: CLM4.5(FATES), Geosci. Model Dev., 12, 4133–4164, https://doi.org/10.5194/gmd-12-4133-2019, 2019.

Massoud, E. C., Andrews, L., Reichle, R., Molod, A., Park, J., Ruehr, S., and Girotto, M.: Seasonal forecasting skill for the High Mountain Asia region in the Goddard Earth Observing System, Earth Syst. Dynam., 14, 147–171, https://doi.org/10.5194/esd-14-147-2023, 2023.

Massoud, E. C., Collier, N. O., Xu, M., Shi, M., and Hoffman, F. M.: Diagnosing the representation of surface and layered soil moisture in Earth system models, Environ. Res. Lett., 21, 024032, https://doi.org/10.1088/1748-9326/ae2f72, 2026.

McNally, A., Shukla, S., Arsenault, K. R., Wang, S., Peters-Lidard, C. D., and Verdin, J. P.: Evaluating ESA CCI soil moisture in East Africa, Int. J. Appl. Earth Obs. Geoinf., 48, 96–109, https://doi.org/10.1016/j.jag.2016.01.001, 2016.

Miralles, D. G., Holmes, T. R. H., De Jeu, R. A. M., Gash, J. H., Meesters, A. G. C. A., and Dolman, A. J.: Global land-surface evaporation estimated from satellite-based observations, Hydrol. Earth Syst. Sci., 15, 453–469, https://doi.org/10.5194/hess-15-453-2011, 2011.

Nachtergaele, F., van Velthuizen, H., Verelst, L., Wiberg, D., Henry, M., Chiozza, F., Yigini, Y., Aksoy, E., Batjes, N., Boateng, E., and Fischer, G.: Harmonized World Soil Database version 2.0, FAO and IIASA, Rome, Italy and Laxenburg, Austria, https://doi.org/10.4060/cc3823en, 2023.

Oleson, K. W., Lawrence, D. M., Bonan, G. B., Fisher, R. A., Lawrence, P. J., and Muszala, S. P.: Technical description of version 4.5 of the Community Land Model (CLM), NCAR/TN-503+ STR, 503, https://doi.org/10.5065/D6RR1W7M, 2013.

Peel, M. C., Finlayson, B. L., and McMahon, T. A.: Updated world map of the Köppen–Geiger climate classification, Hydrol. Earth Syst. Sci., 11, 1633–1644, https://doi.org/10.5194/hess-11-1633-2007, 2007.

Preimesberger, W., Scanlon, T., Su, C. -H., Gruber, A., and Dorigo, W.: Homogenization of Structural Breaks in the Global ESA CCI Soil Moisture Multisatellite Climate Data Record, IEEE T. Geosci. Remote Sens., 59, 2845–2862, https://doi.org/10.1109/TGRS.2020.3012896, 2021.

Preimesberger, W., Stradiotti, P., and Dorigo, W.: ESA CCI Soil Moisture GAPFILLED: an independent global gap-free satellite climate data record with uncertainty estimates, Earth Syst. Sci. Data, 17, 4305–4329, https://doi.org/10.5194/essd-17-4305-2025, 2025.

Purdy, A. J., Fisher, J. B., Goulden, M. L., Colliander, A., Halverson, G., Tu, K., and Famiglietti, J. S.: SMAP soil moisture improves global evapotranspiration, Remote Sens. Environ., 219, 1–14, https://doi.org/10.1016/j.rse.2018.09.023, 2018.

Qiao, L., Zuo, Z., and Xiao, D.: Evaluation of soil moisture in CMIP6 simulations, J. Climate, 35, 779–800, https://doi.org/10.1175/JCLI-D-20-0827.1, 2022.

Reick, C. H., Gayler, V., Goll, D., Hagemann, S., Heidkamp, M., Nabel, J. E. M. S., Raddatz, T., Roeckner, E., Schnur, R., and Wilkenskjeld, S.: JSBACH 3 – The land component of the MPI Earth System Model: Documentation of version 3.2, Berichte zur Erdsystemforschung, 240, Max-Planck-Institut für Meteorologie, https://doi.org/10.17617/2.3279802, 2021.

Robock, A., Schlosser, C. A., Vinnikov, K. Y., Speranskaya, N. A., Entin, J. K., and Qiu, S.: Evaluation of the AMIP soil moisture simulations, Global Planet. Change, 19, 181–208, https://doi.org/10.1016/S0921-8181(98)00047-2, 1998.

Schumacher, D. L., Keune, J., Dirmeyer, P., and Miralles, D. G.: Drought self-propagation in drylands due to land–atmosphere feedbacks, Nat. Geosci., 15, 262–268, https://doi.org/10.1038/s41561-022-00912-7, 2022.

Seneviratne, S. I., Corti, T., Davin, E. L., Hirschi, M., Jaeger, E. B., Lehner, I., Orlowsky, B., and Teuling, A. J.: Investigating soil moisture–climate interactions in a changing climate: A review, Earth-Sci. Rev., 99, 125–161, https://doi.org/10.1016/j.earscirev.2010.02.004, 2010.

Shangguan, W., Dai, Y., Duan, Q., Liu, B., and Yuan, H.: A global soil data set for earth system modeling, J. Adv. Model. Earth Sy., 6, 249–263, https://doi.org/10.1002/2013MS000293, 2014.

Stocker, B. D., Tumber-Dávila, S. J., Konings, A. G., Anderson, M. C., Hain, C., and Jackson, R. B.: Global patterns of water storage in the rooting zones of vegetation, Nat. Geosci., 16, 250–256, https://doi.org/10.1038/s41561-023-01125-2, 2023.

Talib, J., Müller, O. V., Barton, E. J., Taylor, C. M., and Vidale, P. L.: The Representation of Soil Moisture-Atmosphere Feedbacks across the Tibetan Plateau in CMIP6, Adv. Atmos. Sci., 40, 2063–2081, https://doi.org/10.1007/s00376-023-2296-2, 2023.

Tapiador, F. J., Navarro, A., Jiménez, A., Moreno, R., and García-Ortega, E.: Discrepancies with satellite observations in the spatial structure of global precipitation as derived from global climate models, Q. J. Roy. Meteor. Soc., 144, 419–435, https://doi.org/10.1002/qj.3289, 2018.

Trugman, A. T., Medvigy, D., Mankin, J. S., and Anderegg, W. R. L.: Soil moisture stress as a major driver of carbon cycle uncertainty, Geophys. Res. Lett., 45, 6495–6503, https://doi.org/10.1029/2018GL078131, 2018.

Vogel, M. M., Zscheischler, J., and Seneviratne, S. I.: Varying soil moisture–atmosphere feedbacks explain divergent temperature extremes and precipitation projections in central Europe, Earth Syst. Dynam., 9, 1107–1125, https://doi.org/10.5194/esd-9-1107-2018, 2018.

Wang, A., Kong, X., Chen, Y., and Ma, X.: Evaluation of soil moisture in CMIP6 multi model simulations over conterminous China, J. Geophys. Res.-Atmos., 127, e2022JD037072, https://doi.org/10.1029/2022JD037072, 2022.

Wang, C., Fu, B., Zhang, L., and Xu, Z.: Soil moisture–plant interactions: an ecohydrological review, J. Soil. Sediment., 19, 1–9, https://doi.org/10.1007/s11368-018-2167-0, 2019.

Wang, Y., Mao, J., Jin, M., Hoffman, F. M., Shi, X., Wullschleger, S. D., and Dai, Y.: Development of observation-based global multilayer soil moisture products for 1970 to 2016, Earth Syst. Sci. Data, 13, 4385–4405, https://doi.org/10.5194/essd-13-4385-2021, 2021.

Wu, G., Cai, X., Keenan, T. F., Li, S., Luo, X., Fisher, J. B., Cao, R., Li, F., Purdy, A. J., Zhao, W., and Sun, X.: Evaluating three evapotranspiration estimates from model of different complexity over China using the ILAMB benchmarking system, J. Hydrol., 590, 125553, https://doi.org/10.1016/j.jhydrol.2020.125553, 2020.

Yamaguchi, M., Lang, S. T., Leutbecher, M., Rodwell, M. J., Radnoti, G., and Bormann, N.: Observation-based evaluation of ensemble reliability, Q. J. Roy. Meteor. Soc., 142, 506–514, https://doi.org/10.1002/qj.2675, 2016.

Yang, X., Thornton, P., Ricciuto, D., Wang, Y., and Hoffman, F.: Global evaluation of terrestrial biogeochemistry in the Energy Exascale Earth System Model (E3SM) and the role of the phosphorus cycle in the historical terrestrial carbon balance, Biogeosciences, 20, 2813–2836, https://doi.org/10.5194/bg-20-2813-2023, 2023.

Yuan, S., Quiring, S. M., and Leasor, Z. T.: Historical changes in surface soil moisture over the contiguous United States: An assessment of CMIP6, Geophys. Res. Lett., 48, e2020GL089991, https://doi.org/10.1029/2020GL089991, 2021.

Zeng, J., Peng, J., Zhao, W., Ma, C., and Ma, H.: Microwave remote sensing of soil moisture, Remote Sens., 15, 4243, https://doi.org/10.3390/rs15174243, 2023.

Zuo, Z., Qiao, L., Zhang, R., Chen, D., Piao, S., Xiao, D., and Zhang, K.: Importance of soil moisture conservation in mitigating climate change, Sci. Bull., 69, 1332–1341, https://doi.org/10.1016/j.scib.2024.02.033, 2024.

Articles

Short summary

Soil moisture helps control how plants grow and how land exchanges water and energy with the atmosphere. We evaluated how well 16 Earth System Models represent soil moisture and its links to plant activity using a benchmarking system and observational data. Many models simulate surface soil moisture well but show larger errors deeper in the soil and in colder regions. Models that represent plant–soil water interactions more directly better capture how ecosystems respond to water availability.