Understanding each other’s models: an introduction and a standard representation of 16 global water models to support intercomparison, improvement, and communication

Global water models (GWMs) simulate the terrestrial water cycle, on the global scale, and are used to assess the impacts of climate change on freshwater systems. GWMs are developed within different modeling frameworks and consider different underlying hydrological processes, leading to varied model structures. Furthermore, the equations used to describe 40 various processes take different forms and are generally accessible only from within the individual model codes. These factors


Introduction
Many multi-model intercomparison projects (MIPs) have been designed to provide insights into various Earth system processes. They provide many multi-model ensembles that consisting of multiple models driven by the output of multiple other models. These multi-model ensembles offer the opportunity to inter-compare models for an increased improved understanding of process representation and inter-model differences as well as for model improvement. Some MIPs examples include 60 FireMIP for the fire regime and its drivers (Rabin et al., 2017); CMIP for past, present, and future climate changes and their drivers (Eyring et al., 2016;Kageyama et al., 2018); LakeMIP for physical and biogeochemical processes of lakes (Stepanenko et al., 2010;Thiery et al., 2014); AgMIP for crop growth (Rosenzweig et al., 2013), and WaterMIP or ISIMIP for the water cycle Frieler et al., 2017).
MIPs have encountered many challenges in how to inter-compare models and interpret various model results (von Lampe et 65 al., 2013), realize the standardization of data and scenarios and integrate transdisciplinary knowledge in modeling (Rosenzweig et al., 2013), and identify and reduce uncertainties (Sitch et al., 2008). They have been affected by scientific complexity, input data quality, technical infrastructure, and even cultural and organizational challenges (Eyring et al., 2016). Hence, they have evaluated models' performance in the past and have focused on the inter-models' agreement for the future. Ultimately, MIPs and their multi-model ensembles have been blocked in interpreting inter-model differences because of models' complexity, 70 missing information abouth others models, incomplete or missing information about heterogeneity and dynamism of natural systems (Clark et al., 2011).
Nevertheless, MIPs have underlined the need to go beyond good overall model performance and to improve process representation in the models (Guseva et al., 2020), integrate missing processes (Friend et al., 2014), and reduce uncertainties (Warszawski et al. 2014). MIPs showed that robust similarities exist among models and, as a result, models are not strictly 75 independent of each other given previous and legacy versions, and existing links among modeling communities who indirectly transfer some models' strengths and weaknesses by sharing their ideas and codes (Masson and Knutti, 2011;Knutti et al., We present the modeling approaches and terminology used in global water modelling in section 2. In section 3, we present key characteristics of the models analyzed in the present study. In section 4, we describe our standard writing style of model equations. In section 5, we present similarities and differences among models. In section 6, we present the number of water flows, water storage compartments, and human water use sectors included in the 16 GWMs. In section 7, we discuss potential 115 future research in global water modeling. Finally, Iin section 8, we present recommendations for future multi-model intercomparison projects and extended assessments.

Differences in modeling approaches
On global scale, the terrestrial water cycle is simulated by three different communities that have developed three types of 120 models: (i) the climate community that has developed land surface models (LSMs); (ii) the global hydrological community that has developed global hydrological models (GHMs); (iii) the vegetation community that has developed dynamic global vegetation models (DGVMs). In this study, Lland surface models are CLM4.5, CLM5.0, DBH, JULES-W1, MATSIRO, and ORCHIDEE. Global hydrologic models are CWatM, H08, Mac-PD20, mHM, MPI-HM, PCR-GLOBWB, VIC, WaterGAP2, and WAYS. One model (LPJmL) is a dynamic global vegetation model. 125 However, these three communities focus on specific hydrological and atmospheric processes, as well as anthropogenic impacts.
These key aspects are important for their specific research leading to different modelling approaches, specific evaluation studies of model performance (Archfield et al., 2015), and different field-specific meanings of terminology used (Beven and Young, 2013). Thus, combining the expertise in their key aspects would create a strong synergy and improve the models of these communities, but for this goal, they have to interact with each other, identify their similarities and differences and share 130 experiences. They need to undertake joint experiments, share present and discuss their results, discuss how they influence and depend on each other, and how water modeling can be improved (Cucchi et al., 2020).
The global hydrological community focuses primarily on surface water and groundwater availability, its human interference, and their changes of the water cycle daily to century-scale changes. GHMs simulate the water cycle with its water flows, water compartments, and human water use sectors. These models simulate water abstracted for the irrigation, domestic use, livestock, 135 industry (manufacturing and electricity), and desalination sectors. Furthermore, reservoir management and its streamflow alteration are included. One of their main foci is streamflow simulation and their ability to reproduce historical observations of this variable. They focus on lateral and vertical flows, comprehensively simulating the following surface water bodies: (i) lakes, (ii) wetlands, (iii) rivers.
The climate community focuses on simulating climate and its change over decades and centuries using global climate models 140 (GCMs) and Earth system models (ESMs). A fundamental component of these are the LSMs, which simulate the water and energy exchanges between the land surface and the atmosphere, specifically focusing on vertical flow exchanges. Therefore, these models simulate the energy cycle, the water cycle, the carbon and nitrogen cycles, and vegetation and crop responses to temperature, precipitation, and CO2 concentrations. Further, they represent the soil with a higher vertical resolution and evapotranspiration and snow dynamics in a more physical manner than the global hydrological models (GHMs; Döll et al., 145 2016;Pokhrel et al., 2016;Wada et al., 2017).
The vegetation community focuses on vegetation distribution and growth in an area and over a time interval and is primarily interested in the global carbon cycle. DGVMs simulate shifting vegetation, driven by biogeochemistry, hydrology, and anthropogenic influences. These models simulate the vegetation composition and distribution as well as compartments and flows of carbon and water, for both natural and agricultural ecosystems. Specifically, they model the active response of 150 vegetation to changes in air temperature, precipitation, and CO2 concentrations.
The different viewpoints of these communities are readily visible in very basic concepts such as the solar energy. This is the main driver that connects the processes of terrestrial water cycle simulated by these communities. It specifically links the water and energy budgets with vegetation processes. This link can be exemplified by the latent heat flux of evaporation that describes the heat or the energy required to change the liquid water into water vapor. This heat or energy is locked in the humid air as 155 water vapor, and is released when the humid air touches cold air and water vapor condensation starts. Therefore, continental evaporation is considered to be water loss by the global hydrological and vegetation modeling communities, but a water source (for cloud formation) by the climate community (those that simulate the atmosphere), with implications for agriculture and ecosystems (Abbott et al., 2019). Additionally, transpiration represents a water source for the vegetation community, necessary for photosynthesis and plant growth, and water loss for the global hydrological community. 160 OverallIn the end, these three communities have developed three types of models to simulate the terrestrial water cycle on the global scale despite fundamental differences in model structure, model equations, and output variables. Hence, we decided to include the three types of models in one group and call them global water models (GWMs).

Definitions used in global water modelling
A global water model describes the dynamic behavior of a hydrological system that includes input variables, state variables, 165 parameters, constants, and output variables (Bierkens and van Geer, 2007). State variables define how much water is in a compartment or storage at the beginning of the simulation, and can change in space and time, for example, canopy soil water storage. Their variation is caused by a variation of the input variables, for example, precipitation. State variables are related to the input variables and output variables through parameters, for example, infiltration capacity of the soil. Parameters and coefficients represent numbers that describe a particular characteristic of reality, of the model, of the catchment area or flow 170 domain. Some examples are runoff coefficient, soil porosity, hydraulic conductivity of different soil horizons, maximum soil water storage, maximum canopy water storage, mean residence time in the saturated zone, surface roughness, and vegetation properties (Beven, 2012). A model also uses physical and mathematical constants meaning characteristics of the model that do not change in space and time such as catchment area. Physical constants are physical quantities that can be measured and have a constant value in time, for example, the density of water at 0°C, the density of ice. Mathematical constants cannot be 175 measured, but can be calculated and have a fixed numerical value, for example, e = 2.718…, π = 3.142, i 2 = −1. Ultimately, output variables vary in space and time, for example, streamflow in a river catchment. Ultimately, output variables are results of the simulation and vary in space and time, for example, streamflow in a river catchment.
Thus, a water global model includes many equations written with a programming language in a model code to simulate freshwater systems. During simulations, many parameters receive specific values because they cannot be measured 180 everywhere, therefore, they are calibrated or tuned or estimated from auxiliary data that can be measured (such as lookup tables for vegetation properties based on remote sensing observations), to attain the best match between simulated and observed data. The final steps of a simulation are is to validate (evaluate) simulated model output andwith observed data, to find out how well they fit, and to evaluate the simulated results through analysis and visualization.

Key characteristics of 16 global water models included in the study 185
In this study, we analyze 16 state-of-the-art global water models included in the global water sector of the Inter-Sectoral Impact Model Intercomparison Project phase 2b (ISIMIP2b: Frieler et al., 2017). GWMs include six land surface models (LSMs), nine global hydrologic models (GHMs), and one dynamic global vegetation model (DGVM: LPJmL, Tables 6 and 12). Land surface models are CLM4.5, CLM5.0, DBH, JULES-W1, MATSIRO, and ORCHIDEE. Global hydrologic models are CWatM, H08, Mac-PD20, mHM, MPI-HM, PCR-GLOBWB, VIC, WaterGAP2, and WAYS. Generally, tThese models, as 190 applied within the ISIMIP2b framework, are suitable for application over a catchment size of not smaller than 9,000 km 2 or at least four grid cells (Döll et al., 2003;Hunger and Döll, 2008). For smaller catchments, the results are often not reasonable (e.g., Beck et al., 2016) and require corrections due to inaccurate input data, spatial heterogeneity, and the missing representation of some hydrological processes (Döll et al., 2003;Hunger and Döll, 2008).

General setup 195
These models contribute to an experiment setup designed to assess the impact of historical and future warming under the Paris Agreement (Frieler et al., 2017). They are driven by the same climate input datasets under representative concentration pathways (RCPs) and socioeconomic scenarios (SSPs). The time span of the simulations is divided into pre-industrial (1661-1860), historical  and future (RCP2.6, RCP6.0, and RCP8.5, 2006-2099(2299). These models simulate the terrestrial water cycle, on the global land area (except Antarctica), with a spatial resolution of 0.5° × 0.5° (~55 km × 55 km at 200 the Equator), and quantify water flows, water storage compartments, and human water use under the given climatic and socioeconomic conditions. They do not simulate the ocean component of the global water cycle or water quality. Some of these models also consider reservoir operations.

Temporal and spatial characteristics
Twelve models have a daily temporal resolution (Table 6), while two models have a 6 hourly resolution (CLM4.5 and 205 CLM5.0). MATSIRO has an hourly temporal resolution and. Four models (CLM4.5, CLM5.0, MATSIRO, and ORCHIDEE ) have has 30-min temporal resolution, . while JULES-W1 has a 1-hour resolution. Fifteen models run with a spatial resolution of 0.5°. ORCHIDEE runs with a spatial resolution of 1.0° and has its outputs converted to 0.5° spatial resolution. Some models include subgrids for some components: CLM4.5 and CLM5.0 for vegetation, surface runoff and evapotranspiration; H08 and CWatM for land cover; MPI-HM for surface runoff and evapotranspiration; PCR-GLOBWB for vegetation and land cover; 210 WaterGAP2, CWatM and MATSIRO for snow; VIC for vegetation and elevation. Furthermore, MATSIRO divides a subgrid cell in snow-covered and snow-free portions with flows and storages resolved separately for these portions, both for land and canopy surfaces.

River networks used
Nine models (CLM4.5, CLM5.0, CWatM, H08, LPJmL, MATSIRO, MPI-HM, PCR-GLOBWB, WaterGAP2) use the 30-min 215 global drainage direction map DDM30 (Döll and Lehner, 2002), a raster map with a spatial resolution of 0.5° × 0.5° (~ 50 km × 50 km), to outline the drainage directions of surface water collected by creeks, rivulets, and rivers. In this map, 67,420 discrete grid cells are characterized by their specific drainage direction and are organized into drainage basins that drain from the Earth's land surface into the ocean or inland sinks. The mHM uses a river network (0.5° × 0.5°) upscaled from HydroSHEDS (Lehner et al., 2006). ORCHIDEE uses the river network from the Simulated Topological Networks (STN-30p: 220 Vörösmarty et al., 2000). Five models (DBH, JULES-W1, Mac-PDM.20, VIC, and WAYS) do not use any river routing scheme for the ISIMIP2b because they do not compute streamflow.

Calibration approaches for ISIMIP2b
Six Three GHMs GWMs perform calibration of their hydrological components, using different approaches, for ISIMIP2b (Table 6). CWatM calibrates monthly or daily streamflow for 12 catchments using the Distributed Evolutionary Algorithms in 225 Python (DEAP) approach (Burek et al., 2020), while WaterGAP2 uses a basin-specific approach to match long-term mean annual observed streamflow at the outlet of 1,319 river basins. It considers runoff as a nonlinear function of soil moisture and uses a runoff coefficient plus up to two additional factors for calibration (Müller Schmied et al., 2014;Müller Schmied et al., 2021). Mac-PDM.20 is calibrated using the generalized likelihood uncertainty estimation (GLUE) approach (Smith, 2016). In mHM, calibration of global model parameters is performed against the daily observed streamflow, along with gridded global 230 fields of FLUXNET evaporation (Jung et al., 2011) and a GRACE terrestrial water storage anomaly, using the ERA5 climate forcing (Landerer and Swenson, 2012). VIC uses the source datasets and parameter sets from Nijssen et al. (2001), namely the AVHRR-derived landcover dataset (Hansen et al., 2000) and the FAO soil textures (FAO, 1995), and is sub-sampled to 0.5° × 0.5° via a nearest-neighbor approach. WAYS is calibrated against data from the International Satellite Land Surface Climatology Project (ISLSCP) Initiative II of the University of New Hampshire and GRDC composite monthly runoff data 235 (Fekete et al., 2011), from 1986 to 1995 at a 0.5° spatial resolution. CLM5.0 performs hydrological calibration in a Bayesian framework using a sequential Monte Carlo method (Lawrence et al., 2019). Five Seven models (CLM4.5, CWatM, DBH, JULES-W1, MATSIRO, ORCHIDEE, and PCR-GLOBWB) adjust some parameters according to vegetation or soil properties, but they have no hydrologic calibration. Neither JULES-W1 nor LPJmL calibrates hydrology, although they it do calibrates biophysical processes and crop yield, respectively. MPI-HM and H08 are not hydrologic calibrated. 240

Creating the standard writing style of model equations
In this study, the rationale in finding similarities and differences among 16 GWMs is based on how models simulate the terrestrial water cycle. We created a standard writing style for model equations and used the same symbols to write those equations, following seven steps to achieve our main goal.

Investigation of 16 global water models 245
Generally, the models have different style in describing their structure, defining their variables, and writing their equations.
Furthermore, a unique equation can be implemented in various ways (e.g., discrete vs. analytical form, focusing on flows or water compartments) or parameterized differentlycan use different model parameter values. Therefore, we started our study with a literature review on the 16 GWMs analyzed in the present study. We analyzed the nomenclature of each model to identify a good way of writing the model equations and habits that exist in global water modelling. Another aim was to 250 familiarize ourselves with model equations.

Generation of the lists with water storage compartments, flows, and human water use sectors, included in 16 global water models
In the next step, we assembled a list with water storage compartments and human water use sectors, included in the models, to simulate the terrestrial water cycle. We decided to describe 16 GWMs based on the equations implemented for eight water 255 storage compartments and, six human water use sectors. The analyzed water storage compartments are canopy, snow, soil, groundwater, lake, wetland, reservoir, and river. The human water use sectors are irrigation, domestic (households), livestock, manufacturing, electricity. Thus, the present model intercomparison study is based on the lists presented in Tables 1 to 5.

Creation of glossary with variables definitions
We decided upon clear definitions of the analyzed variables. However, we encountered many ambiguities and challenges in 260 defining the analyzed variables and labeling processes as being similar or different among them. Some examples are presented in the following lines. We present other definitions of water storage compartments, flows, and human water use sectors, used in this paper work, in the supplementary information (Table S84).
We decided to use the expression input data for climate variables of the 16 GWMs to avoid confusion among readers. We define parameterization as changes of model parameter values (Samaniego et al., 2010). 265 We use active vegetation to highlight if models include the photosynthesis scheme in their structure and if they have the ability to simulate actively changes in vegetation, in an area, because of changes in the CO2 concentration, air temperature, and precipitation. We use dynamic vegetation to define changes in vegetation from one geographical area to another because of competitive and biogeographical processes determined by climate change (geographical distribution of plants) or human activities. We decided to use subsurface runoff synonymously with interflow and to define it as the amount of water that leaves 270 the soil layer laterally.
We define baseflow as the low part of the streamflow that is supplied by groundwater, drainage from lakes, wetlands, glaciers, and interflow during long periods when no precipitation or snowmelt occurs. Ultimately, we have excluded the variable baseflow from the analysis because it is not simulated by 16 GWMs in ISIMIP2b.
We discovered that groundwater runoff and baseflow are used synonymously and define the water that leaves groundwater 275 storage. We also found that baseflow and subsurface runoff are used synonymously, and define the amount of water estimated for the third soil layer (VIC). We noticed that MPI-HM includes additional storage, called baseflow storage, that collects the drainage leaving through the bottom of the soil storage and applies a substantial time lag before passing it on to the river storage. In ISIMIP2b, the drainage computed by MPI-HM was submitted as subsurface runoff, but considering that this baseflow storage acts similarly to a groundwater storage, drainage could be used as groundwater recharge in ISIMIP3a/b. 280 Consequently, its outflow could be submitted as groundwater runoff. However, the purpose of this baseflow storage, for MPI-HM, is predominantly to cause a delay in river discharge and not to simulate groundwater in detail.
We decided to define groundwater recharge as the amount of water that reaches the groundwater storage, because of its hydrological meaning. However, we found out that the words drainage (MPI-HM), aquifer recharge (CLM4.5), and groundwater recharge (GHMs) are used synonymously among 16 GWMs. ISIMIP2b relates seepage with groundwater 285 recharge for the models that do not include a groundwater storage, supposing that this water would reach groundwater storage if it would exist.
Another discovery was that throughfall and drip in some models were considered synonyms and they were used to describe precipitation that falls to the ground through canopy spaces (CLM4.5, CLM5.0, MATSIRO). In this case, we decided to separate these words and to define throughfall as being precipitation that falls to the ground through canopy spaces and drip 290 as being precipitation that leaks at the edge of canopy.
In this study, we define streamflow as the volumetric flow rate of water through a river cross-section. Therefore, the streamflow is the water transfer which is routed through a channel towards the ocean or towards an inland sink. We define the total runoff as the (not routed) total amount of water that runs-off the grid-cell, either over the soil surface, or from the subsurface (lateral flow). In some studies, the streamflow is converted to runoff by dividing the streamflow values with the area upstream of the 295 gauging station (for example, the area upstream of station according to the DDM30 river network: Döll and Lehner, 2002).
In summary, in global water modelling, we need to be aware of differences in vocabulary. A widely accepted list of definitions would avoid confusion and facilitate successful interaction and collaboration. Furthermore, we need to clarify hydrological terms to peers from other disciplines, stakeholders, and a general audience (Brunner et al., 2018) to facilitate easier communication, understanding, and analysis. 300

Variable naming
We notated each variable of model equations. We used nultiple subscripts and superscripts to properly identify water storage compartments, flows, and human water use sectors because of the large number of storage compartments included in the model structures. We selected "S" to describe water stock storage, "P" to describe everything connected to precipitation, "E" for everything related to evaporation, "R" for everything related to runoff, "Q" for everything related to streamflow and outflow, 305 and "A" for water abstractions. We used two letters for subscripts and superscripts, ideally, the first two letters of the word, for example, "ca" for canopy; "sn" for snow; "so" for soil, and so on (see list of symbols and glossary in the Supplement), while we used the first letter of each word in case of compounds words such as groundwater ("gw") or surface water ("sw").
We separated subscripts and superscripts from one another using comma. We did not write full words for subscripts and superscripts, because equations became too long and difficult to read and understand. Some of these decisions correspond with 310 some habits that exist in the hydrological community (e.g., gw and sw) and we decided to keep them to make a comfortable and easy workflow for modelers and readers.

Collection of the equations from the modelling teams
In the next step, modelling teams created and provided the model equations, used to provide simulations for ISIMIP2b, according to the generated lists. Each modelling team, involved in this study, internally checked and reviewed its model, based 315 on the model code and peer-review articles mentioned in Table 11 or only on the peer-review articles on model description mentioned in Table 11. In some cases, modelling teams provided the equations using our standard writing style and symbols presented in subsection 4.4, while in other cases using their specific writing style. Therefore, the modelling teams checked the model equations on their plausibility correctness.

Homogenization of the model equations 320
We homogenized all variables and standardized variables' units in Tables S1-S83. We used overleaf platform, an online LaTeX editor with its glossaries package, to homogenize all model equations of 16 GWMs, write some model equations, and rewrite other model equations using our symbols. This online LaTeX editor enabled us an online collaboration, correction of model equations many times, and saving a lot of time in all this process. Therefore, the supplementary information provides an overview of the 16 GWMs, analyzed in this study, and enables readers to understand similarities and differences among 325 these models and identify included water compartments and human water use sectors and their flows. Ultimately, the readers get an overview of hydrological knowledge complexity behind these models (Tables S1-S84S97).

Evaluation of collected information
In the final step, we reevaluated the collected and homogenized model equations for plausibility their consistency with the model code. We found similarities and differences among 16 GWMs analysed in this study. We analyzed the model equations 330 to find the models that simulate the same water flow (e.g., evaporation), the same water storage compartment (e.g., canopy storage), the same human water use sector (e.g., irrigation sector). For example, five models (CWatM, JULES-W1, MATSIRO, MPI-HM, and WAYS) use the same equation to compute potential evapotranspiration in Table S2. Ten models (CWatM, DBH, JULES-W1, LPJmL, mHM, ORCHIDEE, PCR-GLOBWB, VIC, WaterGAP2, and WAYS) compute changes in canopy water storage taking into account the same variables such as total precipitation, throughfall, and canopy evaporation. Other 335 three models (CLM4.5, CLM5.0, and MATSIRO) compute changes in canopy water storage differently than the nine models, by taking into account the precipitation intercepted by canopy storage, liquid and solid throughfall, additional to canopy evaporation (Table S3). We also conclude that 12 models compute canopy evaporation (Tables S3, S7). Therefore, in the next section (section 5), we present our results according to two main parts of the terrestrial water cycle: hydrological part and water use part. The hydrological part includes the eight water storage compartments and their flows, while the water use part includes 340 five human water use sectors and their flows.

Similarities and differences among 16 global water models
Several studies highlighted the need to understand better modeling approaches, model structures, model equations, and similarities and differences among models (Zhao et al., 2017;Veldkampet al., 2018;Schewe et al., 2019). Therefore, in this section, we present some similarities and differences among 16 GWMs in simulating the terrestrial water cycle. This 345 information enables us the interpretation of the different model results found in some model comparison and ensemble studies (Zaherpour et al.,2018;Wartenburger et al., 2018;Scanlon et al., 2019), as well as those by Reinecke et al., 2020;. This information also strengthens our understanding of how these models work.
Briefly, the 16 analyzed GWMs include in their structure similar hydrological processes, but they have different model structures. 350

Canopy water storage.
The changes in canopy water storage depend on how much water evaporates (canopy evaporation) and how much water is intercepted by canopy. Three models do not compute potential evapotranspiration (Table S2, Tables 7 and 8 Thirteen models include canopy water storage in their structure, while three other models do not include it (H08, Mac-PDM.20, and MPI-HM: Table S3, Figure 1). Ten models compute canopy water storage by subtracting the throughfall amount and 360 canopy evaporation from the total precipitation. Other three models (CLM4.5, CLM5.0, and MATSIRO) compute change in canopy water storage by subtracting the liquid or solid throughfall and canopy evaporation from the precipitation intercepted by the canopy storage. MATSIRO is the only model that has two canopy water compartments: one for rainfall interception and one for snowfall interception. It also computes in detail how much water is intercepted by canopies in stormy areas with high wind speeds and in calm areas with low wind speeds. In these areas, precipitation depends, mainly, on leaf area index 365 (LAI) and water deficit in the canopy storage.
Three land surface models (CLM4.5, CLM5.0, and MATSIRO) divide total precipitation into precipitation intercepted by canopy, precipitation that penetrates the canopy and then reaches the ground (throughfall), and precipitation that falls directly on the ground (Tables S4-S6). Therefore, these models distinguish between rainfall and snowfall. Further, they also divide throughfall into liquid and solid phases. 370 Two models compute an interception scheme based on a leaf and stem area index, while seven models use only a leaf area index (Tables 7 and 8). Ten models compute this considering vegetation type (a plant functional type system) (Tables 7 and   8). MPI-HM uses prescribed data taken from Land Surface Parameter dataset version 2 (Hagemann, 2002). PCR-GLOBWB uses HYDE3.2 (Klein Goldewijk, 2017), MIRCA (Portmann et al., 2010), and GlobCover datasets (ESA GlobCover Project,

2005). Generally, prescribed vegetation ignores the decisive interaction between vegetation and runoff and interactions 375
between the atmosphere and Earth's surface (Gerten et al., 2004;McPherson, 2007;Nicholson, 2000). In the ISIMIP2b, the word "prescribed" has two meanings: (i) data which are simulated by other models and provided by the ISIMIP2b framework as input (https://www.isimip.org/gettingstarted/details/38/); (ii) data obtained from satellite observations, other datasets, or maps. Prescribed data highlight some limitations of the models or underline the lack of some processes that were intentionally or non-intentional removed from the model structure, according to the purpose of the model development or other priorities 380 such as time.
Four models (CLM4.5, CLM5.0, LPJmL, and ORCHIDEE; Tables 7 and 8) account for the CO2 fertilization effect, in the LAI estimation, by using a photosynthesis scheme (active vegetation mentioned in section 4.3), and they have the ability to simulate 390 the CO2 effect on plant functioning. Generally, it was found that simulations depend on the number of plant functional types (PFTs) prescribed or defined in the model and on the processes used to estimate plants' ability to adapt, acclimate, and grow 360 in new environmental conditions (Sitch et al., 2008). Sitch et al., 2008 found that simulations on CO2 fertilization effect depend on the number of plant functional types (PFTs) prescribed or defined in the model and on the processes used to estimate plants' ability to adapt, acclimate, and grow in new environmental conditions. 395

Snow water storage
Snow storage accumulates snow below freezing temperatures and declines by melting and surface and/or snowdrift sublimation. GHMs typically use the degree-day method to compute snow accumulation and snowmelt, while LSMs use the energy balance method (Tables 7 and 8, Figure 1). Among GHMs, H08 is the only one that applies the energy balance method to compute snow accumulation and melt. Additionally, three models (CLM4.5, CLM5.0, and CWatM) include glacier storage. 400 CLM4.5 and CLM5.0 use a mechanistic snow module to calculate snow accumulation and melt; therefore, they include multiple snow layers where compaction, melt, refreezing, firn, and other snow related processes take place.
Four models (CLM4.5, CLM5.0, MPI-HM, and VIC) have two water storage compartments for snow: for estimation of frozen water and for liquid water content (Table S8). WaterGAP2 calculates snow accumulation and melting in 100 subgrid cells (Schulze and Döll, 2004;Müller Schmied et al., 2014), while CWatM calculated using 3 to 10 elevation zones per grid. Five 405 models (CLM5.0, DBH, JULES-W1, MATSIRO, and VIC) estimate snow held on the canopy (Table S9). Further, seven models differentially estimate snow under the canopy (Table S10). Five models do not estimate sublimation: Mac-PDM.20, mHM, MPI-HM, PCR-GLOBWB, and WAYS (Table S11). All models simulate snowmelt (Table S12). MATSIRO is the only model that distinguishes between sublimation and evaporation on snow-covered ground and snow-free ground. The number of snow layers is fixed and it varies among 16 GWMs between 1 (most of the GHMs) and 12 (CLM5.0; Tables 7 and  410 8). Most of the GWMs present no upper limit for snow storage (Tables S48 -S51).

Soil water storage
Soil water storage keeps and loses water from flows above and below the ground's surface. Hydrologically, this includes the unsaturated zone or vadose zone, the part of Earth between the land surface and the top of the phreatic zone (water table).
Soil hydrologic processes. Overall, 10 models consider initial infiltration as inflow of the soil storage, while three models 415 (H08, JULES-W1, and WAYS) consider throughfall (Table S14). Mac-PDM.20 considers total precipitation as inflow of soil storage (Table S14). Thus, infiltration, throughfall, and total precipitation have different values among 16 models because the models compute infiltration and throughfall differently, while total precipitation represents the input data for some models.
All models compute surface runoff (Table S20, Figure 1), soil evaporation (Table S24), and infiltration (Table S25), while six models compute interflow (Table S26). H08 computes runoff properties varying according to the climate zone (Table 7). 420 CLM4.5 includes an empirical soil evaporation resistance method, while CLM5.0 includes a mechanistically based method where the soil evaporation is controlled by a dry surface layer. Therefore, CLM5.0 has the ability to model the seasonality of soil evaporation and soil water storage in (semi-)arid regions. It also explicitly simulates spatial variation in soil thickness (0.4 to 8.5 m) and columnar water holding capacity, unlike CLM4.5 (Lawrence et al., 2019). These models have a large number of soil layers, each having moisture storage potential depending on the soil texture. They use the same approach to calculate 425 surface runoff and have the ability to compute liquid runoff and solid runoff from snow capping. Both models consider subsurface runoff as a product of an exponential function of the water table depth and a single coefficient (Niu et al., 2005).
VIC uses the variable infiltration curve (Zhao et al., 1980) to account for the spatial heterogeneity of runoff generation, and assumes that surface runoff from the upper two soil layers is generated by those areas where precipitation exceeds the storage capacity of the soil. The mHM model has one more bucket between the soil storage and groundwater storage named 430 "unsaturated storage" representing the source for interflow and groundwater recharge.
LPJmL was adjusted, and the water from the uppermost soil layers is considered to contribute to surface runoff if excess of storage is calculated according to the infiltration or percolation rates, which depend on soil type. LPJmL routes, what was previously lateral runoff, from "layer 0" (first 20 cm), as surface runoff.
In JULES-W1, water that reaches the soil surface is split between water that infiltrates into the soil and surface runoff. 435 Infiltration takes place at a rate equal to saturated hydraulic conductivity multiplied by an infiltration enhancement factor, which is dependent on the presence and type of vegetation. If a soil layer becomes saturated, the water in excess of saturation is put into the layer below. JULES-W1 also uses a "zero-layer" scheme that does not use explicit model layers to represent snow, instead adapting the topsoil level to represent existent snow processes. In the original "zero-layer", snow scheme has a constant thermal conductivity and density. Bulk thermal conductivity of snow on the surface layer decreases due to both the 440 increased layer thickness and the different conductivities of snow and soil. Surface energy balance and heat flux between the surface layer are controlled by insulation factors and layer thickness (Best et al., 2011). WAYS simulates the water storage and flows in soil only for the entire root zone (Table 8). In the DBH model, runoff is generated directly when soil layer is saturated, or is generated when rainfall intensity is larger than the infiltration rate estimated with the Green-Ampt method (Tang et al., 2006). 445 Two models (CWatM and MPI-HM) have an additional water storage compartment to compute the runoff concentration in a grid cell that has a lag time before entering the river storage compartment (Table S44). Consequently, this storage serves to create a delay between runoff and streamflow, and accounts for the average distance that runoff, generated at a specific point within a grid cell, has to travel before reaching the river. This storage collects water from rivulets and creeks or concentrates runoff in rivulets and creeks before it enters the river storage, because the rivulets and creeks are smaller than the size of a 450 single grid cell and have different water retention properties from the main river channel within the grid cell. Therefore, this compartment does not act as a floodplain, to delay floods, or as overland flow, to express too much water in the soil. In its original structure, MPI-HM named this compartment "overland flow", but we decided to rename it "rivulet storage" to avoid confusion among readers.
Some GWMs compute vertical water movement in unsaturated soils by applying the Richards equation (Richards, 1931;e.g., 455 CLM4.5, CLM5.0, CWatM, JULES-W1, MATSIRO, ORCHIDEE, VIC). However, the Richards equation may not be relevant for the models that have one soil layer because of its complexity and of missing capillary rise (Lee and Abriola, 1999;Farthing and Ogden, 2017). LPJmL uses a percolation scheme to estimate vertical water movement that applies the storage routine technique developed by Arnold et al. (1990) and simulates free water in the soil bucket. DBH uses the Green-Ampt equation to compute infiltration in unsaturated soils. 460 Five models compute capillary rise (CLM4.5, CLM5.0, CWatM, MATSIRO, and PCR-GLOBWB), with CWatM and PCR-GLOBWB using the same approach (Table S28).
ORCHIDEE uses a relatively deeper soil column to account for soil thermalic processes. LPJmL has five hydrologically and 465 thermal active soil layers plus one thermal active soil layer. MPI-HM defines soil storage in terms of the maximum water column, varying between 0 and 5 m; therefore, this cannot be translated into soil depth directly.

Groundwater storage
Groundwater storage, beneath the soil water storage compartment, receives water from drainage (e. g., MPI-HM) or aquifer recharge (e. g., CLM4.5) or groundwater recharge (e. g., WaterGAP2) (Tables 9 and 10). It loses water through capillary rise, 470 groundwater runoff, and groundwater abstraction for human water use. In GWMs, the groundwater compartment simulates hydrologically the saturated zone or phreatic zone (WaterGAP2) or an unconfined aquifer (CLM4.5). Eleven models include groundwater storage in their structure, and most of them have only one groundwater layer (Table S29, Figure 1). In ISIMIP2b, two models (JULES-W1 and LPJmL) consider the water excess from the bottom soil layer as seepage and relate equate this variable with groundwater recharge because they do not have a groundwater compartment. 475 CLM4.5 simulates an unconfined aquifer as a groundwater component, below the saturated soil storage and with a prescribed maximum value (5000 mm), while CLM5.0 simulates an impermeable bedrock with five layers and therefore assumes no groundwater flow as bottom boundary conditions. In CLM4.5, the unconfined aquifer interacts with the saturated soil storage through the water table, whether it is within or below this storage. When the water table is below the soil storage, the aquifer recharge is estimated by applying Darcy's law across the water table (Lawrence et al., 2019). 480 MATSIRO has a dynamic groundwater scheme (Koirala et al., 2014;Pokhrel et al., 2015) in which the number of soil layers in the saturated zone (i.e., groundwater) varies in time between 1 and 13 depending on water table location ( Table 7). The twoway interaction between the unsaturated zone (for which vertical moisture movement is resolved by solving the Richards equation) and the underlying aquifer is simulated through moisture flux exchange at the water table. This flux exchange is determined as the algebraic sum of downward gravity drainage from the unsaturated soil layer overlying the water table and 485 the upward capillary flux (Koirala et al., 2014;Pokhrel et al., 2015). The water balance of the saturated zone is resolved by considering recharge to the groundwater aquifer and groundwater runoff that is determined by using a two-parameter, statistical-dynamical formulation considering soil hydraulic properties and basin geomorphology (Yeh and Eltahir, 2005). The variation in the water table is also determined by the aquifer specific yield.
In Mac-PDM.20, it is assumed that all water in excess of field capacity drains in one day to the deep store, which for ISIMIP2b 490 is used to represent groundwater recharge (Rgwr). The total runoff (qtot) is the sum of direct runoff (qs) plus delayed runoff from the deep soil and groundwater (qsb). This delayed runoff (qsb) is assumed to be a non-linear function of the amount of water held in the groundwater and deep soil store (Table S31). Thus, like with MPI-HM, the purpose of the delayed runoff (or baseflow) is predominantly to cause a delay in river discharge and not to simulate groundwater in detail.
H08 separates groundwater into renewable and one nonrenewable layers (Hanasaki et al., 2008). WaterGAP2 is the only model 495 that simulates the groundwater recharge from surface water bodies in semiarid and arid grid cells (Döll et al., 2014).

Lake storage
Lake storage fills with water through flows above and below the ground and stores water for a certain residence time. It loses 500 water through discharge to other storages, evaporation, groundwater recharge, and water abstraction for human water use. Ten models do not include lakes (Tables 9 and 10; Figure 2). Five models compute evaporation from lakes, three of them based on a PET approach (Table S33), while four models compute outflow from lakes (Table S34). CLM4.5 and CLM5.0 compute the lake storage as virtual storage where the difference between precipitation and evaporation is balanced automatically by their outflow, named lake runoff. CLM4.5 uses constant lake depth, while CLM5.0 uses spatially variable lake depth, and freezing 505 and thawing are included in the lake body (Vanderkelen et al., 2020).
LPJmL treats natural lakes and rivers in a similar way in terms of inputs and output. Lake inputs to a river can also include upstream river inputs to the lake. LPJmL also keeps track of a lake fraction in the river input. WaterGAP2 and CWatM have two types of lake storage: "local lake storage", gets water from runoff resulting within the cell, and "global lake storage", gets water from runoff resulting within the cell and the upstream cell . 510

Reservoir storage
Reservoir storage fills with water behind dams through flows above and below the ground and stores water for a residence time. It loses water through discharge to other storages, evaporation, groundwater recharge, and water abstraction for human water use. Ten models (CLM4.5, CLM5.0, DBH, JULES-W1, Mac-PDM.20, mHM, MPI-HM, ORCHIDEE, VIC, and WAYS) do not include reservoir storage for ISIMIP2b (Tables 9, 10, S35; Figure 2). Six models compute outflow from 515 reservoirs (Table S37), while evaporation from reservoirs is computed by four models (Table S38).
In general, most of the models use the Global Reservoir and Dam database (GRanD: Lehner et al., 2011), but with a different number of active managed reservoirs, used for reservoir operation during simulations. Three models (LPJmL, WaterGAP2 and PCR-GLOBWB) merge more than one reservoir per grid cell into one reservoir, if required.
Four models (CWatM, H08, MATSIRO, and WaterGAP2) use two water compartments, global and local reservoirs, to 520 represent the reservoirs, following the reservoir algorithm developed by H08. However, there are some differences on how the scheme was implemented in the models, mainly, because of model structure, but the approach is essentially the same. These four models use the same approach in selecting active managed reservoirs for reservoir operation, but they use different thresholds. WaterGAP2 considers 1109 active managed reservoirs and handles reservoirs below 0.5 km 3 storage capacity as local lakes. MATSIRO considers only 728 out of 6862 reservoirs for reservoir operation. In MATSIRO, global reservoirs have 525 more than 1 km 3 total storage capacity and "local reservoirs" or "ponds" have less than 1 km 3 (around 6134 reservoirs ;Hanasaki et al., 2006;Pokhrel et al., 2012a and b). H08 considers 963 active managed reservoirs (global reservoirs) and 5824 local reservoirs; therefore, global reservoirs regulate river flow, while local reservoirs do not. Global reservoirs have 4773 km 3 of total storage capacity, while local reservoirs have 1300 km 3 of total storage capacity. In H08, when multiple local reservoirs are present in a grid cell, their capacity is added together. CWatM considers 3663 active managed reservoirs, while PCR-GLOBWB considers 6177. LPJmL includes 4134 reservoirs that become active after the first year of operation. In LPJmL, reservoirs are not managed according to an operation scheme, they are modeled as lakes with a maximum storage amount and the water over this amount is released as reservoir outflow; irrigation water can also be taken from the reservoir.
Five models (CWatM, H08, LPJmL, MATSIRO, and WaterGAP2) use a retrospective reservoir algorithm, while one model (PCR-GLOBWB) uses a prospective reservoir algorithm. The retrospective reservoir algorithm uses river flows and water 535 demand, which were processed in a previous step, while the prospective reservoir algorithm uses forecasts of river flows and water demand (van .

Wetland storage
Wetland storage fills and empties with water similarly to lake and reservoir compartments, except that water use is not satisfied from wetlands. Two models (MPI-HM and WaterGAP2) compute wetland compartment, evaporation, and outflow from land 540 (Tables S39-S42; Figure 2). WaterGAP2 has two types of wetland storage: "local wetland storage", which obtains water from runoff resulting within the cell, and "global wetland storage", which obtains water from runoff resulting within the cell and the upstream cell (Döll et al., 2012).

River storage
River storage is increased by surface and sub-surface runoff. It loses water through streamflow, evaporation, channel 545 transmission, and water abstraction for human water use. Five models (DBH, JULES-W1, Mac-PDM2.0, VIC, WAYS) do not include river storage for ISIMIP2b simulations, because of computational and resource constraints, nor do they compute streamflow (Tables 9, 10, S43, and S46; Figure 2). Four models (LPJmL, MATSIRO, MPI-HM, WaterGAP2) use a linear reservoir cascade approach to compute the water balance of the river storage (Tables 9 and 10). Furthermore, MATSIRO uses Total Runoff Integrating Pathways (TRIP) for river routing through a channel. Three models (CWatM, H08, and LPJmL) 550 consider the minimum release for environmental flow. CWatM adopts a kinematic wave approach, approximation of the Saint-Venant equation (Chow et al., 1998), linked with dynamic reservoir and lake operation. Further, CWatM computes runoff concentrated in creeks and rivulets, with a lag time before entering the river storage, by using a triangular weighting function (Burek et al., 2020). ORCHIDEE includes a river transport module that involves the Simulated Topological Network (STN-  (Tables 9 and 10). LPJmL considers a standard river flow velocity of 1 m s -1 . MPI-HM uses the Manning-Strickler equation only for flow velocity computation in wetlands, while, for rivers, it computes a slopedependent flow velocity following the approach by Sausen et al. (1994). 565 Inflow from upstream grid cell surface water bodies represents the sum of inflow water from neighboring upstream grid cells for CLM4.5, CLM5.0, CWatM, mHM, and WaterGAP2 (Table S45). Additionally, CWatM and WaterGAP2 route this water also through lakes and reservoirs before it reaches its final point. H08 computes it as being the product between a 0.5 m s -1 flow velocity and river storage from upstream grid cells. LPJmL considers it as being the outflow of river storage reduced by evaporation from lakes and reservoirs, while MPI-HM considers it as being the sum of outflow from rivulet storage, 570 groundwater runoff, and streamflow from the upstream grid cells, then reduced by inflow from the wetland of an upstream grid cell. MATSIRO considers it as being the sum of inflow water from the neighboring upstream grid cell multiplied by outflow of river from an upstream grid cell. ORCHIDEE calculates it as being the sum of stream river storage of upstream grid cells divided by topographic index of the retention time and a reduction factor of stream river storage. PCR-GLOBWB takes into account the outflow from river storage, time of process duration, length of river sections, and the coefficient friction 575 of the reservoir weir.
Evaporation from rivers is computed only by three models, CWatM, LPJmL, and PCR-GLOBWB, based on a PET approach (Table S47).

Similarities and differences in simulating human water use sectors
Some GWMs simulate water extracted from surface water compartments and/or a groundwater compartment that is used for 580 human activities. Human water abstraction represents the sum of the water consumed by humans, evaporative water and other water losses (named water consumption), and water returned to the groundwater or surface water compartments (named return flow, being the part of the water not consumed). Generally, three models extract water for human activities from groundwater or surface water bodies (H08, PCR-GLOBWB, and WaterGAP2). Seven models (DBH, JULES-W1, Mac-PDM.20, mHM, ORCHIDEE, VIC, and WAYS) do not include any human water use sectors in their structures (Table 6). 585

Irrigation sector
Irrigation water demand (potential irrigation water abstraction) is computed by nine models (Table S52). Groundwater abstraction for the irrigation sector is simulated by six models (CWatM, H08, MATSIRO, MPI-HM, PCR-GLOBWB, and WaterGAP2: Tables S53), while five models compute the return flow (Table S55). Irrigation surface water abstraction is calculated by nine models (Table S56, Tables S93-S94). CWatM includes a "normal irrigation scheme", to mimic rainfall 590 when the plants need it, and a paddy rice irrigation scheme, to mimic the flooding of the rice area (Table S56).
The water source for the irrigation sector is river for nine models (CLM4.5, CLM5.0, CWatM, H08, LPJmL, MATSIRO, MPI-HM, PCR-GLOBWB, WaterGAP2: Table S93). Six models (CWatM, H08, MATSIRO, MPI-HM, PCR-GLOBWB, WaterGAP2) consider groundwater a source for the irrigation sector (Table S93). Four models take water from lakes for the irrigation sector and five models take water from reservoirs ( Figure 3). Return flows from irrigation sector recharge mainly 595 the soil and groundwater (seven models), while the return flows from domestic and manufacturing recharge mainly rivers (four models; Figure 4).

Domestic, livestock, and industry sectors
Five models (CWatM, H08, MATSIRO, PCR-GLOBWB, and WaterGAP2) simulate water abstraction, water consumption, and return flow for the domestic sector (household : Tables S59-S64). Three models (MATSIRO, PCR-GLOBWB, and 600 CWatM) combine manufacturing and electricity sectors in one sector, the industry sector. CWatM only calculates total abstraction from groundwater or surface water. MATSIRO and LPJmL used prescribed input data for water demand of the domestic and industry sectors, offered by the ISIMIP2b framework. These input datasets provide water consumption, but not return flow from these sectors. Consumption water can return to the atmosphere as evapotranspiration. LPJmL used prescribed input data for domestic and industrial water consumption data, provided by the ISIMIP2b framework, and assumed that only 605 the consumed water amount is withdrawn. MATSIRO used prescribed input data for domestic and industrial water demand and it computed itself the water abstraction and consumption for these sectors. ISIMIP2b does not offer prescribed input data for livestock sector as the global numbers are, compared to other sectors, low . PCR-GLOBWB computes amount of water abstracted and consumed for livestock sector, taken from groundwater and surface water bodies (Tables S65-S68), while WaterGAP2 computes only the amount of water taken from surface water bodies for livestock (Tables  610   S67-S68).
CWatM sums up calculates the water withdrawal in total from all users and afterwards it distributes the total withdrawal to three different sources: (i) surface water, (ii) sustainable groundwater (available renewable groundwater = long-term groundwater recharge of the last 30 years in the analyzed time interval), and (iii) unsustainable groundwater (nonrenewable groundwater = additional water gained by groundwater abstraction in surplus of groundwater recharge; Wada et al. 2012 620 human water use sectors (domestic, livestock, irrigation, industry). Each withdrawal that is depleting the groundwater storage beyond groundwater recharge is using fossil groundwater (unsustainable groundwater).
MATSIRO and WaterGAP2 take similar approaches to compute groundwater abstraction: groundwater abstraction for the irrigation sector is reduced by the sum of groundwater abstraction for the domestic and industry sectors. MPI-HM considers groundwater abstraction as being equal only to groundwater abstraction for the irrigation sector, as other sectors are not 625 included in the model. MPI-HM considers lake abstraction equal to surface water abstraction for the irrigation sector.
H08 considers reservoir abstraction as being sum of monthly water abstraction for the irrigation, industry, and domestic sectors.
LPJmL computes lake and reservoir abstraction by adding up the gross irrigation requirement and household, industry, and livestock demand at the grid cell with the gross irrigation requirement and household, industry, and livestock demand at the downstream grid cell. 630 MATSIRO computes reservoir abstraction by adding up water abstraction from reservoir for the domestic, industry, and irrigation sectors.
PCR-GLOBWB computes lake and reservoir abstraction by adding up water abstraction demand for the industry, irrigation, domestic (household), and livestock sectors.
CLM5.0 considers river abstraction equal to water abstraction for irrigation sector.. 635 WaterGAP2 computes lake, reservoir, and river abstractions as the sum of water abstraction for the irrigation, livestock, domestic, manufacturing, and electricity sectors taken from surface water bodies. The net surface water abstraction is satisfied in WaterGAP2 in the following order: 1) river, 2) global lakes and reservoirs, and 3) local lakes.

Number of water flows, water storage compartments, and human water use sectors included in the 1GWMs
One way of showing the model structures is to count the number of water flows, compartments, and human water use sectors 640 included in each model participating in ISIMIP2b. For example, a model includes three water compartments if it computes canopy water storage, soil water storage, and snow water storage. In this section, we want to increase readers' awareness of model structures and offer the readers a final overview of how the models work, and how many water storage compartments, flows, and human water use sectors are included in their structures.
Generally, GHMs have a high number of water storage compartments because their main purpose is to simulate the water 645 cycle. LSMs and DGVMs have a relatively smaller number of processes (in this count and in this study), but each process has a mechanistic interpretation. LSMs exclude some hydrological processes because they are not relevant for their research purpose, spatial resolution, or cannot be parametrized in a general manner, adding some uncertainty.
In this study, WaterGAP2 includes the highest number of water storage compartments (11; see Seven models do not simulate water used by humans for economic purposes such as irrigation, domestic, livestock, manufacturing, electricity, and desalination ( Figure 6). Three models (CWatM, MATSIRO, and PCR-GLOBWB) combine the manufacturing and electricity sectors in one sector: the industry sector. WaterGAP2 simulates five human water use sectors: 660 irrigation, domestic, livestock, manufacturing, and industry. Two models (PCR-GLOBWB and CWatM) simulate four human water use sectors such as irrigation, domestic, livestock, and industry. H08 simulates four human water use sectors: irrigation, domestic, industry, and desalination. MATSIRO simulates three human water use sectors: irrigation, domestic, and industry.
WaterGAP2 and CWatM have the highest number of water flows (23) to simulate human water use, while MPI-HM has the 665 lowest number (3; Figure 6). Water flows range between 3 (MPI-HM) and 23 (CWatM and WaterGAP2), among five GHMs.
LPJmL used four water flows to simulate irrigation sector.
Ultimately, GWMs include in their structure similar processes, but they are lacking other processes, mentioned in section 5, or include other processes resulting in different model structures or have used other parameter values determining various 670 model results (Figure 3 and 4). Therefore, in section 7 we present future research on model development of 16 modelling groups involved in the present study.

Potential future research of 16 global water models
Each model, analyzed is this study, is continuously updated with the purpose to improve simulations. Therefore, in this section, we summarize model developments done outside ISIMIP framework and potential future research of 16 GWMs (Tables S95  675 and S96). Each modelling team collected and provided these model developments. Some of the 16 analyzed GWMs include, in their original structure, additional water storage compartments, water flows, and human water use sectors that have not been used for ISIMIP2b. Additional information on the 16 analyzed GWMs can be found in the peer-review articles mentioned in Table 11. Some analyzed GWMs have the ability to operate at various spatial-temporal scales: CWatM, CLM4.5, CLM5.0 (3 h time 680 step at around 11 km). CLM team improved the irrigation scheme , the extraction of groundwater (Felfelani et al., 2020), the representation of land cover and land management (Meier et al., 2018;Hirsch et al., 2017;, and the implementation of reservoirs (Hauser et al., 2019). Numerous developments can be followed on the model's GitHub page (https://github.com/ESCOMP/CTSM). 685 CWatM developed a groundwater scheme with linkages to MODFLOW for 5 arcmin and 30 arcsec spatial resolution. The CWatM modeling group plans to develop a reservoir storage including different operation schemes (e.g., energy, irrigation), to increase the temporal resolution (at 1 h), to apply a global calibration also for ungauged catchments, such as using the Budyko framework (Greve et al. 2020), applying both the day-degree method and energy balance method to estimate snow accumulation and melt, and applying several methods to estimate evaporation based on changing CO2 concentration. 690 DBH plans to include human water uses (industrial and domestic sectors), either by developing a new module or using the simulations from other models (e.g., WFaS dataset), to calibrate the model in the new ISIMIP3 simulation round, and to improve the input/output module to read and write netcdf files.
The H08 modeling team used an approximate Bayesian computation technique to calibrate four parameters that are transferred to other regions containing no observations, mainly, based on Köppen-Geiger regions. The modeling group also increased the 695 spatial resolution to 5 min and improved the representation of crops used for biofuel in the model.
The JULES-W1 modeling group plans to make a technical update that will enable the river routing module to estimate discharge.
The LPJmL group developed an improved energy balance module and soil hydrological scheme that can estimate permafrost dynamics (Schaphoff et al., 2013)  MATSIRO modeling group has implemented a land-use change process, terrestrial biogeochemical processes, and an additional crop growth process into MATSIRO to develop a new modeling framework. As key interactions are taken into 705 account and all processes are coupled, important boundary conditions for hydrological simulations can be dynamically simulated internally. This hydrological simulation modeling framework has been coupled with MIROC GCM (global climate model) and has been used as an Earth system model. In addition, the group recently proposed new schemes for lateral groundwater flow, water temperature, and sediment transportation.
Ongoing efforts to improve the realism of hydrological processes in the mHM include the development of the multiscale lake 710 module (mLM), a comprehensible framework for reservoir regulation as well as natural processes in lakes. Near-future developments will focus on a glacial module, to better account for processes in cold regions, as well as coupling it to a groundwater model that will replace the current linear groundwater reservoir.
The MPI-HM modeling group plans to increase the spatial resolution of regional versions. The group currently implemented canopy storage in the latest model version and is developing experiments to integrate reservoir storage. 715 The ORCHIDEE group is focusing on calibration, soil storage, groundwater storage, river storage, reservoir storage and wetland storage (MacBean et al., 2019;Verbeke et al., 2019;Yin et al., 2020;Schrapffer et al., 2020;Mizuochi et al., 2020).
The PCR-GLOBWB modeling group plans to increase the temporal and spatial resolution of the input data, to increase the temporal resolution (3 h) for energy balance calculations and the global spatial resolution (1 km), to improve the soil representation by including the Richards equation, to add more snow elevation layers, to include additional fast runoff 720 component for improving daily discharge simulations, and to improve the reservoir operating scheme .
The VIC modeling group developed different irrigation practices (Shah et al., 2019a and b) and included a reservoir (Dang et al., 2019 and 2020) as well as a groundwater scheme in the model structure.
The WaterGAP2 modeling group plans to update the GRanD dataset used by the model, to include water temperature calculations, to couple the new developed groundwater model (Reinecke et al., 2019), and to update the non-irrigation water 725 use datasets.
The WAYS modeling group plans to develop a new human water use module to consider agricultural, industrial, and domestic water use in the water cycle.

Recommendations for future multi-model intercomparison projects and extended assessments
We assert that this study was realized through a multi-model intercomparison project (ISIMIP) and is based on communication 730 and collaboration. Ideally, through a unified perspective and effective collaborations towards physically realistic hydrologic models (Clark et al., 2015, Clark et al., 2017, communities will fill in existing knowledge gaps (Wagener, 2020), improve the quality of the input data and the processes in the models, and implement the missing processes in the models. In addition to these statements, we propose focusing the effective collaborations on effective wish lists, including specific research questions, goals to answer these questions, methods to achieve the goals, datasets to be used, tasks to be done, and, at the end of the 735 project, a retrospective analysis on what has been done and what could improvement be improvemedoptions. Certainly, collaboration among these communities results in new multi-model intercomparison projects (MIPs) and multi-model ensembles that facilitate new analyses, comparisons, understandings, and improvements.
In global water modelling, there are some more methodologies that can be tested to evaluate multi-model structures and model equations, also considered as hypotheses on runoff generation, for example, Rainfall-Runoff Modelling Toolbox (Wagener et 745 al., 2001); the rejectionist framework (Vaché and McDonnell, 2006) models, and to apply more reasonable regionalization schemes in global water research (Bierkens, 2015). Other methods can be found in frameworks proposed by Romero-Lankao, 2017 andKundzewicz et al., 2018. 755 We recommend, for the benefit of the MIPs, to 1. ) maintain very good documentation of the model code; 2. ) always start research with a list, for example, with water storage compartments, flows, and human water use sectors included in the model structures; 3. ) have clear definitions of the variables, water storage compartments, flows, and human use sectors, describing exactly their role in the model; 4. ) have synonyms for variables, helping to show similarities and differences among models; 5. ) collect all ideas, recommendations, and improvements received from everyone (in our case, they were required to complete 760 our study); 6. ) collaborate and communicate with peers, which was very useful in our study for identifying synonyms among communities; 7. ) describe your model or a model through your eyes and other's people eyes eyes, to identify differences in terminology and assumptions in model code and similarities and differences among the models; and 8. ) invest much time and patience and be meticulous about extracting equations of water storage, flow, and human water use sectors from the model code. 765 We encourage communities to write and convey a clear, simple, and understandable text for large audiences. We consider that simplicity improves communication, and communication starts with a common language, the same words having the same meaning for the sender and the receiver. While trivial in theory, in practice there are some discrepancies among scientists, as well as between scientists and stakeholders by using vocabulary differently in climate impact science (Sultan et al., 2020).
Our future research will include describing the GWMs analyzed in this study, through a standard visualization of the water 770 cycle that will show the water storage compartments, water flows, and human water use sectors included in the ISIMIP2b model structures. These diagrams would be connected with the tables presented in the supplement of the present paper (Tables S1-S83). Another future study might focus on the numerical implementation of each model code.
We note that this review and description study had a positive impact on the modeling groups, motivating them to re-think and re-analyze model structures, equations, and descriptions. We affirm that ISMIP global water sector needs to organize 775 workshops on some parameterization experiments, by changing model parameter values. Other evaluation studies could focus on the equations applied to compute water compartments, water flows, and human water use sectors, as well as considering model outputs, to identify the effect of different water compartments on model results. ISIMIP community could increase the number of regional and pilot studies, that could validate global studies, and the number of cross-sectoral climate impact assessments. 780 Certainly, simulating the terrestrial water cycle on the global scale involves many challenges, as we presented in this study.
Other challenges have also been synthesized by reviewing articles published by the climate, global hydrological, and vegetation communities and have been classified according to the 23 unsolved problems in hydrology (UPH) identified by Blöschl et al., 2019 (Table S97). In summary, these challenges can generally be overcome through innovative and creative collaboration among communities and investment in technical infrastructure. In the end, Arheimer et al., 2020 showed that the catchment 785 models can be applied at a global scale because of the new global datasets, increased computational capacity, new methods to estimate parameters, and collaboration. Ultimately, GWMs may even become specific features of GWMs such as dam operation, human water abstractions, routing approaches, and calibration might become a part of the Earth System Models used to simulate the water cycle at a high resolution, including human water demand and use (Wood et al., 2011;Bierkens et al., 2015). 790

Conclusions
Global water models are used to simulate the climate-water-human system. However, recent evaluation studies show that there is a need to better simulate this system by including other hydrological processes, data on physical infrastructure, societal behavior, cultural behavior, water diversions, and virtual water, as well as by identifying its teleconnections on the global scale (Zaherpour et al., 2018;Veldkamp et al., 2018;Wada et al., 2017). Some studies also underline the need to better explain 795 various model results and better undersand how models work (Reinecke et al., 2021;.
We undertook the present study mainly to find similarities and differences among global water models that will facilitate interpretation of various results, as well as those of further intercomparison studies. We developed a standard equation writing style to achieve this goal. We found that there are some similarities among the models when applying similar equations for the same hydrological processes; however, model structures are different and various values have been used for parameters or 800 variables.
In summary, we mention that our approach was affected by models' complexity and is limited to eight water storage compartments and their flows, and six human water use sectors mainly, because of models' complexity. We conclude that the standard writing style of the equations is useful and necessary for finding similarities and differences among models for each water storage, human water use sector, and water flow. In addition, it can be leveraged for explaining the different model 805 outputs, for classification of the models based on cluster analysis, and for selecting the right model for the right application. It can also be used for drawing a standard schematic visualization of the water cycle, for describing models on ISIMIP and ISIpedia platforms (the open climate-impacts encyclopedia, a part of the ISIMIP, https://www.isipedia.org/), and for understanding how models work. Other modelling teams can apply, in their studies, our lists with water storage compartments, flows, and human water use sectors and the symbols presented in the supplementary information. They can follow our steps 810 in creating a standardized writing style of model equations and they may be aware of some challenges that could encounter.
This study represents a roadmap in finding similarities and differences among models. However, it should be noted that these equations are available only for model versions used for ISIMIP2b.
We consider this study as a blueprint for other studies because it offers a practical approach to identify similarities and differences among models that are necessary for a better interpretation of their various results. 815 We highlight the need to undertake experiments on individual water compartments in order to analyze the equations and parameters used, as well as the results obtained. We also underline the need to make multi-model intercomparison projects: firstly, because they enhance collaboration and communication between modeling groups, communities, countries and cultures; secondly, through communication and collaboration, these projects enhance creativity and open opportunities to finding new ways to improve the models. 820

Supplement
Tables with equations of each water storage, water flow, human water use sectors, datasets used by global water models, models' structures, future research perspectives.

Acknowledgement
The present study was possible through the international ISIMIP framework. CET and HMS are supported by the German 825

Code availability
Information on the availability of source code for the models featured in this article can be found in the Table 11. CET conducted the analysis, visualization, and wrote the original draft. All authors reviewed, commented, and contributed to the final draft.