Understanding each other’s models: a standard representation of global water models to support improvement, intercomparison, and communication

Global water models (GWMs) simulate the terrestrial water cycle, on the global scale, and are used to assess the impacts of climate change on freshwater systems. GWMs are developed within different modeling frameworks and consider different underlying hydrological processes, leading to varied model structures. Furthermore, the equations used to describe 40 various processes take different forms and are generally accessible only from within the individual model codes. These factors https://doi.org/10.5194/gmd-2020-367 Preprint. Discussion started: 8 January 2021 c © Author(s) 2021. CC BY 4.0 License.

warming, and to assess the impacts from different future socioeconomic development scenarios. More details regarding the ISIMIP framework can be found on the ISIMIP webpage (https://www.isimip.org/) and in Frieler et al., 2017. However, one 175 recommendation for the ISIMIP community will be to increase the number of regional studies and pilot studies that could validate global studies, which could be another effective way of studying climate change.

Steps taken to realize the standard writing style of model equations
All models that provide simulations for the global water sector in ISIMIP2b are included in this paper, although some of them have not yet finished their simulations for this phase. The rationale of describing models is based on how models simulate the 180 water cycle. Therefore, in this study, a global water model describes the dynamic behavior of a hydrological system that includes input variables, state variables, parameters, constants, and output variables (Bierkens and van Geer, 2007). State variables define the state of the water in a compartment or storage at the beginning of the simulation, and can change in space and time, for example, with canopy water storage. Their variation is caused by a variation of the input variables, for example, precipitation. State variables are related to the input variables and output variables through parameters. Parameters may change 185 in space, but do not change in time. Parameters and coefficients represent numbers that describe a particular characteristic of reality, of the model, of the catchment area or flow domain such as runoff coefficient, soil porosity, hydraulic conductivity of different soil horizons, maximum soil water storage, maximum canopy water storage, and mean residence time in the saturated zone (Beven, 2012). Some processes are parameterized, meaning that their values are precisely marked in the computer code and are not calculated by the model itself. Originally, to parameterize by itself means to describe a process or a phenomenon 190 by the use of parameters. Therefore, in hydrological modeling many parameterization methods or techniques (equations running in some algorithms) have been implemented to simulate hydrological processes. Ultimately, a model also uses constants, properties of the model that do not change in space and time, and "output variables," which vary in space and time.
Hence, we describe GWMs based on the equations implemented for eight water storage compartments, five human water use sectors, and desalination. The analyzed water storage compartments are (1) canopy, (2) snow, (3) soil, (4) groundwater, (5) 195 lake, (6) wetland, (7) reservoir, and (8) river. The human water use sectors are (1) irrigation, (2) domestic (households), (3) livestock, (4) manufacturing, (5) electricity. It was extremely challenging to label processes as being similar or different among the 16 models because a unique equation can be implemented in various ways (e.g., discrete vs. analytical form, focusing on flows or water storage compartments) or parameterized differently. Therefore, we created a standard writing style for GWMs equations and used the same symbols to 200 write those equations, thereby highlighting their similarities and differences in a consistent way. In the end, the standard writing style facilitated comparison among models, but it has also raised many challenges, mainly because we decided to write selfexplanatory equations that could be understandable by readers with or without knowledge in hydrology.
In the supplement, we report tables containing various equations for each water storage compartment, human water use sector, and their related water flows (tables and figures in the supplementary information are denoted by an "S" in their numbering). 205 All variables have been harmonized and their units have been standardized (Tables S1-S83).
We made the standard writing style of the model equations following some steps. Firstly, we analyzed the nomenclature of each model with the purpose of identifying the best practices. Secondly, we assembled a list with water storage compartments, flows, and human water use sectors included in the models. Thirdly, we established clear definitions for all variables that have been collected inside a glossary of terms. Finally, we collated all model equations of each inflow and outflow for each water 210 storage compartment and human water use sector.
Multiple subscripts or superscripts are required to properly identify water storage compartments, flows, and human water use sectors because of the large number of compartments that are included in the model structures. Thus, we selected "S" to describe water storage, "P" to describe everything connected to precipitation, "E" everything related to evaporation, "R" everything related to runoff, "Q" everything related to streamflow and outflow, and "A" for water abstractions. We used two 215 letters for subscripts and superscripts, ideally, the first two letters of the word, for example, "ca" for canopy; "sn" for snow; "so" for soil, and so on (see list of symbols and glossary in the Supplement), while we used the first letter of each word in case of compounds words such as groundwater ("gw") or surface water ("sw"). We separated subscripts and superscripts from one another using comma. Some of these decisions correspond with some habits that exist in the hydrological community (e.g., gw and sw) and we decided to keep them to make a comfortable and easy workflow for modelers and readers. We did not write 220 full words for subscripts and superscripts, because equations became too long and difficult to read and understand.
In the end, the standard writing style of the equations is useful and necessary for finding similarities and differences among models for each water storage, human water use sector, and water flow. In addition, it can be leveraged for explaining the different model outputs, for classification of the models based on cluster analysis, and for selecting the right model for the right application. It can also be used for drawing a standard schematic visualization of the water cycle, for describing models 225 on ISIMIP and ISIpedia platforms (the open climate-impacts encyclopedia, a part of the ISIMIP, https://www.isipedia.org/), and for understanding how models work. It should be noted that these equations are available only for model versions used for ISIMIP2b.

Key characteristics of the global water models
The present model intercomparison study is based on the lists presented in Tables 1 to 5 that show water storage compartments, 230 flows, and use sectors included in the GWMs. Generally, the model description is separated into two parts: the hydrological part and human water use part. The hydrological part includes the water cycle processes described as water storage compartments and flows, with flow presented as inflow and outflow of each water storage, while the water use part includes human water use, specifically water abstracted from the groundwater or surface waters. In the supplement, we provide tables that present an overview of the GWMs, helping readers to understand similarities and differences among models, identify 235 included water storage and flows, and get an overview of hydrological knowledge complexity behind models (Tables S1-S103).
Fifteen models run with a spatial resolution of 0.5°. ORCHIDEE runs with a spatial resolution of 1.0° and has its outputs converted to 0.5° spatial resolution. All models divide the land into grids of discrete "cells" (excluding Greenland and 245 Antarctica) and, in addition, some models include subgrids for some components: CLM4.5 and CLM5.0 for vegetation, surface runoff and evapotranspiration; H08 for land cover (via 19 crop types); CWatM for land cover (6 land cover types) and snow (10 elevation zones); MPI-HM for surface runoff and evapotranspiration; PCR-GLOBWB for vegetation and land cover; WaterGAP2 and MATSIRO for snow; VIC for vegetation and elevation. Further, MATSIRO divides a subgrid cell in snowcovered and snow-free portions with flows and storages resolved separately for these portions, both for land and canopy 250 surfaces.
Nine models (CLM4.5, CLM5.0, CWatM, H08, LPJmL, MATSIRO, MPI-HM, PCR-GLOBWB, WaterGAP2) use the 30-min global drainage direction map DDM30 (Döll and Lehner, 2002), a raster map with a spatial resolution of 0.5° × 0.5° (~ 50 km × 50 km), to outline the drainage directions of surface water collected by creeks, rivulets, and rivers. In this map, 66,896 discrete grid cells are connected to each other by their specific drainage direction and are organized into drainage basins that 255 drain from the Earth's land surface (excluding Antarctica) into the ocean or into an inland sink. The mHM uses a river network (0.5° × 0.5°) upscaled from HydroSHEDS (Lehner et al., 2006). ORCHIDEE uses the river network from the Simulated Topological Networks (STN-30p: Vörösmarty et al., 2000). Five models (DBH, JULES-W1, Mac-PDM.20, VIC, and WAYS) do not use any river routing scheme for the ISIMIP2b; therefore, they do not compute streamflow.
MATSIRO and LPJmL use prescribed data for the domestic (household) and industry sectors; therefore, they do not consider 260 the two-way interaction between water system and humans. In hydrological modeling and in the ISIMIP2b, the word "prescribed" has two meanings: (i) data which are simulated by other models and provided by the ISIMIP2b framework as input (https://www.isimip.org/gettingstarted/details/38/); (ii) data obtained from satellite observations, other datasets, or maps.
Prescribed data highlight some limitations of the models or underline the lack of some processes that were intentionally or non-intentional removed from the model structure, according to the purpose of the model development or other priorities such 265 as time.
Six GHMs perform calibration of their hydrological components, using different approaches (Table 6). CWatM calibrates monthly or daily streamflow for 12 catchments using the Distributed Evolutionary Algorithms in Python (DEAP) approach , while WaterGAP2 uses a beta function for the calibration of 1,319 gauged hydrological stations considering runoff as a nonlinear function of soil moisture. WaterGAP2 uses a runoff coefficient and two correction factors to 270 calibrate the simulated and observed streamflow . Mac-PDM.20 is calibrated using the https://doi.org/10.5194/gmd-2020-367 Preprint. Discussion started: 8 January 2021 c Author(s) 2021. CC BY 4.0 License. generalized likelihood uncertainty estimation (GLUE) approach, comprising a 100,000-member ensemble based on different model parameterizations run with Watch Forcing Data and evaluated against Global Runoff Data Centre (GRDC) streamflow data (Smith, 2016). In mHM, calibration of global model parameters is performed against the daily observed streamflow of GRDC stations, along with gridded global fields of FLUXNET evaporation (Jung et al., 2011) and a GRACE terrestrial water 275 storage anomaly, using the ERA5 climate forcing (Landerer and Swenson, 2012). VIC uses the source datasets and parameter sets from Nijssen et al. (2001), namely the AVHRR-derived landcover dataset (Hansen et al., 2000) and the FAO soil textures (FAO, 1995), and is sub-sampled to 0.5° × 0.5° via a nearest-neighbor approach. WAYS is calibrated against data from the International Satellite Land Surface Climatology Project (ISLSCP) Initiative II of the University of New Hampshire or GRDC composite monthly runoff data (Fekete et al., 2011), from 1986 to 1995 at a 0.5° spatial resolution. These datasets are composite 280 runoff data that combine simulated water balance model runoff estimates and monitored river streamflow (GRDC). CLM5.0 performs hydrological calibration in a Bayesian framework using a sequential Monte Carlo method (Lawrence et al., 2019).
Five models (CLM4.5, DBH, MATSIRO, ORCHIDEE, and PCR-GLOBWB) adjust some parameters according to vegetation or soil properties, but they have no hydrologic calibration. Neither JULES-W1 nor LPJmL calibrate hydrology, although they do calibrate biophysical processes and crop yield, respectively. MPI-HM and H08 are not hydrologic calibrated. Generally, 285 GHMs are hydrologic calibrated based on their main goal of quantitatively simulating the continental water cycle.

Review of the global water models included in the study
Global water models were developed from the earliest land surface models created by Manabe (1969), Freeze andHarlan (1969), andDeardorff (1978). These first land surface models simulated the terrestrial water cycle by considering vegetation processes, evaporation, soil moisture, and snow cover. Later on, Dooge (1982) identified the two major challenges of global 290 hydrology: scaling and parameterization. Eagleson (1986) declared the necessity of global-scale hydrology. Inevitably, during the 1990s, the first global hydrological models were developed (Alcamo et al., 1997;Vörösmarty et al., 1998, Arnell, 1999. Over the years, many models have been developed and improved and many studies have been done to assess freshwater resources on the global scale (Bierkens, 2015).
In the present study, we analyze the state-of-the-art global water models included in the global water sector of the Inter-Sectoral 295 Impact Model Intercomparison Project (ISIMIP: Frieler et al., 2017). GWMs simulate the terrestrial water cycle, on the global scale, and quantify water flows, water storage compartments, and human water use under past, current, and future climate and socioeconomic conditions. Some of these models also consider reservoir operations. In this study, GWMs do not simulate the ocean component of the global water cycle or water quality. They use input data at 0.5° × 0.5° spatial resolution to obtain their boundary conditions and parameters (Wada et al., 2017). Generally, these models are suitable for application over a minimum 300 catchment size of 9,000 km 2 or at least four grid cells, at 0.5° × 0.5° spatial resolution (Döll et al., 2003;Hunger and Döll, 2008). For smaller catchments, the results are often not reasonable (e.g., Beck et al., 2016) and require some corrections https://doi.org/10.5194/gmd-2020-367 Preprint. Discussion started: 8 January 2021 c Author(s) 2021. CC BY 4.0 License.
(eventual post-process) due to inaccurate input data, spatial heterogeneity, and the lack representation of some hydrological processes, for example, capillary rise, artificial transfers, and pond development (Döll et al., 2003;Hunger and Döll, 2008). Hattermann et al. (2017) highlighted the role of global and regional water models. Global water models assess the large-scale 305 impacts of climate change and its variability, while regional water models assess the small-scale impacts that are specific to a particular river, catchment, or region. Gosling et al. (2017) underlined that the global and regional water models share many similarities regarding runoff simulation results and their conceptual approach to model development, although the GWM results vary more than regional water results.
Ultimately, GWMs have faced many challenges in selecting a good method to estimate water storage compartments, water 310 flows, and human water use sectors. Some of these are presented in the following subsections.

Evaluation of global water model to observations
Many ISIMIP studies have evaluated the performance of GWMs for historical time intervals and have highlighted the importance of certain hydrological processes, in addition to many model shortcomings. For example, Wartenburger et al. (2018) concluded that the values of actual land evapotranspiration are affected by the methods used to estimate 315 evapotranspiration, number of soil layers, model structure, and uncertainties in the climate input datasets. Zaherpour et al. (2018) showed that GWMs overestimated mean and extreme monthly runoff, mostly because the ISIMIP precipitation dataset had too-high values and due to the method used to generate surface runoff. They recommended improving the prediction of low runoff and the magnitude and timing of seasonal cycles, investigating methods to calibrate models, testing models with different parameter values, and examining the interconnected uncertainties (e.g., perturbed parameter ensembles: Gosling, 320 2013). Further, Veldkamp et al. (2018) identified that mean, high and low flows are improved by the parameterization of water abstractions and reservoir operations. However, these are also influenced by uncertainties regarding water abstraction sources, return flow sinks, and the timing of these issues. Masaki et al. (2017) reported that different simulated outflows from reservoirs depend on dam operation algorithms, with similar concepts in some cases, and on the simulated river inflows. Zhao et al. (2017) highlighted the influence of the routing scheme on streamflow timing and magnitude and recommended inclusion of 325 floodplain storage and backwater effects in models. Scanlon et al. (2019) highlighted that GWMs underestimated GRACE-derived seasonal water storages amplitudes in tropical and (semi-)arid basins and overestimated them in northern high-latitude basins. They suggested to increase the number of soil layers in the models, improve the simulation of snow physics by including processes that delay snowmelt, improve evapotranspiration schemes, and add surface water and groundwater storage compartments to some models. 330 GWMs were also evaluated a specific case: the 2003 European heatwave and drought (Schewe et al., 2019). The study showed that the models underestimated the streamflow on some European rivers, where no high anomalies were noticed, and underlined the need to further evaluate and improve the models for extreme conditions and to consider all optimistic and pessimistic results in an ensemble as hypotheses.
Nevertheless, GWMs must be evaluated for historical periods before making future projections, in order to validate their 335 performance and reduce uncertainties Do et al., 2020).

Climate impact assessments with global water models
Historical performance evaluation studies provide context for further work by evaluating modeled projections of climate change on irrigation water requirements (Wada et al. 2013) and the impact on regional and global water scarcity (Schewe et al., 2014) and on hydrological drought (Prudhomme et al., 2014). The first two studies, Wada et al. 2013 and Schewe et al., 340 2014, explained the high variation of projected impacts of climate change on the irrigation sector and river discharge through the differences extant in the model structures. A fundamental conclusion of these studies was that hydrological model uncertainty is higher than climate model uncertainty. Prudhomme et al. (2014) underlined that models project little or even no increase in drought frequency if they include the active response of vegetation to CO2 and to climate change in their structure. Reinecke et al. (2020) highlighted less severe decreases of groundwater recharge, and even increases in some regions, when 345 the CO2 fertilization effect (active vegetation) is considered. Grillakis (2019) found that agricultural droughts (soil moisture droughts) are expected to increase in frequency. Milly and Dunne (2017) concluded that hydrological models overestimated potential evapotranspiration, causing overestimation of actual evapotranspiration and an underestimation of the runoff, in comparison with climate models.
Nevertheless, studies on water scarcity and their results are affected by their methodology, definitions, and assumptions. 350

Uncertainties of the global water models
Multi-model intercomparison studies showed a significant variation in the model results. One explanation could be that global hydrological modeling imposes uncertainties from forcing data, model parameters, processes included or excluded, and numerical algorithms used. Additionally, each modeling group has a different model development concept and purpose.
Ultimately, hydrology is an inexact science influenced by aleatory (random) and epistemic (lack of knowledge) uncertainties 355 (Beven, 2018). Therefore, many models combined in an ensemble approach collect many uncertainties and structural differences.
It has been found that uncertainties of evapotranspiration and snow water equivalent depend on model structures and their algorithms, while uncertainties of runoff depend on climate forcing, specifically, precipitation Hagemann et al., 2013). 360 Other uncertainties derive from meteorological data ; model structure complexity ; parameter estimation (Samaniego et al., 2017); model calibration ; future scenarios of greenhouse gas emissions, land use management, water management, and socio-economic patterns (Wada et al., 2016a).
Finally, many studies concluded that the uncertainty of the hydrological results is primarily determined by the selection of hydrological model and it exceeds the uncertainty caused by selection of climate model or emission scenario (Wada et al.,365 https://doi.org/10.5194/gmd-2020-367 Preprint. Discussion started: 8 January 2021 c Author(s) 2021. CC BY 4.0 License. 2013; Schewe et al., 2014, Greve et al., 2018. Therefore, there is a need to better understand the models' structure complexity, their equations, and their approaches, and to improve the quality of the input data.
Some methodologies were also created on the catchment scale and they support the evaluation of multi-model structures and parameterizations, also considered as hypotheses on runoff generation, for example, analytical framework (Wagener et al., 2001); the rejectionist framework (Vaché and McDonnell, 2006)

(Ecohydrologic model, EcH2O
). Therefore, these methods might offer some solutions for reducing the high number of 380 parameters and their values still found in global water models, and to apply more reasonable regionalization schemes in global hydrological research (Bierkens, 2015).
Other methods can also be found in frameworks proposed by Döll andRomero-Lankao, 2017 andKundzewicz et al., 2018. In the end, Arheimer et al., 2020 showed that the catchment models can be applied at a global scale because of the new global datasets, increased computational capacity, new methods to estimate parameters, and collaboration. Thus, GWMs may even 385 become a part of the ESMs used to simulate the water cycle at a high resolution, including human water demand and use (Wood et al., 2011;Bierkens, 2015).

Similarities and differences among global water models
Similarities and differences among models are presented according to eight water stocks, five human water use sectors, and desalination. 390

Similarities and differences in simulating water storage compartments
Canopy water storage. The changes in canopy water storage depend on how much water evaporates (canopy evaporation) and how much water is intercepted by canopy. Thirteen models include canopy water storage in their structure, while three other models do not include it (H08, Mac-PDM.20, and MPI-HM: Table S3). Ten models compute canopy water storage by subtracting the throughfall amount and canopy evaporation from the total precipitation. Other three models (CLM4.5, CLM5.0, 395 and MATSIRO) compute canopy water storage by subtracting the liquid or solid throughfall and canopy evaporation from the https://doi.org/10.5194/gmd-2020-367 Preprint. Discussion started: 8 January 2021 c Author(s) 2021. CC BY 4.0 License. precipitation intercepted by the canopy storage. MATSIRO is the only model that has two canopy water compartments: one for rainfall interception and one for snowfall interception. It also computes in detail how much water is intercepted by canopies in stormy areas with high wind speeds and in calm areas with low wind speeds. In these areas, precipitation depends, mainly, on leaf area index (LAI) and water deficit in the canopy storage. 400 Three land surface models (CLM4.5, CLM5.0, and MATSIRO) divide total precipitation into precipitation intercepted by canopy, precipitation that penetrates the canopy and then reaches the ground (throughfall), and precipitation that falls directly on the ground (Tables S4-S6). Further, they also divide throughfall into liquid and solid phases.
Two models compute an interception scheme based on a leaf and stem area index, while seven models use only a leaf area index (Tables 7 and 8). Ten models compute this considering vegetation type (a plant functional type system) (Tables 7 and  405 8). MPI-HM used prescribed data taken from Land Surface Parameter dataset version 2 (Hagemann, 2002). PCR-GLOBWB uses HYDE3.2 (Klein Goldewijk, 2017), MIRCA , and GlobCover datasets (ESA GlobCover Project, 2005). Generally, prescribed vegetation ignores the decisive interaction between vegetation and runoff as well as interactions between the atmosphere and Earth's surface.
Four models (CLM4.5, CLM5.0, LPJmL, and ORCHIDEE; Tables 7 and 8) account for the CO2 fertilization effect, in the LAI estimation, by using a photosynthesis scheme (active vegetation mentioned in section 2.1), and they have the ability to simulate the CO2 effect on plant functioning. Generally, it was found that simulations depend on the number of PFTs prescribed or defined in the model and on the processes used to estimate plants' ability to adapt, acclimate, and grow in new environmental 420 conditions (Sitch et al., 2008).

Snow water storage accumulates snow below freezing and loses snow by melting and surface and/or snowdrift sublimation.
GHMs use the degree-day method to compute snow accumulation and snowmelt, while LSMs use the energy balance method (Tables 7 and 8). Among GHMs, H08 is the only one that applies the energy balance method to compute snow accumulation and melt. Additionally, three models (CLM4.5, CLM5.0, and CWatM) include glacier storage. CLM4.5 and CLM5.0 use a 425 physically based snow module to calculate snow accumulation and melt; therefore, they include multiple snow layers where compaction, melt, refreezing, firn, and other snow related processes take place.
MATSIRO is the only model that distinguishes between sublimation on snow-covered ground and snow-free ground. Snow layers vary between 1 (most of the GHMs) and 12 (CLM5.0; Tables 7 and 8). 435 Soil water storage keeps and loses water from flows above and below the ground's surface. Hydrologically, this includes an unsaturated zone.
CLM4.5 includes an empirical soil evaporation resistance parameterization, while CLM5.0 includes a mechanistically based parameterization where the soil evaporation is controlled by a dry surface layer. Therefore, CLM5.0 has the ability to model the seasonality of soil evaporation and soil water storage in (semi-)arid regions. It also explicitly simulates spatial variation in 445 soil thickness (0.4 to 8.5 m) and columnar water holding capacity, unlike CLM4.5 (Lawrence et al., 2019). These models have a large number of soil layers, each having moisture storage potential depending on the soil texture. They use the same approach to calculate surface runoff and have the ability to compute liquid runoff and solid runoff from snow capping. Both models consider subsurface runoff as a product of an exponential function of the water table depth and a single coefficient (Niu et al., 2005). VIC uses the variable infiltration curve (Zhao et al., 1980) to account for the spatial heterogeneity of runoff generation, 450 and assumes that surface runoff from the upper two soil layers is generated by those areas where precipitation exceeds the storage capacity of the soil. The mHM model has one more bucket between the soil storage and groundwater storage named "unsaturated storage" representing the source for interflow and groundwater recharge.
LPJmL was adjusted, and the water from the uppermost soil layers is considered to contribute to surface runoff if excess of storage is calculated according to the infiltration or percolation rates, which depend on soil type. LPJmL routes, what was 455 previously lateral runoff, from "layer 0" (first 20 cm), as surface runoff.
In JULES-W1, water that reaches the soil surface is split between water that infiltrates into the soil and surface runoff.
Infiltration takes place at a rate equal to saturated hydraulic conductivity multiplied by an infiltration enhancement factor, which is dependent on the presence and type of vegetation. If a soil layer becomes saturated, the water in excess of saturation is put into the layer below. JULES-W1 also uses a "zero-layer" scheme that does not use explicit model layers to represent 460 snow, instead adapting the topsoil level to represent lying snow processes (Best et al., 2011). WAYS simulates the water storage and flows in soil only for the entire root zone (Table 8). In the DBH model, runoff is generated directly when soil layer is saturated, or is generated when rainfall intensity is larger than the infiltration rate estimated with the Green-Ampt method (Tang et al., 2006). https://doi.org/10.5194/gmd-2020-367 Preprint. Discussion started: 8 January 2021 c Author(s) 2021. CC BY 4.0 License.
Two models (CWatM and MPI-HM) have an additional water storage compartment to compute the runoff concentration in a 465 grid cell that has a lag time before entering the river storage compartment (Table S19). Consequently, this storage serves to create a delay between runoff and streamflow, and accounts for the average distance that runoff, generated at a specific point within a grid cell, has to travel before reaching the river. This storage collects water from rivulets and creeks or concentrates runoff in rivulets and creeks before it enters the river storage, because the rivulets and creeks are smaller than the size of a single grid cell and have different water retention properties from the main river channel within the grid cell. Therefore, this 470 compartment does not act as a floodplain, to delay floods, or as overland flow, to express too much water in the soil. In its original structure, MPI-HM named this compartment "overland flow", but we decided to rename it "rivulet storage" to avoid confusion among readers.
Some GWMs compute vertical water movement in unsaturated soils by applying the Richards equation (Richards, 1931;e.g., CLM4.5, CLM5.0, CWatM, JULES-W1, MATSIRO, ORCHIDEE, VIC). However, the Richards equation might be not 475 relevant for the models that have one soil layer. LPJmL uses a percolation scheme to estimate vertical water movement that applies the storage routine technique developed by Arnold et al. (1990) and simulates free water in the soil bucket. DBH uses the Green-Ampt equation to compute infiltration in unsaturated soils.
Soil column configuration. Number of soil layers ranges between 1 (H08, MPI-HM, and WaterGAP2) and 25 (20 soil layers + 5 bedrock layers: CLM5.0), while total soil depth is between 1 m (H08) and 49.6 m (CLM5.0; Tables 7 and 8). ORCHIDEE uses a relatively deeper soil column to account for soil thermic. LPJmL has five hydrologically and thermal active soil layers plus one thermal active soil layer. MPI-HM defines soil storage in terms of the maximum water column, varying between 0 485 and 5 m; therefore this cannot be translated into soil depth directly. Five models (CLM4.5, CLM5.0, CWatM, MATSIRO, and VIC) compute frozen soil (Table S13).
Groundwater storage, beneath the soil water storage compartment, receives water from seepage and groundwater recharge.
It loses water through capillary rise, groundwater runoff, and abstraction for human water use. Hydrologically, it includes the saturated zone or phreatic zone. Eleven models include groundwater storage in their structure, and most of them have only one 490 groundwater layer (Tables 9, 10, S29). In ISIMIP2b, two models (JULES-W1 and LPJmL) consider the water excess from the bottom soil layer as seepage and relate this variable with groundwater runoff and groundwater recharge because they do not have a groundwater compartment.
CLM4.5 simulates an unconfined aquifer parameterization as a groundwater component, below the saturated soil storage and with a prescribed maximum value (5000 mm), while CLM5.0 simulates an impermeable bedrock with five layers and therefore 495 assumes no groundwater flow as bottom boundary conditions. In CLM4.5, the unconfined aquifer interacts with the saturated soil storage through the water table, whether it is within or below this storage. When the water table is below the soil storage, the aquifer recharge is estimated by applying Darcy's law across the water table (Lawrence et al., 2019). MATSIRO has a dynamic groundwater scheme (Koirala et al., 2014;Pokhrel et al., 2015) in which the number of soil layers in the saturated zone (i.e., groundwater) varies in time depending on water table location between 1 and 13 ( Table 7). The two-500 way interaction between the unsaturated zone (for which vertical moisture movement is resolved by solving the Richards equation) and the underlying aquifer is simulated through moisture flux exchange at the water table. This flux exchange is determined as the algebraic sum of downward gravity drainage from the unsaturated soil layer overlying the water table and the upward capillary flux (Koirala et al., 2014;Pokhrel et al., 2015). The water balance of the saturated zone is resolved by considering recharge to the groundwater aquifer and groundwater runoff that is determined by using a two-parameter, 505 statistical-dynamical formulation considering soil hydraulic properties and basin geomorphology (Yeh and Eltahir, 2005). The variation in the water table is also determined by the aquifer specific yield.
In Mac-PDM.20, it is assumed that all water in excess of field capacity drains in one day to the deep store, which for ISIMIP2b is used to represent groundwater recharge (Rgwr). The total runoff (qtot) is the sum of direct runoff (qs) plus delayed runoff from the deep soil and groundwater (qsb). This delayed runoff (qsb) is assumed to be a non-linear function of the amount of water 510 held in the groundwater and deep soil store (Table S31). Thus, like with MPI-HM, the purpose of the delayed runoff (or baseflow) is predominantly to cause a delay in river discharge and not to simulate groundwater in detail.
H08 separates groundwater into renewable and one nonrenewable layers (Hanasaki et al., 2008). WaterGAP2 is the only model that simulates the groundwater recharge from surface water bodies in semiarid and arid grid cells .
Lake storage fills with water through flows above and below the ground and stores water for a certain residence time. It loses water through discharge to other storages, evaporation, groundwater recharge, and water abstraction for human water use. Ten models do not include lakes (Tables 9 and 10). Five models compute evaporation from lakes, three of them based on a PET approach (Table S33), while four models compute outflow from lakes (Table S34). CLM4.5 and CLM5.0 compute the lake 520 storage as virtual storage where the difference between precipitation and evaporation is balanced automatically by their outflow, named lake runoff. CLM4.5 uses constant lake depth, while CLM5.0 uses spatially variable lake depth, and freezing and thawing are included in the lake body (Vanderkelen et al., 2020).
LPJmL treats natural lakes and rivers in a similar way in terms of inputs and output. Lake inputs to a river can also include upstream river inputs to the lake. LPJmL also keeps track of a lake fraction in the river input. WaterGAP2 and CWatM have 525 two types of lake storage: "local lake storage", gets water from runoff resulting within the cell, and "global lake storage", gets water from runoff resulting within the cell and the upstream cell (Döll et al., 2012). For ISIMIP2b, MPI-HM has used the prescribed wetlands and lakes extent, taken from the Land Surface Parameter dataset 2 (Hagemann, 2002).
In general, most of the models use the Global Reservoir and Dam database (GRanD: Lehner et al., 2011), but with a different 535 number of active managed reservoirs, used for reservoir operation during simulations. Three models (LPJmL, WaterGAP2 and PCR-GLOBWB) merge more than one reservoir per grid cell into one reservoir, if required.
Four models (CWatM, H08, MATSIRO, and WaterGAP2) use two water compartments, global and local reservoirs, to represent the reservoirs, following the reservoir algorithm developed by H08. However, there are some differences on how the scheme was implemented in the models, mainly, because of model structure, but the approach is essentially the same. These 540 four models use the same approach in selecting active managed reservoirs for reservoir operation, but they use different thresholds. WaterGAP2 considers 1109 active managed reservoirs and handles reservoirs below 0.5 km 3 storage capacity as local lakes. MATSIRO considers only 728 out of 6862 reservoirs for reservoir operation. In MATSIRO, global reservoirs have more than 1 km 3 total storage capacity and "local reservoirs" or "ponds" have less than 1 km 3 (around 6134 reservoirs ;Hanasaki et al., 2006;Pokhrel et al., 2012a and b). H08 considers 963 active managed reservoirs (global reservoirs) and 5824 local 545 reservoirs; therefore, global reservoirs regulate river flow, while local reservoirs do not. Global reservoirs have 4773 km 3 of total storage capacity, while local reservoirs have 1300 km 3 of total storage capacity. In H08, when multiple local reservoirs are present in a grid cell, their capacity is added together. CWatM considers 3663 active managed reservoirs, while PCR-GLOBWB considers 6177. LPJmL includes 4134 reservoirs that become active after the first year of operation. In LPJmL, reservoirs are not managed according to an operation scheme, they are modeled as lakes with a maximum storage amount and 550 the water over this amount is released as reservoir outflow; irrigation water can also be taken from the reservoir.
Five models (CWatM, H08, LPJmL, MATSIRO, and WaterGAP2) use a retrospective reservoir algorithm, while one model (PCR-GLOBWB) uses a prospective reservoir algorithm. The retrospective reservoir algorithm uses river flows and water demand, which were processed in a previous step, while the prospective reservoir algorithm uses predicted river flows and water demand (van . 555 Wetlands storage fills and empties with water similarly to lake and reservoir compartments. Two models (MPI-HM and WaterGAP2) compute wetland compartment, evaporation, and outflow from land (Tables S39-S42). WaterGAP2 has two types of wetland storage: "local wetland storage", which obtains water from runoff resulting within the cell, and "global wetland storage", which obtains water from runoff resulting within the cell and the upstream cell (Döll et al., 2012).
River storage fills with water through flows above and below the ground. It loses water through streamflow, evaporation, 560 channel transmission, and water abstraction for human water use. Five models (DBH, JULES-W1, Mac-PDM2.0, VIC, WAYS) do not include river storage for ISIMIP2b simulations, because of computational and resource constraints, nor do they compute streamflow (Tables 9, 10, S43, and S46). Four models (LPJmL, MATSIRO, MPI-HM, WaterGAP2) use a linear reservoir cascade approach to compute the water balance of the river storage (Tables 9 and 10) Venant equation (Chow et al., 1998), linked with dynamic reservoir and lake operation. Further, CWatM computes runoff concentrated in creeks and rivulets, with a lag time before entering the river storage, by using a triangular weighting function . ORCHIDEE includes a river transport module that involves the Simulated Topological Network (STN-  Sausen et al. (1994). 580 Inflow from upstream grid cell surface water bodies represents the sum of inflow water from neighboring upstream grid cells for CLM4.5, CLM5.0, CWatM, mHM, and WaterGAP2 (Table S45). Additionally, CWatM and WaterGAP2 route this water also through lakes and reservoirs before it reaches its final point. H08 computes it as being the product between a 0.5 m s -1 flow velocity and river storage from upstream grid cells. LPJmL considers it as being the outflow of river storage reduced by evaporation from lakes and reservoirs, while MPI-HM considers it as being the sum of outflow from rivulet storage, 585 groundwater runoff, and streamflow from the upstream grid cells, then reduced by inflow from the wetland of an upstream grid cell. MATSIRO considers it as being the sum of inflow water from the neighboring upstream grid cell multiplied by outflow of river from an upstream grid cell. ORCHIDEE calculates it as being the sum of stream river storage of upstream grid cells divided by topographic index of the retention time and a reduction factor of stream river storage. PCR-GLOBWB takes into account the outflow from river storage, time of process duration, length of river sections, and the coefficient friction 590 of the reservoir weir.
Evaporation from rivers is computed only by three models, CWatM, LPJmL, and PCR-GLOBWB, based on a PET approach (Table S47).

Similarities and differences in simulating human water use sectors
Some GWMs simulate water extracted from surface water compartments and/or a groundwater compartment that is used for 595 human activities. Human water abstraction represents the sum of the water consumed by humans, evaporative and speculative water losses (named water consumption), and water returned to the groundwater or surface water compartments (named return flow, being the part of the water not consumed). Generally, three models extract water for human activities from groundwater https://doi.org/10.5194/gmd-2020-367 Preprint. Discussion started: 8 January 2021 c Author(s) 2021. CC BY 4.0 License. or surface water bodies (H08, PCR-GLOBWB, and WaterGAP2). Seven models (DBH, JULES-W1, Mac-PDM.20, mHM, ORCHIDEE, VIC, and WAYS) do not include any human water use sectors in their structures (Table 6). 600 Irrigation sector. Irrigation water demand (potential irrigation water abstraction) is computed by three models (Table S52).
Groundwater abstraction and its consumption for the irrigation sector is simulated by five models (CWatM, H08, MATSIRO, MPI-HM, and WaterGAP2: Tables S53 and S54), while three models explicitly compute the return flow (Table S55). Irrigation surface water abstraction is calculated by nine models (Table S56, Tables S100-S101). CWatM includes a "normal irrigation scheme", to mimic rainfall when the plants need it, and a paddy rice irrigation scheme, to mimic the flooding of the rice area 605 (Table S56).
The main water source for the irrigation sector is river for nine models (CLM4.5, CLM5.0, CWatM, H08, LPJmL, MATSIRO, MPI-HM, PCR-GLOBWB, WaterGAP2), and then the secondary source is groundwater for six models (CWatM, H08, MATSIRO, MPI-HM, PCR-GLOBWB, WaterGAP2). Five models take water from lakes for the irrigation sector and four models take water from reservoirs. Only two models take water for irrigation from the ocean (Figure 1). Return flows from 610 irrigation sector recharge the soil (seven models), groundwater (seven models), and rivers (six models), while the return flows from domestic and manufacturing recharge lakes (three models), reservoirs (two models), and rivers (five models; Figure 2).

Domestic, livestock, and industry sectors.
Generally, four models (H08, MATSIRO, PCR-GLOBWB, and WaterGAP2) simulate water abstraction, water consumption, and return flow for the domestic (household : Tables S59-S64) and manufacturing sectors (Tables S69-S74). MATSIRO and LPJmL used prescribed data for water demand of the domestic and 615 industry sectors, offered by the ISIMIP2b framework, representing annual sums divided evenly over all days. These input datasets provide water consumption, but not return flow from these sectors. Consumption water can return to the atmosphere as evapotranspiration. LPJmL used prescribed data for domestic and industrial water consumption data and assumed that only the consumed water amount is withdrawn. MATSIRO used prescribed data for domestic and industrial water demand and it computed itself the water abstraction and consumption for these sectors, by applying a simple approach. ISIMIP2b does not 620 offer prescribed data for livestock sector. MATSIRO combines manufacturing and electricity sectors in one sector, the industry sector. PCR-GLOBWB computes amount of water abstracted and consumed for livestock sector, taken from groundwater and surface water bodies (Tables S65-S68), while WaterGAP2 computes only the amount of water taken from surface water bodies (Tables S67-S68). WaterGAP2 is the only model that computes amount of water abstracted and consumed for electricity sector (Tables S75-S76). 625 (Table S77) is computed differently by MATSIRO, MPI-HM, and WaterGAP2. MATSIRO and WaterGAP2 take similar approaches: groundwater abstraction for the irrigation sector is reduced by the sum of groundwater abstraction for the domestic and industry sectors. MPI-HM considers the total as being equal only to groundwater abstraction for the irrigation sector, as other sectors are not included in the model. (Table S78)  equal to surface water abstraction for the irrigation sector, while PCR-GLOBWB sums water abstraction demand for the industry, irrigation, domestic (household), and livestock sectors. WaterGAP2 computes it as the sum of water abstraction for the irrigation, livestock, domestic, manufacturing, and electricity sectors taken from surface water bodies. The net surface 635 water abstraction is satisfied in WaterGAP2 in the following order: 1) river, 2) global lakes and reservoirs, and 3) local lakes. (Table S79) is computed differently by H08, LPJmL, MATSIRO, PCR-GLOBWB, and

Total reservoir abstraction
WaterGAP2. H08 considers it as being sum of monthly water abstraction for the irrigation, industry, and domestic sectors.
LPJmL adds up the gross irrigation requirement and household, industry, and livestock demand at the grid cell with the gross irrigation requirement and household, industry, and livestock demand at the downstream grid cell (similar to lake abstraction). 640 MATSIRO adds up water abstraction from reservoir for the domestic, industry, and irrigation sectors, while PCR-GLOBWB adds water abstraction demand for the industry, irrigation, domestic (household), and livestock sectors (similar to lake abstraction). WaterGAP2 sums up water abstraction for the irrigation, livestock, domestic, manufacturing, and electricity sectors taken from surface water bodies. The net surface water abstraction is satisfied in WaterGAP2 in the following order:

1) rivers, 2) global lakes and reservoirs and 3) local lakes. 645
Total river abstraction (Table S80) is computed by CLM5.0 and WaterGAP2. WaterGAP2 considers it as the sum of water abstraction for the irrigation, livestock, domestic, manufacturing, and electricity sectors taken from surface water bodies. The net surface water abstraction is satisfied in WaterGAP2 in the following order: 1) rivers, 2) global lakes and reservoirs and 3) local lakes (similar to lake and reservoir is equal to surface water bodies).

Similarities and differences in simulating desalination 650
Seawater abstraction, consumption, and return flows (Tables S81-S83) are computed only by H08 .
Seawater abstraction represents the sum of seawater abstraction for the municipal and industry sectors. Three conditions must be met in order to use desalination: i) GDP > USD 14,000 person year -1 in terms of purchasing power parity (PPP); ii) humidity index below 8%; iii) within three grid cells of the seashore. It is assumed that seawater desalination is not used for irrigation and that all demand for municipal and industrial water is abstracted by desalination if available. In the context that desalination 655 is not used for irrigation, seawater consumption represents seawater abstraction weighted by the ratio of consumption to withdrawal, which is equal to 0.1 and 0.15 for industrial and municipal water use. Return flow from seawater abstraction represents seawater abstraction weighted by the non-used fraction (0.1 and 0.15 for industrial and municipal water use) and proportion lost during delivery (set to zero).

Examples of how parameterization can differ between GWMs 660
Different equations used by GWMs led to different model results, for example, different evapotranspiration methods led to significant differences in runoff estimation (Gosling and Arnell, 2011b;Kingston et al., 2009 Groundwater recharge (Table S30) is computed by 14 models. JULES-W1 and LPJmL do not include in their structure groundwater storage and seepage (the water that seeps from the last soil layer), which was reported as groundwater recharge and groundwater runoff for ISIMIP2b. CLM4.5 and CLM5.0 use the same approach to compute groundwater recharge, by using the concept of soil matrix potential and considering the hydraulic conductivity of the layer containing the water table.
CWatM and PCR-GLOBWB use the same approach by reducing percolation with capillary rise, but CWatM also considers 670 preferential flow as inflow. DBH estimates it depending on potential soil water content multiplied by the soil depth layer and maximized by soil hydraulic conductivity. Three models (H08, WaterGAP2, and WAYS) consider it as being the minimum value between maximum groundwater recharge and total runoff from land, weighted by a relief-related factor, a soil texturerelated factor, a hydrogeology-related factor, and a permafrost-and/or glacier-related factor (Döll and Fiedler, 2008). Further, WaterGAP2 sets (semi-)arid grid cells, sandy texture, and grid cells with throughfall equal or below 12.5 (mm day −1 ) to 0. 675 MATSIRO estimated groundwater recharge as being the variation of unfrozen soil moisture over time. MPI-HM equals groundwater recharge to percolation. ORCHIDEE estimates it depending on relative soil water content, but it is capped down to 0 and maximized by groundwater runoff. (Table S7). Another example is that the models estimate differently the maximum value of canopy storage and leaf area index (LAI) values. WaterGAP2 and WAYS estimate the maximum value of canopy storage 680 by multiplying 0.3 mm with LAI. Further, WAYS estimates seasonal LAI depending on the growing-season index, day length, and the actual root zone water storage. The root zone soil moisture stress parameter is fixed at 0.07, while WaterGAP2 estimates LAI based on a simple growth model and based on land cover characteristics (Table S7). VIC multiplies 0.2 mm by the monthly LAI. Additional, VIC takes into account aerodynamic and architectural resistances. CWatM equals maximum value of canopy storage with LAI that varies every 10 days depending on land use classes. 685

How many water flows, water storage compartments, and human water use sectors are included in the GWMs?
One way of showing the number of equations and/or parameterization schemes that comprise a model is to count the number of water flows, storage compartments, and use sectors existent in each model participating in ISIMIP2b. Generally, GHMs have a high number of water storage compartments because their main purpose is to simulate the water cycle. LSMs and DGVMs have a relatively smaller number of process (in this count), but each process is simulated in a more sophisticated way 690 or has a physically based representation. LSMs exclude some hydrological processes because they are not relevant for their research purpose, spatial resolution, or cannot be parametrized in a general manner, adding some uncertainty.
In this study, WaterGAP2 includes the highest number of water storage compartments (11; see  Figure 4).

Challenges in making this intercomparison study
We encountered many challenges in harmonizing terminology among the 16 global water models, as well as among climate, 705 global hydrological, and vegetation communities. It was challenging to intercompare the global water models using their different style in describing the model structure, defining the variables, and writing the model equations. Therefore, we decided to create a standard writing style for model equations, presented in subsection 3.2, and to decide upon clear definitions of the analyzed variables.
We discovered that, in some models, groundwater runoff and baseflow are used synonymously and define the water that leaves 710 groundwater storage. We also found that baseflow and subsurface runoff are used synonymously, and define the amount of water estimated for the third soil layer (VIC). Hence, we decided to use subsurface runoff synonymously with interflow and to define it as the amount of water that leaves the soil layer laterally. In this paper, baseflow is considered to be the low part of the streamflow that is supplied by groundwater, drainage from lakes, wetlands, glaciers, and interflow during long periods when no precipitation or snowmelt occurs. Ultimately, we have excluded the variable baseflow from the analysis because it is 715 not simulated by GWMs in ISIMIP2b.
MPI-HM includes additional storage, called baseflow storage, that collects the drainage leaving through the bottom of the soil storage and applies a substantial time lag before passing it on to the river storage. In ISIMIP2b, the drainage computed by MPI-HM was submitted as subsurface runoff, but considering that this baseflow storage acts similarly to a groundwater storage, drainage could be used as groundwater recharge in ISIMIP3a/b. Consequently, its outflow could be submitted as 720 groundwater runoff. However, the purpose of this baseflow storage is predominantly to cause a delay in river discharge and not to simulate groundwater in detail.
We also found out that the words drainage (MPI-HM), aquifer recharge (CLM4.5), and groundwater recharge (GHMs) are used synonymously to define the amount of water that reaches the groundwater storage. In this case, we decided to use only groundwater recharge, because of its hydrological meaning. 725 We define percolation as the amount of water that infiltrates beyond the root zone and seepage as the amount of water that leaks at the bottom of the soil storage. We relate seepage with groundwater recharge and groundwater runoff for the models that do not include a groundwater storage, supposing that this water would reach groundwater storage if it would exist. https://doi.org/10.5194/gmd-2020-367 Preprint. Discussion started: 8 January 2021 c Author(s) 2021. CC BY 4.0 License.
Another discovery was that throughfall and drip in some models were considered synonyms and they were used to describe precipitation that falls to the ground through canopy spaces (CLM4.5, CLM5.0, MATSIRO). In this case, we decided to 730 separate these words and to define throughfall as being precipitation that falls to the ground through canopy spaces and drip as being precipitation that leaks at the edge of canopy.
We conclude that within hydrology and among communities there is great confusion regarding the terminology used, including the variables and their definitions. Therefore, we need to communicate, interact, and engage to clarify the meaning of the words and processes, and to facilitate easier communication, understanding, and analysis. 735

Challenges in global hydrological modeling
Simulating the terrestrial water cycle on the global scale involves many challenges. These various challenges were identified by reviewing articles published by the climate, global hydrological, and vegetation communities (Table 11). The challenges have been classified according to the 23 unsolved problems in hydrology (UPH), identified by Blöschl et al., 2019, to harmonize the efforts of the global and catchment hydrological communities. These challenges can generally be overcome 740 through the development of new datasets, innovative and creative collaboration among communities, and investment in technical infrastructure.

Recommendations for multi-model intercomparison projects
Multi-model intercomparison projects (MIPs) are based on communication and collaboration. Ideally, through collaboration, communities will fill in existing knowledge gaps, improve the quality of the input data and the processes in the models, and 745 implement the missing processes in the models.
Wagener et al., 2020 well described the hydrological knowledge gaps as hydrologic lions, similar to the knowledge gaps of medieval maps represented as lions. They proposed focusing hydrological research on openly shared perceptual models, inclusion of metadata for each hydrologic study (e.g., location and time period covered by a study), and effective knowledge accumulation. In addition to these statements, we also propose focusing on effective collaboration that starts with effective 750 wish lists, including specific research questions, goals to answer these questions, methods to achieve the goals, datasets to be used, and tasks to be done.
We encourage communities to write and convey clear, simple, and understandable texts for large audiences. We consider that simplicity improves communication, and communication starts with a common language, the same words having the same meaning for the sender and the receiver. Theoretically this is possible, but in practice, there are some discrepancies among 755 scientists (highlighted in subsection 2.2), as well as between scientists and stakeholders, as revealed by Sultan et al. (2020).
They underlined that scientists and stakeholders use vocabulary differently in climate impact science.
The review of GWMs, presented in section 4, highlights the need to design hydrological inter-model comparison studies by nominating models or research questions according to some specific criteria, for example, (i) specific model compartments (Nazemi & Wheater, 2015;Wada et al., 2017), (ii) specific evaluation metrics (Gupta et al., 2009;Veldkamp et al., 2018;760 https://doi.org/10.5194/gmd-2020-367 Preprint. Discussion started: 8 January 2021 c Author(s) 2021. CC BY 4.0 License. Zaherpour et al., 2018), (iii) locations of specific hydrological indicators, regions, or rivers (Masaki et al., 2017;Veldkamp et al., 2018). These studies still emphasized the need to improve the quality of the input data, upgrade the hydrological processes in the models' structure, integrate missing hydrological processes, and further reduce uncertainties.
We recommend, for the benefit of the multi-model intercomparison projects, to 1. maintain very good documentation of the model code; 2. always start research with a list, for example, with water storage compartments, flows, and human water use 765 sectors included in the model structures; 3. have clear definitions of the variables, water storage compartments, flows, and human use sectors, describing exactly their role in the model; 4. have synonyms for variables, helping to show similarities and differences among models; 5. collect all ideas, recommendations, and improvements received from everyone (in our case, they were required to complete our study); 6. collaborate and communicate with peers, which was very useful in our study for identifying synonyms among communities; 7. describe your model or a model through your eyes and other's people eyes; 8. 770 invest much time and patience and be meticulous about extracting equations of water storage, flow, and human water use sectors from the model code.
Our future research will include describing the GWMs analyzed in this study, through a standard visualization of the water cycle that will show the water storage compartments, water flows, and human water use sectors included in the ISIMIP2b model structures. These diagrams would be connected with the tables presented in the supplement of the present paper (Tables 775 S1-S83). We affirm that model intercomparison projects need to organize workshops on model parameterization including some parameterization experiments and some evaluation studies on the equations applied to compute water storage compartments, water flows, and human water use sectors, as well as considering model outputs. We also note that this review and description study had a positive impact on the modeling groups, motivating them to re-think and re-analyze model structures, equations, 780 and descriptions.

Potential future research in global hydrological modeling
In this study, we analyzed the GWMs version used for ISIMIP2b. Hence, in their original structure, these models include some water stocks, water flows, and human water use sectors that have not been used for ISIMIP2b. In this section, we present potential future research for the 16 analyzed modeling groups (Tables S102 and S103). 785 Some GWMs, such as gridded models, have the ability to operate at various spatial-temporal scales: CWatM, CLM4.5, CLM5.0 (3 h time step at around 11 km).
Numerous developments are ongoing within the CLM team and can be followed on the model's GitHub page (https://github.com/ESCOMP/CTSM). Active developments include the improvement of the irrigation scheme , the representation of land cover and land management (Meier et al., 2018;Hirsch et al., 2017;, and the 790 implementation of reservoirs (Hauser et al., 2019).
CWatM developed, in addition ISIMIP2b, a groundwater scheme with linkages to MODFLOW for 5 arcmin and 30 arcsec spatial resolution. The CWatM modeling group plans to develop a reservoir storage including different operation schemes https://doi.org/10.5194/gmd-2020-367 Preprint. Discussion started: 8 January 2021 c Author(s) 2021. CC BY 4.0 License.
(e.g., energy, irrigation), to increase the temporal resolution (at 1 h), to apply a global calibration also for ungauged catchments, such as using the Budyko framework (Greve et al. 2020), applying both the day-degree method and energy balance method to 795 estimate snow accumulation and melt, and applying several methods to estimate evaporation based on changing CO2 concentration.
DBH plans to include human water uses (industrial and domestic sectors), either by developing a new module or using the simulations from other models (e.g., WFaS dataset), to calibrate the model in the new ISIMIP3 simulation round, and to improve the input/output module to read and write netcdf files. 800 The H08 modeling team used an approximate Bayesian computation technique to calibrate four parameters that are transferred to other regions containing no observations, mainly, based on Köppen-Geiger regions. The modeling group also increased the spatial resolution to 5 min and improved the representation of crops used for biofuel in the model.
The JULES-W1 modeling group plans to make a technical update that will enable the river routing module to estimate discharge. 805 The LPJmL group developed an improved energy balance module and soil hydrological scheme that can estimate permafrost dynamics (Schaphoff et al., 2013) and made the model source code freely available on GitHub (https://github.com/PIK-LPJmL/LPJmL; Schaphoff et al., 2018), hoping to engage a broader scientific community in LPJmL model development and applications.
The Mac-PDM.20 modeling group plans to develop a water use module. 810 MATSIRO modeling group has implemented a land-use change process, terrestrial biogeochemical processes, and an additional crop growth process into MATSIRO to develop a new modeling framework. As key interactions are taken into account and all processes are coupled, important boundary conditions for hydrological simulations can be dynamically simulated internally. This hydrological simulation modeling framework has been coupled with MIROC GCM and has been used as an Earth system model. In addition, the group recently proposed new schemes for lateral groundwater flow, water 815 temperature, and sediment transportation.
Ongoing efforts to improve the realism of hydrologic processes in the mHM include the development of the multiscale lake module (mLM), a comprehensible framework for reservoir regulation as well as natural processes in lakes. Near-future developments will focus on a glacial module, to better account for processes in cold regions, as well as coupling it to a groundwater model that will replace the current linear groundwater reservoir. 820 The MPI-HM modeling group plans to increase the spatial resolution of regional versions. The group currently implemented canopy storage in the latest model version and is developing experiments to integrate reservoir storage.
The PCR-GLOBWB modeling group plans to increase the temporal and spatial resolution of the input data, to increase the 825 temporal resolution (3 h) for energy balance calculations and the global spatial resolution (1 km), to improve the soil https://doi.org/10.5194/gmd-2020-367 Preprint. Discussion started: 8 January 2021 c Author(s) 2021. CC BY 4.0 License.
representation by including the Richards equation, to add more snow elevation layers, to include additional fast runoff component for improving daily discharge simulations, and to improve the reservoir operating scheme (Sutanudjaja et al., 2018).
The VIC modeling group developed different irrigation practices (Shah et al., 2019a and b) and included a reservoir (Dang et al., 2019 and 2020) as well as a groundwater scheme in the model structure. 830 The WaterGAP2 modeling group plans to update the GRanD dataset used by the model, to include water temperature calculations, to couple the new developed groundwater model (Reinecke et al., 2019), and to update the non-irrigation water use datasets.
The WAYS modeling group plans to develop a new human water use module to consider agricultural, industrial, and domestic water use in the water cycle. 835

Conclusions
Global water models are used to simulate the climate-water-human system. However, recent evaluation studies show that there is a need to better simulate this system by including other hydrological processes, data on physical infrastructure, societal behavior, cultural behavior, water diversions, and virtual water, as well as by identifying its teleconnections on the global scale (Zaherpour et al., 2018;Veldkamp et al., 2018;Wada et al., 2017). These studies also underline the need to better explain 840 various model results.
We undertook the present study mainly to find similarities and differences among global water models that will facilitate interpretation of various results, as well as those of further intercomparison studies. We developed a standard equation writing style to achieve this goal. We found that there are some similarities among the models when applying similar equations for the Three models (H08, WaterGAP2, and WAYS) compute groundwater recharge using the same approach described by Döll and Fiedler, 2008. Four models (CWatM, PCR-GLOBWB, WaterGAP2, and WAYS) use the same approach to simulate the groundwater runoff by considering it as a part of groundwater storage which is weighted by a groundwater discharge 860 coefficient.
Ten models do not include a lake compartment, while fourteen models do not include a wetland compartment. Four models (CWatM, H08, MATSIRO, and WaterGAP2) include a "global lakes" compartment and "local lakes" compartment in their structure. Furthermore, CWatM, MATSIRO, and WaterGAP2 use the reservoir algorithm developed by H08 modeling group, but with some adjustments. WaterGAP2 also includes a "global wetlands compartment" and "local wetlands compartment". 865 We highlight the need to undertake experiments on individual water compartments in order to analyze the equations used and the results obtained. We also underline the need to make multi-model intercomparison projects: firstly, because they enhance 880 collaboration and communication between modeling groups, communities, countries and cultures; secondly, through communication and collaboration, these projects enhance creativity and open opportunities to finding new ways to improve the models. The present study was possible through the international ISIMIP framework.
By improving these models, we can make better decisions on how to use water more efficiently. Therefore, we need to have global water models that develop scenarios regarding how the Earth's water cycle will develop and how it will be affected by 885 climate change. We consider the simulations provided by the ISIMIP2b global water models to represent good hypotheses of our water future and based on them we can make decisions.

Code availability
Information on the availability of source code for the models featured in this article can be found in the Table 12.