Development of the Community Water Model (CWatM v1.04) - a high-resolution hydrological model for global and regional assessment of integrated water resources management

Abstract. We develop a new large-scale hydrological and water resources model, the
Community Water Model (CWatM), which can simulate hydrology both globally
and regionally at different resolutions from 30 arcmin to 30 arcsec at
daily time steps. CWatM is open source in the Python programming environment
and has a modular structure. It uses global, freely available data in the
netCDF4 file format for reading, storage, and production of data in a
compact way. CWatM includes general surface and groundwater hydrological
processes but also takes into account human activities, such as water use
and reservoir regulation, by calculating water demands, water use, and
return flows. Reservoirs and lakes are included in the model scheme. CWatM
is used in the framework of the Inter-Sectoral Impact Model Intercomparison
Project (ISIMIP), which compares global model outputs. The flexible model
structure allows for dynamic interaction with hydro-economic and water quality
models for the assessment and evaluation of water management options.
Furthermore, the novelty of CWatM is its combination of state-of-the-art
hydrological modeling, modular programming, an online user manual and
automatic source code documentation, global and regional assessments at
different spatial resolutions, and a potential community to add to, change,
and expand the open-source project. CWatM also strives to build a community
learning environment in which users can freely use an open-source hydrological
model and its flexible coupling possibilities to other sectoral models, such as
energy and agriculture.



Mask map
CWatM can run globally at 0.5° (30', ≈ 50 km × 50 km at the equator) or at 5' (≈ 10 km × 10 km). Alternatively, a set of coordinates or a mask map can be defined so that the global datasets are used while CWatM runs only for a region. Figure S1.1 gives an example of a mask map for the Rhine basin. Each red grid cell (here: 5') is included in the calculation; all other grid cells are ignored.

Figure S1.1: Mask map for the Rhine basin at 5' with background from © OpenStreetMap contributors 2019.
Distributed under a Creative Commons BY-SA License.
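The mask-map idea can be illustrated with a minimal sketch. It assumes a global north-up grid whose upper-left corner lies at 180° W, 90° N, and it stores only the active cells of a field as a 1-D array. The helper names `cell_index`, `compress` and `decompress` are hypothetical and are not CWatM's own functions.

```python
import numpy as np

# Hypothetical helpers illustrating a mask map; not CWatM's own functions.

def cell_index(lon, lat, cell_size_deg, lon_min=-180.0, lat_max=90.0):
    """Row/column of the cell containing (lon, lat) on a global north-up grid
    whose upper-left corner lies at (lon_min, lat_max)."""
    col = int(np.floor((lon - lon_min) / cell_size_deg))
    row = int(np.floor((lat_max - lat) / cell_size_deg))
    return row, col


def compress(global_field, mask):
    """Keep only the active (mask == True) cells as a 1-D array."""
    return global_field[mask]


def decompress(values, mask, fill=np.nan):
    """Write 1-D values of the active cells back onto the full 2-D grid."""
    out = np.full(mask.shape, fill, dtype=float)
    out[mask] = values
    return out


# A 0.5° global grid has 360 x 720 cells; a 5' grid has 2160 x 4320.
mask = np.zeros((360, 720), dtype=bool)
row, col = cell_index(7.6, 47.5, 0.5)   # a point in the Rhine basin
mask[row, col] = True
field = np.random.rand(360, 720)
active = compress(field, mask)           # one value per active cell
restored = decompress(active, mask)      # NaN outside the mask
```

Only the active cells need to be processed in each time step, which is why running on a mask is much cheaper than running the full global grid.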

Digital elevation model and river channel network
The model uses a digital elevation model and its derivatives (e.g. standard deviation, slope) as variables for the snow processes and for the routing of surface runoff. The Shuttle Radar Topography Mission (SRTM) is used for latitudes […] (Wu et al., 2011) and CaMa-Flood (Yamazaki et al., 2009). These approaches use the same hydrologically sound digital elevation model but differ in their upscaling methods. Zhao et al. (2017) show the importance of routing schemes and river networks for simulating peak discharge. In CWatM, DDM30 is used at 0.5° and DRT at 5'. To calculate the kinematic wave, CWatM needs static maps of channel width, channel bankfull depth, channel gradient, Manning's coefficient and channel length, which are calculated from the river network, the flow accumulation, elevation data and average river discharge data. Figure S1.2 illustrates the channel gradient at 5' for two different regions.

Figure S1.2: Channel gradient at 5' in percentage.
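As a sketch of how these static channel maps feed a kinematic wave, the code below assumes a rectangular channel with the wetted perimeter taken at bankfull depth and uses Manning's equation, Q = (1/n) A R^(2/3) S^(1/2). Solving it for the wetted cross-section gives A = alpha Q^beta with beta = 0.6. The function names, the bankfull-perimeter approximation and the minimum-gradient floor are assumptions for illustration, not CWatM's exact implementation.

```python
import math


def channel_gradient(z_upstream, z_downstream, channel_length, min_gradient=1e-4):
    """Bed slope (m/m) from the elevation drop over the channel length.
    The floor for flat or inverted reaches is an assumed value."""
    return max((z_upstream - z_downstream) / channel_length, min_gradient)


def kinematic_alpha(manning_n, width, bankfull_depth, gradient, beta=0.6):
    """alpha in A = alpha * Q**beta for a rectangular channel whose wetted
    perimeter is approximated at bankfull depth (assumption)."""
    perimeter = width + 2.0 * bankfull_depth
    return (manning_n * perimeter ** (2.0 / 3.0) / math.sqrt(gradient)) ** beta


def manning_discharge(area, perimeter, manning_n, gradient):
    """Manning's equation with hydraulic radius R = A / P."""
    radius = area / perimeter
    return area * radius ** (2.0 / 3.0) * math.sqrt(gradient) / manning_n


slope = channel_gradient(120.0, 100.0, 10000.0)          # 0.002 m/m
alpha = kinematic_alpha(0.04, 50.0, 3.0, slope)
area = alpha * 200.0 ** 0.6                               # cross-section for Q = 200 m3/s
q_back = manning_discharge(area, 50.0 + 2 * 3.0, 0.04, slope)  # recovers Q
```

Because alpha depends only on static maps, it can be computed once at initialisation, and only Q changes from one time step to the next.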

Lakes and reservoirs
The HydroLakes database (Lehner et al., 2011; Messager et al., 2016) provides 1.4 million global lakes and reservoirs with a surface area of at least 10 ha. CWatM differentiates between big 'global' lakes and reservoirs, which are connected to the river network, and smaller 'local' lakes and reservoirs, which lie within a single grid cell and are part of the runoff concentration within that cell. Therefore, the HydroLakes database is split into "big" lakes and reservoirs, with an area ≥ 100 km² at 0.5° (≥ 5 km² at 5') or an upstream area ≥ 5000 km² at 0.5° (≥ 200 km² at 5'), and "small" lakes, which comprise all remaining lakes. All lakes and reservoirs are set up at grid-cell level, but big lakes can extend over several grid cells.
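The big/small split can be written as a simple rule using the thresholds quoted above. A water body counts as "big" if either its area or its upstream area exceeds the threshold for the chosen resolution. The record layout `(id, area_km2, upstream_km2)` and the resolution labels are hypothetical.

```python
# Thresholds from the text; the dictionary keys are hypothetical labels.
THRESHOLDS_KM2 = {
    "0.5deg": {"area": 100.0, "upstream": 5000.0},
    "5min": {"area": 5.0, "upstream": 200.0},
}


def is_big_lake(area_km2, upstream_km2, resolution):
    """True if a HydroLakes water body is treated as a 'big' (routed) lake or reservoir."""
    t = THRESHOLDS_KM2[resolution]
    return area_km2 >= t["area"] or upstream_km2 >= t["upstream"]


def split_lakes(lakes, resolution):
    """Split (id, area_km2, upstream_km2) records into big and small lake ids."""
    big, small = [], []
    for lake_id, area, upstream in lakes:
        (big if is_big_lake(area, upstream, resolution) else small).append(lake_id)
    return big, small


lakes = [(1, 120.0, 800.0), (2, 8.0, 150.0), (3, 0.5, 300.0), (4, 1.2, 40.0)]
big_05, small_05 = split_lakes(lakes, "0.5deg")   # big: [1]
big_5m, small_5m = split_lakes(lakes, "5min")     # big: [1, 2, 3]
```

The same lake can be "small" at 0.5° and "big" at 5', because the finer grid resolves smaller water bodies explicitly in the river network.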

Soil data
Soil data come from the Harmonized World Soil Database 1. A pedotransfer function from Zhang et al. (2017) is used to convert the standard soil properties (soil texture, porosity, organic matter and bulk density) into the van Genuchten model parameters: the maximum (saturated) moisture content, the residual moisture content, the pore-size index, the saturated hydraulic conductivity of the soil (see Figure S1.3), and the inverse of the air-entry suction.
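To show how the van Genuchten parameters listed above are used, the sketch below evaluates the water retention curve θ(h) = θr + (θs − θr) [1 + (α h)^n]^(−m). It assumes that n is derived from the pore-size index λ as n = λ + 1 and that m = 1 − 1/n. This mapping is an assumption made for illustration. Suction head h and 1/α must share the same length unit.

```python
import numpy as np


def van_genuchten_theta(h, theta_s, theta_r, alpha, lam):
    """Volumetric water content at suction head h >= 0.
    theta_s: saturated (maximum) moisture, theta_r: residual moisture,
    alpha: inverse air-entry suction, lam: pore-size index."""
    n = lam + 1.0            # assumption: n derived from the pore-size index
    m = 1.0 - 1.0 / n
    h = np.asarray(h, dtype=float)
    se = (1.0 + (alpha * h) ** n) ** (-m)   # effective saturation in (0, 1]
    return theta_r + (theta_s - theta_r) * se


# Suction heads in cm with alpha in 1/cm (illustrative loam-like values).
theta = van_genuchten_theta([0.0, 10.0, 100.0, 1000.0, 1e6], 0.45, 0.05, 0.02, 0.4)
```

The curve starts at the saturated content for h = 0 and decreases towards the residual content as suction increases.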

Groundwater
For groundwater modeling, maps of the recession constant, the hydraulic conductivity and the storage coefficient are needed. Gleeson et al. (2011) and the GLHYMPS database (Global Hydrogeology Maps of permeability and porosity; Gleeson et al., 2014) provide the necessary data. Figure S1.4 shows the global recession constant.
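The role of a recession constant can be illustrated with a minimal linear-reservoir sketch. It assumes a daily time step and treats baseflow as a fixed fraction k of the current groundwater storage. This is a common simplification chosen for illustration, not a description of CWatM's exact groundwater equations, and `groundwater_step` is a hypothetical name.

```python
def groundwater_step(storage, recharge, k):
    """One daily step of a linear groundwater reservoir.
    storage and recharge in mm (mm/day); k is the recession constant in 1/day, 0 < k <= 1."""
    if not 0.0 < k <= 1.0:
        raise ValueError("recession constant must be in (0, 1] per day")
    baseflow = k * storage                  # outflow proportional to storage
    storage = storage + recharge - baseflow
    return storage, baseflow


storage, baseflow = 0.0, 0.0
for _ in range(500):                        # constant recharge of 2 mm/day
    storage, baseflow = groundwater_step(storage, 2.0, 0.1)
# storage approaches recharge / k = 20 mm and baseflow approaches 2 mm/day
```

A small k yields a slowly draining aquifer with damped baseflow, while a large k lets recharge pass to the river almost immediately.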

Land cover
Land cover is used to calculate, for each cell, the fractions of water, forest, irrigated area, irrigated rice (paddy) area, sealed (impermeable) area and the remaining area. The soil module runs separately for each fraction, and the total runoff of a cell is calculated as the area-weighted sum of the runoff of its fractions. Data on urban or impervious area are based on the 1 km version of Elvidge et al. (2007); high-resolution forest cover data are provided by Hansen et al. (2013). Global maps of irrigated areas are taken from P. […] and Siebert et al. (2005), with global crop coefficient data based on MIRCA2000, the global data set of monthly irrigated and rainfed crop areas around the year 2000 (Portmann et al., 2010). A historical gridded land-use data set at 5' with 15 different crop groups is taken from the HYDE 3.2 database (Klein Goldewijk et al., 2017).
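The area weighting of runoff over land cover fractions can be sketched as a dot product. The order of the fractions and the example values are hypothetical, and the fractions of a cell are assumed to sum to 1.

```python
import numpy as np


def cell_runoff(fraction_runoff, fractions):
    """Area-weighted runoff of one grid cell from the runoff of each land cover
    fraction (e.g. mm/day). Fractions must sum to 1."""
    fr = np.asarray(fractions, dtype=float)
    q = np.asarray(fraction_runoff, dtype=float)
    if not np.isclose(fr.sum(), 1.0):
        raise ValueError("land cover fractions must sum to 1")
    return float(np.dot(fr, q))


# Hypothetical order: forest, other, irrigated, paddy, sealed, water
fractions = [0.3, 0.4, 0.1, 0.05, 0.1, 0.05]
runoff = [1.0, 2.0, 0.5, 0.2, 8.0, 0.0]     # mm/day per fraction
total = cell_runoff(runoff, fractions)      # 1.96 mm/day
```

The sealed fraction covers only 10 % of the cell here, yet it contributes about 40 % of the runoff. This is why the fractions are simulated separately rather than averaged before running the soil module.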

Global Albedo
A global albedo dataset, which is used to calculate potential evaporation, is provided by Muller et al. (2012).
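Albedo enters potential evaporation through the net shortwave radiation, Rns = (1 − albedo) Rs. The sketch below shows where it appears in a Penman-Monteith calculation. It uses the FAO-56 grass-reference form as a stand-in, with the default albedo of 0.23 for the reference surface. CWatM's actual formulation may differ. The net outgoing longwave radiation `rnl` is passed in directly rather than computed, as a simplifying assumption.

```python
import math


def et0_penman_monteith(t_mean, rs, rnl, u2, ea, albedo=0.23, pressure=101.3, g=0.0):
    """FAO-56 style reference evapotranspiration in mm/day.
    t_mean: air temperature (°C); rs: incoming shortwave (MJ m-2 day-1);
    rnl: net outgoing longwave (MJ m-2 day-1); u2: wind speed at 2 m (m/s);
    ea: actual vapour pressure (kPa); pressure in kPa; g: soil heat flux."""
    rns = (1.0 - albedo) * rs                            # albedo enters here
    rn = rns - rnl                                       # net radiation
    es = 0.6108 * math.exp(17.27 * t_mean / (t_mean + 237.3))  # saturation vapour pressure
    delta = 4098.0 * es / (t_mean + 237.3) ** 2          # slope of the es curve
    gamma = 0.665e-3 * pressure                          # psychrometric constant
    num = 0.408 * delta * (rn - g) + gamma * 900.0 / (t_mean + 273.0) * u2 * (es - ea)
    den = delta + gamma * (1.0 + 0.34 * u2)
    return num / den


et0 = et0_penman_monteith(20.0, 20.0, 4.0, 2.0, 1.4)
et0_bright = et0_penman_monteith(20.0, 20.0, 4.0, 2.0, 1.4, albedo=0.5)
```

A brighter surface reflects more shortwave radiation, so with all other inputs unchanged `et0_bright` is lower than `et0`.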

Data sets of meteorological forcing
For calculating potential evaporation with Penman-Monteith, several meteorological forcing variables are needed: […]. For downscaling 0.5° meteorological data to 5' or even to 30'', the WorldClim version 2 dataset (Fick et al., 2017) is used.
It provides average monthly data for different meteorological variables for 1970-2000 at spatial resolutions from 30'' to 10'. The dataset is used to estimate the spatial heterogeneity inside a 0.5° or 5' grid cell.
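One way a high-resolution climatology can provide sub-grid heterogeneity is a delta-change approach. Each fine cell receives the coarse value plus (or multiplied by) its anomaly from the climatology's mean within the coarse cell. The sketch below assumes this scheme, with additive anomalies for temperature-like variables and multiplicative anomalies for precipitation-like variables. It is an illustration only; `downscale_delta` is a hypothetical name, not CWatM's documented method. Going from 0.5° to 5' corresponds to a factor of 6.

```python
import numpy as np


def downscale_delta(coarse, fine_clim, factor, multiplicative=False):
    """Distribute coarse-grid values onto a grid `factor` times finer, using the
    sub-grid pattern of a high-resolution climatology (delta-change method)."""
    coarse = np.asarray(coarse, dtype=float)
    fine_clim = np.asarray(fine_clim, dtype=float)
    ny, nx = coarse.shape
    # Mean of the fine climatology inside each coarse cell.
    clim_mean = fine_clim.reshape(ny, factor, nx, factor).mean(axis=(1, 3))

    def upsample(a):
        return np.repeat(np.repeat(a, factor, axis=0), factor, axis=1)

    coarse_up, mean_up = upsample(coarse), upsample(clim_mean)
    if multiplicative:
        # Ratio anomaly (e.g. precipitation); cells with zero mean keep the coarse value.
        ratio = np.divide(fine_clim, mean_up, out=np.ones_like(fine_clim), where=mean_up > 0)
        return coarse_up * ratio
    # Additive anomaly (e.g. temperature).
    return coarse_up + (fine_clim - mean_up)


clim_t = np.linspace(8.0, 16.0, 36).reshape(6, 6)        # monthly climatology, °C
fine_t = downscale_delta(np.array([[12.0]]), clim_t, 6)
clim_p = np.arange(1.0, 37.0).reshape(6, 6)              # monthly climatology, mm
fine_p = downscale_delta(np.array([[3.0]]), clim_p, 6, multiplicative=True)
```

Both variants preserve the coarse-cell mean, so the downscaled field adds spatial detail without changing the coarse water or energy balance.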

Datasets for model calibration and validation
Observed historical daily river discharge data and lake water levels were obtained from several sources. Data made available through the GRDC (Global Runoff Data Centre, 2007) were used, and data for several discharge stations were obtained through bilateral exchanges between IIASA and national hydrological services. As far as possible, these historical discharge data were used for model calibration and verification.

Socio economic datasets
Data on population and gross domestic product (GDP) are based on the SSP Database of IIASA (Riahi et al., 2017). Methods and data for the spatial disaggregation of country-based data to gridded 5' data are taken from Jones et al. (2016). Gridded industrial water data for 2000 are based on Shiklomanov (1997). The approaches of Shen et al. (2008) and Wada et al. (2011) include GDP, electricity production, energy consumption and household consumption. Domestic water demand requires population data and the rate of domestic water withdrawal per capita from FAO (2007) and Gleick et al. (2009).

The Kling-Gupta Efficiency (KGE) is used as the objective function. Other model performance metrics are summarized in the table, including NS (Nash-Sutcliffe coefficient of efficiency) and its log form (NSlog), R² (coefficient of determination), Bias, RMSE (root mean squared error) and MAE (mean absolute error). Observed and simulated streamflow statistics are shown, including mean, minimum, maximum and different quantiles (5%, 50%, 95%, 99%). Some basins have no validation period; for these, all available observation data were used for calibration.

Figure S4.1 shows the modules of CWatM and how they are interlinked. The model starts with cwatm3 and initiates the class dynamic model, which has two children, cwatm_initial and cwatm_dynamic. The role of cwatm_initial is to initialize all hydrological modules, while cwatm_dynamic runs the model through the time steps. Each hydrological process group (e.g. soil) has its own module. Support modules for data handling, output generation, error handling, etc. are separate modules and are called by the hydrological modules, e.g. for reading input data.
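The Kling-Gupta Efficiency, used above as the calibration objective, and the Nash-Sutcliffe metrics can be sketched as follows. This uses the original KGE decomposition into correlation r, variability ratio alpha (standard deviations) and bias ratio beta (means). Which KGE variant CWatM uses is not stated here, and the small offset in the log form is an assumed value that avoids log(0).

```python
import numpy as np


def kge(sim, obs):
    """Kling-Gupta Efficiency from correlation r, variability ratio alpha and bias ratio beta."""
    sim, obs = np.asarray(sim, dtype=float), np.asarray(obs, dtype=float)
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


def nse(sim, obs):
    """Nash-Sutcliffe efficiency (NS)."""
    sim, obs = np.asarray(sim, dtype=float), np.asarray(obs, dtype=float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)


def nse_log(sim, obs, eps=1e-6):
    """NS on log-transformed flows (NSlog); eps is an assumed offset to avoid log(0)."""
    return nse(np.log(np.asarray(sim, dtype=float) + eps),
               np.log(np.asarray(obs, dtype=float) + eps))


obs = np.array([12.0, 30.0, 55.0, 41.0, 20.0, 15.0])   # m3/s, illustrative
sim = np.array([10.0, 28.0, 60.0, 38.0, 22.0, 14.0])
scores = {"KGE": kge(sim, obs), "NS": nse(sim, obs), "NSlog": nse_log(sim, obs)}
```

A perfect simulation gives KGE = NS = 1. Because KGE separates timing (r), variability (alpha) and volume (beta) errors, it is harder to reach a high score by matching only the flood peaks than it is with NS.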