HydroPy (v1.0): A new global hydrology model written in Python

Global hydrological models (GHMs) are a useful tool in the assessment of the land surface water balance. They are used to further the understanding of interactions between water balance components as well as their past evolution and potential future development under various scenarios. While GHMs are a part of the Hydrologist’s toolbox since several decades, the models are continuously developed. In our study, we present the HydroPy model, a revised version of an established GHM, the Max-Planck Institute for Meteorology’s Hydrology Model (MPI-HM). Being rewritten in Python, the new model requires 5 much less effort in maintenance and due to its flexible infrastructure, new processes can be easily implemented. Besides providing a thorough documentation of the processes currently implemented in HydroPy, we demonstrate the skill of the model in simulating the land surface water balance. We find that evapotranspiration is reproduced realistically for the majority of the land surface but is underestimated in the tropics. The simulated river discharge correlates well with observations. Biases are evident for the annual accumulated discharge, however they can – at least to some part – be attributed to discrepancies 10 between the meteorological model forcing data and the observations. Finally, we show that HydroPy performs very similar to MPI-HM and, thus, conclude the successful transition from MPI-HM to HydroPy. Copyright statement. TEXT

effort for compiling the required input data prior to model application. Next, the model loops over the time axis provided by the meteorological forcing data. In the beginning of the loop, surface temperature (T srf ), precipitation (P ) and potential evapotranspiration (PET) are read and the latter two are converted into mass flux per time step. Next, all land cover type 60 (LCT) fractions are updated based on the actual date. This is followed by computing water fluxes for the various processes and updating the respective water storages. Figure 1 displays an overview about the water storages included in HydroPy and their connecting water flows. For most storages, outgoing water fluxes are computed based on the storage state at the start of the actual time step followed by updating the storage state using the actual fluxes. The order of storage updates is snow skin & canopy soil surface water & groundwater river flow and the related processes are presented in detail in the paragraphs 65 in section 2.2. At the end of every time step, all requested output variables are either written to a file (for time-step output) or added to an temporary field for time averaged output. Thus, no time series of variables need to be kept in memory, which strongly reduces the model's computational demands. All output variables are combined in one netCDF file per output time step resolution (daily, monthly, or yearly). Similar to the input, they are stored in SI units: [kg m −2 ] for water storages, [kg m −2 s −1 ] for water fluxes as well as [m 3 ] and [m 3 s −1 ] for the special cases of river storage and river discharge, respectively. 70 At the end of the time loop, screen output is created which informs about the overall water balance and any residuum in the sum of water sources, sinks and storage changes. In case the water balance residuum exceeds a given threshold, an additional netCDF output file is generated which provides fields containing the water balance components. This acts as an early warning system during model development while implementing new processes. The water balance residuum is diagnosed for every grid cell as one-dimensional water column and as water volume accumulated over the whole fields. Through the latter also water 75 balance errors caused by changes in LCT fractions can be spotted easily.
In case of errors, there is an option to output time series for single grid cells in which all variables are included as well as partial water balances for the individual water storages at model time step. This supports the identification of processes responsible for a water balance violation.

Model design and setup 80
The rational behind HydroPy is not so much to generate a highly efficient GHM, but rather a model which can be easily understood by scientists and extended with additional processes. Thus, we decided to separate functions based on their purpose (see Fig. 2). The file parameter_class.py contains default options and parameter settings together with variable definitions, the dataio_class.py file handles all input and output functions required by HydroPy, the file analysis_class.py includes functions related to spin-up, water balance checks and log data and the file utility_routines.py is a com-85 pilation of general routines like interpolation. All physical process representations are compiled in respective files describing their overall topic like hydrology, land cover and streamflow processes. This enables us to use the main routine to manage the simulation time loop and call the specific model processes in a clearly arranged manner. Furthermore, this file separation serves as a template for the implementation of extensions.
All variables representing physical properties are organised in a variable class that contains variable attributes and functions. periods and write output as well as restart data. At the start of the model one variable stream is created for all fluxes, states and land cover fractions, respectively. This allows to easily add new variables simply by adding some basic information to the respective definition function in the parameter file.
For the majority of hydrological processes, highly optimized NumPy arrays can be used because no information is exchanged 95 between grid cells. However, this is different for river routing. Here, lateral water transport is considered (see Sect. 2.2.5), which requires iterations across the field indices. Such loops are rather slow in an interpreted programming language like Python. For this reason, the river routing routine was written not only in Python, but also in Fortran. Optionally, the Fortran code can be compiled into a shared library during model run-time using the NumPy tool f2py and called directly from HydroPy. During model development, we made sure that identical results are returned from the Python and Fortran functions. Thus, the Python 100 version can be used during model development and debugging while the Fortran based river routing is recommended for production simulations due to its higher efficiency (see Fig. 3).
The model itself can be called via command line and includes a -help option to display calling parameters and further options. A file containing meteorological forcing data is required. Default configurations can be changed by either providing a json-style configuration file or by changing options directly via command line. A detailed technical documentation for model 105 and simulation setup is part of the model repository (Stacke and Hagemann, 2021b).
Although the MPI-HM followed a similar idea in terms of separating processes in different files, the structure of HydroPy is much cleaner and easier to extend and modify due to the use of variable objects. Also, the included diagnostics like spin-up state evaluation and water balance computation as well as the interactive debugging options facilitate the implementation of new processes. And finally, the simulation setup using a simple command line interface and a optional configuration file is a 110 considerable improvement compared to the rather extensive run-scripts required for MPI-HM simulations.

Hydrological processes
The following paragraphs describe all processes implemented in HydroPy in detail and all variables used are summarized in Tab 1. Generally, the processes are identical to the MPI-HM in terms of equations. A few exceptions from this rule exist, which are mentioned below. Furthermore, it should be noted that MPI-HM also includes a simple irrigation scheme which was used 115 to explore the effect of human impacts on river discharge (e.g Haddeland et al., 2014). This scheme is not yet implemented in the current version of HydroPy, but will become part of a model version dedicated to land management in a future release.

Snow processes
In the snow module, the snow storage S sn is updated based on snowfall P sn , snowmelt R sn and liquid snow storage content S snlq as: The involved water fluxes are computed as follows. Snowfall -if not provided directly in the forcing data -is diagnosed from precipitation as a function of surface temperature (Eq. 2) following Wigmosta et al. (1994). Snowmelt is calculated using the surface temperature T srf and day-length fraction f lday in a degree-day formular based on the HBV model (Bergström, 1992).
Naturally, snowmelt is limited by the available snow storage.

125
P sn = P · min 1, max 0, T sn,max − T srf T sn,max − T sn,min , with T sn,min = 272.05 K and T sn,max = 276.45 K (2) While MPI-HM used a sine function to estimate the daylength fraction based on the day of the year, this is replaced in HydroPy by explicitly computing the day-length based on the Earth's declination (Forsythe et al., 1995). This has the advantage that a reasonable snowmelt is computed also for the southern hemisphere. However, the effects are only visible for very few grid 130 cells due to the small extent of snow affected areas in the southern hemisphere.
Part of the snowmelt can be retained (F snlq ) as liquid snow storage content and is either subject to refreezing -if surface temperatures fall below the freezing point -or sustains snowmelt in the next time step. The liquid snow storage content is limited to a maximum of 6% of the total snow storage, which is a common value also used in other models (e.g. Wigmosta et al., 1994).

Skin and Canopy processes
A new addition to HydroPy is the implementation of a skin S skin and canopy storage S can , which enhances the evaporation 140 E for the grid cell fractions with the LCTs bare soil f bare and vegetated f veg , respectively. Both storages receive input from rainfall P ra and snowmelt R sn and provide overflowing water as throughfall towards the ground. Evaporation scales linearly with PET and available water, but is additionally limited by the maximum storage content S prc,max . Mechanistically, both storages are treated equally and are described using the index prc to denote the different skin und can processes: 145 The related fluxes are defined for the respective fraction of the grid cell and scaled to the overall grid cell during the final balancing. Thus, for all storage interactions its content needs to be normalized to the fraction f lct resulting in S prc,norm = Sprc f lct so that: E prc = PET · f wet · min 1, S prc,norm + P ra + R sn f wet · PET (9) While the processes for skin and canopy storage are parameterized similarly, they use specific maximum storage values: Both runoff fluxes are scaled by the respective LCT fractions and combined to provide the throughfall R tr that reaches the soil surface:

Soil processes
HydroPy employs a single-layer soil scheme in which the storage S so provides water for transpiration E T , bare soil evaporation E bs , and drainage R gr (used synonymously for groundwater recharge or subsurface runoff) towards a shallow groundwater storage. Furthermore, its content determines the surface runoff R srf . The soil storage does not represent the physical soil over 160 a given depth but only the water column stored in the soil. Thus, its maximum value is limited by the maximum soil water capacity S so,max . The soil moisture storage is defined as: Both, drainage and surface runoff, are implemented identically to these fluxes in MPI-HM using the same parameter values.
Drainage, following the formulation by Dümenil and Todini (1992), is defined such that it scales linearly with soil moisture 165 content up to a given threshold. Above, saturated flow is assumed resulting in accelerated downward percolation with exponential scaling: R gr,high = (R gr,max − R gr,min ) · ∆t · S so − S so,grmax S so,max − S so,grmax with R gr,exp = 1.5 Both are combined as: The soil moisture thresholds S so,grmin and S so,grmax are products of S so,max using the factors 0.05 and 0.9, respectively. The minimum and maximum drainage, R gr,min and R gr,max are set to 2.7e − 7 and 2.7e − 5 [kg m −2 s −1 ]. As both parameters are given as flux per second, they need to be multiplied with the model time step ∆t. All drainage related model parameters are taken from Roeckner et al. (1992).

175
The surface runoff R srf representation follows the Improved Arno Scheme (Hagemann and Gates, 2003). It assumes that S so,max is not homogeneous for a whole grid cell, but varies on sub-grid scale. Thus, parts of the grid cell, where the local storage capacity is low, can already generate surface runoff even though the cell average soil moisture state is still below its average maximum moisture holding capacity. Therefore, a fraction of R tr is converted into R srf as soon as the minimum soil moisture content is exceeded. This is realized by mapping S so onto the sub-grid soil moisture capacity distribution parameters 180 denoted by the index sg : where b is a shape parameter describing the distribution of sub-grid soil moisture capacities within a grid cell (see Sect. 3).
Overall R srf is computed as: where c 1 and c 2 are further scaling operations put into separate equations for convenience: The subgrid parameters S so,sg,max and S so,sg,min that determine the range of subgrid maximum water capacity variability are spatially distributed fields and their generation is discussed in Sect. 3.

190
As for runoff, transpiration E T and bare soil evaporation E bs are implemented based on the respective MPI-HM equations (which themselves are derived from Roeckner et al., 1996;Warrilow et al., 1986;Bauer et al., 1983) and use a linear scaling with soil moisture content. Even though they are attributed to specific LCT fractions, the vegetated f veg and bare soil f bare parts of the grid cell, they don't rely on fraction specific soil moisture storages but use the grid cell average for simplicity.
Still, it has to be considered that part of the evaporative demand on these fractions is already satisfied due to canopy and skin 195 evaporation: E bs = (PET − E skin ) · min 1, max 0, S so − f so,bs,low · S so,max (1 − f so,bs,low ) · S so,max , with f so,bs,low = 0.05 Again, the spatially distributed data fields S so,max and S so,wilt are part of the input dataset (Sect. 3).
Runoff and evaporation fluxes are visualized in Fig. 4 for a single grid cell. There, the threshold dependent change in the 200 scaling of individual fluxes in respect to specific thresholds can be easily recognized.

Surface water and shallow groundwater processes
The implementation of surface water has been modified compared to the formulation used in MPI-HM and being described in Stacke and Hagemann (2012). In MPI-HM the vertical land surface water balance module and the river routing module were strongly separated, each featuring storages which were restricted to be used within the respective module. The sole exception 205 was the surface water storage resulting from ponding water on the land surface, and, hence, used to represent small creeks, lakes and wetlands. This storage was part of the vertical land surface water balance but could also interact with the lateral river routing scheme. In HydroPy, we simplify the representation of lakes and wetlands and utilize storages that are already part of the routing scheme as will be explained in the next paragraphs.
The river routing scheme used in both, MPI-HM and HydroPy, is the Hydrological Discharge Model (HD-Model; Hagemann 210 and Dümenil, 1997;Hagemann et al., 2020). To represent river flow, it employs three storages: the overland flow storage receives the local grid cell's surface runoff, the baseflow storage receives the local drainage and the river flow storage receives inflow from upstream grid cells. All three storages provide water to the next downstream grid cell. The purpose of the overland and baseflow storages is to delay the flow of both runoff components before they enter the river. As such, they represent a major function of surface water (e.g. small lakes, wetlands and creeks smaller than the grid cell size) and shallow groundwater 215 -even though no effort was taken to explicitly represent any other hydrological functions.
In HydroPy, the former overland and baseflow storages are used as surface water S sw and groundwater storages S gw .
However, we like to emphasize that both storage representations are very limited and are not expected to simulate all relevant lake and groundwater processes. Neither circulation within lakes nor confined aquifers, fossil groundwater or lateral water movement within large scale aquifers are considered at this point. The surface water balance is adapted to allow for interception 220 of rainfall and snow melt as well as water loss due to evaporation over the surface fraction f sw while the shallow groundwater balance only considers groundwater recharge R gr and runoff R gw : Surface water evaporation is set to PET as long as there is enough water available over the surface water fraction. Surface water 225 and groundwater runoff are both computed as: The storage retention time LAG prc is computed based on the inner grid cell slope and grid cell size. More details can be found in Sect. 3.

River routing 230
River routing is realized based on formulations from the HD-Model, which was also part of the MPI-HM. It applies a river flow cascade with up to five intermediate storages (c max ), parametrizations for river flow retention times and uses prescribed routing directions. It is the only part of HydroPy that uses a volumetric unit [m 3 ] instead of water columns. Thus, runoff needs to be converted using the grid cell area A gc . The river flow storages S rf,n for the individual cascade members n are defined as: where the index up denotes the river flow Q rf from upstream grid cells. High river flow velocities combined with a daily model time step can lead to numerical instabilities at a given resolution in case water would travel more than one grid cell. For this 240 reason, the river flow cascade as well as the routing are encapsulated in a sub-step time loop which is called several times per model time step.

Land surface data
In HydroPy an effort was made to reduce the number of land surface datasets to a minimum. Thus, very specialized datasets like the river flow lag times are computed by HydroPy itself instead of requiring the model user to pre-process such data.

245
Of course, this comes with the disadvantage of a slight increase in computational demand, but as these data fields are only computed once per simulation, this is hardly noticeable. All land surface data fields are collected in one parameter file (Stacke and Hagemann, 2021a).

Primary data fields
All computations in HydroPy are restricted to land cells which are defined in the land sea mask as values > 0. All other fields 250 providing fractional information on LCTs are normalized with the land sea mask and therefore represent relative fractions. The LCTs considered comprise glaciers, permafrost, lakes, wetlands and vegetation. Glacier cells (Hagemann, 2002) are omitted from the simulation if the glaciers cover the full land surface fraction while permafrost (GEWEX ISLSCP Project, 2007) strongly reduces the soils capability to store water. The lake and wetland fractions are based on the Global Lake and Wetland Database (GLWD; Lehner and Döll, 2004). They are utilized to scale surface water evaporation, surface water runoff and 255 modify lateral flow velocities. The lake fractions are used without any modifications, but wetlands were restricted to the classes floodplains and peatlands for this study. Thus, we can best resemble the general wetland distribution used for the predecessor of HydroPy and facilitate a clean comparison between both (see Sect. 4.4).. The vegetated fraction is taken from the Land Surface Parameters Dataset 2 (LSP2; Hagemann, 2002) and included into the HydroPy parameter file as monthly mean climatology.
Another vegetation parameter dataset use by HydroPy is the monthly LAI climatology, which also originates from the LSP2 Three topographical variables are included in the parameter file, which are all based on the ETOPO1 dataset (Amante and Eakins, 2009). These variables are the grid cell mean surface orography and its standard deviation as well as the average slope within a grid cell. The slope was initially computed at the native dataset resolution (1 ≈ 0.01°). Mean slope, mean orography 265 and its standard deviation were determined during upscaling from the original to the target resolution of 0.5°. Both, mean orography and slope, are used to derive lateral water flow velocities in surface water, groundwater and rivers. The orographical standard deviation plays a role in determining the surface runoff.
The majority of the remaining parameters are related to soil properties. Maximum soil moisture capacity and plant available water are computed following the procedure described in Hagemann et al. (1999), but with a different dataset as source for 270 plant available water. Values for plant available water are derived via a multi-linear regression from a dataset of optimized plant rooting depth (Kleidon, 2004) and high resolution land cover data (Loveland et al., 2000). The same data was also used in the MPI-ESM, the Earth system model of the Max-Planck Institute for Meteorology Hagemann and Stacke (2015) . While the maximum soil moisture capacity S so,max affects all soil processes, the plant available water S so,wava is needed to calculate transpiration. Further properties are the sub-grid minimum and maximum soil moisture capacity as well as an exponent b sg 275 that describes the distribution of sub-grid soil moisture capacities within every model grid cell. All three of these values are discussed in Hagemann and Gates (2003) and are needed to compute surface runoff.
The final two parameters describe the river routing network using routing target indices. There is one field for each geographical dimension, i.e. latitude and longitude, and each grid cell is assigned with a flow target in the respective direction, which must be one of the eight neighboring grid cells. Note, that the target is provided as grid cell index, not as geographical 280 coordinate. As the indices follow Python notation, they start at 0. For cells without any valid target flow target, its own indices are used. The directions prescribed with the routing target indices for HydroPy are identical to the one used by MPI-HM. After their first publication in Hagemann and Dümenil (1997), they were continuously improved during the last years.

Derived data fields
Not all land surface variables are used directly in the simulated processes. Instead, they are (also) used to derive other variables.

285
This is done during model initialization and will be discussed in the following.
The impact of permafrost on the simulation is represented purely from a hydrological view point. All soil moisture capacity variables are reduced to reflect that only a small part of the soil column is not frozen and may actively contribute to the water balance. The reduction factor f pe,red is computed based on the permafrost cover fraction f pe as and then used to scale all soil moisture capacity fields: S = S · f pe,red with S = {S so,max , S so,wilt , S so,sg,min , S so,sg,max } Next, the shape parameter b describing the sub-grid distribution of soil moisture capacities (see Eqs. 15,19,and 20) is derived based on the distribution parameter b sg and modified with the normalized orographical standard deviation b oro : where σ 0 = 100 m and σ max = 1000 m (32) where σ h is the standard deviation of subgrid orography while the parameter σ 0 and σ max need to be modified according to the models spatial resolution.
Finally, the retention times LAG are determined, which are used in the simulation of surface water and shallow groundwater flow (Eq. 25) as well as river flow (Eq. 27). In all three cases, reference values obtained by sensitivity experiments (Table   300 3, Hagemann and Dümenil, 1997) for a representative grid cell are modified using the actual properties of grid cells for the remaining part of the land surface -specifically a reference lag time, the flow distance ∆l and the flow velocity v. The flow velocity is computed based on the slope s as: with the minimum flow velocity set to 0.1 [m s −1 ]. The actual lag for surface water sw and river flow rf is obtained by 305 modifying the reference lag time (see Table 3) for these processes prc: As surface water flow is limited to the respective grid cells, the flow distance ∆l sw is set to the cell's diameter and the slope to the average slope within the grid cell. In contrast, river flow transports water from one cell to the next. Thus, the flow distance ∆l rf is measured as the distance between the centers of a grid cell and its downstream grid cell. Similarly, the slope is 310 computed based on the height difference ∆h between actual and downstream grid cell as s = ∆h ∆l rf . Compared to surface water and river flow, shallow groundwater flow is assumed to be more independent of surface slope and the reference lag is modified using the grid cell diameter as flow distance and the normalized variation in orography b oro : Additionally, the maximum number of cascade storages c max (see Eq. 26) is assigned to surface water and river properties.

315
The reference values are listed in table 3. The lag times of surface water and river flow are further modified if the lake or wetland fractions are large enough to interact with lateral water flow as described in Hagemann and Dümenil (1998). Here, it is assumed that a small water fraction hardly impacts lateral flow velocities but slows down flow velocity significantly if a certain threshold f lct,crit = 0.5 is crossed. This behaviour is reflected in the following equations, where the flow reduction f lct,red is computed: 320 f lct,red = 0.5 · (tanh (4 · π · (f lct − f lct,crit )) + 1) where lct = {lake, wetland} The flow velocity of both processes prc is normalized with reference flow velocities v lct,ref for for the land cover types (lct) lakes and wetlands -set to 0.01 and 0.06 m s −1 , respectively -and scaled with the flow reduction: The number of storages within the flow cascade and the lag time are then updated accordingly: LAG prc = ∆l prc v prc,red · c max,prc · ∆t These computations are done for both water storage types prc = {surface water,river} and both land cover types lct = {lake,wetland}.
As a final step, all retention times are scaled such that the maximum cascade storage number c max becomes an integer value 330 and can easily be applied in an iteration.

Evaluation
The HydroPy evaluation is based on a global simulation, set up at 0.5°spatial resolution with a time step of 1 day. Precipitation and surface temperature time series were taken directly from the Global Soil Wetness Project Phase 3 forcing data (GSWP3, Dirmeyer et al., 2006), while potential evaporation was computed from the same dataset in a pre-processing step using the

Simulation spin-up
While it is possible to convert restart files from MPI-HM to initialize HydroPy, the much cleaner way is to start with empty 340 storages and run the model until the trends in its storages become reasonable small. With the currently implemented processes, a time period of 50 years was considered to be sufficient for this task. as their glacier fraction is smaller than the land fraction, but they have surface temperatures below 0 • C for most of the year.
Currently, no glacier flow processes are implemented in HydroPy and therefore these cells cannot reach a stable state. However, this issue affects less then 2% of the snow-covered grid cells and these cells can easily be masked during the analysis.
The remaining storages differ in the number of years needed to reach a stable state. Small, highly dynamical storages like canopy and skin (not shown) are well initialized already during the first year, while storages with larger values and slower 355 processes like soil moisture, groundwater and river storage show positive trends in their mean state until year 8. Thus, 50 years of spin-up seems somewhat excessive at a first glance. However, the mean storage state is not necessarily representative for the whole land surface as can be demonstrated by considering the storage content for the individual grid cells that show the largest residual trend at the end of the 50 year spin-up period (Fig. 5, orange lines). In the most extreme case, soil moisture spin-up takes the major part of the 50 years period for a grid cell in the Ganges delta where the soil moisture capacity is much larger 360 than for the majority of the other cells. Another effect is seen in the groundwater storage of a cell in the same region where a stable state was almost achieved after 40 years but was followed by a second increase period. This behaviour results from the drainage calculation (Eq. 16) where an exponential component is added in case the soil moisture exceeds a given threshold.
Thus, groundwater spin-up in cells with a positive water balance and large soil moisture storage cannot be completed until the overlying soil reaches an stable state. For river storage, the most extreme cell is not spinning up much slower than the average, 365 but looking at the single cell data, the storage content is shown to exceed those of other cells by several orders of magnitude.
This cell is located at the eastern border of Lake Ontario where the main path of the St. Lawrence River crosses a grid cell with a lake fraction only slightly smaller than one. This results in a strongly reduced flow velocity in this cell and consequently a huge water storage is build up. In summary, a shorter period of 10 years would be sufficient for most of the land surface. However, regional features very well justify a spin-up period of 50 years. Of course, for later studies the length of spin-up 370 simulations can be significantly reduced using restart data from another suitable HydroPy run.

Evaporation
As precipitation is provided as forcing, the major task of HydroPy is to realistically distribute precipitation into evaporation and runoff. Thus, a good representation of evaporation is crucial to successfully simulate river discharge. Unfortunately, there are no global datasets of directly observed evaporation time series available that could be used for model validation. However, data Comparing the long term zonal mean actual evaporation between HydroPy and GLEAM (Fig. 6, upper panel) both datasets agree very well for the high and mid latitudes in terms of zonal mean and standard deviation. However, for the tropical region 380 between 10°and -15°N HydroPy underestimates evapotranspiration by almost 1 kg m −2 d −1 . Considering seasonal differences ( Fig. 6, lower panel), some more deviations become evident. During MAM the underestimation of tropical evapotranspiration is most pronounced affecting a band of more than 10°width south of the equator. During JJA this is compensated by a slight overestimation south of -10°N while the evaporation underestimation shifts northwards. Also a small (≈ 0.5 kg m −2 d −1 ) negative bias is visible between 30°and 70°N.

385
While the implication of these biases for runoff are discussed in a following paragraph (see Sect. 4.3), we aim to identify possible reasons for their appearance. The most obvious source would be potential evaporation, which is provided to HydroPy as input based on atmospheric forcing data and used in every evaporation computation as maximum value -especially as the Penman-Monteith methodology is shown to underestimate PET in tropical regions (Weiland et al., 2015). However, PET is overestimated compared to GLEAM (see Fig. 7) and thus unlikely to cause too low values in ET. Instead, the ET biases 390 are well reflected in transpiration with even stronger values. This could hint on either issues with the overall size of the soil moisture storage or indicate a missing interaction between the shallow groundwater and soil moisture storages which could provide plants with additional water for transpiration during dry periods. The importance of a deep soil moisture layer for transpiration was already demonstrated by Hagemann and Stacke (2015) and might be relevant for HydroPy as well.

River discharge 395
River discharge is the most important output variable of HydroPy. It is an easily observable quantity that integrates the lateral water fluxes over a given catchment region and it is highly sensitive to all other components of the hydrological cycle. As such, river discharge is a prime target to evaluate the skill of GHMs. This is done for the whole globe using observations provided by GRDC (2020).
We use the Kling-Gupta Efficiency (KGE, Gupta et al., 2009) as primary evaluation metric for river discharge. The KGE is 400 defined as: with r being the correlation between simulated (sim) and observed (obs) river discharge time series, σ being their standard deviation and µ being their long-term mean value. The KGE ranges in the bounds between {−∞, 1}, with a value of KGE = −0.41 indicting a performance similar to using the observed mean flow (Knoben et al., 2019 for all KGE comparisons.
Please note, that river discharge observations are often not available directly at the river mouth. For our analysis, we use those stations which are located at the most downstream part of the catchment and compare it with discharge from the closest grid cell with similar inflow area. Nonetheless, the resulting values are projected onto the whole catchment for visualization.

415
Additionally, we analyse the temporal correlation r and the percentile bias pb given by the equations: indicates an insufficient performance as just using the observed long-term average is superior to the prediction. Thus, HydroPy shows a reasonable performance for the majority of catchments. About 65% of the catchment area exceed a KGE of 0 and 33% even exceed a KGE of 0.5. Contrary to the NSE, which was often used in hydrological analysis (Moriasi et al., 2007) prior to the development of the KGE, the individual KGE values are not associated with a specific performance rating. Still, positive values are generally considered to indicate skilful model application (Knoben et al., 2019). Taking into account that HydroPy 425 is neither calibrated nor adapted to any specific catchment, its performance is satisfactory for the majority of catchments and on the same level as its predecessor MPI-HM (see Sect. 4.4).
Still, some river catchments could not be represented sufficiently well. Thus, additional metrics were calculated to identify reasons for this failure. Figure 8 (lower left panel) shows the temporal correlation between HydroPy and GRDC monthly mean river discharge, with a good performance for most catchments. More than two third of the grid cell show a correlation 430 > 0.75 and less then 15% are below 0.5. Thus, HydroPy demonstrates a significant skill to represent seasonal river discharge variations in the majority of catchments. Exceptions can be seen for the Nile and Ganges rivers which also show low KGE values. Stronger regional discrepancies are evident for the percentile bias between HydroPy and GRDC river discharge (Fig.   8, lower right panel). Only 20% of the grid cells fall in the range of ±10% bias. For high northern latitude catchments, a strong underestimation of river discharge is visible, especially for Eastern Siberia and north-western Canada. Strong discharge 435 overestimation is found for many dry regions in the Sahel zone and Australia, but also in parts of South America. Combining these measures (Fig. 8, upper right panel) shows that low KGE values occur mostly in regions with strong, positive biases in mean discharge amounts and, hence, in total runoff. Therefore, this needs to be a focus of future model development. However, it might be possible that not all of this issues can be addressed by the model alone. Figure 9 shows the long-term water balance residuum based on GSWP3 precipitation, GLEAM evaporation and GRDC river discharge. While it is only a first 440 order approximation using a mixture of observed and computed quantities, it also highlights large negative values for Eastern Siberia and north-western Canada as well as positive values for parts of South America, the Sahel Zone and the Ganges basin.
Especially the wet bias areas coincide well with the regions showing strong river discharge biases in HydroPy. This points to the possibility that already the forcing or validation data might suffer from uncertainties and, thus, contribute to the diagnosed river discharge biases for the these regions.

445
For regional river discharge curves (see Fig. 10), it can be seen that for Arctic Rivers the peak flow month is represented very well, but the amount of peak flow is underestimated even though ET is too low during summer. In terms of model skill, this could be related to too little accumulated snow cover during the winter months. Also during August and September, the discharge stays below the observed level. Whether this is an effect of too small water storages in the soil, biases in the model forcing or related to other processes needs to be examined in later studies. For tropical rivers, a too early flow peak combined 450 with too much discharge is visible. This can be attributed mostly to the strong negative ET bias during MAM and might be also related to the missing representation of interactive floodplains which might enhance ET and mitigate the peak flow. However, it should be noted that the discharge curve is strongly dominated by Amazon and Congo, while other tropical rivers, like Orinoco, Mekong and Rio Magdalena, are much better simulated. Similarly, river discharges for mid-latitude regions like the European Rivers are represented very well by HydroPy.

Comparison to MPI-HM
In this section, HydroPy is compared to the original MPI-HM to prove its suitability as a successor. For this purpose, a simulation was conducted with MPI-HM using the same meteorological forcing data and spin-up duration. Similar to HydroPy, MPI-HM includes a number of grid cells with steadily increasing snow storage. These cells were masked during the analysis of both simulations.
460 Figure 11 shows the average water fluxes and storage contents for HydroPy and MPI-HM. As expected, both are rather similar. The average precipitation received by the models amount to a yearly sum of ≈ 760 kg m −2 a −1 . Evapotranspiration in HydroPy exceeds the one in MPI-HM by ≈ 20 kg m −2 a −1 while total runoff is ≈ 11 kg m −2 a −1 lower. The soil moisture storage is around ≈ 12 kg m −2 higher in MPI-HM compared to HydroPy. Although both models agree in the spatial distribution of soil moisture trends, the drying trends are slightly stronger in the HydroPy simulation resulting in an overall lower 465 soil moisture. The average snow water equivalent is similar in both models with a difference of < 1 kg m −2 . Still, regional differences can be found featuring an increase in snow cover in the high northern latitudes (due to later snow melt) balanced by a decrease in mid-latitudes. These small differences can be explained by the new processes implemented in HydroPy. The addition of skin and canopy evaporation increases overall evapotranspiration in HydroPy resulting in a slight decrease in soil moisture and therefore an decrease in total runoff. The differences in SWE are caused by the replacement of the normalized 470 sinus function in the snow melt scheme with an explicit calculation of the day-length fraction.
Comparing the quality of simulated river discharge between both models (see Fig. 12), HydroPy shows slightly better results in the normalized Kling-Gupta efficiency (NKGE) for about 60% of the area of considered catchments. While the changes in NKGE values are mostly small, there are some catchments with stronger signals, e.g. the Ganges-Brahmaputra, Yangtze and Yenisey. Climatologies for the three rivers are shown in the upper panels of Fig. 13. For these basins, HydroPy generates a delay 475 in river flow compared to MPI-HM and GRDC observations, which results in a later peak flow by 1 to 2 months and an overall smoother curve. This behaviour results from the representation of surface water in HydroPy and its effect on river flow velocity.
Indeed, the wetland masks used in both models differ. MPI-HM relied on a wetland distribution compiled by Matthews and Fung (1987). However, this dataset is only available at the rather low resolution of 1°. In order to enable HydroPy to run on resolutions higher than the default 0.5°in future applications, lakes and wetlands are now compiled using the GLWD (Lehner 480 and Döll, 2004, see Sect. 3), which is available at 5' resolution. While both maps agree on the general distribution of wetlands, there are regional differences. Most prominently, the Matthews and Fung (1987) dataset contains less wetlands than the GLWD in the Ganges Delta, the downstream area of the Yangtze River and the outlet of the Yenisey, which results in the overestimation of wetland impacts in HydroPy. The same effect causes the increase in NKGE for the Mississippi and Yukon catchments (see Fig. 13, middle panels) where HydroPy reproduces the timing of the peak flow much better than MPI-HM although it falls 485 off too quickly during the summer in the case of the Yukon. Here, the wetland impact acts too strong in MPI-HM while the selection of GLWD wetland classes in HydroPy leads to less wetlands in this area. Another example demonstrating the small differences between HydroPy and MPI-HM is the improved river discharge in the Danube catchment. Here, the small changes in the snow scheme slightly reduce the snow cover and thereby the snow melt peak.
The river discharge simulation for Ganges-Brahamaputra, Yangtze and Yenisey can be significantly improved in HydroPy 490 by adapting the reference flow velocity for wetlands (see Fig. 13, lower panel). However, the effect is diametrical to the skill in other wetland influenced catchments. An increase in the reference velocity reduces the wetland impact, thereby improving the simulation skill towards the level of MPI-HM. At the same time, it results in performance degradation in other catchments like the Ob, where due to the faster river flow through wetlands the snow melt peak appears earlier in the year while too less water is available during late summer.

495
Besides exploring the reasons behind differences between HydroPy and MPI-HM, this analysis demonstrates the strong sensitivity of river discharge to wetland extent in both models. However, the opposing effects of modifying the wetland reference flow velocities discourages any further attempts to optimize this value at this point. More likely, better results can be obtained at higher resolution when flow paths and wetland distribution are represented in greater detail. Then, only those wetlands would mitigate river discharge which are indeed close enough to the flow channel and not simply exist in the same 0.5°grid cell.

500
Finally, it should be noted that due to the active participation of MPI-HM in several inter-comparison projects (e.g. Haddeland et al., 2011;Zhang et al., 2017;Gädeke et al., 2020;Kumar et al., 2021), there are many analyses available that evaluate the performance of MPI-HM in comparison to other GHMs. As MPI-HM and HydroPy produce very similar results, these analyses are also a good indication for the performance of HydroPy at this stage of development.

505
This study introduces HydroPy, a global hydrological model written in Python. Similar to its predecessor, the MPI-HM, the model simulates the land surface water balance as well as the river routing towards the ocean. HydroPy uses a default resolution of 0.5°on global scale. It has rather humble data requirements, using surface temperature, precipitation and potential evaporation as meteorological input data together with land surface property data. Compared to MPI-HM, HydroPy can be setup more easily, has less requirements for land surface data, provides internal checks to aid the model development process 510 and requires less effort for maintenance and implementation of new processes.
The majority of process formulations were taken over from MPI-HM without any alterations, but some improvements were done for the snow melt, the skin and canopy storages as well as the impact of lakes and wetlands on river discharge. The evaluation of evapotranspiration using the GLEAM data (Martens et al., 2017;Miralles et al., 2011; ICDC, 2019) shows a generally reasonable behaviour but also highlights underestimation of transpiration in parts of the Tropics and the high northern 515 latitudes. This affects runoff generation and results in an overestimation of river discharge especially for the Amazon and Congo rivers compared to GRDC data (GRDC, 2020) and also a too early discharge peak for these two rivers. The seasonality of Arctic rivers is captured very well but the peak discharge is generally too low. Although constraint to specific regions, both biases are considered to be important deficiencies -either in the model or in the meteorological forcing data -and will be a focus of the next development steps. Still, the majority of catchments show a very good correlation with observations and overall discharge 520 can be satisfactorily represented by HydroPy.
Besides the necessary improvements in these parametrizations, there are further plans to actively develop HydroPy in the future. The two most important focus areas are the transport of nutrients within the lateral flow scheme of HydroPy to provide ocean models not only with river discharge, but also transient information about nutrient concentration,

525
implementation of human impacts, especially irrigation and crop management, but also the representation of dams and reservoirs.
Compared to MPI-HM, HydroPy shows very similar results on the global scale. However, there are regional differences due to changes in the snow melt scheme and the updated wetland distribution. While these changes cause a degraded performance of river discharge in a few river basins, the majority of rivers benefit from the improvements implemented in HydroPy.

530
Nonetheless, regions with decreased simulation quality will be in the focus of further model development. In conclusion, the transition from MPI-HM to HydroPy is completed successfully and HydroPy will replace MPI-HM in future applications.
As final remarks we present some insights gained during this project. Naturally, HydroPy is not the first hydrology model to be rewritten (e.g. SWAT+ Bieger et al. (2016) and others), however the difficulties and challenges of such a project are not often discussed (Müller et al., 2018). A major obstacle is the perception of this work: rewriting a model prepares the way for 535 new studies but is often considered a technical rather than a scientific task. For this reason, it is difficult to secure funding, computational resources, and finally publish this work in high-ranking journals. As scientists are often employed on limited contracts, they need to carefully weight such time investments because model development, maintenance and documentation usually pays off only on long time scales. For this reason, small working groups without technical support are at a disadvantage when trying to rewrite their models. Still, in terms of the technical realization, we like to share some advice that was useful to 540 us: -Programming language and style should be chosen according to the user base and application, e.g. Fortran or C for highly-optimized models running on high-performance machines vs interpreted languages like Python or R with easily understandable code, e.g. for educational purposes or working groups without a technical support staff.
-Infrastructural routines should be separated from the scientific code as much as possible. If I/O and variable definitions 545 happen "under the hood", the main routine can be much more easily understood and extended by other scientists. For this very reason, we recommend to develop the model infrastructure as the very first step and only later add the scientific processes.
additional outputs, e.g. log outputs for selected grid cells, help to spot any conceptual or programming errors as early as 550 possible.
-Port one routine at a time and summarize related functions in one file for a better overview. A modular model can be much more easily extended for different applications.
-Write documentation right from the start and not only when the first model version is finished.
-Use a versioning tool (git, svn) to manage the development progress, collaborations and releases.

555
Of course, all this advice comes from personal preferences and other programmers might set different priorities. Nonetheless, we hope it might ease the start on such projects because ultimately, a well structured and maintained open-source model is a valuable asset in hydrological research.