The probabilistic hydrological MARCS (the MARkov Chain System) model: Its structure and core version 0.2

The question of the environmental risks of social and economic infrastructure has recently become apparent due to an increase in the number of extreme weather events. Extreme runoff events include floods and droughts. In water engineering, extreme runoff is described in terms of probability and uses methods of frequency analysis to evaluate an exceedance probability curve (EPC) for runoff. It is assumed that historical observations of runoff are representative for the future; however, trends in the observed time series doubt this assumption. The paper describes a probabilistic hydrological MARCS model that can be applied to predict future runoff extremes. The MARCS model simulates statistical estimators of a multi-year runoff in order to perform future projections in a probabilistic form. Projected statistics of the meteorological variables available in climate scenarios force the model. This study introduces the new model’s core version and provides a user guide together with an example of the model set-up in a single case study. In this case study, the model simulates the projected EPCs of annual runoff under three climate scenarios. The scope of applicability and limitations of the model’s core version 0.2 are discussed.

of a chosen exceedance probability (Arheimer and Lindström, 2015;Veijalainen et al., 2012;Seibert, 1999). In the frequencyanalysis approach, historical yearly time series of runoff are used to evaluate statistical estimators, that is, mean value, the coefficient of variation (CV) and the coefficient of skewness (CS) (van Gelder, 2006). These estimators are applied to calculate the runoff values with the exceedance probability (Guidelines SP 33-101-2003, 2004Guidelines, 1984;Bulletin 17-B, 1982) needed to support the designing of roads, dams, bridges or water-withdrawal stations. The basic assumption of this approach is that the future risks during an infrastructure's operational period are equal to the risks estimated from the past observations. The runoff extremes are simply extrapolated for the next 20-30 years on the assumption that the past observations are representative of the future: the "stationarity" assumption (Madsen et al., 2013).
The number of weather extremes -including hurricanes, wind, rain and snow storms, floods and droughts -has increased (Vihma, 2014;Zhou, 2005, Manton et al., 2001). Historical time series of many climate variables show evident trends, which are statistically significant, and the series of streamflow runoff are among others (Wagner et al., 2011;Dai et al., 2009;Milly et al., 2005). Rosmann et al. (2016) applied the Mann-Kendall test to analyse a time series of daily, monthly and yearly river discharges for the last four decades. The highest number of trends was detected for the yearly time series of annual runoff. The statistically significant trends are founded on historical time series, thus the water engineers and managers are motivated to revise the basic stationarity assumption that lies behind infrastructures' risk assessment since the past observations are representative of the future (Madsen et al., 2013;Kovalenko, 2009;Milly at al., 2008).
In this paper, we described a method that combines the conceptual modelling and frequency analysis in order to estimate the runoff extremes in a changing climate. The method adapts the theory of stochastic systems to the water engineering practice and it was further named as advanced of frequency analysis (AFA). It was introduced by Kovalenko (1993) and relied on the theory of stochastic systems (Pugachev et al., 1974). The basic idea behind the method is to simulate the statistical estimators of multi-year runoff (annual, minimal and maximal runoff) from the statistical estimators of precipitation and air temperature on a climate scale (Budyko and Izrael, 1991). The simulated statistical estimators of runoff are used to construct EPCs with

Model input
Two blocks of the MARCS HYDRO model are needed to analyse and screen the observations. The time series of river runoff and precipitation are required for the period as longer as possible. However, the length of yearly time series on water discharges does not usually exceed 80-90 years. Hydrological yearbooks or runoff data sets provide observations at sites of national hydrological networks, and the river runoff is expressed as a volumetric flow rate (water discharge, m 3 s -1 ). In the data preparation block of the model, the volumetric flow rate (m 3 s -1 ) is converted to a specific water discharge (ARR, mm year -1 ): where Q is a yearly average water discharge (m 3 s -1 ), T is the number of seconds in a year and A is the catchment area (m 2 ). In the data screening block of the model, the yearly time series of ARR are used in the analysis of homogeneity and trends (Dalmeh and Hall, 1990) and to define a period for the model parametrization (called "a reference period" by ). Then, the reference three non-central moments mk ( m k =1/n ∑ i=1 n DR i k for k = 1, 2, 3) are estimated from time series of ARR using the method of moments (van Gelder et al., 2006).
The observations on precipitation are collected from meteorological sites, and they may be interpolated into grids in order to better estimate a precipitation rate over a river basin area. In the data preparation block of the model, the mean annual precipitation rate (mm year -1 ) is calculated from the observed yearly time series for the reference period. The mean annual precipitation rate for the future period can be calculated from an output of any global/regional climate model or even a set of models. In a study on the catchment scale, the time series of water discharges can be extracted from the Global Runoff Data Centre (GRDC) while the precipitation rate can be estimated from gridded data sets (Willmott and Robeson, 1995). These two data sets were used to perform an example of the model application on the Iijoki River basin.

Model cross-validation
The MARCS HYDRO model allows the simulation of the non-central moments of runoff that can be used for the construction of probability distribution (or an EPC), in other words, it provides a probabilistic form of prediction. The end product of the model is the probability density function (PDF) (or the EPC), and there are no simulated time series of runoff to compare with the observations. Kovalenko (1993) suggested comparing the simulated PDF with an empirical PDF by using known statistical tests such as the Kolmogorov-Smirnov test (Smirnov, 1948). In the PHB of the MARCS HYDRO model, a specific cross-validation procedure allows conclusions to be drawn about the model's validation and the quality of hindcasts. For the model's cross-validation, the observed time series of river runoff is divided into two sub-periods, namely the training period and the control period. The splitting year corresponds to the year when a statistically significant difference in mean values is estimated over two periods. In this study, we did not pay much attention to the cross-validation procedure since the model core version 0.2 is described in details in Shevnina et al. (2019). In our study, core version 0.2 of the probabilistic MARCS HYDRO model was suggested instead of version 0.1 . Version 0.2 allows the evaluation of the skewness parameter of the Pearson Type III distribution. In the new core, the non-central statistical moments of the ARR were calculated as follows: where m1, m2 and m3 are the moment estimates of the non-central statistical moments of the ARR; a, b0, b1 and b2 are the parameters of the distributions of the Pearson equation (Andreev et al., 2005).
To set up the MARCS HYDRO model, observations of water discharges are needed. For the reference period (notated by low index r) the moments' estimates for the non-central moments (m1r, m2r, m3r) were first calculated from observed times series of runoff (mm year -1 ), then the non-central moments were used to evaluate the parameters of the Pearson equation a, b0 and b1: Then, the parameters of the linear filter model (see Annex 1 for details) c ,GÑ ,GcÑ denoted with a low index r, were calculated: where N r is the mean of annual precipitation rate (mm year -1 ) estimated from observed time series as an average over any chosen reference period.
To force the MARCS HYDRO model, the outputs from global/regional-scale climate models are needed. Coupled Model Inter- c r =c pr , GÑ r = GÑ pr ,GcÑ r = GcÑ pr (a "basic parametrization scheme" according to Kovalenko, 1993); new parameters of the Pearson equation are calculated from N pr : Finally, the non-central moments of runoff are calculated for the projected period (denoted by a low index pr): It should be noted that in core version 0.2 the linear filter model includes the multiplicative stochastic component (see Annex  Kovalenko (2004), and one of them is already implemented in core version 0.1 .

Model output
In our study, the EPC of runoff was modelled within Pearson Type III distribution. This distribution is commonly used by water engineers to estimate water extremes (Kountrouvelis and Canavos, 1999;Rogdestvenskiy and Chebotarev, 1974;Matalas and Wallis, 1973). The water engineering guidelines provide the ordinates of EPCs from look-up tables (Guidelines, 1984) depending on the CV and CS. These coefficients are calculated from non-central moments' estimates (Rogdestvenskiy and Chebotarev, 1974): The MARCS HYDRO model output includes the estimates of the mean value, CV and CS, calculated for the reference period from observations as well as these estimates simulated from mean precipitation for the projected period. The ordinates of the EPC available from look-up tables then allows the calculation of the runoff values together with their exceedance probability.
6 155 160 165 170 In our study, we chose the basin of the Iijoki River at the Raasakka gauge (Lat 25.4º / Lon 65.3º) in order to give an example of the application of the MARCS HYDRO model on the catchment scale. The Iijoki River is located in north-west Finland, and the Raasakka gauge outlines a watershed area of over 14191 km 2 . The catchment has a small population and there are no hydropower plants of multi-year regulation to affect the natural regime of the annual cycle. Thus, one can expect that historical yearly time series of the annual runoff rate do not contain trends connected to artificial regulation. This case study shows an example of the set-up and output of the probabilistic MARCS HYDRO model.

The MARCS HYDRO model set-up: The reference period
The yearly time series of volumetric water discharge of the Iijoki River were extracted from a dataset of the GRDC (GRDC 56068 Koblenz, Germany). The observations at the Raasakka gauge (ID = 6854600) cover the period 1911-2014, and they do not contain gaps. This period was considered as the reference period. The annual specific water discharge (ARR, mm year -1 ) was calculated from the average volumetric water discharge for each year in the reference period. Then, the noncentral moments were calculated from the yearly time series of the ARR with the method of moments (see Table 1). The reference climatology (the means of precipitation and air temperature) were evaluated from the dataset of NOAA (NOAA/OAR/ESRL PSD, Boulder, Colorado, USA) at a grid node nearest to the watershed centroid (this technique will be discussed in a separate paper, as will the methods of a forcing pre-analysis).

The MARCS HYDRO model forcing: The projected period
Climate scenarios provide a range of projections for temperature and moisture regimes in the future. This range is produced by different assumptions about climate scenarios as well as specific climate models. However, the climate projections include precipitation and air temperature, and they give a forcing to hydrological models in order to simulate projections of runoff. In the case study of the Iijoki River, the data from CMIP5 (Taylor et al., 2012) Table   2). The mean values of the precipitation rate varied by 2-5 % of the model's average over the RCP scenarios; however, these values alter substantially between the climate models. Among the outputs considered, the MPI-ESM-LR model projects the highest changes in the mean values of the precipitation rate compared to the reference period (see Tables 1 and 2 among the climate models (the mean values of the precipitation rate ranges from 619 to 737 mm year -1 ) for the case of the Iijoki River at Raasaka.

The MARCS HYDRO model output: The projected period
The projected non-central moments' estimates were simulated for the scenarios/models listed in Table 2. These estimates were used to calculate the mean value, CV and CS (see Eq. (16-17)) that were included in the output of the MARCS HYDRO model. Table 3 shows the modelling results for the HadGEM2-ES and MPI-ESM-LR global models, where the water discharges with 10 and 90 % exceedance probabilities are given. The ordinates of the Pearson Type III distribution were extracted from the look-up tables used in hydrological engineering (Druzhinin and Sikan, 2001), and they allow expressing runoff as water discharge (m 3 s -1 ). For the Iijoki River at Raasakka, the mean values of ARR and CV vary under the RCP scenarios by over 7 % and 5 % correspondingly. The maximum alteration in the projected mean values of ARR was obtained under RCP85 (619 to 737 mm year -1 ). Under the projections of the MPI-ESM-LR model, the mean ARR increases by over 17 %.
In the case of the Iijoki River at Raasakka gauge, the 10 % water discharge exceedance probability will increase in the future under the scenarios/models considered (see Table 3). It may leads to risks of energy spills at hydropower stations located within the catchment of the Iijoki River in the period 2020-2050. At the same time, risks connected with water shortages may be fewer since they are connected to a 90 % water discharge exceedance probability, which is predicted to increase. Figure 2 shows another way in which the model performs the EPC of the annual runoff rate for the Kyrönjoki River at Skatila gauge (GRDC ID: 6854900). The set of EPCs were simulated under three RCP scenarios using a similar set-up to the MARCS HYDRO model (Shevnina et al., 2019). In the further development of the visualisation block, it would be important to involve water managers and decision makers in order to better outline practical applications for the probabilistic hydrological model.

Discussions
Nowadays, the future vision of the climate is changing continuously. Climate projections are updated almost every 5-6 years and many climate models generate meteorological projections for variables such as precipitation and air temperature. Hydrological models are needed to perform an "express analysis" about future changes in water resources and water extremes (floods and droughts) on a climate scale. The climate scale means that the express analysis is provided for a period of 20-30 years. Lumped or semi-distributed physically based hydrological models are traditionally used on a short-term or seasonal scale to simulate a runoff time series from a time series of meteorological variables (Seibert, 1999). In many catchment-scale hydrological studies, these models are driven by the outputs of climate models or their ensembles in order to evaluate water resources and extremes in the near future (Arheimer and Lindström, 2015;Veijalainen et al., 2012;Yip et al., 2012). The simulation of the runoff time series from a time series of meteorological variables (see Fig. 2 in Veijalainen et al. [2012]) leads to high computational costs for such estimations that need to be served in terms of probability in economical applications (Murphy, 1976). The probabilistic MARCS HYDRO model is computationally cheaper when compared to lumped or semi-distributed physically based hydrological models. It can easily be coupled with global and regional climate models, and it can provide the express analysis of water resources under a modern version of the future climate.
In this paper we described the structure for the probabilistic hydrological MARCS HYDRO model, together with the AFA method that lies behind the new model's core version 0.2. The AFA method has a more than 25-year-long history; however, most of the studies are published in Russian (Kovalenko, 1993;2004;Kovalenko et al., 2010). The AFA method is based on the statistical theory of automatic systems (Pugachev et al., 1974), which is an outsider among the "classical hydrological" disciplines. The AFA method is one simplification of the Fokker-Plank-Kolmogorov equation approach that has been developed in the Russian State Hydrometeorological University. It has been tested in many case studies on river basins located in Russia, Colombia, Bolivia, Mali etc. There are also a number of publications in English (Rosmann and Domínguez, 2017;Kovalenko, 2014;Domínguez and Rivera, 2010;Viktorova and Gromova, 2008). In this manuscript we formulated the theory logically in an attempt to provide the equations for the new core 0.2 of the MARCS HYDRO model; however, it also needs to describe the AFA method that lies behind it.
The probabilistic hydrological MARCS HYDRO model includes the core versions 0.1 and 0.2. In both cores, only three noncentral moments are evaluated to construct the EPC within the theoretical distribution the Pearson III Type, which is among the traditional distributions of the frequency and risk analysis in hydrology (Kite, 1977;Rogdestvenskiy and Chebotarev, 1974;Sokolovskiy, 1968;Elderton, 1969;Benson, 1968). The model simulates three estimates of non-central moments of runoff instead of a runoff time series, and this circumstance makes the computations by the MARCS HYDRO model "low cost" compared to conceptual hydrological models (Arheimer and Lindström, 2015;Veijalainen et al., 2012). The MARCS HYDRO model allows putting the projections of runoff in terms of probability, that is, they appear as runoff values together with their exceedance probability.
The MARCS HYDRO model includes six modules, and each module allows improvements by including new methods. In this paper, the new model -core version 0.2, extended to simulate the third statistical estimator (skewness) -is presented. The applicability of core version 0.2 is limited by the assumptions behind the AFA approach. Among others, there is the "quasi-stationary" assumption for the expected climate change. In this case, the climate is described by the statistical estimators (i.e. mean value, variability etc.) of precipitation, air temperature, evapotranspiration, river runoff etc. for the period of 20-30 years. It is assumed to consider two time period periods with statistically different climates, namely the reference period and the projected period. Another limitation is connected to the linear filter stochastic model (for details, see Annex 1) used in core version 0.2. It should be noted that there is a multiplicative component in the model core, and it may lead to unstable solutions of the Fokker-Plank-Kolmogorov equation. Kovalenko (2004) suggests two solutions that result in stable solutions of the Fokker-Plank-Kolmogorov equation. One of the solutions was given by Kovalenko et al. (2010) and is coded in model version 0.1 . However, a checking procedure needs to be applied before using this core version. In the checking procedure we plan to use the "beta criterion" method suggested by Kovalenko (2004) to further develop the MARCS HYDRO model. Recently, only the basic parametrization scheme (Kovalenko, 1993) has been included. This basic scheme gives over 70-80 % successful hindcasts ("forecasts in the past") in the model cross-validation , and the implementation of a regionally oriented parametrization scheme (Shevnina, 2011) is the next step. It needs to include a mean value of the air temperature of the parameter, connected to "noised" watershed physiography in Eq. (A.4), the inverse of the runoff coefficient in the work of Kovalenko (1993). It is also important to study the role of the spatial resolution of meteorological forcing in affecting the modelling uncertainties for the simulated mean, CV and CS of runoff.
To fine the probabilistic MARCS HYDRO model among other hydrological models, its practical applications needs to be better outlined. The model serves a probabilistic form of long-term hydrological projections, and they require adaptation to the needs of water engineers and water managers as a tool for risk analysis under the expected climate change. The projected EPCs of multi- year river runoff can be applied in designing bridges, pipes, dams etc. in order to minimise the future risks connected to extreme floods Kovalenko et al., 2014;Kovalenko, 2009) and to water shortage due to droughts (Viktorova and Gromova, 2014). It is important to define informative forms for the outputs of the MARCS HYDRO model that can be adapted to the needs of a practice, and the development of the block of economic application is among the others studies that are to be continued in close cooperation with water managers and decision makers.

Conclusions
The Currently, the MARCS HYDRO model code is hosted at https://github.com/ElenaShe000/MARCS , with details of its applications for catchment-scale case studies. The model source code for core version 0.2 is distributed under the Creative Commons Attribution 4.0 License and can be downloaded from the link https://zenodo.org/record/1220096#.WyTXxxxRVhw and used freely in scientific research with reference to this publication. We hope that this type of license provides the best way to create a community of motivated people to further develop the model. Then, the source code will be distributed under the terms of a user agreement.

Data availability
The following data sets can be used to set up and force the MARCS model: the GRDC (GRDC, 56068 Koblenz, Germany), the NOAA/OAR/ESRL PSD (Boulder, Colorado, USA) and CMIP5 (Taylor et al., 2012).

Sample availability
The sample data set for the case study of the Iijoki river at Raasaka is given at this site: https://zenodo.org/record/1220096#.WyTXxxxRVhw.
Annex 1. The theoretical basis for core version 0.2

A1.1 The assumptions behind advance of frequency analysis
Advance of frequency analysis (AFA) is based on the theory of stochastic systems, specifically, the Fokker-Plank-Kolmogorov equation, which is simplified into a system for three non-central statistical moments (Pugachev et al., 1974).
The time series of annual runoff is considered as a realisation of a random-process Markov chain type that is assumed to be "stationary". It means that the statistical estimators (mean, variance and skewness) do not change over the period considered.
The statistical estimators are used to model an exceedance probability curve (EPC) of the annual runoff with Pearson Type III distribution. The AFA approach is developed with an assumption of "quasi-stationary" (Kovalenko et al., 2010, Kovalenko, 1993. The quasi-stationary assumption suggests that the statistical estimators of multi-year runoff are different for two periods (the reference period and the projected period). For the reference period, the statistical estimators are evaluated from historical yearly time series of runoff. For the projected period, the statistical estimators of runoff are simulated based on the outputs of global-or regional-scale climate models under any climate scenario.

A1.2 The linear filter stochastic model
In this context, models replace a complicate hydrological system using maths abstractions and aim to reveal the spatial and temporal runoff features which are important depending on the goals of study. Among other models, "black box" 11 300 305 310 315 320 hydrological models consider a river basin as a dynamic system with lumped parameters. These models are "based on analysis of concurrent inputs and temporal output series" (WMO-№168, 2009) and transform series of meteorological variables (precipitation, air temperature) into series of runoff. Both input and output series are functions of time (WMO-№168, 2009): a n (t) d n Q dt n + a n−1 (t) where Q is the runoff in volumetric flow rate, P is the precipitation in volumetric flow rate (rain, snow melt) and the coefficients a i and b i are the empirical parameters of a translating system. These coefficients are the lumped parameters of the black box model. The solution to Eq. (A.1) for zero initial conditions gives: where the function h (t,τ ) represents the response of a river basin at time t to a single portion of precipitation at time P . In the AFA approach, a river basin is considered as a linear system, transforming the annual precipitation into the annual runoff: On the other hand, a river basin can be considered as a linear system with stochastic components in the input function and the model parameter (Kovalenko, 1993): where a 0 (t)=c +c (t) is the stochastic parameter of the system (a "noised" watershed physiography, the inverse of runoff is the stochastic input for the system (a "noised" precipitation), and a 1 =1 . The stochastic components of c (t ) and Ñ (t ) are the Gaussian "white noise" with zero means, and their intensities are Gc , GÑ . The intensities are mutually correlated as KсÑ ( τ) = E (с (t )Ñ (t + τ) )= GсÑ δ (τ ) . It should be noted, that the multiplicative It limits the application of the AFA method (Kovalenko, 1993). Kovalenko (2004) suggests two solutions, and we will introduce them in a further paper.

A1.3 The Fokker-Plank-Kolmogorov equation and simplifications
The Fokker-Plank-Kolmogorov equation can be applied to simulate the probability density function (PDF) for the stochastic Q(t) in Eq. (4) (Kovalenko, 1993;Pugachev, 1974): where p (Q,t ) is the PDF of Q at time t; and the drift coefficient ( A ( Q ) ) and diffusion coefficients ( B (Q ) ) are calculated as follows (Kovalenko, 1993;Pugachev, 1974): The integrated within limits from −∞ to + ∞ by Q (however, it is supposed that Q > 0 ): Then, ψ (Q) was replaced with ψ (Q) =Q k , and Eq. (A.8) was written as: For a stationary random process dm k ( t ) / dt=0 , and the drift and diffusion coefficients are constant. Thus, Eq. (A.9) was simplified as follows: For k=1: Further, the summands in Eq. (10-11) were divided by ( 2c + Gc ) , and new notations were introduced as suggested in the work of Kovalenko (1993) and Pugachev et al. (1974): Gc .

A1.3 Notations
There are too many notations used to describe the model's core version 0.2, thus the secondary parameters of equations were grouped by the model behind it. 3) is a simplification of Eq. (A.1) that limits the first order ordinal differential equation. It includes three parameters, a 0 , a 1 and b 0 , and two of them are assumed to be noised. These noised parameters include a constant component (indicated with a bar) and a Gaussian white noise component (indicated with a tilde) with their own intensities.
the lumped parameters of "block box" model, i = 0 and 1 c +c (t) the inverse of runoff coefficient: c is constant component, c ( t ) is the Gaussian "white noise" N +Ñ (t ) precipitation: N is constant component, Ñ (t) is the Gaussian "white noise" Gc , GÑ the intensities of the Gaussian "white noise" GсÑ δ (τ ) the correlation function for the mutually delta-correlated processes c ( t) and Ñ (t )

Annex 2. A short user guide the MARCS model
To set up the model for a single river catchment, the non-central moments should be calculated from historical time series of the annual river runoff rate as well as from a mean value of annual precipitation rate. These values should be placed manually (lines 45-48 in model_core.py located at https://zenodo.org/record/1220096#.WyTXxxxRVhw) as should the ID number of catchment (line 51 of model_core.py). To force the model, the projected mean value of the annual precipitation rate should be evaluated from an output of a climate model, and then the model_core.py can be run in the Unix command line: ./model_core.py XXX (where XXX is the mean of the annual precipitation rate for the projected period). The output of model_core.py is stored in the output file model_GPSCH.txt and included in line with the following format: the ID of catchment, the first non-central moment estimate of annual runoff rate (mm year -1 ) for a reference period, the mean value of annual precipitation rate (mm year -1 ) for a reference period, the coefficient of variation for a reference period, the coefficient of skewness for a reference period, the model parameters c , GÑ , GcÑ , the the first non-central moment estimate of annual runoff rate (mm year -1 ) for a projected period, the mean value of annual precipitation rate (mm year -1 ) for a projected period, the coefficient of variation for a projected period, the coefficient of skewness for a projected period. Notes: m 1r ,m 2 r , m 3r are the moments of runoff as well as the mean of precipitation (N r ) were evaluated from observations. The mean air temperature (T r ) * was not used in the model set up in case of the Iijoki River, however this value allows advancement of the model parametrization .