RRAWFLOW : Rainfall-Response Aquifer and Watershed Flow Model ( v 1 . 11 )

Abstract. The Rainfall-Response Aquifer and Watershed Flow Model (RRAWFLOW) is a lumped-parameter model that simulates streamflow, spring flow, groundwater level, or solute transport for a measurement point in response to a system input of precipitation, recharge, or solute injection. I introduce the first version of RRAWFLOW available for download and public use and describe additional options. The open-source code is written in the R language and is available at http://sd.water.usgs.gov/projects/RRAWFLOW/RRAWFLOW.html along with an example model of streamflow. RRAWFLOW includes a time-series process to estimate recharge from precipitation and simulates the response to recharge by convolution, i.e., the unit-hydrograph approach. Gamma functions are used for estimation of parametric impulse-response functions (IRFs); a combination of two gamma functions results in a double-peaked IRF. A spline fit to a set of control points is introduced as a new method for estimation of nonparametric IRFs. Several options are included to simulate time-variant systems. For many applications, lumped models simulate the system response with equal accuracy to that of distributed models, but moreover, the ease of model construction and calibration of lumped models makes them a good choice for many applications (e.g., estimating missing periods in a hydrologic record). RRAWFLOW provides professional hydrologists and students with an accessible and versatile tool for lumped-parameter modeling.


Lumped vs. distributed models
Hydrologic models that commonly are referred to as "lumped-parameter" or "lumped" models generally have a small number of parameters, each representing a property of the entire hydrologic system; conceptually, many physical processes are lumped into a few parameters.In contrast to lumped models, distributed models discretize the system into small compartments or cells, each of which has several parameters defined.All hydrologic models, however, are lumped to some degree.Models that frequently are considered physically based simulate numerous small-scale physics by lumping these processes into simplified mathematical forms (Beven, 1989).The use of the term "physically based" to describe any hydrologic model, therefore, should be discouraged (Beven and Young, 2013).Both distributed and lumped models, however, have components that can represent different hydrologic processes that can be interpreted in physically meaningful ways (Beven and Young, 2013).For example, the impulse-response function (IRF) estimated in many lumped models represents the physical response to an impulse into the system and provides mechanistic insights into that system, including the peak response time and magnitude and the hydrologic memory of the system (von Asmuth and Knotters, 2004;Beven and Young, 2013;Young, 2013).The IRF could be measured directly at the outflow point (e.g., a spring) if a short, intense recharge event follows a long, dry period.Most commonly, however, the outflow, or system response, results from a series of superposed responses to repeating recharge events, and the lumped model is used to estimate the IRF iteratively and to simulate the system response.
Published by Copernicus Publications on behalf of the European Geosciences Union.
In a comparison of lumped models to distributed models, Reed et al. (2004) concluded that lumped models had better overall performance than distributed models but also cited several other studies indicating that distributed or semi-distributed models may or may not provide improvement over lumped models.In another comparison, Smith et al. (2013) concluded that distributed models provided improvements over lumped models in 12-24 % of the cases tested, depending on the criteria of evaluation.The mixed results of these comparisons indicate that lumped models are a good choice when the objectives do not require a distributed model.
A major advantage of a lumped model is its ease of construction and calibration because of the small number of parameters to estimate and because there is no need to assemble large data sets representing the physical properties of the system.Lumped models are useful for karst aquifers, where the geometry of the conduit network frequently is unknown.Lumped models provide an efficient means to simulate the response to possible future changes in the system input (e.g., precipitation).A lumped groundwater model might be more effective than a distributed model in regard to conditional validation and predictive modeling because of its simplicity, as will be discussed.The primary advantage of distributed models is to simulate the response to possible changes within the system, such as urban development or increased groundwater pumping, for example.The choice to use a lumped or distributed model, therefore, depends on a study's objectives and available resources; a lumped model likely is the better choice when it meets the study's objectives.

RRAWFLOW overview
The Rainfall-Response Aquifer and Watershed Flow Model (RRAWFLOW) is a lumped model that is partially based on unit-hydrograph theory applied to streamflow (Nash, 1959).RRAWFLOW simulates a time-series record for a measurement point of streamflow, spring flow, groundwater level, or solute transport in response to a system input of precipitation, recharge, or solute injection.A preliminary version of RRAWFLOW was developed by Long and Mahler (2013) and used to classify karst aquifers and characterize time-variant systems.This preliminary version also was used by Symstad et al. (2015) to simulate future scenarios of streamflow and groundwater level in a cave in Wind Cave National Park, USA, and by the US Geological Survey to simulate future scenarios of spring flow and groundwater levels (https://nccwsc.usgs.gov/display-project/4f8c652fe4b0546c0c397b4a/52d5615ae4b0f19e63da8647).Although this preliminary version was applied primarily, but not exclusively, to karst, RRAWFLOW is suitable for aquifers and watersheds of any type, and non-karst systems generally are easier to model than karst systems.Convolution, as used in RRAWFLOW, has been applied extensively to non-karst surface-water and groundwater systems (e.g., Nash, 1959;Blank et al., 1971;Delleur and Rao, 1971;Dooge, 1973;Neuman and de Marsily, 1976;Maloszewski and Zuber, 1982;Besbes and de Marsily, 1984;Beven, 1989;Jakeman and Hornberger, 1993;von Asmuth et al., 2002;Reed et al., 2004;von Asmuth and Knotters, 2004;Olsthoorn, 2008;Jurgens et al., 2012;Smith et al., 2013).
The purpose of this paper is to present a new version of RRAWFLOW with added functionality, to make the code publicly available, and to guide users in its operation.New functions in this version include (1) the gamma function for parametric IRFs, (2) a spline curve or straight-line segments fit through a set of control points for nonparametric IRFs, (3) a new option for time-variant systems that uses a continuously changing IRF scale, (4) two methods to determine wet and dry periods, and (5) any user-defined IRF.To my knowledge, the spline-curve method previously has not been used for the IRF.The RRAWFLOW open-source code written in the R language (http://www.r-project.org/index.html)can be downloaded from http://sd.water.usgs.gov/projects/RRAWFLOW/RRAWFLOW.html along with a user's manual, example model, and a quick-start guide for the R novice.
Time-invariant and time-variant systems were described by Jenkins and Watts (1968).For example, Larocque et al. (1998) described high-flow periods exhibiting distinctly different response characteristics from low-flow periods.RRAWFLOW includes several options to simulate timevariant systems that generally are not available for distributed watershed models (e.g., PRMS; http://wwwbrr.cr.usgs.gov/projects/SW_MoWS/PRMS.html).If a distributed model is required for a specific study, RRAWFLOW might be a complimentary exploratory tool to analyze the system's sensitivity to time-variant response characteristics.RRAWFLOW is useful for estimation of missing periods in a hydrologic record and as an educational tool for hands-on instruction of some of the basic principles in hydrology.Several example applications that demonstrate model options and calibration and validation procedures are included herein.Input, output, and calibration files are available from the RRAWFLOW website for one of these examples.

The model
The model's time-step interval is determined by the input data record, which must have equal time steps, and model output is generated for the same time step.Hydrological and meteorological data commonly are available for a daily time step, which is suitable for most simulations over a time frame of months to decades.Time steps shorter than 1 day can be used when high-resolution responses are of interest.Any time-step interval can be used because the equations are not time-unit specific.However, the time step should be equal to or less than the quickest identifiable response of interest; longer time steps will result in a loss of infor- System input 1 = system input is precipitation that results in recharge to the system (Eqs.A1-A4). 2 = system input is recharge estimated outside of RRAWFLOW (skip Eqs.A1-A4).3 = system input is solute concentration (skip Eqs.A1-A4).
System output 1 = system output is groundwater level. 2 = system output is spring flow or streamflow.3 = system output is solute concentration (System input = 3).
Time-variance (TV) option 1 = time-invariant (static) IRF. 2 = wet-period IRF and dry-period IRF are defined separately, and each are time invariant within these respective periods.3 = variable IRF vertical scale, where β in Eq. ( 1) is variable.
Wet-switch option 0 = the wet and dry periods are provided by the user. 1 = wet and dry periods are calculated by RRAWFLOW.Any calendar year when the mean precipitation is above the mean for the entire precipitation record is a wet year, and other years are dry years.2 = wet and dry periods are calculated by RRAWFLOW according to the annual cumulative departure from mean precipitation.
mation about response dynamics (Jakeman and Hornberger, 1993).RRAWFLOW also is independent of specific units for flow, water level, or solute concentration, and the user should maintain unit consistency.Air temperature is always in • C.

Precipitation recharge
Effective precipitation for a watershed is the amount of precipitation that results in streamflow exiting the watershed.This consists of infiltration to groundwater below the root zone that reemerges as streamflow, spring flow, shallow groundwater interflow, and overland runoff.Processes that apply to effective precipitation for watershed modeling also apply to infiltration recharge to groundwater that causes a response in spring flow or groundwater level, except that overland runoff generally does not contribute to groundwater recharge.In RRAWFLOW, the term "recharge" is used for both watershed modeling and groundwater modeling.Methods used in RRAWFLOW to simulate precipitation recharge are described in Long and Mahler (2013), and the equations also are presented in Appendix A herein for convenience and reference from the RRAWFLOW user's manual.

Other recharge options
Recharge estimated outside of RRAWFLOW can be used as model input.For example, this applies to precipitation recharge estimated by a soil-water-balance model (e.g., Westenbroek et al., 2010) or sinking-stream recharge in karst aquifers that can be estimated by methods such as those described by Hortness and Driscoll (1998).This is system-input option 2 (Table 1).

Convolution
Convolution is a time-series operation (Jenkins and Watts, 1968;Smith, 2003) that is commonly used in non-distributed hydrologic models to simulate streamflow, spring flow, or groundwater level in response to recharge (e.g., Nash, 1959;Dooge, 1973;Dreiss, 1989;Olsthoorn, 2008).The use of convolution in modeling also has been described as a linearreservoir model and a transfer-function model (e.g., Nash, 1959;Young, 2013;von Asmuth et al., 2002).The discrete form of the convolution integral for uniform time steps used in RRAWFLOW is where h i-j is the IRF; u j is the input, or forcing function; j and i are time-step indices corresponding to system input and output, respectively; N is the number of time steps in the output record; β j is an optional time-varying IRF scaling coefficient; ϕ i represents the errors resulting from measurement inaccuracy, sampling interval, or simplifying model assump-tions; and d 0 is a hydraulic-head datum used in simulation of groundwater levels.d 0 is the level to which hydraulic head would converge if the local recharge was eliminated.Local recharge is assumed to be the only forcing that results in hydraulic-head fluctuation or that causes hydraulic head to rise above d 0 .The errors ϕ i are not explicitly simulated but are shown in Eq. (1) for clarity.
The quantity i-j represents the delay time from impulse to response, and the IRF represents a distribution of these delay times.In RRAWFLOW, the input function u j can be recharge or input of a solute.The system response y i can be streamflow exiting a watershed, spring flow from a groundwater system, groundwater level, or solute concentration at an outlet.Physically, the IRF is the system response y i per unit impulse of u j and also can be described as the response produced by a system when the input is a delta function (Smith, 2003).Conceptually, convolution is the superposition of a series of IRFs that are initiated at the time of each impulse of u j and are scaled proportionally by the magnitude of the corresponding impulse (Fig. 1).

Solute transport
RRAWFLOW can simulate transport of a solute, similarly to the approach of Maloszewski and Zuber (1982).In this case, the user-provided system input u j is the solute concentration.A variable recharge rate is not considered, i.e., the concentration is not weighted by a variable recharge rate.The response in solute concentration at the outlet of a system is simulated by the convolution integral (Eq. 1) with the IRFs described in the following section.Convolution temporally disperses a system input of a solute, according the IRF characteristics, at the system outlet.This is system-input option 3 (Table 1).

Impulse-response function
The IRF characterizes the relation between system input and output by convolution (Eq. 1) and has been described by other terms, including instantaneous unit hydrograph, transfer function, and kernel (e.g., Nash, 1959;Dreiss, 1989;Berendrecht et al., 2003;Smith, 2003;Jukić and Denić-Jukić, 2006).However, the term "transfer function" should only be applied to the Fourier transform of the IRF (Smith, 2003).The IRF of a hydrologic system can be approximated by a parametric function, where its shape is defined by one or more parameters, or a nonparametric function that is not constrained by common curve types.

Parametric IRFs
Parametric functions that have been used to approximate the IRF for hydrologic systems include exponential, lognormal, and gamma functions (Nash, 1959;Besbes and de Marsily, 1984;Jakeman and Hornberger, 1993;von Asmuth et al., 2002;Berendrecht et al., 2003;von Asmuth and Knotters, 2004;Long, 2009;Long and Mahler, 2013).The gamma function is equivalent to the Pearson type III function: the three parameters of the Pearson type III function can be combined into the two parameters of the gamma function (Haan, 2002).Estimation of parametric IRFs generally consists of model-calibration techniques to optimize the parameters with the aim of minimizing the difference between the observed and simulated system response, i.e., fitting the model.The parametric functions previously described (other than Pearson type III) have one or two of these fitting parameters.As the number of fitting parameters increases, the risk of over-fitting the model also increases, i.e., fitting the errors ϕ in Eq. (1).
For a parametric approximation of the IRF, RRAWFLOW uses the gamma function: where λ and η are unitless shape parameters, and the mean and variance are η/λ and η/λ 2 , respectively.Equation ( 3) is approximated in RRAWFLOW by the discrete form where t is time centered on each discrete time step, t 0 and N are time centered on the initial and final time steps, respectively, and t is the time-step duration.The gamma function can produce a variety of shapes, including exponential (η = 1), reverse-J (η < 1), and positively skewed shapes with a peak at t = (η − 1)/λ (Fig. 2; Haan, 2002).The gamma function can produce nearly identical shapes to those of the lognormal function when η > 1 and, therefore, can produce nearly all possible shapes of the exponential and lognormal functions combined when η ≥ 1, plus the additional reverse-J shape when η < 1.The RRAWFLOW option to use parametric IRFs is specified as IRF type 1 (Table 1).
The gamma function (Eq.2), which has an area under its curve of unity, requires the additional scaling coefficient ε for use as the IRF in many hydrologic applications: where ε (unitless) compensates for hydrologic systems that do not have a one-to-one relation between system input and output (Olsthoorn, 2008).For example, if (1) the system input u j is in cubic meters per day of recharge, (2) the system response y i is spring flow with the same units, and (3) 100 % of this recharge emerges as spring flow with nothing else contributing to spring flow, then ε would be set to unity.
For most other hydrologic applications, ε would not equal unity.Similarly, if 100 % of a solute entering the system does not exit the system at the observation point, then the area under the IRF should be less than unity (ε < 1).Maloszewski and Zuber (1982) simulated solute transport with IRFs that were approximated by the exponential or dispersion-model functions.The gamma function has a similar shape to that of the dispersion-model function and could be used as an approximation of the dispersion-model function, or the exact dispersion-model function can be provided to RRAWFLOW as a user-defined IRF.RRAWFLOW allows the use of as many as two superposed gamma functions, herein referred to as double-gamma IRFs, to produce additional IRF shapes such as a doublepeaked curve; several examples of these are shown in Long and Mahler (2013), except with different combinations of lognormal and exponential functions.Approaches similar to this have been used to represent the components of quick flow and slow flow in watershed modeling (Jakeman and Hornberger, 1993) and for conduit and diffuse flow in karst systems (Pinault et al., 2001;Long, 2009;Long and Mahler, 2013).In these examples, each parametric function repre-sents one of two flow components.The use of a doublegamma IRF also might be useful when a single function cannot produce the necessary IRF approximation (e.g., an extralong tail).The scaling coefficient ε can be set to different values for the two gamma functions, e.g., to allow for a larger component of slow flow than of quick flow.

Nonparametric IRFs
The process of determining an unknown IRF from observed system input and output data is known as deconvolution (e.g., Neuman and de Marsily, 1976).To define a nonparametric IRF, an ordinate value is defined for each time step, and any shape desired is possible.Deconvolution methods include Fourier harmonic time-series analysis (Blank et al., 1971;Delleur and Rao, 1971), linear programming (Neuman and de Marsily, 1976), and time-moment analysis (Dreiss, 1989).Estimations of nonparametric IRFs by model calibration include those described by Pinault et al. (2001) and Jukić and Denić-Jukić (2006).A potential problem with nonparametric IRFs is that hundreds or even thousands of IRF ordinates may be needed to define the IRF, depending on the IRF length and time step.Optimization of each individual ordinate would result in a mathematically underconstrained and over-fit model.An extreme example of over-fitting is to determine the IRF by means of deconvolution in the frequency domain (Smith, 2003) that results in a numerically perfect model fit but also an IRF that commonly is highly oscillatory and cannot be explained physically (Blank et al., 1971;Delleur and Rao, 1971) because the errors ϕ (Eq. 1) are included in the fitting process.Filtering the IRF in the frequency domain (i.e., transfer function; Smith, 2003) or smoothing the IRF in the time domain (Long and Derickson, 1999) are options for IRF estimation by Fourier analysis, which may require trial-anderror calibration.Further, an over-fitted model results in a poor model fit when tested on a conditional validation period that was sequestered from the fitting process.Pinault et al. (2001), Jukić and Denić-Jukić (2006), and Ladouche et al. (2014) described different methods to constrain the nonparametric IRF and reduce the number of fitting parameters.
The method proposed herein uses a small number of ordinates to define a smoothly shaped nonparametric IRF: ordinates of the IRF are defined at spaced intervals (IRF control points), and a spline curve is fit through these points (Fig. 3) (IRF type 2, Table 1).Another option is to apply straight-line segments connecting the control points (IRF type 3, Table 1).Similar to parametric IRFs, these two nonparametric options are convenient for the estimation of the IRF through model calibration and conditional validation because of the ability to control the number of fitting ordinates.If a model is suspected of having been over-fit, the number of control points should be reduced; this consists of increasing the controlpoint intervals, resulting in a smoother shape, or by reducing the tail length by setting posterior control points to zero.Trial and error generally is required to determine the opti- mum number of control points for a given application.The minimum number of control points is two: at least one to define the non-zero part of the curve and one to define where the function becomes zero.
Another option allows a predefined IRF to be supplied to RRAWFLOW if the IRF is determined by some other method (IRF type 4, Table 1).The scaling coefficient ε is not used for nonparametric IRFs because the area is defined by the ordinate values.

Linearity and time variance
The terms "linear", "nonlinear", "time variant", and "time invariant" may cause some confusion.Estimated recharge (Appendix A) is a nonlinear process, where recharge as a fraction of precipitation varies with antecedent soil-moisture conditions.Convolution (Eq.1), which simulates the system response to recharge, is a linear system (Jenkins and Watts, 1968;Dooge, 1973).This linear system can be either time variant or time invariant, depending on whether or not the IRF changes with time (Jenkins and Watts, 1968).Most commonly, a time-invariant (i.e., static) IRF is assumed in hydrologic convolution models (e.g., von Asmuth et al., 2002;Denić-Jukić and Jukić, 2003).In many hydrologic systems, however, the IRF changes with changing climatic conditions, resulting in a change in response characteristics (Larocque et al., 1998;Long and Mahler, 2013).Additional details and examples of time-variant IRFs for hydrologic applications include Pinault et al. (2001), Jukić and Denić-Jukić (2006), and Long and Mahler (2013).

Time-variance (TV) options
RRAWFLOW has three options for time variance in convolution (Table 1).In TV option 1, the IRF is time invariant, or static.TV option 2 applies a time-variant IRF, similarly to the method proposed by Long and Mahler (2013), which uses a minimal number of fitting parameters but also repre-sents the dominant transient characteristics of the system.In this method, the system-input record is separated into climatically wet or dry periods.One IRF represents all of the wet periods, and the other represents all of the dry periods.The IRF scaling variable β (Eq. 1) is set to unity for TV options 1 and 2.
All of the parametric and nonparametric IRF-type options previously described can be used in TV option 2 (Table 1).An advantage of this method is that both the size and shape of the IRF can change, while the fitting parameters are kept to a minimum, because IRFs are not defined continuously but rather for two different periods only.A potential disadvantage of this method is that the IRF changes abruptly between wet and dry periods; however, this was not a detrimental factor for several models in which this method was applied (Long and Mahler, 2013).Also, the superposition of many responses applied in convolution results in smooth transitions in the simulated response between wet and dry periods.Jukić and Denić-Jukić (2006) proposed a similar time-variant approach, where three different IRFs were applied to one of three different hydrologic periods determined by an index of antecedent recharge.Pinault et al. (2001) varied the IRF's vertical scale continuously with hydraulic head.However, because hydraulic head also is used for model calibration, this approach cannot undergo conditional validation or be used to simulate periods without observed system-response data, e.g., future periods that might be simulated with climate projections.TV option 3 in RRAWFLOW (Table 1) is similar to the approach of Pinault et al. (2001), except that the IRF scaling variable β (Eq. 1) varies according to the input for convolution u j (e.g., recharge) by where x j is the moving average (MA) of u j (Eq. 1) that is scaled to range from 0 to 1, and m determines the range of β.An MA of u j is used so that the IRF transitions smoothly.Generally, β is assumed to vary directly with x (m > 0).Advantages of this method are that it requires fewer fitting parameters than TV option 2 and the IRF does not change abruptly; the disadvantage is that only the vertical scale of the IRF changes, whereas the shape is static.All of the parametric and nonparametric IRF-type options previously described can be used in this option.TV option 3 has longer run times than TV options 1 or 2 because of the additional computation required, mainly within the convolution loop.
For time-invariant systems, the cross-correlation function (CCF) has the same shape as the IRF but only if the input to the convolution process is completely random (Jenkins and Watts, 1968).If the convolution input has a strong autocorrelation, typical of recharge in hydrologic systems, then there is large error in using the CCF to estimate the IRF (Jenkins and Watts, 1968;Bailly-Comte et al., 2011) and therefore should be avoided.

Determining wet and dry periods
RRAWFLOW includes two options to determine wet and dry periods on the basis of the precipitation input record when using TV option 2. The first option assigns each calendar year to a wet period if the annual mean precipitation is greater than the overall mean P mean for the entire input record, and other years are set to dry years (wet-switch option 1, Table 1).The second option sets wet and dry periods according to the slopes of a cumulative precipitation function in which upward or downward slopes indicate wet or dry periods, respectively (wet-switch option 2, Table 1).This option (1) calculates a record, at the model time step, of the cumulative departure from P mean , (2) calculates the annual mean cumulative departure (CD mean ) from this record, (3) sets each time step within a year to wet if an increase in CD mean from the previous year occurred, and (4) sets all other periods to dry.Finally, this record of wet and dry time steps is shifted backward in time by 183 time steps, by default, which is 6 months if daily time steps are used.The reason for this shift is that the wet periods for option 2 tend to lag behind those of option 1 if no shift is applied.However, because the two methods are calculated differently, wet periods from option 2 can begin either before or after those of option 1, depending on the year; the same is true for the beginning of dry periods.The shift can be changed in the RRAWFLOW code if desired by editing the parameter "shift."A third option allows the user to provide a record of wet and dry periods in the model input (wet-switch option 0, Table 1).

Model outputs
Model outputs consist of time series for simulated system response y i , the dry-period and wet-period IRFs (if using TV option 2), the soil-moisture index s i (Eq.A1, Appendix A), and the input to convolution u j .Other outputs consist of a coefficient of efficiency E to measure the similarity between simulated and observed system response (residuals) and the hydrologic memory of the system.This system memory is the time that the response to an impulse effectively persists, which is defined by the length of the IRF.Because the gamma function is asymptotic and has infinite length, system memory is arbitrarily defined in RRAWFLOW as time t m on the IRF time scale at which 95 % of the curve area is in the range 0-t m .

Evaluating model fit and over-fitting
The calibration period is the period of the data record used to calibrate the model.By default in RRAWFLOW, the conditional validation period is the part of the data record following the calibration period that is used to test the model calibration against system-response observation data not used in calibration (i.e., model prediction of streamflow or spring flow).Assessing the conditional validation period is an in-dication of the expected model performance to predict a future period on the basis of climate simulations, for example.Moreover, this assessment indicates if the model is being over-fit.This validation is considered conditional because the model cannot yet be tested against additional observational data that will be available in the future (Beven and Young, 2013).RRAWFLOW calculates a modified form of the Nash-Sutcliffe coefficient of efficiency (Nash and Sutcliffe, 1970;Legates and McCabe, 1999) to quantify model fit, as proposed by Long and Mahler (2013).This modification calculates the coefficient of efficiency E for a partial period, either calibration or conditional validation, in a manner that allows the two periods to be compared directly: where y obs and y sim are time series of the observed and simulated system responses, respectively; y mean is the mean value of y obs ; the subscripts p and T refer to the partial and total periods, respectively; and l is the time length of the respective period.Conceptually, E is the ratio of the magnitude of model residuals (numerator) to the overall variability in the observation record (denominator) subtracted from unity and theoretically can vary from −∞ (poorest fit) to unity (perfect fit).
In addition to quantifying model fit, E provides a useful way to evaluate possible over-fitting of the model.Although model fit for the calibration period might improve as parameters are added, if the validation period indicates that this added complexity is not helpful, the model has been over-fit (von Asmuth et al., 2002).To test this condition, E is calculated for the calibration and conditional validation periods separately (E cal and E val ) by the modified Nash-Sutcliffe coefficient of efficiency (Eq.7), which makes E cal and E val directly comparable (Long and Mahler, 2013).This method is particularly important for comparison of two periods with different fluctuation amplitudes.
Legates and McCabe (1999) describe limitations of correlation-based measures to quantify model fit, such as the coefficient of determination R 2 , and the benefits of the Nash-Sutcliffe coefficient of efficiency and the index of agreement.The sum of the squared and weighted residuals (Doherty, 2005) is another useful metric used for this purpose.Hartmann et al. (2013) provides an example of using multiple metrics to evaluate model performance.
A value of E cal that is much larger than E val might indicate over-fitting, in which case a simpler model (i.e., fewer fitting parameters) should be tested.For example, if a doublegamma IRF is used, then a second model calibration with a single-gamma IRF could be tested to determine if greater similarity in the E val and E cal values is achieved.For nonparametric IRFs, a reduction in the number of IRF control points could be tested.A time-variant IRF requires more parameters than a time-invariant IRF, and this also can be The number of gamma functions or fitting parameters might correspond to different conceptual models of the system, and model complexity issues can be investigated by testing multiple conceptual models (e.g., Hartmann et al., 2013).Numerous other researchers have investigated issues related to model complexity and its effect on model-prediction uncertainty (e.g., Young et al., 1996;Jakeman and Hornberger, 1993;Arkesteijn and Pande, 2013).Prediction uncertainty crucially depends on model complexity (Arkesteijn and Pande, 2013).Although Vapnik-Chervonenkis generalization theory suggests that models with higher complexity tend to have higher prediction uncertainty, model complexity is not necessarily proportional to prediction uncertainty (Fienen et al., 2010).Doherty et al. (2010) described a method for predictive uncertainty and sensitivity that tests the range of each parameter's potential values on the basis of expert knowledge and propagates this uncertainty to model predictions.Estimating this potential range of IRF parameter values, however, might be more difficult than, for example, estimating hydraulic conductivity or streambed roughness in a distributed model.Although a rigorous assessment of prediction uncertainty is beyond the scope of this article, effective tools are available for this purpose (Doherty, 2005;Fienen et al., 2010).

Example model applications
The model was applied to three hydrologic systems in the United States with responses of streamflow, spring flow, and groundwater level.Several examples with different RRAWFLOW options and different levels of parameterization are described, including examples of model over-fitting.The hydrologic systems were selected to provide a wide range of examples that required different levels of model complexity.The first hydrologic system is streamflow from a watershed, for which a simple model was appropriate.Karst settings were selected for the second and third hydrologic systems to provide examples in which more complex models are needed.For precipitation and air-temperature inputs, gridded data (e.g., Daymet: http://daymet.ornl.gov/)can be used, or a single weather station can be assumed to represent the recharge area.All examples used a daily time step.
Model spin-up is the initial simulation period in which antecedent effects of the system are not fully incorporated into the simulation, which can result in large errors.When the simulation is past the number of time steps equal to the system memory, then the system antecedent effects are fully incorporated into the model.Therefore, the model input record must start n time steps prior to the calibration period, where n is the system memory, as a number of time steps.Because the system memory is not known until the IRF is estimated, it is useful to start the simulation at the earliest date for which input data are available.Estimated system-input values can be used if observation data are not available for the spin-up period, and a constant value equal to the long-term mean can be used if a better estimate is not available; in this case, the antecedent effects will be smoothed.Another option is to select a period from the input data record and use this as input for the spin-up period.
The parameter optimization software PEST (Doherty, 2005) was used for parameter estimation in these examples.RRAWFLOW is a stand-alone model independent of PEST and, therefore, can be used with any optimization method, including trial and error.For optimization of nonparametric IRFs, the last control point was used to set the system memory by assigning a fixed (non-optimized) value of zero to that control point (Fig. 3).Posterior to this point, a series of control points fixed at zero was specified, resulting in a spline fit with a constant value of zero.

Streamflow in Boxelder Creek
Boxelder Creek is located in the Black Hills of South Dakota, USA, with a watershed area of 250 km 2 upstream from US Geological Survey streamgage 06422500, with daily streamflow available from http://waterdata.usgs.gov/nwis.The watershed primarily is pine forest and contains metamorphic rocks of Precambrian age (Carter et al., 2001).Gridded daily precipitation and air-temperature data from the Daymet data set are available at 1 km grid spacing for 1980-2013.These data were obtained from the Geo Data Portal (http: //cida.usgs.gov/gdp/)and spatially averaged for the watershed to produce a daily time series of precipitation and air temperature for 1980-2013, which was used as model input.The calibration period was 1980-1996, and conditional validation was applied to 1997-2013 (Fig. 4) (Fig. 4).A 5-year model spin-up period was applied by inserting data for 1980-1984 into the period 1975-1979.This estimated spin-up period affected the calibration period minimally, because the system memory was only about 3 months long.
The example models described used system-input option 1 (precipitation recharge, Table 1).Five example models are presented for Boxelder Creek, all of which used singlegamma IRFs (Table 2).All other trials with double-gamma IRFs resulted in minimization of one of the IRFs, indicating that single-gamma IRFs were appropriate for this system.All Boxelder Creek gamma functions optimized to η < 1, the reverse-J shape (Fig. 5).Example BC1 used TV option 1 (time-invariant IRF, Table 1), resulting in E cal and E val values of 0.62 and 0.46, respectively (Table 2).Examples BC2 and BC3 used TV option 2 (time-variant IRF) with wet-dry options 1 and 2, respectively.For BC3, E val was higher (0.56) than for BC1 and BC2 (Table 2).Example BC4 and BC5  used TV option 3, in which a time-variant IRF that changes continually was used with a single-gamma IRF and MA windows of 1 and 10 years (Eq.6), respectively.Of the five examples, BC3 had the highest E val value; BC5 had the second highest E val value but with fewer parameters (Table 2).
Comparison of model fit for examples BC4 and BC5 indicates that the time-variant aspects of this system respond to general climatic changes over decadal periods more so than annual.Table 3 shows optimized IRF parameters for selected example models.

Spring flow from Barton Springs
Barton Springs is a group of springs that flow from the Edwards aquifer, a carbonate aquifer in south-central Texas that is contained mostly within the Edwards Group (Lower Cretaceous geologic age).Model input data consisting of daily precipitation and air-temperature and system-response observation data used for model evaluation are described in Long and Mahler (2013) along with details describing the hydrogeology, physiography, and climate.These example models used system-input option 1 (precipitation recharge, Table 1).Seven example models are presented for Barton Springs (Table 2).Model fit varied more widely than for the Boxelder Creek models, possibly as a result of karst features in the Edwards aquifer that result in complex groundwater flow.For this reason, IRFs with added complexity were tested for the Barton Springs examples, and examples of over-fitting are demonstrated.Generally, E cal is proportional to the number of optimized parameters for each example; however, high E cal values often resulted in low values of E val , which might indicate over-fitting (Table 2).
Examples F1-F9 used wet-switch option 2. Examples F1, F5, and F7 used gamma functions with an increasing number of parameters for the three examples, which were characterized as low E val (F1), good E val (F5), and an over-fit model (F7) that is indicated by a high E cal and low E val (Table 2).Example F5, with a moderate number of parameters, is considered the best choice of the three.Examples F8 and F9 used 7 and 16 optimized control points, respectively, in total (Figs.6 and 7); F8 was considered a good choice, and F9 was over-fit with too many control points (Table 2).Of the time-variant examples F5-F9, the two examples with the smallest number of optimized parameters (F5 and F8) had the largest E val values (Table 2).
Examples F2 and F3 used TV option 3, with a singlegamma IRF and MA windows of 1 and 10 years (Eq.6), respectively, but resulted in low E val values.Similarly to Boxelder Creek, increasing the MA window from 1 to 10 years improved E cal and E val values, which indicates that the timevariant aspects of this system respond to general climatic changes over decadal periods more so than annual.
PEST was used to calculate 95 % confidence intervals for the optimized parameters, as described in Doherty (2005), which are shown graphically for example F8 (Fig. 6).Example F9, with a total of 16 control points, had 32 % wider parameter confidence intervals than example F8, with 7 control points, did.Confidence intervals generally widen with an increasing number of parameters because of a decrease in individual parameter sensitivity.

Groundwater level in well LA88C
Well LA88C, located in western South Dakota, is open to the karstic Madison aquifer that is composed of limestone and dolostone of Mississippian geologic age.Model input data consisting of daily precipitation and air-temperature and system-response observation data used for model evaluation are described in Long and Mahler (2013) along with details describing the hydrogeology, physiography, and climate.The example models described used system-input option 1 (precipitation recharge, Table 1).

Wet period
Dry period Examples W1-W3 used TV option 2 with wet-switch option 2 for time variance (Table 1).All three examples for well LA-88C optimized to double-peaked IRFs, which were necessary for this system, as indicated by additional tests of single-peaked IRFs, probably as result of karst features .In example W2, the last two control points for the wet period and the last control point for the dry period were optimized to zero, resulting in only 15 non-zero control points.Of the three examples for well LA88C, W3 had the fewest optimized IRF parameters and resulted in the largest E val value (Table 2), indicating that this is a good choice.IRFs for example W3 approach zero abruptly, resulting in negative values in the spline curve (Fig. 10); in these cases, RRAWFLOW sets all negative IRF ordinates to zero.

Discussion and conclusions
Although RRAWFLOW can be applied to any type of watershed or aquifer, karst aquifers might require more complex models.A non-karst system was compared with two karst systems, which indicated that the best model choices for the karst systems generally had a larger number of parameters than the best choices for the non-karst system (Table 2).Also, differences between wet-and dry-period IRFs were more pronounced for the karst systems than for , possibly as a result of heterogeneity.Example F5 is a karst example with a reverse-J (η < 1) dry-period IRF and a delayed-peak (η > 1) wet-period IRF (Table 3), indicating a distinct difference between wet and dry periods.In karst aquifers, fluctuating groundwater levels might saturate or desaturate different conduit networks that result in different hydrologic responses between wet and dry periods.
Watersheds simulated by Long and Mahler (2013) and Long (2009), which included karst and non-karst systems, had IRF shapes that were similar in some cases to the reverse-J shape for Boxelder Creek (Fig. 5), except that doubleexponential IRFs were used to achieve this shape.A single reverse-J gamma function (Fig. 5) requires only three parameters, whereas the double-exponential IRF requires four parameters.IRFs for karst and non-karst watersheds commonly have quick-flow and slow-flow components (Jakeman and Hornberger, 1993;Long, 2009;Long and Mahler, 2013).The reverse-J IRF (Fig. 5) also exhibits quick-flow and slowflow components in the form of a high peak and long tail, respectively, but fewer parameters for the gamma function is an advantage over the exponential function.
Examples F5 (gamma function) and F8 (control points) are the two preferred models for Barton Springs and are nearly identical in terms of E val and the number of optimized parameters (Table 2).Choosing between these two models, therefore, might be a matter of modeler preference.Use of the gamma function has the advantage of being a common function.The control-points method has the advantage of not being constrained to a parametric function, and confidence intervals for the IRF can be easily shown in a graph (Fig. 6).Showing confidence intervals for a gamma function also could be done but with additional steps involved, in which the gamma function would be calculated for all combinations within the 95 % parameter confidence intervals (i.e., Monte Carlo analysis).Then this family of curves would be plotted, and the maximum upper and lower curve extents would show the confidence intervals for the IRF.A disadvantage of the control-points method is the need to select the temporal locations of control points and also to set the system memory a priori by setting a zero-value control point at the end of the IRF.These settings generally require trial and error.
RRAWFLOW is useful for estimation of missing periods of a hydrologic record and is suitable for hydrographseparation methods to estimate stream base flow, as described by Jakeman and Hornberger (1993) and Long (2009).For the simulated hydrograph, RRAWFLOW can be used to compute the base-flow component by executing the model without the quick-flow IRF.If using a single reverse-J gamma IRF instead of a double-exponential IRF that can be easily separated, the reverse-J function would need to be separated into its quick-flow (peak) and slow-flow (tail) components.To estimate the base-flow component of the observed hydrograph, a graphical separation program can be used, such as PART (Rutledge, 1998); however, because the different options and settings in PART (or similar programs) result in different base-flow estimates, the RRAWFLOW estimated base flow is helpful as a guide to using PART (Long, 2009).For example, the PART settings can be adjusted so that the observed hydrograph separation has similar characteristics to those of the simulated hydrograph separation.The RRAWFLOW-simulated hydrograph separation also could be used as a benchmark model for comparison to more elaborate methods, such as geochemical hydrograph separation (e.g., and Hartmann, 2014).
Comparison of the modified Nash-Sutcliffe coefficient of efficiency for the calibration and conditional validation periods (E cal and E val ) is useful for assessing over-fitting.Conceptual-model options that maximize E val can be evaluated by multiple tests.Too many fitting parameters, as well as too few, can result in low values of E val .The ratio E val /E cal might be a useful metric for comparison of different models and possibly in setting the lengths of the calibration and validation periods.As in any model, this all should be considered in reference to a physical understanding of the system; e.g., two distinct permeability domains might be best simulated by two gamma functions.
The record length of the observed response should be considered in light of the system memory: there is less confidence in the predictive strength of a model if the observed response is shorter than the system memory than when it is longer, because, in the former case, the effects of the IRF tail are not fully tested against observation.Ideally, the validation period alone should be longer than the system memory, and when it is several times longer, then the full range of the IRF is tested several times over.Konikow and Bredehoeft (1992) discuss the numerous uses of the term "validation," which has resulted in confusion, and also highlighted philosophical considerations associated with this term.They argue that conditional validation, as described herein (split-sample test in Konikow and Bredehoeft, 1992), is not useful for distributed groundwater models because of their limited predictive accuracy that results from non-unique solutions in calibration of complex models.Further, they argue that the split-sample test period must be independent of any antecedent effects from the calibration period, which they say rarely can be achieved for a large-scale aquifer system.These arguments highlight an advantage of lumped models, because (1) a small number of parameters minimizes the problem of non-unique solutions, (2) selecting one model from multiple models via a validation process also reduces the problem of non-unique solutions, and (3) a lumped model provides an estimate of the system memory, which indicates the time span for antecedent effects following a calibration period.With regard to this third point, more than one half of the groundwater sites simulated by Long and Mahler (2013) had conditional validation periods that extended beyond the antecedent effects of the calibration period; therefore, E val could be calculated for this restricted period only if desired.
Additional functionality can be added to RRAWFLOW by the user and could possibly be included in future versions.For example, additional methods to estimate parametric or nonparametric IRFs (e.g., the dispersion-model IRF) or the degree-day method for estimating snowmelt (Rango and Martinec, 1995) could be added.If there were a need to include precipitation recharge and sinking-stream recharge simultaneously in one system, this could easily be added.An adjustment to the calculation of the soil-moisture index s could be included to account for watershed changes such as tree coverage.The shape and scale of the solute-transport IRF could be weighted by a variable recharge rate.Revisions, additions, and corrections to the RRAWFLOW code can be sent to the author of this article for potential incorporation into subsequent official versions.The code is not yet available in the comprehensive R archive network (CRAN) but could be included in the future.Optimization packages also are available in CRAN and could be built seamlessly into RRAWFLOW.

Appendix A: Precipitation recharge
To simulate recharge from direct precipitation, a soilmoisture index s (unitless) is estimated for each time step in RRAWFLOW.Quantitatively, s is the fraction of precipitation that infiltrates and becomes recharge.To account for the antecedent effects of rainfall on soil moisture, the past rainfall record is weighted by a backward-in-time exponential decay function (Jakeman and Hornberger, 1993): where c (L −1 ) is a scaling coefficient to constrain the value of s, κ (unitless) adjusts the effect of antecedent rainfall and is related to evapotranspiration, r is total rainfall (L), and i is the time step.In RRAWFLOW, this method is option 1 for system input (Table 1).For watershed modeling, the value of c can be set to satisfy the assumption that the total recharge volume within a watershed is equal to the total outflow volume for the calibration period.This assumption neglects the net change in total watershed storage during this period, which is assumed to be small in comparison to the total inflow or outflow for the same period.Also, this assumption does not apply if recharge to the watershed exits the watershed through deep groundwater and bypasses the stream outlet.Recent rainfall has the largest effect on s in Eq. (A1), whereas earlier rainfall has the least effect.The effect of changing air temperatures on evapotranspiration is accounted for by (Jakeman and Hornberger, 1993): where α (unitless) is a scaling coefficient, T ( • C) is mean air temperature at the land surface, and f is a temperature modulation factor ( • C −1 ).As air temperature T decreases, s in Eq. (A1) increases with sufficient past rainfall.RRAWFLOW can be executed without air-temperature data when unavailable (air-temperature option 2 in Table 1).Recharge for each time step u i (L) is calculated as the fraction s of precipitation by Typically, s is largest during wet periods and rarely reaches a value of 1.0 (Fig. A1).

A1 Considerations for parameters c and κ
An additional function of parameters c and κ is to adjust for differences in the runoff effects between watershed and groundwater modeling.Also, for groundwater applications, c in Eq. (A1) cannot be determined empirically if the recharge area that affects a spring or well is not precisely defined.Therefore, for groundwater applications, c can be set to a value that results in a predefined maximum s value or estimated mean recharge rate, or c can be optimized through model calibration.In practice, the error in the estimation of c is compensated by an adjustment in the IRF area during model calibration; e.g., an overestimation of c by 10 % would result in a 10 % underestimation in IRF area.
Depending on the values of c and κ, the value of s can incorrectly have values < 0 or > 1; when this occurs, RRAWFLOW sets s to 0 or 1, respectively.This is most likely to occur early in the calibration process when parameter values might be far from optimum, and forcing the constraint 0 ≤ s ≤ 1 assists in the efficiency of the calibration process.To ensure that the range of s is appropriate for the model area, this parameter should always be plotted after model calibration; i.e., s should be a physically plausible function that fluctuates in response to local precipitation and air temperature (Fig. A1).For example, in humid climates with high annual precipitation, s might frequently have a value > 0.9, which is less likely in dry climates.

A2 Snow precipitation
For cold climates where winter snowfall is common, a method proposed by Long and Maher (2013) is applied.To determine the form of precipitation for each time step, an airtemperature threshold value T s is set, below which precipitation is assumed occur as snow (typically T s = 0 • C).To determine time steps when melting occurs, a melting threshold value T m is set.If daily snow-depth data are available, T m can be determined empirically as the mean air temperature for time steps when snow depth decreases to zero from a previous time step with a snow depth greater than zero.Long and Mahler (2013) determined that T m = 9 • C for a study area in central North America.Sublimation is accounted for by a sublimation fraction S f .Snow precipitation is summed for each series of snow-precipitation time steps occurring prior to each snowmelt time step by where p m is the accumulated snow precipitation that is assumed to melt when T i > T m , S f is the sublimation fraction (unitless), p i is the snow precipitation in height of water, and N is the number of snow-precipitation time steps occurring between melt time steps.Prior to calculating Eq. (A1), p m is added to the rainfall record for the time step following a snowmelt time step because snowmelt is assumed to have the same effect as rainfall on the value of s. available at http://sd.water.usgs.gov/projects/RRAWFLOW/RRAWFLOW.html.The model included is example BC3 for Boxelder Creek (Table 2).It is not necessary to know the R language to execute the model, but R must be installed on the user's computer.The example is set up to run on the Microsoft ® Windows operating system but could be slightly modified to run on a Linux operating system.RRAWFLOW is in the public domain, and no license is needed.
Although the example model is set up for parameter optimization using the PEST software program (Doherty, 2005), RRAWFLOW can be used with any optimization routine, including trial and error.All RRAWFLOW input and output files are included along with PEST input, output, and executable files.The file 00_Quick_Start_Guide.pdf in the download package contains instructions for executing RRAWFLOW in the R environment and basic instructions for PEST execution for this example.The RRAWFLOW manual has detailed input and output instructions.The download package can be used as a template for a new modeling project by editing the input files accordingly.The R language program and PEST can be downloaded at no cost from http://www.r-project.org/index.html and http://www.pesthomepage.org/,respectively.

Figure 1 .
Figure1.Superposition of sequential impulse-response functions (IRFs).Each IRF is in response to an impulse of the input function u j and is scaled by the magnitude of that impulse.

Figure 3 .
Figure 3. Impulse-response function defined by a spline curve fit through control points.The last control point with a value of zero is not adjusted during model calibration.

Figure 6 .
Figure 6.Example F8: nonparametric impulse-response functions (IRFs) for Barton Springs using a total of seven control points showing (a) optimized IRFs and (b) upper and lower 95 % confidence limits for the IRFs.

Figure 7 .
Figure 7. Example F9: nonparametric impulse-response functions for Barton Springs using a total of 16 control points.

Figure 10 .
Figure 10.Example W3: nonparametric impulse-response functions for well LA88C using a total of 10 control points.

Figure A1 .
Figure A1.The soil-moisture parameter s (Eq.A1) for example BC2.The function s typically is largest during wet periods (shaded gray).
Thoroughly considering model complexity in light of E val and E cal provides context for conditional validation.For example, Long and Mahler (2013) described decision criteria to evaluate model complexity and the number of fitting parameters on the basis of E cal and E val , with several examples of calibrated models.

Table 2 .
Summary of example models.Rows in bold type indicate the best choices (-indicates not applicable).

Table 3 .
Impulse-response function parameters for selected example models.