Shyft v4.8: A Framework for Uncertainty Assessment and Distributed Hydrologic Modelling for Operational Hydrology

This paper presents Shyft, a novel hydrologic modelling software for streamflow forecasting, targeted for use in hydropower production environments and research. The software enables rapid development and implementation in operational settings, and the capability to perform distributed hydrologic modelling with multiple model and forcing configurations. Multiple models may be built up through the creation of hydrologic algorithms from a library of well known routines or through the creation of new routines, each defined for processes such as evapotranspiration, snow accumulation and melt, and soil water response. Key to the design of Shyft is an Application Programming Interface (api) that provides access to all components of the framework (including the individual hydrologic routines) via Python, while maintaining high computational performance as the algorithms are implemented in modern C++. The api allows for rapid exploration of different model configurations and selection of an optimal forecast model. Several different methods may be aggregated and composed, allowing direct intercomparison of models and algorithms. In order to provide enterprise level software, strong focus is given to computational efficiency, code quality, documentation and test coverage. Shyft is released Open Source under the GNU Lesser General Public License v3.0 and is available at https://gitlab.com/shyft-os, facilitating effective cooperation between core developers, industry, and research institutions.


Other frameworks
To date, a large number of hydrological models exist, each differing in input data requirements, level of detail in process representation, flexibility in the computational subunit structure, and availability of code and licensing. In the following we provide a brief summary of several models that have garnered attention and a user community, but ultimately were found not optimal for the purposes of operational hydrologic forecasting at Statkraft.

Originally aiming for incorporation in General Circulation Models, the Variable Infiltration Capacity (VIC) model (Liang et al., 1994; Hamman et al., 2018) has been used to address topics ranging from water resources management to land-atmosphere interactions and climate change. In the course of its development history of over 20 years, VIC has served as both a hydrologic model and land surface scheme. The VIC model is characterized by a grid-based representation of the model domain, statistical representation of sub-grid vegetation heterogeneity, multiple soil layers with variable infiltration, and non-linear base flow. Inclusion of topography allows for orographic precipitation and temperature lapse rates. Adaptations of VIC allow the representation of water management effects and reservoir operation (Haddeland et al., 2006a, b, 2007). Routing effects are typically accounted for within a separate model during post-processing.

Directed towards use in cold and seasonally snow covered small to medium sized basins, the Cold Regions Hydrological Model (CRHM) is a flexible object-oriented software system. CRHM provides a framework that allows the integration of physically-based parametrizations of hydrological processes. Current implementations consider cold region specific processes such as blowing snow, snow interception in forest canopies, sublimation, snowmelt, infiltration into frozen soils, and hillslope water movement over permafrost (Pomeroy et al., 2007).
CRHM supports both spatially-distributed and aggregated model approaches. Due to the object oriented structure, CRHM is used as both a research and predictive tool that allows rapid incorporation of new process algorithms. New and already existing implementations can be linked together to form a complete model. CRHM is, however, somewhat more restricted and limited than other tools addressing the multiple working hypotheses. As discussed in Section 1.1, several such software solutions exist; however, for different reasons these were found not suitable for deployment.
In some cases the software is simply not readily available or suitably licensed. In others, documentation and test coverage were not sufficient. Most prior implementations of the multiple working hypotheses have a focus on the exploration of model uncertainty or provide more complexity than required, thereby adding data requirements. While Shyft provides some mechanisms for such investigation, we have further extended the paradigm to enable efficient evaluation of multiple forcing datasets, in addition to model configurations, as this is found to drive a significant component of the variability.
Notable complications arise in continuously operating environments. Current IT practices in the industry impose severe constraints upon any changes in the production systems, in order to ensure required availability and integrity. This challenges the introduction of new modelling approaches, as service level and security are necessarily prioritised above innovation. To keep pace with research, the operational requirements are embedded into the automated testing of Shyft. Comprehensive unit test coverage provides proof for all levels of the implementation, whilst system and integration tests give objective means to validate the expected service behavior as a whole, including validation of known security considerations. Continuous integration aligned with an agile (iterative) development cycle minimizes the human effort required to reach the appropriate quality level. Thus, adoption of modern practices balances strict IT demands with the motivation for rapid progress. Furthermore, C++ was chosen as the programming language for the core functionality. In spite of a steeper learning curve, templated code provides long term advantages for reflecting the target architecture in a sustainable way, and the detailed documentation gives a comprehensive explanation of the possible entry-points for new routines.
One of the key objectives was to create a well defined api, allowing for an interactive configuration and development from the command line. In order to provide the flexibility needed to address the variety of problems met in operational hydrologic

Hot service
Perhaps the most ambitious principle is to develop a tool that may be implemented as a hot service. The concept is that rather than model results being saved to a database for later analysis and visualization, a practitioner may request simulation results for a certain region at a given time by running the model on the fly without writing results to file. Furthermore, perhaps one would like to explore slight adjustments to some model parameters, requiring recomputation, in real time. This vision will only be actualized through the development of extremely fast and computationally efficient algorithms.
The adherence to a set of design principles creates a software framework that is consistently developed and easily integrated into environments requiring tested, well commented/documented, and secure code.

Architecture/Structure
Shyft is distributed in three separate code repositories and a 'docker' repository as described in Section 6.
Shyft utilizes two different codebases (see the overview given in Figure 1). Basic data structures, hydrologic algorithms, and models are defined in Shyft's core, which is written in C++ in order to provide high computational efficiency. In addition, an api exposes the data types defined in the core to Python. Model instantiation and configuration can therefore be performed from pure Python code. In addition, Shyft provides functionalities that facilitate configuration and realization of hydrologic forecasts in operational environments. These functionalities are provided in Shyft's orchestration and are part of the Python codebase.
As one of Shyft's design principles is that data should live at the source rather than Shyft requiring a certain input data format, data repositories written in Python provide access to data sources. In order to provide robust software, automatic unit tests cover large parts of both codebases. In the following Section, details of each of the architectural constructs are given.

Core
The C++ core contains several separate code folders: core, for handling framework related functionality such as serialization and multithreading; timeseries, aimed at operating with generic timeseries; and hydrology, containing all the hydrologic algorithms, including structures and methods to manipulate spatial information. The design and implementation of the models aims for multi-core operation, to ensure utilization of all computational resources available. At the same time, design considerations limit the use of class-hierarchies and inheritance. The goal of faster algorithms is achieved via optimizing the composition, enabling multi-threading, and the ability to scale out to multiple nodes.

Shyft api
The Shyft api exposes to Python all relevant Shyft core implementations that are required to configure and utilize models. The api is therefore the central part of the Shyft architecture that a Shyft user is encouraged to focus on. An overview of fundamental Shyft api types and how they can be used to initialize and apply a model is shown in Figure 2.
A user aiming to simulate hydrological models can do this by writing pure Python code without ever being exposed to the C++ codebase. Using Python, a user can configure and run a model, and access data at various levels such as model input variables, model parameters, and model state and output variables. It is of central importance to mention that as long as a model instance is initiated, all of this data is kept in the Random Access Memory of the computer, which allows a user to communicate with a Shyft model and its underlying data structures using an interactive Python command shell such as the Interactive Python (IPython) shell (Figure 3). In this manner, a user could for instance interactively configure a Shyft model, feed forcing data to it, run the model, and extract and plot result variables. Afterwards, as the model object is still instantiated in the interactive shell, a user could change the model configuration, e.g. by updating certain model parameters, re-run the model, and extract the updated results. The fundamental api types and methods include:

Time Axis (TimeAxis)
Characterized by a set of ordered non-overlapping periods.

Time Series (TimeSeries)
Provides a value for all the time intervals defined by the time axis and the point interpretation.

Point Interpretation (point_interpretation_policy)
Defines how a value is to be interpreted with respect to a time interval.

Goal function
The function that it is desired to maximize or minimize during calibration.

Area (float)
Area of the cell.

Initialize cell environment (Model.initialize_cell_env(…))
Initializes the cell environment of each cell.

Interpolate (Model.interpolate(…))
Interpolates forcing variables in time and space from the region environment to the cell environment of each cell.

Simulate (Model.run_cells(…))
Executes the model cell by cell based on model forcing in cell environments.

Calibrate (Optimizer.optimize(…))
Runs simulations while varying model parameters. Aims to minimize the objective function.

This design provides a level of flexibility at the user level that facilitates the realization of a large number of different operational setups. Furthermore, using Python offers a Shyft user access to a programming language with intuitive and easy to learn syntax, wide support through a large and growing user community, over 300 standard library modules that contain modules and classes for a wide variety of programming tasks, and cross-platform availability.
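The call sequence summarized above can be illustrated with a minimal, self-contained mock; the class and method names mirror the api table, but the internals are placeholders for illustration only, not Shyft's implementation:

```python
class MockRegionModel:
    """Illustrates the call order of the api table; not Shyft's internals."""

    def __init__(self, n_cells):
        self.n_cells = n_cells
        self.cell_env_ready = False
        self.forcing = None
        self.discharge = None

    def initialize_cell_env(self, time_axis):
        # allocate a forcing container per cell for each step of the time axis
        self.time_axis = time_axis
        self.cell_env_ready = True

    def interpolate(self, region_env):
        # distribute geo-located sources onto the cells (IDW/kriging in Shyft)
        assert self.cell_env_ready
        self.forcing = [region_env["precip_mm"] for _ in range(self.n_cells)]

    def run_cells(self):
        # cell-by-cell execution of the model stack (multithreaded in Shyft)
        self.discharge = [sum(f) for f in self.forcing]


model = MockRegionModel(n_cells=4)
model.initialize_cell_env(time_axis=range(3))
model.interpolate({"precip_mm": [1.0, 2.0, 0.5]})
model.run_cells()
print(model.discharge)  # one aggregated value per cell
```

In an interactive shell, the analogous Shyft model object remains in memory between calls, so parameters can be adjusted and run_cells invoked again without reloading data.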

All Shyft classes and methods available through the api follow the documentation standards introduced in the Guide to NumPy/SciPy Documentation. Here we give an overview of the types typically used in advanced simulations via the api. A comprehensive set of examples is available at https://gitlab.com/shyft-os/shyft-doc/tree/master/notebooks/api.
shyft.time_series provides mathematical and statistical operations and functionality for time series. A time-series can be an expression, or a concrete point time-series. All time-series have a time-axis (TimeAxis, a set of ordered non-overlapping periods).

Shyft accesses data required to run simulations through repositories (Fowler, 2002). The use of repositories is driven by the aforementioned design principle to have a "direct connection to the data store". Each type of repository has a specific responsibility, a well defined interface, and may have a multitude of implementations of these interfaces. The data accessed by repositories usually originates from a relational database or well known file formats. In practice, data is never accessed in any other way than through these interfaces, and the intention is that data is never converted into a particular format for Shyft.
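The time-axis and point-interpretation semantics described above for shyft.time_series can be sketched in plain Python; the names and internals below are illustrative only and do not reproduce Shyft's actual classes:

```python
from dataclasses import dataclass


@dataclass
class FixedTimeAxis:
    """A set of ordered, non-overlapping periods: [t0, t0+dt), [t0+dt, t0+2dt), ..."""
    t0: int  # start time (e.g. seconds since epoch)
    dt: int  # period length in seconds
    n: int   # number of periods

    def period(self, i):
        start = self.t0 + i * self.dt
        return (start, start + self.dt)


class AverageTimeSeries:
    """Values bound to a time axis with an 'average over interval' interpretation."""

    def __init__(self, time_axis, values):
        assert len(values) == time_axis.n
        self.time_axis, self.values = time_axis, values

    def value_at(self, t):
        # the interval average applies to every instant within the interval
        i = (t - self.time_axis.t0) // self.time_axis.dt
        return self.values[i]


ta = FixedTimeAxis(t0=0, dt=3600, n=3)
ts = AverageTimeSeries(ta, [1.0, 2.0, 4.0])
print(ta.period(1))     # (3600, 7200)
print(ts.value_at(4000))  # 2.0
```

A stair-case ("point average") interpretation like this is one of the point interpretation policies; an alternative is linear interpolation between instant values.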
In order to keep code in the Shyft orchestration at a minimum, repositories are obliged to return Shyft api types. Shyft provides interfaces for the following repositories:

Region-model repository
The responsibility is to provide a configured region-model, hiding away any implementation specific details regarding how the model configuration and data is stored (e.g., in a netcdf database, a GIS-system, etc.).
Geo-located time series repository
The responsibility is to provide all meteorology and hydrology relevant types of geo-located time-series needed to run or calibrate the region-model (e.g., data from station observations, weather forecasts, climate models, etc.).

Interpolation parameter repository
The responsibility is to provide parameters for the interpolation method used in the simulation.

State repository
The responsibility is to provide model states for the region-model and store model states for later use.
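The repository pattern can be sketched as an abstract interface with interchangeable backends; the interface and method names below are illustrative and do not reproduce Shyft's exact signatures:

```python
from abc import ABC, abstractmethod


class GeoTsRepository(ABC):
    """Illustrative repository interface: hides the store, returns api-like types."""

    @abstractmethod
    def get_timeseries(self, variables, utc_period, geo_box):
        """Return geo-located time series for the requested variables/period/area."""


class DictGeoTsRepository(GeoTsRepository):
    """Toy implementation backed by an in-memory dict instead of netcdf or GIS."""

    def __init__(self, store):
        self.store = store

    def get_timeseries(self, variables, utc_period, geo_box):
        t0, t1 = utc_period
        return {
            var: [(t, v) for (t, v) in self.store[var] if t0 <= t < t1]
            for var in variables
        }


repo = DictGeoTsRepository({"temperature": [(0, -1.2), (3600, 0.4), (7200, 1.1)]})
result = repo.get_timeseries(["temperature"], (0, 7200), geo_box=None)
print(result["temperature"])  # the two samples inside the requested period
```

The orchestration depends only on the abstract interface, so a netcdf, GIS, or database backend can be substituted without touching simulation code.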
Shyft provides implementations of the region-model repository interface and the geo-located time series repository interface for several datasets available in netcdf formats. These are mostly used for documentation and testing and can likewise be utilized by a Shyft user. Users aiming for an operational implementation of Shyft are encouraged to write their own repositories following the provided interfaces/examples rather than converting data to the expectations of the provided netcdf repositories.

The Shyft orchestration serves two purposes. Firstly, to offer an easy entry point for modellers seeking to use Shyft. By using the orchestration, users require only a minimum of Python scripting experience in order to configure and run simulations. However, the Shyft orchestration gives only limited functionality and users might find it limiting their ambitions. For this reason, Shyft users are strongly encouraged to learn how to effectively use Shyft api functionality in order to be able to enjoy the full spectrum of opportunities that the Shyft framework offers for hydrologic modelling.
Secondly, and importantly, it is through the orchestration that full functionality can be utilized in operational environments.
However, as different operational environments have different objectives, it is likely that an operator of an operational service will want to extend the current functionalities of the orchestration or design a completely new one from scratch, suited to the needs the operator defines. The orchestration provided in Shyft then serves rather as an introductory example.

4 Conceptual Model
The design principles of Shyft led to the development of a framework that attempts to strictly separate the model domain (region), the model forcing data (region environment), and the model algorithms, in order to provide a high degree of flexibility in the choice of each of these three elements. This Section describes how a model domain is constructed within the object that is central to Shyft, the so-called region-model. For corresponding Shyft api types, see Figure 2.

Region: the model domain
In Shyft, a model domain is defined by a collection of geo-located sub-units called cells. Each cell has certain properties such as land type fractions, area, geographic location, and a unique identifier specifying to which catchment the cell belongs (the catchment id). Cells with the same catchment id are assigned to the same catchment and each catchment is defined by a set of catchment ids (see Figure 4). The Shyft model domain is composed of a user defined number of cells and catchments, and is called a region. A Shyft region thus specifies the geographical properties required in a hydrologic simulation.
For computations, the cells are vectorized rather than represented on a grid, as is typical for spatially distributed models.
This aspect of Shyft provides significant flexibility and efficiency in computation.

Region environment
Model forcing data is organized in a construct called region environment. The region environment provides containers for each variable type required as input to a model. Meteorological forcing variables currently supported are temperature, precipitation, radiation, relative humidity and wind speed. Each variable container can be fed a collection of geo-located time series, referred to as sources, each providing the timeseries data for the variables coupled with methods that provide information about the geographical location for which the data is valid. The collections of sources in the region environment can originate from e.g.

station observations, gridded observations, gridded numerical weather forecasts, or climate simulations (see Figure 4). The time series of these sources are usually presented in the original time resolution as available in the database from which they originate. That is, the region environment typically provides meteorological raw data, with no assumption on spatial properties of the model cells or the model time step used for simulation.

The model approach used to simulate hydrological processes is defined by the user and is independent of the choice of the region and region environment configurations. In Shyft, a model defines a sequence of algorithms, each of which describes a method to represent certain processes of the hydrological cycle. Such processes might be evapotranspiration, snow accumulation and melt processes, or soil response. The respective algorithms are compiled into model stacks, where different model stacks differ in at least one method. Currently, Shyft provides four different model stacks, described in more detail in Section 5.2.

• Running the model forward in time. Once the interpolation step is performed, the region-model is provided with all data required to predict the temporal evolution of hydrologic variables. This step is done through cell-by-cell execution of the model stack and is computationally highly efficient due to multithreading that allows parallel execution on a multiprocessing system, utilizing all Central Processing Units (CPUs) unless otherwise specified.
• Providing access to all data related to region and model. All data that is required as input to the model and generated during a model run is stored in memory and can be accessed through the region-model. This applies to model forcing data at source and cell level, model parameters at region and catchment level, static cell data, and time series of model state and result variables. The latter two are not necessarily stored by default in order to achieve high computational efficiency, but collection of those can be enabled prior to a model run.

Targets
Shyft provides functionality to estimate model parameters through implementations of several optimization algorithms and goal functions. Shyft utilizes optimization algorithms from dlib: Bound Optimization BY Quadratic Approximation (BOBYQA; www.dlib.net/optimization.html#find_min_bobyqa), a derivative-free optimization algorithm explained in Powell (2009), and the global function search algorithm (http://dlib.net/optimization.html#global_function_search), which performs global optimization of a function subject to bound constraints.
In order to optimize model parameters, model results are evaluated against one or several target specifications. Most commonly, simulated discharge is evaluated against observed discharge; however, Shyft supports further variables such as mean catchment SWE or snow-covered area (SCA) to estimate model parameters. An arbitrary number of target time series can be evaluated during a calibration run, each representing a different part of the region and/or time interval and step. The overall evaluation metric is calculated from a weighted average of the metric of each target specification. To evaluate performance, the user can specify Nash-Sutcliffe, Kling-Gupta, Absolute Difference or Root Mean Square Error (RMSE) functions. The user can specify which model parameters to optimize, giving a search range for each parameter. In order to provide maximum speed, optimized models are used during calibration, so that the CPU and memory footprints are minimal.
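The weighted multi-target evaluation can be sketched as follows, here using the Nash-Sutcliffe efficiency in "minimize" form; the function names are ours for illustration, not Shyft api identifiers:

```python
def nash_sutcliffe(sim, obs):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, <0 is worse than mean(obs)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((s - o) ** 2 for s, o in zip(sim, obs))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def weighted_goal(targets):
    """targets: list of (weight, sim, obs) triples, one per target specification.
    Returns the weighted-average metric as 1 - NSE, so the optimizer minimizes it."""
    total_w = sum(w for w, _, _ in targets)
    return sum(w * (1.0 - nash_sutcliffe(sim, obs))
               for w, sim, obs in targets) / total_w


obs = [1.0, 2.0, 3.0, 4.0]
perfect = weighted_goal([(1.0, obs, obs)])
print(perfect)  # 0.0 for a perfect fit
```

Each target triple could represent a different sub-catchment or time interval, with the weights expressing their relative importance in the overall metric.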

Modelling the hydrology of a region with Shyft is typically done by first interpolating the model forcing data from the source locations (e.g. atmospheric model grid points or weather stations) to the Shyft cell locations and then running a model stack cell-by-cell. This Section gives an overview of the methods implemented for interpolation and hydrologic modelling.

Interpolation
In order to interpolate model forcing data from the source locations to the cell locations, Shyft provides two different interpolation algorithms: interpolation via inverse distance weighting and Bayesian kriging. However, it is important to mention that Shyft users are not forced to use the internally provided interpolation methods. Instead, the provided interpolation step can be skipped and input data can be fed directly to cells, leaving it up to the Shyft user how to interpolate/downscale model input data from source locations to the cell domain.
Inverse Distance Weighting (IDW) (Shepard, 1968) is the primary method used to distribute model forcing timeseries to the cells. The implementation of IDW allows a high degree of flexibility in the choice of models for different variables.
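The core of Shepard-style IDW can be sketched in a few lines; the power and maximum-distance parameters are illustrative choices, not Shyft defaults:

```python
def idw(sources, target_xy, power=2.0, max_distance=None):
    """Inverse distance weighting of geo-located source values to one target point.
    sources: list of ((x, y), value) pairs."""
    num = den = 0.0
    for (x, y), value in sources:
        d2 = (x - target_xy[0]) ** 2 + (y - target_xy[1]) ** 2
        if d2 == 0.0:
            return value  # target coincides with a source
        d = d2 ** 0.5
        if max_distance is not None and d > max_distance:
            continue  # optionally ignore far-away sources
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


sources = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0)]
print(idw(sources, (1.0, 0.0)))  # 15.0: equidistant sources weight equally
```

For temperature, such a scheme is typically combined with an elevation correction, as described in the next subsection.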

Bayesian temperature kriging
As described in Section 5.1.1, we provide functionality to use a height-gradient based approach to reduce the systematic error when estimating the local air temperature based on regional observations. The gradient value may either be calculated from the data or set manually by the user.
In many cases, this simplistic approach is suitable for the purposes of optimizing the calibration. However, if one is interested in greater physical constraints on the simulation, we recognize that the gradient is often more complicated and varies both seasonally and with local weather. There may be situations in which insufficient observations are available to properly calculate the temperature gradient, or the local forcing at the observation stations is actually representative of entirely different processes than the one for which the temperature is being estimated. An alternative approach has therefore been implemented in Shyft that buffers the most severe local effects in such cases.
The application of Bayes' Theorem is suitable for such weighting of measurement data against prior information. Shyft provides a method that estimates a regional height gradient and sea-level temperature for the entire region, which together with elevation data are subsequently used to model the surface temperature.
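The data-driven part of this scheme, estimating a regional gradient and sea-level temperature from station data, can be sketched with an ordinary least squares fit; the Bayesian weighting against prior information is deliberately omitted here, and the station values are invented for illustration:

```python
def fit_lapse_rate(elevations_m, temps_c):
    """Least squares fit of T = t_sl + gamma * z (no prior information).
    Shyft's Bayesian kriging additionally weighs such data-driven estimates
    against prior assumptions; this sketch shows only the data-driven part."""
    n = len(elevations_m)
    mz = sum(elevations_m) / n
    mt = sum(temps_c) / n
    cov = sum((z - mz) * (t - mt) for z, t in zip(elevations_m, temps_c))
    var = sum((z - mz) ** 2 for z in elevations_m)
    gamma = cov / var        # gradient in degrees C per metre
    t_sl = mt - gamma * mz   # sea-level temperature
    return t_sl, gamma


# hypothetical stations: (elevation in m, observed temperature in degrees C)
t_sl, gamma = fit_lapse_rate([100.0, 500.0, 1000.0], [9.4, 6.9, 3.9])
print(round(gamma * 100.0, 2))       # estimated gradient per 100 m
print(round(t_sl + gamma * 800.0, 2))  # predicted temperature at 800 m a.s.l.
```

With a Bayesian formulation, the fitted gradient would be pulled toward a prior (e.g. a climatological lapse rate) when the stations provide weak or conflicting evidence.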

Model stacks
In Shyft, a hydrologic model is a sequence of hydrologic methods, called a model stack. Each method of the model stack describes a certain hydrologic process, and the model stack typically provides a complete rainfall-runoff model. In the current state, the model stacks provided in Shyft differ mostly in the representation of snow accumulation and melt processes, due to the predominant importance of snow in the hydropower production environments of the Nordic countries, where the model was first operationalized. These model stacks provide sufficient performance in the catchments for which the model has been evaluated; however, it is expected that for some environments with different climatic conditions more advanced hydrologic routines will be required, and therefore new model stacks are in active development. Furthermore, applying Shyft in renewable energy production environments other than hydropower (e.g. wind power) is realizable but will not be discussed herein.

• GS (Gamma-Snow)
Energy balance based snow routine that uses a gamma function to represent sub-cell snow distribution (Kolberg et al., 2006).

• K (Kirchner)
Hydrologic response routine based on Kirchner (2009).

PTGSK
In the PTGSK model stack, the model first uses Priestley-Taylor to calculate the potential evapotranspiration based on temperature, radiation, and relative humidity data (see Table 1 for an overview of model input data). The calculated potential evaporation is then used to estimate the actual evapotranspiration using a simple scaling approach. The Gamma-Snow routine is used to calculate snow accumulation and melt-adjusted runoff using time series data for precipitation and wind speed in addition to the input data used in the Priestley-Taylor method. Glacier melt is accounted for using a simple temperature index approach (Hock, 2003). Based on the snow and ice adjusted available liquid water, Kirchner's approach is used to calculate the catchment response. The PTGSK model stack is the only model in Shyft which provides an energy-balance approach to the calculation of snow accumulation and melt processes.

• SS (Skaugen Snow)
Temperature index model based snow routine with focus on snow distribution according to Skaugen and Randen (2013) and Skaugen and Weltzien (2016).
PTSSK
As with the PTGSK model stack, all calculations are identical with the exception that the snow accumulation and melt processes are calculated using the Skaugen Snow routine. The implementation strictly separates potential melt calculations from snow distribution calculations, making it an easy task to replace the simple temperature index model currently in use with a more advanced (energy balance based) algorithm.

• HS (HBV Snow)
Temperature index model for snow accumulation and melt processes based on the snow routine of the HBV (Hydrologiska Byråns Vattenbalansavdelning) model (Lindström et al., 1997).

PTHSK
As with the PTGSK model stack, all calculations are identical with the exception that the snow accumulation and melt processes are calculated using the snow routine from the HBV model.

HBV
The HBV model stack very closely resembles the original description of Bergström (1976). An exception is that we calculate the potential evapotranspiration using the Priestley-Taylor routine rather than temperature adjusted monthly mean potential evapotranspiration. In the HBV model stack, the original routines are all combined into a complete model. As with the other routines, we also include the calculation of glacial melt and allow for routing using the methods described in Section 5.7.
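Several of the model stacks above account for glacier melt with a simple temperature index approach (Hock, 2003). A minimal degree-day sketch of such a scheme follows; the melt factor and threshold values are illustrative, not Shyft defaults:

```python
def temperature_index_melt(temps_c, ddf_mm_per_day_degc=3.0, t_threshold_c=0.0):
    """Degree-day melt (cf. Hock, 2003): melt is proportional to the positive
    difference between air temperature and a threshold temperature."""
    return [ddf_mm_per_day_degc * max(t - t_threshold_c, 0.0) for t in temps_c]


# daily mean temperatures in degrees C -> daily melt in mm water equivalent
print(temperature_index_melt([-2.0, 0.0, 1.5, 4.0]))  # [0.0, 0.0, 4.5, 12.0]
```

The same structure underlies the temperature index snow routines (SS, HS), which add a description of how snow is distributed within the cell or catchment.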

Routing
Routing in Shyft is established through two phases: a) cell-to-river routing, and b) river network routing. In cell-to-river routing, water is routed to the closest river object, providing lateral inflow to the river. While individual cells may have a specific routing velocity and distance, unit hydrograph (UHG) shape parameters are catchment specific. River network routing provides for routing from one river object to the next downstream river object, along with lateral inflow from the cells as defined in the first phase. The sum of the upstream river discharge and lateral inflow is then passed to the next downstream river object.

A UHG parameter structure provides the UHG shape parameters and a discretized time-length according to the model time-step resolution. Currently, a gamma function is used for defining the shape of the UHG. The approach of Skaugen and Onof (2014), summing all cell responses at a river routing point and defining a UHG based on a distance distribution profile to that routing point, is commonly used. Together with convolution, the UHG determines the response from the cells to the routing point.
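The gamma-shaped UHG and the convolution step can be sketched as follows; the shape and scale parameters, and the mid-interval discretization, are illustrative choices rather than Shyft's exact formulation:

```python
import math


def gamma_uhg(alpha, beta, n_steps):
    """Discretized gamma-shaped unit hydrograph, normalized to sum to 1,
    sampled at interval midpoints (illustrative discretization)."""
    w = [(t + 0.5) ** (alpha - 1.0) * math.exp(-(t + 0.5) / beta)
         for t in range(n_steps)]
    s = sum(w)
    return [x / s for x in w]


def route(lateral_inflow, uhg):
    """Convolve lateral inflow with the UHG to obtain routed discharge."""
    out = [0.0] * (len(lateral_inflow) + len(uhg) - 1)
    for i, q in enumerate(lateral_inflow):
        for j, w in enumerate(uhg):
            out[i + j] += q * w
    return out


uhg = gamma_uhg(alpha=2.0, beta=1.5, n_steps=4)
routed = route([10.0, 0.0, 0.0], uhg)
print(round(sum(routed), 6))  # mass is conserved across the convolution
```

Because the UHG weights sum to one, the convolution redistributes the inflow pulse in time without creating or losing water, which is the property the routing scheme relies on.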

6 Availability and documentation
The source code of Shyft is published under version 3 of the GNU Lesser General Public License. All code is available via git repositories located at: https://gitlab.com/shyft-os. Therein, three separate repositories are used by the developers for the management of the code. The main code repository is simply called 'shyft'. In addition to the source code of Shyft, data needed to run the full test suite is distributed in the 'shyft-data' repository, while a collection of Jupyter Notebooks providing example Python code for a number of use cases of the Shyft api is provided within the 'shyft-doc' repository, in addition to the full code for the Read the Docs (https://shyft.readthedocs.io) website. At this site we provide end-user documentation on:

• installation on both Linux and Windows operating systems
• how to use the Shyft api to construct a hydrological model, feed it with data, and run hydrologic simulations
• use of Shyft repositories to access model input data and parameters
• use of the Shyft orchestration to configure and run simulations

We also maintain the 'dockers' repository (https://gitlab.com/shyft-os/dockers), where docker build recipes for the complete Shyft eco-system reside, including 'build dockers', 'python test dockers', 'development dockers' and 'application dockers'.
An aspect of Shyft that is unusual among research codebases is the extensive test suite that covers both the Python and the C++ codebase. The test suite is comprehensive; in addition to unit tests covering the C++ and Python parts, it also covers integration tests assuring valid bindings to external dependencies such as netcdf and geo-services. This is a particularly helpful resource for those who are somewhat more advanced in their knowledge of coding.

Recent Applications
Shyft was originally developed in order to explore epistemic uncertainty associated with model formulation and input selection (Beven et al., 2011). At Statkraft, and at most Norwegian hydropower companies, inflow forecasting to the reservoirs is conducted using the well-known HBV model (Bergström, 1976). The inflow to the reservoirs is a critical variable in production planning. As such, there was an interest to evaluate and assess whether improvements in the forecasts could be gained through the use of different formulations. In particular, we sought the ability to ingest distributed meteorological inputs and to assess the variability resulting from NWP models of differing resolution operating at different time-scales. Figure 5 shows a simple example in which Shyft is used to provide inflow forecasts with a horizon of 15 days for a subcatchment in the Nea-Nidelva basin (marked red in Figure 4). The total area of the basin is about 3050 km² and the watercourse runs for some 160 km from the Sylan mountains on the border between Sweden and Norway to the river mouth in Trondheimsfjorden.

Production Planning
The hydrology of the area is dominated by snow melt from seasonal snow cover.
In this example, the Shyft region is configured with a spatial resolution of 1×1 km² and the model setup aims to reproduce the hydrological forecast with forecast start on 22.04.2018, 00:00 UTC. In order to estimate model state variables, the simulation initiates before the melt season begins. Using the model state based on the historical simulation and the latest discharge observations, the model state is updated so that the discharge at forecast start equals the observed discharge. Forecasts are then initiated based on the updated model state and using a number of weather forecasts from different meteorological forecast providers. A deterministic hydrologic forecast is run using the AROME weather prediction system from the Norwegian Meteorological Institute with a horizon of 66 hours and a spatial resolution of 2.5 km. Likewise, a second deterministic forecast is conducted based on the high resolution 10-day forecast product from the European Centre for Medium-Range Weather Forecasts (ECMWF) (spatial resolution 0.1° × 0.1° latitude/longitude). In addition to the deterministic forecasts, simulations based on ECMWF's 15-day ensemble forecast system are conducted (51 ensemble members, spatial resolution 0.2° × 0.2° latitude/longitude). (Note: the discharge data are provided by Statkraft AS and are divided by a factor X in order to mask the observational data, as Statkraft's data policy considers discharge sensitive data.)

The forecast is run during the initial phase of the snow melt season in April 2018. The historical simulation overestimated streamflow during the week prior to the forecast start (left of the black bar in Figure 5). However, after updating the model state using observed discharge, the simulations provide a reasonable streamflow forecast (right of the black bar in Figure 5) as well as a series of possible outcomes based on the ensemble of meteorological products. For production planning purposes, the ability to rapidly assess the uncertainty of the forecast, and to efficiently ingest ensemble forecasts, is highly valued.

Light absorbing impurities in snow

Wiscombe and Warren (1980) and Warren and Wiscombe (1980) hypothesized that trace amounts of absorptive impurities occurring in natural snow can have significant implications for snow albedo. To date, many studies have given evidence to this hypothesis (e.g., Jacobson, 2004; Forsström et al., 2013; IPCC, 2013; Wang et al., 2013; Hansen and Nazarenko, 2004).

Particles that absorb electromagnetic waves in the short wavelength range have caught the attention of the research community due to their influence on the water and energy budgets of both the atmosphere and the earth surface (e.g., Twomey et al., 1984; Albrecht, 1989; Hansen et al., 1997; Ramanathan et al., 2001). If these aerosols are deposited alongside snowfall, they lower the spectral albedo of the snow in the shortwave spectrum; combined with the absorptive properties of snow grains in the thermal infrared, this leads to heating of the snow. This in turn has implications for the evolution of the snow micro-structure (Flanner et al., 2007) and snow melt.
At the catchment scale such an absorptive process should have an observable impact on melt rates and discharge. While several studies have provided field-based measurements of the impact of LAISI on snow albedo (e.g., Painter et al., 2012; Doherty et al., 2016), no studies have attempted to address the impact of this process on discharge. Skiles and Painter (2016) showed that snowpack melt rates in the Colorado Rockies were impacted by dust deposition by evaluating a sequence of snow pits, and Painter et al. (2017) provided observational evidence that the hydrology of catchments is likely impacted by LAISI deposition, but no studies incorporated a physically based simulation in a hydrologic model to ascribe the component of melt attributable to LAISI.
Using Shyft, Matt et al. (2018) addressed this process by using the catchment as an 'integrating sampler' to capture the signal of deposited LAISI. In this work, it was shown that even in a remote Norwegian catchment, the timing of melt is impacted by the slight black carbon (BC) concentrations deposited in the snow, with an average percentage increase in daily discharge ranging from 2.5 to 21.4 % for the early melt season and a decrease in discharge of 0.8 to 6.7 % during the late melt season, depending on the deposition scenario.
To accomplish this, a new snowpack algorithm was developed to solve the energy balance for the incoming shortwave radiation flux K_in, incoming and outgoing longwave radiation fluxes L_in and L_out, sensible and latent heat fluxes H_s and H_l, and the heat contribution from rain R, such that the net energy flux for the snowpack is

δF/δt = (1 − α) K_in + L_in − L_out + H_s + H_l + R.

In order to account for the impact of LAISI, the algorithm implements a radiative transfer solution for the dynamical calculation of the snow albedo, α. The algorithm builds on the SNICAR model (Flanner et al., 2007) and accepts wet and dry deposition rates of light absorbing aerosols as model input variables. Thus, the model is able to simulate the impact of dust, black carbon, volcanic ash, or other aerosol deposition on snow albedo, snow melt, and runoff. This is the first implementation of a dynamical snow albedo calculation in a catchment-scale conceptual hydrologic model and raises exciting opportunities for significant improvements in forecasting for regions that may have significant dust burdens in the snowpack (e.g. the Southern Alps, or the western slope of the Colorado Rockies).
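Written as code, the balance above is straightforward. The sketch below assumes the common sign convention (fluxes directed toward the snowpack are positive, outgoing longwave and reflected shortwave are losses) and is an illustration rather than Shyft's actual implementation:

```python
def net_energy_flux(k_in, l_in, l_out, h_s, h_l, r, albedo):
    """Net snowpack energy flux dF/dt in W m^-2: absorbed shortwave
    (1 - albedo) * K_in, net longwave L_in - L_out, turbulent fluxes
    H_s and H_l, and heat advected by rain R."""
    return (1.0 - albedo) * k_in + l_in - l_out + h_s + h_l + r

# LAISI deposition lowers the albedo, increasing the absorbed shortwave.
# Illustrative midday values in W m^-2:
clean = net_energy_flux(300.0, 280.0, 310.0, 15.0, -5.0, 0.0, albedo=0.85)
dusty = net_energy_flux(300.0, 280.0, 310.0, 15.0, -5.0, 0.0, albedo=0.75)
# dusty - clean == 0.10 * 300 = 30 W m^-2 of additional melt energy
```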

The value of snow cover products in reducing uncertainty
In operational hydrologic environments, quantification of uncertainty is becoming increasingly important. Traditionally, hydrologic forecasts may have provided water resource managers or production planners a single estimate of the inflow to a reservoir. This individual value often initializes a chain of models that are used to optimize use of the water resource. In some cases it may be used as input to subsequently calculate a water value for a hydropower producer, giving insight into how to operate the generation resources. In other cases, the value may be provided to a flood manager, who is responsible for assessing the potential downstream flood impacts.
There is a growing awareness of the need to quantify the uncertainty associated with the forecast value. In general, in hydrologic modeling, uncertainty is driven by the following factors: data quality (both for input forcing data and for validation (gauge) data), uncertainty associated with the model formulation, and uncertainty in the selected parameters (Renard et al., 2010). The Shyft platform aims to provide tools to facilitate rapid exploration of these factors.
In Teweldebrhan et al. (2018b) not only was parameter uncertainty explored using the well-known generalized likelihood uncertainty estimation (GLUE) methodology, but a novel modification to the GLUE methodology was implemented for operational applications. Investigating the suitability of snow cover data to condition model parameters required a novel approach to constraining model parameters. Rather than the traditional GLUE limits of acceptability (LoA) approach, Teweldebrhan et al. (2018b) showed that by relaxing the limits of acceptability, for example through the use of time windows, one can reduce the uncertainty in SWE reanalysis. In this study the LoA approach to data assimilation was introduced and improved the performance relative to a more traditional particle-batch smoother scheme. In both DA schemes the correlation coefficient between site-averaged predicted and observed SWE increased, by 8 % and 16 % for the particle-batch and LoA schemes, respectively.
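The core of a GLUE limits-of-acceptability analysis can be sketched in a few lines. Everything below (the toy model, priors, and limits) is purely illustrative and unrelated to the actual configuration used by Teweldebrhan et al.:

```python
import random

def glue_loa(simulate, priors, limits, n_samples=5000, seed=1):
    """Sample parameter sets from uniform priors and retain the
    'behavioural' sets whose simulated values fall within the
    acceptability limits at every observation point."""
    rng = random.Random(seed)
    behavioural = []
    for _ in range(n_samples):
        params = {k: rng.uniform(lo, hi) for k, (lo, hi) in priors.items()}
        sim = simulate(params)
        if all(lo <= s <= hi for s, (lo, hi) in zip(sim, limits)):
            behavioural.append(params)
    return behavioural

# Toy example: a one-parameter "model" observed at two points.
accepted = glue_loa(lambda p: [p["a"], 2.0 * p["a"]],
                    priors={"a": (0.0, 1.0)},
                    limits=[(0.2, 0.8), (0.4, 1.6)])
```

Relaxing the limits (e.g. widening them within chosen time windows) directly grows the behavioural set, which is the lever the modified approach exploits.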

Complexity of hydrologic algorithms
Shyft is focused on providing both hydrologic service providers and researchers a robust codebase suitable for implementation in operational environments, and its design principles are formulated to serve this aim. Using simple approaches in the hydrological routines is a design decision related to the desire for computational efficiency. Rapid calculations are necessary in order to simulate a large number of regions at high temporal and spatial resolution several times per day, or on demand, in order to react to updated input data from weather forecasts and observations. Hydrologic routines are therefore kept as simple as possible, but as complex as necessary: focus has not been on implementing the most advanced hydrologic routines, but on known and tested algorithms that are proven in application. Furthermore, emphasis is on those portions of the hydrologic model for which data exist. For this reason, the available routines are limited in hydrologic process representation, but active community contribution is envisioned, and new functionality will be implemented when significant improvement in the scope of the targeted applications is assured. Developments aiming to increase algorithmic complexity in Shyft undergo critical testing to evaluate whether the effort goes hand in hand with a significant increase in forecasting performance or similar advantages. Of key importance is that the architecture of the software facilitates both the testing of new algorithms and model configurations within an operational setting.

Another example is given by Shyft's independence from specific file formats and databases. The repository concept enforces a strict separation of data sources and model, which facilitates replacing the forecasting model in the operational setup while leaving other parts of the forecasting system, such as databases and data storage setups, unchanged.
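The repository concept amounts to programming against an interface rather than a concrete data source. The sketch below follows the spirit of Shyft's repository layer but is simplified, and the class and method names should be read as illustrative:

```python
from abc import ABC, abstractmethod

class GeoTsRepository(ABC):
    """Interface the model sees; any data source (NetCDF files, an
    operational database, a web service) can stand behind it."""

    @abstractmethod
    def get_timeseries(self, variables, utc_period):
        """Return geo-located forcing time series for the given
        variable names over the (start, end) UTC period."""

class NetCDFRepository(GeoTsRepository):
    """One concrete backend; swapping it for a database-backed
    repository leaves the model code untouched."""

    def __init__(self, path):
        self.path = path

    def get_timeseries(self, variables, utc_period):
        # Reading from file is stubbed out in this sketch.
        return {name: [] for name in variables}

repo = NetCDFRepository("/data/forcing.nc")
forcing = repo.get_timeseries(["temperature", "precipitation"], (0, 3600))
```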
Moreover, the above-mentioned functionality allows, in addition to pursuing multiple working hypotheses through multi-model support, the testing of multiple model configurations, in which different combinations of input data, downscaling methods, and model algorithms can be evaluated.
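Multi-configuration testing then reduces to enumerating combinations. In the sketch below the method-stack names (e.g. pt_gs_k) follow Shyft's naming convention for model stacks, while the input and downscaling labels are illustrative:

```python
from itertools import product

inputs = ["AROME", "ECMWF-HRES"]            # forcing datasets
downscaling = ["idw", "kriging"]            # interpolation methods
stacks = ["pt_gs_k", "pt_ss_k", "pt_hs_k"]  # candidate method stacks

# Every combination becomes a candidate forecast configuration that can
# be calibrated and scored against observed discharge.
configurations = list(product(inputs, downscaling, stacks))
```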

Conclusions
This paper describes a hydrologic model toolbox aimed at streamflow forecasting in operational environments, providing experts in the business domain and scientists at research institutes and universities with enterprise-level software. Shyft is based on advanced templated C++ concepts. The code is highly efficient, able to take advantage of modern-day compiler functionality, and released Open Source in order to provide a platform for joint development. An Application Programming Interface allows easy access to all components of the framework, including the individual hydrologic routines, from Python. This enables rapid exploration of different model configurations and selection of an optimal forecast model.