CREST-VEC: A framework towards more accurate and realistic flood simulation across scales

. Large-scale (i.e., continental and global) hydrologic simulation is an appealing yet challenging topic for the hydrologic community. First and foremost, model efficiency and scalability (flexibility in resolution and discretization) have 10 to be prioritized. Then, sufficient model accuracy and precision are required to provide useful information for water resource applications. Towards this goal, we craft two objectives for improving US current operational hydrological model: (1) vectorized routing and (2) improved hydrological processes. This study presents a hydrologic modeling framework CREST-VEC that combines a gridded water balance model and a newly developed vector-based routing scheme. First, in contrast to a conventional fully gridded model, this framework can significantly reduce the computational cost of river routing by at 15 least ten times, based on experiments at regional (0.07 sec/step vs. 0.002 sec/step) and continental scales (0.35 sec/step vs. 7.2 sec/step). This provides adequate time efficiency for generating operational ensemble streamflow forecasts and even probabilistic estimates across scales. Second, the performance using the new vector-based routing is improved, with the median-aggregated NSE (Nash-Sutcliffe Efficiency) score increasing from -0.06 to 0.18 over the CONUS. Third, with the lake module incorporated, the NSE score is further improved by 56.2%, and the systematic bias is reduced by 17%. Lastly, 20 over 20% of the false alarms on two-year floods in the US can be mitigated with the lake module enabled, at the expense of only missing 2.3% more events. This study demonstrated the advantages of the proposed hydrological modeling framework that could provide a solid basis for continental and global scale water modeling at fine resolution. Furthermore, the use of ensemble forecasts can be incorporated into this framework; and thus, optimized streamflow prediction with quantified uncertainty information can be achieved in an operational fashion for stakeholders and decision-makers


Introduction
Flooding all over the world has affected millions of people, especially those who reside in floodplains (Tellman et al., 2021).
In the US, flooding, as the primary cause of billion-dollar weather disasters, costs $3.9 billion monetary losses and 15 deaths per year over the past four decades according to the NOAA National Centers for Environmental Information U.S. Billion-Dollar Weather and Climate Disasters (2021). In light of frequent flooding in the US, several public agencies have been The mizuRoute has been used together with the hydrologic framework SUMMA (Structure for Unifying Multiple Model 65 Alternatives) (Knoben et al., 2021) and is planned to be implemented in the CESM (Community Earth System Model). These vector-based routing models overcome several challenges for large-scale hydrologic simulations faced by raster-based routing models. First, higher model resolution in raster-based models comes at the expense of higher computational cost, which prohibits global hydrological simulations at tens or hundreds of meters. However, the vector-based routing model is much more scalable and computationally efficient, irrespective of increasing resolution. Second, river networks can be more 70 realistically represented in a vector form. In conventional hydrologic models, the river network in a raster form has to be delineated based on a DEM as a preprocessing step. River networks generated in such a way do not always align well with natural river centerlines. For studies investigating hydrologic connectivity in particular, river grid cells in a raster form can easily become discontinuous without considering river topology. Alternatively, river networks in popular hydrography data are digitalized based on satellite optical imagery and manual inspection . Another weakness of raster-based 75 routing stems from the traditional D8 flow strategy which means water in the central grid can only be permitted to flow through one of its neighboring grid cells (Tarboton, 1997). On the contrary, vector-based routing offers a more flexible approach from vector representation of river networks.
To date, modern vector-based routing models such as RAPID and mizuRoute have neglected the subsurface routing, which is either assumed to be minimum (Mizukami et al., 2020) or treated the way same as surface routing (Lin et al., 2019;Yang et 80 al., 2021). However, subsurface routing is an important hydrologic process and dominates over regions that have intermittent flow behaviors (Freeze, 1972). For flood simulation, ignoring subsurface routing could underestimate the peak flow and miscalculate the flood timing, both of which directly affect decision-making processes. An equally important research thrust is the representation of lakes and reservoirs in vector formats, since they markedly alter flow response not only at a local scale, but also downstream rivers. One of the functions of lakes and reservoirs in the US is for flood control, so simulation without 85 incorporating such a process is likely to result in falsely issued flood warnings.
In light of the advantages of vector-based routing, this study introduces a coupled modeling framework CREST-VEC (Coupled Routing and Excess STorage with VECtor routing), which strives to facilitate real-time flood forecasting across scales. This framework seamlessly integrates the current operational flash flood forecast model structure -CREST model and the vector-based routing framework -mizuRoute. We utilize a case study to demonstrate the advantages of this coupled 90 framework and to investigate some updates we made to improve the existing routing scheme. Four questions are posed in this regional case study: (1) What are the performance gains for CREST-VEC compared to the CREST model? (2) Does the included subsurface routing improve model performance? (3) Can a simple natural lake simulation improve model performance in a downstream urban area? and (4) How does CREST-VEC model adopt to flood warnings? In the second part, we apply this framework to the continental US for a comprehensive evaluation. We ask one additional question: How many 95 floods are falsely alarmed without considering reservoir operations? It is anticipated that findings from this work could motivate the future development of large-scale hydrologic models and raise awareness on whether and how much flood forecasts by model simulations should be trusted without proper representation of lakes.

Hydrography data 100
In this study, we use the vectorized river network and Hydrologic Response Unit (HRU) dataset derived from the high-accuracy Multi-Error-Removed-Improved-Terrain (MERIT) Hydro hydrography dataset (Yamazaki et al., 2017(Yamazaki et al., , 2019. The flowlines were created from the 90-meter DEM data (MERIT DEM), covering the full global land surface (60°S-90°N). A minimum channelization threshold of 25 km 2 (upstream area) was applied to restrict river channel grid cells in the MERIT Hydro dataset.
The HRUs were processed along with flowlines by the TauDEM software and trimmed with the HydroBASINS level-II 105 boundaries. Detailed processing of the hydrography data is listed in Lin et al. (2019). This set of hydrography data has been validated against 30-year Landsat imagery  and empowered the global reconstruction of historical streamflow (Lin et al., 2019;Yang et al., 2021). Over the CONUS, we have obtained 341,921 river reaches and the same amount of unit catchments for the routing component.
Lakes and reservoirs in the U.S. play a significant role in regulating streamflow (Tavakoly et al., 2021). Major river basins 110 (e.g., Mississippi and Columbia River Basins) are highly regulated, as shown in Fig.1a and 1b, results obtained from Lehner et al. (2011). The HydroLAKES dataset provides a global catalog of lake polygons and pour points that can be easily integrated into hydrologic models (Messager et al., 2016). Over one million natural lakes and constructed reservoirs were identified globally, with a minimum surface area larger than 10 ha. Over the U.S., there are 96,874 lakes recorded in the HydroLAKES data, of which 94,865 are natural lakes without human intervention, and 1,992 (17) lakes are reservoirs (regulated lakes), as 115 shown in Fig.1c. Of regulated lakes or reservoirs, 20.0% are primarily used for irrigation, 19.9% for hydroelectricity, 17.6% for water supply, 17.2% for flood control, 14.1% for recreation, 1.9% for navigation, 0.7 for fisheries, and 8.6% for others ( Fig.1d). The total lake volume, estimated from the lake bathymetry, is a required field in our modeling framework to approximate outflow.

Forcing data
Forcing data is required as model inputs to drive the hydrologic model. Hourly precipitation rates are obtained from the Multi-Radar Multi-Sensor (MRMS) data, operated at the NOAA NSSL (Zhang et al., 2016). The MRMS is a state-of-the-art radar-125 gauge merged product, providing instantaneous rates at a 1-km spatial resolution over the CONUS and parts of southern Canada and northern Mexico. We used the one-hour accumulated and gauge-corrected precipitation product in this study for streamflow simulation. The performance and hydrologic utility of MRMS data has been corroborated in previous studies (Li et al., , 2021. The daily temperature from the PRISM (Parameter-elevation Relationships on Independent Slopes Model) is used to simulate snow accumulation and melt (PRISM Climate Group, 2014). The PRISM team routinely collects 130 meteorological data from meteorological stations over the U.S. and interpolates them into 4-km gridded data based on the elevation dependence (Daly et al., 2008). The potential evapotranspiration (PET) data is obtained from the USGS FEWS data port (https://earlywarning.usgs.gov/fews) at daily and 1° spatial resolution (Allen et al., 1998). Forcing data at different spatial resolutions is re-gridded to a 1-km model resolution. All of these data are collected from the simulation period in complete calendar days from 2015 to 2019. 135

CREST model
As jointly developed by the University of Oklahoma (OU) and NASA, the CREST model has been released for a decade (Wang et al., 2011). It is a distributed hydrologic model whose primary purposes are (1) flood simulation and forecasting, (2) evaluating the hydrologic utility of satellite precipitation datasets, and (3) water resources management (Xue et al., 2013;Tang et al., 2016;Gourley et al., 2017;Gao et al., 2021;Li et al., 2021;Chen et al., 2022). Owing to its relatively simple structure 140 and computationally efficient simulation, the CREST model has been promoted by the NOAA NSSL for real-time flash flood forecasting over the continental U.S. and its territories (Gourley et al., 2017;Flamig et al., 2020). As shown in Fig. 2, the effective rain (deficit of rainfall rates and evaporation rates) reaches the land surface and is partitioned into fast runoff from urban impervious area ratio and infiltration into the soils. A Variable Infiltration Curve (VIC) model is incorporated to determine the infiltration rate (Liang et al., 1994). Surface runoff is generated when infiltration rates become higher than the 145 maximum infiltration capacity. In the meantime, slow-flowing interflow is produced while soil water content is depleted. In the CREST model, flow routing is handled in two ways. Terrain routing and in-channel river routing are done by the kinematic wave model which simplifies the Saint-Venant equation by ignoring the acceleration and forcing terms (Vergara et al., 2017).
The interflow is routed by a conceptual linear reservoir with parameterized velocity (Shen et al., 2017). We refer to the CREST model hereafter as a standalone package that couples the water balance model with gridded terrain and channel routing. The 150 original code is written in C++.
To account for snowmelt, we coupled the original CREST model with the Snow-17 model, which is part of the National Weather Service River Forecast System in the U.S (Franz et al., 2008). The Snow-17 model is a conceptual snowmelt scheme that simulates snow accumulation and ablation based on temperature and precipitation as inputs (Anderson, 2006). Although the physics behind it is not as comprehensive as the energy balance model, Snow-17 is advantageous for having less required 155 input data and performing "at least as good as" energy-based models (Ohmura, 2001).

mizuRoute
The mizuRoute river routing model, developed at the NCAR (National Center for Atmospheric Research), is a vector-based routing framework that incorporates both terrain and channel routing for large-domain river routing applications (Mizukami et al., 2016(Mizukami et al., , 2021. For the terrain routing, the IRF or UH is used with parameters associated with gamma distribution to adjust 160 the shape and scale. For the channel routing, user-defined options are IRF, kinematic wave with Lagrangian solution, and kinematic wave with Euler solution. A recent version of mizuRoute (Version 2.0.1) includes two lake routing schemes (Gharari et al., 2022;Vanderkelen et al., 2022)one based on Döll et al. (2003) with a simple level-pool equation for natural lakes and the other more complicated one based on Hanasaki et al. (2006) which includes reservoir operation rules. These two schemes have been applied to the other global hydrologic models (e.g., WaterGAP, VIC, and CWatM) to account for regulated 165 streamflow. The original code is written in Fortran.
The current version of mizuRoute does not explicitly account for subsurface runoff routing over terrain, which is critical in the Great Plains and regions where streams are intermittent across a year (Salas et al., 2017). In this study, we enable an option to turn on or off subsurface routing as defined in the model configuration file. Similar to surface runoff routing, the subsurface flow is routed using the IRF scheme but with a much slower velocity and reduced magnitude. We use a two-parameter Gamma 170 distribution function to materialize the IRF method as shown in eq. 1.
Where is the time variable, is a shape parameter, and is a time-scale parameter. Both and determine the flood peaking time and flashiness. After calculating instantaneous rates based on the gamma function, we use convolution to compute flow (2)

CREST-VEC
The framework, CREST-VEC, and the difference compared to its precedent CREST model are shown in Fig. 2. The main 180 difference comes from the routing process, where the original CREST model routes surface flow and interflow via a kinematic wave routing model and a conceptual linear reservoir model in a gridded manner. However, the CREST-VEC model requires area-averaged time series of surface and subsurface flow at each river reach to be separately routed downstream. The gridded outputs from the CREST model (i.e., surface runoff and subsurface runoff) are extracted and averaged over each unit catchments or HRU using the newly developed Python package EASYMORE (EArth SYstem Modeling REmapper), publicly 185 available from https://github.com/ShervanGharari/EASYMORE.
The framework is loosely coupled with two models written in different programming languages. A bash file calls three executables after model compilation subsequently (CREST-EASYMORE-mizuRoute). The input files for this model chain include forcing data (gridded precipitation, potential evaporation, and temperature), topography data (gridded digital elevation model, flow direction, flow accumulation, river network topology, and hydrologic response unit), and configuration files. The 190 topography data can be accessed from the HydroSHEDS website which consists of grid-based and vector-based topography data.
We use the IRF scheme in this study for both terrain routing and channel routing in this study and activate the lake model with the Döll et al. (2003) lake model. The parameters for lake parameters such as the outflow coefficient a and exponent b of eq.3, are based on suggested values in Döll et al. (2003) and Gharari et al. (2022). For lakes that have monitored storage provided by the US Geological Survey (USGS), we directly insert storage time series into the model. As reservoir operation is not considered in this study, we exclude observed streamflow that is regulated by reservoirs and regulated lakes, as shown in Fig.   1c. So, only results from natural lakes, which account for 98% of US lakes or reservoirs, are considered valid for statistical comparison. To initialize model states, especially for initial lake volumes, we warm up the CREST-VEC model from 1948 to 2014 using the GLDAS forcing (Global Land Data Assimilation System) at a daily time step. 200 where a and b are the outflow coefficient (1/day) and exponent, respectively; is the actual lake storage (m 3 ); , is the maximum lake storage (m 3 ). [

Case study: Houston region
As mentioned in the objectives of this study, we first conduct a case study analysis to assess the relative contributions of subsurface flow routing and lake routing to streamflow simulation based on the CREST-VEC framework. The original CREST 210 model is used as a benchmark. We chose the Houston region (Fig. 3a) because there are two large natural lakes -Lake Barker and Lake Addicks that impact hydrologic simulations (Fig. 1a). For the CREST model with gridded routing, we calibrate the  Figure 3b shows the computational cost (elapsed time at seconds per step) for a series of model configurations for the routing process. All the tests were run on a single core Intel i7-6700K CPU (4.00 GHz). The grid-based CREST model costs 0.01, 0.08, and 0.12 seconds per step at 1-km, 250-m, and 90-m resolutions, respectively. However, the CREST-VEC model can reduce this to approximately 0.002 seconds per step, regardless of grid resolutions from forcing data. There is little difference among the three scenarios (i.e., CREST-VEC, CREST-VEC+subq: CREST-VEC plus subsurface routing, and CREST-225

Model speedup 220
VEC+subq+lake: CREST-VEC with subsurface routing and lake routing). Relatively speaking, CREST-VEC can speed up the current operational CREST model at 1-km by 10x, let alone at finer resolutions.

Performance improvement
Regarding model skills, the CREST model and CREST-VEC achieve similar median NSE (Fig. 3c) based on observations from 22 stream gauges, even though the CREST model takes advantage of automatic calibration.  VEC+subq overestimate flows downstream of two natural lakes, resulting in poor scores. But after incorporating lake routing schemes, the CREST-VEC+subq+lake model achieves not only better median scores but also less spread (quantified by the interquartile range). Notably, both CREST-VEC+subq and CREST-VEC+subq+lake have positive NSE values and smaller uncertainty ranges, primarily owing to included subsurface routing. The time series in Fig. 4 highlights the model performance at three stream gauges affected by upstream lakes. The CREST-VEC overestimates streamflow by a considerable amount (i.e., 235 three times higher than observation in Hurricane Harvey), resulting in low NSE scores: 0.11, 0.16, and 0.18, respectively. With lake routing considered in the CREST-VEC+subq+lake, the simulated streamflow aligns well with observations, achieving NSE scores of 0.61, 0.65, and 0.64, respectively. Although the CREST model captures streamflow magnitude after calibration with the NSE scores -0.37, 0.52, and 0.54, the peak timing is at least one-day delayed for Hurricane Harvey. In summary, the advantages for the general CREST-VEC framework against the gridded CREST model are threefold: (1) improve 240 computational efficiency by at least ten times, (2) improve overall model skill, (3) reduce uncertainty ranges.

CONUS simulation 250
Moving towards continental-scale hydrologic simulation, the CREST-VEC model excels at reducing computational costs, leaving room for quantifying uncertainties from forcing, model structure, and parameters in real-time. The following question is whether and how much the new lake routing improves a continental simulation. To answer this question, we simulate CREST-VEC with and without lake routing over the CONUS from 2017-06-01 to 2020-01-01 at an hourly time step. Notably, subsurface routing is activated for both models with and without lake routing, so we expect the difference in results to be 255 primarily due to lake simulation. Streamflow data from 5,350 stream gauges in the same period are collected and used for model verification. For this case, the CREST-VEC model parameters are based on the pre-configured CONUS-wide parameters, the same as the ones used in Flamig et al. (2020). Table 1 lists model performance with respect to total computational costs and evaluation scores of streamflow simulation. 260 CREST-VEC certainly improves streamflow simulation not only via a higher resolution (from 1-km to 90-m) but with faster computational speed (149.2 hours to 29.9 hours in total; 7.2 sec/step to 0.37 sec/step for routing step only). Considering all preprocessing steps altogether, CREST-VEC model is still at least four times faster than the original framework. To be noted, a considerable amount of time is spent on mapping gridded runoff data to a vector form (>50% of the time). Future attention should be drawn on how to optimize the efficiency while preserving certain degrees of accuracy for this process. 265

Performance improvement
The median NSE score has increased from -0.06 (gridded) to 0.12 (no lake) and 0.18 (lake). The fraction of gauges with positive NSE scores has been improved from 41.8% (gridded CREST) to 50.6% (CREST-VEC without lake) and to 56.2% (CREST-VEC with lake). However, the CREST-VEC results are more biased than gridded CREST, partly due to the systematic overestimation of streamflow by the IRF routing scheme in the CREST-VEC. The difference would be primarily attributed to 270 the different routing processes, as CREST permits leakage in the interflow reservoir, thereby leading to lower positive bias.
Results with lake simulation have reduced BIAS from 27% to 17%, as part of the water is being held in the lake. The CC (Correlation Coefficient), however, does not vary much between scenarios with and without lake simulation, as shown in Fig.   5. One of the reasons is that the CREST-VEC model does not simulate regulated lakes or reservoirs which have strong control of streamflow time shifts. Notably, the IQRs (interquartile ranges) of NSE and BIAS for lake simulation are lower than without 275 lakes, meaning that this method particularly boosts scores at gauge locations that had poor performance previously. Figure 6 depicts the spatial map of model skill (with lake) and its difference between scenarios with and without lake simulation.
CREST-VEC with lake module in regions like the West Coast and Upper Mississippi River Basin have relatively good performance (NSE>0.4), yet over the Great Plains and East Coast, the model bias is high (BIAS>1), yielding low NSE scores. Similar issues are found in the literature with other models (Clark et al., 2008;Newman et al., 2015;Mizukami et al., 2017;280 Salas et al., 2017;Lin et al., 2019;Konben et al., 2020;Knoben et al., 2020;Yang et al., 2021;Tijerina et al., 2021). Taking the Great Plains as an example (highlighted box in Fig. 6c), the model physics of CREST-VEC does not correctly represent the real hydrologic processes by two means. First, the surface runoff (before routing) simulated by CREST-VEC is biased. We compare the annual surface runoff by CREST-VEC to the public community dataset GRFR (Global Reach-level Flood Reanalysis) in Fig. 7. The runoff in GRFR is simulated by the VIC model and undergoes stringent bias correction against 285 observations via the discrete quantile mapping technique (Yang et al., 2021;Lin et al., 2019). There is a 116.3% higher surface runoff by the CREST-VEC than theirs, partly explaining the high BIAS and low NSE scores in such region. We suspect the singular bulk soil layer represented in the CREST model yields such systematic differences. Second, the missing representation of playas, small and rain-fed lakes that are prominent in the Great Plains, leads to falsely produced runoff (Hay et al., 2016;Solvik et al., 2021). However, even when accounting for multiple hydrologic model structures, performance in this region is 290 still ranked as one of the poorest (Clark et al., 2008;Knoben et al., 2020). For example, Knoben et al. (2020) analyzed 36 hydrologic models over the U.S., in which the maximum KGE scores out of those models are lower than 0.5 over the Great Plains.

How likely are floods falsely detected?
In this section, we shift gears to explore how likely US floods are falsely alarmed if no lake simulations are included. We selected 283 gauges that are downstream of natural lakes (Fig. 8), with most of them located in the middle and eastern US.
The hourly time series of streamflow of those gauges are compared against advised flood thresholds (2-year flooding) provided by the US Geological Survey. They fit a log-Pearson III type distribution to the annual maxima streamflow from long-term 310 records and extract values with given flood frequency. Following a similar approach as in Yang et al. (2021), consecutive yet independent events have to be two days apart from one another. From there, we calculated the Probability of Detection (POD), False Alarm Ratio (FAR), and Critical Success Index (CSI) based on the contingency table.
As expected, median FAR is reduced from 0.63 (without lake simulation) to 0.50 (with lake), resulting in a slightly higher CSI of 0.36 than that of 0.31 for no lake simulation (Fig. 8a). Additionally, previous research reported that simulation results with 315 the lake module mitigate the seasonal variability of the river discharge (Tokuda et al., 2021). The decrease in FAR values implies five instances: (1) decrease in false alarms while hits remain the same; (2) increase in hits while false alarms remain the same; (3) decrease in false alarms while increase in hits; (4) decrease in both false alarms and hits; and (5) increase in both false alarms and hits. We find that however POD values decrease from 0.87 without lakes to 0.85 with lakes, from which we can infer that both hits and false alarms are decreasing, but false alarms decrease at a higher rate. That is a fact of reducing 320 simulated flood peak, which results in fewer hits in flood forecasts but meanwhile less falsely alarmed floods. As most studies focus on flood detection, they inevitably arrive at more falsely detected floods. Too many false alarms could make people disregard the warnings, despite a real threat, causing the "cry wolf" effect. Fig. 8b display the distributions of flood detectability with lake simulation and its improvements compared to results without lake simulation. High POD and FAR values co-exist in the Great Plains, where the model simulates considerably 325 higher streamflow values than observations. Moderate FAR values are found near the Florida panhandle and parts of Georgia.

Maps in
Lower FAR values are found in the Midwest and West Coast. Compared to results without the lake, FAR values are reduced reasonably over the East Coast, Midwest, Gulf Coast, and West Coast, although POD values remain relatively unchanged or even decreased.
Five local cases are shown in Fig. 9, which depicts the river topology and time series of hourly streamflow. One can infer that 330 these lakes are not heavily regulated from recorded streamflow time series, therefore showing the effectiveness of our model.
In Fig. 9a, the simulated streamflow without lakes is heavily overestimated, peaking at 1200 cms in the year 2017, whereas the actual flow rate is around 400 cms. The scenario with lake simulation, however, produces a magnitude much closer to the observation. Due to decreased systematic bias, the lake scenario boosts the NSE score from -0.2 to 0.5. There is also an 8% less chance of issuing false alarms than the model without lake simulation. Figure 9b shows a case where FAR is reduced from 335 0.70 to 0.17, a reduction rate of 75.7%. The flood detectability, i.e., CSI, is greatly improved from 0.29 to 0.57. Figure 9c exemplifies a case with all improved metrics (i.e., NSE, POD, FAR, and CSI). All these three cases in Figs. 9a-c are located along the St. Johns River, in which we expect a systematic improvement along this river after incorporating the lake simulation.

Vector vs. Raster routing
In this study, we compare the advantages of vector-based routing with respect to conventional raster-based routing in two aspects: (1) model efficiency and (2) model accuracy. Overall, the vector-based routing shows great promise, as it speeds up the routing process by at least ten times, compared to grid-based routing, for both the regional simulation (0.07 sec/step vs. 355 0.002 sec/step) and the CONUS simulation (0.35 sec/step vs. 7.2 sec/step). In terms of results against observations, the CONUS-wide performance is improved regarding NSE values. However, the variable river reach lengths (from hundreds of meters to tens of kilometers) in large-scale simulation pose challenges for estimating routing parameters such as the time and shape parameters in a unit hydrograph. Second, most land surface models are still grid-based, making a type mismatch (gridbased land surface model vs. vector-based routing model) (Lehner & Grill, 2013). To integrate the two, we need a processing 360 step by mapping surface and subsurface runoff onto representative HRUs. Different aggregation strategies are present and subject to the primary purpose of interest. At present, there is an ongoing effort to seamlessly integrate these two processes together (Gharari et al., 2020). However, it is yet to be efficient and draws further attention to improving this mapping scheme.
Third, the many-to-one river network is established but not for one-to-many, meaning river bifurcation is challenging to represent and tackle (Yamazaki et al., 2014). 365 Raster-based routing comes at the resolution of the input DEM data, albeit at a slower computational speed. Having matured over the years, most raster-based routing models are seamlessly integrated with water balance models so that model can be set up at minimum effort by a modeler. Mostly derived from hydrodynamic models, the concept of "raster" can be extended to "grid" because of the emerging unstructured grids such as triangles, curvilinear, hexagons, etc. These flexible grid types mimic the real flow directions and reduce computational costs. However, they are yet to be accepted/applied by the hydrologic model 370 community.
As one objective of this study, we want to examine the potential improvement from the with-lake configuration on streamflow simulation over a wide range of hydrometrical and geographical settings in the CONUS, rather than provide some optimal model setup and parameterization at the CONUS scale, which we believe is way beyond our scope and several steps forward from the current CREST-VEC or any existing CONUS models. As far as what qualifies 'an adequate base simulation', there 375 may be some room for debate but should be some bottom-line principles: first, one should be clearly aware of the sources of uncertainties, including forcing, model structure, parameterization, streamflow observation as the reference, etc. Optimization, though effective in improving the model performance, compensates for uncertainties from the other sources simply via adjusting model parameters. This has been acceptable for operational purposes but is not appropriate for this study where a modification of the model structure is introduced. Instead, we use an a-priori parameter set that was developed based on remote 380 sensing datasets and also evaluated at the CONUS scale (Vergara et al., 2016). The physical base of these a-priori parameters set a solid foundation for examining the new with-lake configuration, thus should not be compromised via parameter tunning.
The CREST-VEC model, by no means, represent all physical hydrological processes. Instead, it is a conceptual flood forecast model that aims to deliver timely flood information to stakeholders, decision-makers, and broader users. We admit that some processes such as vadose zone modeling, snow melting, hillslope routing, in-channel river routing, and reservoir operations 385 are simplified, and some processes such as vegetation and groundwater modeling are missing from the current version. For the lake module, we expect to include more sophisticated multi-layer decision processes instead of a level-pool process. Lake evaporation is another important factor to be considered for improved water balance. Since it is a compromise between model complexity and efficiency, we hope to continuously push the envelope on this front to optimize the real-time flood forecast system. 390

Room for improving large-scale hydrologic simulation
Large-scale hydrologic simulation is still a long-standing challenge for the hydrologic community, especially with debates on developing a "one-model-fits-all" structure or a "malleable" structure (Burek et al., 2020;Clark et al., 2015a;Fenicia et al., 2011;Savenije, 2008). The CREST model, in our study, systematically overestimates surface runoff over the Great Plains and Southeast, a result of some misrepresented or missing processes, yet excels in flash flood simulation. Diverse hydrologic model 395 structures, on the other hand, hope to overcome individual limitations and offer joint benefits (Horton et al., 2021). We, therefore, promote the "malleable" model structure from the efficiency point of view -a flexible structure disables redundant hydrologic processes. Then, the central question becomes: How do we adapt the model to variable catchment processes? In such a context, intercomparisons and discussions of different hydrologic models in varying catchment processes become particularly valuable (Clark et al., 2015a;Knoben et al., 2020;Tijerina et al., 2021). Notably, simply relying on the NSE or 400 KGE score to assess the model performance can be misguiding (Clark et al., 2021).
Hydrologic calibration is powerful in boosting model accuracy, yet large-scale models oftentimes suffer from the complexity that impedes credible model calibration. River routing schemes and their parameters can affect streamflow simulations, especially at fine time scales such as sub-daily (Mizukami et al., 2021). Our current study used an IRF scheme in which the impulse response function is derived from a diffusive wave equation (see Lohman et al., 1996;Mizukami et al., 2016) and 405 includes two parametersdiffusivity and celerity. These parameters need to be exposed to calibration in addition to the hydrologic model parameters. Furthermore, to fully understand the routing model's impact on streamflow simulations, it is necessary to consider other routing schemes including a diffusive wave model as well as a kinematic wave model, which may be suited for flood forecasting.
Lastly, the computational costs for large-scale simulation can be optimized from accelerated hardware (multi-core CPUs and 410 GPUs) once codes are parallelized and scalable. Advances in Reduced Order Modeling (ROM), a surrogate model which develops a parsimonious solution to replace the computationally intensive part, hold promise to reduce costs (Clark et al., 2015b). For instance, to integrate reservoir simulation into the CREST-VEC system, we can build an offline ML model which is promising in mimicking human decisions (Yang et al., 2021) and plug it into the system.

Towards improved flood forecasting with lake routing 415
Flood forecasts are difficult because of their rarity, and their hits and misses are typically low while false alarms are high (Bartholmes et al., 2009;Cloke & Pappenberger, 2009). Results in this study demonstrate a dilemma in which the model with a lake module reduces false alarms but at the cost of more missed flood events compared to the one without a lake module.
Although the combined metric CSI has a certain degree of improvement, this leaves a questionshould we reduce a large number of false alarms at the expense of missing a small number of real events? Before discussing this point, we acknowledge 420 that the current lake routing process is simple and imperfect, and improvement in this process possibly leads to an optimal situation where both false alarms and misses can be improved. However, in most situations, tradeoffs exist in hydrologic predictions. A good strategy in our case would be running both simulations with and without the lake module concurrently and making the "without lake" results the worst-case scenario. Since the CREST-VEC model has the advantage of efficiency, running two scenarios is totally feasible. A decision-maker can be trained to assess the situationresults from two scenarios 425 disagreefrom the perspective of flood severity and consequences.

Conclusion
This study compares a conventional raster-based routing scheme with the emerging vector-based routing approach in hydrologic models for regional case and continental simulations. From the continental run, we demonstrate the improvement in streamflow simulation after incorporating the lake storage and release module. Last but not least, flood-related false alarms 430 can be greatly reduced by including the lake module. The following points summarize the primary findings of the study: 1. Vector-based routing can accelerate continental-scale river routing by up to ten times, compared to a grid-based routing, for both a regional case (0.07 sec/step) and a continental case (0.002 sec/step). This leaves adequate room for generating ensemble predictions with variable forcing, parameters, and/or model structures. Furthermore, it improves streamflow simulation from -0.06 to 0.18, according to the aggregated median NSE values. 435 2. A newly developed lake model increases the NSE score by 56.2% and reduces systematic BIAS by 17% for the continental simulation.
3. Flood false alarm ratios can be mitigated by 20.6% after enabling the lake module at the expense of missing 2.3% more floods on a continental scale.
We recommend the use of ensemble simulations stemming from different model structures to overcome and adapt to varying 440 catchment processes. Optimized streamflow prediction with quantified uncertainty information can be achieved in an operational manner for stakeholders and decision-makers. Future studies can fully investigate the limitation and uncertainty of different forcing, parameters, and/or model structures to catchment signatures such as climatology, dominant hydrologic processes, lithology, etc. Vector-based routing, in such a context, can enable fair comparison by excluding the effect of different routing schemes while focusing on discrepancies in water balance models alone. For future work, we hope to have 445 the best possible model-simulated streamflow product in the US, fused with multi-model structures and observations. Another direction is to improve current lake and reservoir outflow simulation with a hybrid modelprocess-based and ML-based.

Acknowledgement
The first author is sponsored by the University of Oklahoma Hydrology and Water Security (HWS) program (https://www.ouhydrologyonline.com/) and Graduate College Hoving Fellowship. 450