Effects of dimensionality on the performance of hydrodynamic models for stratified lakes and reservoirs

Numerical models are an important tool for simulating temperature, hydrodynamics, and water quality in lakes and reservoirs. Existing models differ in dimensionality by considering spatial variations of simulated parameters (e.g., flow velocity and water temperature) in one (1D), two (2D) or three (3D) spatial dimensions. The different approaches are based on different levels of simplification in the description of hydrodynamic processes and result in different demands on computational power. The aim of this study is to compare three models with different dimensionalities and to analyze differences between model results in relation to model simplifications. We analyze simulations of thermal stratification, flow velocity and substance transport by density currents in a medium-sized drinking-water reservoir in the subtropical zone, using three widely used open-source models: GLM (1D), CE-QUAL-W2 (2D) and Delft3D (3D). The models were operated with identical initial and boundary conditions over a 1-year period. Their performance was assessed by comparing model results with measurements of temperature, flow velocity and turbulence. Our results show that all models were capable of simulating the seasonal changes in water temperature and stratification. Flow velocities, only available for the 2D and 3D approaches, were more challenging to reproduce, but 3D simulations showed closer agreement with observations. With increasing dimensionality, the quality of the simulations also increased in terms of error, correlation and variance. None of the models provided good agreement with observations in terms of mixed layer depth, which also affects the spreading of inflowing water as density currents and the results of water quality models that build on outputs of the hydrodynamic models.


Introduction
A wide variety of different numerical models have been used for simulating temperature and hydrodynamics in lakes and reservoirs, as well as the biogeochemical and ecological processes that depend on them (e.g., Dissanayake et al., 2019;Guseva et al., 2020;Wang et al., 2020;Xu et al., 2021). While the mechanistic description of underlying physical processes is similar in all models, they differ in their dimensionality, i.e., the number of spatial dimensions that are considered in the model.
One-dimensional (1D) models usually resolve the vertical direction only (water depth), while considering homogeneity of all relevant quantities along horizontal directions. They are attractive due to their easy connection to ecological and biogeochemical modules. In addition, the comparably low number of required input parameters and fast computational time allow for evaluations of scenarios and sensitivity analyses, which facilitate their application for assessing long-term dynamics and resilience of lakes and reservoirs in response to climatic, hydrological and land use changes (Bruce et al., 2018;Sabrekov et al., 2017;Hipsey et al., 2019). Therefore, 1D models such as DYRESM (Imberger and Patterson, 1981;Hetherington et al., 2015), SimStrat (Goudsmit et al., 2002;Stepanenko et al., 2014) and GLM (Hipsey et al., 2019;Fenocchi et al., 2017;Bruce et al., 2018;Soares et al., 2019) have been extensively used in scientific and applied studies. On the other hand, detailed studies of hydrodynamic effects and spatially varying flow and transport mechanisms, such as density currents at river inflow locations, require models with a higher dimensionality. Two-dimensional (2D) models can provide additional insights into the hydrodynamics of lakes and reservoirs while keeping computational costs low when compared to 3D models. The 2D models that neglect variations in the vertical dimension, 2DH, are suitable for shallow lakes, where gradients along depth are minor, but they are mainly used for flood maps, river flows, hydraulics structures and sediment transport. Alternatively, the models that resolve the vertical dimension and one horizontal (longitudinal) dimension, 2DV, are suitable for elongated deepwater bodies where vertical thermal stratification plays a major role, e.g., CE-QUAL-W2 (Gelda et al., 2015;Kobler et al., 2018;Mi et al., 2020). Finally, three-dimensional (3D) models provide highly detailed spatial data but require larger computational effort in terms of time and storage. Regardless of their complexity, 3D models are widely applied, e.g., POM (Beletsky and Schwab, 2001;Song et al., 2004), EL-COM (Carpentier et al., 2017;Marti et al., 2011;Zhang et al., 2020) and Delft3D-FLOW (Soulignac et al., 2017;Bermúdez et al., 2018;Baracchini et al., 2020;Guénand et al., 2020). The choice of model dimensionally often represents a trade-off between required accuracy, availability of boundary conditions and computational costs, and it depends on the objectives of the simulation, the water body characteristics and the availability of computer resources. Different models can complement each other, and model intercomparisons can significantly contribute to process understanding of the studied system, as well as to the assessment of model limitations. Within the framework of the Lake Model Intercomparison Project (LakeMIP; Stepanenko et al., 2010), the performance of different 1D models was compared for a number of reference sites, targeting also at the improvement of model parameterizations (Stepanenko et al., 2013Thiery et al., 2014;Guseva et al., 2020). Perroud et al. (2009) compared four different 1D models, which were previously applied to small water bodies, to the large Lake Geneva, and Mesman et al. (2020) investigated the performance of three 1D models under extreme weather events like storms and heat waves. Most of the comparison among 3D models focused on systems where circulation patterns and internal waves had a major influence. For example, Huang et al. (2010) compared three 3D models for Lake Ontario, where variations in surface temperature are caused by circulation patterns and upwelling or downwelling of the thermocline. Dissanayake et al. (2019) applied ELCOM and Delft3D to simulate internal wave motions and surface currents in Upper Lake Constance. Zamani and Koch (2020) compared AEM3D and MIKE3 models in a reservoir with complex morphology.
Comparison of models with different dimensionalities is more difficult, since the interpretation of each result also depends on model considerations and simplifications. Polli and Bleninger (2019) compared water temperature simulations of a thermally stratified reservoir using  and Delft3D (3D model), and they found that 1D simulations may provide similar information as 3D models in terms of thermal structure. Therefore, the study recommended onedimensional models as a first approach for assessing reservoir stratification patterns and the application of a 3D model if the horizontal substance transport is of interest. Following the same idea, Man et al. (2021) recommended the application of a 1D model for parameter estimation that is subsequently used in 3D simulations, because of the shorter computational time of the 1D model. In their simulations the 1D model showed better agreement with measurements only for specific periods (when stratification or mixing were stable). The Geologic Survey of Israel and Tahal (Gavrieli et al., 2011) developed numerical models with three different dimensionalities for the simulation of the hydrodynamics and temperature stratification of the Dead Sea: a 1D model using the software 1D-DS-POM, a 2D laterally averaged model using CE-QUAL-W2 and a 3D model using the software POM2 K. The models were used in a complementary manner, taking advantage of their respective strengths: the 1D model was used to simulate decades in order to study future scenarios, the 2DV model was used to investigate the changes in the thermal structure of the reservoir due to changes in the multiple inflows, and the 3D model allowed for the study of currents and the 3D thermohaline structure. Nevertheless, the performance of the three different model approaches was not compared within this study and neither were comparisons with respect to velocities where shown. It is important to note that the selection of a higher dimensionality does not imply better simulation results (Wells, 2020). DeGasperi (2013) compared the performance of CE-QUAL-W2 and CH3D-Z (3D model) in simulating the water temperature of Lake Sammamish in the USA. Both models presented similar results with slightly better performance statistics for the 2DV model. Al-Zubaidi and Wells (2018) evaluated the capacity of CE-QUAL-W2 and a three-dimensional adaptation of the same software known as W3 in modeling the temperature stratification at Laurance Lake, Oregon, USA. For this study, the simulations from both models were in comparable agreement with measurements, but running the 3D model was 60 times more expensive in terms of computational time.
The aim of this study is to compare three models with different dimensionalities and to analyze the results based on the simplification of the physical processes caused by model dimensionality. We analyze simulations of thermal stratification, horizontal flow velocity and substance transport by density currents in a medium-sized drinking-water reservoir in the subtropical zone using three widely used open-source models: GLM (1D), CE-QUAL-W2 (2D) and Delft3D-FLOW (3D). The models were run with identical initial and boundary conditions over a 1-year period. Their performance was assessed by comparing model results with measurements of temperature, flow velocity and turbulence over a 1-year period. We aim at providing a reference study for supporting the selection of models and the assessment of model accuracy, as well as at improving the mechanistic understanding of model performance at reduced dimensionality.
2 Description of the models 2.1 General Lake Model (GLM) General Lake Model (version 3.1.8) (Hipsey et al., 2019) is a one-dimensional vertical model, freely available, and is designed to simulate the water balance and the vertical stratification of lacustrine systems. The model computes the vertical profiles of temperature, salinity and density by considering hydrological and meteorological forcing. GLM adopts a flexible Lagrangian layer structure (Imberger et al., 1978;Imberger and Patterson, 1981), which allows the layer thicknesses to change dynamically by contraction and expansion, according to density changes driven by surface heating, mixing, inflows and outflows. The number of layers is adapted throughout the simulation to maintain homogeneous properties within them, while the water volume in each layer is determined based on the site-specific hypsographic curve.
The thickness of the surface mixed layer is described in terms of a balance of turbulent kinetic energy, comparing the available energy with that required for vertical mixing. The available kinetic energy calculation considers surface wind stress, convective mixing, shear production between layers and Kelvin-Helmholtz billowing. Mixing in the deeper hypolimnion is modeled using a constant turbulent diffusivity or a derivation by Weinstock (1981), in which the diffusivity is calculated as a function of the strength of stratification (described by the Brunt-Väisälä frequency) and the dissipation rate of turbulent kinetic energy.
The general heat budget equation for the uppermost layer considers the balance of shortwave and longwave radiation fluxes and sensible and latent heat fluxes. The effect of heating or cooling by the sediment can additionally be included. The rate of temperature change in each layer is a function of the temperature gradient and of the relative area of the layer that is in contact with the bottom.
Inflows can be characterized by temperature, salinity and other scalar concentration data. Initial mixing is estimated by the inflow entrainment coefficient, which is calculated using a bottom drag coefficient and water column stability characterized by the Richardson number. The inflow Richardson number, in turn, is computed according to the channel geometry and assuming a typically small velocity and Froude number for the drag coefficient (Imberger and Patterson, 1981). Thereafter, the inflow is placed in a layer of neutral buoyancy along the water column. Thus, a new layer is created, with a thickness dependent on the inflow volume.

CE-QUAL-W2
CE-QUAL-W2 (version 3.7) (Cole and Wells, 2006) is a laterally averaged (2DV) model, resulting from the integration in one horizontal direction of the differential equations of conservation of mass, momentum and energy. It is an opensource Eulerian model using a structured orthogonal grid that uses a bathymetric map as geometry input and shortwave radiation, cloud cover, air temperature, dew point temperature, wind speed, wind direction and precipitation as meteorological forcing. Hydrodynamic output data include water temperature and longitudinal flow velocity. Laterally averaged models are based on the shallow water equations (Reynolds-averaged Navier-Stokes equations using the hydrostatic pressure assumption in the vertical, thus neglecting vertical accelerations) and are used for modeling hydrodynamics, water quality, and density stratification in lakes and reservoirs for which the transversal gradients of those properties are small compared to gradients in the longitudinal and vertical directions. The assumption of lateral homogeneity can be well suited for describing long and narrow water bodies. The equations are applied in a finite difference grid.
The default turbulence closure model (W2) uses the layer thickness as the mixing length and a formulation for the turbulent viscosity derived by Cole and Buchak (1995). It is also possible to use Nikuradse, parabolic, RNG (renormalization group) and TKE (turbulent kinetic energy, k-ε model) closure schemes. The k-ε closure is frequently used and was chosen for the study.

Delft3D-FLOW
Delft3D-FLOW (version 4.04.01) (Deltares, 2013), from now on referred to as Delft3D, is a 3D open-source software for simulating the flow and transport of constituents in water bodies. For the simulation of the hydrodynamics, the numerical algorithms behind Delft3D solve the shallow water equations (3D Reynolds-averaged Navier-Stokes equations with the hydrostatic approximation for the vertical direction). The simulation of the transport of matter and heat is achieved through the solution of the advection-diffusion equation. The mentioned equations are solved on a structured finite-difference grid using case-appropriate initial and boundary conditions. Delft3D considers user-defined (constant) background viscosities and diffusivities. They represent all forms of mixing that are not parameterized through the turbulence closure scheme. In order to calculate the heat exchange between the water surface and the air, five different heat flux models are implemented in Delft3D. Those models consider the short-and longwave radiation balances, evaporation and sensible heat fluxes.
The spatial discretization in the horizontal plane can be performed using curvilinear or rectangular grids, with the former having variable cell size. In the vertical direction a Zor σ -layer configuration can be employed. In the Z model, the number of layers is not constant over the basin and varies with local bathymetry.

Field data
Passaúna Reservoir is a drinking-water reservoir located in southern Brazil (25.50 • S, 49.38 • W), which has been in operation since 1990. The reservoir is around 11 km long, has 9 km 2 of surface area and has a maximum depth of 16.5 m close to its dam . The main tributary is the Passaúna River with a mean discharge of 2.4 m 3 s −1 , delivering approximately 75 % of the total inflow to the reservoir (Carneiro et al., 2016). Passaúna River enters the reservoir through a small forebay formed at the upstream region of the reservoir due to a bridge (Ferraria Bridge). This forebay has an average depth of 1 m and approximately 0.28 km 2 of area ( Fig. 1). The outflows from the reservoir are the abstraction for the water treatment station, the bottom outlet at the dam (ensuring a minimum discharge of ∼ 0.4 m 3 s −1 in the downstream river) and the free overflow spillway. Reservoir bathymetry and its hypsographic curve were obtained from a high-resolution echo-sounder survey . The field measurements described below have been analyzed in detail in Ishikawa et al. (2021a).

Meteorological data
Relative humidity, downwelling shortwave radiation, wind (speed and direction at 10 m height) and dew point temperature were measured at a meteorological station located 4 km east of the reservoir. This station is operated by the Technology Institute of Paraná (TECPAR) and measured every 1 min, here averaged to 1 h. The company operating the reservoir (Sanitation Company of Paraná, SANEPAR) measured precipitation nearby the dam, and starting from May 2018, they also measured air temperature at the same location (temporal resolution of 10 min, later averaged to 1 h). Starting from this date, the air temperature data were taken from this station. Cloud cover data were downloaded from the ERA5 database from Copernicus (Hersbach et al., 2018) with hourly resolution.
Air temperature varied seasonally with a lowest monthlymean value of 14.0 ± 4.3 • C (mean ± standard deviation of hourly time series) in August 2018 and the highest temperature in January 2019 (23.7 ± 3.6 • C). Large diel temperature variations followed the daily cycle in shortwave radiation. Wind speed was generally low with a total mean value of 2.0 ± 1.0 m s −1 . A slight seasonal variation was observed where the lowest monthly averaged wind speed occurred during winter (in June 2018, 1.6±1.0 m s −1 ) and the largest during spring (November 2018, 2.5 ± 1.0 m s −1 ). No seasonal pattern was observed for the remaining parameters (Fig. 2).

Inflow, outflow and water level
Daily-averaged discharge and temperature of the inflows were modeled using the Large Area Runoff Simulation Model (LARSIM-WT; Haag and Luce, 2008). The model was calibrated with data from four gauging stations (the two most downstream stations are shown on the map in Fig. 1) for the period 2010 to 2013 with a Nash-Sutcliffe efficiency of 0.77 (for further information see Ishikawa et al., 2021d). In 2018, the model underestimated the peaks of discharges but had good agreement during baseflow conditions. Simulated water temperature for the year of 2018 had a Nash-Sutcliffe efficiency of 0.96.
Starting from March 2018, a temperature logger was installed in the Passaúna River and measured inflow temperature was used instead of the simulated values (sampling resolution of 10 min, here averaged to 1 h). The measurement was made with an accuracy of ±0.1 • C and resolution of 0.01 • C using a temperature-oxygen sensor (miniDOT, Precision Measurement Engineering, Inc.).
Water abstraction rate at the intake facility was provided by SANEPAR, measured with an inductive flow meter, and provided at hourly resolution. The operator also provided reservoir water level measured by an ultrasonic probe in a 30 min temporal resolution. Outflow discharge at the ground outlet and the spillway were calculated based on standard hydraulic structure design equations, according to the structure's geometry. The discharge coefficients were adjusted using a few downstream discharge measurements (MuDak-WRM project; Fuchs et al., 2019) but also considering the overall water balance, where simulated inflows minus calculated outflows should correspond to the measured water level changes.

Temperature
Close to the intake facility, where the water depth is about 12 m, a thermistor chain was deployed from 1 March 2018 to 6 February 2019. The chain was fixed at the bottom and had 11 temperature loggers with 1 m vertical spacing, starting from 1 m above the bed. The loggers (Minilog-II-T, Vemco) measured at a sampling interval of 1 min with a pre-cision of ±0.1 • C and 0.01 • C resolution. An additional logger of the same type was placed under the Ferraria Bridge, with the same configuration from 2 March 2018 to 12 August 2018. In addition, temperature profiles were collected with a CTD (conductivity-temperature-depth profiler, Son-Tek CastAway) at five locations along the reservoir ( Fig. 1) in February, April, May, June, August, November and December 2018, as well as February 2019.

Flow velocities
An upward-looking acoustic Doppler current profiler (ADCP Signature 1000, Nortek AS) was deployed close to the thermistor chain (< 50 m distance) at the bottom of the reservoir to measure vertical profiles of flow velocity. The device was deployed and recovered for data download and battery replacement several times from 23 February 2018 to 5 February 2019. Its configuration was modified between individual deployments (Table S1 in the Supplement) to improve the data quality and also to adjust power consumption (measurement duration) to the monitoring program. Mean values of the three-dimensional flow velocities were recorded along a vertical profile starting at 0.7 m above the bed up to 1.5 m below the water surface with vertical and temporal resolution of 0.5 m and 5 or 10 min, respectively. High-resolution pro-files of vertical velocities were used for turbulence analysis. These profiles covered a depth range of 7.4 m, starting 0.6 m above the bed with a spatial (vertical) resolution of 4 cm and a sampling frequency of 1 or 4 Hz. High-resolution data are not available for the first ADCP deployment.

Model setup
The simulation period started on 1 August 2017 and ended on 28 February 2019. The first 6 months were considered a spinup period for the models; it was decided to start the simulations in August when the reservoir was vertically mixed. Therefore, all models started with uniform temperature of 17 • C and water level at 887.01 m a.s.l. In addition, conservative tracers were implemented to observe the transport of substances from Passaúna River; hence, the river had a constant concentration of 1 kg m −3 starting from 1 August 2017.

Model calibration
Parameters that could be used for calibration are related to the exchange of heat and momentum at the water surface and include coefficients for wind drag, light extinction, and sensible and latent heat transfer. Scaling factors were not considered in the calibration process. The light extinction coefficient was intended to have a fixed value for all models based on Secchi disk depth measurements (which the average along the longitudinal and over time was 2 m, resulting in a light extinction coefficient of 0.85 m −1 ). However, it was noticed that the results of the 2D model could be improved based on this coefficient; therefore, it was considered an additional calibration parameter.
Each model underwent a manual calibration procedure, in which the listed coefficients (see Table 1) were modified in order to reduce the mean absolute error (MAE) of the water temperature. Automatic calibration procedures are available for GLM but were not applied, and its calibration processes were similar to those of the 2D and 3D models. The specific choice of model parameters is presented in Table 1.
Each model used different time steps; nevertheless, the difference between 2D and 3D models was minor. For Delft3D, the time step was defined based on the estimation of Courant number in order to reach numerical stability. In GLM, the default time step of 1 h was used, which was applied in several other studies using the same model (e.g., Farrell et al., 2020;Gal et al., 2020;Ladwig et al., 2021). Due to the relatively large difference between the 1D and the other models, we ran the GLM model with time steps ranging from 1 to 86400 s and compared the simulations with measurements ( Fig. S1 in the Supplement). The simulation results differed due to the variable number of vertical layers. The default time step had one of the smallest centered root-mean-square errors and was used for the model comparison. The usage of different time steps should not affect the comparison, once all models were stable and calibrated.

Boundary conditions
The same boundary conditions were used for all three models, with temporal resolutions according to the availability of data and model requirements. The boundary conditions are the following: air temperature, relative humidity, downwelling shortwave radiation, wind speed, wind direction, precipitation, cloud cover, outflow discharge (for water intake and continuous discharge of ground outlet), and inflow discharge and temperature. The water level at the spillway was used as a boundary condition as it represents an open boundary in the 3D model.

Bathymetry and grids
Bathymetric information was interpolated on the grids of Delft3D and CE-QUAL-W2, whereas GLM only used the hypsographic curve. The CE-QUAL-W2 grid was built using a QGIS 3.2 plugin developed by Bornstein (2019). It contains a maximum of 20 layers (vertical direction), two branches (longitudinal direction) and 82 segments divided among the two branches (Fig. 3b).
The Delft3D grid was built using the grid generator of Delft3D (RGFGRID). The resolution is higher in the forebay-bridge area in order to better represent the formation of density currents in this region ( Fig. 3e and f). The refinement of the grid in the forebay region was made using the RGFGRID module, and where needed the bathymetry and grid was edited using the Quickin module for a better representation of the reservoir.

Inflow
GLM and CE-QUAL-W2 used inflow discharge and temperature with daily resolution, which was required for GLM, while the 3D model used temperatures with 10 min temporal resolution for the Passaúna River after the installation of a temperature logger.
In GLM the inflows were divided into three: the two main tributaries (Passaúna River and Ferraria River) and the 60 other minor tributaries as a single discharge, their discharges were summed up and the temperatures averaged. In CE-QUAL-W2 and Delft3D, each tributary was implemented as a single discharge at the closest respective segment or cell. The intrusion depth for CE-QUAL-W2 was defined as the layer having the same density, and for Delft3D it was uniformly distributed over depth.

Intake
The intake facility had withdrawal flow rates implemented in daily resolution for the 1D and 2D models and in hourly resolution for the 3D model. The abstraction of water occurred  close to the surface; therefore, for GLM and CE-QUAL-W2, the abstraction level was set up as 885 m a.s.l., and the surface cell was defined for Delft3D. The abstraction location was defined for the 2D and 3D models, and for the 1D model only the level was required.

Ground outlet
The ground outlet flow rate was almost constant (0.44 ± 0.07 m 3 s −1 ) over the simulation period, as there were only a few gate operations, and small water level variations. This flow rate was abstracted from all models in a daily temporal resolution. Similar to the intake withdrawal, GLM required as additional information the level of the outlet (872 m a.s.l.). For CE-QUAL-W2 and Delft3D, the second deepest cell was selected.

Spillway
The spillway was set up as an open boundary in the three models. For GLM and CE-QUAL-W2, its discharge was computed through the following equations: where Q spillway (m 3 s −1 ) is the volumetric flow over the spillway, and h (m) is the difference between water level and spillway crest. For GLM, α was calculated as which depends on spillway width, W spill = 60 m, and associated drag coefficient C Dspill = 0.62, and g is the acceleration due to gravity (m s −2 ), while in CE-QUAL-W2 it was defined as an empirical coefficient = 110.3. For Delft3D, this open boundary was defined as water level dependent, where the measured water levels were applied in temporal resolution of 30 min. The outflow discharge thus depends on the water level and the bathymetry at the open boundary grid cells.

Shorelines, bed and water surface
Shorelines and bed are considered to be closed boundaries with no-flux condition. For Delft3D and CE-QUAL-W2, a uniform roughness coefficient was specified at the bed (Table 1). Surface heat fluxes are described in Table 1, and wind direction was not used in the 1D model. Precipitation was uniformly distributed over the water surface.

Meteorological data
GLM and CE-QUAL-W2 used meteorological data with a temporal resolution of 1 h. For GLM, this resolution is equivalent to the minimum time step. For Delft3D, meteorological data with 10 min temporal resolution were used.

Indices for comparison
To compare model simulations to observations, the following parameters were calculated for the cells being closest to the sampling location. For CE-QUAL-W2, segment 55 was selected (indicated in Fig. 3a-c); for Delft3D, cell [193,28] was selected ( Fig. 3d-f), and for GLM the first 12 m of depth was selected. Simulated values were linearly interpolated to match with the sampling depths, and indices were calculated only for the period of interest: 1 March 2018 to 28 February 2019.
The assessed variables were water level, spillway discharge, evaporation rate, water temperature, flow velocities, energy dissipation rates and substance transport. Variables that required additional processing for the analysis are described below; otherwise, the variables were assessed through time series provided by the models.

Statistics
The model simulations were compared with observations in terms of mean absolute error (MAE), centered root-meansquare error (cRMSE), standard deviation (σ ), correlation coefficient (R) and coefficient of determination (r). These parameters were calculated as follows: where n is the number of observations, a i is one observation and a is the mean of all samples (measured or simulated).
where s denotes the simulated value and m the measured value.
where SS res is the sum of the squared residuals, and SS tot is the sum of squared data. Taylor diagrams (Taylor, 2001) are used to assess the simulation results of the vertical temperature distribution. The diagram provides a concise overview of results through comparison with observations in terms of standard deviation (σ ), correlation coefficient (R) and centered root-mean-square error (cRMSE) in one plot.
In addition, descriptive statistics (mean, standard deviation, percentiles and percentage differences) were calculated to compare simulated and observed values.

Mixing and stratification
Following Ishikawa et al. (2021a), the water column was classified as mixed or stratified based on a threshold of the Schmidt stability (S T ). Days with daily-averaged Schmidt stability equal or lower than 10 % of the annual maximum Schmidt stability (calculated with the measured data as 16.3 J m −2 ) were considered mixed, while days with higher S T were classified as stratified.
Schmidt stability was calculated using the software Lake Analyzer (Read et al., 2011) as where g (m s −2 ) is the gravitational acceleration, A S is the reservoir surface area, A z is the area at depth z, ρ z is the water density at depth z, z D is the maximum depth and z v is depth of the reservoir center of volume calculated as The thickness of the upper mixed layer (UML) was also computed by Lake Analyzer using a threshold for the vertical density gradient, which depends on the density gradient of the entire water column (see Read et al., 2011, for details).

Temperature
Due to the dynamic mixed layers of GLM, its results were linearly interpolated for each 0.5 m over depth. To calculate errors, the measured temperatures correspondent to the timestamp of simulation results were selected, and simulation results were linearly interpolated to match with the observations over depth. This procedure was followed for the three models.

Density currents and substance transport
Water from inflowing rivers can take different flow paths after entering the reservoir, which are affected by density stratification in the reservoir and inflow conditions. Densitydriven inflows can be classified as underflows, interflows, or overflows, where the first is spreading along the reservoir bottom, the second at an intermediate water depth and the last at the reservoir surface.
Substance transport in models was analyzed by simulating the transport of a conservative tracer that was added continuously to the inflowing water. Tracer concentrations at the intake region were assessed to observe substance transport from the Passaúna River to the monitoring site through vertical profiles of the daily maximum value of tracer concentration. The GLM model was set up with a maximum water depth of 17 m, while the water depth at the point of analysis was ∼ 12 m. For this reason, we present model outputs up to a maximum depth of 12.5 m. If the maximum tracer concentration was below this depth, the inflow regime was categorized as underflow. For CE-QUAL-W2 and Delft3D, the closest cell to the station was selected, which represents the actual water depth at the monitoring site.

Water level and water storage
At the end of the simulation period, all three models presented a lower water level than the measured. The last mea-surement was 886.81 m a.s.l., and the closest value simulated was in Delft3D with 886.49 m a.s.l., which was also the one with the lowest error (MAE = 7.4 cm). GLM simulated a final water level of 886.44 m a.s.l., and CE-QUAL-W2 estimated 886.36 m a.s.l., with respective MAE values of 10.5 and 10.8 cm (Fig. S2a in the Supplement). Additional statistical metrics are presented in the Supplement (Table S2). The largest discrepancies in water level occurred when it raised over the spillway crest. GLM and Delft3D had water above the crest for a longer period than observed, and their levels kept being larger than the measurements until a sharp increase in October 2018, which none of the models reproduced. Total spillway discharge had its largest volume in CE-QUAL-W2: 2.93 × 10 7 m 3 , GLM had a spillway volume as the 2D model of 2.87×10 7 m 3 , and Delft3D simulated 3.7 % less spillway discharge than CE-QUAL-W2 (2.83 × 10 7 m 3 ) (Fig. S2b).
Evaporation values in all models were in the same order of magnitude but significantly different (one-way ANOVA test with p value = 5 × 10 −17 , with the null hypothesis that both pieces of data have the same mean). The 1D, 2D and 3D model estimated daily-mean evaporation rates were 2.9±1.3, 2.7±1.0 and 3.4±1.4 mm d −1 , respectively. Comparing the volumes due to evaporation with the reservoir volume (7.0 × 10 7 m 3 ), over the year GLM lost the equivalent of 12.0 % of the reservoir volume, CE-QUAL-W2 with the lowest evaporation rate lost 11.0 % and Delft3D lost 14.3 %.
6.2 Temperature 6.2.1 Vertical profile at the intake region From the measurements made with the thermistor chain, it was observed that the reservoir was thermally stratified at the beginning of the monitoring period (end of summer). The first autumn overturn took place in mid-April, but after a few days the reservoir became stratified again. These dynamics of mixing and stratification repeated several times throughout autumn and winter, characterizing a warm polymictic mixing regime (Lewis, 1983;Ishikawa et al., 2021a). Thermal stratification developed in spring and persisted throughout the summer.
The observed seasonal pattern of stratification and mixing was reproduced by all three models (Fig. 4). At the water surface, simulated temperatures were highly correlated with observations with comparable correlation coefficient (> 0.99) for all three models (Fig. 5b). The net surface heat fluxes simulated by the models were not statistically different (p value = 0.14, one-way ANOVA test with null hypothesis that the two groups have the same mean). Observed water surface temperature was 21.6 ± 3.5 • C (mean value and standard deviation) for the whole period. In the simulations, surface temperature was 22.2 ± 4.0 • C in GLM, 21.1 ± 3.7 • C in CE-QUAL-W2, and 22.4 ± 3.7 • C in Delft3D. With increasing depth, the error increased, and the correlation between measured and simulated temperatures decreased (Fig. 5). In the deepest layer, temperature was on average 19.1 ± 2.0 • C, while GLM, CE-QUAL-W2 and Delft3D simulated 18.6 ± 2.2, 18.2 ± 2.3 and 19.3 ± 2.3 • C, respectively. In the 2D model, errors increased rather continuously with increasing depth, showing maximum cRMSE of 0.6 • C at around 10.5 m depth. Meanwhile, GLM and Delft3D showed the largest errors around the middle of the lower half of the water column, GLM at a depth of 6.7 m with cRMSE of 0.94 • C, and Delft3D at around 8.6 m depth with an error of 0.6 • C.
The average water temperature in the GLM simulations was about 0.5 • C warmer at the surface and colder at the bottom, which lead to stronger thermal stratification than in the observations. CE-QUAL-W2 simulated lower temperatures at the surface and bottom (−0.5 and −0.9 • C, respectively), and Delft3D estimated warmer temperatures at surface and bottom (+0.8 and +0.2 • C, respectively).
According to the classification based on Schmidt stability, Passaúna Reservoir was mixed on 95 d out of 343 d of the monitoring period, with the longest continuous period of mixing from 8 May to 20 June 2018 (Fig. 6a). In GLM, stratification was generally more stable and the reservoir was classified as mixed only on 68 d. Periods with homogeneous temperature were shorter and discontinuous, and the last mixing event in early September was not resolved by the model. The simulated Schmidt stability was strongly correlated with those estimated from observations (Pearson's correlation coefficient with null hypothesis of no relationship, R = 0.98, p value = 0); however, it was overestimated by a factor of 1.64 on average (Fig. 6b). CE-QUAL-W2 provided the closest match of the number of mixed days with observations (95 d). Due to the lower simulated bottom temperature, the Schmidt stability was overestimated by a factor of 1.19 on average (Fig. 6c), but simulations and observations were highly correlated (R = 0.95, p value = 0). During the mixed season, the intermittent stratification was attenuated in the 2D model, and the mixed periods were slightly longer. Delft3D had the best correlation and a lower overestimation of Schmidt stability than GLM (R = 0.98, p value = 0, factor of 1.12), however the number of mixed days in the simulations was underestimated by about 25 % (72 mixed days) (Fig. 6c).
The upper mixed layer (UML) depths estimated from measurements were compared to UMLs estimated for each model, and all of them presented poor coefficient of determination (r 2 < 0.6) for linear regressions (Pearson's correlation coefficients with null hypothesis of no relationship are the following: for 1D R = 0.42, for 2D R = 0.33, and for 3D R = 0.68, and all had p value = 0 -see Fig. S3 in the Supplement). GLM and Delft3D presented rather thinner UMLs, whilst CE-QUAL-W2 had a larger variance, ranging between deeper and shallower UMLs.
The Taylor diagram presented in Fig. 7 was calculated for temperature simulations throughout the entire period, and all depths demonstrate that the three models had good correlations (> 0.95) and similar standard deviations of residuals (all models had a standard deviation lower than 0.5 • C for residuals of the difference between measured and simulated temperature). The cRMSE had the most significant differences between models, with Delft3D as the closest to observations (0.50 • C), followed by CE-QUAL-W2 (0.56 • C) and GLM (0.84 • C).

Longitudinal temperature variations
To compare temperature simulations along longitudinal cross-sections of the reservoir, CTD profile measurements were interpolated and compared with simulations of the 2D and the 3D model (Fig. 8). The models were capable of reproducing the different temperature distributions during the sampling dates. In February 2018 and 2019, the reservoir was stratified in the upstream region with a growing UML along its longitudinal axis. During the remaining surveys in August, November and December 2018, the reservoir showed different patterns of vertical stratification with only minor longitudinal variations. Similar to the mean temperature analyzed above, CE-QUAL-W2 had colder temperatures, while Delft3D had higher temperatures, and both had a comparable strength in stratification and were in agreement with the measurements.
As the 1D model considers horizontally averaged temperature, the temperature profiles measured along the longitudinal cross-section of the reservoir were averaged for the comparison with GLM simulations (Fig. 9). The simulations were generally in good agreement with the averaged temperature profile. The best agreement occurred in August 2018, when the reservoir was mixed. In November 2018 and February 2019, the low water temperature in the upstream part of the reservoir affected the surface temperature of the averaged profiles, leading to a greater difference compared to the simulations.
In addition, simulated temperatures from the 2D and 3D models were horizontally averaged and compared to the continuous temperature profile observed at the intake. The cRMSE values of both were very similar to the original comparison with the closest segment or cell to the monitoring station (Fig. S4 in the Supplement). The similarity with the original comparison can be explained by the fact that a large part of the reservoir was indeed homogeneous over the horizontal (approximately until 5000 m of distance from the dam) and had a comparable water depth as at the monitoring station.

Flow velocities
The total averaged horizontal flow velocity (the magnitude of the horizontal velocity components averaged in time and   Ishikawa et al. (2021a), flow velocities larger than 3.5 cm s −1 (∼ 90th percentile) were defined as currents. They were forced by wind and had the upper part of the water column flowing towards an opposite direction as the lower part (see Fig. 10a-d). The currents, and consequently the total averaged flow velocity, were significantly (p value = 1 × 10 −181 , with null hypotheses of both having a similar distribution) more frequent and more intense during stratified periods when compared to mixed periods. The same analysis, only considering the magnitude of the longitudinal component, was made for the 2D and 3D simulation results (Table 2).
For the total period and all depths, CE-QUAL-W2 had a cRMSE = 2.1 cm s −1 and a negative correlation coefficient with observations (−0.04, p value = 4×10 −40 , with null hypothesis of no relationship), while Delft3D had cRMSE = 1.7 cm s −1 and correlation coefficient of 0.50 (p value = 0). The two models had errors in the same order of magnitude, but the simulations of the 2D model had a lower standard deviation (0.8 cm s −1 ), while the 3D simulations had a standard deviation closer to the observed value (1.4 cm s −1 , being the observed 1.9 cm s −1 ). Both models showed the largest er-  rors at the surface, where the 3D model is closer to the observations than the 2D model (Fig. 11). In contrast to the temperature simulations, the simulated flow velocities had the smallest errors near the bottom. Despite the comparable magnitude of cRMSE of both models, the correlation between simulated and observed velocities differed remarkably ( Fig. 11b). While it was generally low (< 0.1) and fluctuated around zero along the water column for CE-QUAL-W2, it varied between 0.4 and 0.6 for Delft3D. The longitudinal velocities of CE-QUAL-W2 had a 90th percentile of 1.8 cm s −1 and a MAE of 1.7 cm s −1 , its total mean value was 0.8 cm s −1 , and it is possible for the wind in- Figure 7. Taylor diagram of the total simulated period of temperature profiles at the intake region. Green lines indicate centered rootmean-square error (cRMSE) isolines. Angular coordinate, in blue, represents the correlation coefficient (R). Standard deviation (black dashed line) is represented in radial coordinate with the reference measured data in center. Measured is the baseline where correlation is 1 and cRMSE is zero. The red dots represent model performance.
fluence on formation of currents to be observed (Fig. 10a, b, e and f). The occurrence of currents differed between mixed and stratified conditions and were statistically different (p value = 0, Kruskal-Wallis test with null hypothesis that the two groups are from the same distribution). Their relative occurrence was larger during mixed conditions, which is the opposite from that observed. In general the simulated flow velocities were lower than observed velocities. For Delft3D, MAE was 1.3 cm s −1 , and its longitudinal velocities were in general lower than observed with a total average of 1.1 cm s −1 and 90th percentile of 2.4 cm s −1 . As in the observations, the occurrence of currents was significantly different (p value = 2.98 × 10 −285 ) between mixed and stratified conditions. The simulated currents presented clear opposing directions of flow between upper and lower depths ( Fig. 10g  and h). In addition, their relative occurrence was within 1 % difference from the observed data.

Turbulence
Only Delft3D provided simulated energy dissipation rates (ε). The estimation of ε based on measurements is described in Ishikawa et al. (2021a), and due to the limited measurement range of the high-resolution mode of the ADCP, estimations were only made up to 8 m height above the bed.
Observed energy dissipation rates were basically the same during mixed and stratified conditions (respective log averages along depth and time: 7.5 and 5.5 × 10 −10 W kg −1 ). In Delft3D, ε was approximately 1 order of magnitude larger at mid-depth under mixed conditions. Log-averaged dissipation rates for the depth range with observations were 8.0×10 −10 W kg −1 under mixed and 1.0×10 −10 W kg −1 under stratified conditions ( Fig. 12a and b).
While simulations from Delft3D had log-averaged profiles with higher ε towards the bed, the estimations only had the same trend during the presence of currents (magnitude of longitudinal velocities > 90th percentile). The increase for the estimations started around 3 m above the bed; for simulations, the flow velocities under the current threshold also had ε increasing towards the bottom, and in the presence of currents ε started to be larger at around 6 m above the bed ( Fig. 12c and d).

Density currents and substance transport
The models had similar overall distributions until August 2018 (Fig. 13). The tracer transport changed from interflows to underflows (or to deeper interflows) after the first autumn overturn, with differences at the depth of daily-averaged maximum concentrations.
Underflows and interflows at greater depths were predominant in autumn and winter and overflows and interflows closer to the surface were more frequent in spring and summer. GLM predicted interflows with the maximum concentrations at shallower depths than the other models from March to mid-April. After this time, GLM and Delft3D simulations showed underflows more frequently, while CE-QUAL-W2 results had the interflows moved to deeper regions and presented less underflows when compared to the 1D and 3D models. Starting from August 2018, the maximum concentration of the tracer showed different patterns in each model. In the 1D simulations, underflows persisted until the middle of October, and after that interflows formed at ∼ 5 m depth. In the 2D simulations, the inflow formed interflows at a slightly deeper depth (∼ 7 m), while maximum tracer concentrations were widely scattered over the upper water column in the 3D simulations, with their lower bound following the 2D simulations (Fig. 13).
We calculated the relative frequency of occurrence of each flow path by assigning overflows when the maximum concentration was at the water surface (uppermost depth cell), underflows when it was at the bottom (lowest depth cell or deeper than 12 m for GLM) and interflows otherwise. For GLM, there were no overflows, while for most of the time underflows were observed (57.6 %) and interflows for the remaining time. For CE-QUAL-W2 and Delft3D, interflows were the most frequent, with 75.1 % for the 2D model and 67.1 % for the 3D model; overflows and underflows were almost equally distributed for Delft3D, with, respectively, 14.9 % and 18.0 %, and CE-QUAL-W2 simulated more frequent underflows (21.4 %) than overflows 3.5 %. Figure 8. Contour plots of temperature along a longitudinal cross-section of the reservoir from the forebay to the dam. Each column represents one sampling campaign; the first row shows the measurements (interpolated from vertical CTD profiles at locations marked by the dashed vertical lines). The second row shows the simulation results of CE-QUAL-W2 for which we show every segment of the grid and all the respective depth cells. The third row of panels shows simulation results from Delft3D; a way along the thalweg of the reservoir was drawn by selecting several grid cells, and the entire depth of each cell was used.

Water storage
Regarding the water balance, all models had errors of comparable magnitude: 1D, 2D and 3D models had errors in stored water volume of −4.4 %, −4.9 % and −3.7 %. The error in terms of water level was similar for all models (∼ 10 cm) and is in the range of errors reported in the literature (e.g., Dai et al., 2013;Jeznach et al., 2014;Chen et al., 2016;Bueche et al., 2020). GLM had a constant water level from January to May 2018, corresponding to the maximum level defined by the hypsographic curve due to the spillway crest elevation. CE-QUAL-W2 had lower water levels and a larger discharge  shows correlation coefficient (R) along depth for each model. over the spillway. Both models calculated the discharge by empirical equations. Delft3D was forced by the measured water level as an open boundary at the spillway location. For water level elevations higher than the bathymetry at the outflow cells, water will leave the domain. For water level elevations lower than the bathymetry at the outfall cells, no water should flow at all. However, as the bathymetry at the outflowing cells could not reproduce the spillway geometry, periods of water flowing into the domain were observed, which is obviously an artifact. Despite similar results in terms of storage volume, discharges through the spillway and evaporation differed among models, with Delft3D having the largest evaporation loss and CE-QUAL-W2 the lowest. The differences can be attributed to the differences in how the models describe and implement the boundary conditions. We cannot affirm which model is most precise, because the modeled processes (e.g., evaporation or flow over spillway) were not directly measured. In addition, measured data are also associated with uncertainties. There is an underestimation of peaks of inflow discharge from LARSIM-WT and poor data accuracy on outflows of the bottom outlet at the dam and the spillway discharge, which were important parameters that contributed for the discrepancies of the water balance of the reservoir. Thus, care has to be taken when defining boundary conditions in all models, and the first step should always be to check water balance and flows at the boundaries.

Temperature
All three models simulated the dynamics of temperature stratification in reasonable agreement with observations. Resulting errors are in the same order of magnitude as values reported in other model applications (e.g., Bruce et al., Figure 12. Solid lines show depth profiles of log-averaged energy dissipation rates (ε). Background shades mark the range between the 5th and 95th percentiles of the temporal variations. Panels (a) and (b) show estimated and simulated (Delft3D) dissipation rates separated between mixed and stratified conditions. Panels (c) and (d) show estimated and simulated dissipation rates divided in periods of longitudinal flow velocity magnitudes exceeding (>) or being smaller (<) than their 90th percentile. 2018; Weber et al., 2017;Kobler et al., 2018;Mi et al., 2020;Chanudet et al., 2012;Dissanayake et al., 2019).
Stratification is mostly driven by heat exchange associated with absorption of solar radiation, net longwave radiation, evaporation, precipitation and sensible heat transfer at the water-air interface. Despite the different parameterizations of the surface heat flux, which also included downwelling longwave radiation that was not measured, and the use of different coefficients (see Table 1); daily averages of net surface heat flux of the three models had no significant difference among each other. Other processes that influence the temperature stratification are inflows, surface runoff, groundwater inflow and heat exchange with the sediment (Wetzel, 2001). Out of those, only river inflow temperatures were known and implemented in all models. The others were considered to be negligible. Regarding the heat exchange with sediment, Stepanenko et al. (2013) showed that it did not have significant influence on simulations of bottom water temperature of a shallow lake for a comparable temperature range as observed in Passaúna Reservoir. Even though the errors were of comparable magnitude, the model simulations of thermal stability differed among the models, with underestimated temperature in the 2D simulations and overestimated temperature in the 3D simulations. We assume that the differences were related to model dimensionality and the parameterization of vertical mixing and inflows.
The 1D model imposes the largest simplification by neglecting horizontal variations in flow and water temperature, even though an initial mixing of inflows due to entrainment is parameterized in the model. It is important to point out that the continuous temperature measurements at a single sampling location are not ideal for comparison with 1D simulations, and spatially averaged temperature profiles are more representative for the horizontally homogenous conditions assumed by the 1D model. The lower temperatures in the deeper layers can be explained by the cold inflow temperatures combined with the inability of reproducing the enhanced heat exchange of the inflowing water with the atmosphere in the shallow forebay. In addition, the surface temperature was overestimated by GLM, which explains the larger number of stratified days and increased thermal stability. Another factor that can potentially contribute to errors is the selection of the first 12 m in depth for comparison with measurements, thus excluding the bottom boundary layer of the 1D vertical grid. On the other hand, the 2D and 3D models had a better representation of the strength of vertical density stratification, despite having overall divergent results -respectively colder and warmer temperatures than measured. These differences can be at least partially explained by the calibration process, which becomes more difficult for increased dimensionality. Due to the short computational time required for 1D models, it is possible to per- form repeated runs with varying calibration parameters (e.g., light extinction coefficient) and to improve agreement with data during model calibration. Moreover, tools for automated model calibration are available (Bueche et al., 2020).
CE-QUAL-W2 estimated 92 d with mixed conditions, which is in closest agreement with the observations (95 d), but their temporal dynamics differed from observations especially during winter. The intermittency between mixing and stratification was more frequent; mixed periods were longer from June to August and the last mixing event shorter. In the Delft3D simulations, the shorter mixed periods are missing, which reduced the total duration of mixed conditions to 78 % of the observed.

Longitudinal processes
Both 2D and 3D simulations of temperature distributions along the longitudinal cross-section of the reservoir were in good agreement with observed temperature distributions. The 1D simulations reproduced the horizontally averaged temperature distributions, with better results when the reservoir was relatively homogeneous. The close agreement between the 2D and 3D simulations with measurements indicates that transversal gradients are of minor importance for the stratification in Passaúna Reservoir. This is further sup-ported by the good agreement between both models for the simulated tracer transport from the river inflow to the water intake station. Nevertheless, the tracer analysis showed how the differences in temperature simulated by each model also affected the inflow pathways.
These results highlight the advantages of CE-QUAL-W2 and Delft3D, as they are capable of representing the observed longitudinal gradient, especially considering the inflow region, where colder river water flows downwards as an underflow. This underflow is responsible for transporting not only cold water but also dissolved nutrients and suspended sediment over long distances into the deeper parts of the reservoir. Those dynamics are represented in GLM only in a very simplified manner, explaining the weaker performance of GLM with respect to water temperatures at higher depths (Fig. 5).
Tracer dynamics observed with GLM complies with the hypothesis that the lower temperature at the bottom was caused by inflow pathways of mostly underflows because of the absence of the forebay. Fenocchi et al. (2017) demonstrated that, in order to reproduce the thermal response to inflows in a subalpine lake with GLM, it was necessary to use an impractical coefficient of light extinction. The general colder temperatures simulated by CE-QUAL-W2 placed the inflow at deeper depths and confined them in layers. This behavior was observed because the longitudinal flows were located below the UML, thus showing the higher concentrations of tracer especially after September (Fig. 13). For Delft3D, the opposite was observed: the density currents were within the UML, which diluted the tracer concentration, and the depth of its maximum was strongly variable over the last 6 months. The travel times of the tracer was evaluated by identifying the first time that the tracer concentration was larger than 10 −3 kg m −3 at the intake after the release of the tracer at Passaúna River. The transport in CE-QUAL-W2 was faster with travel times of 2.2 and 3.5 d in Delft3D, which can be associated with the higher tracer concentrations of the 2D model. This information is important for management of reservoirs during spilling accidents (e.g., Jeznach et al., 2014); for GLM it is not possible to estimate time travel, since inflows are directly placed at defined layers. Studies assessing the inflow pathways through modeling demonstrated a good agreement between simulations and observations, e.g., Marti et al. (2011) and Zamani et al. (2020) with 3D models and Jeznach et al. (2014) with CE-QUAL-W2.
Despite similar seasonality of stratification and mixing predicted by the three models, the tracer analysis demonstrated how the spatial simplification affects the transport of substances. This was clear especially for the 1D model that differed considerably from the 2D and 3D models. CE-QUAL-W2 and Delft3D presented similar seasonal patterns for density currents, which influenced the stratification in the reservoir mainly by underflows that added a layer of colder water at the bottom and were strongly present in the simulations during winter. Ishikawa et al. (2021a) analyzed the distribution of density currents, categorized as underflows, interflows and overflows, based on the comparison of the measured temperature between the forebay region and the main reservoir (only available for the first half of the total period) with the temperature profile at the intake. Overflows were assigned when the forebay temperature was larger than the surface of the measured profile, and underflows were assigned when lower than the bottom; otherwise, they were interflows. A similar classification of density currents was made using the location of the maximum tracer concentration of the models (Fig. S4). Processes such as entrainment, mixing, diffusion and dilution of inflow are neglected in this approach; therefore, overflows were frequent in March and April 2018, while of minor importance in the simulations. However, the underflow season that was expected in the prior analysis was at some extension reproduced by CE-QUAL-W2 and Delft3D. It was mainly present during the winter, with shifts of 1 month among analyses made based on observations and simulations, matching with the period where the process is relevant for stratification.
For the second half of the simulated annual cycle, the flow paths of 2D and 3D models differed the most with respect to the occurrence of overflows, which is explained by the location of the density currents in relation to the UML depth (Figs. 13 and S4). All models had poor results for UML depth estimated from measurements. For this reason, the 2D model presented a smoothed error along the depth (Fig. 5), while the 1D and 3D models simulated consistently shallower UML depths, causing the peak in the error and correlation profiles. The predictions of thickness of the mixed layer and the slope of the thermoclines are generally challenging for models, as reported for other model applications (Perroud et al., 2009;Huang et al., 2010).

Flow velocities and vertical mixing
Simulation of flow velocities showed less agreement with measurements than temperature, although errors were in the same range of other work with Delft3D (Chanudet et al., 2012;Dissanayake et al., 2019). The magnitudes of simulated longitudinal flow velocities were generally lower than observations, but Delft3D was capable of reproducing the overall characteristic of larger magnitudes of longitudinal flow velocities during stratification and larger relative occurrence of currents (> 90th percentile), while flow velocities simulated by CE-QUAL-W2 showed no agreement in magnitude or dynamics with observations. For a fair comparison with the 2D model, laterally averaged flow velocity observations would be required, which are not accessible from longer-term observations. However, the transversal flow velocity component of observations and simulations of Delft3D were disregarded in the comparison (0.96 ± 0.98 and 0.54 ± 0.64 cm s −1 , respectively). The ratios of mean transversal and longitudinal velocity components are 0.7 for measurements and 0.5 for Delft3D, which indicates that potential transversal flow processes are not resolved by CE-QUAL-W2.
The poor results regarding magnitude and direction of flow velocities in both models can be associated with the fact that flow velocities in Passaúna Reservoir were generally small, internal seiches were not observed and circulation patterns are absent. In studies where those properties are relevant, better agreement between observations and 3D simulations were reported (Huang et al., 2010;Chanudet et al., 2012;Dissanayake et al., 2019), and a simulation in Lake Erie with CE-QUAL-W2 reproduced the oscillation frequency of basin-scale seiches but not their amplitudes (Boegman et al., 2001). Further, the direction of flow velocities is highly affected by the inaccuracy of wind direction measurements, and weak wind intensities are expected to have lower correlation with flow velocities, since they potentially change direction more frequently (Dissanayake et al., 2019), which is the case in Passaúna. Moreover, the correlation between observed and simulated flow velocities increased towards the bottom, probably because the bottom is a better represented boundary for the process, whereas it is the opposite is the case for temperature.
The turbulent closure models are relevant for the vertical mixing, thus the different approaches implemented in the models can be considered to contribute to the variation among simulated UML thicknesses. GLM is the one with the thinner UML and poorest correlation; its simplification in dimensionality and the structure of the model in mixed layers can be the cause for lower mixing. CE-QUAL-W2 neglects transversal flows, which may contribute to the shear profile and cause errors in turbulence production. Lastly, Delft3D resolved all three dimensions, but dissipation rates simulated by the model differ from those estimated from measurements of flow velocities. The log-averaged profiles of dissipation rates computed by the 3D model follow an expected profile with a turbulent surface boundary layer, a low energetic interior and increasing dissipation rates towards the bottom (Wüest and Lorke, 2003). Only few studies in the literature reported comparisons of measured and simulated dissipation rates. In general they presented good agreement, but all of them were performed in high energetic environments such as ocean regions with the presence of breaking waves, near to the surface, and large lakes (Stips et al., 2005;Jones and Monismith, 2008;Paskyabi and Fer, 2014;Moghimi et al., 2016). In spite of all uncertainties regarding the estimated dissipation rates, in our study the results from the 3D simulations revealed that the model has limitations in reproducing all processes contributing to energy dissipation in a mediumsized reservoir.

Conclusions
Three commonly used hydrodynamic models with different dimensionalities were applied to a subtropical reservoir using identical boundary conditions. The simulation results were compared to measurements covering a complete annual cycle. All models were capable of providing valuable information about the water balance and reproduced the overall pattern of seasonal thermal stratification and mixing. Flow velocities, only available from the 2D and 3D models, were more challenging to reproduce, particularly because of low flow velocities and the lack of large-scale circulation pattern in the reservoir. In terms of the mean absolute error for water level, temperature and flow velocity, the three models had a maximum variation among each other of 30 %, but the time required to run the simulations increased by nearly 5 orders of magnitude, from 5 s with GLM, 3.7 min with CE-QUAL-W2 and 3.5 d with Delft3D (Table 1). Passaúna is a mediumsized reservoir, so for larger systems computational time can turn into a more constraining factor. Nevertheless, each model has its advantages and limitations, and their application should be chosen in accordance with the parameters to investigate. 1D.
-Water balance and water level are fundamental for the management of reservoirs.
-Seasonal operations that depend on stratification are important, e.g., selecting intake and outflow depths that will have better results depending on the mixing condition (Weber et al., 2017).
-The good trade-off between computational costs and provided accuracy in simulating seasonal thermal stratification and vertical mixing is attractive, and 1D models have been increasingly employed in larger-scale studies including a large number of water bodies (e.g., Read et al., 2014;Woolway and Merchant, 2019). 2D.
-Assisting in actions and time response regarding substance transport are important, e.g., in case of contamination of the principal inflow (Jeznach et al., 2014). However, specific determination of layers containing the density currents is uncertain, and it will depend on initial mixing of inflow and depths of UML and thermocline, which none of the models could reproduce with precision, although the 3D model had better correlation.
-Seasonal pattern of the density currents is important. 3D.
-Field of flow velocities is important. Although Passaúna Reservoir had low kinetic energy, the 3D model presented positive correlation with measurements. In addition, wind speeds were low and were measured a few kilometers apart from the monitored site, where it could have different directions and could reduce the agreement between observation and simulation. Flow velocities can be important for processes that depend on circulation patterns, e.g., the transport of nutrients that are related to algae blooms (León et al., 2005;Chung et al., 2014).
Challenges faced by all models were the water balance and the UML thickness. The first was rather because of the poor monitoring; thus, it is of paramount importance to have good measurements of the volumetric discharge of inflows and outflows, including engineering structures such as spillways. Otherwise it is difficult to identify sources of errors related to the models themselves. The thickness of the mixed layer can have large effects on subsequent simulations of water quality. The categorization of density currents as overflows, interflows or underflows depends on that and will have a direct impact on the fate of nutrients and organic matter inside the reservoir (Rueda et al., 2007). Similarly, the dynamics and vertical distribution of dissipation rates of turbulent kinetic energy could not be reproduced. This quantity can be relevant not only for hydrodynamic applications but also for the prediction of air-water gas exchange (Katul and Liu, 2017), sediment-water fluxes (Lorke and Peeters, 2006;Grant and Marusic, 2011) or the development of algal blooms (Aparicio Medrano et al., 2013). By taking these general and modelspecific limitations into consideration, models are valuable tools not only for managing water resources but also for scientific applications (e.g., Sabrekov et al., 2017;Mi et al., 2020;de Carvalho Bueno et al., 2021).
Data availability. The current versions of the models used in this study are available at their respective websites.
Author contributions. MI contributed to the conceptualization of the article, application of Delft3D, formal analysis, and investigation. WG was responsible for finishing the setup of Delft3D, adjustment of grids, bathymetry, and analysis of water balance. OG made the first setup of Delft3D and the application of CE-QUAL-W2. GS and JAR were responsible for the application of GLM. TB was the supervisor of GS and OG master theses and also advised in all models and concepts. MM was the co-supervisor of GS and advisor for the article concept. AL was the supervisor of MI PhD thesis and advisor for the article concept. Writing of the original draft was performed by MI, WG, JAR, GS, and OG. AL, TB, and MM also contributed to writing, reviewing, and editing the article.
Competing interests. The contact author has declared that neither they nor their co-authors have any competing interests.
Disclaimer. Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.