Comment on gmd-2021-64
Anonymous Referee #1
Referee comment on "Modeling reservoir surface temperatures for regional and global climate models: a multi-model study on the inflow and level variation effects"

L 20-21: “Our results highlight that surface water temperatures in reservoirs differ significantly from those found in lakes” is slightly misleading as you did not investigate any lakes. I suggest you reword the sentence to reflect your actual findings, and from that make inferences about lakes if appropriate.

atmosphere interactions in numerical weather prediction, but it also has important implications for limnology. I think the topic is novel and important, as the effect of heat flux through water advection is generally neglected both in NWP and in limnological investigations on, for instance, climate change. Reservoirs are seldom specifically studied, yet they have considerably different characteristics compared to lakes, especially their riverine character and the importance of hydrology. The study showed that inflows and outflows indeed have an effect on surface water temperature, particularly in reservoirs with a retention time below 100 days. This should apply to a very large number of reservoirs, which generally have shorter retention times than lakes. A particular strength of the study is its ensemble approach of using one- and two-dimensional process-based models of varying complexity as well as a machine learning model, applied to a considerable number (24) of Portuguese reservoirs spanning a range of sizes and retention times. This enabled a detailed and robust analysis of the influence of retention time. The model intercomparison is useful in numerical weather prediction because the high computational efficiency required means that model complexity and input data resolution need to be examined and optimized. I think this paper will make a valuable contribution to the field.
I have a couple of issues with the study. Firstly, I would avoid the comparison of reservoirs with lakes. Lakes have inflows and outflows just as reservoirs do, and some lakes also have very short retention times. Furthermore, only reservoirs and no lakes were investigated in the study. An alternative designation for the baseline scenarios may be just W2 instead of W2-Lake, and W2-hydrology instead of W2-reservoir, but this is just an idea.
Secondly, the main difference between reservoirs and lakes in my opinion is not the presence/absence of inflows and outflows (though reservoirs are typically dominated by hydrology as riverine systems), but the fact that most reservoirs have deep water outlets, whereas lakes discharge from the surface. This can have profound effects on the heat budget in stratified waters because of potentially large differences between surface (e.g. 25 degrees C) and bottom water temperature (e.g. 10 degrees C or potentially lower). Deep water withdrawal tends to shrink the hypolimnion, expand the surface mixed layer, decrease vertical temperature gradients, shorten stratification duration, and increase the heat content of reservoirs in comparison to lakes with surface water outlets. This aspect was not addressed in the paper but would significantly influence surface water temperature. I recommend that the authors provide information on whether and how this was accounted for, how the outflows were represented in the model, and what the potential effect on the results would be.
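The effect of outlet depth on the heat budget can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the manuscript: the function name, the discharge, the surface area, and the inflow temperature are hypothetical values chosen for illustration, and only the 25 and 10 degrees C withdrawal temperatures come from the example above. It assumes a steady water balance (inflow equals outflow) and constant water density and heat capacity.

```python
RHO_WATER = 1000.0  # water density in kg/m^3 (assumed constant)
CP_WATER = 4186.0   # specific heat capacity of water in J/(kg K) (assumed constant)


def advective_heat_flux(discharge, inflow_temp, outflow_temp, surface_area):
    """Net advective heat flux per unit surface area in W/m^2.

    Assumes inflow equals outflow (discharge in m^3/s), temperatures in
    degrees C, and surface area in m^2. Positive values mean the
    reservoir gains heat through water advection.
    """
    heat_rate = RHO_WATER * CP_WATER * discharge * (inflow_temp - outflow_temp)
    return heat_rate / surface_area


# Hypothetical stratified reservoir: 10 m^3/s through-flow, 5 km^2 surface,
# inflow at 18 degrees C. None of these numbers come from the manuscript.
discharge = 10.0
surface_area = 5.0e6
inflow_temp = 18.0

# Same reservoir, two outlet configurations.
surface_outlet = advective_heat_flux(discharge, inflow_temp, 25.0, surface_area)
deep_outlet = advective_heat_flux(discharge, inflow_temp, 10.0, surface_area)

print(f"surface withdrawal: {surface_outlet:.1f} W/m^2")
print(f"deep withdrawal:    {deep_outlet:.1f} W/m^2")
```

Under these assumed numbers the advective term changes sign between the two configurations: surface withdrawal exports heat, whereas deep withdrawal leaves a net heat gain, which is the mechanism described above.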
Thirdly, I found the description of the scenarios somewhat confusing and I have some questions. Could you please describe the rationale behind the scenarios in more detail and perhaps early on (e.g. the baseline scenario is actually four scenarios: hourly and daily forcing, each with and without hydrology). Please also clearly indicate whether the 1D model scenarios considered inflows and outflows or not (I believe it was mentioned in the introduction, but it should be clear in the methods too). Consider adding a table describing the different scenarios and comparisons. You may also consider removing Table 2, as its content can easily be summarized in the text.
Finally, the model errors were assessed against the observed data (correct in my opinion), but in other analyses model errors were assessed against the baseline scenario W2-Reservoir, as far as I can ascertain. This can be problematic because each model has its own characteristics and does not per se represent the "truth", especially if some parameters are highly calibrated, as is the case for the baseline scenarios here. For instance, if the extinction coefficient was calibrated in the baseline scenario with the 2D model but not in the 1D models, or if different parameter values were used, are the results comparable and how meaningful are the model errors? I think it may make more sense to always calculate the model error against the observed values, and then to assess the relative changes in error in the different scenarios, e.g. "not accounting for inflows increased the model error by X". If the model error is used mainly as an analytical variable, please describe the rationale clearly.
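The suggested error assessment can be sketched as follows. This is only an illustration of the idea, not the authors' procedure: the function names are hypothetical, the temperature values are made up, and RMSE is assumed as the error metric.

```python
import math


def rmse(simulated, observed):
    """Root-mean-square error between two equally long, non-empty sequences."""
    if len(simulated) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(s - o) ** 2 for s, o in zip(simulated, observed)]
    return math.sqrt(sum(squared) / len(squared))


def relative_error_change(scenario_rmse, reference_rmse):
    """Relative change in RMSE of a scenario with respect to a reference run."""
    return (scenario_rmse - reference_rmse) / reference_rmse


# Hypothetical surface water temperatures in degrees C (not from the manuscript).
observed = [12.0, 15.0, 19.0, 23.0, 21.0, 16.0]
with_hydrology = [12.5, 15.5, 18.5, 22.5, 21.5, 16.5]
without_hydrology = [13.0, 16.0, 20.0, 24.0, 22.0, 17.0]

# Both scenarios are scored against the observations, never against each other.
rmse_with = rmse(with_hydrology, observed)
rmse_without = rmse(without_hydrology, observed)
change = relative_error_change(rmse_without, rmse_with)

print(f"RMSE with inflows/outflows:    {rmse_with:.2f} degrees C")
print(f"RMSE without inflows/outflows: {rmse_without:.2f} degrees C")
print(f"Relative increase in error:    {change:.0%}")
```

Scoring every scenario against the same observations keeps the errors comparable across models with different calibration states, and the relative change then isolates the contribution of the process that was switched off.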

Introduction
The aims don't seem to include a comparison of the effect of in and outflows.
L 89-91: This sentence is a little confusing. Try: Maximum daily mean air temperature ranges from 13 °C in the central highlands to 25 °C in the southeast region. The minimum daily mean air temperature ranges from 5 °C in the northern and central regions to 18 °C in the south.
L118-120: Did you use single linear regressions between air and water temperature? There is typically a seasonal hysteresis in the air-water temperatures that varies in strength depending on some factors (e.g. Q), so that you would probably get a better estimate of inflow water temperature using a model like air2stream (https://github.com/marcotoffolon/air2stream, see the references there). I am not suggesting you do this and repeat all the simulations and analyses, but could you please comment on this effect?
L117: Could you please describe what the baseline scenario is? It would be helpful to explain the rationale of the scenarios early on and in a little more detail.
L131-134: Why did you not use the solar radiation and cloud cover from ERA-Interim?
Section 3.2: Could you please describe how you parameterized the basin morphology for W2 simulations, especially the data sources, the horizontal and vertical grid resolution, and the outlet heights?
L 143-145: Please describe the calibration procedure. Which parameters were calibrated in which ranges? Was calibration automatic? Which extinction values did you use for the uncalibrated models?
L 174: Change to "Two different temporal resolutions …"
L 212: The sentence doesn't convey what the Ultimate algorithm is used for. I suggest rewording it.

Results:
L 293-296: The parameter ranges appear to be quite large and the calibration resulted in especially low values of the wind factor and extinction. The average extinction value of 0.38 1/m is typical of reasonably deep oligotrophic lakes, but several reservoirs are quite shallow and I would expect some considerably higher values. I am wondering if the low wind factors were compensating for the low extinction values? Were these calibrated values used for the other models, or which parameterisations were used?
L302: Change 'major' to 'high' or 'highest'.
L319: can this result also be attributed to the fact that W2 and ANN were fitted to the data and the 1D models were not?