This work is distributed under the Creative Commons Attribution 4.0 License.
GEMS v1.0: Generalizable empirical model of snow accumulation and melt based on daily snow mass changes in response to climate and topographic drivers
Atabek Umirbekov
Richard Essery
Daniel Müller
Abstract. Snow modeling is often hampered by the availability of input and calibration data, which can affect the choice of model complexity and its transferability. To address the trade-off between model parsimony and transferability, we present the Generalizable Empirical Model of Snow Accumulation and Melt (GEMS), a machine learning-based model that requires only daily precipitation, temperature or its daily diurnal cycle, and basic topographic features to simulate snow water equivalent. The model embeds a Support Vector Regression pretrained on a large dataset of daily observations from a diverse set of Snowpack Telemetry Network (SNOTEL) stations in the United States. GEMS does not require any user calibration, except for the option to adjust the temperature threshold for rain-snow partitioning, though the model achieves robust simulation results with the default value. We validated the model with long-term daily observations from numerous independent SNOTEL stations not included in the training set and with data from reference stations of the Earth System Model-Snow Model Intercomparison Project. We demonstrate how the model advances large-scale SWE modelling in regions with complex terrain that lack in-situ snow mass observations for calibration, such as the Pamir and Andes, by assessing the model's ability to reproduce daily snow cover dynamics. Future model development should consider the effects of vegetation, improve simulation accuracy for shallow snow in warm locations at lower elevations, and address wind-induced snow redistribution. Overall, GEMS provides a new approach for snow modeling that can be useful for hydro-climatic research and operational monitoring in regions where in-situ snow observations are scarce.
Status: final response (author comments only)
RC1: 'Comment on gmd-2023-103', Matthieu Lafaysse, 10 Aug 2023
General comments
Umirbekov et al. present a new machine learning approach to simulate snow mass with parsimonious data input and an extremely low numerical cost. The evaluation framework is really interesting as it includes independent data removed from the calibration dataset, but also the state-of-the-art ESM-SnowMIP dataset including challenging climate and environment conditions beyond those of the calibration dataset, and finally a spatialized application with more uncertain forcing data and evaluation data derived from remote sensing. Of course, the potential of machine learning has to be considered in snow modelling and I think this paper can be a significant contribution to that topic. The results clearly challenge physical models, even if obviously the output variables are not sufficient for all applications.
Nevertheless, I think the description of methods and results is sometimes a bit too fast in the current version of the manuscript and that some details are missing for an accurate understanding and interpretation of results. In general, figures are not really introduced in the main text. I would also have expected more in-depth discussion of the advantages and disadvantages of this approach compared to physical approaches and other machine learning approaches in the light of the presented results and previous literature, and also discussion of the possibility to disentangle errors due to the forcing from those due to the algorithm itself. Maybe the chosen structure of the paper, which mixes results description and results discussion, is partly responsible for this sometimes incomplete discussion. Finally, the choice to try to recalibrate the Ts parameter is sometimes confusing, especially when it is done on evaluation datasets, as it leads to unrealistic values and overcalibration.
I also have some specific comments or questions below that can probably be addressed rather easily by the authors during the revision process.
Detailed comments
Section 2.1 The choice of SVR relative to other machine learning algorithms is not discussed. I would suggest adding a quick summary of its advantages and disadvantages compared to the most common algorithms in the literature (random forests, convolutional neural networks, simpler regressions, etc.).
Can you define i, j, N, xi, xj, and X more explicitly?
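For reference, in the usual RBF-kernel SVR formulation, xi and xj denote the predictor vectors of samples i and j among the N rows of the training matrix X, with K(xi, xj) = exp(-gamma * ||xi - xj||^2) and C the cost parameter. A minimal sketch (using scikit-learn on synthetic data as a stand-in; this is my illustration, not the authors' implementation):

```python
import numpy as np
from sklearn.svm import SVR

# X holds N samples (rows) of predictor vectors x_1 .. x_N;
# the RBF kernel K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2) is
# evaluated between every pair of samples i, j during training.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                  # N = 200 samples, 5 predictors
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

model = SVR(kernel="rbf", C=1.0, gamma=0.1, epsilon=0.01).fit(X, y)
print(model.support_.size)                     # number of support vectors kept
```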
I understand from Fig. 1 and Eq. 2 that when temperature is below the -1°C threshold and precipitation is zero, then dSWE is always equal to 0. Is that correct? How often does this assumption fail in the training or evaluation dataset? Does this imply an intrinsic limitation of GEMS for transferability on steep slopes, where the surface energy balance can be positive even at negative temperatures? (I think it does.)
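To make my reading explicit (a hypothetical sketch; the function and its return convention are mine, not the authors'):

```python
def dswe_gate(temp_c, precip_mm, ts=-1.0):
    """My reading of Fig. 1 / Eq. 2: below the Ts threshold with zero
    precipitation, dSWE is forced to 0, which would exclude melt driven
    by a positive surface energy balance at negative air temperatures
    (e.g. on steep sun-exposed slopes)."""
    if temp_c < ts and precip_mm == 0.0:
        return 0.0          # no accumulation and no melt possible
    return None             # otherwise dSWE would come from the SVR (omitted)

print(dswe_gate(-5.0, 0.0))  # 0.0, regardless of radiation or slope
```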
Section 2.2
The authors say they "fine-tuned the hyperparameters so that the model produces similar levels of accuracy when applied to observations from the same stations for 2019 and 2020." I understand the general idea but the detailed procedure is not accurately described. Can you describe the detailed protocol for this "fine-tuning"?
As solid precipitation measurements are prone to large measurement errors and precipitation is one of the main predictors of the model, I would have expected more details about the precipitation gauges used in the SNOTEL network, the procedures applied to account for undercatch, and if possible the estimated uncertainties.
Section 3 I think "Model evaluation" would be a more appropriate title than "Model validation", as a model can never be considered fully validated.
The authors say "we excluded stations that exhibit precipitation undercatch, which we formulate as when SWE accumulated by March is greater than the accumulated precipitation during October to March." I would expect all stations to be affected by precipitation undercatch and total SWE to always be higher than raw precipitation measurements. Do you apply a specific threshold to only eliminate major undercatch? Or do you use precipitation time series that are already corrected for undercatch following WMO recommendations? My misunderstanding is probably linked to the lack of details in Section 2.2, as previously mentioned.
Then, was this selection procedure also applied to the training dataset? If not, why?
Section 3.1
L193 I would suggest to start by a sentence presenting the Figure before providing its interpretation.
In Figure 4, "actual" should be replaced by "observed". Is there a reason to present the simulations on the X axis and not on the Y axis (which would be more common for a scatter plot)?
In Figure 5, it is not immediately clear what is represented, because the caption is not self-sufficient and the description in the text is also too vague. The definition of TAVG should be recalled in the caption. Then, what does a single point represent? A station and a date? Moreover, this solid fraction of precipitation does not really appear in the model description, neither in Figure 1 nor in the Equations, so it is difficult to understand how this diagnostic is obtained from the provided model description. The reason for providing this Figure is also unclear, as these outputs are not really used in the end: a fixed temperature threshold finally replaces the values obtained by the algorithm. This needs to be clarified.
As for the other Figures, quickly introducing Figure 6 would be helpful before providing the results analysis. In the description of the results of Figure 6, detailed references to the subplots would help to follow the results description.
Isn't the maxSWE score more representative of the quality of the input precipitation than of the skill of the SVR model?
L252-254 If removing stations with incorrect measurements is understandable, removing stations with snow drift should be avoided: snow drift is not a measurement error, it is a natural process, challenging to reproduce with physical models and maybe also with machine learning models, but the general ability or inability of any model to reproduce snow conditions should account for places where snow drift happens.
L255-256 You mean that an overcalibration is obtained due to error compensation between snow drift and the rain-snow transition? Could the sentence be clearer?
Section 3.3
Again, an introduction of Figure 8 in the text would be useful.
L266-267 It is not obvious which value of NSE should be considered "acceptable". Indeed, NSE is easily high when dealing with variables with a strong seasonal cycle. What would be the NSE value of the daily interannual mean of observed SWE? Is the 0.7 value at Sapporo better than such a reference score?
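To illustrate what I mean by a reference score (a sketch with synthetic data, not Sapporo observations; the numbers are only indicative):

```python
import numpy as np

def nse(sim, obs):
    """Nash-Sutcliffe efficiency of simulation sim against observations obs."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

# Ten synthetic years of a strongly seasonal SWE cycle plus interannual noise.
rng = np.random.default_rng(1)
days = np.arange(365)
cycle = np.maximum(0.0, 300.0 * np.sin(np.pi * days / 365.0))
obs = np.maximum(0.0, cycle + rng.normal(scale=30.0, size=(10, 365)))

# "Forecast" each day with the daily interannual mean of the observations:
climatology = np.tile(obs.mean(axis=0), 10)
print(nse(climatology, obs.ravel()))   # high although no model is used at all
```

With such a strong seasonal cycle, the climatological "forecast" alone already scores well above 0.7, which is why a fixed NSE threshold is hard to interpret as "acceptable".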
L269-270 This could be moved to the Method section
L274-280 As it was already noticed with the SNOTEL dataset that local calibration of the Ts threshold leads to severe error compensations, and as the purpose of applying the GEMS system to the ESM-SnowMIP dataset is to assess its spatial transferability beyond its training dataset, I am not really convinced of the interest of trying again to recalibrate this threshold locally on each ESM-SnowMIP site. The conclusions that this again leads to overcalibration and error compensations were rather expected, so I would suggest removing this analysis.
Section 3.4
Again an introduction of Figure 9 is missing.
My feeling is that the level of discussion in this section is not as advanced as for the evaluation on the ESM-SnowMIP sites. How does this skill in terms of snow cover extent compare with physical models?
Section 4.1
L312 Reference error.
L315 Could the relatively low contribution of the heat-insolation index possibly be explained by an insufficient variability of this predictor in the training dataset? In mountainous areas, shadows and slope inclinations are major factors explaining melting. But I assume that all observations correspond to flat areas, and maybe the variability of shadows in the SNOTEL network is not representative of the variability of topographic conditions in mountains either. This is important to discuss, as it could limit the possibility to apply this algorithm in areas with complex topography.
Section 4.2
I am wondering how much this conclusion is affected by the choice of NSE to quantify errors. Indeed, as this score is highly influenced by the existence of a seasonal cycle, it is rather normal to get better scores on deep snowpacks that exhibit a very strong seasonality than on sites with more intermittent snow cover. Considering other scores (for instance a Root Mean Square Error), I would not be surprised if the ranking of the stations with the poorest performance were reversed. Can you comment on that topic?
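A toy example of the effect I describe (synthetic data; the site characteristics and numbers are mine, chosen only to illustrate the point):

```python
import numpy as np

def nse(sim, obs):
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def rmse(sim, obs):
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

days = np.arange(365)
deep = np.maximum(0.0, 500.0 * np.sin(np.pi * days / 365.0))          # strong seasonal cycle
shallow = np.maximum(0.0, 40.0 * np.sin(4.0 * np.pi * days / 365.0))  # intermittent cover

rng = np.random.default_rng(2)
sim_deep = deep + rng.normal(scale=60.0, size=365)        # large absolute errors
sim_shallow = shallow + rng.normal(scale=15.0, size=365)  # small absolute errors

print(nse(sim_deep, deep) > nse(sim_shallow, shallow))    # deep site wins on NSE
print(rmse(sim_deep, deep) > rmse(sim_shallow, shallow))  # yet its RMSE is larger
```

The deep-snowpack site scores better on NSE despite four times larger absolute errors, so the ranking of sites can indeed reverse between the two metrics.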
L375 The authors say that "GEMS also addresses the equifinality issue that is pertinent to hydrological and snow modelling", but the only parameter they have introduced (the Ts threshold) clearly raises a very strong equifinality, resulting in possible overcalibration to compensate various possible errors including snow drift, precipitation undercatch, etc.
L388 "GEMS can, for instance, provide information for the parameterization of physics-based models, e.g. precipitation phase partitioning and its elevational dependence". I don't see how the results presented here suggest this conclusion, and considering the strong risk of overcalibration of this Ts value (leading to clearly unrealistic values below -5°C), I am not convinced at this point that GEMS could help me discriminate between snow and rain.
There is a Section 5.1 but no Section 5.2. Maybe a subtitle for the first part of Section 5 is missing.
L393-400 The authors discuss the limitations of their approach with respect to forest areas, but they seem to have intentionally removed the 3 forest sites of the ESM-SnowMIP dataset from their evaluations. This should at least be discussed if there is a valid reason for it. But even if the model skill is lower on the 3 Canadian forest sites, I would have included these sites in the evaluations to provide concrete results to support this discussion.
L408-410 Unfortunately, blowing snow can be an important process even at large scale especially in polar regions. So large scale applications of the system may still be affected by this limitation.
The discussion does not compare the skill of this approach with the skill of physical models, while similar metrics are provided at the same sites in Ménard et al., 2021, and other evaluations are also available in the literature for snow cover extent. I think this would be important to consider as well.
The discussion or final summary also lacks comments about the strengths and weaknesses of their results compared to the literature cited in the introduction applying machine learning to predict snow mass.
Furthermore, the outputs of the model are currently limited to SWE, while several snow-sensitive applications require more variables (e.g. surface temperature for NWP and climate modelling, snow internal properties for remote-sensing retrieval algorithms or avalanche forecasting). This limitation should also be mentioned, possibly with a discussion of the feasibility of extending this approach to more variables.
Citation: https://doi.org/10.5194/gmd-2023-103-RC1
RC2: 'Comment on gmd-2023-103', Anonymous Referee #2, 21 Sep 2023
The paper addresses an important and compelling topic: the issue of choosing an adequate snow modelling scheme in the context of scarce data availability. This topic is particularly relevant for many areas of the world where instrumentation and monitoring are rather poor, yet the population depends on meltwater resources. The authors presented a machine learning-based model that requires simple and/or commonly available input data and no calibration. The model showed good performance in reproducing SWE both in the subset of stations not used for calibration and in two other remote, orographically complex and scarcely monitored stations. The model structure, training, validation and limitations are well explained and clear. The validation is extensive and considers point-wise and large-scale cases.
My suggestion is a major revision, for the following reasons. Generally, throughout the paper, I often found the literature review either insufficient or even absent. The description of the data used is scattered throughout the text, which does not help clarity. Figures often lack axis ticks, labels and/or units.
The comments are the following:
--- MANUSCRIPT ---
0. General comments:
0.1 I suggest adding a comprehensive “Data” section where the authors can (a) list all the data they used, separating them in subsections for model training and validation, point-wise and large-scale; (b) roughly describe the geography/orography/data availability for the datasets they chose.
0.2 I suggest restructuring the final part of the paper with a freestanding “Model limitations” section and a “Conclusions” section encompassing and enhancing what is now in section “Summary”.
0.3 I suggest a re-reading and improvement of the English language, there are syntax/grammar errors in the text and the structure of some sentences is confusing (see comments for each section). Please check that the used tense is consistent along a section or paragraph.
0.4 Notations: throughout the text, figures and tables, please make the Celsius degree symbol consistent (°C); correct the Elevation unit from m to m a.s.l.; when a quantity is non-dimensional (i.e. NSE), please use the non-dimensional unit ([-]).
1. Introduction
I suggest rewriting the Introduction by significantly expanding the state of the art and literary research, taking into account the following comments:
- L30: Suggested citation: Beniston M. (2008), Extreme climatic events and their impacts: Examples from the Swiss Alps. In: Díaz HF, Murnane RJ (eds), Climate Extremes and Society. Cambridge University Press, New York, USA, pp. 147-164.
- L31-39: This paragraph generally lacks references and examples on both kinds of models; I suggest providing a small literature review.
- L37: “... research often opt for relatively simpler conceptual TI models…” references and examples are needed.
- L40-41: I find this sentence too general and poorly supported by literature (the authors only provide one example). For example, in this recent study https://doi.org/10.5194/hess-26-3447-2022 the authors showed how a PB snow-hydrological model substantially outperformed a conceptual TI model. Both models were applied on the same spatial domain (catchment Dischma), and the TI model completely missed the snowmelt-induced discharge timing (see Figure 7 d-e).
- L51-60: I find this paragraph dedicated to the state of the art preceding the authors' work too short and general. I suggest expanding this section by better detailing the findings of previous works (upon which the authors rely for their work) and the critical issues of the previous works (which the authors seek to address in this paper).
2. Model description
- The default threshold temperature value for rain/snow separation is set to -1 °C. Here, it would be necessary to justify this choice, or at least provide references, because this tuning parameter can vary a lot in snow/hydrological modelling (see for example https://doi.org/10.3390/cli9010008 for a TI model and https://doi.org/10.5194/hess-26-1063-2022 for a PB model).
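For context, the single-threshold scheme in question can be sketched as follows (illustrative only; the function name is mine, and the default value shown is simply the paper's -1 °C):

```python
def partition_precip(precip_mm, temp_c, ts=-1.0):
    """Single-threshold rain/snow partitioning (illustrative sketch).
    Ts is the tunable threshold discussed in the comment; its value can
    vary by several degrees C between snow/hydrological models, which is
    why the default choice needs justification or references."""
    snow = precip_mm if temp_c <= ts else 0.0
    return snow, precip_mm - snow   # (snow_mm, rain_mm)

print(partition_precip(10.0, -3.0))  # (10.0, 0.0): all snow
print(partition_precip(10.0, 1.0))   # (0.0, 10.0): all rain
```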
- L82-85: “... and is available as a set of functions [...] respectively” If the subject is “a set of functions”, then verbs should be “calculate” and “generate”. Otherwise, the sentence as it is is unclear and I suggest rephrasing, dividing or better explaining.
- L110: “As it was noted above, the SVR model has two tunable parameters: cost and gamma…” Actually, gamma is never mentioned. The authors mention “sigma” on L99. Please clarify.
3. Model validation
- L160: Please cite https://doi.org/10.1016/0022-1694(70)90255-6
- L180: As mentioned in Comment 0.1, Mendoza and Western Pamir are not mentioned earlier in the text as data used for validation and are only introduced here.
- L199-200: Do the authors refer to Figure 4? If so, Figure 4 needs to be mentioned. See the comments about Figures.
- L202: “... the rain-to-snow transition modelled using the metadata of the 520 validation SNOTEL stations.” Do the authors mean that there are observations/data on the transition between rain and snow for all the 520 stations? And how was that used in modelling? Please clarify.
- L206: “... does not exceed 100%” do the authors mean does not reach 100%?
- L210: I suggest justifying this sentence with a plot or a better explanation. Again, if this information is contained within some metadata, this needs to be explicitly stated.
- L241: How did the authors calibrate Ts? Please clarify.
- L255-256: Can the authors verify this assumption? Shortly after, in the text, the authors write the same for the SnowMIP station SNB, so I assume it is possible?
- L292: The authors should explain the meaning of “class balance accuracy”.
4. Model sensitivity and uncertainty assessment
- L305: Is there a reference for this method? If so, I suggest adding it.
- L311: “... depending on the phase considered …” Do the authors mean “precipitation phase”? Please clarify. Also, the reference is missing.
- L316: What do the authors mean by “relative comparison”? Please clarify.
- L349: Please refer to Table 1 when addressing the different model settings.
- L355: What do the authors mean by “when outliers are controlled for”? Please clarify.
5. Summary
- L375: The concept of equifinality is only addressed at the end of the paper but it is never mentioned earlier. The most important papers on equifinality are not cited (see https://doi.org/10.1016/0022-1694(89)90101-7, https://doi.org/10.1016/0309-1708(93)90028-E, https://doi.org/10.1016/j.jhydrol.2005.07.007). If overcoming equifinality is one of the aims of the paper, this needs to be addressed in the Introduction and also in the discussion of the results. And additionally, how does the model improve equifinality? This needs to be explained and justified. The results shown in Figure 12, for example, seem contradictory to this sentence, because there the authors show that one can obtain similarly good model performances with different sets of parameters.
- L383-385: This sentence is not clear. What do the authors mean by “instrumental”?
- L385: Similarly for the equifinality, the problem of finding empirical relations and parametrizations is never addressed before in the text. If this is one of the aims of the paper, it needs to be addressed in the Introduction accompanied by proper references (as parametrizations of different kinds are already widely used in snow/hydrological modelling).
- Please consider mentioning the undercatch selection issue within the Model limitation section.
--- FIGURES ---
General comments:
- When a figure is composed of different subplots, as is often the case in this paper, something that enhances clarity very much is labelling each subplot differently, for example with letters like (a), (b)..., and then referring throughout the text to each subplot as Figure 5a, Figure 5b, etc.
- I suggest improving the figure referencing generally and throughout the whole text: often the authors describe the results referring to specific subplots of a same Figure by only mentioning the general Figure once at the beginning of the paragraph. Referring to each specific subplot before introducing each finding highlighted by the subplot increases clarity significantly.
Specific comments:
- Figure 2: Axes ticks and labels (latitude, longitude) are missing, legend is missing.
- Figure 3: Axes labels are missing.
- Figure 6: Left plots: missing non-dimensional symbol for NSE ([-]), missing unit for snow meltout date error (days?), missing y-axis label. Right plots: missing axes ticks and labels (latitude, longitude).
- Figure 7: Same as above.
- Figure 8: y-axis label and units are missing.
- Figure 11: “Latitude” is spelled wrong, missing units, missing y-axis ticks and labels.
Citation: https://doi.org/10.5194/gmd-2023-103-RC2