the Creative Commons Attribution 4.0 License.
the Creative Commons Attribution 4.0 License.
SnowQM 1.0: a fast R package for bias-correcting spatial fields of snow water equivalent using quantile mapping
Adrien Michel
Johannes Aschauer
Tobias Jonas
Stefanie Gubler
Sven Kotlarski
Christoph Marty
Download
- Final revised paper (published on 19 Dec 2024)
- Preprint (discussion started on 17 May 2023)
Interactive discussion
Status: closed
-
RC1: 'Comment on gmd-2022-298', Michael Matiu, 27 May 2023
Michel et al. supply a package to perform quantile mapping for spatiotemporal grids. It’s usage is specifically designed for snow water equivalent maps, but can in principle be applied to any variable (as claimed by the authors). Compared to existing approaches, the author’s implementation offers a flexible way to include temporal and spatial neighborhoods in the QM training step. Finally, it is designed to make the most of lower-level languages and parallelization to achieve high-throughput for large spatiotemporal data.
Altogether, the combination of temporal and spatial neighborhoods with a good flexibility and fast computational characteristics present a novel and worthy contribution to existing QM implementation. Even though, a few clarifications are needed before publication (see below).
Major points
-
Introduction could be improved: Currently it focuses a lot on what you did (which is the job of the other sections of the paper). And since one major strength (I assume?) of your package is the spatial component, you could also provide some background on spatial bias correction approaches.
-
Language: Consider changing the language throughout the manuscript. Clusters refers to distinct (non-overlapping) groups, while what you do in your approach is more a temporal (and spatial) moving window or neighborhood approach.
-
Re-usability and applicability:
-
You could consider simplifying the installation command to a one-liner devtools::install_git("https://code.wsl.ch/snow-hydrology/snowqm")
-
Since QM is often used not only for bias-correction but also downscaling (e.g., CH2018), what are your takes on this? Could your package also be used in this regard?
-
If you are interested that your package is re-used and applied in the community, you could consider updating the code documentation and vignette, and providing examples with some example data. Also posting the package to CRAN is great way to boost usability.
-
How easy would it be to apply snowQM to other variables like precipitation and temperature? You claim it is theoretically possible, but how much effort would this mean in practice? Furthermore, do you have the option to correct, e.g., wet-day frequency?
-
Since the spatial neighborhood approach does not seem to provide substantial benefits, could it be instead used to speed up computation? For example, by correcting similar pixels simultaneously? That is, in a true clustering sense, where clusters of similar pixels are trained and corrected together.
-
-
As repeated throughout the manuscript, a main strength of your implementation is its speed. However, what I don’t understand is why the C++ variant is 12-27 times faster in the correction phase, where 80% of the time is spent on harddisk operations (as per your profiling)? While in the main CPU process of training the benefits of C++ are more in the order of 2.
-
Related: You could consider being more neutral when discussing computational benefits of pure R (or python) versus C++, because the impression arises that C++ is the only way to achieve high throughput. There is more to it: easy parallelization of standard R (e.g., with the foreach package), RAM and harddisk bottlenecks. And finally, there is the trade-off between between higher- and lower-level languages in how easy it is to understand and modify code. Nevertheless, it is true that re-writing slow components in C++ can significantly improve computations, which is quite standard practice for R programmers which deal with computational bottlenecks, btw.
Minor points:
-
L3: “Accurate…” seems a bit overstated, or not clearly articulated. What do you mean by accurate? I guess this statement depends strongly on spatial and temporal aspects. Please consider rephrasing.
-
L9: reduces bias with respect to what?
-
L11: did you mean heterogeneous instead of homogeneous?
-
L39: not quite accurate. Many functions in base R are coded in C++, so if these functions are used, then R not slower than C++.
-
L89: Maybe mention that besides using the ECDF as approximation to F, one could also use parametric variants (which Gudmundsson et al. 2012 assessed), even though ECDFs are usually preferred (cf. reviews by Maraun and others).
-
L146: I think this is a limitation if the correction is performed day-by-day. If performed at longer time slices, one could also use the same probabilistic approach used for precipitation. Which conserves frequency (snow vs snow-free period) and precip amounts (total SWE). For instance, with reshuffling the time series after the QM step?
-
L154: same as above. One could consider also the aim of QM to transfer the seasonal properties (length, total accumulation), then “creation” or “removal” of snow seems less an issue.
-
L194: Not clear what your intent is with this spatial metric?
-
L213: what do you mean by homogenized?
-
Fig5: It would make more sense to split the figure into two sets of metrics: pos only, like all MAE and the FP and FN counts; and then the ME metrics which can be both pos and neg. In this way you could make better use of the y-scale.
-
L249: Unclear how you determined the ranks. Also ranking in general discards all information on magnitude of differences. Wouldn’t it make more sense to look also at absolute or relative gains in performance?
-
L280: What is with the other regions?
-
L307: you could still do relative errors for all values above a treshold (meanSWE or elev), if you think it’s necessary
-
Fig 10 is difficult to read. Please consider reducing the number of years (or increasing the width of each subplot timeseries, or both, whatever works good), so as to better the see the different lines.
-
L330: I could imagine the issues at low elevation might also be related to your choice of a restricted time period, which might not capture enough variability to derive ECDFs properly.
-
L352: how was the 57 derived? Comparing 1 core R to 8 core C++? Is that fair?
-
L379f: Not sure about this statement. The strength of QM (compared to simple adjustments like mean and variance) is that it corrects the whole distribution, which includes extremes. The paper you cited concerns more trends in extremes using century long climate simulations.
-
L382: What about splitting the evaluation by elevation?
Citation: https://doi.org/10.5194/gmd-2022-298-RC1 -
-
RC2: 'Comment on gmd-2022-298', Marianne Cowherd, 22 Jul 2023
The authors present an R package for computing quantile mapping based bias correction of snow water equivalent using Switzerland as a test case. The package includes C++ implementation of routines and OpenMP parallelization for speed within the R package. The package also includes built-in evaluation functions. QM is an important bias correction method with many possible applications and this package is a positive contribution to the accessibility of QM with both spatial and temporal windows used for the clustering.
Major comments:
The phrase “snow cover” is sometimes used when you may mean snow water equivalent. For example L37: “tailored towards snow cover” when it appears to be tailored to snow water equivalent.
There is a long discussion of speed. However it would be good to more clearly separate the speedup achieved from parallelization vs. from the C++ implementation, especially as the user base will likely have varying access to large numbers of cores when running the code.
A longer discussion of the role of spatial clustering would better explain some of the results. It seems that the proposed best-ranked parameterizations have small clusters and therefore weak use of the spatial aspect. Can the clusters be overlapping with clusters from other pixels? It seems that they can, since each pixel is computed independently, which is not really a common way to use the word “cluster”. If you do use the spatial term for clustering and do not repeat a computation for each pixel within the cluster and they are nonoverlapping, this could be a future way to improve speed (at the risk of reduced accuracy).
Some of the discussion focuses on the areas with low or zero snow, however many of those areas are not particularly of interest for snow maps because there is no snow there. While areas in and around the snow line would be of interest, there is a risk that never-snow regions could inflate accuracy statistics because they will always perfectly match at 0. Showing elevation-stratified performance is helpful for this, but it would be good to be clear in some of the error metrics about showing only snow areas. Would it be possible to include historical snow values as an additional clustering criterion in a future version?
A discussion of potential expansion to other quantities other than snow and to other regions would be a welcome addition.
Minor comments:
L8: parallel computing is possible in any language; the choice of C++ for speed is parallel to the choice to implement in parallel
L11: if homogeneous then this is a limitation for snow water equivalent, especially in mountainous terrain (including Switzerland). Possibly heterogeneous?
L26-48: the detailed description of the attributes and benefits of SnowQM are better suited for a discussion or conclusion. Especially L35-14 would be better at the end of the paper. Consider replacing with more detailed literature review, for example a longer introduction to previous QM snow efforts (e.g., Jorg-Hess et al, 2014 as cited in this section already) would be important to discuss. Consider Matiu and Hanzer 2022.
L50: acronym SWE can be introduced at first use in the main body of the text
L50-55: suggest re-phrasing this paragraph to decrease the number of parentheticals. As written, it is somewhat difficult to follow
L65: does the package require daily data? It should require the inputs and outputs to be at the same temporal resolution
L66: the reference of “they” is ambiguous
L75: lowercase n in netCDF
L94: more explanation of this
L109-11: split into two sentences to avoid long parenthetical: “Temporal clustering is straightforward: given a time window parameter wt, the CDF for DOYi is calculated using all SWE values for the pixel of interest in the time interval i±wt of each year. However, spatial clustering is more difficult.”
L199: cite the R core team at first mention of the R programming language, in the introduction
L209-210: “laptops, workstations, and HPC implementations” is vague – the goal here is to demonstrate testing on multiple operating systems (previous phrase) and architectures, and specify (briefly) architecture information. Name the HPC system and state the number of cores/nodes used for the laptop, workstation, and HPC. “Installation and usage were easy” is subjective, remove.
L213-216: sentence is long, split into 2-3 sentences for clarity. “for entire Switzerland” – “for the entire area of Switzerland”
L226-235: how are the snow height measurements used to improve the SWE time series? Is it a reanalysis or direct injection to a model?
L235: do you have a strict definition for how to identify snow towers in the model?
L225: “really small” is vague
Figure 5: Add letter labels to make referencing more clear. For the spatial cluster parameter, is there a physical meaning to a value of 0? The listed values start at 1 in the table
L263: if “significant” does not mean statistically significant, find another word to describe the impact of the difference
L280: quantify this difference. This seems to contradict the previous finding that the best parameterization had a cluster size of 1 pixel. If that is not the case, clarify the relationship between the sigma P parameter and the spatial factor here
Figure 6 caption: typo “worth” -> “worse”
L303-310: reference specific panels of Figure 9 in the text to make the references more clear, in addition to panels a and b
Figure 9f: the pixel count of aspect drops to 0 at the far right – is this a clipping issue with the circular nature of aspect? I.e., 360 = 0? If so, recommend to remove the last value to avoid having aphysical representation of the aspect distribution in this plot
Figure 9: consider distinguishing the different lines more to make the figure more color vision deficiency and grayscale printing friendly, for example with dashed or dotted lines
Figure 9: should be “number of pixels”
Figure 9: the terminology “training” when applied to datasets is confusing since that terminology is also applied to the training period within the framework.
Figure 10: there are many years represented here and with several regions and three lines per region, it is difficult to gain meaningful insight from this way of showing the data. Instead, consider a single DOY xaxis with transparent lines or a shaded region for each year and a solid line showing the different averages, or a scatter plot with training data on the x axis and model and corrected data in different shapes on the x axis with a 1-1 line, or some other setup
L350-360: it is not entirely clear how the speedup factors in the text are related to the times shown in the table. For example, the 57-time speedup appears to compare 1-core R to 8-core C++. The 11x speedup in the correction phase from 1-core R to 1-core C++ is still impressive. It would be better to include data from all timings tested, for example showing a scaling plot with cores on the x axis and time on the y axis for the R and C++ implementation , and a second scaling plot with number of cores on the x axis and time on the y axis for both implementations.
L365: what is the time taken for disk access?
L370: the results do seem to be sensitive to the parameters, and the authors make suggestions for optimal parameters in the paper
373: “remove” is an overstatement. Reduce.
370-375: SnowQM as stated is used for bias correction, not error reduction. This should be consistent throughout the text. In particular, the authors state earlier in the paper and in the subsequent paragraph that the process does not remove the random error. Make sure this is consistent.
403: reference to parallel computing is vague, specify the openMP approach
415: false positives?
L435 and abstract: abstract should not present repetitive results as the body of the text
Citation: https://doi.org/10.5194/gmd-2022-298-RC2 -
AC1: 'Comment on gmd-2022-298', Adrien Michel, 05 Dec 2023
Dear Reviewers and Editor,
Dear Marianne Cowherd, dear Michael Matiu, dear Steven Phipps,
We appreciate your insightful and constructive comments on our manuscript. They have allowed us to present a much improved version of our manuscript and package. We would also like to apologise for the time it has taken us to respond. Please find below a detailed response to each of your comments and concerns, along with the changes we have made to the manuscript.
The reviewers' original comments are in black and our responses are in blue. Note that we have taken the opportunity of the manuscript revision to update some relevant literature and provide a bit more context about SnowQM. These additions are listed in our response provided as supplement.
Please note that the page and line numbers in this document refer to the updated version of the manuscript without track changes.
Yours sincerely,
Adrien Michel, on behalf of the writing team.