This work is distributed under the Creative Commons Attribution 4.0 License.
SnowQM 1.0: A fast R Package for bias-correcting spatial fields of snow water equivalent using quantile mapping
Adrien Michel
Johannes Aschauer
Tobias Jonas
Stefanie Gubler
Sven Kotlarski
Christoph Marty
Abstract. Snow cover plays a crucial role in regional climate systems worldwide. It is a key variable in the context of climate change because of its direct feedback to the climate system, while at the same time being very sensitive to climate change. Accurate long-term spatial data on snow cover are scarce, due to the lack of satellite data or forcing data to run land surface models back in time. This study presents an R package, SnowQM, designed to correct for the bias in long-term spatial snow water equivalent data, using more accurate data for calibrating the correction. The correction is based on the widely applied quantile mapping approach. A new method of spatial and temporal clustering of the data points is used to calculate the quantile distributions. The main functions of the package are written in C++ to achieve high performance and to allow parallel computing. In a case study over Switzerland, where a 60-year snow water equivalent climatology is produced at a resolution of 1 day and 1 km, SnowQM reduces the bias in snow water equivalent from −9 mm to −2 mm in winter and from −41 mm to −2 mm in spring. It is also significantly faster than pure R implementations. The limitations of the quantile mapping approach for snow, such as snow creation, are discussed. The proposed spatial clustering improves the correction in homogeneous terrain, which opens the way for further use with other variables.
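For orientation, the following is a minimal sketch of empirical quantile mapping at a single grid cell. It is illustrative only and is not the SnowQM implementation, which, as described above, additionally uses spatial and temporal clustering of the data points to build the quantile distributions and implements its main functions in C++; the function name `quantile_map` and the synthetic data are purely hypothetical.

```r
# Minimal sketch of empirical quantile mapping for a single grid cell;
# illustrative only, not the SnowQM implementation.
quantile_map <- function(x, ref, mod) {
  # Transfer function x_corrected = F_ref^{-1}(F_mod(x)), with both
  # distributions approximated empirically.
  F_mod <- ecdf(mod)                           # ECDF of the biased data
  quantile(ref, probs = F_mod(x), names = FALSE, na.rm = TRUE)
}

# Toy example with synthetic data standing in for SWE series at one pixel.
set.seed(42)
ref <- rgamma(1000, shape = 2, scale = 30)   # "accurate" calibration data
mod <- rgamma(1000, shape = 2, scale = 20)   # biased long-term data
corrected <- quantile_map(mod, ref, mod)
```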
Status: open (until 12 Jul 2023)
RC1: 'Comment on gmd-2022-298', Michael Matiu, 27 May 2023
Michel et al. supply a package to perform quantile mapping for spatiotemporal grids. Its usage is specifically designed for snow water equivalent maps, but it can in principle be applied to any variable (as claimed by the authors). Compared to existing approaches, the authors’ implementation offers a flexible way to include temporal and spatial neighborhoods in the QM training step. Finally, it is designed to make the most of lower-level languages and parallelization to achieve high throughput for large spatiotemporal data.
Altogether, the combination of temporal and spatial neighborhoods with good flexibility and fast computation presents a novel and worthy contribution to existing QM implementations. Nevertheless, a few clarifications are needed before publication (see below).
Major points
- Introduction could be improved: Currently it focuses a lot on what you did (which is the job of the other sections of the paper). And since one major strength (I assume?) of your package is the spatial component, you could also provide some background on spatial bias-correction approaches.
- Language: Consider changing the terminology throughout the manuscript. “Clusters” refers to distinct (non-overlapping) groups, while what you do in your approach is more of a temporal (and spatial) moving-window or neighborhood approach.
- Re-usability and applicability:
  - You could consider simplifying the installation command to a one-liner: `devtools::install_git("https://code.wsl.ch/snow-hydrology/snowqm")`
  - Since QM is often used not only for bias correction but also for downscaling (e.g., CH2018), what are your takes on this? Could your package also be used in this regard?
  - If you want your package to be re-used and applied in the community, you could consider updating the code documentation and vignette and providing examples with some example data. Also, posting the package to CRAN is a great way to boost usability.
  - How easy would it be to apply SnowQM to other variables like precipitation and temperature? You claim it is theoretically possible, but how much effort would this mean in practice? Furthermore, do you have an option to correct, e.g., wet-day frequency?
  - Since the spatial neighborhood approach does not seem to provide substantial benefits, could it instead be used to speed up computation? For example, by correcting similar pixels simultaneously, that is, in a true clustering sense, where clusters of similar pixels are trained and corrected together.
- As repeated throughout the manuscript, a main strength of your implementation is its speed. However, what I don’t understand is why the C++ variant is 12-27 times faster in the correction phase, where 80% of the time is spent on hard-disk operations (as per your profiling)? If 80% of the time is I/O, speeding up only the computation would cap the overall gain at roughly a factor of 1/0.8 ≈ 1.25. In the main CPU process of training, by contrast, the benefits of C++ are more on the order of a factor of 2.
- Related: You could consider being more neutral when discussing the computational benefits of pure R (or Python) versus C++, because the impression arises that C++ is the only way to achieve high throughput. There is more to it: easy parallelization of standard R (e.g., with the foreach package; a short sketch follows after this list), RAM and hard-disk bottlenecks. And finally, there is the trade-off between higher- and lower-level languages in how easy it is to understand and modify code. Nevertheless, it is true that rewriting slow components in C++ can significantly improve computations, which is, by the way, quite standard practice for R programmers who deal with computational bottlenecks.
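To illustrate the parallelization point above, here is a minimal sketch of how a per-pixel correction loop could be parallelized in plain R with foreach/doParallel. It is illustrative only and not taken from the SnowQM code base; `correct_pixel` is a hypothetical placeholder for the actual correction routine.

```r
# Minimal sketch of parallelizing a per-pixel correction loop in plain R with
# foreach/doParallel; illustrative only, not taken from the SnowQM code base.
library(foreach)
library(doParallel)

# Hypothetical stand-in for the real per-pixel correction routine.
correct_pixel <- function(x) sort(x)

cl <- parallel::makeCluster(4)   # start 4 worker processes
registerDoParallel(cl)

# Toy data: 100 "pixels", each with a vector of 1000 values.
pixels <- replicate(100, runif(1000), simplify = FALSE)

# The loop body is dispatched to the workers; results come back as a list.
corrected <- foreach(p = pixels) %dopar% correct_pixel(p)

parallel::stopCluster(cl)
```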
Minor points:
- L3: “Accurate…” seems a bit overstated, or not clearly articulated. What do you mean by accurate? I guess this statement depends strongly on spatial and temporal aspects. Please consider rephrasing.
- L9: reduces bias with respect to what?
- L11: did you mean heterogeneous instead of homogeneous?
- L39: Not quite accurate. Many functions in base R are coded in C++, so if these functions are used, then R is not slower than C++.
- L89: Maybe mention that besides using the ECDF as an approximation to F, one could also use parametric variants (which Gudmundsson et al. 2012 assessed), even though ECDFs are usually preferred (cf. reviews by Maraun and others); a short sketch contrasting the two variants is given after this list.
- L146: I think this is a limitation if the correction is performed day by day. If it is performed over longer time slices, one could also use the same probabilistic approach as for precipitation, which conserves frequency (snow vs. snow-free period) and amounts (total SWE), for instance by reshuffling the time series after the QM step.
- L154: Same as above. One could also consider the aim of QM to be transferring the seasonal properties (length, total accumulation); then “creation” or “removal” of snow seems less of an issue.
- L194: It is not clear what your intent is with this spatial metric.
- L213: What do you mean by homogenized?
- Fig. 5: It would make more sense to split the figure into two sets of metrics: positive-only metrics, like all the MAE values and the FP and FN counts, and the ME metrics, which can be both positive and negative. In this way you could make better use of the y-scale.
- L249: It is unclear how you determined the ranks. Also, ranking in general discards all information on the magnitude of differences. Wouldn’t it make more sense to also look at absolute or relative gains in performance?
- L280: What about the other regions?
- L307: You could still compute relative errors for all values above a threshold (mean SWE or elevation), if you think it’s necessary.
- Fig. 10 is difficult to read. Please consider reducing the number of years (or increasing the width of each subplot’s time series, or both, whatever works well), so as to better see the different lines.
- L330: I could imagine the issues at low elevation might also be related to your choice of a restricted time period, which might not capture enough variability to derive the ECDFs properly.
- L352: How was the factor of 57 derived? Comparing 1-core R to 8-core C++? Is that fair?
- L379f: I am not sure about this statement. The strength of QM (compared to simple adjustments of the mean and variance) is that it corrects the whole distribution, which includes the extremes. The paper you cite is more concerned with trends in extremes in century-long climate simulations.
- L382: What about splitting the evaluation by elevation?
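To illustrate the point on L89 above, here is a minimal sketch contrasting the empirical and a parametric transfer function. The normal distribution is used purely for illustration; this is not a claim about a suitable parametric choice for SWE, nor about the SnowQM implementation, and the data are synthetic.

```r
# Empirical vs. parametric quantile-mapping transfer functions; toy data only.
set.seed(7)
ref <- rnorm(1000, mean = 60, sd = 25)   # calibration ("accurate") data
mod <- rnorm(1000, mean = 45, sd = 20)   # biased data to be corrected

# Empirical variant: ECDF of the biased data, empirical quantiles of the reference.
corr_emp <- quantile(ref, probs = ecdf(mod)(mod), names = FALSE)

# Parametric variant: normal distributions fitted to both samples by moments.
corr_par <- qnorm(pnorm(mod, mean(mod), sd(mod)), mean(ref), sd(ref))
```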
Citation: https://doi.org/10.5194/gmd-2022-298-RC1
Data sets
SnowQM data: Adrien Michel, Johannes Aschauer, Tobias Jonas, Stefanie Gubler, Sven Kotlarski, and Christoph Marty, https://zenodo.org/record/7886773
Model code and software
SnowQM source code: Adrien Michel, https://zenodo.org/record/7886675