RavenR v2.1.4: an open-source R package to support flexible hydrologic modelling

Chlumsky, Robert; Craig, James R.; Lin, Simon G. M.; Grass, Sarah; Scantlebury, Leland; Brown, Genevieve; Arabzadeh, Rezgar

doi:https://doi.org/10.5194/gmd-15-7017-2022

Articles | Volume 15, issue 18

© Author(s) 2022. This work is distributed under
the Creative Commons Attribution 4.0 License.

Development and technical paper | 16 Sep 2022

Final revised paper (published on 16 Sep 2022)
Preprint (discussion started on 19 Nov 2021)

Interactive discussion

Status: closed

RC1:
'Comment on gmd-2021-336', Paul C. Astagneau, 29 Dec 2021
This manuscript presents a new R package that aims to help modellers use the Raven hydrologic framework. Most of the package features consist of functions for data wrangling to feed Raven and functions for simulation analyses. The rationale behind the implementation of RavenR is presented. Examples of the RavenR functionalities are introduced using a formerly built perceptual model of the Liard river basin.

Several authors have advocated for the use of flexible structures for systematic testing of multiple working hypotheses in hydrological modelling. The use of such structures inherently results in higher complexity for modellers, and hence a challenge for reproducibility of methods and results. I think that any attempt at improving the use of these flexible structures is therefore relevant to the community of hydrological modellers. Furthermore, extensive documentation is provided for the RavenR package, many interesting functionalities ranging from data preparation to simulation analysis are implemented, and feedback between users and developers is encouraged to maintain and improve the package.

However, to be able to thoroughly evaluate the added value of using RavenR, I would have needed some experience with the Raven hydrologic framework. As it is not objectively possible in the time required to write a review, the following comments can only be seen as a way to improve the readability of the paper for non-Raven users and broaden the possible reach to the hydrological community.

General comments:

Two similar flexible hydrological frameworks need to be cited in this work (either in the introduction or in Sect. 2): DECIPHeR (Coxon et al., 2019) and SuperflexPy (Dal Molin et al., 2020). A short description of the main differences between Raven/RavenR and these frameworks might further demonstrate the added value of using RavenR.

To improve understanding by new users of Raven (or even new hydrological modellers), I suggest adding a short description of the main choices that were made in the Raven hydrological framework and RavenR in terms of programming languages. The Raven hydrologic framework is coded in a compiled programming language, probably for computational speed and flexibility purposes. To improve its usability, the RavenR package was created. However, some hydrological models are coded in a compiled programming language and interfaced by R using packages (e.g. hydromad; Andrews and Guillaume, 2018). Why is the Raven workflow (in terms of programming languages) more suited for flexible modelling?

Section 3 is probably the most important section of this paper if we want to use the RavenR package and the Raven hydrologic framework. The steps of the hydrological workflow are presented in Table 1 and the related R code and model files are provided to understand the functionalities of RavenR. However, I found some parts of this section a bit difficult to understand, especially since in the provided R script, the model run command line appears before input file processing.

The authors state on line 195 that steps 4 and 5 will not be presented, but it is not clear why. They are important steps of the hydrological workflow, especially when performing uncertainty analyses. An explanation of why they are not relevant given the objectives of the paper is needed.

Although it is probably relevant to introduce the notion of locked or protected HRUs in Sect. 3.2.4, hydrological modellers with less experience with Raven might need a simpler use case of model discretization first. If the authors want to keep this section as it is, I suggest adding a simpler example in the future vignettes of the package.

Sect 3.3 may be too long and its purpose not very clear since the evaluation of what the authors call “model realism” does not lead to questioning the hypotheses behind the Liard basin model. I think this section should be limited to a presentation of the possible analyses of model simulation enabled by RavenR. Possible cuts: l 376 to l 381; l 383 to “Overall” l 386; from “A similar check” l 396 to l 402; from “The model” l 407 to “bias in estimation” l 408; from “The hydrograph” l 430 to “peak” l 433; from “The plot” l 446 to l 448; from “The results” l 452 to l 453; from “The plot shows” l 460 to “measurements” l 464.

Overall, I think that the R script provided to understand Sect. 3 could become a vignette, but for a very simple use case that would include parameter estimation procedures and questioning of modelling hypotheses. Building a simple model from data preparation to output analysis using a catchment from the CAMELS dataset (Addor et al., 2017) would allow very different modellers to use the Raven hydrologic framework.

Minor comments:

I think that lines 60 to 70 could be moved just after line 44 for better links between the paragraphs of the introduction.

Please add the references of Python, R and C++.

Line 128, “3) running raven” should be moved before “2) reading output files”.

Line 349/350: please remove “providing…for the right reasons? (Kirchner; Euser et al., 2013)”, as it is not the place to provide insights into a scientific question that was not presented in the introduction.

Please define “model realism” and “reality checks” in Sect. 3.3.1, as they are vague concepts, especially when no other data than streamflow are available for model validation.

Line 365: I do not think that the term “observed baseflow” can be used to refer to the results of baseflow separation techniques that rely only on streamflow time series.

Lines 414 to 418 should not appear in Sect 3.3.

Line 449: “Figure C” should be “Figure B”.

Line 550: “Figure D” should be “Figure C”.

Technical comments:

I noticed a few typos. As I am a non-native English speaker, the following comments might not be relevant.

L1: “advances…have enhanced” instead of “has enhanced”.

References such as “(e.g. GR4J (Perrin et al., 2003))” should appear as “(e.g. GR4J; Perrin et al., 2003)”. The latex command for this is: \citep[e.g. GR4J;][]{citationkey}.

Line 312: “The development…requires” instead of “require”.

Comments specific to the R package documentation:

From my understanding, the pipe operator is not mandatory to run the RavenR package and is only used here for better readability. However, some R users are not familiar with the dplyr syntax. Although this is mentioned in the title of Figure 2 of the article, I would recommend adding this information to the package documentation (if not done already; I might have missed it).

For some functions (e.g. rvn_annual_peak), the units of the related arguments are mentioned in the detail section. It is always easier for users to find the required unit beside the related argument. I would suggest doing so in future versions of the package.

I noticed that for some functions, time series must be provided at a daily time step. I thought that the Raven hydrologic framework could run at multiple time steps. Again, I might have missed the explanation at some point. If not, I would suggest adding a warning somewhere to use the time step required by RavenR/Raven.

References

Addor, N., Newman, A. J., Mizukami, N., and Clark, M. P.: The CAMELS data set: catchment attributes and meteorology for large-sample studies, Hydrol. Earth Syst. Sci., 21, 5293–5313, https://doi.org/10.5194/hess-21-5293-2017, 2017.

Andrews, F. T. and Guillaume, J. H.: hydromad: Hydrological Model Assessment and Development, R package version 0.9-26, available at: http://hydromad.catchment.org/, 2018.

Coxon, G., Freer, J., Lane, R., Dunne, T., Knoben, W. J. M., Howden, N. J. K., Quinn, N., Wagener, T., and Woods, R.: DECIPHeR v1: Dynamic fluxEs and ConnectIvity for Predictions of HydRology, Geosci. Model Dev., 12, 2285–2306, https://doi.org/10.5194/gmd-12-2285-2019, 2019.

Dal Molin, M., Fenicia, F., and Kavetski, D.: SuperflexPy: the flexible language of hydrological modelling, version 1.3.1, https://superflexpy.readthedocs.io/en/latest/index.html, 2021.
Citation: https://doi.org/10.5194/gmd-2021-336-RC1
- AC1: 'Reply on RC1', Robert Chlumsky, 14 Mar 2022
  
  Thank you for the comments. Please see attached responses.
  
  Citation: https://doi.org/10.5194/gmd-2021-336-AC1
RC2:
'Comment on gmd-2021-336', Anonymous Referee #2, 02 Mar 2022
This manuscript provides a description of a set of R functions to process in- and output files and the contained data, for the purpose of running the hydrological modelling software Raven, which itself is available as a C++ executable.

The manuscript is generally well-presented and well written, and I have very few specific comments. However, I make the following more general observations:

- I am actually not sure whether the manuscript fits any of the designated manuscript types. I suppose that it is classified as a development paper because of the ample references to reproducibility in the manuscript. However, the original model is available as open-source code as well and is therefore perfectly reproducible (at least in the sense described on the "About GMD" web page). So at most, it is an enhancement of the usability of a specific model rather than its reproducibility.

- The other reviewer has made some very useful comments on the presentation, with which I fully agree. Overall, I think that the manuscript is too wordy and can be reduced substantially. Specifically, the authors seem to be at pains to convince the reader about the importance of open source software, accessibility, and good practices in model development. I don't think that the GMD readership needs such advocacy. It distracts from the core message and makes the manuscript unnecessarily long and somewhat tedious to read. (For example, the section L.137 - 146 is quite trivial and may be deleted entirely, but also many other sections can be streamlined).

- The technical implementation of the package is quite straightforward, and does not make optimal use of advanced functionality of R. Specifically:

The fact that the model needs to be run separately is not very elegant. It would be ideal if the Raven model itself is distributed with the package as a dynamic library, and can be loaded as such by the R process. This would avoid the need for separate installation of the model, as well as the slightly clunky way that the executable is called by the rvn_run() function. It would also help with the next point.

The fact that the script writes the input files to disk, which are then subsequently read by the executable (and vice versa for the output files), is inelegant at the least, and probably inefficient as well. If the model itself were implemented as a dynamic library, then the in- and output data could be passed in memory to the model, which would greatly enhance performance in use cases such as Monte Carlo simulation.

The package makes relatively limited use of the object-oriented nature of R. It does use relevant classes such as xts and lubridate, but does not define any classes itself. This results in a very long list of functions, essentially one function for every step in the analysis. It would be much more elegant (and efficient) to define a set of classes (e.g., one for each in- and output file, by extending classes such as xts) and then use method dispatch to read and write them, as well as any other standard processing such as aggregation. This would reduce the need for a long list of different functions to a few read() and write() commands, and allow for method dispatch on existing xts functions.
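
The class-plus-method-dispatch design suggested above can be sketched as follows. This is only a minimal illustration, written in Python (with `functools.singledispatch` standing in for R's method dispatch) so that it can be run on its own; it is not RavenR code. The class names `Hydrograph` and `Forcings`, the generic `write()`, the column names, and the sample values are all hypothetical.

```python
from functools import singledispatch


class Hydrograph:
    """Hypothetical stand-in for one model output file (simulated flows)."""

    def __init__(self, dates, flow_m3s):
        self.dates = list(dates)
        self.flow_m3s = list(flow_m3s)


class Forcings:
    """Hypothetical stand-in for one model input file (daily precipitation)."""

    def __init__(self, dates, precip_mm):
        self.dates = list(dates)
        self.precip_mm = list(precip_mm)


@singledispatch
def write(obj):
    """Generic writer: the implementation is selected by the argument's class."""
    raise TypeError(f"no write() method for {type(obj).__name__}")


@write.register(Hydrograph)
def _write_hydrograph(obj):
    # Serialise flows as CSV text with two decimals.
    rows = [f"{d},{q:.2f}" for d, q in zip(obj.dates, obj.flow_m3s)]
    return "\n".join(["date,flow_m3s"] + rows)


@write.register(Forcings)
def _write_forcings(obj):
    # Serialise precipitation as CSV text with one decimal.
    rows = [f"{d},{p:.1f}" for d, p in zip(obj.dates, obj.precip_mm)]
    return "\n".join(["date,precip_mm"] + rows)


# Usage: one write() call serves every file type (illustrative values only).
hydro = Hydrograph(["2020-01-01", "2020-01-02"], [10.0, 12.5])
met = Forcings(["2020-01-01", "2020-01-02"], [0.0, 3.0])
print(write(hydro))
print(write(met))
```

Under this assumption, supporting a further file type means adding one class and registering one method, rather than adding another separately named function to the package namespace.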

Lastly, while the examples in the manuscript are generally easily reproducible, some of the examples in the online documentation are not, for example because they include idiosyncratic path statements. I strongly recommend that the authors read through R guidelines such as the ones below, and cross-check that all the code adheres to these good practices:

https://www.tidyverse.org/blog/2017/12/workflow-vs-script/

https://www.carlboettiger.info/2013/06/13/what-I-look-for-in-software-papers.html

To conclude, I believe that this is certainly a useful piece of software; however, for me the manuscript reads too much like a manual instead of a scientific paper, even of the type that GMD aims at. I think that there is scope for streamlining, and ideally going a bit beyond simply presenting a wrapper, towards exploring how even something as simple as a wrapper can incorporate state-of-the-art software design concepts. This does not mean that the software needs to be entirely implemented according to the recommendations above. But some attempt, or at least a discussion as to why this may be scientifically non-trivial, would lift the scientific value of the manuscript in my opinion.
Citation: https://doi.org/10.5194/gmd-2021-336-RC2
- AC2: 'Reply on RC2', Robert Chlumsky, 14 Mar 2022
  
  Thank you for the comments. Please see attached responses.
  
  Citation: https://doi.org/10.5194/gmd-2021-336-AC2

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

AR by Robert Chlumsky on behalf of the Authors (07 Apr 2022) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (19 Apr 2022) by Wolfgang Kurtz

RR by Paul C. Astagneau (26 Apr 2022)

RR by Anonymous Referee #2 (10 Jun 2022)

RR by Anonymous Referee #3 (06 Jul 2022)

Suggestions for revision or reasons for rejection

## General remarks

The manuscript provides a detailed overview of the RavenR model setup and evaluation package. The authors document and demonstrate the application of this tool within the RavenR framework for the example of a specific river basin.

First of all, congrats to the authors for putting together such an extensive tool to facilitate the setup, use and evaluation of hydrological models generated within the Raven framework. I think this paper is in a good state for publication. While the tool itself is rather specific to one modeling framework, the publication of such a tool is a good blueprint for other modeling groups to develop and improve similar scripts on their own and, thus, earns its place in this journal. I only have some minor comments that the authors may use to improve the manuscript, but the publication should not be conditional on them:

- it might be sensible to define the intended user group of the tool more clearly right from the start. The abstract and introduction mention the potential application of RavenR for hydrological models outside the Raven framework. While this is technically true, at least for those evaluation functions that only require a single time series, such evaluations are usually also found in other evaluation packages or are probably already included in the workflows of other models. Here, RavenR does not appear to be worth the effort of writing output conversion scripts for other models. However, RavenR really shines as a comprehensive support software for the Raven framework. As a hydrological modeler who uses neither R nor Raven nor any of the hydrological models mentioned in the manuscript, I still found the paper an interesting and inspiring read. However, I don't feel I can profit from this tool at all with a reasonable effort. In order to manage expectations, I would mention any general application possibility only in the conclusion and otherwise target the Raven user group more directly.
- section 3.2.2 appears a bit too optimistic to me. True, as long as a tool like weathercan is available, RavenR can utilize its standardized interface and data format. However, as soon as users target river basins in different regions, such tools will either not be available at all or use different data formats, which will require considerable effort from the user's side to adapt them for working with RavenR. Such limitations should be mentioned clearly.
- Even after the revision, the manuscript feels quite long in parts of the introduction. While I very much sympathize with the authors' call for transparent and open(-source) science, this statement feels a bit out of place in a journal like this, as I would assume almost all readers already share this view. On the other hand, it cannot hurt to emphasize it once again.
- just having the technical opportunity to set up 8 x 10^12 model configurations doesn't actually seem to be a step forward, as the vast majority of combinations are most likely not sensible ones. Thus, it seems to be very important to promote a tool like RavenR to guide users through the model setup.

## Technical comments

- Fig 2 & 3: why are the referenced sections in bold font?
- why is example code included and labeled as a figure? Wouldn't it be more straightforward to implement it as code blocks?
- Fig 5B: which actual variables are sim and obs? I assume all three of the others are forcing variables? Just knowing that both curves are (probably) the same quantity, without information about what they are, doesn't help with model evaluation.

ED: Reconsider after major revisions (28 Jul 2022) by Wolfgang Kurtz

AR by Robert Chlumsky on behalf of the Authors (29 Jul 2022) Author's response Author's tracked changes Manuscript

ED: Publish as is (10 Aug 2022) by Wolfgang Kurtz

AR by Robert Chlumsky on behalf of the Authors (11 Aug 2022)

Short summary

We introduce the open-source RavenR package, which has been built to support the use of the hydrologic modelling framework Raven. The R package contains many functions that may be useful in each step of the model-building process, including preparing model input files, running the model, and analyzing the outputs. We present six reproducible use cases of the RavenR package for the Liard River basin in Canada to demonstrate how it may be deployed.