Climate Services Toolbox (CSTools) v4.0: from climate forecasts to climate forecast information

Pérez-Zanón, Núria; Caron, Louis-Philippe; Terzago, Silvia; Van Schaeybroeck, Bert; Lledó, Llorenç; Manubens, Nicolau; Roulin, Emmanuel; Alvarez-Castro, M. Carmen; Batté, Lauriane; Bretonnière, Pierre-Antoine; Corti, Susana; Delgado-Torres, Carlos; Domínguez, Marta; Fabiano, Federico; Giuntoli, Ignazio; von Hardenberg, Jost; Sánchez-García, Eroteida; Torralba, Verónica; Verfaillie, Deborah

doi:https://doi.org/10.5194/gmd-15-6115-2022

Articles | Volume 15, issue 15

© Author(s) 2022. This work is distributed under
the Creative Commons Attribution 4.0 License.

Model description paper | 04 Aug 2022

Download

Final revised paper (published on 04 Aug 2022)
Supplement to the final revised paper
Preprint (discussion started on 06 Dec 2021)
Supplement to the preprint

Interactive discussion

Status: closed

RC1:
'Comment on gmd-2021-368', Anonymous Referee #1, 04 Jan 2022
Overview. This paper presents a toolbox facilitating the integration of climate data in sectoral applications. The manuscript is well structured, with a clear general introduction, an overview of the toolbox, and some user cases. The R package presented is a valuable contribution. All post-processing steps are integrated in the framework, and the processes are compatible with other R libraries. The nested structure of functions, which allows users to adapt specific blocks, and the exhaustive integration of the operations needed to turn climate data into relevant climate forecast information make this toolbox a valuable contribution in the field of climate services. The resources are open access (Zenodo, GitHub, GitLab) and relatively easy to use thanks to the associated R documentation. In addition, the vignettes and use cases made available by the authors are well commented and illustrated, and can be launched and (mostly) run. The case studies presented in Section 3 are relevant and well illustrated by the introduction of two models requiring such post-processing (SNOWPACK and SCHEME). The visualization of climate data and potential output (Figures 3, 4, 6, 7, 9, A1, B1) is of high quality, which is also a real contribution for R users.

CSTools functions can be complex, and handling the package is not trivial for users with limited background knowledge. In general, R users expect to (i) install and load a package, (ii) run an example notebook, and (iii) retrieve an output, all in a reasonably limited time. This manuscript should give users a clearer idea of the available methods and guide them through the main difficulties. For example, the authors acknowledge that climate data retrieval/loading and formatting (of the NetCDF files prior to their import into R or Python environments) is often a blocking point for some users: “This can be a labour-intensive step when trying to combine multiple datasets such as observations and forecasts from multiple systems” (l.68). However, the current manuscript simply refers to external notebooks without further comment (l.196-203). In general, there are file paths, in both this paper and the vignettes (GitHub), pointing to where data are stored (apparently locally), with no reference or link to retrieve the original input files. Following the link Earth Sciences / CDS Seasonal Downloader · GitLab (bsc.es) leads the reader to Python functions such as “download_seasonal_cds_monthly.py”, which raises the question of the consistency of the tool in terms of programming language (R, CDO, Python). From the point of view of a climate risk modeler (in finance or insurance, for example), this manuscript is not self-contained enough to allow a quick grasp of the R functions while giving a precise idea of the underlying methods.

Some minor adjustments could help broaden the target audience of the paper. For example, make sure each specific acronym or term is introduced. This toolbox aims at facilitating the integration of climate data in “sectoral applications”, yet the manuscript is hardly accessible to potentially interested parties who do not necessarily know all the techniques recently developed in the field of climate services. Although it is acknowledged that the paper targets experts (“applied climate scientists or climate services developers” l.59), in practice, specialists who frequently handle climate data have most likely already developed their own routines and procedures to perform most of the stated operations, so extending the target audience to non-experts could truly add value to this manuscript. Indeed, especially in a context marked by increasing concern about climate change, this toolbox would gain from being more understandable to energy system planners (l.530) but also to mathematicians, risk modelers, insurers, economists, agricultural engineers, etc. Currently, although the manuscript includes ample references, the authors often use specific acronyms or terminology (in particular in 2.2) without a proper, non-technical introduction or definition that would allow the target audience to be broadened (e.g. “Best Estimate Index” l.301, “ignorance score” l.250, “SEAS5” l.430, etc.).
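For illustration only, the kind of short, non-technical definition the referee asks for could read as follows for the ignorance score: it is the negative base-2 logarithm of the probability that a forecast assigned to the outcome that actually occurred, averaged over many forecasts. A confident and correct forecast scores close to 0 bits, while a forecast that gave the observed outcome little probability is heavily penalised. The Python sketch below assumes categorical (e.g. tercile) forecasts; the function names are hypothetical and are not part of CSTools, which is an R package.

```python
import math


def ignorance_score(probs, observed_category):
    """Ignorance score (in bits) of one categorical probability forecast.

    probs: forecast probability of each category (must sum to 1).
    observed_category: index of the category that actually occurred.
    Lower is better; a fully confident, correct forecast scores 0.
    """
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    p = probs[observed_category]
    if p <= 0:
        # the forecast ruled out what actually happened
        return math.inf
    return -math.log2(p)


def mean_ignorance(forecasts, observations):
    """Average ignorance score over a set of forecast/observation pairs."""
    scores = [ignorance_score(p, o) for p, o in zip(forecasts, observations)]
    return sum(scores) / len(scores)
```

A climatological tercile forecast (1/3 in each category) scores log2(3), about 1.58 bits, whatever happens, which makes it a natural reference against which to compare a forecast system.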

The overview (Section 2) should add value to the R package and its documentation through further abstraction and description of the underlying processes. For example, the underlying procedures are not described mathematically in the current version of the manuscript. We could expect this paper to better describe and focus on the “processes” (mathematical specification, parameters, hypotheses), especially as the functions and attributes are already described in the R documentation and vignettes.

Regarding the presentation, the authors too often use bullet points (lists). This manuscript sometimes looks more like a complementary user guide made of “lists” than a model description paper. This is particularly true for Section 2. This structural problem affects the substance: even if each function is described in an understandable way, a linear reading of the manuscript makes it difficult for the reader to retain the main mechanisms and methodological choices the package embeds. The structure of the use cases (Section 3) should be streamlined to facilitate reading, e.g. (i) application (why?), (ii) required data/input and at what resolution/frequency, (iii) processing required from source to model, (iv) code guide, (v) output and final data visualization, and (vi) interpretation. Some sentences/paragraphs that refer to documentation or contain links could be removed or placed in footnotes. In addition, the text contains links (not all of them working) where we would rather have the information in the document itself, whereas the code boxes contain paths to NetCDF files without a link to the website from which the data can be retrieved.

Contributions. If one of the contributions is the “gathering” of existing functions in a harmonized toolbox, it is hard to say if some of the processes are original or not in the current version.

Paper hardly self-contained. In general, the paper requires the reader to know or check the references, and nothing can be done from scratch based only on the description given in this manuscript.

Conclusion and recommendations: The paper does not present a model but a toolbox for bringing climate data into several applications. This toolbox fills a clearly identified gap and could help researchers address relevant scientific questions within the scope of EGU. I could not identify any substantial advance proposed in this paper, but from an operational standpoint the proposed package is within the scope of GMD, and the amount and quality of the supplementary material are significant. However, I would recommend that the authors provide more information in the current manuscript about (1) the sectoral applications, highlighting climate-relevant information beyond the three user cases, (2) the modeling structure of the underlying functions, to help users understand the methods and assumptions, i.e. (2.a) the input and (2.b) the mathematical formulae, and (3) the definition of abbreviations, acronyms and technical terms. On the presentation side, I recommend that they (4) avoid excessive use of lists, (5) avoid extensive use of links and (6) streamline the case studies.

Specific comments and typing errors:

l.39. “stakeholders”: I would appreciate a series of examples of sectoral applications introduced at the beginning of the paper (agriculture, tourism, consumer discretionary stock planning, climate risk for insurers/infrastructure, energy (wind but also solar/thermal, etc.)).

l.41. “tailored climate information”: The transmission channels from climate data to climate relevant information could be slightly more detailed in this section.

l.50 To address these needs

l.57. CSTools targets primarily

l.100: R based

l.105-110: move the first sentence to the end or to a footnote (from a detailed description …).

l.191. “automatically interpolates all the data onto a common grid”: What is the advantage of CST_Load (turning NetCDF files into an s2dv_cube) over the traditional ncdf4 package, which loads NetCDF objects directly? In general, the key advantages of the package over others could be better exposed in the paper rather than in the data description vignettes (“Some benefits of using this function are”). In addition, instead of CDO, would it be possible to use internal R functions such as rasterize (package raster)?
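To make the referee's question concrete: regridding onto a common grid means estimating each dataset's values at the points of one shared target grid, so that forecasts and observations can be compared point by point. The Python sketch below uses bilinear interpolation on a regular latitude-longitude grid as one common choice; it is not claimed to be the scheme CST_Load applies (the referee notes that CDO is involved in this step), and the function names are hypothetical. It ignores longitude wrap-around and the poles, and conservative remapping is often preferred for fluxes such as precipitation.

```python
from bisect import bisect_right


def bilinear(src_lats, src_lons, field, lat, lon):
    """Bilinearly interpolate field[i][j], defined at (src_lats[i], src_lons[j]),
    to the point (lat, lon).

    Coordinates must be strictly increasing and the target point must lie
    inside the source grid (no extrapolation, no longitude wrap-around).
    """
    # index of the grid cell containing the point, clamped to a valid cell
    i = min(max(bisect_right(src_lats, lat) - 1, 0), len(src_lats) - 2)
    j = min(max(bisect_right(src_lons, lon) - 1, 0), len(src_lons) - 2)
    y0, y1 = src_lats[i], src_lats[i + 1]
    x0, x1 = src_lons[j], src_lons[j + 1]
    ty = (lat - y0) / (y1 - y0)  # fractional position inside the cell
    tx = (lon - x0) / (x1 - x0)
    return ((1 - ty) * (1 - tx) * field[i][j]
            + (1 - ty) * tx * field[i][j + 1]
            + ty * (1 - tx) * field[i + 1][j]
            + ty * tx * field[i + 1][j + 1])


def regrid(src_lats, src_lons, field, dst_lats, dst_lons):
    """Interpolate a whole 2-D field onto a target (common) grid."""
    return [[bilinear(src_lats, src_lons, field, la, lo) for lo in dst_lons]
            for la in dst_lats]
```

Because bilinear interpolation reproduces any field that varies linearly in latitude and longitude exactly, a linear test field is a quick sanity check for a regridding routine.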

l.196: These sentences: “Although datasets can be retrieved from OPeNDAP URLs with NetCDF files, in general, the datasets have to be downloaded onto a local repository and formatted to comply with the CST_Load requirements. Observational reference datasets are stored in a folder in separate monthly NetCDF files (other formats are also possible; see https://earth.bsc.es/gitlab/es/s2dverification/-/blob/master/vignettes/data_retrieval.md for more information), while seasonal forecasts are stored by start date in distinct folders (see https://cran.r-project.org/web/packages/CSTools/vignettes/Data_Considerations.html). A python code to download and format the seasonal forecast datasets from the CDS is provided in the repository CDS Seasonal Downloader (https://earth.bsc.es/gitlab/es/cds-seasonal-downloader).” should be clarified in the paper.

l.237. “k-mean”: how is k determined? Is it optimised or user-specified (parametrized)?
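One way to frame the referee's question: k-means partitions the data (for weather regimes, typically anomaly fields or their leading principal components) into k clusters by minimising the within-cluster sum of squares (WCSS), so k itself has to be chosen beforehand. A common heuristic is to compute the WCSS for several values of k and look for an "elbow" beyond which extra clusters barely reduce it. The Python sketch below (hypothetical names, toy 2-D points) illustrates that heuristic with a deterministic farthest-point initialisation; it is not a description of how CSTools selects k.

```python
def _dist2(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _assign(points, centres):
    """Index of the nearest centre for every point (ties go to the lower index)."""
    return [min(range(len(centres)), key=lambda c: _dist2(p, centres[c]))
            for p in points]


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-point initialisation.

    points: list of equal-length tuples. Returns (centres, labels, wcss).
    """
    centres = [points[0]]
    while len(centres) < k:
        # add the point that is farthest from all centres chosen so far
        centres.append(max(points, key=lambda p: min(_dist2(p, c) for c in centres)))
    for _ in range(n_iter):
        labels = _assign(points, centres)
        new_centres = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # keep the previous centre if a cluster ends up empty
            new_centres.append(tuple(sum(v) / len(members) for v in zip(*members))
                               if members else centres[c])
        if new_centres == centres:  # assignments have stabilised
            break
        centres = new_centres
    labels = _assign(points, centres)
    wcss = sum(_dist2(p, centres[lab]) for p, lab in zip(points, labels))
    return centres, labels, wcss


def wcss_curve(points, k_values):
    """WCSS for each candidate k, to be inspected for an 'elbow'."""
    return {k: kmeans(points, k)[2] for k in k_values}
```

On three well-separated groups of points the WCSS drops sharply up to k = 3 and only marginally afterwards. With real atmospheric fields the elbow is rarely this clear, which is one reason why k is often fixed from physical considerations instead.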

l.259: Why five methods? Can all downscaling methods be used regardless of the climate variable considered? For instance, if a method was developed for surface (10 m) wind (e.g. Torralba, 2017), can it be applied to humidity or sea-level pressure? If not, the authors could list the best-suited input for each method.

l.266: Please specify the applications of each pattern used for analog downscaling. What are the main differences, and which should be used in which situation? Or is it recommended to use all three and minimize the error?

l.277 and 285: maybe recall the minimal mathematical expression of the effect of orography on downscaling (Terzago et al. 2018)?

l.291. “CST_AnalogsPredictors function downscales precipitation or maximum/minimum temperature low resolution forecast output data, in a domain centred over Iberian Peninsula”: does the function CST_AnalogsPredictors work over Spain only?

l.292: in a domain centered over Iberian Peninsula

l.309: better explain (1) the calibration methods (evmos, mse_min, crps_min, rpc-based) and (2) the variables/conditions on which the choice of method should be based.
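As a baseline for the explanation the referee requests, the simplest form of statistical adjustment rescales forecasts so that the hindcast mean and standard deviation match those of the observations over the same period. The Python sketch below shows only this basic mean-and-variance adjustment. It is not one of the four calibration methods listed above, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def mean_variance_adjust(hindcast, observations, forecast):
    """Rescale forecast values so that the hindcast climatology's mean and
    standard deviation match those of the observations.

    hindcast, observations: values over a common reference period.
    forecast: values to adjust (e.g. members of a new forecast).
    """
    mu_h, sd_h = mean(hindcast), pstdev(hindcast)
    mu_o, sd_o = mean(observations), pstdev(observations)
    if sd_h == 0:
        raise ValueError("hindcast has zero spread; cannot rescale")
    # standardise with hindcast statistics, then restore observed statistics
    return [(x - mu_h) * sd_o / sd_h + mu_o for x in forecast]
```

When the adjustment is evaluated on the hindcast itself, the statistics would normally be computed leaving out the year being corrected, to avoid artificially inflating the apparent skill.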

Visualisation: maybe insert some “visualization” (i.e. the output of each function described, so we know what it does, even if it is shown in the next section)?

l.399: “Oops, an error has occurred. 404: The page you are trying to access apparently does not exist or has been removed from our website”

l.445, code box: please add a link to where the data file referred to in the code can be retrieved, to improve reproducibility.

Figure 3: isn’t the density shape giving a somewhat misguiding idea of the underlying distribution (“smoother” than it is)? Apart from that this figure is very nice.

SNOWPACK inputs: first introduced at line 550, while the inputs of the models are introduced at line 714 (consider restructuring).

l.652-663: very important issue in climate data manipulation: the size of the data. I think a full subsection could be dedicated to this topic in the section 2, and then simply referred to in the case study section where we want to focus on the application side (and not the technical issue).
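The arithmetic behind the size issue the referee raises is simple: a dense array needs the product of its dimension lengths times the number of bytes per value. The Python sketch below estimates this, and the number of pieces needed if the array is processed one block of start dates at a time. The dimension names and sizes are purely illustrative assumptions, not the configuration of the use case, and the chunking logic only mimics the general idea behind chunked or distributed tools such as startR, not their API.

```python
from math import ceil, prod


def array_nbytes(dims, bytes_per_value=8):
    """Memory needed to hold a dense array with the given dimension sizes
    (8 bytes per value for double precision, 4 for single precision)."""
    return prod(dims.values()) * bytes_per_value


def n_chunks(dims, split_dim, budget_bytes, bytes_per_value=8):
    """Smallest number of pieces along split_dim such that each piece fits
    within budget_bytes of memory."""
    per_slice = array_nbytes(dims, bytes_per_value) // dims[split_dim]
    if per_slice > budget_bytes:
        raise ValueError(f"a single {split_dim} slice exceeds the memory budget")
    slices_per_chunk = budget_bytes // per_slice
    return ceil(dims[split_dim] / slices_per_chunk)


# Illustrative (hypothetical) daily seasonal hindcast dimensions
dims = {"member": 25, "sdate": 21, "ftime": 215, "lat": 100, "lon": 100}
```

With these illustrative sizes the full array needs about 8.4 GiB in double precision (half that in single precision), so with a 2 GiB memory budget it has to be processed in six pieces along the start-date dimension.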
Citation: https://doi.org/10.5194/gmd-2021-368-RC1
- AC1: 'Reply on RC1', Núria Pérez-Zanón, 01 Apr 2022
  
  Dear Referee,
  Thanks for your detailed review of our manuscript; we really appreciate the precious time it takes. We are glad to read the overview you shared, which emphasizes the relevance of our contribution and highlights the complexity of our purpose.
  In the attached document, we provide a detailed answer to each of your comments.
  Kind regards,
  Núria Pérez-Zanón
  On behalf of all manuscript authors
  
  Citation: https://doi.org/10.5194/gmd-2021-368-AC1
RC2:
'Comment on gmd-2021-368', Anonymous Referee #2, 10 Jan 2022

The paper of Pérez-Zanón et al. presents a flexible toolbox supporting stakeholders in multiple sectors to correctly manage climate forecasts from seasonal to multi-annual scales. The developed toolbox provides several functions that allow relatively straightforward extraction of useful and concise information from large datasets (I particularly liked Fig. 4) that, indeed, can be difficult to handle even for experts. The range of functions is large enough to allow quite elaborate reanalyses. I successfully tested the examples available on the CRAN repository, which were relatively easy to run.

In summary, I believe this tool, even though it does not provide any new modelling option (which was not the aim of the authors), is a valuable contribution and has good potential to impact several sectoral applications. Nevertheless, I believe more details must be provided to make it truly accessible to the wider public (i.e. stakeholders not particularly expert in forecasting issues) and, in some cases, even to experts. In particular, I refer to the data retrieval and formatting section, which could be very “labour-intensive”, as the authors themselves state. All the examples provided use either links to static paths (in the paper) or already pre-processed input data (vignettes). I suggest the authors go into more detail on this and provide at least one example starting from raw data.

Still concerning input data, another common feature of the examples offered is that they seem to rely only on global/large scale gridded datasets. In my experience, I’ve learned that such datasets often don’t fit adequately ground observations for specific regions. If the monitoring network (e.g. rain gauges) is dense enough, it can be used in turn to prepare one’s own high-resolution (let’s say) dataset. It’s not clear to me if/how such datasets can be included, for example for correction or validation purposes.

Another comment concerns the structure of the three use cases provided. I suggest describing them more homogeneously and streamlining them. In my opinion, the third use case receives less attention than it deserves.

Finally, I suggest organizing better (in a more straightforward way) the connection between functions developed and corresponding literature references, to support the user in going into details with the theoretical aspects behind them. Maybe, some synoptic tables (even as an appendix), in addition to existing text, could help.

Below I provide some specific comments (and highlight some typos). I recommend a careful re-reading of the manuscript. I hope my review helps improve the overall quality of the manuscript and makes the interesting toolbox developed more accessible.

L66: as illustrated in Fig. 1

L100: R-based

L104: please check this sentence

L130: maybe “each function”?

L191: to automatically interpolate

L193: lotlan_data for temperature? Please check

L197: downloaded into (or simply “in”)

L244: “The amount of categories can be changed and are taken as…” please check this sentence. To which subject is the verb “are” referred? To the categories?
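For context on the sentence the referee quotes: categorical forecast probabilities are usually obtained by splitting the climatological distribution at quantiles (terciles when there are three categories) and counting the fraction of ensemble members falling into each category. The Python sketch below assumes that setup; the names are hypothetical, the quantile rule is linear interpolation (the default in R's quantile(), type 7), and members lying exactly on a threshold are counted in the lower category.

```python
def quantile(sorted_vals, q):
    """Quantile of already-sorted values by linear interpolation
    (R's default quantile type 7)."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def category_probabilities(ensemble, climatology, n_categories=3):
    """Thresholds from the climatology and the fraction of ensemble members
    in each of n_categories equiprobable categories."""
    clim = sorted(climatology)
    thresholds = [quantile(clim, i / n_categories) for i in range(1, n_categories)]
    counts = [0] * n_categories
    for x in ensemble:
        # category index = number of thresholds the member exceeds
        counts[sum(x > t for t in thresholds)] += 1
    return thresholds, [c / len(ensemble) for c in counts]
```

Changing n_categories changes the number of thresholds, so the same code yields terciles, quintiles, and so on.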

L291: not clear: is this function available only for the Iberian Peninsula? Will it be available for other areas in the future?

L301: not clear: here, too, is this function available only for NAO?

L375: A comparison … IS also possible

L386: three example case studies

L399: the link does not work. However, I would prefer some more technical link than that to a newspaper

L401: I guess IP stands for Iberian Peninsula. But this term is used only a few words before, so please check the sentence and rephrase.

L453: by?

L503: “only one member”: it’s better to tell how many members make up the ensemble

L509: please explain what “ensemble dressing” means.
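As background to the referee's request: ensemble dressing generally turns a finite set of ensemble members into a continuous predictive distribution by placing a kernel (often Gaussian) around each member and averaging the kernels. The Python sketch below assumes Gaussian kernels of a fixed, user-supplied width sigma; how the width is chosen in the manuscript's use case is not stated here, and the function names are hypothetical.

```python
import math


def dressed_cdf(members, sigma, x):
    """P(X <= x) under Gaussian kernel dressing: each member m is replaced by
    a normal distribution N(m, sigma**2) and the kernels are averaged."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    total = sum(0.5 * (1 + math.erf((x - m) / (sigma * math.sqrt(2)))) for m in members)
    return total / len(members)


def dressed_pdf(members, sigma, x):
    """Probability density of the dressed ensemble at x."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    norm = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return sum(norm * math.exp(-0.5 * ((x - m) / sigma) ** 2) for m in members) / len(members)
```

Unlike the raw ensemble, whose empirical distribution only changes at the member values, the dressed distribution assigns non-zero probability to values outside the ensemble range, which matters for extremes.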

L545: I would write “agriculture and industry, while meltwater shortage …”

L597: “the result is” (better) or “the results are”

Figure 6a: I guess this map shows one of the 25 possible precipitation fields for 11 December 1993 given by the SEAS5 ensemble

L719 (and elsewhere): please check throughout the text for shifts in tense (from present to past and vice versa)

LL719-720: not clear if these operations were made through CSTools (please refer also to main comments)

L723: “the SNOWPACK model is run for each of the 21 seasonal forecasts over the hindcast period 1996-2016”. Only here the objective of the use case is clearly stated. I suggest declaring it at the beginning of the section.

L741: again, for what period? State clearly the objectives of the exercise at the beginning of the section.

L794: at the end of this section, I realize that the fact that the SCHEME hydrological model is used is not so relevant, after all. The case study could be generalized to any (semi-distributed or even distributed) hydrological model requiring precipitation and temperature forecasts.

L804: “(see e.g. Fig. 4)”: I would remove this text in brackets.

L813: also, agricultural issues are involved (drought, irrigation needs, water resources management, etc.)

L815: what about the other features? I think this sentence underestimates other aspects of the tool. Please explain in more detail.

Please note: Appendices A and B are not referred to in the main text. They should be referenced and contextualized.

Citation: https://doi.org/10.5194/gmd-2021-368-RC2
- AC2: 'Reply on RC2', Núria Pérez-Zanón, 01 Apr 2022
  
  Dear Referee,
  Thanks for your comments on our manuscript. We really appreciate them and also that you took the time to run the examples available on CRAN. We are glad to read that you consider it a valuable contribution.
  As deserved, in the attached document, we provide a detailed answer to each of your comments.
  Kind regards,
  Núria Pérez-Zanón
  On behalf of all manuscript authors
  
  Citation: https://doi.org/10.5194/gmd-2021-368-AC2

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

AR by Núria Pérez-Zanón on behalf of the Authors (08 Apr 2022) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (16 Apr 2022) by Jinkyu Hong

RR by Anonymous Referee #2 (30 Apr 2022)

RR by Matteo De Felice (12 May 2022)

Suggestions for revision or reasons for rejection

This paper describes a software package that is impressive in terms of features. Given the complexity and the number of features provided by the package, the paper is rather long and hard to follow. In general, I would suggest that the authors shorten the paper, possibly by:
- Reducing section 2.1, especially the first half
- Keeping only one use case and moving the other two to the supplementary

Here, some additional comments:
- The authors mention EUROSIP in the first paragraph, but it has been taken over by the C3S multi-model system (see https://www.ecmwf.int/en/about/media-centre/news/2019/c3s-multi-model-seasonal-forecasting-system-takes-over-eurosip). I would suggest mentioning the latter.
- After the list of the categories at the beginning of Section 1.2, I would specify which ones are covered by CSTools
- The authors write "CSTools, on the other hand, targets scientists interested in providing a climate product to some final users" but this sentence is unclear, what are exactly the differences between CSTools and other tools that make it more suitable for "final users"?
- The authors state that CSTools is compatible with other packages, this point should be better explained.
- In Line 192 some 'common guidelines' are mentioned: does it mean that they published specifications/requirements to follow to implement - for example - another input data format or another post-processing method?
- What exactly is the repository of CSTools? The repository at https://earth.bsc.es/gitlab/external/cstools says that it is the repository of the MEDSCOPE project.
- I would add a paragraph in Section 2.1 giving more details on the possibility of performing lazy and distributed calculations using startR, which is a very important topic for climate scientists.
- The fourth column in the Table 1 is a bit confusing, what is its meaning? For 's2dverification' I assume that its functions are directly called into CSTools code, but what about the rows with empty values ('-')? What's the difference between '-' and 'adaptation to CSTools'?
- Are the specifications of s2dv_cube described? Would it be possible for someone to create a function generating s2dv_cube objects that can be used directly in CSTools?
- I would suggest removing the 'single but powerful' at the beginning of Line 280, I understand the enthusiasm but it seems a bit an exaggeration.
- Does the function CST_MultiEOFs deal with ensembles (i.e. working directly with the members without using the ensemble mean)? How?

In general, I think the paper is relevant and well done, but before publication I think it should be simplified to improve its readability, especially for non-climate scientists.

ED: Publish subject to minor revisions (review by editor) (15 May 2022) by Jinkyu Hong

AR by Núria Pérez-Zanón on behalf of the Authors (24 May 2022) Author's response Author's tracked changes Manuscript

ED: Publish subject to minor revisions (review by editor) (29 May 2022) by Jinkyu Hong

AR by Núria Pérez-Zanón on behalf of the Authors (01 Jun 2022) Author's response Author's tracked changes Manuscript

ED: Publish as is (19 Jun 2022) by Jinkyu Hong

AR by Núria Pérez-Zanón on behalf of the Authors (29 Jun 2022) Manuscript

Short summary

CSTools (short for Climate Service Tools) is an R package that contains process-based methods for climate forecast calibration, bias correction, statistical and stochastic downscaling, optimal forecast combination, and multivariate verification, as well as basic and advanced tools to obtain tailored products. In addition to describing the structure and methods in the package, we also present three use cases to illustrate the seasonal climate forecast post-processing for specific purposes.