Review of “Current status on the need for improved accessibility to climate change models code” by Juan A. Añel and colleagues.
This is my second review of the manuscript by Añel et al. As already mentioned in my original review, I find the topic of the manuscript to be highly relevant for the scientific community in general and for the climate model community in particular. Personally, I still think that some of the rather strong statements about the need to make code publicly available for everyone under any circumstances could be framed differently and discussed more carefully, as suggested in my original review, but this is up to the authors. Apart from that, I found several statements that still need to be clarified, as detailed below.
Title: I maintain that CMIP5 models are not “specifically intended” to investigate climate change only; they are also used to study many other aspects of the climate system, such as feedbacks, uncertainties, etc., even in the absence of climate change (e.g., using the piControl runs). But if the authors are convinced that “climate change models” is more accurate in the title, this is fine.
Page 1, Line 4 & p2, l20: “Climate Model Intercomparison Project (CMIP5)”. CMIP5 actually stands for “Coupled Model Intercomparison Project” (see https://esgf-node.llnl.gov/projects/cmip5/).
p1, l13-15: For readers (and reviewers) who are not experts in the field of code reproducibility, it would be helpful to have a short introduction to what CSR means here (and in general) and what makes it so complex. I assume it is not enough to just put code in some repository? And where does it end? E.g., climate model output is (to my understanding) not bit-by-bit reproducible; what does that mean in this context?
P2, l23-26: I’ve raised some of these points in my first round of comments already, and I’m raising them again here because I think it is crucial to be precise with such rather controversial statements. Without having done any study myself, my intuition is to agree with the authors that in several (many) instances climate model code is probably not following an “ideal level of programming practice”.
BUT: This must not be generalized as the authors do here. I’m equally convinced that there are climate models out there that can serve as examples of best practice! Therefore, I do not think the word “generally” should be used here.
Several things the authors should be precise about:
- not all climate models are in CMIP (think of a simple 1D energy balance model for educational purposes, which might be perfectly coded, documented and licensed). I assume the authors do not refer to such models here, even though they use the generic term “climate model”?
- “ideal level of programming practice” is fairly abstract. The authors introduced the CSR earlier, so why not continue to use it (here and in other instances)? For example, if a model is not published, how would the authors know whether it follows coding and documentation standards? Or is code publication part of the programming practice, so that any unpublished model automatically does not follow such practice?
- “García-Rodríguez et al. (2020) show how programmers have tended to perform very poorly in this regard in particular, and the incidence of comments throughout the code of CMIP5 models is very low.” I’m sorry, but I’m not able to find any paper under this citation, merely a software tool. Am I missing something, or is this what the authors want me to look at? In any case, I argue it is impossible to state that “the incidence of comments throughout the code of CMIP5 models is very low”, as the code is not available for all CMIP5 models (based on the results of this very study!).
P3, l9: Maybe the authors can already refer to Table 1 here, as it also lists all models involved?
P4, l3-6: I believe the contact information field is available in all NetCDF files for all models, not only for five models, as it was required by CMIP5. I would therefore argue that this might even be the preferred place to look.
This is a minor point but if the reason for publishing code is scientific reproducibility, it seems not unreasonable to me to require knowledge of the NetCDF standard from someone who is trying to run a climate model (which arguably requires way more expertise than opening a binary file).
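To illustrate how low this barrier is, here is a minimal sketch in Python. It assumes the third-party netCDF4 package and assumes that the contact information is stored as a global attribute named `contact`; both the attribute name and the helper names are my assumptions and should be checked against the actual files.

```python
def read_global_attributes(path):
    """Return all global attributes of a NetCDF file as a dict."""
    # Imported lazily so that find_contact() works without the package.
    from netCDF4 import Dataset  # third-party: pip install netCDF4

    with Dataset(path, "r") as ds:
        return {name: ds.getncattr(name) for name in ds.ncattrs()}


def find_contact(attributes, key="contact"):
    """Return the stripped contact string, or None if absent or empty.

    The attribute name "contact" is an assumption, not taken from the manuscript.
    """
    value = attributes.get(key)
    if value is None:
        return None
    text = str(value).strip()
    return text or None


# Hypothetical usage (the file name is a placeholder):
# print(find_contact(read_global_attributes("some_cmip5_output.nc")))
```

The same information could be obtained with any other NetCDF header viewer; the point is only that the lookup is a matter of a few lines.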
Do I understand correctly that the five models mentioned were indeed NOT contacted in the end?
P4, l14: “This analysis is relevant.” Delete this sentence?
P3, l25-, Table 1, and Appendix A
I’ve mentioned this in my last review already, and the references to the different mails are still unclear to me. Here is an attempted summary:
- Mail 1 & 2: “anonymous” requests; given in A1
- Mail 3: request explaining the research; this seems to be missing, as A2, which is labelled “Third mail”, seems to be something else
- Mail 4: this is the survey sent if access was denied (but presumably not if there was no answer?); I assume this is the “Third email” in A2, even though the text in A2 is oddly specific and seems to be taken from a longer exchange?
In addition, Table 1 still lists only Mail 1 & 2; it is unclear which mails that refers to.