The computational and energy cost of simulation and storage for climate science: lessons from CMIP6
Stella V. Paronuzzi Ticco
Gladys Utrera
Joachim Biercamp
Pierre-Antoine Bretonniere
Reinhard Budich
Miguel Castrillo
Arnaud Caubel
Francisco Doblas-Reyes
Italo Epicoco
Uwe Fladrich
Sylvie Joussaume
Alok Kumar Gupta
Bryan Lawrence
Philippe Le Sager
Grenville Lister
Marie-Pierre Moine
Jean-Christophe Rioual
Sophie Valcke
Niki Zadeh
Venkatramani Balaji
Download
- Final revised paper (published on 19 Apr 2024)
- Preprint (discussion started on 23 Oct 2023)
Interactive discussion
Status: closed
- RC1: 'Comment on gmd-2023-188', Anonymous Referee #1, 17 Nov 2023
Nov. 14, 2023
Review for Geoscientific Model Development
“The computational and energy cost of simulation and storage for climate science: lessons from CMIP6”
By Acosta et al.
Recommend: accept subject to revision
This is an interesting and perhaps unique accounting of a subset of the models that participated in CMIP6 with regard to model characteristics related to computing and carbon footprint details. My major comment is that the authors need to be clear that the models described in the paper are, in fact, a subset of the total number of models that participated in CMIP6. This subset is apparently the group of models that participated in the IS-ENES3 project. This needs to be made clear in the abstract and elsewhere.
Detailed comments:
List of authors: One of the coauthor’s names is in the wrong order and misspelled: “Joussame Sylvie” should be “Sylvie Joussaume”
Line 4: Here is an example of where the authors need to clarify the scope of the paper. I’d suggest the wording be changed to,
“This paper shows the main results obtained from the collection of performance metrics from 30 models that participated in the IS-ENES3 and represent a subset of the total of 124 CMIP6 models. The document provides…”
HOWEVER, it’s unclear exactly how many models are actually involved. I got the number 30 from line 42, but there are 32 models listed in Table 2, and 33 listed in Table 3.
Line 41-42: Once again, the authors need to be clear how their models relate to the larger CMIP6 set of models. I recommend the following wording: “In this paper, we present in Sec. 2 the collection of CPMIP metrics from the 30 [or 32, or 33] models that participated in the IS-ENES3 project (Joussaume, 2010), out of the total of 124 CMIP6 models, and were used to simulate almost 500,000 years…”
Table 2 caption: Pursuant to the comments above, the authors need to be clear how the models they list in this table relate to the total number of CMIP6 models. This would avoid a scientist reading this paper, looking at this table, not seeing their model, and wondering why it isn’t in the table. I’d recommend the following wording: “List of institutions and the models that provided metrics from their CMIP6 models to IS-ENES3, which represents a subset of the total of 124 models in CMIP6. Also listed are the HPC platform and the resolution used for the ATM and OCN components.” Now, proceed to the comment below for a suggestion on the rest of the caption.
Table 2: In Table 1 the authors define “resol” as number of gridpoints, a singularly unhelpful metric when comparing models. Fortunately, here in Table 2 they relate that metric to the more conventional lat-lon resolution. However, I think in the table caption they should note something like, “Note “resol” in Table 1 is number of gridpoints. Here we define “ATM resol” as the horizontal resolution of the atmospheric model component in degrees of latitude and longitude, and “OCN resol” as the horizontal resolution of the ocean model component in degrees of latitude and longitude”.
Line 97: Following the comment above, please add clarification here, something like, “…the categorization of low, medium and high resolutions in terms of latitude-longitude grid spacing. Thus, for the grouping…”
Line 99: And again here, I recommend clarifying the wording as follows: “…as low resolution with roughly 1 degree latitude-longitude grid spacing and up to …”
Line 115: Please define “complexity”. Does this mean number of components (e.g. atmosphere, ocean, sea ice, land ice, biogeochemistry, ocean ecosystem, cloud-aerosol interaction, etc.), number of parameterizations per component, or what? This needs to be defined up front in order to make sense of the subsequent discussion.
Citation: https://doi.org/10.5194/gmd-2023-188-RC1
- AC1: 'Reply on RC1', Sergi Palomas, 06 Feb 2024
- RC2: 'Comment on gmd-2023-188', Anonymous Referee #2, 05 Jan 2024
## General comments
Acosta et al. present the results from the Computation Performance for Model Intercomparison Project (CPMIP). I believe this is the first time CPMIP has been run (but the relationship between CPMIP and other exercises isn't fully clear to me, see comments below). They demonstrate that there is a large range in computing requirements across modelling centres. They also include some nice analysis of what drives these differences in computing performance and the challenges associated with collecting their data. I think the paper provides some very interesting results that could be very helpful for the community. I also congratulate the authors on pulling together a paper like this (the research software engineer in our group has explicitly told us that they would never lead the writing of a paper so this effort to work in manuscript-land rather than the daily work of computing-land must be congratulated). However, the paper's key point is currently quite unclear to me and I think some areas are under-explored while others are given more time/space than is needed. I elaborate on my major and minor concerns below and recommend that the paper undergoes major revisions. As I said though, I congratulate the authors on the effort they have put in to capture and share this information and hope that they have the energy to revise the paper and re-submit as it would be great to see it published in a revised form.
Before continuing, it should also be noted that I am not an HPC expert. I do a lot of analysis with ESM output but do not run the models myself. As a result, it is also possible that most of my concerns arise simply because I am not the intended audience and, as such, I don't have enough background to meaningfully engage with the manuscript.
## Major concerns
### Point of the paper
At present, the paper presents a collection of results. However, its key conclusion was not immediately clear to me.
For example, it isn't clear to me whether the computing performance of climate modelling centres is world leading, or whether there is much better computing performance seen in other fields and climate science is struggling to capture it for whatever reason (e.g. lack of funding for software/hardware/people, lack of expertise, lack of time for performance tuning). Put another way, are there lots of easy wins out there or are we already at the limit of computing power and improving from here will be incredibly difficult? Or does the data not allow us to have any insights into this question?
This is particularly true in the quantitative sense: there are lots of numbers in the paper, but I have no idea whether those numbers are a demonstration of excellence or a demonstration that a lot of performance is being left behind for whatever reason (although I must say that I have come away from the paper with a much better sense of what qualitatively drives computing performance and the trade-offs). I think this would be greatly helped if the authors were to include representative benchmarks next to each of their metrics (where possible, acknowledging that many of them are quite specialised to climate science, hence comparisons from outside the field may not be possible). That would help the reader know, e.g., whether a memory bloat of 100 is reasonable in comparison to other computing programs or whether this much bloat is extraordinarily high.
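For concreteness, a minimal sketch of the kind of ratio I have in mind, assuming a CPMIP-style definition of memory bloat as actual memory divided by ideal memory (the size of the model state); every number below is a hypothetical placeholder, not a value from the paper.

```python
# Minimal sketch of a memory-bloat calculation, assuming the CPMIP-style
# definition bloat = actual memory / ideal memory (size of the model state).
# All numbers below are hypothetical placeholders for illustration only.

def ideal_memory_bytes(gridpoints, prognostic_vars, bytes_per_value=8):
    """Ideal memory: one double-precision value per gridpoint per prognostic variable."""
    return gridpoints * prognostic_vars * bytes_per_value

gridpoints = 1_000_000        # hypothetical grid size
prognostic_vars = 50          # hypothetical number of prognostic variables
actual_memory_bytes = 40e9    # hypothetical measured resident memory of the run

bloat = actual_memory_bytes / ideal_memory_bytes(gridpoints, prognostic_vars)
print(f"memory bloat ≈ {bloat:.0f}x")  # ~100x with these placeholder numbers
```

A side-by-side column of such ratios for a few well-understood non-climate codes would be exactly the kind of benchmark context I am asking for.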
I recommend the authors think about the 1, 2 or 3 key points they want to get across in the paper, add those key conclusions to the abstract and then ensure that those key points come out clearly throughout the text. (Even if the answer is, the metric collection was so difficult and so variable across groups that we really can't make any strong conclusions about performance, that is still an answer that would be good to understand. Such an answer would then make clear that the key conclusion is that we need to get much better at metric collection before we can really identify where the next performance gains can come from.)
### Which project is being discussed
It was quite unclear to me which project exactly is being presented here. The authors introduce the idea of CPMIP. However, IS-ENES3 is also introduced and it wasn't clear to me what the relationship between the two, if any, is. Similarly, the authors mention that this presents results for European groups (page 16, line 253). Are there also results for non-European groups, or are these not included because such groups aren't part of IS-ENES3?
Related other questions on this topic:
- Was IS-ENES3 responsible for ES-doc too? How does IS-ENES3 relate to the wider ESGF/CMIP effort?
- Is IS-ENES3 responsible for CMIP6 model result publishing? Or is that the remit of wider ESGF/CMIP efforts (or is IS-ENES3 the team that actually does the publishing within the wider ESGF/CMIP banner)?
- page 3, line 58: "As the reader can see, not all institutions provided the full collection of CPMIPs." Does this mean, "As the reader can see, not all institutions provided the full collection of CPMIP metrics."? An institution can't provide CPMIP, it can only participate in CPMIP, no?

### Uncertainty in measurements
Computational benchmarking is notoriously difficult (results vary based on a whole bunch of factors which can be extremely difficult to control for). At the moment, there is no indication of the uncertainty on these measurements at all. I know that doing this with high precision is probably impossible. However, even a rough sense of the order of magnitude of the uncertainty would be extremely helpful. For example, do you think that the modelling centres' reported results come with an uncertainty of e.g. +/- 1%, 10%, 100%, more? Even this rough indication would help the reader understand what they're looking at and how certain we are in the measurement of different quantities (e.g. I suspect we are much more certain about resolution than we are about coupling cost) and how obvious the difference between modelling centres really is.
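As an illustration of what even a rough uncertainty statement could look like, a minimal sketch that summarises repeated timing runs as a mean and relative spread; the SYPD values are hypothetical placeholders, not measurements from any centre.

```python
# Minimal sketch: summarise repeated performance measurements (e.g. SYPD)
# as mean +/- standard deviation and a relative uncertainty in percent.
# The values below are hypothetical placeholders, not real measurements.
import statistics

sypd_runs = [2.9, 3.1, 3.0, 2.8, 3.2]  # hypothetical repeats of the same configuration

mean = statistics.mean(sypd_runs)
std = statistics.stdev(sypd_runs)
print(f"SYPD = {mean:.2f} +/- {std:.2f}  ({100 * std / mean:.0f}% relative uncertainty)")
```

Even a single such "+/- x%" per metric, stated once in the methods, would tell the reader how seriously to take differences between centres.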
## Minor concerns
### Presentation
The writing is generally pretty good, but there are definitely some rough patches which I think could be easily improved with another close read from the authors. I think the figures could be generally made clearer and more appealing as their key point doesn't really jump out at present (even using a package like seaborn in Python would provide an instant boost at almost zero time cost). I would also say that I found the use of extensive abbreviations extremely distracting and that the abbreviations made comprehending the manuscript significantly harder. I know that the authors are probably used to abbreviating things in code, but there aren't character restrictions in the manuscript format so I think just spelling things out is often the better option as it makes life for the reader much simpler (they can just read the text, they don't have to remember what 15 different abbreviations mean). If the abbreviations are to be kept, I think they need to be repeated (at minimum, referred to) in table captions so the reader has them available while reading the tables (or at least knows where to look to decode the abbreviations).
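To make the seaborn suggestion concrete, a minimal sketch of the kind of plot I mean; the model names, SYPD values and resolution labels are invented placeholders, not results from the paper.

```python
# Minimal sketch of the kind of "instant boost" plot meant above, using seaborn.
# Model names, SYPD values and resolution labels are invented placeholders.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "model": ["Model A", "Model B", "Model C", "Model D"],
    "SYPD": [2.1, 5.4, 1.3, 3.8],
    "resolution": ["low", "low", "high", "medium"],
})

sns.set_theme(style="whitegrid")
ax = sns.barplot(data=df, x="model", y="SYPD", hue="resolution")
ax.set_ylabel("Simulated years per day (SYPD)")
plt.tight_layout()
plt.savefig("sypd_by_model.png", dpi=200)
```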
### Carbon footprint discussion
Considering the climate cost of running these climate models is a good thing to do. However, I do think the current discussion is a bit one-sided, for two reasons.

Firstly, the carbon comparisons presented aren't that helpful in my opinion (driving a car non-stop for a year is hard to imagine). I think much more helpful comparisons would be to, e.g., the carbon associated with all the humans involved in CMIP. For example, how much carbon was required for the various meetings associated with CMIP over the years (or, perhaps a better example, the carbon associated with travel to the latest UNFCCC COP), and how does the computing carbon compare to this? My rough back of the envelope suggested the carbon from air travel for CMIP is probably a similar order of magnitude, if not an order of magnitude more, than the carbon required for the computing; for COP, I have no idea, but I would guess it is significantly more. (For this comparison, I think it's also worth noting that zero-emissions electricity is a well-understood technology, whereas zero-emissions travel is still very nascent.) (A different, perhaps more amusing and directly comparable, comparison point would be to compare the computational cost of CMIP's computing with the computational cost of running e.g. YouTube, Google, Netflix or Twitch.)

Secondly, only presenting the costs without considering the benefits at all is one-sided. The benefits of CMIP are huge and shouldn't be ignored in any such conversation (particularly when the carbon cost of the computing is, in the scheme of things, relatively small, both in absolute terms and perhaps as a fraction of the wider CMIP effort).
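To show the shape (not the numbers) of the back-of-the-envelope comparison being suggested, a minimal sketch with deliberately hypothetical inputs; every figure below is a placeholder to be replaced with real values before drawing any conclusion.

```python
# Minimal sketch of the back-of-the-envelope comparison suggested above.
# Every input below is a hypothetical placeholder, not data from the paper:
# replace with real figures before drawing any conclusion.

core_hours = 1.0e9            # hypothetical total compute for an exercise
kwh_per_core_hour = 0.005     # hypothetical energy draw per core-hour
kg_co2_per_kwh = 0.3          # hypothetical grid carbon intensity

compute_t_co2 = core_hours * kwh_per_core_hour * kg_co2_per_kwh / 1000.0

flights = 2000                # hypothetical number of long-haul return flights
kg_co2_per_flight = 2000.0    # hypothetical emissions per long-haul return flight

travel_t_co2 = flights * kg_co2_per_flight / 1000.0

print(f"computing:  ~{compute_t_co2:,.0f} t CO2 (placeholder inputs)")
print(f"air travel: ~{travel_t_co2:,.0f} t CO2 (placeholder inputs)")
```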
### Journal scope
This paper is much more about computing performance than model development. I know that finding journals for exactly this topic is tricky, but my feeling reading this was that this paper may be better suited to a dedicated computing journal rather than a journal on Physics (obviously this is ultimately up to the editors though).
## Other questions/comments
- Introduction: I think this could be re-formulated to focus more clearly on computing performance. There is some time spent on explaining the value of climate modelling more generally but I don't think that's really a point you have to prove for this paper so those sentences could be condensed/cut. I think this could make the start of the paper a bit more punchy and help to address the issues related to making the paper's point clear (it's a paper about computing performance, not climate modelling importance). This may happen naturally as part of addressing the major concerns of course.
- page 2, line 26: Could you provide more explanation about why software evolves slower than hardware? This seems quite important/interesting as part of the bigger picture, yet it currently isn't explored at all (e.g. if there are computing gains that aren't being realised simply because too few research software engineers are employed, rather than because of any true technical barriers, then that is a powerful, insightful conclusion from this exercise, and you, the authors, are probably the best placed in the world to make such comments as you have done so much work in this space with multiple teams over the last years; i.e. even though this evidence is only qualitative, it is still extremely powerful and the best collection we have right now)
- page 2, line 30: The comment, "bigger ensemble sizes to minimize the model’s inherent uncertainty", seems too vague to me. Bigger ensemble sizes only lower some kinds of uncertainty, not all model uncertainty. I would suggest either making this line more specific or deleting it (I don't think you need it, the point that better computing performance will allow us to do more science and that is a good thing stands on its own pretty well)
- Section 3.7: please put this memory bloat in context i.e. explicitly address whether this memory bloat is surprisingly high or in line with what computer programs normally use. A factor of 100 between memory used and ideal memory seems high at first glance, but maybe this is just how modern computers are given how many processes need to run in order to produce output (or I misunderstand the metric).
- It wasn't clear to me how exactly Cpl C is measured. If that could be clarified (or the fact that it can't currently be measured easily) that would be great. I think some of that happens in Section 4 so maybe this just needs to be foreshadowed when Cpl C is first introduced to help the reader understand why it is so vaguely defined compared to the other metrics (see the sketch after this list for one possible definition).
- mixing of floating point and decimals throughout (both in text and in the tables) is distracting. Please pick one or the other and use it consistently throughout.
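On the Cpl C point above, a minimal sketch, assuming a CPMIP-style definition in which the coupling cost is the fraction of the coupled system's core-hours not accounted for by the sum of its standalone components; whether centres actually measured it this way is precisely the open question, and the numbers below are hypothetical.

```python
# Minimal sketch, assuming a CPMIP-style definition of coupling cost as the
# fraction of the coupled system's core-hours not accounted for by the sum of
# its components run on their own partitions. All numbers are hypothetical.

def coupling_cost(coupled_runtime_h, coupled_cores, components):
    """components: list of (runtime_h, cores) for each standalone component."""
    coupled_core_hours = coupled_runtime_h * coupled_cores
    component_core_hours = sum(t * p for t, p in components)
    return (coupled_core_hours - component_core_hours) / coupled_core_hours

# Hypothetical example: atmosphere and ocean components inside a coupled run.
cost = coupling_cost(coupled_runtime_h=10.0, coupled_cores=1000,
                     components=[(10.0, 600), (10.0, 350)])
print(f"coupling cost ≈ {cost:.0%}")  # ≈ 5% with these placeholder numbers
```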
## Technical corrections
page 1, line 5: Would be helpful to define CPMIP more clearly. When I first read this I thought it was just a typo of CMIP
page 2, line 14: Define HPC at first use (this will also give you a chance to clarify exactly what the term HPC means, is it just high-performance computing in general or does it have a more specific, technical meaning)
page 2, line 20: 'have participated' --> 'participated'
page 2, line 37: "set of 12 performance metrics," --> "set of 12 performance metrics that define the".
The current phrasing seemed a bit odd to me.
Does this better capture what you mean? (Balaji defined CPMIP, here you now follow up?)
page 2, line 39: "they" --> "the performance metrics" (the text is too far from what you're referring to to use 'they' in my opinion; 'they' could also refer to ESMs, which is what 'they' previously referred to)
page 2, line 40: "Tab." --> "Table". The abbreviation Tab. is quite unusual and was very distracting to me at least so I would just use the full word (it's not worth saving one character).
Same comment for all uses of Tab. throughout the paper.
page 2, line 41: "Sec." --> "Section". As above, I would just spell the word out to avoid distracting your reader with this unusual/unnecessary abbreviation.
Same comment for all uses of Sec. throughout.
page 3, Table 1: 'cost in core-hours' --> 'cost, measured in core hours'
page 3, line 53: Who provided the support?
page 4, Table 2 caption: "Institution" --> "institution" (no need for capital here). Also what are units for atmosphere and ocean resolution?
page 4, Table 2: make all numbers floating point or all decimals. Mixing 1/4 and 0.5 in the same column reads really weirdly.
page 5, Table 3 caption: "Institution" --> "institution" (no need for capital here)
page 5, Table 3: units for each column? What is 'Useful SY' (not in Table 1)?
page 5, Table 3: Adding a line at the bottom with the mean/median across all groups and the standard deviation would help to more quickly understand where each centre sits relative to its peers. For non-experts, if possible (and maybe it's not), it would also be very helpful to have some representative number so we can tell where performance is already good and where there seems to be clear areas for improvement. (This comment can be made across all tables that present results from multiple groups)
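One lightweight way to generate such summary rows, sketched here with pandas on an invented table; the column names and values are placeholders, not the paper's data.

```python
# Minimal sketch of adding mean/median/std summary rows to a per-centre table,
# using pandas. The table contents are invented placeholders, not the paper's data.
import pandas as pd

table = pd.DataFrame({
    "centre": ["Centre A", "Centre B", "Centre C"],
    "SYPD": [2.1, 5.4, 1.3],
    "CHSY": [5000.0, 1200.0, 9800.0],
})

numeric = table.select_dtypes("number")
summary = pd.DataFrame({
    "centre": ["mean", "median", "std"],
    **{col: [numeric[col].mean(), numeric[col].median(), numeric[col].std()]
       for col in numeric.columns},
})

print(pd.concat([table, summary], ignore_index=True).to_string(index=False))
```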
page 5, Table 4: What are these metrics? What are their units? How do these compare to other machines around the world (like are climate centres using best in class machines or are there even more powerful ones around that are being used for other purposes, some help for the reader to understand the broader context would be great)?
page 6, line 74: "Tab. 10" --> "Table 11" (are you using LaTeX references? I'm surprised that a non-existent table can be referenced (I'm also not sure how the table numbers manage to skip 10, i.e. there is a Table 9 and a Table 11 but not a Table 10)) (same comment for all references to Table 10 in the manuscript)
page 9, line 127: 'should require' --> 'requires'
page 9, line 133: What is a PE? I don't think this has been defined anywhere?
page 9, line 193: 'to use' --> 'the use'
page 11, line 157: 'metirc' --> 'metric'
page 11, line 161: 'that the' --> 'than the'
page 11, line 175: 'account up' --> 'account for up'
page 11, line 187: "Vegetation" --> "vegetation" (this random capitalisation occurs in quite a few places, please check)
Figure 5: I think this would be much better as a scatter plot. Also, do you have an equivalent figure for medium-high resolution models?
Table 11: What is a 'dwarf' in this context?
Table 11: "for automize" --> "to automate"
Citation: https://doi.org/10.5194/gmd-2023-188-RC2
- AC2: 'Reply on RC2', Sergi Palomas, 06 Feb 2024