Cloud-based framework for inter-comparing submesoscale-permitting realistic ocean models
- 1 Université Grenoble Alpes, CNRS, IRD, Grenoble-INP, Institut des Géosciences de l’Environnement, France
- 2 Lamont-Doherty Earth Observatory, Columbia University in the City of New York, USA
- 3 2i2c.org, USA
- 4 Ocean Next, Grenoble, France
- 5 Datlas, Grenoble, France
- 6 Center for Ocean-Atmospheric Prediction Studies, Florida State University, USA
- 7 Univ. Brest, CNRS, Ifremer, IRD, Laboratoire d’Océanographie Physique et Spatiale (LOPS), IUEM, 29280, Plouzané, France
- 8 Institut Universitaire de France (IUF), Paris, France
- 9 Alfred Wegener Institute (AWI), Helmholtz Center for Polar and Marine Research, Germany
- 10 Jet Propulsion Laboratory, National Aeronautics and Space Administration (NASA), USA
- 11 Mercator Ocean International, France
- 12 Department of Earth and Environmental Sciences, University of Michigan, USA
- 13 Oceanography Division, US Naval Research Laboratory, USA
- 14 First Institute of Oceanography, and Key Laboratory of Marine Science and Numerical Modeling, Ministry of Natural Resources, Qingdao, China
- 15 GEOMAR Helmholtz-Zentrum für Ozeanforschung Kiel, Germany
- 16 Kiel University, Kiel, Germany
- 17 Department of Earth, Environmental, and Planetary Sciences, Brown University, USA
- 18 Department of Earth, Ocean and Atmospheric Science, Florida State University, USA
Abstract. With the increase in computational power, ocean models with kilometer-scale resolution have emerged over the last decade. These models have been used for quantifying the energetic exchanges between spatial scales, informing the design of eddy parametrizations and preparing observing networks. The increase in resolution, however, has drastically increased the size of model outputs, making it difficult to transfer and analyze the data. Nonetheless, it is of primary importance to assess more systematically the realism of these models. Here, we showcase a cloud-based analysis framework proposed by the Pangeo Project that aims to tackle such distribution and analysis challenges. We analyze the output of eight submesoscale-permitting simulations, all on the cloud, for a crossover region of the upcoming Surface Water and Ocean Topography (SWOT) altimeter mission near the Gulf Stream separation. The models used in this study are run with the NEMO, CROCO, MITgcm, HYCOM, FESOM and FIO-COM code bases. The cloud-based analysis framework i) minimizes the cost of duplicating and storing ghost copies of data, and ii) allows for seamless sharing of analysis results amongst collaborators. We describe the framework and provide example analyses (e.g., sea-surface height variability, submesoscale vertical buoyancy fluxes, and comparison to predictions from the mixed-layer instability parametrization). Basin-to-global-scale, submesoscale-permitting models are still at an early stage of development; their costs and carbon footprints are also rather large. It would, therefore, benefit the community to document the different model configurations for future best practices. We also argue that an emphasis on data analysis strategies would be crucial for improving the models themselves.
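The workflow described in the abstract is most easily pictured as lazy access to analysis-ready, cloud-optimized (ARCO) Zarr stores sitting in object storage. The sketch below is a generic illustration of that access pattern, not the authors' actual notebooks; the bucket path, variable name and dimension names are hypothetical placeholders.

```python
# Generic sketch of the ARCO access pattern described in the abstract: open a
# Zarr store directly from cloud object storage and compute lazily with dask,
# so no local "ghost copy" of the model output is ever created.
# The bucket path, variable name, and dimension names are hypothetical.
import fsspec
import xarray as xr

mapper = fsspec.get_mapper(
    "gs://hypothetical-bucket/swot-crossover/modelA/surface_hourly.zarr",
    token="anon",                                # anonymous, public read access
)
ds = xr.open_zarr(mapper, consolidated=True)     # lazy: only metadata is read here

# Example diagnostic: temporal variance of sea-surface height, evaluated next to
# the data in the cloud rather than after downloading it.
ssh_variance = ds["ssh"].var(dim="time").compute()
```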
Takaya Uchida et al.
Status: final response (author comments only)
RC1: 'Comment on gmd-2022-27', Stephen M. Griffies, 11 Feb 2022
This is an enjoyable piece of work that documents a tremendous and exciting advance in our ability to analyze ocean models. I fully support publication and offer only minor comments.
line 63: The phrase "we more often than not do not possess" is very awkward. How about "commonly, we do not possess..."
line 112-113: I did not find "absolute dynamic topography" in Gregory et al (2019) paper. Even if ADT is the name used by AVISO, please do connect directly to the now-standard nomenclature in Gregory et al. Furthermore, note that "dynamic topography" is a deprecated term listed in Section 8 of Gregory et al, with three recommended replacements depending on the context. So again, please move to the new nomenclature to avoid confusion.
line 115: where precisely in Gregory et al (2019) are you pointing to? Again, I do not recall us defining "absolute dynamic topography" in Gregory et al, though perhaps I am missing something. And again, "dynamic topography" is not a recommended term since it has multiple meanings depending on the science community.
Figure 2: Some model grid spacings are given in km and others in degrees. In the caption, or in Table A3, it would be useful to see a common approach (see the unit-conversion sketch after this review). Additionally, please provide the number of grid points in the domain in Table A3; i.e., the "resolution" as it is normally meant, say, for a computer screen.
line 133: "interesting". But I think it is "expected", right? If unexpected, then comment.
Figure 4: I failed to find information about the geographical location of this frequency power spectrum.
END OF REVIEW
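Regarding the Figure 2 comment above on mixing km and degrees: converting between the two conventions only requires the latitude of the domain. The snippet below is an illustrative helper, not taken from the manuscript; the 1/48° spacing and the 38° N latitude are placeholder values.

```python
# Hypothetical helper for reporting grid spacing in a common unit, as suggested
# in the Figure 2 comment above. The 1/48 degree value is a placeholder, not a
# statement about any particular model in the paper.
import numpy as np

R_EARTH_KM = 6371.0  # mean Earth radius in km

def zonal_spacing_km(dlon_deg, lat_deg):
    """Zonal grid spacing in km for a spacing of dlon_deg degrees at latitude lat_deg."""
    return np.deg2rad(dlon_deg) * R_EARTH_KM * np.cos(np.deg2rad(lat_deg))

print(f"{zonal_spacing_km(1 / 48, 38.0):.2f} km")  # ~1.83 km at 38° N
```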
- AC1: 'Reply on RC1', Takaya Uchida, 03 May 2022
RC2: 'Comment on gmd-2022-27', Mike Bell, 14 Mar 2022
- AC2: 'Reply on RC2', Takaya Uchida, 03 May 2022
CC1: 'Comment on gmd-2022-27', Andy Hogg, 04 Apr 2022
This paper advocates for a cloud-based strategy to address the problems in sharing and analysing the large volumes of data that emerge from high-resolution ocean model simulations. I found the paper to be interesting and well-written, and concur with two previous reviewers that this manuscript is a worthwhile contribution to the literature. I have some minor comments, which the authors may like to take into account, listed below. I would be happy to recommend publication if these issues are addressed.
Line 8-9 - Consider deleting the sentence naming the 5 models from the abstract?
Section 2 — I found this description of the process of sharing data, and the ARCO format, to be particularly useful. But one thing I don’t understand is whether the authors are arguing that the Zarr files produced here are optimal for all operations. For example, if I wanted to filter with FFTs, average in time or average in space, would the Zarr chunking remain optimal for all of these operations? Or is there a trade-off between operations? (A chunking sketch follows at the end of this comment.)
Line 115 — Improved GS separation is a nice feature, but the global ocean is bigger than just the North Atlantic and there are many more processes revealed by resolution than WBC separation. I’m not convinced that separation is more “key” than other processes that are improved with resolution. Maybe just back away from this statement a little?
Line 122 - “this will…” is a little ambiguous.
Line 146 — my recollection is that the tides in LLC4320 had a bug in the tidal forcing which overestimates the tidal magnitude (but I apologise that I can’t put my hands on the appropriate reference). I suggest the authors check on this issue as they revise the manuscript.
Line 180 — “… the two …” - also ambiguous.
Line 191 - there is a case made about daily-averaged submesoscale fields, but it wasn’t clear (to this reader) where these daily-averaged fields were used in this paper?
Line 222 — “This presents …” ambiguous …
Figure 6 — On a first read, I was amazed at the similarities between the parameterised submesoscale fluxes and measured buoyancy flux. Actually, it looked too good to be believable. But when I looked at D1 the comparison was underwhelming. I suspect the use of the spatial median in Fig 6 is unfairly favouring the comparison. I would prefer the authors to show D1 as the main figure, or perhaps show both in the main text, for a warts-and-all comparison of the parameterisation.
Section 5 — The authors make some good points here and I agree with most of them. But I found the approach to be slightly evangelical. Fundamentally, the argument seems to be “we have found the best approach, but if the scientists/funders don’t back us then it will fail”. I agree that the approach espoused here is good, and I would like to advocate for it myself. But a more dispassionate discussion of the pros and cons would probably be an advantage here. For example, a significant disadvantage is the risk that the Google Cloud Platform is discontinued or unavailable to researchers in some nations, for whatever reason. That is not such an outlandish proposition, but could be catastrophic for an open platform like this. There are other risks around equitable access, long-term funding, etc. I am just asking for a more objective analysis of these risks, which would be a greater service to the reader than the advocative approach.
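Regarding the Section 2 question above about whether one Zarr chunking can be optimal for every operation: there is indeed a trade-off, and the sketch below illustrates it on synthetic data. The dimension names, array sizes and chunk sizes are hypothetical and do not describe the manuscript's actual stores.

```python
# Synthetic illustration of the chunking trade-off raised in the Section 2
# comment: a map-per-time-step chunking streams spatial reductions cheaply, but
# operations along time (e.g. an FFT) need the time axis in a single chunk and
# therefore force a rechunk. All names and sizes here are hypothetical.
import dask.array as da
import numpy as np
import xarray as xr

ssh = xr.DataArray(
    da.random.random((730, 600, 600), chunks=(1, 600, 600)),  # one 2-D map per chunk
    dims=("time", "y", "x"),
    name="ssh",
)

# Spatial averaging works chunk-by-chunk with no data movement ...
spatial_mean = ssh.mean(dim=("y", "x"))

# ... whereas a temporal FFT requires the full time axis in each chunk, so the
# array must be rechunked first (an extra cost a time-contiguous store would avoid).
ssh_tc = ssh.chunk({"time": -1, "y": 100, "x": 100})
spectrum = xr.apply_ufunc(
    np.fft.rfft,
    ssh_tc,
    input_core_dims=[["time"]],
    output_core_dims=[["freq"]],
    dask="parallelized",
    output_dtypes=[np.complex128],
    dask_gufunc_kwargs={"output_sizes": {"freq": ssh_tc.sizes["time"] // 2 + 1}},
)
```

Whether one accepts the rechunking cost at analysis time or stores a second, time-contiguous copy of the data is exactly the trade-off the comment points at.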
- AC3: 'Reply on CC1', Takaya Uchida, 03 May 2022
RC3: 'Comment on gmd-2022-27', Joel Hirschi, 05 Apr 2022
I think this manuscript is a most interesting and timely illustration of how storage and analysis of large model datasets may evolve in the coming years. The latest generation of ocean (and also of atmosphere) models now routinely produces datasets of O(terabytes to petabytes). The storage and analysis of these datasets are a major challenge, which often results in cutting-edge simulations being underexploited. Here the authors use the cloud-based framework proposed by the Pangeo project to produce an intercomparison of a set of eight submesoscale-permitting ocean models. The manuscript shows the potential of the cloud framework and provides an assessment of the mixed-layer instability (MLI) parameterisation of Fox-Kemper (2011) across a set of submesoscale-permitting models. The manuscript is clear and well-written and will be an excellent contribution to GMD. I only have a few minor points, listed below, that might benefit from some clarification.
Comments:
1) The main motivation of the study is to demonstrate a framework for the intercomparison and analysis of datasets of O(terabytes to petabytes). However, the region of focus around the Gulf Stream separation is actually quite small (~1000 km x 1000 km) and the size of the datasets will be gigabytes rather than terabytes or more. The choice to focus on the Gulf Stream separation region is well motivated as this is a region where SWOT tracks will cross. Nevertheless, I wonder if something can be said about how easily the system would scale if the comparison and analysis were extended to, e.g., the largest domain (North Atlantic) that all 8 models have in common, or to global analyses (i.e., when the amount of data indeed reaches the order of petabytes…). Could OSN handle this amount of data? Could it be uploaded onto the cloud within a reasonable amount of time?
2) I found the MLI assessment most interesting as it adds an interesting piece of science and the agreement seen in Figures 6 and D1 is surprisingly good. However, I am not sure that the explanation given in Appendix D as to why the histogram values shown in Figure D1 fall under the one-to-one line is correct. Isn’t this rather the consequence of taking the spatial median? If the local (i.e. for each grid cell) values are taken for Ce, there is by construction a perfect alignment of the histograms with the one-to-one lines. Any departure from the one-to-one lines has therefore to result from summarising the spatial variability with one value (i.e. Ce(t, x, y) --> Ce(t)). I also note here that across the models the slope of the histograms is steeper than the one-to-one lines. As before, I feel that the slope will be affected depending on which value you choose for Ce (e.g., median, mean, mode, 1st or 3rd quartile, etc.). Depending on which value you pick and on the distribution of the values Ce(t, x, y), I expect that the histogram values can move above, onto, or below the one-to-one line and that the slope can increase or decrease. What do the distributions of values Ce(t, x, y) actually look like? It might be nice to see an example. This distribution may be a useful guide for deciding on the value of the efficiency coefficient Ce (a pointwise sketch follows at the end of this review).
Details:
Figure 5: I suggest labelling the panels with w, wm, ws and b, bm, bs.
Figures 6, D1: Use "Ce" rather than "C".
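On point 2) above, one way to look at the full distribution of the efficiency coefficient is to invert the MLI scaling grid cell by grid cell before choosing a summary statistic. The sketch below assumes the commonly quoted Fox-Kemper et al. form of the scaling and hypothetical variable names; it is illustrative only and not necessarily the diagnostic used in the manuscript.

```python
# Illustrative pointwise diagnostic for the question in comment 2): invert the
# commonly quoted MLI scaling,
#     <w'b'> ~ Ce * H**2 * |grad_H b|**2 * mu(z) / |f|,
# at each grid cell to obtain Ce(t, y, x), then inspect its distribution before
# choosing a summary value. All inputs are hypothetical xarray objects; this is
# not necessarily the exact form diagnosed in the manuscript.
import numpy as np
import xarray as xr

def implied_ce(wb_sub, bx, by, mld, f, mu=1.0):
    """Efficiency coefficient implied by the MLI scaling at each grid cell.

    wb_sub : mixed-layer-averaged submesoscale vertical buoyancy flux <w'b'>  [m^2 s^-3]
    bx, by : horizontal buoyancy gradient components                          [s^-2]
    mld    : mixed-layer depth H                                              [m]
    f      : Coriolis parameter                                               [s^-1]
    mu     : representative value of the vertical structure function (1 here)
    """
    scaling_per_ce = mld**2 * (bx**2 + by**2) * mu / np.abs(f)
    return wb_sub / scaling_per_ce

# With hypothetical inputs loaded as xarray DataArrays:
# ce = implied_ce(wb_sub, bx, by, mld, f)
# print(float(ce.median()), float(ce.mean()))   # how sensitive is the summary choice?
# ce.plot.hist(bins=100)                        # the full distribution of Ce(t, y, x)
```

Comparing, for example, the median against the mean or quartiles of this distribution directly shows how the choice of summary statistic moves the histograms relative to the one-to-one line.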
- AC4: 'Reply on RC3', Takaya Uchida, 03 May 2022
RC4: 'Comment on gmd-2022-27', Andy Hogg, 06 Apr 2022
(The content of this referee comment is identical to CC1 above.)
- AC5: 'Reply on RC4', Takaya Uchida, 03 May 2022