Comment on gmd-2022-27

I think this manuscript is a most interesting and timely illustration of how the storage and analysis of large model datasets may evolve in the coming years. The latest generation of ocean (and also atmosphere) models now routinely produces datasets of O(Tera-Petabytes). The storage and analysis of these datasets is a major challenge, which often results in cutting-edge simulations being underexploited. Here the authors use the cloud-based framework proposed by the Pangeo project to produce an intercomparison of a set of 8 submesoscale-permitting ocean models. The manuscript shows the potential of the cloud framework and provides an assessment of the mixed-layer instability (MLI) parameterisation of Fox-Kemper (2011) across a set of submesoscale-permitting models.
The manuscript is clear and well-written and will be an excellent contribution to GMD. I only have a few minor points listed below that might benefit from some clarification.

Comments:
1) The main motivation of the study is to demonstrate a framework for the intercomparison and analysis of datasets of O(Tera-Petabytes). However, the region of focus around the Gulf Stream separation is actually quite small (~1000 km x 1000 km), and the size of the datasets will be Gigabytes rather than Terabytes or more. The choice to focus on the Gulf Stream separation region is well motivated, as this is a region where SWOT tracks will cross. Nevertheless, I wonder if something can be said about how easily the system would scale if the comparison and analysis were extended to, e.g., the largest domain (North Atlantic) that all 8 models have in common, or to global analyses (i.e. when the amount of data indeed gets into the order of Petabytes…). Could OSN handle this amount of data? Could it be uploaded onto the cloud within a reasonable amount of time?
2) I found the MLI assessment most interesting, as it adds an interesting piece of science, and the agreement seen in Figures 6 and D1 is surprisingly good. However, I am not sure that the explanation given in Appendix D as to why the histogram values shown in Figure D1 fall under the one-to-one line is correct. Isn't this rather a consequence of taking the spatial median? If the local (i.e. per grid cell) values are taken for C_e, there is by construction a perfect alignment of the histograms with the one-to-one lines. Any departure from the one-to-one lines must therefore result from summarising the spatial variability with a single value (i.e. C_e(t, x, y) -> C_e(t)). I also note that across the models the slope of the histograms is steeper than the one-to-one lines. As before, I expect that the slope will depend on which value is chosen for C_e (e.g. median, mean, mode, 1st or 3rd quartile, etc.). Depending on which value you pick and on the distribution of the values C_e(t, x, y), I expect that the histogram values can move above, onto, or below the one-to-one line and that the slope can increase or decrease. What do the distributions of the values C_e(t, x, y) actually look like? It might be nice to see an example. This distribution may be a useful guide for deciding on the value of the efficiency coefficient C_e.
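To make the point about the summary statistic concrete, the following minimal sketch (using a synthetic log-normal field as a stand-in for the actual C_e(t, x, y) values, which is only an assumption for illustration) shows how collapsing a skewed spatial distribution to one value per time step gives different results for the median, mean, and quartiles, and hence would shift the histogram values relative to the one-to-one line:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical skewed (log-normal) field standing in for the local
# efficiency values C_e(t, x, y); shape is (time, y, x).
ce_local = rng.lognormal(mean=0.0, sigma=1.0, size=(100, 50, 50))

# Collapse the spatial variability to a single value per time step,
# C_e(t, x, y) -> C_e(t), using different summary statistics.
ce_median = np.median(ce_local, axis=(1, 2))
ce_mean = ce_local.mean(axis=(1, 2))
ce_q1 = np.quantile(ce_local, 0.25, axis=(1, 2))

# For a right-skewed distribution the spatial median lies below the
# spatial mean, and the first quartile below both, so the choice of
# statistic systematically moves the summarised C_e(t) up or down.
print(ce_q1.mean() < ce_median.mean() < ce_mean.mean())
```

For a right-skewed distribution this prints True: the ordering quartile < median < mean holds, which is exactly why the summarised values can sit below (or above) the one-to-one line depending on the statistic chosen.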

Details:
Figure 5: I suggest labelling the panels with w, w_m, w_s and b, b_m, b_s.