Review of C. Horvat and L. Roach — “WIFF1.0: A hybrid machine-learning-based parameterization of Wave-Induced sea-ice Floe Fracture.”

The authors previously developed a physics-based model (SP-WIFF), which captures wave-induced sea ice floe fracture and was used as a super-parameterization in the large-scale model CICE. However, including SP-WIFF increased the runtime of CICE by an order of magnitude. Here, the authors develop a neural network model (NN-WIFF), trained on the output of SP-WIFF, that can be used as a computationally efficient alternative to SP-WIFF in large-scale models. NN-WIFF significantly reduces runtime while still capturing the floe fracture patterns produced by SP-WIFF in CICE quite reliably. I found the paper straightforward and easy to read (apart from section 2). It could be relevant for large-scale sea ice modeling, and I recommend it for publication after revisions.

My main comment is that I couldn't really understand how SP-WIFF works just from reading the summary in the paper (section 2). I understand that it is explained in detail elsewhere, but some further clarification here would make the baseline assumptions clearer, since any error introduced by the neural network comes on top of them. Also, I feel that the claim that NN-WIFF is overall superior to the physically based SP-WIFF-1 is somewhat unfounded on the presented evidence. For example, under a climate change scenario the neural network could fail as the base state moves outside the parameter range of the training set, while SP-WIFF-1 could still be as accurate as before.

Major comments:
1. section 2: I had trouble understanding the basic assumptions of the model from this section. Here are some of the places where I got confused, but I would appreciate a more detailed discussion in general.
(a) What is the timescale of Eq. 1? Does it evolve on the same timescale as the large-scale model (e.g. CICE), or is it considered to be infinitely fast compared to large-scale model evolution (i.e. for each time-step of the large-scale model, one finds a steady-state distribution f(r))? Do you consider Eq. 1 to be a part of SP-WIFF, or do just the steps S1-S3 fall under it?
(b) Does this model explicitly ignore the possibility for ice floes to be advected between grid cells? Or is this possibility somehow included independently in the large-scale model?
(c) At which timescales does the wave spectrum, S(λ), evolve compared to the FSD? Is there a feedback with the FSD?
(d) Why is F(r, s)ds independent of time, t, and of the duration dt? I feel there is some implicit assumption here that I do not understand. Naively, I would think that the longer one waits and allows the ice to break, the higher the fraction of the original floes that would end up as small floes.
(e) Why 10km in step S1? Is that the typical size of a grid cell?
(f) Does the floe size distribution, f(r), enter the fracture algorithm, S1-S3? And, if so, how? It seems to me that it should, since Ω(r, t), as defined at the beginning of this section, should be proportional to f(r).
(h) Does S3 consist of repeating S1-S2 on the same floe (breaking it additionally with each step), or does each iteration start from a new 10km floe?
(i) I do not understand how steps S1-S3 yield the terms Ω and F as defined in the opening paragraphs of section 2. In the beginning, Ω(r, t)drdt is defined as the fraction of the domain for which floes of size between r and r + dr fracture between times t and t + dt. From Eq. 2, it seems that Ω(r, t) is the fraction of floes smaller than r that come from fracturing a very large floe (of 10km in size). How are these two definitions equivalent? Moreover, Ω as defined in the beginning has units of m⁻¹ s⁻¹, whereas Ω in Eq. 2 is dimensionless. Likewise for F(r, s)ds: the introduction defines it as the fraction of Ω(r, t)drdt that breaks into floes of size between s and s + ds, while Eq. 3 seems to suggest that it is the ratio of the number of floes of size between r and r + dr to those smaller than s. Again, I cannot reconcile these two definitions. Perhaps a more careful explanation of what A(r) is and how it is related to Ω and F would help.
(k) Eq. 3: Shouldn't it be F(s, r)dr instead of F(s, r)ds?
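To make the units concern in (i) concrete, here is my own dimensional bookkeeping. It assumes that r and s are lengths in metres and t is a time in seconds, and the last line is only my reading of Eq. 2, which may not be what the authors intend.

```latex
\begin{align*}
\text{Opening definition:}\quad
  & \Omega(r,t)\,\mathrm{d}r\,\mathrm{d}t \ \text{is a dimensionless area fraction}
  && \Rightarrow\ [\Omega] = \mathrm{m^{-1}\,s^{-1}},\\
  & F(r,s)\,\mathrm{d}s \ \text{is a dimensionless fraction of } \Omega\,\mathrm{d}r\,\mathrm{d}t
  && \Rightarrow\ [F] = \mathrm{m^{-1}},\\
\text{My reading of Eq.~2:}\quad
  & \Omega \ \text{is a cumulative fraction of floes smaller than } r
  && \Rightarrow\ [\Omega] = 1 .
\end{align*}
```

If this reading is right, matching the two definitions dimensionally would require at least a derivative with respect to r and a division by some timescale, and the text does not say whether either is intended.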
2. performance of NN-WIFF compared to SP-WIFF-1: A major drawback of neural networks (or any other "black box" method) is that we cannot rely on them in circumstances significantly different from those seen during training. Physically based models, such as SP-WIFF-1, are not as susceptible to this. So, under a climate change scenario, SP-WIFF-1 could turn out to be a better choice. I feel this point was not really discussed apart from a couple of sentences in the conclusions. Perhaps a short discussion (if not more investigation) about this would be useful. As a suggestion for future versions of this model (which I don't expect to be implemented here), it could be useful to include the possibility to flag data points that fall outside of the parameter range of the training set.
(b) line 213: The difference between NN-WIFF and SP-WIFF-1 errors seems to be quite small, and typically much smaller than the spread of the error (e.g. in the Arctic the difference is 0.1% in the SAE metric). So, saying that NN-WIFF consistently outperforms SP-WIFF-1 seems like an overstatement to me. I would rather say that they are of quite similar accuracy, although NN-WIFF is significantly faster.
(c) line 250: Again, I am not convinced that NN-WIFF is always more accurate than SP-WIFF-1.
(d) paragraph of line 255: Perhaps you can expand on this discussion.
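To illustrate the flagging suggestion in comment 2, here is a minimal sketch of a per-feature range check. It is not taken from the manuscript: the function names, the two input features (ice thickness and significant wave height), and all numbers are hypothetical and chosen only for illustration.

```python
def fit_training_range(train_inputs):
    """Return a (min, max) pair for each input feature of the training set."""
    columns = list(zip(*train_inputs))  # transpose rows into feature columns
    return [(min(col), max(col)) for col in columns]


def flag_out_of_range(inputs, bounds):
    """Return True for each input row with at least one feature outside the bounds."""
    return [
        any(not (lo <= x <= hi) for x, (lo, hi) in zip(row, bounds))
        for row in inputs
    ]


# Hypothetical training inputs: (ice thickness [m], significant wave height [m]).
train = [(0.5, 0.1), (2.0, 3.0), (1.0, 1.5)]
bounds = fit_training_range(train)

# Hypothetical inputs met at run time in the large-scale model.
queries = [(1.0, 1.0), (3.5, 1.0), (0.7, 4.0)]
flags = flag_out_of_range(queries, bounds)
print(flags)
```

A bounding box of this kind is only a necessary check: an input can lie inside every per-feature range and still be far from any training sample in the joint input space, so a density-based or distance-based criterion might be preferable in practice.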
3. line 228: "This demonstrates that differences in WIFF implementation do not have an emergent effect on sea ice model state." It could also be that ice fracture in itself does not have a major impact on the state of sea ice. Have you compared the sea ice state with and without ice fracture (here, or in some previous work)?