Articles | Volume 16, issue 4
https://doi.org/10.5194/gmd-16-1445-2023
© Author(s) 2023. This work is distributed under the Creative Commons Attribution 4.0 License.
Porting the WAVEWATCH III (v6.07) wave action source terms to GPU
- Final revised paper (published on 03 Mar 2023)
- Preprint (discussion started on 10 Jun 2022)
Interactive discussion
Status: closed
Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
- RC1: 'Comment on gmd-2022-141', Anonymous Referee #1, 10 Jul 2022
  - AC1: 'Reply on RC1', Olawale Ikuyajolu, 01 Dec 2022
- RC2: 'Comment on gmd-2022-141', Anonymous Referee #2, 12 Aug 2022
  - AC2: 'Reply on RC2', Olawale Ikuyajolu, 01 Dec 2022
- RC3: 'Comment on gmd-2022-141', Anonymous Referee #3, 13 Sep 2022
  - AC3: 'Reply on RC3', Olawale Ikuyajolu, 01 Dec 2022
Peer review completion
AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload
- AR by Olawale Ikuyajolu on behalf of the Authors (02 Dec 2022): Author's response, Author's tracked changes, Manuscript
- ED: Referee Nomination & Report Request started (06 Dec 2022) by Julia Hargreaves
- RR by Anonymous Referee #2 (10 Jan 2023)
- RR by Anonymous Referee #3 (18 Jan 2023)
- ED: Publish as is (24 Jan 2023) by Julia Hargreaves
- AR by Olawale Ikuyajolu on behalf of the Authors (09 Feb 2023): Manuscript
This manuscript presents results from a modified WAVEWATCH III code, which offloads the spectral source terms to GPUs. The authors investigate the scaling on two different platforms and validate the results against a CPU-only run.
This is a very timely, important and well-written manuscript. I can recommend that it be accepted for publication after some minor changes. I also want to mention that the way the problem is broken down and presented step by step makes the manuscript easy to follow. Please find my comments and questions below:
lines 24-25: "despite the growing literature on their in the simulation of weather and climate."
Seems to be missing a word
line 71: I'm not sure what this reference refers to, but Komen et al. (1994) is usually used as the WAM reference?
lines 86-88: These sentences are slightly confusing, since we first talk about modules that calculate the source terms (right-hand side of Eq. 1) and then about discretizing the left-hand side.
line 94: Can the relative computational intensiveness change if we use different propagation schemes? Can extremely small time steps alter or tip this balance? (Probably not, I guess.)
lines 188-189: I'm a bit surprised that S_in takes half of the source-term computational time, since I would have expected the nonlinear interactions to be the heaviest. The cumulative breaking term in ST4 is supposed to be quite resource-consuming, while possibly not having a large effect on the end results. Is it turned on or off here? If it is turned on, it would perhaps be an obvious candidate for speeding up the model (not directly related to GPU porting).
lines 196-198: Is this a hard requirement, or does the lack of communication just mean that the source terms are trivially parallelizable? The word "suitable" suggests that any communication here would make GPU porting a non-option, but I'm not sure that is the case (although it probably becomes a lot more complex).
lines 241-266: This is a very long paragraph. Even though the paper is generally very well written, this aspect could be checked.
line 275: It seems that the figures might be presented in the wrong order, since Fig. 8 has already been referenced?
line 295: Would it be possible to increase occupancy by reorganizing the loops? Currently we loop over all grid points and then over one spectrum, but perhaps it would be more efficient to define an array that has both spatial and spectral dimensions (and perhaps slice it into blocks, if needed)? Can you comment on this?
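A minimal sketch of the loop reorganization suggested for line 295, written in Python only to illustrate the index mapping. It is not WAVEWATCH III code and says nothing about actual GPU occupancy; `NSEA`, `NSPEC` and `source_term` are hypothetical placeholders. Flattening the (grid point, spectral bin) pair into one index `k = isea * NSPEC + ispec` exposes `NSEA * NSPEC` independent iterations instead of `NSEA`, which is roughly what a `collapse(2)` clause does for a tightly nested loop in OpenACC or OpenMP.

```python
# Hypothetical sizes: NSEA sea points, NSPEC spectral bins per point.
NSEA, NSPEC = 4, 6

def source_term(e, isea, ispec):
    """Placeholder bin-local source term (NOT the WW3 physics)."""
    return 0.01 * e * (isea + 1) / (ispec + 1)

def nested(spec):
    """Layout A: loop over grid points, then over one spectrum.
    Parallelizing only the outer loop gives NSEA work items."""
    out = [0.0] * (NSEA * NSPEC)
    for isea in range(NSEA):
        for ispec in range(NSPEC):
            k = isea * NSPEC + ispec  # row-major (point, bin) index
            out[k] = source_term(spec[k], isea, ispec)
    return out

def collapsed(spec):
    """Layout B: one flat index over space x spectrum, so all
    NSEA * NSPEC iterations are independent work items."""
    out = [0.0] * (NSEA * NSPEC)
    for k in range(NSEA * NSPEC):
        isea, ispec = divmod(k, NSPEC)  # recover (point, bin) from k
        out[k] = source_term(spec[k], isea, ispec)
    return out

spec = [1.0 + k for k in range(NSEA * NSPEC)]
print(nested(spec) == collapsed(spec))  # True
```

This mapping only applies directly to bin-local parts of the source terms; terms that depend on integrals over the whole spectrum (for example mean-frequency or total-energy quantities) would still need a per-point reduction before the bin-wise update.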
The paper is missing a discussion section. Although it might not be strictly needed in this kind of more technical paper, it would perhaps be interesting for the reader to know what kind of impact these speed-ups might have in practical terms. Several days of wall time were mentioned, but is this a "game changer" that allows wave models to be included in ESMs, or do we still need to optimize? I'm also wondering how well the exact nonlinear solution might scale (if the authors can comment), since this might have consequences for very basic research into, e.g., wave growth, which might be affected by the crude approximations of DIA. Finally, would it ever be viable to port other parts of the wave model, such as the propagation, to GPUs, or is the communication needed between the grid points a complete deal breaker?