LISFLOOD-FP 8.2: GPU-accelerated multiwavelet discontinuous Galerkin solver with dynamic resolution adaptivity for rapid, multiscale flood simulation
Abstract. The second-order discontinuous Galerkin (DG2) solver of the shallow water equations in LISFLOOD-FP 8.0 is well-suited for predicting small-scale transients that emerge in rapid, multiscale floods caused by impact events like tsunamis. However, this DG2 solver can only be used for simulations on a uniform grid, where it may yield inefficient runtimes even when using its graphics processing unit (GPU) parallelised version (GPU-DG2). To maximise runtime reduction, the LISFLOOD-FP 8.2 version integrates GPU-parallelised dynamic (in time) grid resolution adaptivity of multiwavelets (MW) with the DG2 solver (GPU-MWDG2). The GPU-MWDG2 solver requires selecting a maximum refinement level, L, based on the size and resolution of the Digital Elevation Model (DEM), and an error threshold, ε ≤ 10⁻³, to preserve accuracy similar to that of a GPU-DG2 simulation on a uniform grid. The accuracy and efficiency of dynamic GPU-MWDG2 adaptivity is assessed for four tsunami-induced flooding test cases involving increasingly complex tsunamis: from single-wave impact events to wave trains. At ε = 10⁻³, the GPU-MWDG2 simulations yield predictions similar to the GPU-DG2 simulations, but using ε = 10⁻⁴ can improve the accuracy of velocity-related predictions. In terms of efficiency, the GPU-MWDG2 simulations show progressively larger speedups over the GPU-DG2 simulations from L ≥ 10, which become significant (≥ 3.3- and 4.5-fold at ε = 10⁻⁴ and 10⁻³, respectively) for simulating a single-wave impact event. The LISFLOOD-FP 8.2 code is open source (DOI: 10.5281/zenodo.4073010), as are the simulation data and the input files and scripts to reproduce them (DOI: 10.5281/zenodo.13909072), with additional documentation at https://www.seamlesswave.com/Adaptive (last accessed: 9 October 2024).
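As a point of orientation for readers: the choice of L presumably follows the usual dyadic-grid convention, in which the DEM is embedded in a 2^L x 2^L grid at its native resolution. A minimal sketch under that assumption (max_refinement_level is a hypothetical helper, not part of LISFLOOD-FP):

    import math

    # Hypothetical helper: the smallest L such that a 2^L x 2^L dyadic grid
    # at the DEM's native resolution covers the whole DEM extent.
    def max_refinement_level(n_rows: int, n_cols: int) -> int:
        return math.ceil(math.log2(max(n_rows, n_cols)))

    print(max_refinement_level(392, 243))  # -> 9, since 2^9 = 512 >= 392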
Status: final response (author comments only)
RC1: 'Comment on gmd-2024-152', Anonymous Referee #1, 18 Oct 2024
In the manuscript titled "LISFLOOD-FP 8.2: GPU-accelerated multiwavelet discontinuous Galerkin solver with dynamic resolution adaptivity for rapid, multiscale flood simulation", the authors develop the new LISFLOOD-FP 8.2 version, which integrates GPU-parallelised dynamic grid resolution adaptivity with the DG2 solver (GPU-MWDG2) to simulate tsunami-induced flooding. It shows progressively larger speedups over the GPU-DG2 simulations from L ≥ 10. There are still certain changes and clarifications that the authors should address prior to publication. I believe that the manuscript can be accepted for publication by GMD after minor revision. Below, I have some general comments for the authors.
General comments:
- The paper describes an innovative GPU-MWDG2 solver, so Section 2.1 is the core of the calculation method in this paper. To help readers understand certain algorithmic details, it is recommended that the authors provide some key portions of the code or pseudo-code as an appendix (a runnable toy of the underlying idea is sketched after this list). This is merely a suggestion and does not affect the validity of the paper's arguments.
- The paper could benefit from a deeper analysis of the trade-offs between accuracy and efficiency when choosing different error thresholds (ε). While the choice of ε = 10⁻⁴ vs. ε = 10⁻³ shows efficiency improvements, more discussion is needed on when to choose one over the other in practical scenarios.
- Figures 7 and 10 are comparisons between GPU-MWDG2 and GPU-DG2, but there is no description of the dashed line in the figure captions. The comparison between the two in terms of computational performance is not intuitive enough.
- The paper touches on the scalability of the solver but could provide a more forward-looking discussion of future applications. For example, how would this solver perform in larger simulations involving urban flooding or river flooding that require coupling with other environmental models? A brief discussion of scalability in more complex, coupled systems could enhance the paper's impact.
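To illustrate the kind of appendix suggested in the first general comment above, here is a runnable toy of the encode-threshold-decode idea behind multiwavelet adaptivity. It is a one-level 1D Haar analysis only, not the actual 2D multiwavelet scheme of GPU-MWDG2:

    import numpy as np

    # Toy illustration: one level of Haar analysis and thresholding on a 1D
    # signal. GPU-MWDG2 uses 2D multiwavelets matched to the DG2 basis; this
    # only sketches the encode -> threshold -> decode idea.
    def haar_threshold(u, eps):
        s = (u[0::2] + u[1::2]) / 2.0   # scaling (coarse) coefficients
        d = (u[0::2] - u[1::2]) / 2.0   # detail coefficients
        d[np.abs(d) < eps] = 0.0        # discard sub-threshold details
        return np.stack([s + d, s - d], axis=1).ravel()  # reconstruct

    u = np.array([1.0, 1.01, 5.0, 9.0])  # a smooth pair and a sharp pair
    print(haar_threshold(u, eps=0.1))    # -> [1.005 1.005 5.    9.   ]

The smooth pair is averaged (its detail falls below eps and is dropped), while the sharp pair is reconstructed exactly; this is the mechanism by which eps trades accuracy for cell count.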
Specific comments:
- It is very rare for DOIs and data links to appear in an abstract.
- Line #219-#220, ensure consistency in the formatting of references. For example, "Kesserwani et al. (2023)" is used multiple times but isn't always consistent in placement or citation style.
- Figure Legends Overlap: In some figures, such as Figure 4(b), the legends and lines overlap, making it difficult to interpret the data. Please revise the figures to ensure clarity.
Citation: https://doi.org/10.5194/gmd-2024-152-RC1
RC2: 'Comment on gmd-2024-152', Anonymous Referee #2, 11 Nov 2024
Review of
LISFLOOD-FP 8.2: GPU-accelerated multiwavelet discontinuous Galerkin solver with dynamic resolution adaptivity for rapid, multiscale flood simulation
This paper validates the accuracy and efficiency of a new version of the LISFLOOD-FP inundation model that provides dynamical multiscale grid adaptivity by incorporating the previously developed GPU-MWDG2 solver. The previous version of LISFLOOD-FP only allowed static adaptivity, based on the initial conditions.
The results suggest that the new, adaptive, LISFLOOD-FP is both accurate and efficient (up to 4.5 times faster). The paper represents a significant advance in efficient flood modelling using the standard 2D shallow water equation model. I also believe the paper will be interesting for more general readers, who want to better understand dynamically adaptive methods.
Most of my questions and suggestions are aimed at making the paper more understandable and higher impact by clarifying basic properties of the method, however I also have some concerns about the tolerance and validation of the method.
Questions:
1. The abstract should be revised to be clearer for non-expert users of the LISFLOOD-FP models.
(a) It would be helpful to specify that the model is based on the two-dimensional shallow water equations (not the multilayer three-dimensional shallow water equations used in ocean and atmosphere modelling).
(b) More details on the error threshold are needed in the abstract and Section 1. Is epsilon a relative error threshold that controls all prognostic variables, or just some variables? What does it measure or control? Or perhaps it is a relative error threshold for the tendencies? Please clarify what epsilon measures/controls and how it is set (a sketch of the standard conventions is given after this list). See also question 3 below: the results don't seem to show that epsilon controls the error of the prognostic variables.
(c) How is the refinement level L defined? Is it a dyadic (power of 2) refinement of a coarsest grid? What is the coarsest grid? How does the number of computational elements depend on L? Does the non-adaptive simulation use the same L (but with no adaptivity)?
(d) It would be helpful to make more general conclusions about the efficiency of the adaptive method, rather than reporting results for a specific example. What determines the refinement level where the adaptive method is faster? How should the tolerance epsilon be set to achieve an acceptable balance of efficiency and accuracy for a given application?
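Regarding 1(b) and 1(c): a sketch of the standard multiresolution conventions in this family of methods, which the authors should confirm against their implementation:

    % Dyadic hierarchy: L halvings of the coarsest grid spacing, so in 2D
    \Delta x_L = \Delta x_0 / 2^L, \qquad N_L = 4^L N_0 .
    % Thresholding: a (suitably normalised) detail coefficient d_lambda is
    % discarded, and the corresponding cell coarsened, when
    |d_\lambda| \le \varepsilon
    % (level-weighted variants \varepsilon_\ell = 2^{\ell - L}\varepsilon also exist).

Under these conventions, epsilon acts on the wavelet detail coefficients of the discrete solution rather than directly on any single prognostic variable, and the uniform reference run corresponds to keeping all 4^L N_0 finest-level cells regardless of epsilon.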
2. Introduction
(a) Par 3: Low-order finite volume discretizations are typically used because the goal is to preserve various mimetic properties of the discretization (e.g. conservation of mass to machine precision, ensuring that there is no spurious generation of vorticity from a uniform vorticity field). This is considered essential for climate modelling, for example. Does LISFLOOD-FP discretely conserve mass and other mimetic properties? Does the adaptive version retain these conservation properties? If not, why are mimetic properties like mass conservation not important?
(b) Par 5: How should the number of retained computational modes scale with the error threshold? This can usually be calculated for adaptive wavelet methods, based on the order of accuracy of the wavelets and the dimensionality of the problem (2 in this case); see the scaling estimate sketched after this list.
(c) Par 5: What is special about L=9? Can you make more general claims about efficiency? Is the time step fixed for all scales?
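On 2(b): the classical nonlinear-approximation estimate, assuming a sufficiently smooth solution, wavelets of accuracy order p, and dimensionality d = 2, reads

    \| u - u_N \| \sim N^{-p/d} \quad \Longrightarrow \quad N(\varepsilon) \sim \varepsilon^{-d/p} ,

so tightening epsilon by a factor of 10 should increase the number of retained modes by roughly a factor of 10^{d/p}. Confirming this scaling empirically would strengthen the paper.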
3. Section 3
The RMS errors presented in Tables 3 to 6 do not confirm that the tolerance actually controls the error of either the height or velocity variables. A valid dynamically adaptive wavelet method should be characterized by the fact that the tolerance controls the error: the error should ideally be proportional to the tolerance, or, at the least, the error should decrease significantly when the tolerance is decreased by a factor of 10. In many cases the results show the error decreases by only 10-30% or so, and in at least some cases the error actually grows, which is concerning since it suggests that epsilon does not control the error of the simulation (at least for these choices of epsilon).
These results merit much more discussion and explanation if they are intended to validate the error control of the adaptive method. It would be very helpful to consider a wider set of tolerances epsilon for large L (e.g. 1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6 or another suitable set of 6 or more epsilon values) to properly understand how and if epsilon controls the error of the prognostic variables.
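A minimal sketch of such a sweep, where run_case and rmse_vs_reference are hypothetical stand-ins for the LISFLOOD-FP driver and post-processing (not real API calls):

    # Hypothetical driver sketch: sweep the MRA error threshold and check
    # whether the RMS error against a uniform-grid reference scales with it.
    epsilons = [1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6]
    reference = run_case(adaptive=False)           # uniform-grid GPU-DG2 run
    for eps in epsilons:
        result = run_case(adaptive=True, eps=eps)  # adaptive GPU-MWDG2 run
        err = rmse_vs_reference(result, reference)
        print(f"eps = {eps:.0e}  RMSE = {err:.3e}")
    # Error control holds if err shrinks roughly in proportion to eps.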
Citation: https://doi.org/10.5194/gmd-2024-152-RC2
RC3: 'Comment on gmd-2024-152', Anonymous Referee #3, 30 Nov 2024
The manuscript presents an adaptive Discontinuous Galerkin solver for the shallow water equations implemented to be executed on GPUs. The main goal is to show its performance on tsunami inundation cases, which is well achieved. Overall I have a positive opinion of the manuscript. Nevertheless, I believe that the manuscript could benefit from some additional discussion concerning the observed results and the attribution of the behaviours to different aspects of the numerical solver and the adaptivity.
1- Introduction, lines 96-105. The authors refer to "rapid" and "slow-to-rapid" flows. What this means is not clear. Also in line 578.
2- Line 122: "which its dynamic adaptivity". There seems to be something off in the syntax in this sentence.
3- Line 170: why is it necessary to trigger refinement around the wall? In the figure of the Monai grid the refinement seems to be outside of the flow domain. Please explain why this is necessary.
4- line 187: 15% of memory is allocated for arrays storing the hierarchy of uniform grids. This is unclear. What needs to be stored for the hierarchy of grids? This seems to be a simple recursion. Do you need to store the geometry for some reason? What is stored in these arrays requiring so much memory?
5- line 189: "arrays for non-uniform grid". Presumably these are the DG coefficients of states and parameters, are they not? The wording suggests you store "grid" information, not the actual model states and parameters.
6- line 190: why is it necessary to store neighbours "explicitly". What does "explicitly" mean? Does it imply data duplication in your arrays?
7- Across the different test cases, there is the possibility that the problem size is too small to really exploit the high level of parallelism the A100 GPUs can offer, assuming that parallelism is mostly achieved through parallel processing of elements. If I have understood properly, the highest resolution grids have the following approximate number of cells:
- 380k Monai. Surely very small for the A100.
- 2.4M Seaside Oregon. Seems ok.
- 9M Tauranga Harbour. Seems ok.
- 485k Hilo Harbour. Surely very small for the A100.
The adaptive grids, in turn, as desired, significantly reduce the number of elements to the following approx. maximums:
- 60k Monai
- 230k Oregon
- 3.6M Tauranga
- 195k Hilo
With the exception of Tauranga, it seems these grids will not fully exploit the GPU parallelism.
I don't think this is a fundamental issue. Nevertheless, the performance analysis does not seem to take it into consideration at all. To at least some degree, the low arithmetic load of the GPU is likely to explain the large gap between the ratio of cells and the speed-up. The statements in lines 479-481 further seem to support that: for example, the Monai case is simply too small a problem, since larger cases actually yield better behaviour.
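A back-of-envelope occupancy check supporting this point (my arithmetic, using NVIDIA's published A100 figures, not numbers from the manuscript): with one thread per element,

    108 \text{ SMs} \times 2048 \text{ resident threads/SM} \approx 2.2 \times 10^{5} \text{ concurrent threads} ,

so the ~60k-element adaptive Monai grid fills under 30% of a single A100 "wave", whereas the ~3.6M-element Tauranga grid oversubscribes it roughly 16-fold.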
8- L344, concerning the reduced dt, which is explained by wetting and drying. This is pretty obscure. I get why there is more frequent and aggressive grid coarsening with epsilon=1E-3, but why does this impact "wetting and drying"? How does this impact on wetting and drying result in a reduced time step? Why do you discard CFL-dominated time-step sizes, say at shocks, as the reason? How are wet/dry fronts treated? Are they not refined to the highest resolution? If they are, why does the time-step restriction associated with the wetting/drying fronts yield smaller time steps?
8b- Later, in L411, you mention "wet/dry fronts at coarse cells". This seems counterintuitive: why would you keep coarse cells at a wet/dry front? Isn't this precisely the type of feature you would like to capture? Moreover, it seems to behave poorly, since it requires an aggressive time-step reduction. How is it advantageous to keep wet/dry fronts at coarser levels?
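For context on points 8 and 8b: the standard shallow-water CFL constraint (which may differ from the manuscript's exact wet/dry treatment) is

    \Delta t \le C \, \min_{\text{wet cells}} \frac{\Delta x_\ell}{|u| + \sqrt{g h}} ,

and at a wet/dry front h tends to 0 while u = q/h can spike over thin layers, so a single poorly behaved front cell anywhere in the grid can throttle the global time step.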
9- L347-348, regarding the computational effort of MRA. It seems rather important to highlight that Figure 7 shows that R_MRA is significantly larger than R_DG2 and C_MRA is significantly larger than C_DG2. Please comment on this. Earlier in the text you highlight that one would ideally expect a close correlation between the reduction in number of elements (relative to the fine grid) and the speed up. This is obviously not the case, as a reduction to ~16% of elements would imply a speed up in the order of 6x. Therefore the overhead of MRA here is overwhelming, and costs you 3x of speed up. Similar points can be made for the Oregon case, with 10% of the cells, but speed ups below 5, at best.
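To make the overhead explicit, in the notation of the figures under discussion:

    S_\text{ideal} = \frac{N_\text{uniform}}{N_\text{adaptive}} \approx \frac{1}{0.16} \approx 6.3 ,
    \qquad
    S_\text{measured} = \frac{C_\text{uniform}}{C_\text{DG2} + C_\text{MRA}} ,

so the gap between S_ideal and S_measured directly quantifies the MRA overhead.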
10- Lines 420-422. This text might be a bit unintentionally deceptive. What would happen if you were to measure cumulative times at t=24, when the flow is really complex? From your own arguments one can say that during the "simple" flow stages (as the incoming wave propagates through a smooth and open bathymetry) you get a speed-up. You later lose a lot of that speed-up because of the wet/dry-limited time step; thus, the adaptivity at eps=1E-3 is self-damaging.
If you measure Ctot from t=24, is the speed up still larger than 1?
The rationale for this question is this: the fraction of time in this problem with complex flows (the ones that eps=1E-3 doesn't like) is relatively small compared to the rest. What if it were the opposite, i.e., 95% of the time you had complex flows which eps=1E-3 doesn't like?
11- L426: "all showing a good agreement with the measured time series". There's some room for debate on this statement. In A1 the peak of h+z is underestimated, it is overestimated in B6, and in D4 it is inaccurate and out of phase. What is true is that the adaptive solution is very similar to the non-adaptive solution. I understand this is the key point for the authors. It is nevertheless questionable whether the non-adaptive solution is an accurate-enough reference. This poses questions on WHY the non-adaptive solution does not capture these clear features. Specifically, could it be related to resolution? Would a higher resolution solve the discrepancies? If yes, why not try?
12- Figure 13. What are the dashed lines in the Ncells plot?
13- Lines 539-541. You comment that the dt history is very similar, unlike previous cases. What is different in this case? Why is the dt not affected by the different adaptive thresholds like in the other cases?
14- In Figure 16, despite the small problem size, R_DG2 is larger than R_MRA, which suggests that the complex dynamics dominate. This is relevant, I believe.
15- L592: "the DEM area corresponds to a choice for L >= 9". This is not exactly what the results show.
16- L593-598. What arguably matters is how long the dynamics are complex, not the fact that there are multiple peaks. You could have a very short simulation around a single peak and get a high density of complex dynamics during a short time. This is effectively what happens in the Tauranga case, as shown in the Sacc plot in Figure 13.
17- In order to disentangle the two (apparently) key conclusions related to points 15 and 16: wouldn't it make sense to run the same test with larger L, in particular for smaller cases such as the Monai case? I understand it is all limited by memory, but memory limitations seem far off for the Monai case.
18- Table 7: what is "max cells"?
19- I would suggest getting rid of lines 605-608, as they appear in the code and data availability statement.
Citation: https://doi.org/10.5194/gmd-2024-152-RC3