Lossy checkpoint compression in full waveform inversion: a case study with ZFPv0.5.5 and the overthrust model

Kukreja, Navjot; Hückelheim, Jan; Louboutin, Mathias; Washbourne, John; Kelly, Paul H. J.; Gorman, Gerard J.

doi:https://doi.org/10.5194/gmd-15-3815-2022

Articles | Volume 15, issue 9

© Author(s) 2022. This work is distributed under
the Creative Commons Attribution 4.0 License.

Development and technical paper | 12 May 2022

Final revised paper (published on 12 May 2022)
Preprint (discussion started on 06 Nov 2020)

Interactive discussion

Status: closed

AC: Author comment | RC: Referee comment | SC: Short comment | EC: Editor comment

SC1: 'executive editor comment on gmd-2020-325', Astrid Kerkweg, 14 Nov 2020
- SC2: 'Reply on SC1', Navjot Kukreja, 05 Jan 2021
RC1: 'Review of Lossy Checkpoint Compression in Full Waveform Inversion', Anonymous Referee #1, 17 Jan 2021
- AC1: 'Reply on RC1', Navjot Kukreja, 02 Feb 2021
RC2: 'Review GMD-2020-325', Anonymous Referee #2, 25 Mar 2021
AC2: 'Comment on gmd-2020-325', Navjot Kukreja, 14 Apr 2021

Peer-review completion

AR: Author's response | RR: Referee report | ED: Editor decision

AR by Navjot Kukreja on behalf of the Authors (07 Jul 2021) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (05 Oct 2021) by Adrian Sandu

RR by Anonymous Referee #2 (19 Nov 2021)

Suggestions for revision or reasons for rejection

The manuscript “Lossy Checkpoint Compression in Full Waveform Inversion: a case study with ZFP v0.5.5 and the Overthrust Model” by Kukreja et al. discusses a strategy to mitigate the huge memory bottleneck of time-domain adjoint simulations that require access to the entire forward wavefield in reverse order.

The topic is highly interesting and relevant for realistic FWI on modern compute architectures. The specific contribution of the manuscript is to compress checkpoints during the forward simulation before writing them to disk and to restart the computations from de-compressed checkpoints during the adjoint run. Admittedly, this is a rather small step, but given its importance in an FWI workflow, it could still warrant a publication. A synthetic 2D FWI demonstrates that even with significant compression factors the rate of convergence is not negatively affected. This is a great and very relevant result.
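The forward/adjoint pattern described above (compress checkpoints during the forward run, restart from decompressed checkpoints during the adjoint run) can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the smoothing `step` stands in for a wave-equation update, a uniform quantizer plus `zlib` stands in for ZFP, and all function names are hypothetical.

```python
import struct
import zlib


def step(u):
    """Hypothetical toy time step: periodic 3-point smoothing standing in for a PDE update."""
    n = len(u)
    return [0.25 * u[i - 1] + 0.5 * u[i] + 0.25 * u[(i + 1) % n] for i in range(n)]


def compress(u, atol):
    """Lossy stand-in for ZFP: quantize to spacing 2*atol (error <= atol), then zlib the integers."""
    spacing = 2.0 * atol
    q = [round(x / spacing) for x in u]
    return zlib.compress(struct.pack(f"<{len(q)}q", *q))


def decompress(blob, n, atol):
    """Invert `compress` up to the quantization error."""
    spacing = 2.0 * atol
    q = struct.unpack(f"<{n}q", zlib.decompress(blob))
    return [k * spacing for k in q]


def forward(u0, nt, every, atol):
    """Run nt steps, storing a compressed checkpoint of the state every `every` steps."""
    checkpoints = {}
    u = list(u0)
    for t in range(nt):
        if t % every == 0:
            checkpoints[t] = compress(u, atol)
        u = step(u)
    return checkpoints


def reverse_states(checkpoints, n, nt, every, atol):
    """Yield (t, u_t) for t = nt-1, ..., 0, in the order an adjoint sweep consumes them.

    Each segment restarts from a decompressed checkpoint and recomputes the
    intermediate states, so only the checkpoints (not all nt states) are stored.
    """
    for start in sorted(checkpoints, reverse=True):
        segment = [decompress(checkpoints[start], n, atol)]
        for _ in range(start + 1, min(start + every, nt)):
            segment.append(step(segment[-1]))
        for offset in range(len(segment) - 1, -1, -1):
            yield start + offset, segment[offset]
```

Because this toy step is a convex average, recomputed states stay within `atol` of the exact ones; that property need not hold for a real wave-equation update, where errors introduced at a checkpoint can be carried through the recomputed segment.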

I primarily see two aspects in the current manuscript that require some additional work, which I list below. Furthermore, some parts of the manuscript are a bit sloppy and could use more care. Several figures (8, 10, 11, 12, 14, 16) are not explained or even referenced in the text.

+++ Technical aspects

Can you provide more information on some technical aspects of the compression? For instance:
- How do you manage parallel file formats for distributed simulations with compressed chunks of data, where the size of each chunk might vary and be unknown prior to the simulation?
- Did you compare different compression algorithms, and can you comment on the computational overhead for (de)compressing the wavefield?
- Which fields do you compress (pressure, velocity, pressure gradient, …)? Do you apply different tolerances or compression strategies for different fields?
- Is it possible to ensure a priori an absolute or relative tolerance after decompression?
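Regarding the last question: for simple quantizers the bound holds by construction. The sketch below is not ZFP; it assumes plain scalar quantization and returns the reconstructed values directly (a real codec would encode the integer indices). Rounding to a grid of spacing 2·atol bounds the absolute error by atol; rounding log|x| to a grid of spacing 2·ln(1 + rtol) bounds the pointwise relative error by rtol. Function names are hypothetical.

```python
import math


def quantize_abs(values, atol):
    """Round to a uniform grid of spacing 2*atol, so |x - x_hat| <= atol."""
    spacing = 2.0 * atol
    return [round(x / spacing) * spacing for x in values]


def quantize_rel(values, rtol):
    """Round log|x| to a grid of spacing 2*ln(1+rtol), so |x - x_hat| <= rtol*|x|.

    Zeros are kept exactly, because log|x| is undefined there.
    """
    spacing = 2.0 * math.log1p(rtol)
    out = []
    for x in values:
        if x == 0.0:
            out.append(0.0)
        else:
            y = round(math.log(abs(x)) / spacing) * spacing
            out.append(math.copysign(math.exp(y), x))
    return out
```

The relative variant preserves small amplitudes far better than the absolute one, at the cost of spending more bits on values close to zero.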

+++ Error analysis

Can you elaborate on the absolute tolerance atol used in the numerical examples? I think it would be better to relate the tolerance to the maximum amplitude of the wavefield or of the source. The absolute number on its own is rather meaningless.

Looking at Fig. 7, I wonder whether the frequency content of the (de)compressed snapshot is altered, which would explain why the error grows in the first couple of time steps. Furthermore, is the decreasing trend simply a result of the decaying amplitudes in the wavefield, or is the error normalized in some way? It would help to also show relative errors per time step.
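Both requests can be met with per-time-step metrics such as those sketched below (hypothetical helper; snapshots assumed to be flat lists of floats). Normalizing the maximum error by the snapshot's peak amplitude puts atol in context, and the relative L2 error separates genuine error growth from the mere decay of wavefield amplitudes.

```python
import math


def error_report(exact, approx):
    """Per-time-step error metrics for a sequence of snapshots.

    Returns one dict per time step with the maximum absolute error, that error
    divided by the snapshot's peak amplitude, and the relative L2 error.
    """
    report = []
    for u, v in zip(exact, approx):
        diff = [a - b for a, b in zip(u, v)]
        max_err = max(abs(d) for d in diff)
        peak = max(abs(a) for a in u)
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_d = math.sqrt(sum(d * d for d in diff))
        report.append({
            "max_abs_error": max_err,
            "error_over_peak": max_err / peak if peak > 0 else math.inf,
            "rel_l2_error": norm_d / norm_u if norm_u > 0 else math.inf,
        })
    return report
```

For a snapshot and a copy of it scaled down by a factor of 10 with the error scaled the same way, the absolute error drops tenfold while the relative L2 error is unchanged; a falling absolute-error curve alone therefore cannot distinguish the two explanations.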

+++ Minor comments

page 2, line 22:
Referencing an equation long before it appears in the manuscript is bad style. Furthermore, the equation is not called TTI. TTI refers to the medium / model parameterization or stress-strain relation, respectively, and should not be used as an acronym without introduction.

page 2, Table 1:
“Forward propagation” is misleading and should be “time steps” instead. Calling it “peak memory” is very misleading for gradient computations because no reasonable implementation would do that.
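The objection rests on simple arithmetic: storing every time step scales with the number of time steps, whereas a checkpointing scheme stores a fixed number of snapshots and pays for it with recomputation. The hypothetical helpers below make the comparison explicit, assuming single precision (4 bytes per value).

```python
def full_storage_bytes(nt, grid_points, fields=1, bytes_per_value=4):
    """Memory needed to keep every one of nt time steps of `fields` wavefields."""
    return nt * grid_points * fields * bytes_per_value


def checkpoint_bytes(n_checkpoints, grid_points, fields=1, bytes_per_value=4,
                     compression_factor=1.0):
    """Memory needed to keep only n_checkpoints snapshots, optionally compressed."""
    return n_checkpoints * grid_points * fields * bytes_per_value / compression_factor
```

For example, with a hypothetical 1000 x 1000 grid and 1000 time steps, keeping every step of one field needs 4 GB, while 20 checkpoints compressed by a factor of 10 need 8 MB.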

page 3, line 48/49:
Either remove the reference to eq. (1) or state the equation here.

page 4, line 81:
Why is there an asterisk after Louboutin?

page 5, Figure 2:
Add labels and annotation to make it easier to read. The horizontal axis could count multiples of single simulations.

page 9, line 191:
What density model are you using? I would still consider it an inverse crime if it were a scaled version of the velocity model or even just a homogeneous model.

section 3:
This section seems a bit disconnected from the rest. I would recommend merging it with section 4. In particular, I don’t see a reason to introduce the subsections of section 4 already here with a single paragraph.

page 11, Figures 4 and 5:
How does the absolute tolerance of 1e-4 relate to the pressure amplitude? Which other fields do you compress? Is this really an absolute tolerance and not a relative one?

page 12, caption Figure 6:
There is a reference missing: “See figures A1 and ??”

page 12, Figure 6:
How do you define the signal to noise ratio in this case?
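The manuscript's definition is not stated here. One common choice, assumed in the sketch below (hypothetical function name), is the ratio of the energy of the reference field to the energy of the compression error, expressed in decibels: SNR = 10 log10(||u||² / ||u − û||²).

```python
import math


def snr_db(reference, approx):
    """SNR in decibels: 10*log10(||reference||^2 / ||reference - approx||^2)."""
    signal = sum(r * r for r in reference)
    noise = sum((r - a) ** 2 for r, a in zip(reference, approx))
    if noise == 0:
        return math.inf  # lossless reconstruction
    return 10.0 * math.log10(signal / noise)
```

Other conventions, such as a peak SNR based on the maximum amplitude, give different numbers for the same data, which is why the definition matters for interpreting the figure.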

page 15, Figure 10:
The same results are shown again in Fig. 19, so I would either show the entire x-axis here or remove the figure.

page 15, Figure 11:
What error is shown on the y-axis? And why is it huge?

page 17, line 269:
Please put atol in context to the maximum amplitude of the wavefields that are compressed.

page 17, Figure 14:
Should this be “true model” instead of “true solution”? The figure is not referenced in the text.

page 17, Figure 15:
Some of the recovered structure looks to be significantly smaller than a wavelength for the given frequency content. Could you comment on inverse crime? Are you inverting for density as well (see question on the density model above)?

page 20, line 293:
What does atol > 4 mean?

RR by Anonymous Referee #3 (27 Jan 2022)

ED: Publish subject to minor revisions (review by editor) (11 Mar 2022) by Adrian Sandu

AR by Navjot Kukreja on behalf of the Authors (26 Mar 2022) Author's response Author's tracked changes Manuscript

ED: Publish as is (11 Apr 2022) by Adrian Sandu

AR by Navjot Kukreja on behalf of the Authors (14 Apr 2022) Author's response Manuscript

Short summary

Full waveform inversion (FWI) is a partial differential equation (PDE)-constrained optimization problem that is notorious for its high computational load and memory footprint. In this paper we present a method that combines recomputation with lossy compression to accelerate the computation with minimal loss of precision in the results. We demonstrate this with FWI experiments using a variety of compression settings on a popular academic dataset.