spyro: a Firedrakebased wave propagation and full waveform inversion finite element solver
 ^{1}Dept. of Mining and Petroleum Engineering, Escola Politécnica, University of São Paulo
 ^{2}Dept. of Mechanical Engineering, Escola Politécnica, University of São Paulo
 ^{3}Dept. of Mathematics, Baylor University
 ^{1}Dept. of Mining and Petroleum Engineering, Escola Politécnica, University of São Paulo
 ^{2}Dept. of Mechanical Engineering, Escola Politécnica, University of São Paulo
 ^{3}Dept. of Mathematics, Baylor University
Abstract. In this article, we introduce spyro, a software stack to solve acoustic wave propagation in heterogeneous domains and perform full waveform inversion (FWI) employing the finite element framework from Firedrake, a highlevel Python package for the automated solution of partial differential equations using the finite element method. The capability of the software is demonstrated by using a continuous Galerkin approach to perform FWI for seismic velocity model building, considering realistic geophysics examples. A timedomain FWI approach is detailed that uses meshes composed of variably sized triangular elements to discretize the domain. To resolve both the forward and adjointstate equations, and to calculate a meshindependent gradient associated with the FWI process, a fullyexplicit, variable higherorder (up to degree k = 5 in 2D and k = 3 in 3D) mass lumping method is used. We show that, by adapting the triangular elements to the expected peak source frequency and properties of the wavefield (e.g., local Pwavespeed) and by leveraging higherorder basis functions, the number of degreesoffreedom necessary to discretize the domain can be reduced. Results from wave simulations and FWIs in both 2D and 3D highlight our developments and demonstrate the benefits and challenges with using triangular meshes adapted to the material properties.
Keith J. Roberts et al.
Status: closed

RC1: 'Comment on gmd2021363', Anonymous Referee #1, 24 Jan 2022
# spyro: a Firedrakebased wave propagation and full waveform inversion finite element solver
Thank you for submitting your manuscript. The paper reads nicely and is well written. The objectives and proposed results are interesting and are a good fit for this journal.Overall, I would recommend minor revision as some details need to be improved and the presented results are not fully convincing. I think this software could be of benefit to the geophysical community and that an improved manuscript would greatly improve its outreach.
Mainly my concerns are:
 The computational aspect, emphasized in the introduction, needs more clarity and some type of baseline The author does not discuss meshing when a "good smooth model" is not available. What would be the strategy if the mesh needs to be modified at each (couple) iteration. The 3D results are not very accurate. While I understand these take a lot of resources and can not necessarily be rerun, justification on the accuracy is needed to justify the use of that software.
The author needs to justify better why the software they propose improves the current software landscape for FWI.
Please see below for more details about the manuscript.
# Detailed comments
Line 18:This is a bit terse, other methods exist (NMO, WEMVA, kirchoff, Depth/time extrapolation) and are widely used.
Line 20:"However, the FWI problem is challenging to apply in practice since there exists a nonunique 20 configuration of data that can best explain the observations. "
This is commonly described as cycle skipping. Please add references to the literature.
Line 47:One other problematic issue (the main one I would say) that isn't mentioned is the model dependence of the mesh. If the meshing becomes velocity/Unkown dependent then in theory, it should be added to the mathematical definition of the problem and the gradient of the mesh with respect to the velocity must be taken. This problem leads in some cases into elements wrapping onto themselves.This should be discussed by the author, and considering the mesh independent of the inverse problem needs to be justified.
Figure 1: Did you make that figure? This looks fairly familiar so please add a reference in case this is from somewhere else.
Equation 1: This is not the conventional way to define the wave equation in an acoustic medium. Ignoring the PML for simplicity, the acoustic wave equation (d'Alember operator) is defined as`diff(u, (t, 2))  c^2 laplacian(u)`. including the spatially varying wave speed as `div(c ^ 2 grad(u))` is quite unconventional, what is the justification behind this formulation?
Equation 9: Following the point above, the expression for the sensitivity is quite unconventional as well. (usually is either `2/c^3` or `1` depending on whether the wavespeed or squared slowness `1/c^2` is considered.
Gradient subsampling: This is known to lead to artifacts and inexact gradient. What exact subsampling ratio is used and how much information is lost with that subsampling (maybe compare gradients with and without subsampling).
Line 379: " were not ready to be used in our FWI code." So did you implement the adjoint by hand?
Line 380: Rapid Optimization Library. Any reason you did not choose a standard python library easier to use (i.e scipy) since, as you mentioned in the introduction, the bulk of the computing is the wave equation solves. Therefore "fast" optimization implementation doesn't really impact the overall performance.
Line 386: "we exclusively rely on the secondorder optimization method LBFGS (". LBFGS rely heavily on exact gradient information. Subsampling the gradient leads to an inexact gradient that doesn't satisfy lbfgs requirement to guarantee better convergence.
Line 451: "parallelism is trivial and handled by splitting the MPI communicator into groups of processes at initialization" Using MPI for separable task parallelism is known to be overkill and prone to issues. Any reason more adapted task parallelism (Dask, spark, ...) was not used?
The computational runtime, while informative, does not say much about the performance of the proposed method without any reference. How long would it take to run FWI with FD with the same amount of resources? And how much memory?
## 2D Marmousi
The peak RAM usage seems a bit high. AS a comparison, as standard FD on this model (17km x 3 km discretized on a 20m grid) requires `151*851*1251*4/(1024^3) = .6Gb` of memory for a wavefield sampled at 4ms (`r=4` for your case) which s about 5x10X lower than what your method seems to require. Can you elaborate on the memory needs of your method?
The runtimes need a reference as by themselves they do not provide any information.
## 3D overthrustThe 3D overthrust results arent' very convincing as they show a lot of artifacts and noise. I would consider trying to improve those. The author states "which is likely a result of poor source illumination beyond several kilometers of depth.". The illumination is usually corrected for with the hessian (or some diagonal approximation such as `1/norm(u_^2)`. Because you use lBFGS you should have an approximation of the hessian that is quite accurate after that many iterations. Therefore this interpretation seems incorrect. You can see in related work (such as Witte et. al you refer to) that the deep part of the model can be recovered with a god initial model such as yours.
A 5Hz peak wavelet is unrealistic as it leads to <1Hz data not being feasible in real life. Lowfrequency FWI is usually performed for frequencies >3Hz. This could be achieved with a higher frequency wavelet (ie 15 Hz) than band/lowpass your data to 38Hz for inversion.
AC1: 'Reply on RC1', Alexandre Olender, 01 Apr 2022
The comment was uploaded in the form of a supplement: https://gmd.copernicus.org/preprints/gmd2021363/gmd2021363AC1supplement.pdf

AC1: 'Reply on RC1', Alexandre Olender, 01 Apr 2022

RC2: 'Comment on gmd2021363', Anonymous Referee #2, 27 Jan 2022
In this manuscript, the authors present spyro, a full waveform inversion finite element solver based on Firedrake. The solver features wavespeedadapted triangular/tetrahedral meshes, a fullyexplicit time stepping scheme based on a masslumping technique, and conforming elements of variable orders (up to degree 5 in 2D, and 3 in 3D).
The paper is wellwritten and fits the scope of this journal. The numerical results also look promising and the proposed solver seems to have several advantages over other FWI methods. I only have a few remarks/questions:
p3, line 61: spectral triangular elements are known at least up to degree 9 (see, e.g., Mulder 2013: New triangular masslumped finite elements of degree six for wave propagation, Cui et al. 2017: High order masslumping finite elements on simplexes, and Liu et al. 2017: Higherorder triangular spectral element method with optimized cubature points for seismic wavefield modeling).
p6, line 161: \sigma_i is not defined here. I suggest to give the definition of \sigma_i and \Psi_i here and explain that p_i and w only need to be computed in the boundary layer.
Section 2.2: is there a reference for the derivation of the adjoint equations? Especially the boundary conditions given on page 8, line 207 need some explanation. The adjoint problem has 2 boundary conditions while the original problem had only 1.
p9, line 230: definition of F is missing.
Section 3.1: is the stiffness matrix assembled and stored or are the computations matrixfree?
p12, top line: In 3D, alpha is computed taking the cube root?
p12, definitions A_{n+1}, A_n, A_{n1}: please double check these definitions.
In A_n topright: should be M_{omega, 1} instead of M_{omega}.
In A_n bottomright: should be 0.
In A_{n1} second row: second and third term should be swapped.
In A_{n1} bottomright: should have a minus sign.
p14, equation 36: the definition of G is not really clear. What is its continuous counterpart? Also, the righthandside should be a vector. Please give a more precise definition.
p16, line 396: 32 nodes instead of 50.
p17, Algorithm 1: step 10 is done via L_BFGS and step 11 via ROL? Or are both used for both steps?
p17, line 441: equation (36) instead of (19)?
p18, Figure 5: This figure is rather unclear. Does gradient.py do steps 8+9 of Algorithm 1? This figure also contains several equations, whereas I would expect steps of an algorithm.
Section 5: it seems that the zcoordinate is always given first. Please explicitly mention this somewhere.
p22, line 548550 and Figure 8: for a fixed p, the relation between C and G is actually linear. In Figure 8, straight lines are only expected when using a loglog plot. I would suggest to use a logscaling for the horizontal axis in Figure 8 or remove the linear fits.
p26, final paragraph: please doublecheck the domain sizes.
p32, Table 3: it would be convenient to also have the final J here as an additional column.
Section 6: with 2 full pages, the conclusions seem to be overly long. Please try to make it shorter and more concise. A summary of the results and corresponding conclusions should be the main focus. Ideally, this section has one or just a few clear takeaway messages.
p41, Appendix: The definition of superscripts x_k/x_l are missing. Also, does there need to be a summation over l in the last two equations?
AC2: 'Reply on RC2', Alexandre Olender, 01 Apr 2022
The comment was uploaded in the form of a supplement: https://gmd.copernicus.org/preprints/gmd2021363/gmd2021363AC2supplement.pdf

AC2: 'Reply on RC2', Alexandre Olender, 01 Apr 2022
Status: closed

RC1: 'Comment on gmd2021363', Anonymous Referee #1, 24 Jan 2022
# spyro: a Firedrakebased wave propagation and full waveform inversion finite element solver
Thank you for submitting your manuscript. The paper reads nicely and is well written. The objectives and proposed results are interesting and are a good fit for this journal.Overall, I would recommend minor revision as some details need to be improved and the presented results are not fully convincing. I think this software could be of benefit to the geophysical community and that an improved manuscript would greatly improve its outreach.
Mainly my concerns are:
 The computational aspect, emphasized in the introduction, needs more clarity and some type of baseline The author does not discuss meshing when a "good smooth model" is not available. What would be the strategy if the mesh needs to be modified at each (couple) iteration. The 3D results are not very accurate. While I understand these take a lot of resources and can not necessarily be rerun, justification on the accuracy is needed to justify the use of that software.
The author needs to justify better why the software they propose improves the current software landscape for FWI.
Please see below for more details about the manuscript.
# Detailed comments
Line 18:This is a bit terse, other methods exist (NMO, WEMVA, kirchoff, Depth/time extrapolation) and are widely used.
Line 20:"However, the FWI problem is challenging to apply in practice since there exists a nonunique 20 configuration of data that can best explain the observations. "
This is commonly described as cycle skipping. Please add references to the literature.
Line 47:One other problematic issue (the main one I would say) that isn't mentioned is the model dependence of the mesh. If the meshing becomes velocity/Unkown dependent then in theory, it should be added to the mathematical definition of the problem and the gradient of the mesh with respect to the velocity must be taken. This problem leads in some cases into elements wrapping onto themselves.This should be discussed by the author, and considering the mesh independent of the inverse problem needs to be justified.
Figure 1: Did you make that figure? This looks fairly familiar so please add a reference in case this is from somewhere else.
Equation 1: This is not the conventional way to define the wave equation in an acoustic medium. Ignoring the PML for simplicity, the acoustic wave equation (d'Alember operator) is defined as`diff(u, (t, 2))  c^2 laplacian(u)`. including the spatially varying wave speed as `div(c ^ 2 grad(u))` is quite unconventional, what is the justification behind this formulation?
Equation 9: Following the point above, the expression for the sensitivity is quite unconventional as well. (usually is either `2/c^3` or `1` depending on whether the wavespeed or squared slowness `1/c^2` is considered.
Gradient subsampling: This is known to lead to artifacts and inexact gradient. What exact subsampling ratio is used and how much information is lost with that subsampling (maybe compare gradients with and without subsampling).
Line 379: " were not ready to be used in our FWI code." So did you implement the adjoint by hand?
Line 380: Rapid Optimization Library. Any reason you did not choose a standard python library easier to use (i.e scipy) since, as you mentioned in the introduction, the bulk of the computing is the wave equation solves. Therefore "fast" optimization implementation doesn't really impact the overall performance.
Line 386: "we exclusively rely on the secondorder optimization method LBFGS (". LBFGS rely heavily on exact gradient information. Subsampling the gradient leads to an inexact gradient that doesn't satisfy lbfgs requirement to guarantee better convergence.
Line 451: "parallelism is trivial and handled by splitting the MPI communicator into groups of processes at initialization" Using MPI for separable task parallelism is known to be overkill and prone to issues. Any reason more adapted task parallelism (Dask, spark, ...) was not used?
The computational runtime, while informative, does not say much about the performance of the proposed method without any reference. How long would it take to run FWI with FD with the same amount of resources? And how much memory?
## 2D Marmousi
The peak RAM usage seems a bit high. AS a comparison, as standard FD on this model (17km x 3 km discretized on a 20m grid) requires `151*851*1251*4/(1024^3) = .6Gb` of memory for a wavefield sampled at 4ms (`r=4` for your case) which s about 5x10X lower than what your method seems to require. Can you elaborate on the memory needs of your method?
The runtimes need a reference as by themselves they do not provide any information.
## 3D overthrustThe 3D overthrust results arent' very convincing as they show a lot of artifacts and noise. I would consider trying to improve those. The author states "which is likely a result of poor source illumination beyond several kilometers of depth.". The illumination is usually corrected for with the hessian (or some diagonal approximation such as `1/norm(u_^2)`. Because you use lBFGS you should have an approximation of the hessian that is quite accurate after that many iterations. Therefore this interpretation seems incorrect. You can see in related work (such as Witte et. al you refer to) that the deep part of the model can be recovered with a god initial model such as yours.
A 5Hz peak wavelet is unrealistic as it leads to <1Hz data not being feasible in real life. Lowfrequency FWI is usually performed for frequencies >3Hz. This could be achieved with a higher frequency wavelet (ie 15 Hz) than band/lowpass your data to 38Hz for inversion.
AC1: 'Reply on RC1', Alexandre Olender, 01 Apr 2022
The comment was uploaded in the form of a supplement: https://gmd.copernicus.org/preprints/gmd2021363/gmd2021363AC1supplement.pdf

AC1: 'Reply on RC1', Alexandre Olender, 01 Apr 2022

RC2: 'Comment on gmd2021363', Anonymous Referee #2, 27 Jan 2022
In this manuscript, the authors present spyro, a full waveform inversion finite element solver based on Firedrake. The solver features wavespeedadapted triangular/tetrahedral meshes, a fullyexplicit time stepping scheme based on a masslumping technique, and conforming elements of variable orders (up to degree 5 in 2D, and 3 in 3D).
The paper is wellwritten and fits the scope of this journal. The numerical results also look promising and the proposed solver seems to have several advantages over other FWI methods. I only have a few remarks/questions:
p3, line 61: spectral triangular elements are known at least up to degree 9 (see, e.g., Mulder 2013: New triangular masslumped finite elements of degree six for wave propagation, Cui et al. 2017: High order masslumping finite elements on simplexes, and Liu et al. 2017: Higherorder triangular spectral element method with optimized cubature points for seismic wavefield modeling).
p6, line 161: \sigma_i is not defined here. I suggest to give the definition of \sigma_i and \Psi_i here and explain that p_i and w only need to be computed in the boundary layer.
Section 2.2: is there a reference for the derivation of the adjoint equations? Especially the boundary conditions given on page 8, line 207 need some explanation. The adjoint problem has 2 boundary conditions while the original problem had only 1.
p9, line 230: definition of F is missing.
Section 3.1: is the stiffness matrix assembled and stored or are the computations matrixfree?
p12, top line: In 3D, alpha is computed taking the cube root?
p12, definitions A_{n+1}, A_n, A_{n1}: please double check these definitions.
In A_n topright: should be M_{omega, 1} instead of M_{omega}.
In A_n bottomright: should be 0.
In A_{n1} second row: second and third term should be swapped.
In A_{n1} bottomright: should have a minus sign.
p14, equation 36: the definition of G is not really clear. What is its continuous counterpart? Also, the righthandside should be a vector. Please give a more precise definition.
p16, line 396: 32 nodes instead of 50.
p17, Algorithm 1: step 10 is done via L_BFGS and step 11 via ROL? Or are both used for both steps?
p17, line 441: equation (36) instead of (19)?
p18, Figure 5: This figure is rather unclear. Does gradient.py do steps 8+9 of Algorithm 1? This figure also contains several equations, whereas I would expect steps of an algorithm.
Section 5: it seems that the zcoordinate is always given first. Please explicitly mention this somewhere.
p22, line 548550 and Figure 8: for a fixed p, the relation between C and G is actually linear. In Figure 8, straight lines are only expected when using a loglog plot. I would suggest to use a logscaling for the horizontal axis in Figure 8 or remove the linear fits.
p26, final paragraph: please doublecheck the domain sizes.
p32, Table 3: it would be convenient to also have the final J here as an additional column.
Section 6: with 2 full pages, the conclusions seem to be overly long. Please try to make it shorter and more concise. A summary of the results and corresponding conclusions should be the main focus. Ideally, this section has one or just a few clear takeaway messages.
p41, Appendix: The definition of superscripts x_k/x_l are missing. Also, does there need to be a summation over l in the last two equations?
AC2: 'Reply on RC2', Alexandre Olender, 01 Apr 2022
The comment was uploaded in the form of a supplement: https://gmd.copernicus.org/preprints/gmd2021363/gmd2021363AC2supplement.pdf

AC2: 'Reply on RC2', Alexandre Olender, 01 Apr 2022
Keith J. Roberts et al.
Data sets
Simulation scripts and data for full waveform inversion using spyro Keith J. Roberts https://doi.org/10.5281/zenodo.5172307
Model code and software
spyro V0.1.0 Keith J. Roberts; Alexandre Olender; Lucas Franceschini https://doi.org/10.5281/zenodo.5164113
Keith J. Roberts et al.
Viewed
HTML  XML  Total  BibTeX  EndNote  

904  271  22  1,197  13  9 
 HTML: 904
 PDF: 271
 XML: 22
 Total: 1,197
 BibTeX: 13
 EndNote: 9
Viewed (geographical distribution)
Country  #  Views  % 

Total:  0 
HTML:  0 
PDF:  0 
XML:  0 
 1