Massively parallel modeling and inversion of electrical resistivity tomography data using PFLOTRAN

. Electrical resistivity tomography (ERT) is a broadly accepted geophysical method for subsurface investigations. Interpretation of ﬁeld ERT data usually requires the application of computationally intensive forward modeling and inversion algorithms. For large-scale ERT data, the efﬁciency of these algorithms depends on the robustness, accuracy, and scalability on high-performance computing resources. In this regard, we present a robust and highly scalable implementation of forward modeling and inversion algorithms for ERT data. The implementation is publicly available and developed within the framework of PFLOTRAN, an open-source, state-of-the-art massively parallel subsurface ﬂow and transport simulation code. The forward modeling is based on a ﬁnite-volume discretization of the governing differential equations, and the inversion uses a Gauss– Newton optimization scheme. To evaluate the accuracy of the forward modeling, two examples are ﬁrst presented by considering layered (1D) and 3D earth conductivity models. The computed numerical results show good agreement with the analytical solutions for the layered earth model and results from a well-established code for the 3D model. Inversion of ERT data, simulated for a 3D model, is then performed to demonstrate the inversion capability by recovering the conductivity of the model. To demonstrate the parallel performance of PFLOTRAN’s ERT process model and inversion capabilities, large-scale scalability tests are performed by us-ing up to 131072 processes on a leadership class supercom-puter. These tests are performed for the two most computationally intensive steps of the ERT inversion: forward modeling and Jacobian computation. For the forward modeling, we consider models with up to 122 × 10 6 degrees of freedom (DOFs) in the resulting system of linear equations and demonstrate that the code exhibits almost linear scalability on up to 10000 DOFs per process. On the other hand, the code shows superlinear scalability for the Jacobian computation, mainly because all computations are fairly evenly distributed over each process with no parallel communication.

ERT data simulation is performed by solving the electrostatic Poisson equation.Numerical methods are needed to solve the governing equation in arbitrary 3D conductivity structures.The most used numerical methods for 3D ERT modeling are the finite-difference (FD) method (Dey and Morrison, 1979;Spitzer, 1995;Penz et al., 2013), finiteelement (FE) method (Coggon, 1971;Li and Spitzer, 2002;Rücker et al., 2006;Blome et al., 2009;Johnson et al., 2010;Ren et al., 2018), and integral equation method (Lee, 1975;Schulz, 1985;Méndez-Delgado et al., 1999).Among these methods, the FD and FE methods are attractive choices for highly heterogeneous distributions of electrical conductivity in the subsurface.However, application of the FD method is mostly limited to simple model geometries due to its requirement of Cartesian grids (Rücker et al., 2006).By contrast, the FE method has proven to be effective in accounting for complex geometries, specifically by using unstructured meshes (Rücker et al., 2006;Blome et al., 2009;Johnson et al., 2010;Ren et al., 2018).
The finite-volume (FV) method can also be used effectively to account for complex geometries (Jahandari and Farquharson, 2014).The FV method is usually seen relatively close to the FD method as it inherits the simplicity similar to the FD method in its implementation.Unlike the FD method which discretizes the differential form of the governing equation, the FV method, however, directly discretizes its integral form.Nevertheless, the FV method has rarely been considered for ERT simulation problems.In our literature search, we came across only Cockett et al. (2015), which implements the method for the ERT modeling.
On the other hand, ERT data inversion is a nonlinear optimization problem to minimize a cost function that represents a measure of the difference between observed and simulated ERT data.For large-scale 3D ERT data, the inversion is often performed iteratively by linearizing the optimization problem at each iteration, calculating a model update, updating a conductivity model, and subsequently simulating ERT data for the updated model to examine the new cost function.The process continues until the cost function reduces to a predefined tolerance level.The model update is typically calculated using a gradient-based local optimization method, e.g., the steepest-descent, conjugate-gradient, or the Gauss-Newton method (Park and Van, 1991;Ellis and Oldenburg, 1994;Zhdanov and Keller, 1994;Zhang et al., 1995;Günther et al., 2006;Johnson et al., 2010).The Gauss-Newton method has usually been preferred due to its high convergence rate.Indeed, the fast convergence occurs at the cost of intensive computations because the method requires building the Jacobian (or sensitivity) and/or Hessian matrices at each inversion iteration (Nocedal and Wright, 2006).
For large-scale 3D ERT problems with tens of millions of degrees of freedom (DOFs), forward modeling and inversion are computationally demanding jobs and may need hours, if not days, of computing time if the underlying algorithms are not properly implemented to scale well on high-performance computing (HPC) resources or supercomputers.For such datasets, it is computationally infeasible to use desktopbased ERT data processing software without exploiting HPC resources.Recently, a few open-source 3D ERT modeling and inversion codes have been developed to handle moderately sized ERT problems composed of up to a few million DOFs, e.g., ResIPy (Blanchy et al., 2020), SimPEG (Cockett et al., 2015), and pyGIMLI (Rücker et al., 2017).However, these codes still lack full-scale parallel implementations to efficiently distribute computations over a supercomputer.On the other hand, another open-source code, E4D, is highly scalable on HPC resources (Johnson et al., 2010;Johnson and Wellman, 2015).It can be run on hundreds or even thousands of processes (or cores) on a supercomputer (Johnson et al., 2010).But despite this, the maximum number of processes that it can exploit is limited to the number of electrodes in a survey.
In this paper, we present an implementation of massively parallel 3D ERT modeling and inversion algorithms.The forward modeling is based on the FV method, and inversion employs the Gauss-Newton method.The algorithms are implemented within PFLOTRAN (https://www.pflotran.org,last access: 27 January 2023), an open-source, massively parallel code that leverages parallel data structures and solvers within PETSc (Portable, Extensible Toolkit for Scientific Computation) (Balay et al., 2021) to simulate subsurface flow and transport processes on supercomputers (Hammond et al., 2012(Hammond et al., , 2014)).Although we are presenting only ERT modeling and inversion capabilities here, the main motivation for implementing these capabilities in PFLOTRAN is to perform coupled modeling and inversion of ERT, flow, and/or transport problems in the future.
The paper is structured as follows: we first outline a detailed background theory on the ERT forward modeling including the governing Poisson equation and FV method to solve it numerically.We then describe the ERT inversion by defining it as a nonlinear optimization problem and the Gauss-Newton method to find a minimizer of the problem along with details on computing the Jacobian matrix.Thereafter, we discuss the parallel implementations of the modeling and inversion algorithms.We subsequently benchmark numerical results computed using PFLOTRAN against analytical solutions for a layered (1D) earth model and numerical solutions for a 3D earth model.Next, a 3D inversion result is presented to illustrate the inversion capability.Scalability tests are then performed to demonstrate the parallel performance of the PFLOTRAN ERT process model before discussing the results and drawing our final concluding remarks.

Forward ERT modeling
Forward modeling is a way of simulating ERT data for any given 1D, 2D, or 3D electrical conductivity models of the earth.For 1D conductivity models, the electrical potentials generated by an induced point current source can be obtained by using analytical or semi-analytical methods (Das, 1995;Pervago et al., 2006).However, for multi-dimensional heterogeneous conductivity models with complex geometries, one must use numerical methods to compute the electrical potential.This section describes the numerical computations of the electrical potentials in a 3D medium and their superposition to produce simulated ERT data.

Poisson's equation
The ERT forward modeling is governed by the following electrostatic Poisson equation: where φ is the electrical potential at position r for a given conductivity σ (r) due to a current I injected through a point located at r s and δ is the Dirac delta function.For brevity, hereinafter, the dependencies on r will be omitted except where it is necessary to show them.
If side and bottom boundaries ∂ of the 3D computational domain are located at sufficiently far from the current injection location r s , the potential and the normal components of the current density σ ∂φ ∂n asymptotically approach zero.On the top or surface boundary ∂ S , the normal current density σ ∂φ ∂n is zero as no current flows through the earth surface along the outward normal vector.
To simulate φ in arbitrary 3D conductivity structures, Eq. (1) needs to be solved using numerical methods subject to the boundary conditions in Eq. (2).

Finite-volume method
The FV method is implemented to compute φ by solving Eq. (1).The 3D computational domain is first discretized into a set of control volumes known as cells (Fig. 1).These cells can be of arbitrary shape and size, but they must be bounded by planar surfaces for our implementation.The bounding discrete surfaces are known as cell faces.
Let us consider the governing Poisson equation (Eq. 1) and integrate it over the ith cell of the domain .This gives where V i is the volume of the ith cell.By applying the Gauss divergence theorem in Eq. ( 3) and using the translation property of the Dirac delta function, we get where S i is the surface area of the bounding faces of the ith cell, and dS is a differential area on the bounding surface with a unit surface normal n pointing outward.
If the ith cell is bounded by N f faces, Eq. ( 4) can be replaced by an equivalent discrete form where f represents faces and S f is the area of the bounding face f .Using a two-point flux approximation for each face term in Eq. ( 5) by considering two adjacent cells i 1 and i 2 shown in Fig. 1, we get where S nf is the area of the common face f projected onto the plane normal to the vector connecting centers of cells i 1 and i 2 .
The conductivity at the face, σ f , can be obtained by averaging the conductivity of cells i 1 and i 2 .We use the following harmonic distance-weighted scheme to calculate σ f : where σ i 1 and σ i 2 are the conductivities of cells i 1 and i 2 and d i 1 and d i 2 are the distances of the projected face from the center of cells i 1 and i 1 , respectively (Fig. 1).
The value of the flux at the face f , i.e., (∇φ) f , is obtained by using the approximation Substituting Eqs. ( 7) and (8) in Eq. ( 6) and assuming that the computational domain is subdivided into N m FV cells, i.e., = ∪ N m i=1 i , where i ⊂ R 3 is the ith cell domain, we have where A i,k is the local coefficient matrix and s i is the source vector for the ith cell.
The assembly of the local coefficient matrix into a global system, upon carrying out the summation in Eq. ( 9), results into the system of linear equations where A ∈ R N m ×N m is the symmetric system matrix resulting from the FV discretization of Eq. ( 1), φ ∈ R N m is a vector containing the unknown electric potential for all cells, and s ∈ R N m is the source vector resulting from the right-hand side of Eq. (1).To solve this linear system, we utilize various options of iterative solvers available from PETSc.Each electrode in an ERT survey results in a different vector s.Thus, for N e electrodes, we need to solve Eq. ( 10) for N e times with a different right-hand side.The solutions provide electrical potential at each cell center of the discrete computational domain and for each current injection location.Finally, the potential distribution for any combination of source/sink electrodes can be obtained by subtracting the potential associated with the sink electrodes from the source electrode potential (Johnson et al., 2010).

Inversion
ERT inversion is a procedure of estimating electrical conductivity of the earth from a set of observed ERT data and/or some imposed constraints.The ERT inverse problem is a nonlinear optimization problem.For large-scale 3D ERT data, the inversion is often performed iteratively by linearizing the problem at each iteration.This section details our approach of performing inversion of ERT data.

Optimization problem
We pose the inverse problem as a nonlinear minimization problem of finding a conductivity model where m = (m 1 , m 2 , . .., m N m ) T is an unknown model parameter vector in the model set M. Since the conductivity within subsurface can vary over several order of magnitudes, the inversion operations are performed for logarithmic conductivity, i.e., we assume m = ln σ .This also enforces positivity of conductivity values.The quantity ψ is the cost function, defined as where ψ d is the data cost function that represents a measure of the misfit between the observed d obs and synthetic d syn ERT data, ψ m is the regularization cost function representing a measure of the variation in the model parameters, and β is a regularization parameter.
A least-squares method is used to minimize ψ (Nocedal and Wright, 2006).Thus, ψ d and ψ m can be expressed as where vectors d obs and d syn are N d dimensional vectors containing the observed and synthetic ERT data.
is the data weighting matrix, usually a diagonal matrix, whose elements are estimated based on the standard deviation of the noise; W m ∈ R N m ×N m is the regularization matrix, and m ref is a N m dimensional vector containing the logarithmic of reference conductivity model parameters.

Gauss-Newton method
To find a minimizer m inv of the cost function in Eq. ( 12), we use a Gauss-Newton minimization approach.This is an iterative approach where after linearizing ψ(m k ) in the vicinity of a model m k for a small model perturbation δm k at the kth iteration, we obtain a set of linear equations or the normal equation where the Gauss-Newton Hessian given by and the gradient vector by The matrix J k is the Jacobian matrix also known as the sensitivity or the Fréchet derivative matrix.Following Johnson et al. (2010), we implement a parallel conjugate-gradient least-square (CGLS) solver (Hestenes and Stiefel, 1952) to solve the normal Eq. ( 15).The normal equation is first reformulated into an equivalent least-squares problem and then solved for the model update δm k using only the Jacobian matrix J k without explicitly forming the Hessian matrix H k .As N m N d for typical ERT surveys, this approach avoids the massive computational cost associated with the explicit formation of H k .We therefore only need to compute J k at each iteration, which is also a computationally intensive step.The effective computation of J k is explained in the next subsection (Sect.3.3).
The solution of Eq. ( 15) gives a model update vector δm k at the kth iteration such that a new model decreases the cost function ψ, i.e., ψ(m k+1 ) < ψ(m k ).
There is a common practice to find step length α using a line search method (Nocedal and Wright, 2006); however, from our experience, a full step length, i.e., α = 1, works well for the ERT data inversion.
In our implementation, the Gauss-Newton iteration usually starts with an average apparent conductivity model as an initial model and continues until the data misfit drops below a predefined tolerance level (defined by a chi-squared data misfit χ 2 = φ d /N d ≤ 1) or it reaches a predefined maximum number of iterations.Furthermore, a predefined input conductivity model can also be used as an initial model if it can be estimated from the existing geological and/or geophysical information.After the inversion converges, it provides the minimizer m inv which represents the inverted conductivity model σ inv = exp(m inv ).

Jacobian computation
Calculating J at each iteration is one of the most computationally intensive steps in Gauss-Newton inversion.Therefore, an efficient J drives the robustness of the developed inversion scheme and maximizes its scalability.The Jacobian matrix characterizes the change in the synthetic ERT data d syn relative to a change in the model parameters m and is defined as the partial derivatives of d syn with respect to m = ln σ ; i.e., An adjoint state method is used to effectively compute the Jacobian matrix.Let A and B be the source and sink electrodes located at r A and r B and M and N be the two measuring or receiving electrodes located at r M and r N .The potential difference between M and N can be obtained using the potential computed due to the source A at M and N, i.e., φ AM and φ AN , and the sink B at M and N, i.e., φ BM and φ BN .A simplified four-electrodes Wenner configuration is shown in Fig. 2. Note that, contrary to this configuration, electrodes can be placed in any other configurations, e.g., Schlumberger, pole-pole, pole-dipole, dipole-dipole, and gradient arrays (Dahlin and Zhou, 2004).
Using the expression derived in Appendix A (Eq. A9), an element of the Jacobian matrix is obtained by where φ A and φ B are potential distributions due to the source A and sink B and φ M and φ N are potential distributions due to pseudo-source and pseudo-sink, respectively, at the receiving electrodes M and N.
Assuming φ S = φ A − φ B and φ R = φ M − φ N , where φ S is the net potential due to the source and sink electrodes A and B, and φ R is the net potential due to source and sink assumed to be located, respectively, at the receiving electrodes M and N, this results in Equation ( 21) is valid for any numbers of the source/sink electrodes and the receiving electrodes as well as any of their combinations.One only needs to compute the net potentials due to the source/sink electrodes φ S and receiving electrodes φ R using the superposition principle.

Parallel implementation within PFLOTRAN
The parallel implementation strategy for the ERT forward modeling and inversion capabilities follows the existing PFLOTRAN's strategy for flow and transport process models.Forward ERT modeling is implemented by extending PFLOTRAN's hierarchy of Fortran classes and data structures to create a new ERT process model that constructs and solves the linear systems of equations described in Sect. 2. The implementation of inversion, as described in Sect.3, is https://doi.org/10.5194/gmd-16-961-2023 Geosci.Model Dev., 16, 961-976, 2023 completely new to PFLOTRAN and requires all the necessary steps (or calculations) for ERT inversion, such as the cost-function computation, regularization, Jacobian computation, solution to the dense Gauss-Newton normal equation, and model update.
To execute simulations in parallel, domain decomposition is employed to distribute the discrete computational domain across (computer) processes using a division along the three principal axes for structured grids (calculated by PETSc or explicitly defined by the user), or in the case of unstructured grids, the ParMETIS parallel graph partitioner (Karypis and Schloegel, 2013) is employed to divide the domain.All interprocess communication is carried out with MPI (Message Passing Interface), and PETSc's parallel data structures facilitate much of the MPI communication.
Efficient partitioning and the overlapping of communication with computation (e.g., calculating floating point operations while data are being transferred between processes) helps improve parallel performance.The latter helps mask communication costs.The goal is to maximize the time spent in computation and minimize interprocess communication and enforce load balance, the even distribution of computation across processes employed.Efficient partitioning can be accomplished by dividing the domain evenly and minimizing the number and size of (MPI) messages being passed between processes.However, an even distribution does not necessarily guarantee load balance as certain regions of the domain with time-consuming operations (e.g., localized physics, intense input/output or I/O) may require more computation.Thus, efficient partitioning is not trivial.
PFLOTRAN's parallel file I/O is through HDF5 (Hierarchical Data Format version 5), a platform-independent binary file format that leverages MPI-IO operations to read/write a single file using multiple (potentially thousands of) processes (The HDF Group, 2022).PFLOTRAN writes ASCIIformatted observation files (files storing ERT data at specific locations in space) locally on the processes owning the observation points.Thus, the writing of observation files requires no parallel communication.For more comprehensive details on the parallel implementation strategy employed within PFLOTRAN, interested readers are referred to Hammond et al. (2012Hammond et al. ( , 2014)).

Modeling benchmarking results
To examine the accuracy of the PFLOTRAN ERT process model, simulation results are compared to solutions obtained using well-established analytical and numerical methods.Two cases are considered for the comparison: three-layer earth and 3-D earth resistivity models.All benchmarking simulations were performed on the Deception supercomputer housed at the Pacific Northwest National Laboratory.It is composed of 96 compute nodes where each node has 64-core The model boundaries are extended to ±3000 m in the x and y directions and 3000 m in the z direction to accommodate zero Dirichlet boundary conditions at the side and bottom boundaries.These boundary extensions are not shown in Fig. 3. Vertical electrical sounding (VES) data are simulated at the center of the model considering a Wenner configuration.This configuration consists of four electrodes A, B, M, and N, which are deployed in-line with an equal spacing a between the two neighboring electrodes (Fig. 2).Electrodes A and B again act as the current source and sink electrodes, while electrodes M and N act as the potential receiving electrodes.Starting with a smaller value of a for a shallow-depth investigation, the value is progressively increased to examine deeper depths.In our case, we start with a minimum value a min = 8 m and progressively increase it to a maximum a max = 132 m following a = a min + iδa, where δa = 4 m and i = 0, 1, 2, . .., 31.This results in N d = 32 VES sounding recordings using N e = 128 electrodes.
To simulate the VES data, the model is discretized using a mix of uniform and non-uniform cell sizes in the x, y, and z directions.The main computational domain is discretized with a uniform grid spacings of 2 m in the x and y directions and 1 m in the z direction.The boundary regions are discretized with severely stretched non-uniform grid spacings following Jaysaval et al. (2014).The discretization results in a grid with 240 × 240 × 120 cells.Therefore, using the FV method to simulate electric potentials leads to a system of linear equations with 6 912 000 DOFs. PFLOTRAN simulated the VES data using 512 processes on Deception.The total simulation time was 45 s.The simulated VES results are compared to analytical solutions obtained from SimPEG (Cockett et al., 2015).Figure 4 shows the comparison of the apparent resistivity ρ a responses.Analytical and PFLOTRAN results are shown by the solid black line and filled red circles, respectively.We observe that both responses agree very well.To make the comparison more quantitative, we calculate the relative difference between the two responses and show the difference in Fig. 4 (blue line).The relative difference varies from 0.02 % to 1.3 %.We also calculate the average relative difference using where ρ a with subscripts 1 and 2, respectively, represents the apparent resistivity computed using PFLOTRAN and the analytical method from SimPEG.The average relative difference is 0.18 %.These relatively small differences demonstrate the high degree of accuracy of the PFLOTRAN ERT process model for the 1D model.Note that the apparent resistivity was computed using where φ = φ AM − φ AN − φ BM + φ BN and G is a geometric factor given as where distances between the source electrode A and the receiving electrodes M and N are, respectively, r AM and r AN , while those between the sink electrode B and the receiving electrodes M and N are, respectively, r BM and r BN .For the Wenner configuration, r AM = a, r AN = 2a, r BM = 2a, and r BN = a.Therefore, geometric factor G = 1/2π a was used for calculating ρ a using Eq. ( 23).

Three-dimensional earth model
In the previous example, PFLOTRAN accuracy for simulating ERT responses was validated for a simplistic 1D earth model.In the real world, however, the earth models are 3-D with varying degrees of heterogeneity.Therefore, it is important to validate the accuracy of numerical results computed using PFLOTRAN against results computed using well-established methods for 3D models.
We consider the 3D model shown in The main computational domain is discretized with a uniform grid spacing of 0.2 m in all the three directions, while the boundary domain is again discretized with severely stretched non-uniform grid spacings.The model discretization leads to a grid with 200 × 120 × 60 FV cells and, hence, 1 440 000 DOFs in the corresponding system of linear equations.PFLOTRAN simulated the ERT profiling data using 128 processes on Deception.The total simulation time was 9 s.sistivity on the right sides of the plots.These responses are compared to responses computed using E4D (Johnson et al., 2010;Johnson and Wellman, 2015).In E4D, an unstructuredmesh FE method is implemented to perform the forward modeling.The filled red circles show the responses computed using E4D.The relative differences between the two codes are shown by the lower blue lines, which vary from 0.02 % to 2.5 %.The average relative difference computed using Eq. ( 22) is 0.77 %.These relatively small differences imply a good agreement between responses computed using both codes.

Inversion results
To examine the efficiency of PFLOTRAN to invert ERT data, we consider a model modified from PFLOTRAN simulated these ERT data using 128 processes on Deception.The total simulation time was 6 s.In this case, ERT data were simulated in the form of the resistance obtained using φ I , which basically equals to the potential difference φ for unit current (I = 1 A).
The simulated ERT data were then inverted using PFLO-TRAN starting with a model having a half-space conductivity of 0.0022 Sm −1 .The starting model conductivity was obtained by averaging the apparent conductivity computed from the given ERT data.The previously described modeling grid also acted as the inversion grid; i.e., each cell in the model was used as an inversion parameter.Therefore, we needed to invert for N m = 864 000 model parameters.PFLO-TRAN inverted the ERT data using 512 processes on Deception.The total inversion time was 74 s.
Figure 7 shows the inverted conductivity model.The inverted model agrees well with the true model: the inversion recovered the conductive and resistive anomalies in the model.The inverted resistivity, however, varies smoothly in the model because a smooth regularization constraint is applied by minimizing the difference in the cell conductivity and the conductivity of its neighboring cells.Further discussion on the implementation of our regularization schemes is beyond the scope of the current paper.
The initial chi-squared data misfit χ 2 was 14.8.As the inversion progressed, χ 2 decreased as shown in Fig. 8.The inversion took 11 iterations to satisfy the convergence criteria defined by χ 2 ≤ 0.9.To further examine the efficiency of PFLOTRAN to fit the individual ERT data, a histogram of data-fit error is also shown in Fig. 9.The data-fit error is defined as ratio of the residual resistance (difference in the observed and predicted resistances) and observed resistance expressed as a percentage.The histogram implies that the ERT data were fitted comparatively well.

Scalability tests
Scalability tests demonstrate parallel performance or how efficiently a simulator utilizes additional processes to either speed up a simulation of fixed problem size (strong scaling) or maintain speed as the problem size and number of processes increase proportionally (weak scaling).For ideal strong scaling, the speedup is proportional to the increase in the number of processes (e.g., doubling the number of processes cuts the run time in half), while ideal weak scaling results in an unaltered or constant run time.In either (ideal) case, the additional computing capacity is efficiently utilized.Ideal strong scaling is often termed linear speedup, whereas superlinear speedup is better than ideal (e.g., doubling the number of processes reduces the run time to less than half).Superlinear performance is not uncommon, as access to an increased cache (efficient, high-performance memory on the processor core) can generate superlinear speedup.
As discussed earlier, there are two computationally intensive steps to invert ERT data: forward modeling and Jacobian computation.To demonstrate the parallel performance of PFLOTRAN for these two steps, large-scale strong scaling tests are reported in this section by using up to 1024 processes on the Deception supercomputer or up to 131 072 processes on the Theta supercomputer.Configuration details for Deception were previously given in the "Modeling benchmarking results" section (Sect.5).On the other hand, Theta is a leadership-class supercomputer, ranked within the top 100 fastest computers in the world and housed at the Argonne Leadership Computing Facility (ALCF) at Argonne National Laboratory (https://www.alcf.anl.gov/alcf-resources/theta, last access: 27 January 2023).With a peak performance of 11.7 PFLOPS, it is composed of 4392 compute nodes connected through a Cray Aries network.Each node has 64-core Intel Xeon Phi 7230 processors running at 1.3 GHz with 192 GB of memory.Therefore, there are 281 088 processes available on Theta.The PFLOTRAN compilations on Deception and Theta were linked to Intel's Math Kernel Library (MKL).The code leverages solely the distributedmemory parallel capability on the machines and not the shared-memory multi-threading.

Forward modeling scalability
The three-layer earth model in Fig. 3 is first rediscretized to generate several test matrices for the analysis by considering different cell sizes in the x, y, and z directions.These test matrices result in six different modeling scenarios from coarser to finer grids.Table 1 completely specifies details of the test matrices M2, M7, M19, M35, M64, and M122 along with the number of DOFs in the governing matrix equations that ranges between 2 ×10 6 and 122 ×10 6 .The naming of these matrices is based on the number of DOFs they represent; e.g., M19 represents a matrix equation with about 19 ×10 6 DOFs.To the best of our knowledge, the modeling with 122 ×10 6 DOFs stands as the largest ERT modeling reported in the geophysical literature.The next largest 3D ERT modeling problem reported in the literature was composed of about 10 ×10 6 DOFs (Ren et al., 2018), while most others were often composed of less than 1 ×10 6 DOFs (Günther et al., 2006;Rücker et al., 2006;Johnson et al., 2010;Johnson and Wellman, 2015).Having the capability to solve for up to a https://doi.org/10.5194/gmd-16-961-2023 Geosci.Model Dev., 16, 961-976, 2023   of ideal performance based on the 64 process run time for the M19 simulation: where T n is the time on n processes and T 64 is the time on 64 processes.For all problem sizes, the wall-clock times flatten at higher numbers of processes.This phenomenon is expected as the amount of work per process (DOFs per process) decreases at higher process counts and the cost of communication between processes begins to dominate.The M19 performance begins to deviate from ideal at ∼ 2048 processes, which Table 1.Test matrices obtained by rediscretizing the three-layer earth model in Fig. 3 using different cell sizes dx, dy, and dz in the x, y, and z directions.N x , N y , and N z are, respectively, the number of FV cells in the x, y, and z directions.Note that the cell sizes are given only for the main computational domain; the boundary domain is discretized using severely stretched non-uniform cell sizes.19, 35, 64, and 122 ×10 6 (see Table 1 for details).
equates to ∼ 9000 DOFs per process.Similar behavior is observed for many of the other problem sizes.Hammond et al. (2014) demonstrate that for groundwater flow and reactive transport simulation PFLOTRAN runs most efficiently when at least 10 000 DOFs are assigned to each process.These results show that this 10 000 DOF threshold also applies for PFLOTRAN ERT simulations.That is, PFLOTRAN's scalability on the supercomputer should be more ideal when the number of DOFs per process is greater than 10 000.Another factor limiting parallel performance is the limited scalability of linear Krylov solvers at high process counts.Mills et al. (2009) demonstrates that the vector inner products performed within Krylov solvers become increasingly expensive as tens of thousands of processes are employed to solve a problem.These inner products (or MPI global reduction operations) require repeated global synchronization across all processes involved.As more processes are employed to solve the problems (requiring communication across a growing network of compute nodes), the transfer of information for each vector inner product is more expensive time-wise (i.e., increased number of hops within the highspeed network, longer distance traveled).Figure 10 reveals a similar phenomenon.

Jacobian computation scalability
To test the scalability of PFLOTRAN for the Jacobian computation, we consider a Jacobian matrix computed for inverting the ERT data in the "Inversion result" section (Sect.6).Recalling that N d = 2070 ERT data and N m = 864 000 model parameters for the inversion example, the dimension of the Jacobian matrix J is therefore 2070 × 864 000.The Jacobian matrix is computed at each inversion iteration after the forward modeling run and used in the normal equation (Eq.15) to get a model update δm.https://doi.org/10.5194/gmd-16-961-2023 Geosci.Model Dev., 16, 961-976, 2023 Figure 11.PFLOTRAN strong scaling for computing the Jacobian for the ERT inversion example in Fig. 7.The dimension of the Jacobian matrix J is 2070 × 864 000 and is distributed uniformly over each computing process.PFLOTRAN exhibits superlinear performance for larger numbers of processes.
The Jacobian scalability test is performed on the Deception supercomputer. Figure 11 illustrates the strong scalability of PFLOTRAN wall-clock time for computing J .It shows that PFLOTRAN exhibits linear (and sometimes superlinear) scalability for computing J with increasing numbers of processes.Consequently, the total wall-clock time for computing J using 1024 processes reduces to 0.5 s compared to 887 s when using a single process on Deception.This ideal performance in computing J is most likely due to even load balance across processes and the lack of MPI communication in its construction.Moreover, as noted before, the superlinear performance is most likely due to access to increased cache with an increasing number of processes.
For large-scale ERT inversion problems, the linear scaling for computing J can be exploited to reduce the overall inversion wall-clock time even if PFLOTRAN exhibits deteriorating scalability for the forward modeling on a very large number of processes.For example, for an ERT inversion problem where the J calculation time is much higher than the forward modeling time, increasing the number of processes to reduce the overall inversion wall-clock time is recommended, even when forward modeling exhibits a significant departure from the linear scalability.One has to find a balance between the two, and the most time-consuming operation should govern.

Discussions
In the past 20 years, a few open-source 3D ERT modeling and inversion codes were developed to handle moderatescale ERT problems (e.g., ResIPy,SimPEG,pyGIMLI,and E4D).Although some of these codes can practically tackle most present ERT modeling and inversion problems having less than 10 ×10 6 DOFs, they still lack full-scale parallel implementations to efficiently distribute numerical computations over massive supercomputers.The limitation can also lead to computational challenges for large-scale ERT problems with 100 ×10 6 or more DOFs, which may not be solved efficiently with the existing codes.The 3D ERT modeling and inversion capability developed within PFLOTRAN removes this limitation.To examine the accuracy of the forward modeling capability, numerical results from PFLO-TRAN were compared to analytical solutions for a layered earth model and numerical results obtained from E4D for a 3D earth model.The average relative differences stayed within 1 %, which demonstrated the high degree of accuracy of the PFLOTRAN modeling capability.Furthermore, large-scale strong scalability tests, using up to 131 072 processes, were performed on a leadership class supercomputer to demonstrate efficient distribution of computational workloads over massive supercomputers.The scalability tests demonstrate that computational times for the Jacobian computation scale linearly with increasing number of processes.For the forward modeling, PFLOTRAN showed a linear to near-linear scaling behavior up to 10 000 DOFs per process, beyond which performance tends to degrade.
Due to the efficient parallel performance of PFLOTRAN observed above, a large-scale ERT simulation problem with 122 ×10 6 DOFs and 122 electrodes could be solved in about 2 min by using 32 768 processes as compared to taking about 1 h with 512 processes.Combining this forward modeling scalability along with the ideal scaling of the Jacobian computation, we can aim for inverting moderate-scale ERT problems within minutes and large-scale problems within hours with the access to massive supercomputers.The moderatescale inversion example of Sect.6 demonstrated that inversion of N d = 2070 data and N m = 864 000 model parameters could be performed in about 1 min by using 512 processes.
Finally, there were two key motivations for implementing ERT capabilities in PFLOTRAN.The first motivation was to utilize the massive parallel infrastructure implemented within PFLOTRAN for the flow and transport simulations as illustrated by Hammond et al. (2012Hammond et al. ( , 2014) ) for simulating subsurface problems composed of billions of DOFs utilizing hundreds of thousands of computing processors.This infrastructure made PFLOTRAN a natural fit to achieve our goal of massive parallel implementation for the ERT forward modeling and inversion capabilities.Researchers familiar with PFLOTRAN can now have immediate access to the added geophysical ERT capabilities with minimal learning efforts.
The second motivation was to have a native implementation of ERT modeling and inversion capabilities along with flow and transport simulations.The new capabilities will allow us to perform fully coupled hydro-geophysical simulations and inversion.The idea behind the joint hydrogeophysical simulation is to use known flow and transport parameters, such as porosity and intrinsic permeability, and simulate for hydrological variables, e.g., liquid saturation and solute concentration.Using petrophysical relationships (e.g., Archie's law or its modifications; Slater, 2007), these parameters and variables can be mapped into electrical conductivity, which subsequently can be used for simulating ERT responses.In our previous work, a joint hydrogeophysical simulation capability was developed by combining the capabilities of PFLOTRAN and E4D (Johnson et al., 2017).However, it was not straightforward to use PFLOTRAN-E4D for joint hydro-geophysical simulations.This is because the implementation was based on obtaining flow and transport parameters and variables from PFLO-TRAN using a structured mesh, which then needed to be interpolated on an unstructured tetrahedral mesh for ERT simulations by E4D.The process was cumbersome and had the potential for errors.PFLOTRAN's native implementation of ERT avoids employing any kind of interpolation because both flow and transport and ERT simulations use the same modeling grid.
Furthermore, there are growing interests in joint hydrogeophysical i.e., estimating the distribution of subsurface flow and transport parameters, from time-lapse ERT data; see, e.g., Mboh et al. (2012), Camporese et al. (2015), Rücker et al. (2017), González-Quirós and Comte (2021), and Pleasants et al. (2022).The inclusion of the ERT simulation capability in PFLOTRAN fulfills the requirement of the forward modeling necessary for native implementation of hydro-geophysical inversion.Fully coupled hydrogeophysical simulation and inversion capabilities are currently under development within PFLOTRAN and will be a focus of our forthcoming papers.

Conclusions
Two new capabilities -forward modeling and inversion of geophysical ERT data -are developed within PFLOTRAN, a massively parallel open-source, state-of-the-art, multi-phase, multi-component subsurface flow and transport simulation code.The capabilities are accurate, robust, and highly scalable on HPC platforms.The accuracy of the forward modeling capability was demonstrated for layered earth and 3D earth conductivity models by comparing our numerical results against well-established analytical and numerical results.The average relative differences stayed within 1 %, demonstrating the high degree of accuracy of the PFLO-TRAN ERT process model.The inversion capability was demonstrated for a 3D synthetic case by recovering the conductivity of the model.Large-scale scalability tests illustrated that the forward modeling for models with 100 ×10 6 DOFs can be performed in a few minutes; e.g., ERT modeling for a model with 122 ×10 6 DOFs and 122 electrodes was performed in 128 s by using 32 768 processes on a supercomputer.The Jacobian computation, a key ingredient for the inversion, exhibited ideal or linear scaling, and as a result a Jacobian of 2070 × 864 000 dimension was computed in 0.5 s using 1024 processes compared to 887 s using a single process.Moreover, integration of geophysical ERT capabilities within PFLOTRAN opens the door to perform coupled hydro-geophysical modeling and inversion, which is a focus of our future research.

Figure 1 .
Figure1.Two adjacent FV cells i 1 and i 2 having conductivity σ i 1 and σ i 2 , respectively.The distances of the common face f from the center of cells i 1 and i 1 , respectively, are d i 1 and d i 2 .

Figure 2 .
Figure 2. Wenner electrode configuration: four electrodes A, B, M, and N are deployed in-line with an equal spacing a between the two neighboring electrodes.Electrodes A and B act as the source and sink electrodes where current I is injected, while electrodes M and N act as the receiving electrodes measuring a potential difference φ.

Figure 3 .
Figure 3. Vertical cross section of a three-layer earth resistivity model used for benchmarking PFLOTRAN results against the analytical solutions.

Figure 4 .
Figure 4. Apparent resistivity ρ a responses computed for the layered earth resistivity model in Fig. 3 (upper plot).They are calculated using the analytical method (solid black lines) and PFLO-TRAN (filled red circles).The lower plot (blue line) shows the relative difference (in percentage) between the analytical and PFLO-TRAN results.
Fig. 5.The model dimension is 32 × 16 × 8 m 3 .It has a homogeneous background of conductivity 0.002 Sm −1 and includes two anomalous blocks: a conductive block (0.2 Sm −1 ) and a resistive block (0.0002 Sm −1 ).Both anomalous blocks have a dimension of 2 × 2 × 2 m 3 and are separated by 4 m in the x direction.As in the previous case, the model boundaries are extended to ±1000 m in the x and y directions and 1000 m in the z direction to accommodate zero Dirichlet boundary conditions at the side and bottom boundaries.ERT profiling data are simulated along a survey profile (yellow line) at y = z = 0 m from x = −16 m to x = 16 m using the Wenner configuration with increasing electrode spacing a from 1 to 6 m.Unlike the VES performed for the 1D model where data were simulated at a single observation point at the center of the model, the ERT profiling, performed for the 3D model, records data at multiple observation points along the survey profile.The profiling, therefore, helps in investigating lateral conductivity variations in the model along the x direction.In this case, it results in N d = 129 ERT data recording by using N e = 32 electrodes placed uniformly at every 1 m from x = −15.5 m to x = 15.5 m.
Figure 6a-f (upper curves) show the apparent resistivity ρ a responses computed for the 3D model in Fig. 5 using the ERT profiling with the Wenner configuration with electrode spacings of 1, 2, 3, 4, 5, and, 6 m.The solid black lines show the responses computed using PFLOTRAN.All responses exhibit the characteristics of the conductive and resistive anomalous blocks embedded in a homogeneous background of conductivity 0.002 Sm −1 or resistivity 500 m; see, e.g., the low apparent resistivity on the left and high apparent rehttps://doi.org/10.5194/gmd-16-961-2023Geosci.Model Dev., 16, 961-976, 2023

Figure 5 .
Figure 5. Three-dimensional earth resistivity model composed of two anomalous blocks embedded in a homogeneous background.ERT profiling data are simulated along a survey profile at y = z = 0 m from x = −16 m to x = 16 m using the Wenner configuration with increasing electrode spacing a from 1 to 6 m.

Figure 6 .
Figure 6.Apparent resistivity ρ a responses computed for the 3D earth resistivity model in Fig. 5 using the Wenner configuration with spacings: (a) a = 1 m, (b) a = 2 m, (c) a = 3 m, (d) a = 4 m, (e) a = 5 m, and (f) a = 6 m (upper plots).They are calculated using PFLOTRAN (solid black lines) and E4D (filled red circles) for an ERT profiling along a survey profile at y = z = 0 from x = −16 m to x = 16 m.The lower plots (blue lines) show the relative difference (in percentage) between PFLOTRAN and E4D results.
Fig. 5 by cropping the model for |x| > 8 m.The modified model, therefore, has the dimension of 16 × 16 × 8 m 3 .The model is discretized using Geosci.Model Dev., 16, 961-976, 2023 https://doi.org/10.5194/gmd-16-961-2023P. Jaysaval et al.: ERT modeling and inversion using PFLOTRAN 969 the same grid spacings as in the previous 3D modeling example, which resulted in 120×120×60 = 864 000 FV cells.We computed synthetic ERT data for three survey lines located at the ground surface at y = −5 m, y = 0 m, and y = 5 m from x = −8 m to x = 8 m.Electrodes are placed uniformly with an electrode spacing of 1 m along the three survey lines from x = −7.5 m to x = 7.5 m, i.e., 16 electrodes along each survey line.Using these N e = 48 electrodes, ERT data were simulated considering different combinations of a set of 4 electrodes, where 2 electrodes acted as the current and the remaining 2 as the potential receiving electrodes.These electrode combinations result in N d = 2070 simulated ERT data.

970P.
Figure 7. Inversion result of the simulated ERT data for a model modified from Fig. 5 by cropping the model for |x| > 8 m.

Figure 8 .
Figure 8.A measure of data misfit as the inversion progresses: χ 2 versus number of iteration.

Figure 9 .
Figure9.Histogram of data-fit error, which is defined as ratio of the residual resistance (difference in the observed and predicted resistances) and observed resistance expressed as a percentage.