the Creative Commons Attribution 4.0 License.
the Creative Commons Attribution 4.0 License.
Automatic tuning of iterative pseudo-transient solvers for modeling the deformation of heterogeneous media
Thibault Duretz
Albert de Montserrat
Rubén Sevilla
Ludovic Räss
Ivan Utkin
Arne Spang
Geodynamic modeling has become a crucial tool for investigating the dynamics of Earth deformation across various scales. Such simulations often involve solving mechanical problems with significant material heterogeneities (e.g. strong viscosity contrasts) under nearly incompressible conditions. Recent advancements have enabled the development of iterative solvers based on Dynamic Relaxation (DR) or Pseudo-Transient schemes, which require minimal global communication and exhibit quasi-linear scaling on GPU and supercomputing architectures. These solvers incorporate automatic tuning of iterative parameters, including pseudo-time steps and damping coefficients, based on spectral estimates of the discrete operators, ensuring both robust and rapid convergence. We demonstrate the effectiveness of this approach on discretized problems with finite-difference and face-centered finite volume methods, including heterogeneous incompressible Stokes flows. Moreover, the relative algorithmic simplicity of DR-based methods allows straightforward extensions to compressible flow, multiphase flow, and nonlinear constitutive laws, opening promising avenues for large-scale, high-resolution simulations of geoscientific problems.
- Article
(6641 KB) - Full-text XML
- BibTeX
- EndNote
Geodynamic modeling has become an indispensable tool for simulating tectonic processes at various scales, ranging from the rock sample (e.g. Luisier et al., 2023) and outcrop levels (e.g. Schmalholz and Podladchikov, 2001) to regional (Gerya, 2013), global (e.g. Fuchs and Becker, 2022), and planetary scales (e.g. Tackley, 2023). Geodynamic modeling tools aim to solve the conservation equations of linear momentum, mass, and energy to produce full-field solutions for velocity, pressure, and temperature fields in space and time (Gerya, 2019).
While 2D geodynamic simulations can effectively leverage solvers that employ sparse-direct or direct-iterative schemes (e.g. Dabrowski et al., 2008; Gerya and Yuen, 2003; Fullsack, 1995; Popov and Sobolev, 2008), 3D simulations generally require iterative solvers as direct methods become prohibitively expensive. Most approaches in this context rely on multigrid solvers (e.g. May et al., 2015; Tackley, 1996; Kronbichler et al., 2012; Kaus et al., 2016; Baumgardner, 1985; Zhong et al., 2008; Shih et al., 2022), spectral methods (e.g. Balachandar and Yuen, 1994), or pseudo-transient integration techniques (Duretz et al., 2019; Räss et al., 2022) to achieve efficient solutions.
In geodynamic models, the viscosity field may exhibit smooth (i.e. thermal variations) and sharp (i.e. material interfaces) spatial variations of several orders of magnitude. This peculiarity makes the iterative solution procedure of the near-incompressible and incompressible Stokes equation particularly challenging (e.g. Tackley, 1996; May et al., 2015; Räss et al., 2022; Shih et al., 2022).
Iterative solvers based on pseudo-transient integration or Dynamic Relaxation (DR) have been employed to solve compressible and incompressible geomechanical and geodynamical problems (e.g. Cundall and Board, 1988; Poliakov et al., 1993; Hassani et al., 1997). These methods exhibit interesting parallel scaling properties on hardware accelerators such as Graphics Processing Units (GPU) and have therefore recently been applied to study incompressible Stokes flow and multi-physics coupled problems (e.g. Räss et al., 2022; Duretz et al., 2019; Räss et al., 2019; Halter et al., 2022; Spang et al., 2025). These solvers implement coupled solving strategies that involve iterative updates of both the velocity and pressure fields within a single iteration loop. Despite their algorithmic simplicity, their application to heterogeneous mechanical problems remains challenging (e.g. Halter et al., 2022). One strategy relies on smoothing sharp contrasts in material properties; however, this approach requires the use of a very fine spatial resolution to match exact flow solutions (e.g. Räss et al., 2022; Halter et al., 2022). In practice, Stokes solvers are typically coupled with Marker-In-Cell or Level-Set approaches (e.g. Gerya and Yuen, 2003; Hillebrand et al., 2014; Samuel and Evonuk, 2010; Schmeling et al., 2008) to handle large viscous deformations and to represent heterogeneous materials. In the latter case, the degree to which discontinuities in material properties are smoothed depends on the spatial extent (or support) of the interpolant basis functions (e.g. Duretz et al., 2011). The degree of smoothing is therefore not defined ad hoc and must be based on a narrow interpolation basis to best match the exact flow solution (Duretz et al., 2011). Moreover, current geodynamic pseudo-transient solvers rely on either empirical tuning of iterative parameters (e.g. Duretz et al., 2019) or exact derivation of optimal parameters based on simplified model configurations (e.g. Räss et al., 2022; Alkhimenkov et al., 2021). This situation hinders their applicability in solving “real-life” geodynamic problems, as optimal parameters may adapt to the evolving internal model state and material configuration. Exploring dynamic strategies to select optimal iterative parameters is a motivation to overcome current limitations.
In this study, we aim to solve quasi-steady mechanical problems with large and sharp variations in material properties using DR-like iterative solvers and automatic tuning of iterative parameters. We present a decoupled solution strategy that combines Schur-complement reduction, Powell–Hestenes iterations, and DR-type of iterative solvers.
The equations governing quasi-static mechanical equilibrium in a classical continuum medium may be expressed as:
where τ is the deviatoric stress tensor, P is pressure, and v is the velocity vector. The term is a forcing term, where ρ, g and K correspond to density, gravitational acceleration, and bulk modulus, respectively. The incompressible limit is reached if K→∞. The isotropic viscous constitutive relationship takes the form of:
where is the deviatoric strain rate tensor, , I is the identity matrix and η is the dynamic viscosity. In the following, the Stokes equations will be solved for velocity and pressure fields over a domain Π, accounting for Neumann boundary conditions () on the boundary ∂ΠN and Dirichlet boundary conditions (v=vD) on the boundary ∂ΠD.
We consider two different types of numerical discretization of the Stokes equations, the staggered-grid Finite-Difference (FD) and Face-Centered Finite Volume (FCFV) Method. With either discretization, both the components of the velocity vector and the pressure will constitute the primitives, which we aim to solve for.
3.1 Staggered-Grid Finite-Differences Method
The staggered-grid finite-difference is a popular approach to discretizing the Stokes equation originally proposed by Harlow and Welch (1965). This scheme is constructed using conservative second-order finite difference and a staggered arrangement of both velocity and pressure (Fig. 1a, b). The strong form of the Stokes equations (Eq. 2) is discretized using central finite differences:
where the hat symbol denotes the discrete representation of the differential operators. The primitive solution fields v and p are obtained by minimizing the momentum and continuity residuals (rmom and rcont). The staggered FD scheme represents, to our knowledge, the most computationally efficient stencil that satisfies the Inf-Sup stability condition in regular meshes (Shin and Strikwerda, 1997). For flows with smooth viscosity variations, FD achieves second-order accuracy in the L1 norm, while in the presence of sharp material heterogeneities, its accuracy reduces to first-order (Duretz et al., 2011). Geometrically, FD has been successfully implemented in Cartesian and polar coordinates (e.g. Räss et al., 2017), spherical configurations (e.g. Tackley, 2008; Macherel et al., 2024), and on adaptively refined meshes (e.g. Gerya, 2013; Goldade et al., 2019). However, its application in non-orthogonal meshes remains rare (e.g. de la Puente et al., 2014) due to the need for extended stencils and the associated increase in algorithmic complexity. Consequently, the treatment of free-surface boundary conditions remains a notable challenge (e.g. Duretz et al., 2016; Larionov et al., 2017; Gray et al., 2025).
3.2 Face-Centered Finite Volume Method (FCFV)
The Face-Centered Finite-Volume (FCFV) method is an emerging discretization scheme rooted in the principles of the hybridizable discontinuous Galerkin (HDG) method (Cockburn and Gopalakrishnan, 2005). This approach has been successfully employed to achieve full-field solutions for the Stokes equations in various types of viscosity distribution, including constant (Sevilla et al., 2018), discontinuous (Sevilla and Duretz, 2023), and smoothly varying fields (Sevilla and Duretz, 2024). The FCFV scheme satisfies the Inf-Sup stability condition, making it particularly suitable for solving the Stokes equations. The discrete system of equations is derived from the weak formulation of the Stokes problem, using a constant degree of approximation for both velocity and pressure fields:
The primitive variables, namely the hybrid velocity vector () and pressure (P), are obtained by minimizing the momentum and continuity residuals (rmom, rcont). The hybrid velocity is an independent variable defined on element faces, different from the element-wise velocity. It serves as a Lagrange multiplier–type trace variable that enforces weak continuity of the velocity field across elements. The symbol nfac corresponds to the number of faces of an element. [[⋅]] is the jump operator defined as , where ⊙ denotes quantities defined on elements located to the right (R) and left (L) of an interface. {{ ⋅ }} is the average operator defined as . The discretization requires the outward-pointing normal to each face (n), the face area (Γ), the volume of the element (Ω), the stabilization parameter (s), the factor a (), the traction vector (TN), and the function χ that indicates the presence of applied traction (0 or 1). The FCFV scheme requires the storage of the hybrid velocity vector and momentum residual at the midpoint of each element face, while the pressure, the deviatoric stress tensor and the continuity residual are stored in the centroids of the elements (Fig. 1c, d). The discretization considered here yields a first-order approximation of velocity and pressure fields (Sevilla et al., 2018), nevertheless second-order solutions can be constructed without altering the sparsity of the discretization (Giacomini and Sevilla, 2020; Vieira et al., 2020). In this study, we implement the FCFV approach using structured quadrangular meshes, although the method is versatile and supports arbitrary element geometries, exhibiting low sensitivity to mesh distortions (Sevilla et al., 2018). Additionally, FCFV can naturally handle internal boundaries and traction boundary conditions, such as free surfaces. Due to these features, the FCFV method is becoming increasingly popular in the geodynamics community, as highlighted by Burcet et al. (2024).
In this section, we discuss the principles behind the Dynamic Relaxation method and illustrate its basic properties using a simple 1D Poisson example.
4.1 Principle
The dynamic relaxation method is designed to iteratively solve both linear and nonlinear problems. The iterative procedure resembles an explicit time integration scheme, which is why it is also often referred to as a pseudo-transient integration method. As in classical iterative procedures, the DR method iteratively refines the unknown u until the magnitude of the residual of the partial differential operator r(u) drops below a given tolerance. Once this criterion is met, u is considered a solution satisfying r(u)=0. Note that the function r(u) may include physical transient terms; thus, this method is not limited to solving quasi-static problems (e.g. Räss et al., 2022). The algorithm is formulated by introducing the first and second-order derivative of the solution u in pseudo-time (Frankel, 1950):
where refers to pseudo-time, c is a damping parameter, and M is a preconditioner. Following previous work (e.g. Papadrakakis, 1981; Oakley and Knight, 1995), the second derivative is taken as the central derivative of the first derivative and the first derivative is taken as the average of the new and old values:
This leads to update rules for the rate of change of u, and u itself:
where and . Alternatively, the DR algorithm can be expressed in analogy to the conjugate gradient (CG) method (Feng, 2006), using the variable :
where p is analogous to the search direction, and β=b. The main difference is that the CG method involves different expressions for the parameters α and β, which need to be reevaluated at each iteration, requiring computationally expensive reduction operations (e.g. vector norms).
In either case, two iteration parameters need to be determined as a function of the minimum (λmin) and maximum (λmax) eigenvalues of the discrete problem (Papadrakakis, 1981; Oakley and Knight, 1995):
where cCFL and cdamp are scalar adjustment parameters. Whereas cCFL is strictly bounded (cCFL<1), no analogous upper bound applies to cdamp. In our experiments, values in the range were found to provide robust performance. For optimal convergence, at least in the case of linear problems, cCFL should be chosen as large as possible (e.g. cCFL=0.999). In theory (e.g. Oakley and Knight, 1995), the optimal value of cdamp is 1.0. Nevertheless, we empirically observed that slightly lower values may further enhance convergence.
The maximum eigenvalue is found using the Gershgorin circle theorem:
The minimum eigenvalue can only be approximated, using, for example, a Rayleigh quotient approach (e.g. Joldes et al., 2011; Rezaiee-Pajand and Sarafrazi, 2011; Oakley and Knight, 1995):
where and . The evaluation of iterative parameters requires two reductions. Reduction operations should be avoided when possible given that reduction algorithms in shared memory architectures are not embarrassingly parallel, and global communication between different nodes is needed in distributed memory architectures. Nevertheless, this procedure can be performed every n iterations, together with the evaluation of the stopping criteria. This approach provides successful convergence while minimizing the cost of global communication. It should be noted that the estimate of λmin relies on differences in the residual. Before the iteration is initiated, this quantity is set to zero by default.
Lastly, the DR method requires the definition of a preconditioner, typically defined as the Jacobi or diagonal preconditioner:
While this choice allows for trivial inversion and parallelisation, it restricts the convergence property of the iteration scheme. In the context of matrix-free schemes, both the evaluation of the preconditioner and maximum eigenvalue estimates can be achieved by using automatic differentiation of the residual function.
4.2 Example: 1D Poisson problem
An example of the application of the DR method to a variable coefficient Poisson problem is depicted in Fig. 2. The problem is defined as:
where k=f(x). The differential operators are discretized with both the FD and the FCFV over a 1D domain and Dirichlet boundary conditions , (Fig. 2a). The coefficient k varies with x (Fig. 2b) and is expressed as
The forcing term is set to zero so that b(x)=0. The problem converges for different resolutions to a relative residual of 10−10, with evaluation of iterative parameters and convergence check every 100 iterations. The frequency with which the parameters are reevaluated can influence the convergence behavior. In the present study, a value of 100 was empirically determined to provide an appropriate compromise. The typical convergence behavior of the DR method is shown in Fig. 2c. As a consequence of diagonal preconditioning, the number of iterations needed to reach convergence depends linearly on the size of the problem (Fig. 2d). It is worth noting that for the 1D Poisson case, the FD and the FCFV deliver very similar results, both in terms of the numerical solution and convergence behavior.
The DR method is particularly well-suited to solving symmetric positive-definite problems. The solution of indefinite problems, such as those arising in saddle-point systems such as the Stokes equations, requires additional modifications.
Figure 2Properties of the DR method for the solution of a 1D Poisson problem with variable coefficient using both FD and FCFV. (a) Solution field in space. (b) Spatial distribution of the conductivity coefficient k. (c) Convergence history for three different resolutions. (d) Linear scaling of the iteration count versus the resolution. The dashed line corresponds to a linear scaling trend. In all panels, the colored scatter points correspond to the FD results, while the solid black line represents the FCFV result.
Geodynamic problems often rely on mixed velocity–pressure formulations, and solvers must be able to perform robustly under both weakly compressible and incompressible conditions. These problems are generally indefinite or even singular, making them well-suited for Schur complement-based approaches (e.g. May and Moresi, 2008; Furuichi et al., 2011; Sanan et al., 2020), such as the Powell–Hestenes iteration method (Dabrowski et al., 2008; Räss et al., 2017).
Powell–Hestenes iterations provide a solution strategy based on the augmented Lagrangian formulation, ℒ(v,P):
This approach regularizes the original indefinite problem by adding a positive penalization term, γ, which can be interpreted as a numerical bulk viscosity. Powell–Hestenes iterations involve successive decoupled solutions of the velocity and pressure fields (see Algorithm 1). Each velocity update requires the solution of a velocity Schur complement. In matrix–free context, the residual of the velocity Schur complement can be formulated as a modified momentum equation:
where the superscript old denotes values from the previous Powell–Hestenes iteration. This problem is typically positive semi-definite, and can therefore be directly tackled using the previously described DR method. Moreover, this solve does not need to be exact, which is particularly desirable in the context of iterative methods. After each inexact velocity solve, the pressure is updated using the following:
Powell–Hestenes iterations are performed until the residuals of the momentum and continuity equations are satisfied to a given tolerance. A known drawback of Powell–Hestenes iterations is their sensitivity to the choice of γ, which is further explored in this study. Typically, this parameter is defined globally, such that a single value of γ is used for the entire computational domain.
In the following, we will use the DR method to solve for velocity updates within Powell–Hestenes iterations. We refer to this as the PH/DR scheme. For compressible models, the numerical bulk viscosity used in the Powell–Hestenes method is set to equal the local value of the physical (or effective) bulk viscosity which may thus vary in space. In this case, the iterative procedure converges in a single step.
Algorithm 1Powell–Hestenes with inner Dynamic Relaxation velocity solve (PH/DR).
The model implementation is performed in the Julia language (Bezanson et al., 2017) using the ParallelStencil.jl package (Omlin et al., 2022), which allows to effectively target different CPU and GPU hardware backends. The computations were performed on a single NVIDIA V100 SXM2 GPU with 32 GB of DRAM.
In this section, we present the model configuration used to test the PH/DR solver in the context of the FD. We then provide several analyzes that provide quantitative insight into the scaling behavior of the PH/DR scheme.
6.1 Model configuration: multiple inclusions
In order to test the robustness of the proposed solution procedure and to evaluate its scaling behavior, we have designed a model configuration that combines known challenging elements for geodynamic flow solvers. These elements include the presence of large and sharp viscosity variations, as well as the enforcement of the incompressibility constraint. This type of flow problem is known to be pathological for geodynamic solvers that rely on iterative solution procedures (May et al., 2014; Rudi et al., 2017; Räss et al., 2022).
Both viscosity variations and incompressibility contribute to a deterioration in the conditioning of the discrete system. While this does not pose a problem for direct solvers, it is problematic for iterative solvers, whose performance strongly depends on the spectral properties of the discrete system.
The selected model configuration consists of a unit volume cubic domain, centered at the origin. The viscosity within the inclusions varies according to the viscosity contrast (). The background viscosity ηref is defined as and is set to 1. A total of 50 spherical inclusions, with variable radii and positions, are distributed throughout the domain (see Appendix A). Weak inclusions have their viscosity set to the minimum viscosity value (ηweak=ηmin), while strong inclusions have their viscosity set to the maximum viscosity value (ηstrong=ηmax). The flow is driven by boundary conditions that prescribe pure shear deformation in the x–y plane, with extension along the x-axis. No flow occurs across the z boundaries, and free-slip conditions are applied to all faces of the domain.
An example of the model results is presented in Fig. 3, showcasing the model configuration and the resulting flow field structure for a viscosity contrast spanning four orders of magnitude. The computations were performed with a numerical resolution of 3203 cells (66 Mdof) for the FD (Fig. 3a) and 2563 cells (168 Mdof) for the FCFV (Fig. 3b). These resolutions correspond to the largest resolution that can be achieved on a single GPU with 32 GB of DRAM (NVIDIA Tesla V100-SXM2-32GB).
Figure 3Example of multiple inclusion simulation result. Panels (a) and (b) shows the resulting flow field for the FD (3203 cells and the FCFV (2563 cells). The inclusions are coloured by the magnitude of the velocity vector (range: [0,0.75]). The black arrows indicate the velocity vectors.
Overall, the two methods exhibit a consistent and satisfactory qualitative agreement. Small differences between the two methods can be observed, particularly in regions where inclusions are in close proximity. These discrepancies stem from the different spatial resolution used for the FD and the FCFV methods, their different order of accuracy, and their different treatments of interfaces. A more detailed comparison of the FD and FCFV results is provided in the Appendix B. The stopping criteria of the PH/DR scheme was set to 10−6, the inner DR stropping criteria was set to 10−3, the penalty parameter was set to 15 times the mean of the viscosity field (γ=15〈η〉).
6.1.1 Scaling: problem size
The first test investigates the effect of problem size. To this end, we vary the model resolution by increasing the number of cells (Nx) uniformly along each dimension. The lowest resolution is 323 cells, while the highest resolution is 3203 cells (131 Mdof) for the FD and 2563 cells (168 Mdof) for the FCFV.
The results show that for both the FD and the FCFV methods, 4 to 7 Powell–Hestenes iterations are required to reach convergence, regardless of the value of Nx. This number of Powell–Hestenes iterations appears to be largely insensitive to the problem size (Fig. 4a). In contrast, the total number of DR iterations accumulated throughout the solution process increases linearly with the problem size (Fig. 4b). This scaling behavior is consistent with that observed for the 1D Poisson problem discussed earlier (Sect. 4), and reflects the influence of the diagonal preconditioner. When normalized by the number of cells along one axis, the number of DR iterations required for convergence falls between 70 and 30 per Nx. The iteration count per Nx tends to decrease at higher resolutions, where the flow field is better resolved (Fig. 4c). The wall-time is reported as a function of the total number of degrees of freedom, Ndof. For both FD and FCFV, the observed scaling is approximately linear with respect to the problem size, which can be attributed to the combined effects of diagonal preconditioning and GPU parallelism (Fig. 4d). Interestingly, both FCFV and FD achieve comparable wall times for similar number of degrees of freedom and iterative tolerance ().
Figure 4Influence of problem size on the convergence of the PH/DR flow solver for a viscosity contrast of 1×104 using the FD and the FCFV methods. Panel (a) reports the total number of Powell–Hestenes iterations, (b) the total number of DR iterations for the entire solution procedure, (c) the number of DR iterations per grid point along one dimension, and (d) the wall-time.
6.1.2 Scaling: penalty factor
We further explore the influence of the penalty factor on the scaling of the PH/DR solver. In contrast to the previous tests, we vary the value of γ independently of the values of the viscosity field. We perform the computations for three different values of Δη.
The number of Powell–Hestenes iterations strongly depends on the magnitude of the penalty parameter γ. In particular, very small values of γ lead to a large number of Powell–Hestenes iterations that exceed 50 (Fig. 5a). This number decreases to fewer than 10 when γ>100, across all values of Δη. The total number of DR iterations is also higher for small values of γ, reflecting the increased number of Powell–Hestenes iterations required for convergence. However, excessively large values of γ also degrade convergence. Specifically, higher values of γ increase the spectral radius of the discrete velocity Schur complement, which in turn increases the number of DR iterations needed to reach convergence (Fig. 5b). An optimal range for γ is found between 10 and 1000, depending on the value of Δη. The wall-time exhibits a similar dependence on γ, closely following the behavior of the total number of DR iterations, with minimum values observed within the range 10–1000 (Fig. 5c).
Figure 5Influence of the penalty factor on the convergence of the PH/DR flow solver for a resolution of 1283 cells and 3 values of Δη. Panel (a) reports the total number of Powell–Hestenes iterations, (b) the total number of DR iteration for the entire solution procedure, (c) the wall time, and (d) the optimal value of γ as function of Δη
The magnitude of the penalty factor must be selected carefully to avoid an excessive number of iterations, which would degrade the wall-time. To guide this choice, we systematically varied the viscosity contrast and computed the optimal penalty value (γopt) for each corresponding Δη. The results reveal a linear relationship between γopt and Δη in log 10–log 10 space (Fig. 5d). This relationship allows for a least-squares fit with a slope of 0.3 and an intercept of 0.575. The resulting fit can be used to select an appropriate penalty parameter when varying the viscosity contrast. This particular scaling produces the best performance for the given model configuration; however, it should not be regarded as generally applicable to all cases. In most of the examples presented hereafter, we choose γ to be proportional to the mean of the viscosity field (γ∝〈η〉).
6.1.3 Scaling: viscosity contrast
We investigate the effect of viscosity contrast on the PH/DR solver by systematically varying it over a range spanning two to six orders of magnitude. As expected, increasing the viscosity contrast impacts the performance of the PH/DR solver.
The number of Powell–Hestenes iterations required for convergence ranges from 6 to 9 (Fig. 6a). The decrease in the number of Powell–Hestenes iterations with increasing Δη is attributed to the scaling of the penalty factor with the mean of the viscosity field. Therefore, higher values of Δη lead to higher values of γ, which in turn reduces the number of Powell–Hestenes iterations required for convergence. The total number of DR iterations scales approximately linearly with the viscosity contrast (Fig. 6b). Consequently, the wall-time also exhibits a linear dependency on the viscosity contrast (Fig. 6c).
6.1.4 Scaling: DR solve tolerance
Another important numerical parameter of the PH/DR procedure is the tolerance ϵDR selected to exit the inner DR iterations (ϵDR). We have systematically varied the value of ϵDR from 10−5 to 10−0.5.
The magnitude of ϵDR has a significant impact on the number of Powell–Hestenes iterations. Small values lead to more accurate inner DR solves, resulting in fewer Powell–Hestenes iterations. In contrast, increasing ϵDR causes a quadratic increase in the number of Powell–Hestenes iterations (Fig. 7a). The trend differs for the total number of DR iterations. Very small and very large values of ϵDR lead to a high total iteration count, while a minimum is observed for ϵDR in the range 10−3 to 10−2. High values of ϵDR result in inaccurate inner solves, increasing the number of Powell–Hestenes iterations. In contrast, very low values lead to over-solving of the inner system, unnecessarily increasing the total DR iterations (Fig. 7b). This behavior directly affects the wall-time, which can vary by up to a factor of 2 depending on the choice of ϵDR. Therefore, ϵDR must be carefully tuned to avoid these two limiting cases.
Figure 7Influence of the inner DR solve tolerance on the convergence of the PH/DR flow solver for a resolution of 1283 cells and Δη=104. Panel (a) reports the total number of Powell–Hestenes iterations, (b) the total number of DR iteration for the whole solution procedure, (c) the wall time. (d) Influence of the viscosity contrast.
In this section, we show how the proposed PH/DR solution strategy can be further extended to solve steady mechanical problems that involve enriched physical models. These models are performed using the FD method, as the FCFV method still needs to be developed to tackle two-phase flow and nonlinear mechanical problems.
7.1 Compressible mechanical problem
Although the previous examples were performed in the incompressible limit, the PH/DR scheme can adequately handle compressible deformation or flow. Here, we show examples of compressible flow performed in 2D, using FD. The model configuration consists of a domain with a circular inclusion of radius 0.2 (Fig. 8a). The background viscosity is set to 1 and the inclusion viscosity is 100. Pure shear background deformation is applied at the boundaries. Compressibility is accounted for by including a bulk viscosity term:
The degree of compressibility varies with the value of bulk viscosity (ηb). To solve this problem, one can use PH/DR, setting the penalty factor independently of the bulk viscosity. Here, we express the penalty factor as the quasi-harmonic average of a reference numerical bulk viscosity (γnum=60〈η〉) and the physical bulk viscosity, γphys=ηb, so that . In this case, several Powell–Hestenes iterations will be needed to reach convergence. Alternatively, one can set the penalty factor to the bulk viscosity (γ=ηb). In that case, PH/DR becomes equivalent to a basic DR scheme and only one Powell–Hestenes iteration is needed to reach convergence. We observe that in the case of compressible problems (ηb=2.0, equivalent to a Poisson ratio of 0.286), the basic DR can be beneficial, leading to a lower number of iterations compared to PH/DR (Fig. 8b). The PH/DR solver exhibits a characteristic sawtooth convergence history that reflects the update of pressure at each Powell–Hestenes iteration (Fig. 8b). In contrast, in weakly compressible cases (, equivalent to a Poisson ratio of 0.499), the PH/DR schemes outperform the standard DR iteration approach. In the context of geodynamic problems involving weakly compressible-to-incompressible materials, PH/DR is preferred.
Figure 8Example of compressible flow: pure shear deformation with viscosity heterogeneity. (a) Divergence field either compressible case (ηb=2) or weakly compressible case (). Convergence history of DR and PH/DR for the (b) compressible and (c) weakly compressible case. For the PH/DR case, the solid line represents the velocity error norm and the dotted line corresponds to the pressure error norm. For the PH/DR case, iterations are summed over the Powell–Hestenes needed for converging the problem. In (a) and (b), x and y have arbitrary units.
7.2 Hydro-mechanical coupling
We further applied the PH/DR scheme to the solution of a coupled multi-physics problem. The equations governing deformation and fluid flow in a Darcy poro-viscous medium can be expressed following Yarushina and Podladchikov (2015):
Here, the Darcy flux is expressed as . The overbar denotes phase-averaged quantities, ϕ is porosity, and the superscripts s and f indicate solid and fluid properties, respectively. k and η refer to permeability and viscosity. An elliptic equation for fluid pressure can be obtained by substituting the Darcy flux into Eq. (32).
The PH/DR scheme is extended by including fluid pressure updates within the inner DR iterations. The update parameters for fluid pressure are obtained from the automatic DR procedure described in Sect. 4. The pseudo-time steps and damping coefficients for velocity and fluid pressure are determined independently, with each variable assigned its own value. As in previous examples, the total pressure is updated at each Powell–Hestenes iteration.
The test case we consider uses a configuration similar to that in the previous example (Sect. 7.1). The shear viscosity of the inclusion is assigned a value 100 times greater than that of the matrix, while the bulk viscosity is taken as twice the shear viscosity. In addition, the background porosity is set to , and the fluid permeability-viscosity ratio is set to . The penalty factor is defined as
where and γnum=5〈ηs〉.
Figure 9 shows examples of solution fields and the scaling properties of the PH/DR scheme. We compute the two-phase flow problem with a resolution varying from 312 to 9922 cells and apply a tolerance of 10−7. For the chosen parameters, pronounced differences are observed between the total and fluid pressure fields (Fig. 9a, b). The total pressure field is sharply defined and zero inside the stiff inclusion, in contrast to the fluid pressure field, which is smoother and diffuses within the inclusion. We observe the characteristic sawtooth convergence history for both momentum and fluid continuity (Fig. 9c).
Figure 9Poro-viscous hydro-mechanical simulation. (a) Total pressure field. (b) Fluid pressure field. (c) Convergence history for a resolution of 9922. (d) Scaling of iterations relative to the problem size. In (a) and (b), x and y have arbitrary units.
We find that low-resolution models require a relatively high number of iterations (Fig. 9d). Nevertheless, the number of DR iterations per Nx drops to a stable low value (approximately 20) for resolutions greater than 1002 cells. This indicates a linear dependence of the number of iterations on the problem size, which is consistent with the behavior documented in the previous examples.
7.3 Visco-elasto-viscoplasticity
In this example, we show how nonlinear material properties can be incorporated into the PH/DR solution strategy, considering the case of non-associated plasticity. With such nonlinear problems, parameters such as the effective viscosity vary throughout the nonlinear solution procedure, which directly affects the eigenvalues of the discrete operator. It is thus important to reevaluate the maximum pseudo-time step (Eq. 18) during the iterations to ensure stability of the procedure. Here, we reevaluate the pseudo-time step every 100 iterations by applying the Gershgorin circle theorem (Eq. 21).
We adopt a single-phase formulation and apply a visco-elasto-viscoplastic material model (de Borst and Duretz, 2020). An additive decomposition of the deviatoric strain rate and the divergence rate is assumed:
where the superscripts v, e, and p correspond to the viscous, elastic, and plastic components, respectively.
The viscoplastic Drucker–Prager yield and potential functions are defined as:
where τII, C, Φ, Ψ, and ηvp denote the second deviatoric stress invariant, the cohesion, the friction angle, the dilatancy angle, and the viscoplastic viscosity, respectively.
The model domain is km2 and contains a circular inclusion of radius km (Fig. 10a). A pure shear loading rate of s−1 is applied, inducing horizontal extension. The inclusion has a low viscosity (1×1010 Pa s), while the surrounding matrix is effectively elasto-plastic (1×1023 Pa s). Both shear and bulk moduli are set to 3×1010 Pa. The cohesion is set to 50 MPa, the friction and dilatancy angles are set to 35 and 5°, respectively, and the viscoplastic viscosity is 2×1020 Pa s. The simulation proceeds over 50 time steps, each of size s.
Figure 10Visco-Elasto-Viscoplastic shear banding. (a, b, c) Model evolution (strain rate field). (d) Evolution of the mean stress. (e) Number of iteration per grid point in x over simulation time. (f) Convergence history for the most demanding time step.
The temporal evolution of the second invariant of the deviatoric strain rate is shown in Fig. 10a, b, and c, while the evolution of the mean stress is shown in Fig. 10d. The model undergoes an initial phase of elastic loading (Fig. 10a), after which shear banding initiates, propagates (Fig. 10b), and reflects off the domain boundaries (Fig. 10c). As a result, stress is limited after 30 kyr and undergoes minor fluctuations witnessing underlying structural softening (Fig. 10d). With a relative tolerance of , the number of iterations per Nx varies between 5 and 25 (Fig. 10e). Low iteration counts correspond to linear-elastic steps, while higher counts occur during nonlinear steps involving significant changes in the solution pattern (e.g. reflections). The convergence history for the most challenging step is shown in Fig. 10f, where the typical sawtooth convergence pattern of the PH/DR is also observed.
Overall, the model converges reliably without the need for manual tuning of numerical parameters throughout the simulation.
7.4 Practical Performance of the PH/DR Method
In this section, we investigate the benefits of the proposed PH/DR method for solving two practical geodynamic problems. The main benefit of the PH/DR method over the APT approach is its ability to automatically adapt to transient changes in solution patterns, caused by variation of the model geometry or by the emergence of internal features. The PH/DR approach can therefore determine parameters that ensure convergence while also significantly reducing the overall wall time. The results presented below are intended to be indicative. In principle, the APT method could deliver results that are at least as good as those of the PH/DR method. However, this would require manually tuning the iteration parameters at every time step, which is not a desirable task. Several simulations were performed using the open-source code JustRelax.jl (de Montserrat et al., 2026), which features both the PH/DR solver and the Accelerated Pseudo-Transient (APT) method (Räss et al., 2022).
The first model is a simplified subduction setup based on the configuration of Schmeling et al. (2008). Simulations were performed over a model duration of 25 Myr, after which the slab tip reaches a depth of approximately 500 km (Fig. 11). We observe that the use of the PH/DR scheme significantly reduces cumulative wall time compared to the APT method (Fig. 11a). The number of iterations per grid point in one dimension is also well behaved, fluctuating around 20 iterations per grid point. For the same tolerance, the APT scheme requires about 200 iterations per grid point and exhibits larger fluctuations (Fig. 11b). Overall, both approaches converge at each iteration and yield similar results (Fig. 11c, d).
Figure 11(a) Cumulative wall times over 25 Myr of model simulation steps using both PH/DR and APT methods at various resolutions. (b) Iterations per time step for the two methods. (c) Map of material phases at 25 Myr using the PH/DR method. The mantle is green, the slab is pink and the sticky air is blue. (d) Same as (c) for the APT method.
The second model is a crustal-scale shear banding setup, which features a visco-elasto-viscoplastic rheology and temperature-dependent creep laws. The reader is referred to Duretz et al. (2020) for the detailed model configuration and material parameters. The simulations are run for 100 time steps, in which the solution pattern evolves significantly in response to shear banding and frictional viscoplasticity. The results presented in Fig. 12 show the cumulative wall time and the number of iterations over 100 time steps for various model resolutions. Models performed with the PH/DR scheme consistently result in 2 to 12 times shorter wall times (Fig. 12a). Unlike APT models, simulations using the PH/DR method provide a reliable and well-defined range of iterations per time step. Nevertheless, the overall shear banding patterns are similar (Fig. 12c, d). The minor differences may be attributed to the fact that APT models could not reach convergence at every time step.
Figure 12(a) Cumulative wall times over 100 time steps for both PH/DR and APT methods at various resolutions. (b) Iterations per time step for the two methods. The number of pseudo-transient iterations per time step was capped at 250×103; as a result, some APT simulations did not reach the convergence criteria (relative tolerance of ). (c) Map of the second deviatoric strain rate invariant at time step 100 (1 Myr) obtained with the PH/DR method. (d) Same as (c) for the APT method.
Our study presents a quantitative assessment of the performance of the DR method, extends it to incompressible flow problems through a Powell–Hestenes strategy, and demonstrates its applicability to nonlinear and multiphysics problems.
The efficiency of pseudo-transient (PT) solvers relies on a careful selection of iterative parameters (see Appendix C). The proposed DR and PHDR solvers are not expected to outperform standard PT solvers with optimally chosen parameters. However, the DR-based methods introduced here are designed to automatically determine these iterative parameters. This automated selection introduces only a minor computational overhead and has a negligible impact on the convergence history compared to an optimally tuned PT solver (see Appendix D). In practice, DR-based schemes can offer substantial advantages over standard pseudo-transient solvers that do not feature automatic tuning of iteration parameters.
Unlike multigrid methods, DR-based approaches exhibit a total iteration count that increases linearly with problem size (see Figs. 2 and 4b). Although this behavior may appear disadvantageous, the DR methods are straightforward to parallelize and require only a few global communication steps. In all the examples presented, global communication operations (e.g., norms and inner products) and iterative parameter updates were performed every 100 DR iterations. This makes DR-based methods particularly suitable for GPU parallelism.
In particular, we show that, when using a single GPU, the wall time for three-dimensional incompressible and heterogeneous Stokes problems scales quasi-linearly with problem size (Fig. 4d). This property is attractive for large-scale 3D geodynamic simulations and makes DR-based algorithms competitive with legacy multigrid-based solvers.
Furthermore, DR-based solvers can be naturally extended to multi-GPU parallelism. Computational frameworks such as ImplicitGlobalGrid.jl (Omlin et al., 2022) provide efficient communication scheduling schemes, enabling near-ideal scaling behavior in modern computing clusters (Räss et al., 2022). In the future, several improvements may be envisaged, in particular regarding preconditioning. To our knowledge, Jacobi preconditioning is known to be relatively inefficient in comparison to other preconditioners (e.g. incomplete Cholesky). Nevertheless, the application of Jacobi is fully parallel, and thus efficiently implemented on GPUs. Recently, iterative implementations of incomplete LU and Cholesky preconditioners have been introduced (Anzt et al., 2016; Chow et al., 2018). Although these methods are fully parallel, achieving effective preconditioning requires accurate application of the LU/Cholesky operators, often necessitating many iterations. Whether such an approach would offer benefits within the framework of DR solvers remains to be established. Multigrid preconditioning represents another possible avenue. In principle, it could maintain a constant iteration count as the resolution increases. However, its implementation requires hierarchical data structures and coarse-grid solvers on GPUs, and it remains uncertain whether this added complexity would yield tangible reductions in wall time or enhance the already quasi-linear wall time scaling.
DR solvers are traditionally used in conjunction with the Finite Element Method (FEM) (e.g. Oakley and Knight, 1995; Joldes et al., 2011). Here, we have shown that such solvers can also be applied to finite difference methods, as well as finite volume or low-order discontinuous Galerkin methods. In practice, efficient implementation is facilitated by the structured nature of these problems, which allows information from neighboring cells to be easily gathered for stencil evaluations. Extending DR solvers to unstructured FEM or FCFV discretizations in the context of GPU computing requires specialized algorithms to improve memory access (e.g. Fu et al., 2014; Kiran et al., 2019). The combination of these approaches with DR solvers may provide an alternative to traditionally employed solvers, a possibility that remains to be explored.
The relatively low algorithmic complexity of DR methods makes them particularly attractive for solving nonlinear and multiphysics coupled problems. While we have focused on basic hydro-mechanical and linear plasticity models, future implementations could naturally extend these solvers to include thermal or chemical couplings.
In this contribution, we have employed DR-based techniques to develop robust and high-performance solvers for geodynamic problems. These solvers incorporate automatic tuning of iterative parameters, including pseudo-time steps and damping coefficients based on estimates of the spectral bounds of the discrete operators.
The approach was applied to problems discretized on structured grids using both finite-difference (FD) and face-centered finite-volume/discontinuous Galerkin (FCFV) methods. To handle incompressible deformations, DR-based solvers were combined with Powell–Hestenes iterations, enabling the successful solution of heterogeneous Stokes problems.
When implemented on GPU accelerators, the solvers exhibit a quasi-linear scaling of solution time with problem size, demonstrating excellent computational efficiency. Moreover, the method can be trivially extended to handle compressible flow, multiphase flow, and nonlinear constitutive laws.
This flexibility stems from the algorithmic simplicity of the DR-based methods, which makes them particularly well-suited for large-scale geoscientific simulations and opens new paths for exploring complex coupled physical processes.
The position and type of inclusion used in the example in the main text are provided in the following table (Table A1). The extent of the model domain and the viscosity contrast are detailed in the main text.
To verify our implementation of PH/DR iterative solvers, the results were compared against solutions obtained using direct solvers. This verification could only be conducted in 2D due to the prohibitive computational cost of direct solvers in 3D. Both FD and FCFV models were executed using configurations analogous to the 3D setups. The inclusion positions are given in the table below (Table B1): weak inclusions have a viscosity of , while strong inclusions have a viscosity of 1000. No smoothing of the viscosity contrast was applied. A background pure shear rate and free-slip boundary conditions were imposed on the box boundaries.
FD models solved with a direct solver follow the approach of Räss et al. (2017), whereas the FCFV direct-solver implementation is based on Sevilla and Duretz (2023). All simulations were performed on meshes of 1282 cells. The resulting pressure fields are shown in Fig. B1.
For both discretizations, good agreement is obtained between models solved iteratively and those solved directly, demonstrating that the FD and FCFV implementations are consistent with previous studies. In addition, this comparison highlights differences between the two schemes: while the FD approach exhibits spurious oscillations at sharp viscosity contrasts, the FCFV method yields a smoother pressure field. It should be noted that the mesh used in the FCFV simulations is similar to that of the FD models and does not explicitly capture the material interface through traction continuity enforcement.
Table B1Tabulated values of x0, y0, r, and type (weak/strong). Each block of 5 rows is shown side by side.
Figure B1(a) Pressure field obtained with FD using a PH/DR scheme. (b) Pressure field obtained with a direct solution scheme. (c) Pressure field obtained with FCFV using a PH/DR scheme. (d) Pressure field obtained with FCFV using a direct solution scheme. In all panels, x and y have arbitrary units.
The DR and PH/DR schemes are formulated in a manner similar to pseudo-transient (PT) solvers. The solution fields are updated iteratively, with update steps determined through stability analysis to ensure stable integration in pseudo-time. Most importantly, damping is applied, which effectively modifies the form of the equation to be integrated in pseudo-time, resulting in a second-order integration scheme following Frankel (1950). The pseudo-time step and damping parameters constitute the two numerical parameters that control the convergence behavior.
Optimal values for these parameters can be derived analytically (e.g. Räss et al., 2022). However, they depend on the spectral properties of the discrete operator Papadrakakis (1981); Oakley and Knight (1995). Consequently, determining iteration parameters a priori that remain optimal throughout simulations involving evolving geometry or time-dependent material properties is highly challenging. This may result in degraded convergence, stagnation, or even divergence.
The DR and PH/DR schemes address this difficulty by estimating spectral bounds and adjusting the iteration parameters accordingly. In practice, the main additional components are:
-
a routine to estimate the maximum eigenvalue (Gershgorin analysis),
-
a routine to approximate the minimum eigenvalue (Rayleigh quotient), and
-
a recalculation of the iteration parameters based on these estimates.
In addition, the PH/DR scheme features an outer loop for pressure updates. During the velocity update cycles, the pressure is held fixed. At the end of each such cycle, the pressure is updated using a numerical bulk viscosity and the continuity residual, following the classical Powell–Hestenes approach. The pressure is thus updated less frequently than the velocity, which constitutes a key difference from the pseudo-transient approach presented in Räss et al. (2022).
We employed the PT solvers developed by Räss et al. (2022) to assess the relative performance of the PH/DR solver, using the single-inclusion benchmark configuration. The inclusion has a viscosity of , while the surrounding matrix has a viscosity of 1. A smoothing of the viscosity contrast similar to that used in Räss et al. (2022) was applied. Both the PT and PH/DR models were run to the same relative tolerance of . The resulting model fields are shown in Fig. D1.
The PT models of Räss et al. (2022) were computed using optimally tuned iterative parameters and thus represent the best possible performance bound. The present results demonstrate that the additional computational cost associated with the automatic parameter selection inherent to the PH/DR scheme is very small. Both methods converge in fewer than 20 iterations per grid point for this model configuration. Because pressure updates are less frequent than velocity updates, the convergence of the pressure residual does not closely track that of the momentum residuals.
Figure D1(a) Pressure field obtained by Räss et al. (2022). (b) Pressure field obtained in this study. (c) Convergence of momentum and continuity residuals. In (a) and (b), x and y have arbitrary units.
The different software packages developed and used in this study are licensed under MIT License. The latest versions of the code is available in Zenodo (https://doi.org/10.5281/zenodo.20269899, Räss and Duretz, 2025). The scripts can also be accessed there: https://github.com/PTsolvers/AutoTuningGMD (last access: 15 May 2026). No data sets were used in this article.
TD – Original study design, code and algorithm development, simulations, figures, writing, review. AdM – Development, study design, writing, review. RS – Development of discretization, verification, writing, and review. LR – Writing, figures, concept, and review. IU – Writing, concept, review. AS – Code testing, writing, review.
At least one of the (co-)authors is a member of the editorial board of Geoscientific Model Development. The peer-review process was guided by an independent editor, and the authors also have no other competing interests to declare.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. The authors bear the ultimate responsibility for providing appropriate place names. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
We thank the Geocomputing Centre at the University of Lausanne for providing access to the Octopus supercomputer. LR and IU gratefully acknowledge funding from the Swiss University Conference and the Swiss Council of Federal Institutes of Technology through the Platform for Advanced Scientific Computing (PASC) program. This research was also supported by the Swiss National Supercomputing Centre (CSCS) under project ID c23 and c44, awarded via the PASC projects GPU4GEO and ∂GPU4GEO. We thank Pascal Aellig for providing access to a GH200 node.
This research has been supported by the Swiss University Conference and the Swiss Council of Federal Institutes of Technology through the Platform for Advanced Scientific Computing (PASC), obtained via the PASC project ∂GPU4GEO (c44). Arne Spang was funded by the DFG Emmy Noether grant TH 2076/8-1.
This open-access publication was funded by Goethe University Frankfurt.
This paper was edited by Mauro Cacace and reviewed by Anthony Jourdon and two anonymous referees.
Alkhimenkov, Y., Räss, L., Khakimova, L., Quintal, B., and Podladchikov, Y.: Resolving Wave Propagation in Anisotropic Poroelastic Media Using Graphical Processing Units (GPUs), J. Geophys. Res.-Sol. Ea., 126, https://doi.org/10.1029/2020JB021175, 2021. a
Anzt, H., Chow, E., Saak, J., and Dongarra, J.: Updating incomplete factorization preconditioners for model order reduction, Numer. Algorithms, 73, 611–630, https://doi.org/10.1007/s11075-016-0110-2, 2016. a
Balachandar, S. and Yuen, D. A.: Three-dimensional fully spectral numerical method for mantle convection with depth-dependent properties, J. Comput. Phys., 113, https://doi.org/10.1006/jcph.1994.1118, 1994. a
Baumgardner, J. R.: Three-dimensional treatment of convective flow in the earth's mantle, J. Stat. Phys., 39, 501–511, https://doi.org/10.1007/BF01008348, 1985. a
Bezanson, J., Edelman, A., Karpinski, S., and Shah, V. B.: Julia: A Fresh Approach to Numerical Computing, SIAM Rev., 59, 65–98, https://doi.org/10.1137/141000671, 2017. a
Burcet, M., Oliveira, B., Afonso, J. C., and Zlotnik, S.: A face-centred finite volume approach for coupled transport phenomena and fluid flow, Appl. Math. Model., 125, 293–312, https://doi.org/10.1016/j.apm.2023.08.031, 2024. a
Chow, E., Anzt, H., Scott, J., and Dongarra, J.: Using Jacobi iterations and blocking for solving sparse triangular systems in incomplete factorization preconditioning, J. Parallel Distr. Com., 119, 219–230, https://doi.org/10.1016/j.jpdc.2018.04.017, 2018. a
Cockburn, B. and Gopalakrishnan, J.: New hybridization techniques, GAMM-Mitteilungen, 28, 154–182, https://doi.org/10.1002/gamm.201490017, 2005. a
Cundall, P. and Board, M.: A microcomputer program for modelling large-strain plasticity problems, Proceedings of the Sixth International Conference on Numerical Methods in Geomechanics, 11–15 April 1988, Innsbruck, Austria, Vols. 1–3, ISBN 978-90-6191-80, 1988. a
Dabrowski, M., Krotkiewski, M., and Schmid, D. W.: MILAMIN: MATLAB-based finite element method solver for large problems, Geochem. Geophy. Geosy., 9, https://doi.org/10.1029/2007GC001719, 2008. a, b
de Borst, R. and Duretz, T.: On viscoplastic regularisation of strain-softening rocks and soils, Int. J. Numer. Anal. Met., 44, 890–903, https://doi.org/10.1002/nag.3046, 2020. a
de la Puente, J., Ferrer, M., Hanzich, M., Castillo, J. E., and Cela, J. M.: Mimetic seismic wave modeling including topography on deformed staggered grids, Geophysics, https://doi.org/10.1190/geo2013-0371.1, 2014. a
de Montserrat, A., Aellig, P. S., Schuler, C., Navarrete, I., Räss, L., Fuchs, L., Kaus, B. J. P., and Dominguez, H.: JustRelax.jl: A Julia package for geodynamic modeling with matrix-free solvers, Journal of Open Source Software, 11, 9365, https://doi.org/10.21105/joss.09365, 2026. a
Duretz, T., May, D. A., Gerya, T. V., and Tackley, P. J.: Discretization errors and free surface stabilization in the finite difference and marker-in-cell method for applied geodynamics: A numerical study, Geochem. Geophy. Geosy., 12, https://doi.org/10.1029/2011GC003567, 2011. a, b, c
Duretz, T., May, D. A., and Yamato, P.: A free surface capturing discretization for the staggered grid finite difference scheme, Geophys. J. Int., 204, 1518–1530, https://doi.org/10.1093/gji/ggv526, 2016. a
Duretz, T., Räss, L., Podladchikov, Y., and Schmalholz, S.: Resolving thermomechanical coupling in two and three dimensions: spontaneous strain localization owing to shear heating, Geophys. J. Int., 216, 365–379, https://doi.org/10.1093/gji/ggy434, 2019. a, b, c
Duretz, T., de Borst, R., Yamato, P., and Le Pourhiet, L.: Toward Robust and Predictive Geodynamic Modeling: The Way Forward in Frictional Plasticity, Geophys. Res. Lett., 47, e2019GL086027, https://doi.org/10.1029/2019GL086027, 2020. a
Feng, Y. T.: On the discrete dynamic nature of the conjugate gradient method, J. Comput. Phys., 211, 91–98, https://doi.org/10.1016/j.jcp.2005.05.006, 2006. a
Frankel, S. P.: Convergence rates of iterative treatments of partial differential equations, Mathematical Tables and Other Aids to Computation, 4, 65–75, 1950. a, b
Fu, Z., James Lewis, T., Kirby, R. M., and Whitaker, R. T.: Architecting the finite element method pipeline for the GPU, J. Comput. Appl. Math., 257, 195–211, https://doi.org/10.1016/j.cam.2013.09.001, 2014. a
Fuchs, L. and Becker, T. W.: On the Role of Rheological Memory for Convection-Driven Plate Reorganizations, Geophys. Res. Lett., 49, e2022GL099574, https://doi.org/10.1029/2022GL099574, 2022. a
Fullsack, P.: An arbitrary Lagrangian-Eulerian formulation for creeping flows and its application in tectonic models, Geophys. J. Int., 120, 1–23, https://doi.org/10.1111/j.1365-246X.1995.tb05908.x, 1995. a
Furuichi, M., May, D. A., and Tackley, P. J.: Development of a Stokes flow solver robust to large viscosity jumps using a Schur complement approach with mixed precision arithmetic, J. Comput. Phys., 230, 8835–8851, https://doi.org/10.1016/j.jcp.2011.09.007, 2011. a
Gerya, T.: Introduction to Numerical Geodynamic Modelling, Cambridge University Press, Cambridge, England, UK, ISBN 978-1-10714314-2, 2019. a
Gerya, T. V.: Initiation of transform faults at rifted continental margins: 3D petrological-thermomechanical modeling and comparison to the Woodlark Basin, Petrology, 21, 550–560, https://doi.org/10.1134/S0869591113060039, 2013. a, b
Gerya, T. V. and Yuen, D. A.: Characteristics-based marker-in-cell method with conservative finite-differences schemes for modeling geological flows with strongly variable transport properties, Phys. Earth Planet. In., 140, 293–318, https://doi.org/10.1016/j.pepi.2003.09.006, 2003. a, b
Giacomini, M. and Sevilla, R.: A second-order face-centred finite volume method on general meshes with automatic mesh adaptation, Int. J. Numer. Meth. Eng., 121, 5227–5255, https://doi.org/10.1002/nme.6428, 2020. a
Goldade, R., Wang, Y., Aanjaneya, M., and Batty, C.: An adaptive variational finite difference framework for efficient symmetric octree viscosity, ACM Trans. Graph., 38, 1–14, https://doi.org/10.1145/3306346.3322939, 2019. a
Gray, T. S., Tackley, P. J., and Gerya, T.: Variational Stokes method applied to free surface boundaries in numerical geodynamic models, EGUsphere [preprint], https://doi.org/10.5194/egusphere-2025-6354, 2025. a
Halter, W. R., Macherel, E., and Schmalholz, S. M.: A simple computer program for calculating stress and strain rate in 2D viscous inclusion-matrix systems, J. Struct. Geol., 160, 104617, https://doi.org/10.1016/j.jsg.2022.104617, 2022. a, b, c
Harlow, F. H. and Welch, J. E.: Numerical Calculation of Time-Dependent Viscous Incompressible Flow of Fluid with Free Surface, Phys. Fluids, 8, 2182–2189, https://doi.org/10.1063/1.1761178, 1965. a
Hassani, R., Jongmans, D., and Chéry, J.: Study of plate deformation and stress in subduction processes using two-dimensional numerical models, J. Geophys. Res.-Sol. Ea., 102, 17951–17965, https://doi.org/10.1029/97JB01354, 1997. a
Hillebrand, B., Thieulot, C., Geenen, T., van den Berg, A. P., and Spakman, W.: Using the level set method in geodynamical modeling of multi-material flows and Earth's free surface, Solid Earth, 5, 1087–1098, https://doi.org/10.5194/se-5-1087-2014, 2014. a
Joldes, G. R., Wittek, A., and Miller, K.: An adaptive Dynamic Relaxation method for solving nonlinear finite element problems. Application to brain shift estimation, Int. J. Numer. Meth. Bio., 27, 173–185, https://doi.org/10.1002/cnm.1407, 2011. a, b
Kaus, B. J. P., Popov, A. A., Baumann, T. S., Püsök, A. E., Bauville, A., Fernandez, N., and Collignon, M.: Forward and Inverse Modelling of Lithospheric Deformation on Geological Timescales, in: NIC Symposium 2016, edited by: Binder, K., Müller, M., Kremer, A., and Schnurpfeil, A., Vol. 48, 299–307, Forschungszentrum Jülich, Jülich, ISBN 978-3-95806-109-5, 2016. a
Kiran, U., Sharma, D., and Gautam, S. S.: GPU-warp based finite element matrices generation and assembly using coloring method, J. Comput. Des. Eng., 6, 705–718, https://doi.org/10.1016/j.jcde.2018.11.001, 2019. a
Kronbichler, M., Heister, T., and Bangerth, W.: High Accuracy Mantle Convection Simulation through Modern Numerical Methods, Geophys. J. Int., 191, 12–29, https://doi.org/10.1111/j.1365-246X.2012.05609.x, 2012. a
Larionov, E., Batty, C., and Bridson, R.: Variational stokes: a unified pressure-viscosity solver for accurate viscous liquids, ACM T. Graphic., 36, 1–11, https://doi.org/10.1145/3072959.3073628, 2017. a
Luisier, C., Tajčmanová, L., Yamato, P., and Duretz, T.: Garnet microstructures suggest ultra-fast decompression of ultrahigh-pressure rocks, Nat. Commun., 14, 1–8, https://doi.org/10.1038/s41467-023-41310-w, 2023. a
Macherel, E., Räss, L., and Schmalholz, S. M.: 3D Stresses and Velocities Caused by Continental Plateaus: Scaling Analysis and Numerical Calculations With Application to the Tibetan Plateau, Geochem. Geophy. Geosy., 25, e2023GC011356, https://doi.org/10.1029/2023GC011356, 2024. a
May, D., Brown, J., and Le Pourhiet, L.: A scalable, matrix-free multigrid preconditioner for finite element discretizations of heterogeneous Stokes flow, Comput. Method. Appl. M., 290, 496–523, https://doi.org/10.1016/j.cma.2015.03.014, 2015. a, b
May, D. A. and Moresi, L.: Preconditioned iterative methods for Stokes flow problems arising in computational geodynamics, Phys. Earth Planet. In., 171, 33–47, https://doi.org/10.1016/j.pepi.2008.07.036, 2008. a
May, D. A., Brown, J., and Pourhiet, L. L.: pTatin3D: High-Performance Methods for Long-Term Lithospheric Dynamics, in: SC'14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 274–284, https://doi.org/10.1109/SC.2014.28, 2014. a
Oakley, D. R. and Knight, N. F.: Adaptive dynamic relaxation algorithm for non-linear hyperelastic structures Part I. Formulation, Comput. Method. Appl. M., 126, 67–89, https://doi.org/10.1016/0045-7825(95)00805-B, 1995. a, b, c, d, e, f
Omlin, S., Räss, L., and Utkin, I.: Distributed Parallelization of xPU Stencil Computations in Julia, Proceedings of the JuliaCon Conferences, 137 pp., https://doi.org/10.21105/jcon.00137, 2022. a, b
Papadrakakis, M.: A method for the automatic evaluation of the dynamic relaxation parameters, Comput. Method. Appl. M., 25, 35–48, https://doi.org/10.1016/0045-7825(81)90066-9, 1981. a, b, c
Poliakov, A. N. B., Cundall, P. A., Podladchikov, Y. Y., and Lyakhovsky, V. A.: An Explicit Inertial Method for the Simulation of Viscoelastic Flow: An Evaluation of Elastic Effects on Diapiric Flow in Two- and Three- Layers Models, in: Flow and Creep in the Solar System: Observations, Modeling and Theory, 175–195, Springer, Dordrecht, the Netherlands, ISBN 978-94-015-8206-3, https://doi.org/10.1007/978-94-015-8206-3_12, 1993. a
Popov, A. A. and Sobolev, S. V.: SLIM3D: A Tool for Three-dimensional Thermomechanical Modeling of Lithospheric Deformation with Elasto-visco-plastic Rheology, Phys. Earth Planet. In., 171, 55–75, https://doi.org/10.1016/j.pepi.2008.03.007, 2008. a
Räss, L. and Duretz, T.: PTsolvers/AutoTuningGMD: Release 1.0, Zenodo [code], https://doi.org/10.5281/zenodo.20269899, 2025. a
Räss, L., Duretz, T., Podladchikov, Y. Y., and Schmalholz, S. M.: M2Di: Concise and efficient MATLAB 2-D Stokes solvers using the Finite Difference Method, Geochem. Geophy. Geosy., 18, 755–768, https://doi.org/10.1002/2016GC006727, 2017. a, b, c
Räss, L., Duretz, T., and Podladchikov, Y. Y.: Resolving hydro-mechanical coupling in two and three dimensions: Spontaneous channelling of porous fluids owing to decompaction weakening, Geophys. J. Int., https://doi.org/10.1093/gji/ggz239, 2019. a
Räss, L., Utkin, I., Duretz, T., Omlin, S., and Podladchikov, Y. Y.: Assessing the robustness and scalability of the accelerated pseudo-transient method, Geosci. Model Dev., 15, 5757–5786, https://doi.org/10.5194/gmd-15-5757-2022, 2022. a, b, c, d, e, f, g, h, i, j, k, l, m, n, o
Rezaiee-Pajand, M. and Sarafrazi, S. R.: Nonlinear dynamic structural analysis using dynamic relaxation with zero damping, Comput. Struct., 89, 1274–1285, https://doi.org/10.1016/j.compstruc.2011.04.005, 2011. a
Rudi, J., Stadler, G., and Ghattas, O.: Weighted BFBT Preconditioner for Stokes Flow Problems with Highly Heterogeneous Viscosity, SIAM J. Sci. Comput., 39, S272–S297, https://doi.org/10.1137/16M108450X, 2017. a
Samuel, H. and Evonuk, M.: Modeling advection in geophysical flows with particle level sets, Geochem. Geophy. Geosy., 11, Q08020, https://doi.org/10.1029/2010GC003081, 2010. a
Sanan, P., May, D. A., Bollhöfer, M., and Schenk, O.: Pragmatic solvers for 3D Stokes and elasticity problems with heterogeneous coefficients: evaluating modern incomplete LDLT preconditioners, Solid Earth, 11, 2031–2045, https://doi.org/10.5194/se-11-2031-2020, 2020. a
Schmalholz, S. M. and Podladchikov, Y. Yu.: Strain and competence contrast estimation from fold shape, Tectonophysics, 340, 195–213, https://doi.org/10.1016/S0040-1951(01)00151-2, 2001. a
Schmeling, H., Babeyko, A. Y., Enns, A., Faccenna, C., Funiciello, F., Gerya, T., Golabek, G. J., Grigull, S., Kaus, B. J. P., Morra, G., Schmalholz, S. M., and van Hunen, J.: A benchmark comparison of spontaneous subduction models – Towards a free surface, Phys. Earth Planet. Int., 171, 198–223, https://doi.org/10.1016/j.pepi.2008.06.028, 2008. a, b
Sevilla, R. and Duretz, T.: A face-centred finite volume method for high-contrast Stokes interface problems, Int. J. Numer. Meth. Eng., 124, 3709–3732, https://doi.org/10.1002/nme.7294, 2023. a, b
Sevilla, R. and Duretz, T.: Face-centred finite volume methods for Stokes flows with variable viscosity, Int. J. Numer. Meth. Eng., 125, e7450, https://doi.org/10.1002/nme.7450, 2024. a
Sevilla, R., Giacomini, M., and Huerta, A.: A face-centred finite volume method for second-order elliptic problems, Int. J. Numer. Meth. Eng., 115, 986–1014, https://doi.org/10.1002/nme.5833, 2018. a, b, c
Shih, Y.-h., Stadler, G., and Wechsung, F.: Robust Multigrid Techniques for Augmented Lagrangian Preconditioning of Incompressible Stokes Equations with Extreme Viscosity Variations, SIAM J. Sci. Comput., https://doi.org/10.1137/21M1430698, 2022. a, b
Shin, D. and Strikwerda, J. C.: Inf-sup conditions for finite-difference approximations of the stokes equations, ANZIAM J., 39, 121–134, https://doi.org/10.1017/S0334270000009255, 1997. a
Spang, A., Thielmann, M., de Montserrat, A., and Duretz, T.: Transient Propagation of Ductile Ruptures by Thermal Runaway, J. Geophys. Res.-Sol. Ea., 130, e2025JB031240, https://doi.org/10.1029/2025JB031240, 2025. a
Tackley, P. J.: Effects of strongly variable viscosity on three-dimensional compressible convection in planetary mantles, J. Geophys. Res.-Sol. Ea., 101, 3311–3332, https://doi.org/10.1029/95JB03211, 1996. a, b
Tackley, P. J.: Modelling compressible mantle convection with large viscosity contrasts in a three-dimensional spherical shell using the yin-yang grid, Phys. Earth Planet. In., 171, 7–18, https://doi.org/10.1016/j.pepi.2008.08.005, 2008. a
Tackley, P. J.: Tectono-Convective Modes on Earth and Other Terrestrial Bodies, in: Dynamics of Plate Tectonics and Mantle Convection, 159–180, Elsevier, Walthm, MA, USA, ISBN 978-0-323-85733-8, https://doi.org/10.1016/B978-0-323-85733-8.00006-8, 2023. a
Vieira, L. M., Giacomini, M., Sevilla, R., and Huerta, A.: A second-order face-centred finite volume method for elliptic problems, Comput. Method. Appl. M., 358, 112655, https://doi.org/10.1016/j.cma.2019.112655, 2020. a
Yarushina, V. M. and Podladchikov, Y. Y.: (De)compaction of porous viscoelastoplastic media: Model formulation, J. Geophys. Res.-Sol. Ea., 120, 4146–4170, https://doi.org/10.1002/2014JB011258, 2015. a
Zhong, S., McNamara, A., Tan, E., Moresi, L., and Gurnis, M.: A benchmark study on mantle convection in a 3-D spherical shell using CitcomS, Geochem. Geophy. Geosy., 9, https://doi.org/10.1029/2008GC002048, 2008. a
- Abstract
- Introduction
- Mechanical equilibrium
- Discretizations
- Dynamic Relaxation with automatic tuning
- Solution of mechanical problems: a decoupled Powell–Hestenes strategy
- Solution of mechanical problems: a pathological case
- Compressibility, multi-physics and nonlinear models
- Discussion
- Conclusions
- Appendix A: Three-dimensional multiple inclusion problem
- Appendix B: Two-dimensional heterogeneous Stokes flow with FD and FCFV
- Appendix C: Comparison with the pseudo-transient formulation
- Appendix D: Comparison with the pseudo-transient formulation and the results of Räss et al. (2022)
- Code and data availability
- Author contributions
- Competing interests
- Disclaimer
- Acknowledgements
- Financial support
- Review statement
- References
- Abstract
- Introduction
- Mechanical equilibrium
- Discretizations
- Dynamic Relaxation with automatic tuning
- Solution of mechanical problems: a decoupled Powell–Hestenes strategy
- Solution of mechanical problems: a pathological case
- Compressibility, multi-physics and nonlinear models
- Discussion
- Conclusions
- Appendix A: Three-dimensional multiple inclusion problem
- Appendix B: Two-dimensional heterogeneous Stokes flow with FD and FCFV
- Appendix C: Comparison with the pseudo-transient formulation
- Appendix D: Comparison with the pseudo-transient formulation and the results of Räss et al. (2022)
- Code and data availability
- Author contributions
- Competing interests
- Disclaimer
- Acknowledgements
- Financial support
- Review statement
- References