the Creative Commons Attribution 4.0 License.
the Creative Commons Attribution 4.0 License.
Automatic adjointbased inversion schemes for geodynamics: reconstructing the evolution of Earth's mantle in space and time
Sia Ghelichkhan
Angus Gibson
D. Rhodri Davies
Stephan C. Kramer
David A. Ham
Reconstructing the thermochemical evolution of Earth's mantle and its diverse surface manifestations is a widely recognised grand challenge for the geosciences. It requires the creation of a digital twin: a digital representation of Earth's mantle across space and time that is compatible with available observational constraints on the mantle's structure, dynamics and evolution. This has led geodynamicists to explore adjointbased approaches that reformulate mantle convection modelling as an inverse problem, in which unknown model parameters can be optimised to fit available observational data. Whilst there has been a notable increase in the use of adjointbased methods in geodynamics, the theoretical and practical challenges of deriving, implementing and validating adjoint systems for largescale, nonlinear, timedependent problems, such as global mantle flow, has hindered their broader use. Here, we present the Geoscientific ADjoint Optimisation PlaTform (GADOPT), an advanced computational modelling framework that overcomes these challenges for coupled, nonlinear, timedependent systems by integrating three main components: (i) Firedrake, an automated system for the solution of partial differential equations using the finiteelement method; (ii) DolfinAdjoint, which automatically generates discrete adjoint models in a form compatible with Firedrake; and (iii) the Rapid Optimisation Library, ROL, an efficient largescale optimisation toolkit; GADOPT enables the application of adjoint methods across geophysical continua, showcased herein for geodynamics. Through two sets of synthetic experiments, we demonstrate the application of this framework to the initial condition problem of mantle convection, in both square and annular geometries, for both isoviscous and nonlinear rheologies. We confirm the validity of the gradient computations underpinning the adjoint approach, for all cases, through secondorder Taylor remainder convergence tests and subsequently demonstrate excellent recovery of the unknown initial conditions. Moreover, we show that the framework achieves theoretical computational efficiency. Taken together, this confirms the suitability of GADOPT for reconstructing the evolution of Earth's mantle in space and time. The framework overcomes the significant theoretical and practical challenges of generating adjoint models and will allow the community to move from idealised forward models to datadriven simulations that rigorously account for observational constraints and their uncertainties using an inverse approach.
 Article
(9094 KB)  Fulltext XML
 BibTeX
 EndNote
Mantle convection is the “engine” driving our dynamic planet. It is the principal control on Earth's thermal and chemical evolution and underpins tectonic and geological activity at Earth's surface (e.g. Davies and Richards, 1992; Coltice et al., 2017). Through interactions with Earth's crust, it introduces heat and fluids that contribute to the formation and concentration of ore deposits (e.g. Hoggard et al., 2020). Mantle flow also induces vertical movements of Earth's surface (socalled dynamic topography; see Davies et al., 2023, for a review), leading to regional and global changes in sea level and climate (e.g. Poore et al., 2006; Moucha et al., 2008; Cloetingh and Haq, 2015). The lithosphere, considered here to be the mantle's upper thermal boundary layer, serves as a window into the form and timedependence of mantle convection, recorded through tectonic plate motions (e.g. Iaffaldano and Bunge, 2015; Müller et al., 2016; Stotz et al., 2017, 2018; Wang et al., 2023). Although substantial progress has been made in reconstructing the history of plate tectonic motions (e.g. Seton et al., 2012; Gurnis et al., 2012; Müller et al., 2019; Merdith et al., 2021), the quest for a dynamic reference, revealing the force equilibria within the underlying mantle, remains ongoing. In other words, the veracity of plate motion reconstructions is not matched by an equivalent knowledge of the thermochemical structure and flow history of the underlying mantle. This is a major limitation, as it inhibits our ability to understand fundamental processes that depend on timedependent interactions between Earth's surface and its deep interior.
The primary challenge in reconstructing past mantle flow stems from the lack of knowledge surrounding the initial state, compounded by uncertain physical and chemical parameters. Mantle convection is an initial condition problem, uniquely determined by an initial condition some time in the past: starting from some point in time, it can be uniquely modelled by solving conservation equations for mass, momentum and energy (e.g. Ricard, 2007; Zhong et al., 2007). However, the lack of knowledge on this initial condition – specifically the thermochemical state of Earth's mantle at some time in the past – renders reconstructions of mantle flow through conventional forward calculations intractable (Bunge et al., 2003) (Fig. 1a). Moreover, current global mantle convection models employ billions of degrees of freedom and require multiple time steps to resolve the multiscale dynamics of Earth's mantle (e.g. Davies and Davies, 2009; Wolstencroft et al., 2009; Stadler et al., 2010; Weismüller et al., 2015; Dannberg and Gassmöller, 2018; Bauer et al., 2019, 2020). Owing to the resulting computational expense, the use of conventional geophysical inverse methods, including Monte Carlo techniques (e.g. Sambridge and Mosegaard, 2002), is considered impractical for determining the mantle's past structure, dynamics and evolution.
The initial condition problem can be partially addressed through sequential dataassimilation techniques. In essence, the objective of sequential data assimilation is to leverage all accessible information to improve predictions of mantle flow in space and time. Data assimilation is commonly achieved through sequential filtering (e.g. Wunsch, 1996), in which the model is advanced in time over the period in which observations exist. Whenever observations become available, the model is adjusted or “corrected” (e.g. Bunge and Grand, 2000; Bocher et al., 2016). The magnitude of the correction can be optimally determined using methods such as the Kalman filter (e.g. Bocher et al., 2018), with the model subsequently restarted from the updated state, and this process is repeated until all available information has been utilised (Fig. 1b). In a geodynamic context, the most commonly exploited dataset consists of palaeosurface velocities from plate tectonic reconstruction models, primarily through a kinematic surface boundary condition in timedependent models. In recent years, there has been a notable increase in the use of this approach (e.g. Bunge et al., 2002; Davies et al., 2012; Bower et al., 2013; Zhong and Rudolph, 2015; Nerlich et al., 2016; Young et al., 2022; Panton et al., 2023). This can be attributed to two main factors: (i) improved confidence in the validity of plate tectonic reconstruction models and their extension further back in time (e.g. Merdith et al., 2021; Young et al., 2022; Müller et al., 2022) and (ii) the enhanced accessibility of such models, facilitated via opensource community frameworks like the GPlates project (e.g. Gurnis et al., 2012; Müller et al., 2018).
Sequential dataassimilation methods, however, come with an inherent limitation: due to the sequential nature of the assimilation process, each observation is incorporated to influence the model only at later times. Consequently, information propagates from the past into the future but cannot be transmitted back into the past. This drawback poses a significant limitation, as our knowledge of the mantle at the present day is substantially more detailed than at any other time. Thus, it becomes imperative to explore approaches that explicitly carry information backwards in time, or more precisely, enable the estimation of a timedependent model that best fits all available observational constraints.
Inverse geodynamics is a rapidly evolving field that embarks on this very idea. The foundation of this field is an optimisation approach, in which mantle convection modelling is reformulated as an inverse problem. Using inverse theory, unknown model parameters can be optimised to fit available observational data via the socalled adjoint method (e.g. Bunge et al., 2003; IsmailZadeh et al., 2004), through which the sensitivities of a performance measure (the socalled “objective functional”), with respect to model parameters (e.g. the choice of initial condition), can be computed. The resulting sensitivity information can be used to adjust model parameters, generating a model flow trajectory that matches observational constraints (e.g. presentday mantle thermochemical structure) (Fig. 1c). The geodynamic adjoint equations for reconstructing the initial condition have been derived for isochemical, incompressible (e.g. Bunge et al., 2003; Horbach et al., 2014), compressible (Ghelichkhan and Bunge, 2016) and thermochemical mantle flow (Ghelichkhan and Bunge, 2018). Moreover, the method has been enhanced for simultaneous recovery of initial temperature conditions and rheological parameters (Li et al., 2017) despite the inherent illposedness introduced in such a problem. Growing adoption of adjointbased methods within a broader geodynamic context is evidenced by their application in Reuber et al. (2020) to decipher subsurface structures and rheological parameters via inversion of principal stress directions and by Crawford et al. (2018) to quantify the sensitivity of postglacial sea level changes to lateral variations in mantle viscosity. Recent growth in computational power has led to multiple adjointbased mantle reconstructions, with a particular focus on regional geological events, as recorded in the geological record of the Americas (e.g. Spasojevic et al., 2009; Liu and Gurnis, 2010; Shephard et al., 2010) and the Atlantic realm (e.g. Colli et al., 2018). To supplement these findings, Ghelichkhan et al. (2021) undertook a systematic globalscale comparison between adjoint model predictions and independent geological constraints, with implications for the expected rates of gravitational and dynamic ellipticity changes resulting from convection within Earth's mantle (Ghelichkhan et al., 2018, 2020).
While substantial strides have been made in the application of adjointbased methods in geodynamics, there remain widespread obstacles to the broader use of these techniques within the geodynamic modelling community. One key challenge is the complexity involved with deriving, implementing and validating adjoint models for largescale, nonlinear, timedependent problems, core features of global mantle convection models. Owing to these difficulties, studies that have used the adjoint method to explore the thermochemical evolution of Earth's mantle often resort to a simplified strategy. This typically involves either a highly idealised treatment of mantle rheology (e.g. Colli et al., 2018; Ghelichkhan et al., 2021), omitting certain coupling terms in the adjoint equations, or doing both (e.g. Liu et al., 2008; Spasojevic et al., 2009; Liu and Gurnis, 2010; Shephard et al., 2010). Such simplifications limit the applicability of these results. The work of Li et al. (2017) is a notable exception to this trend, although its focus on a 2D rectangular domain, specifically aimed at reconstructing the dynamics of subduction, inherently limits its applicability to global mantle convection simulations. Furthermore, previous global applications of the adjoint approach have been hampered by their reliance on legacy community codes (e.g. Bunge et al., 2003; Liu et al., 2010; Shephard et al., 2010; Colli et al., 2018; Ghelichkhan et al., 2021). These codes are not easily extensible to different geometries or approximations of the underlying physics, employ solver strategies that have since been superseded, and often have limits on model resolution due to the fully structured discretisations encoded. Moreover, the complex and extensive lowlevel code implementation of coupled adjoint and forward calculations remains obscured, making it difficult to extend and validate, thus restricting the types of observational datasets that can be incorporated. An example of these datasets are the (uncertain) constraints provided by plate tectonic reconstruction models that are prescribed kinematically (e.g. Spasojevic et al., 2009; Shephard et al., 2010; Zhou and Liu, 2017; Colli et al., 2018; Ghelichkhan et al., 2021) as opposed to being formally incorporated through the objective functional. The kinematic prescription of the surface velocities, however, can only improve mantle reconstructions forward in time and therefore prohibits their influence on previous system states. In light of these limitations, there is a need for a general framework that can robustly handle the rheological complexities of Earth's mantle and is easily extensible and transferable to other problems in mantle and lithosphere dynamics. Such a framework must also be capable of utilising a variety of observational constraints to comprehensively unravel the historical evolution of Earth's interior and its diverse impacts at Earth's surface (Davies et al., 2023).
In this paper, we introduce the Geoscientific ADjoint Optimisation PlaTform (GADOPT), research software infrastructure that allows us to overcome these limitations. GADOPT is built around three stateoftheart software libraries – (i) Firedrake, an automated system for solving a range of partial differential equations using the finiteelement method (Rathgeber et al., 2016), recently validated for geodynamics (Davies et al., 2022); (ii) DolfinAdjoint, which automatically derives the discrete adjoint equations in a form compatible with Firedrake (Farrell et al., 2013a; Mitusch et al., 2019); and (iii) the Rapid Optimisation Library (ROL), a Trilinos package for performing highly efficient largescale optimisation (The ROL Project Team, 2022). When combined, they provide a geodynamic inversion framework that is highly efficient, with fully consistent forward and adjoint calculations that achieve theoretical computational efficiency.
We structure our paper as follows: in Sect. 2.1 we describe the geodynamic forward problem and our solution strategy using GADOPT. Section 2.2 describes the inverse problem considered herein, where we focus on finding the (unknown) initial condition using an objective functional that accounts for observations of surface velocity over time and the final state temperature field. This is broken into three subsections: (i) Sect. 2.2.1 describes discrete and continuous approaches for obtaining the adjoint systems followed by a detailed derivation of the discrete adjoint systems; (ii) Sect. 2.2.2 introduces DolfinAdjoint, the underlying approach utilised in GADOPT to compute the discrete adjoint and derivative fields; and (iii) Sect. 2.2.6 provides an overview of the gradientbased optimisation approach utilised here, facilitated by ROL. We demonstrate the applicability of our approach in Sect. 3 using two sets of twin experiments with increasing complexity, where we reconstruct the spatial and temporal evolution of a reference simulation. The first set of experiments (Sect. 3.1) involves a simple isoviscous simulation within an enclosed square domain, while the second set (Sect. 3.2) examines convection with a nonlinear (temperature, depth and strain rate dependent) rheology at Earthlike convective vigour within an annular domain, which has direct applicability to convection within Earth's mantle. We finish by discussing our results and conclusions in Sects. 4 and 5, respectively.
2.1 Forward problem
2.1.1 Governing equations and boundary conditions
Mantle flow is described by conservation laws for mass, momentum and energy. We solve these equations in their simplest form, assuming incompressibility and the Boussinesq approximation. The three nondimensional conservation equations are
with the vector field u and scalar fields p and T as the principal unknowns of velocity, pressure and temperature, respectively. Table 1 summarises other symbols used in Eqs. (1a)–(1c) and elsewhere. In Eq. (1a), the strainrate tensor $\dot{\mathit{\u03f5}}\left(\mathit{u}\right)$ is given by
and the Rayleigh number is defined by
For this problem, we define the time interval of interest as $I=[{t}_{\mathrm{I}},{t}_{\mathrm{F}}]$, with the computational domain V bounded by ∂V, and S and C denoting the top and bottom boundaries, respectively. For all simulations, freeslip and impermeable velocity boundary conditions are specified on all boundaries, whilst temperature boundary conditions are set to constant values of T_{S} and T_{C} at top and bottom boundaries, respectively. For the simulations considered in an enclosed square domain, natural temperature boundary conditions (zero heat flux) are specified on side walls. The set of boundary conditions, for top and bottom boundaries, are listed in Eqs. (4a)–(4d):
In Eqs. (4a)–(4d), n denotes the outer normal vector and s any tangential vector. Finally, as mantle convection is an initial value problem, we require a prescribed temperature field at initial time t_{I}:
2.1.2 Solution strategy: leveraging Firedrake through GADOPT
We use the finiteelement method to solve the coupled system of partial differential equations presented in Eqs. (1a)–(1c). For the Stokes system, we use a Q_{2}−Q_{1} (piecewise biquadratic and bilinear, respectively) finiteelement pair for velocity and pressure, with Q_{2} elements used for temperature. We strongly impose Dirichlet boundary conditions for temperature at top and bottom boundaries. Freeslip velocity boundary conditions are imposed in two ways: (i) in our square domain cases, we impose strong Dirichlet boundary conditions for u; (ii) in our annular domain cases, where boundaries do not align with Cartesian directions, we employ a symmetric Nitsche penalty method (Nitsche, 1971), which weakly enforces boundary conditions via a modification of the variational formulation. An implicit midpoint scheme is used for time integration in the energy equation.
To solve the coupled system of forward (and adjoint) equations, we employ Firedrake, which provides an automated system designed for the solution of partial differential equations using the finiteelement method (Ham et al., 2023). It incorporates principles and some aspects of the codebase from the FEniCS project (Logg et al., 2012), including its use of the Unified Form Language (UFL) (Alnæs et al., 2014) for the representation of variational problems. UFL is a highlevel language utilised for the symbolic description of the governing equations in a form that closely mimics their mathematical formulation. The key advantage of UFL in this work lies in its highlevel abstraction, which allows for an innovative automatic derivation approach when deriving adjoint systems. We will address the significance of UFL in automatic derivation of adjoint systems in Sect. 2.2.2.
Firedrake provides an array of features that are particularly conducive to tackling problems in geophysical fluid dynamics. Key among these features is support for a variety of finiteelement discretisations, including a highly efficient implementation of discretisations based on extruded meshes; programmable nonlinear solves; and operatoraware solver preconditioners that can be combined in a flexible manner to create linear or nonlinear systems, which are solved by PETSc (e.g. Balay et al., 1997; Dalcin et al., 2011; Balay et al., 2023). The suitability of Firedrake for geodynamics has been demonstrated via comparison with a comprehensive set of analytical solutions and community benchmarks in Davies et al. (2022). We refer the reader to this study, alongside Rathgeber et al. (2016), for a more indepth discussion on Firedrake and its dependencies, alongside an outline of the solution strategy employed herein.
Building upon the foundations provided by Firedrake, GADOPT introduces an array of features tailored for cuttingedge geodynamical simulations. It integrates smoothly with Firedrake's advanced dataloading capabilities, enabling finiteelement consistent pointdata interpolation (NixonHill et al., 2023) and facilitating the integration of diverse observational and physical datasets. Various nonlinear rheological laws can be effortlessly incorporated using the symbolic representation provided by UFL. GADOPT provides a selection of time discretisation schemes, including secondorder Runge–Kutta and backward/forward Euler methods. Visualisation capabilities are provided through integration with the “Visualization Toolkit” (VTK), allowing comprehensive analysis in ParaView, among other visualisation packages. Compatibility with meshes generated by gmsh and seamless integration with NETGEN provide straightforward support for a range of mesh configurations and adaptive mesh refinement. GADOPT offers a diverse array of boundary conditions, encompassing free and noslip conditions, a freesurface (Kramer et al., 2012), and kinematic forcing, for example, via seamless integration with pyGPlates (Gurnis et al., 2012; Müller et al., 2018). The collection of features, elaborately detailed in the Firedrake and GADOPT online documentation and referenced in Davies et al. (2022), highlights the platform's robustness and adaptability for geodynamical research.
2.2 Inverse problem: an optimisation approach
Representative reconstructions of the spatial and temporal evolution of mantle flow require knowledge of the initial condition (i.e. the thermochemical state of the mantle at some point in the past), the determination of which is an inverse problem. We therefore seek the bestfitting initial condition (T_{IC}) that results in the minimum of an objective functional that measures the difference between predictions and observations of mantle states and irregularity in the solutions. We use the following mathematical description of the objective functional:
The first term in Eq. (6) accounts for the misfit between model predictions and temperature recorded at the final instance, whilst the second term measures the misfit in surface velocities through time. The other terms are Tikhonov regularisation components (Tikhonov, 1963; Hansen, 1992), the first of which penalises deviations from an a priori depthaveraged profile, $\overline{T}$ (i.e. the damping term), and the second penalises the gradient of these deviations to produce less complex solutions (i.e. the smoothing term). For the damping term, we employ a depthdependent prefactor, G(r), that is zero within the midmantle but transitions to one in the thermal boundary layers. In these areas, lateral material transport and diffusive processes dominate, leading to diminished sensitivity between the choice of initial condition and the final temperature field. G(r), therefore, helps to minimise the amplitude of the solution within the top and bottom thermal boundary layers, removing noise that would otherwise diffuse over the simulation. The three weighting terms, β_{u}, β_{s} and β_{d}, in Eq. (6) are defined as
Note that Δt is the total duration of the simulation. The integrals in Eqs. (7) are employed to ensure normalised objective terms relative to the final temperature misfit (first term on the righthand side of Eq. 6). The three scaling parameters, α_{u}, α_{s} and α_{d}, can be set to adjust the importance of these terms relative to the final temperature misfit. We perform a parameter search to find the bestperforming combinations of α values.
For our inverse problem we are interested in optimising J described by Eq. (6). J depends on some “control” parameters, which in this case is the initial temperature field, and some “state” variables (i.e. surface velocity and the final temperature), which are solutions of the forward problem in Eqs. (1a)–(1c), with the forward system itself depending again on the control. To solve this problem, we define a “reduced” functional, which is a function of the initial condition, T_{IC}, alone. This reduced functional is typically defined by first solving the forward partial differential equations (PDEs) (Eqs. 1a–1c) for a given value of the control and then substituting the solutions into the expression for the functional (Eq. 6). The result is a functional that depends only on the control parameters, not the state variables directly (hence the name reduced).
Nonlinear optimisation methods provide the means to find the optimal T_{IC} by minimising a reduced functional defined by J. These methods are iterative. They begin with an initial guess of T_{IC} to generate a sequence of improved estimates (called iterates) until certain conditions, e.g. residual tolerance, in the objective functional are achieved. Crucial for the efficiency of these iterations are derivatives of Eq. (6) with respect to T_{IC}. Owing to the large number of unknowns in 3D spherical mantle convection models with Earthlike parameters, obtaining the derivative by means of classical finite differencing techniques is impractical. The adjoint method serves as a mathematically elegant and computationally efficient way to obtain the derivatives (e.g. Talagrand, 1997; Giles and Pierce, 2000; Plessix, 2006; Hinze et al., 2008).
2.2.1 Continuous versus discrete adjoints
Approaches to deriving, implementing and obtaining derivatives using the adjoint method primarily fall into two categories: (i) continuous, or the differentiatethendiscretise approach, and (ii) discrete, or the discretisethendifferentiate approach (Gunzburger, 2000).
The continuous approach commences with the derivation of the adjoint equations. The resemblance between the forward and resulting adjoint equations allows the adjoint PDEs to be discretised in a consistent manner and, subsequently, implemented within a numerical framework. By deriving the continuous adjoint PDEs, one can develop an understanding of the key characteristics of adjoint sensitivities, the physical implications of individual terms in the PDEs and their boundary conditions. Moreover, the continuous method affords complete autonomy in the discretisation and implementation of the adjoint system, often leading to simplified but more costeffective approximations of the solutions, as demonstrated by IsmailZadeh et al. (2004).
By contrast, the discrete approach relies on already discretised forward equations and then differentiates and transposes them to obtain the adjoint equations. This method's primary advantage lies in maintaining consistency in spatial and temporal discretisations, allowing for the automatic determination of the exact gradient of the (discrete) objective functional (Giering and Kaminski, 1998; Gunzburger, 2002). Such consistency ensures full convergence of secondorder Newtonian optimisation methods and simplifies the debugging of adjoint programs (Giles and Pierce, 2000). For example, with the discrete approach, even minor inconsistencies in the derivative can highlight numerical or programming errors that must be rectified. It also permits the automatic creation of the adjoint program, stemming from the property that a transposed (adjoint) matrix shares the same eigenvalues with the original linear matrix, ensuring convergence for the adjoint problem's iterative solution methods (Giles and Pierce, 2000). This advantage has facilitated the development of various automatic differentiation (AD) tools in recent decades, including those used in TensorFlow (Abadi et al., 2015), PyTorch (Paszke et al., 2019) and Enzyme (Moses et al., 2022). It is essential to note that the continuous and discrete methods are equivalent in the limit of infinite spatial and temporal resolution. However, in practical terms, it is the discrete method that typically provides more accurate gradient information. For more details on both approaches we refer the reader to Giles and Pierce (2000) and Gunzburger (2002).
2.2.2 DolfinAdjoint
Robust and efficient derivative calculations for largescale simulations using automatic differentiation is challenging and often too slow for the purpose of largescale optimisation problems (Naumann, 2011). This inefficiency is often attributed to the usually employed approach of treating a numerical model as a sequence of elementary instructions such as addition, multiplication or exponentiation, known as blocks. Once the AD tool establishes a sequence of blocks with their dependencies (a process often called taping), each block is individually differentiated, and one arrives at the derivative of the entire model using the chain rule.
DolfinAdjoint uses an innovative approach to achieve theoretical efficiency by using socalled operator overloading differentiation (Tijskens et al., 2002). By leveraging the highlevel mathematical language used by Firedrake and FEniCS (UFL), DolfinAdjoint performs the taping process at the highest abstraction level. This can result in blocks that symbolise whole PDE system solves, for which the adjoint derivation is performed at the same level of abstraction. The derived adjoint operation to a block is itself a Firedrake operation. This facilitates the generation of the lowlevel adjoint code using the same finiteelement form compiler as the forward model; i.e. while the taping operation is, in essence, similar to the fundamental abstraction in automatic differentiation techniques, DolfinAdjoint operates at a higher level of abstraction and, accordingly, can achieve maximum efficiency and robustness. Consequently, platforms such as Firedrake and FEniCS have incorporated DolfinAdjoint for precise derivative computations, which are essential for solving various optimisation and inverse problems. We note the pursuit of similar automatic differentiation techniques, which have been central to the development of numerical tools like PETSc's TSAdjoint (Zhang et al., 2022) and JuliaAdFEM (Xu and Darve, 2022).
To efficiently compute derivatives, DolfinAdjoint tracks and manipulates the computational operations involved in solving the forward and adjoint problem. Central to DolfinAdjoint's design is the Tape
class, responsible for recording a series of highlevel finiteelement operations – such as (non)linear solutions, functions and interpolations – that map the initial temperature field T_{IC} into an objective functional J through a series of operations. Operations executed during the forward problem are encapsulated within instances of Block
subclasses, each representing a distinct computational step. These Block
instances, by keeping track of their inputs and outputs through BlockVariable
instances, establish a computational graph that is used by DolfinAdjoint to navigate to compute adjoints. To seamlessly integrate this functionality, DolfinAdjoint employs operator overloading, allowing the creation of overloaded functions and data types that, to the user, behave identically to their original counterparts but are augmented to support automatic differentiation. This approach enables the overloading of data types through inheritance from both the original data type and the OverloadedType
class in DolfinAdjoint, ensuring that each data object carries a BlockVariable
for tracking its computational lineage. The elegance of DolfinAdjoint's design lies in its ability to abstract the computational details, allowing the scientific end user to focus on designing the forward problem and simultaneously leverage the computational efficiency of automatic differentiation. Mimicking Firedrake's strategy in utilising DolfinAdjoint, the forward operations in GADOPT are overloaded by importing the inverse module of GADOPT (gadopt.inverse
), which will be used to populate a tape and establish a ReducedFunctional
, providing the necessary functionalities for computing a functional and its derivative with respect to controls.
2.2.3 Discrete forward model
As explained in more detail in Davies et al. (2022), the forward geodynamical model can be described as a series of linear and nonlinear solutions. Although we solve for temperature using Q_{2} elements, we choose the control T_{IC} to be in the Q_{1} function space as a means to regularise the inversion problem. This means we need to project T_{IC} to the discrete function space Q=Q_{2} to obtain a temperature T^{0} used in the first time step. This can be formulated as solving the following system for T^{0}:
where q are test functions in Q. Subsequently we solve in each time step $n=\mathrm{0},\mathrm{\dots}N\mathrm{1}$ the following two systems for u^{n},p^{n} and T^{n+1}:
where V and W are the discrete function spaces for velocity and pressure (here $V={\left[{Q}_{\mathrm{2}}\right]}^{\text{dim}},W={Q}_{\mathrm{1}}$) with test functions v and w. T^{n+θ} is the weighted average $\mathit{\theta}{T}^{n+\mathrm{1}}+(\mathrm{1}\mathit{\theta}){T}^{n}$. Note that we assume a strain rate and temperaturedependent rheology and thus write $\mathit{\eta}=\mathit{\eta}(\mathit{u},T)$. This makes Eq. (9) a nonlinear system, which we solve through Newton's method. The discrete functional is calculated as
2.2.4 Calculating gradients using the adjoint method
We denote the entire forward solution as $z=({T}^{\mathrm{0}},\mathrm{\dots}{T}^{N},{\mathit{u}}^{\mathrm{0}},\mathrm{\dots}{\mathit{u}}^{N\mathrm{1}},{p}^{\mathrm{0}},\mathrm{\dots}{p}^{N\mathrm{1}})$ so that the functional can be written as a function J(z,T_{IC}) of z and the control T_{IC}. The forward solution itself is also dependent on T_{IC}, as for each choice of T_{IC} we can solve the forward model to obtain z(T_{IC}). We define the reduced functional $\widehat{J}$ as
Thus we can reformulate the PDEconstrained minimisation problem,
minimise J(z,T_{IC}) under the constraints (8–10),
as an unconstrained minimisation problem for $\widehat{J}\left({T}_{\mathrm{IC}}\right)$. To use efficient gradientbased optimisation algorithms we do, however, need a means of computing its gradient, for which we will employ the adjoint method. In addition to the forward solution z, we also define an adjoint solution $\mathit{\lambda}=({\mathrm{\Psi}}^{\mathrm{0}},\mathrm{\dots}{\mathrm{\Psi}}^{N},{\mathit{\varphi}}^{\mathrm{0}},\mathrm{\dots}{\mathit{\varphi}}^{N\mathrm{1}},{\mathit{\xi}}^{\mathrm{0}}\mathrm{\dots}{\mathit{\xi}}^{N\mathrm{1}})$, where each component is associated with one of the constraints: Ψ^{0}∈Q with Eq. (8), ${\mathit{\varphi}}^{n}\in V,{\mathit{\xi}}^{n}\in W$ with Eq. (9) and ${\mathrm{\Psi}}^{n+\mathrm{1}}\in Q$ with Eq. (10) for $n=\mathrm{0},\mathrm{\dots}N\mathrm{1}$. Using these we define the following sum of the constraints, with each constraint weighted by the corresponding adjoint solution λ:
Since, by definition, for any choice of T_{IC} the associated forward solution z(T_{IC}) satisfies all constraints, we have
and thus, for any choice of λ we have
If we choose λ to be the solution to the following, socalled adjoint equation,
we can work out the gradient of the reduced functional:
2.2.5 Discrete backward model
Although the adjoint Eq. (15) is derived symbolically from the forward model (Eqs. 8–10) and solved for fully automatically by DolfinAdjoint, we here briefly work out the discrete adjoint equations to show that these equations can still be interpreted as the solution process of a backwardsintime PDE, similar to the continuous adjoint approach, but that a specific time discretisation is derived, which is necessary to obtain a gradient that is consistent with the discrete forward model. Split out by component, the adjoint equations read
which can be solved for Ψ^{n},ϕ^{n} and ξ^{n}.
These equations can be solved by going backwards through the time steps $n=N\mathrm{1}\mathrm{\dots}\mathrm{0}$. In each, we first solve Eq. 22 for Ψ^{n+1}. Starting at the last time step $n=N\mathrm{1}$, we get
where we have applied the gradient of F with respect to T^{N} to an arbitrary perturbation δT. This allows us to interpret this equation as a weak form, tested with δT, that we can solve for Ψ^{N}:
Defining an auxiliary ${\mathrm{\Psi}}^{N+\mathrm{1}}=\mathrm{\Delta}t({T}^{N}{T}_{\mathrm{obs}})$ and integrating the advection term by parts, we get
which for θ=1 we may recognise an advection–diffusion equation run backwards in time.
For $n<N\mathrm{1}$, Eq. (22) contains more terms:
as the energy equation in both forward time steps n and n+1 depends on T^{n+1}, as does the Stokes system in time step n+1. Going backwards through the equations, however, we can still solve for Ψ^{n+1}, associated with time step n, as we have already solved for ${\mathit{\varphi}}^{n+\mathrm{1}},{\mathit{\xi}}^{n+\mathrm{1}}$ and Ψ^{n+2} associated with time step n+1. Note that the $\partial J/\partial {T}^{n+\mathrm{1}}$ term vanishes in this case, as J does not explicitly depend on intermediate temperature solutions. Similar to Eq. (25), we may rewrite it to
which we can interpret as a backwardintime θweighted advection diffusion step for Ψ, with source terms associated with sensitivity of the rheology and buoyancy to temperature. Note, however, that here in the advection term we weight both Ψ and u with θ in contrast to the forward time step Eq. (10).
After solving for Ψ^{n+1}, we can solve Eq. (20) together with Eq. (21) for ϕ^{n} and ξ^{n}:
for all velocity and pressure perturbations δu∈V and δp∈W. This leads to the following weakform equation for ϕ^{n} and ξ^{n}:
The lefthand side is similar to the Stokes system in Eq. (9), except for an additional $\partial \mathit{\eta}/\partial \mathit{u}$ term, alongside the fact that this is now just a linear system as the rheology only depends on the forward variables u^{n} and T^{n}. Instead of a buoyancy term, we now have forcing terms associated with sensitivity of temperature advection and the mismatch with observed velocities at the surface.
Finally, after having solved either Eq. (25) ($n=N\mathrm{1}$) or Eq. (27) ($n<N\mathrm{1}$) for Ψ^{n+1} and Eq. (29) for ϕ^{n} and ξ^{n} going backward through the time steps $n=N\mathrm{1}\to n=\mathrm{0}$, we can solve Eq. (19) for Ψ^{0}:
which can be worked out as a projection:
Finally, the gradient of the reduced functional with respect to the control is obtained by
2.2.6 Gradientbased nonlinear optimisation
To find the solution to the inverse problem, the gradient fields from Firedrake and DolfinAdjoint can be redirected to an optimisation package with a Python interface (e.g. scipy.optimize; Virtanen et al., 2020). However, the majority of wellestablished optimisation packages are hardcoded to apply the Euclidean (l^{2}) inner product for optimisationspecific operations. l^{2}, typically referred to as a sequence space, is often used in signal processing or discrete mathematics where functions are treated as sequences, i.e. discrete data, and do not represent continuous functions. The Euclidean inner product is therefore not suitable for finiteelement functionbased optimisation, and, unlike L^{2} or Sobolev spaces, it cannot produce mesh and/or basisfunctionindependent convergence (Schwedes et al., 2017). The Rapid Optimisation Library (ROL; The ROL Project Team, 2022), a Trilinos package for largescale optimisation, resolves this issue by introducing a generic interface for data structures that can be overloaded to perform innerproduct aware operations (e.g. in L^{2} space) and achieve meshindependent convergence results (Schwedes et al., 2016). ROL has been primarily used for the solution of optimal design, optimal control and inverse problems in largescale engineering applications (Iglesias et al., 2018; Kouri et al., 2021a, b, 2023) and has a comprehensive catalogue of gradientbased optimisation algorithms. For the results presented herein, we use the Python interface for ROL in Firedrake, which we have supplemented it with additional checkpointing functionality. This allows us to checkpoint intermediate optimisation states and variables, including those related to step lengths or Hessian estimates, and subsequently restart the optimisation procedure without loss of performance. This, in particular, is relevant on modern highperformance computing facilities with strict wall time limits.
The distinguishing factor between different optimisation algorithms (in ROL or elsewhere) is the strategy used to move from one iterate to the next. Broadly speaking, there are two strategies for moving from the current iterate, x_{k}, to a new one, x_{k+1}: linesearch and trustregion strategies (Nocedal and Wright, 1999). A fundamental distinction between linesearch and trustregion methods lies in the sequence of selecting the direction and distance (Nocedal and Wright, 2006). In linesearch methods, the direction is initially fixed, and an appropriate step length is subsequently determined. Conversely, trustregion methods commence by establishing an initial size for a trusted region (hence the name) and then simultaneously constraining the direction and step to achieve sufficient amount of improvement within this trusted region. The size of this trusted region around the current iterate is determined according to a model, m, that approximates the region around the current iterate, x_{k}, according to a quadratic approximation. At each iteration, the accuracy of this model then is assessed based on its agreement with the actual changes in the function, f. If the new value, f(x_{k}+p_{k}), is greater than the current value of f(x_{k}), m is not a good approximation of the objective functional around x_{k} and the size of the trustedregion measured by the trustregion radius is therefore reduced to improve the applicability of the model. By bounding the calculations to a trusted region where model m is applicable, trustregion methods prohibit overly aggressive steps, which make them suitable for handling negative curvature situations (nonconvexity optimisation problems) more gracefully compared to linesearch methods. Nonetheless, faster convergence can be achieved by expanding the trust region in the case of a predictive model, ensuring robust minimisation of the objective functional.
In this study, we employ the trustregion method of Lin and Moré (1999) implemented in ROL. Lin–More is a truncated Newton method, consisting of repeated application of an iterative algorithm to approximately solve Newton's equations (Dembo and Steihaug, 1983). Lin–More can effectively handle provided bound constraints by ensuring that variables remain within their specified bounds: at each iteration, variables are classified into “active” and “inactive” sets. Variables at their bounds and not allowing descent are considered active and are fixed during the iteration. The remaining variables, which can change without violating the bounds, are inactive. The described properties render the algorithm as a robust and efficient method for solving boundconstrained optimisation problems.
Twin experiments serve as a means to illustrate the feasibility of geophysical inverse methods. In our experimental setup, we generate a synthetic reference simulation that advances forward in time, starting from a userdefined initial condition. We use this reference simulation to emulate a realEarth reconstruction scenario, where the resulting temperature field at the final time, $T(x,t={t}_{\mathrm{F}})$, and the corresponding surface velocities at all times, $u(x={x}_{\mathrm{S}},t)$, are stored for subsequent use as “observations” in reconstructing the initial state of the mantle and its evolution through space and time. These fields are used to mimic fundamental datasets in mantle reconstruction models, drawing parallels to 3D models of mantle temperature inferred from seismic tomography images, as well as surface velocities derived from plate tectonic reconstructions. We examine two sets of 2D twin experiments: (i) simulations of a single upwelling in an enclosed square domain with an isoviscous rheology and (ii) convection within an annular domain, incorporating a nonlinear viscoplastic rheology, as has previously been employed to generate platelike behaviour in mantle convection models (see Coltice et al., 2017, for a review).
3.1 A single upwelling in an enclosed square domain
3.1.1 Forward problem
We start our experiments by reconstructing the evolution of a single upwelling plume within an enclosed square computational domain (freeslip boundary conditions on all boundaries). We assume an isoviscous rheology, incompressible flow under the Boussinesq approximation assumption and a Rayleigh number of Ra=10^{6}. The model is heated from below (T_{C}=1) and cooled from above (T_{S}=0). The initial condition is generated by a Gaussian anomaly of amplitude 0.1, centred at ${x}_{\mathrm{0}}=(\mathrm{0.5},\mathrm{0.2})$, superimposed on top of an average temperature profile generated by two error functions representing the top and bottom thermal boundary layers. Starting from this initial condition, the model is run for 80 time steps of $\mathit{\delta}t=\mathrm{4}\times {\mathrm{10}}^{\mathrm{6}}$ – the time required for the temperature anomaly to form a plume that reaches the domain's top boundary.
Listing 1 shows selected lines of a GADOPT script used to generate this reference synthetic experiment. The first step, illustrated on line 1, is to import the GADOPT module (which provides access to Firedrake and associated functionality). We next need a mesh, for which we use a builtin Firedrake meshing function. The computational domain is a unit square with 150×150 elements, loaded on lines 4–6. The function spaces, within which our solutions are defined, are specified as follows:

A vector function space,
V
, is specified for the velocity field (line 9), employing a Q_{2} discretisation. 
A scalar function space,
W
, is specified for pressure (line 10), utilising a Q_{1} discretisation. 
These are combined on line 11 to create the mixed function space,
Z
, for the Stokes (velocity and pressure) system. 
A function space,
Q
, is specified for the temperature field (line 12), using a Q_{2} discretisation.
We specify functions to hold our solutions on lines 14–18. The temperature field, T
, is initialised on line 20 using a symbolic expression for the coordinates from line 18. The initialisation includes a 1D profile along the y axis (line 21) and a Gaussian anomaly, as specified above (line 22). This problem has a constant pressure nullspace, defined as Z_nullspace
on line 24, which will subsequently be passed to the solver, and PETSc will seek a solution in the space orthogonal to the provided nullspace. A checkpoint file is initiated on lines 26–27 to retain the necessary fields for the subsequent adjoint inversion. Important constants in this problem (Rayleigh Number, Ra, timestepping parameters, delta_t
and max_timesteps
) are defined on lines 28–31, with the Boussinesq approximation specified on line 29 (later passed on to the Stokes and energy systems to determine which terms are to be assembled). Boundary conditions for velocity and temperature are specified on lines 34 and 35, respectively. The latter uses integer mesh markers to tag entities of meshes, with boundaries tagged as follows: tag 1 corresponds to the plane x=0 (left), 2 to x=1 (right), "bottom"
to y=0 and "top"
to y=1.
We now solve the variational problem, with solver objects for the energy, energy_solver
, and Stokes, stokes_solver
, systems created on lines 37 and 38. For the energy system we pass in the solution field T, velocity u, the physical approximation, time step, temporal discretisation approach (i.e. implicit middle point) and boundary conditions. For the Stokes system, we pass in the solution fields z
, temperature, the physical approximation, boundary conditions and the nullspace object. The solution of the two variational problems is undertaken by PETSc. The time loop is defined on lines 42–45, with the Stokes system solved on line 43, the energy equation on line 44, and velocities (for later use as timedependent surface constraints in our adjoint inversions) checkpointed on line 45. Figure 2a to e show temporal snapshots of the reference forward simulation. We note that the final temperature field (Fig. 2e), subsequently utilised in the adjoint inversion, is also checkpointed on line 47.
3.1.2 Inverse problem
Only trivial changes are required to convert the forward problem outlined in Listing 1 into its corresponding adjoint, which are outlined in Listing 2. We first augment our imports with gadopt.inverse
(line 2): this provides access to crucial DolfinAdjoint functionalities harnessed by Firedrake, enabling overloading differentiation and taping of finiteelement operations. On line 4, we activate disc checkpointing of intermediate forward solutions, ensuring that these fields – otherwise retained in memory – are available as inputs for solving the adjoint equations. For this problem, we select the initial temperature as the control, symbolised as Tic
on line 7, using a Q_{1} function space defined on line 6. On line 8, we define the average temperature field for regularisation terms using the same Q_{1} function space. Despite utilising Q_{2} elements for the forward temperature computation, opting for a basis function with a lower polynomial degree for the control is advantageous, as it curtails computational expense during the internal optimisation algorithm's operations by reducing the number of degrees of freedom in the solution. Furthermore, using a lower polynomial degree reduces the complexity and regularisation requirements of the optimisation problem, helping to avoid overfitting of the solution (e.g. Hastie et al., 2009).
On line 10, a checkpoint file is opened. The reference final temperature field is subsequently loaded on line 11, with our guess at the initial temperature condition specified on line 12 (noting that it corresponds to the terminal temperature field of the reference simulation). We specify the control for DolfinAdjoint on line 15, followed by projection of the initial temperature condition onto T
on line 18. During execution of the time loop (lines 20–24), we solve Stokes and energy systems, after which we load velocities from the reference simulation on line 24, used when accumulating contributions to the surface velocity misfit, u_misfit
, on line 24. After the time loop, we define several components of the objective functional. Specifically, we establish the damping term and its associated normalisation factor (lines 29–31), the smoothing term and associated normalisation (lines 32–33), two normalisation terms associated with final state temperature and surface velocity misfits (lines 34–35), and the misfit associated with the final temperature field (line 36). These terms are combined on lines 38–42 as our objective
, using weights (α_{u}, α_{d} and α_{s}) specified on line 37, and are later utilised on line 44 to define the reduced functional. We note that values for α_{u}, α_{d} and α_{s} are systematically tested herein.
3.1.3 Investigating the derivative
Our initial guess for T_{IC} is set to the final “observed” temperature field T_{obs} (i.e. the terminal temperature field of the forward model). This choice is grounded in the findings of Horbach et al. (2014) and our own tests, which demonstrate that the minimisation problem possesses a strong minimum, rendering it insensitive to the initial guess. The first optimisation iteration starts from this initial guess (Fig. 3a) and runs forward in time to arrive at the first modelled terminal temperature field ${T}_{t={t}_{\mathrm{F}}}$ (Fig. 3b). Compared to the reference T_{obs} (Fig. 3c), the model is further advanced in time, with the plume tail in the lower mantle narrower and hot buoyant anomalies spread throughout the upper part of the domain, generating a cold return flow and resulting in a thinning of the thermal boundary layers.
Previously, we noted that for our reconstruction simulations, the objective functional, denoted as J, encompasses four distinct terms: (i) final temperature field, (ii) surface velocity misfit, (iii) smoothing terms and (iv) damping terms. The relevance of the three latter terms is gauged by their respective weighting factors, namely, α_{u}, α_{s} and α_{d}. When combining terms linearly in the objective functional, the superposition principle dictates that the gradient of the entire objective functional is essentially the cumulative sum of the gradients of its constituent terms in J. Consequently, it becomes instructive to visualise and validate the gradient of each individual term within the objective functional.
Figure 3d to g display the gradient fields corresponding to the four terms in the objective functional. The gradient for the final temperature misfit is presented in Fig. 3d: it implies that a better match to the final state temperature field is achievable through changes to the initial condition that include a major reduction in temperature in the domain's centre, complemented by an increase in temperature moving towards the domain's edges, particularly towards the domain's upper regions. In Fig. 3e, we illustrate the gradient of the cumulative surface velocity misfit with respect to T_{IC}: it reveals sensitivities extending to the domain's base, indicating that to better align with the “observed” surface velocities, an optimal T_{IC} should be decreased in the middle but increased towards the domain's boundaries. For the smoothing term (Fig. 3f), the highest values emerge in areas with the most abrupt changes in T_{IC}. Given our optimisation strategy works by moving towards corrections opposite to the gradient's direction, this subsequently results in iterative refinement and smoothing of the solution. For the damping term (Fig. 3g), the gradient aims to minimise fluctuations in the field in relation to the ambient temperature profile ($\overline{T}$), adjacent to the top and bottom boundaries, implying that this can be achieved by reducing boundary layer temperatures in the centre but increasing boundary layer temperatures towards the domain's sides.
3.1.4 Verification of gradients: Taylor remainder convergence test
A fundamental tool used in verification of gradients is the Taylor remainder convergence test (Farrell et al., 2013b). For the reduced functional, J(T_{IC}), defined in Eq. (6) and its derivative, $\frac{\mathrm{d}J}{\mathrm{d}{T}_{\mathrm{IC}}}$, it can be proven that
The expression on the lefthand side of Eq. (33) is termed the secondorder Taylor remainder. This term's convergence rate of O(h^{2}) serves as a strong foundation for verifying any computational implementation meant for determining $\frac{\mathrm{d}J}{\mathrm{d}{T}_{\mathrm{IC}}}$ (the adjoint code) with respect to a specific functional that computes J(T_{IC}) (the forward code). Given any arbitrary selection of h and δT_{IC}, halving the value of h should decrease the magnitude of the secondorder Taylor remainder by a factor of 4. Grounded in this theoretical prediction, we employ these socalled Taylor tests to confirm the accuracy of the determined gradients.
We conduct a secondorder Taylor remainder test for each term of the objective functional. The results are displayed in Fig. 4, where the gradient fields are calculated for random perturbations of the initial temperature field T_{IC}, with the amplitude of these perturbations successively halved ($h/\mathrm{2}$, $h/\mathrm{4}$, $h/\mathrm{8}$, ...). Notably, all four Taylor remainder tests demonstrate a convergence rate of O(2.0), extending down to floatingpoint precision, defined as the smallest positive ϵ such that 1.0+ϵ is distinguishable from 1.0. Our results demonstrate consistency with theoretical expectations, highlighting the robustness of our approach.
3.1.5 Efficiency
Another metric that can be used to assess the suitability of our framework for largescale mantle convection optimisation problems is the efficiency of derivative calculations. Given the iterative nature of our inverse problem, where derivatives are computed frequently, any efficiency gain, or the lack thereof, can have profound implications for the overall computational cost and feasibility of our automated approach. The computational efficiency can be measured by comparing the computational time of a derivative calculation to that of a forward calculation. Using this reference, a theoretical optimum is defined which measures the ratio of the time that is required to calculate one set of forward and adjoint calculations to one forward calculation. For the Stokes problem we have detailed, in which a linear rheology has been employed, this ratio is considered to be 2.0 (Naumann, 2011; Funke and Farrell, 2013). This is primarily due to the similarity of the forward and adjoint systems. For the simulations presented in this section, we achieve a ratio of 2.01, consistent with theoretical expectations, thus demonstrating the efficiency of our approach.
3.1.6 Optimisation
Executing an optimisation task with GADOPT is straightforward. Once the reduced functional is set up (see Listing 2), only a few additional lines of Python are required (see Listing 3). As we use a bounded method for our optimisation problem, we specify a set of upper and lower bounds for the algorithm on lines 3–4. Subsequently, a minimisation problem is outlined (line 6) using both the reduced functional and the designated bounds. This minimisation problem, together with the associated parameters for optimisation, are passed to the Lin–More optimisation algorithm in ROL (line 8), which is executed on line 9.
Using this framework, we perform a suite of 81 different inverse simulations that aim to find the most optimal combination of the three weightings (α_{u}, α_{d} and α_{s}) that results in the best solution for T_{IC} when compared to the reference initial temperature field. The simulations are obtained by sweeping through values in ranges of $[{\mathrm{10}}^{\mathrm{1}},{\mathrm{10}}^{\mathrm{3}}]$, $[{\mathrm{10}}^{\mathrm{2}},{\mathrm{10}}^{\mathrm{4}}]$ and $[{\mathrm{10}}^{\mathrm{1}},{\mathrm{10}}^{\mathrm{7}}]$ for α_{d}, α_{s} and α_{u}, respectively.
Figure 5 provides an overview of the outcomes from a subset (16 out of 81) of these optimisation exercises. Minimisation of the objective functional is shown in Fig. 5a, alongside two additional metrics: (i) the misfit between the reconstructed final temperature field and T_{obs}, termed the final misfit (Fig. 5b), and (ii) the misfit between the reconstructed initial condition, T_{IC}, and the reference initial condition, highlighting the quality of the reconstructed initial condition (Fig. 5c). The reduction in the metrics in all cases is reported versus the cost, which is the sum of the number of forward and adjoint calculations. A consistent pattern is observed across all reconstruction simulations for these three metrics. Firstly, the solutions are unique, as all converge to a consistent initial condition following roughly 100 iterations (a cost of 200 forward and adjoint calculations). However, the trajectory to this solution varies based on the smoothing weight, denoted by α_{s}. A significant portion of the simulations with ${\mathit{\alpha}}_{\mathrm{s}}={\mathrm{10}}^{\mathrm{1}}$ exhibit subpar performance in the initial stages due to oversmoothing, with most of the bestperforming simulations utilising ${\mathit{\alpha}}_{\mathrm{s}}={\mathrm{10}}^{\mathrm{2}}$. Nevertheless, despite differences in convergence rates, all simulations eventually converge to similar misfits in all three metrics.
Figure 6 showcases multiple iterations from the bestperforming simulation using ${\mathit{\alpha}}_{\mathrm{u}}={\mathrm{10}}^{\mathrm{2}}$, ${\mathit{\alpha}}_{\mathrm{s}}={\mathrm{10}}^{\mathrm{3}}$ and ${\mathit{\alpha}}_{\mathrm{d}}={\mathrm{10}}^{\mathrm{2}}$. As mentioned previously, the reference final condition (Fig. 6b) is used as our starting guess for the inverse simulation (Fig. 6c), which yields substantial differences in the modelled final state (Fig. 6k), as illustrated through the squared difference between reconstructed and reference temperature (Fig. 6o). Leveraging derivative information, the inverse simulation corrects the initial condition through an iterative approach. The initial conditions obtained after 20, 50 and 100 iterations are shown in Fig. 6d, e and f, respectively. These solutions reveal that most corrections occur during the initial iterations, with significant improvement in the domain's upper part by iteration 20, albeit with some remaining noise in the solution (see misfit in Fig. 6h). By iteration 100, the solution (visible in Fig. 6f and n) closely mirrors the reference initial condition, represented by misfit values that have diminished by 3 orders of magnitude (Fig. 5b and c).
Figure 7 compares the evolution of reference forward (a–e) and reconstructed simulations (f–j), with differences highlighted (k–o). Furthermore, surface normal stresses are displayed as indicators of surface dynamic topography. For the reconstructed simulation (Fig. 7f), the initial condition is well captured as a Gaussian anomaly in the domain's lower section, superimposed on a depthdependent field, similar to the reference initial condition. This initial temperature anomaly ascends, culminating at the surface after 80 time steps. The precision of the reconstruction is evident from the misfit panels, with negligible differences between reference and reconstructed simulations throughout the model's evolution (less than 0.01). This is further substantiated by a reduction, by over 4 orders of magnitude, in the objective functional for both initial and end states (Fig. 5b and c). The accuracy of reconstructed surface normal stresses is evidenced by negligible discrepancies in the associated misfit visualisation.
3.2 Convection in an annulus, incorporating a viscoplastic rheology
We next analyse a set of reconstructions which utilise a reference twin that more closely mimics Earth's geometrical and rheological characteristics. We use a 2D annular domain, generated by extruding 128 times radially from a circular manifold consisting of 512 cells. Inner and outer radii are set to 1.22 and 2.22, respectively, ensuring unit depth and maintaining a comparable ratio between surface and cosmic microwave background (CMB) radii as Earth's mantle. We set Ra=10^{7} and adopt a composite viscoplastic rheology, with effective viscosity determined via a harmonic mean, represented as
Parameters specified in Eq. (34) are listed in Table 2. As demonstrated by Davies et al. (2022), the changes necessary to transform our forward model from a square to an annular domain and from an isoviscous to a viscoplastic rheology are only minor, noting that Firedrake has already been validated for simulations of this nature (see Davies et al., 2022, and repository accompanying this paper for a comprehensive script).
To determine the reference simulation's initial condition, we commence from a starting state with smallscale spherical harmonic perturbations and advance forward in time until a quasisteady state is achieved (i.e. when basal and surface heat fluxes are roughly in balance). The resulting state (Fig. 8.1.A and 2.A), which contains downwellings that descend from the upper thermal boundary layer and upwelling plumes that rise from the lower thermal boundary layer, forms our reference initial condition. We subsequently advance forward for a duration of $t=\mathrm{200}\times (\mathrm{5}\times {\mathrm{10}}^{\mathrm{6}})$ to reach the reference final state shown in Fig. 8.1.G. While the nondimensional time of 10^{−3} translates into 256 Myr of convection, the domain velocities suggest an equivalent Earthlike time of ≈ 100 Myr (due to a marginally reduced Ra relative to estimates for Earth's mantle). The sequence presented in Fig. 8.1.A–G and 2.A–G trace the temporal development of reference temperature and viscosity fields, respectively. The viscosity field spans approximately 3 orders of magnitude, extending from low asthenospheric values of ≈1.0, rising to 140 within the cooler segments of the lower mantle, and decreasing to 0.4 in locations of high strain rates at surface convergent boundaries.
3.2.1 Verification of gradients
As with the previous Cartesian case examined, reconstruction simulations are conducted with an objective functional encompassing the misfit component related to the terminal temperature field, accumulative surface velocity misfits and regularisation terms. For consistency with the previous case, we first confirm the accuracy of the calculated gradient fields corresponding to each component in the objective functional by performing secondorder Taylor remainder convergence tests. To further analyse the robustness of our results against solver tolerances, specifically concerning the final temperature field and surface velocities, we conduct a set of two additional convergence tests. The first set utilises a Newton (SNES) relative solver tolerance of 10^{−10} and is depicted using solid colours in Fig. 9, while the second set, using a relative tolerance of 10^{−5}, is represented by semitransparent colours. Our findings indicate that with tighter tolerances, results adhere to an O^{−2} trend consistent with theoretical predictions, confirming accuracy down to the smallest floatingpoint representation. Conversely, when the tolerance is relaxed to 10^{−5}, this behaviour holds only up to perturbations of similar magnitude ($h\approx {\mathrm{10}}^{\mathrm{5}}$), demonstrating a divergence from the expected O(h^{2}) trend for smaller perturbations.
3.2.2 Efficiency
In the experiments detailed in this section, an efficiency of 1.45 is achieved. This exceeds the previously outlined theoretical efficiency of 2.0 for the isoviscous Stokes model in Sect. 3.1.5 due to a major difference in the forward and adjoint momentum equations: while the forward momentum equation employs a nonlinear viscoplastic rheology and requires multiple linear Newton solves per time step, the adjoint momentum remains linear. The forward model is therefore more computationally demanding, explaining the improved efficiency ratio.
3.2.3 Optimisation
As highlighted with our previous example, absolute and relative variations in weighting of different objective functional components can generate solutions with distinct properties, some of which provide an improved match to the reference simulation. Accordingly, it is vital to assess the consequences of these distinct weight combinations for the case considered here. To address this, we have undertaken 21 simulations, adjusting the parameters α_{u}, α_{s} and α_{d} within the intervals [0.05,0.1], [0.01,0.1] and [0.01,0.1], respectively, with values motivated by the results of our previous set of simulations. The collective convergence of all 21 simulations is illustrated in Fig. 10.
Our objective functional (Fig. 10a) has initial values that range between $\sim \mathrm{1}\times {\mathrm{10}}^{\mathrm{1}}$ and $\mathrm{2}\times {\mathrm{10}}^{\mathrm{1}}$. We consistently observe a reduction of an order of magnitude or more in this measure. Notably, the steepest decline is seen within the initial 50 iterations (cost ∼100). The simulations denoted as X and XI exhibit the largest reduction, with a consistent reduction trajectory even when approaching iteration 200 (cost ∼400).
The final temperature misfit (Fig. 10b) exhibits different trends to the objective functional. In the initial iterations, the largest reductions in final misfit are observed for simulations I and II, with simulation XII trailing behind. Despite displaying a lower misfit reduction up until iterations 140 and 180 (cost ∼280–360), simulations X and XI eventually display a similar misfit by iteration 200 (cost ∼400). When we turn to reduction in the initial misfit (Fig. 10c), an entirely different trend comes to light: simulations I and II sustain their reduction until around iteration 150 (cost ∼300), after which they plateau. Conversely, many other simulations plateau at misfit values that are, on average, twice as high.
In our analysis, the reconstruction quality is predominantly governed by three key weighting parameters: surface velocity misfit (α_{u}), smoothing (α_{s}) and damping (α_{d}). These parameters calibrate the significance of different objective terms in relation to the final temperature misfit term. Thus, the primary metric to assess a reconstruction is the simultaneous reduction in misfits for both initial and final states. Incorporating the misfit associated with surface velocities enhances the quality of the reconstructions, noting that higher weightings of the surface velocities require higher values of smoothing. Among the weighting parameters, α_{s} is shown to have the highest effect in convergence outcomes: higher values for α_{s} lead to overregularisation, thereby limiting the role of sensitivity information tied to misfit terms in the solution. α_{d} offers a more varied range of effective values, which aligns with its role confined to thermal boundary layers and, consequently, its lesser impact on the overall numerical domain. Therefore, Case I, characterised by $({\mathit{\alpha}}_{\mathrm{u}},{\mathit{\alpha}}_{\mathrm{s}},{\mathit{\alpha}}_{\mathrm{d}})=(\mathrm{1}\times {\mathrm{10}}^{\mathrm{1}},\mathrm{1}\times {\mathrm{10}}^{\mathrm{1}},\mathrm{1}\times {\mathrm{10}}^{\mathrm{2}})$, emerges as the optimal set of weighting parameters for our reconstructions.
In contrast to the reduction of 3 orders of magnitude in the misfit functions from the isoviscous experiment, our nonlinear experiment exhibits a modest reduction of O(1). The key factor contributing to this is the extended total simulation time for the nonlinear experiment. Representing a far longer reconstruction period, this implies more information loss during the inversion process. Additionally, while the isoviscous experiment focused on a single temperature anomaly, this simulation tackles wholemantle convection, with numerous and occasionally complex and highly timedependent anomalies, reflecting the more intricate viscoplastic rheology. Nonetheless, this reduction of an order of magnitude in misfit translates into a satisfactory reconstruction of the initial condition, demonstrating the efficacy and robustness of the numerical approaches employed.
This is confirmed by visual inspection of the best reconstruction model, with temperature, viscosity and surface normal stresses presented and compared to the reference case in Fig. 11 (marked case I in Fig. 10, using values of ${\mathit{\alpha}}_{\mathrm{u}}={\mathrm{10}}^{\mathrm{1}}$, ${\mathit{\alpha}}_{\mathrm{s}}={\mathrm{10}}^{\mathrm{1}}$ and ${\mathit{\alpha}}_{\mathrm{d}}={\mathrm{10}}^{\mathrm{2}}$). At t=0, the reconstructed temperature field exhibits upwelling and downwelling features that are reconstructed in the correct locations, although temperature anomalies are generally smoother than those of the reference case (Fig. 11.1.A). The corresponding viscosity field mirrors this smoothness despite capturing weaker convergence zones at the surface. Despite these variations, the spatial misfit is generally below 10^{−2} (Fig. 11.5.A) with errors over 0.05 restricted to sharper features that are inevitably smoothed in the reconstruction process. This smoothness is also reflected in recovered surface normal stresses, with highs and lows correctly positioned, albeit at longer wavelengths than the reference case.
Given the application of a freeslip boundary condition at the surface of this simulation, a prominent outcome of this set of experiments is the emergence of sharp subducting slabs and weak zones at the top boundary as the simulation evolves (Fig. 11.3.B and 4.B). The marked decrease in misfit over time (Fig. 11.5.B) confirms the development of more detailed convective patterns. As the simulation evolves, reconstructed plume features become more precise, and reconstructed surface normal stresses more closely resemble the reference case. This enhancement progresses up to the final time step, where the reconstructed thermal field and surface normal stresses are indistinguishable from those in the reference simulation, reflected via an order of magnitude reduction in the spatial misfit field.
Robust reconstructions of the spatial and temporal evolution of Earth's mantle and its diverse surface expressions are critical to scientific progress across the geosciences. It requires the construction of a digital twin: a vital instrument for analysing and revealing the complex interplay between the mantle and Earth's other systems. To this end, the adjoint method provides the necessary means for obtaining and analysing model sensitivities with respect to earlier mantle states. A burgeoning number of studies exploiting this methodology for reconstructions of mantle convection have emerged in recent years (e.g. Bunge et al., 2003; IsmailZadeh et al., 2004; Liu et al., 2010; Spasojevic et al., 2009; Li et al., 2017; Price and Davies, 2018; Ghelichkhan et al., 2021). Nevertheless, the derivation, implementation and validation of adjoint systems for coupled, nonlinear, timedependent systems remain notoriously difficult. It is due to these difficulties that existing applications of the geodynamic adjoint method often include major simplifications, either incorporating an oversimplified treatment of mantle rheology (e.g. Colli et al., 2018; Ghelichkhan et al., 2021), neglecting certain (coupling) terms in the adjoint equations (e.g. IsmailZadeh et al., 2004) or both (e.g. Liu and Gurnis, 2010): they are therefore likely limited in their applicability to realistic Earth scenarios. In this study, we leverage the latest advances in scientific computing to overcome these limitations and develop GADOPT, an opensource numerical framework for geoscientific adjoint reconstructions, developed in full compliance with FAIR (Findable, Accessible, Interoperable, Reusable) principles (Wilkinson et al., 2016).
GADOPT is underpinned by three primary software elements. The first is Firedrake (Ham et al., 2023), an automated system for solving partial differential equations using the finiteelement method. In our previous work (Davies et al., 2022) we examined the applicability of Firedrake to geodynamical simulations, confirming its accuracy, efficiency, extensibility and parallel scalability through comprehensive benchmarks and stateoftheart mantle convection simulations. The second element is DolfinAdjoint (Farrell et al., 2013a; Mitusch et al., 2019), a system that automatically generates the discrete adjoint from forward models designed in Firedrake. DolfinAdjoint elevates the conventional abstraction of automatic differentiation from individual floatingpoint operations to complete systems of differential equations, leveraging the highlevel mathematical abstraction of finiteelement problems and their symbolic representation in UFL (Alnæs et al., 2014). The adjoint systems derived by DolfinAdjoint are UFL expressions and valid Firedrake inputs. Therefore, they inherit the parallel support native to the forward model, which results in optimal computational efficiency. The third element is the Rapid Optimisation Library, ROL, a Trilinos package for largescale optimisation problems (The ROL Project Team, 2022), enhanced herein with intraoptimisation checkpointing functionality.
We have demonstrated the applicability of GADOPT for timedependent geodynamic reconstructions herein. The objective functional utilised in our reconstructions is composed of two distinct misfit components. The first is a term that quantifies the misfit corresponding to the observed final state temperature field, analogous to the presentday temperature field within Earth's mantle as obtained through a combination of mantle mineralogical models (e.g. Stixrude and LithgowBertelloni, 2011; Chust et al., 2017) and seismic imaging (e.g. Rawlinson et al., 2010; French and Romanowicz, 2014; Simmons et al., 2015; Bozdağ et al., 2016; Koelemeijer et al., 2016; Fichtner et al., 2018). The second term corresponds to observed surface velocities, accessible through plate tectonic reconstruction models (e.g. Müller et al., 2019). Additionally, smoothing and damping terms have been incorporated to enforce regularity in our solutions.
Our study analysed two sets of reconstructions of systematically increasing complexity. We first examined the evolution of a single ascending hot anomaly in an enclosed isoviscous square domain. By taking advantage of the simplicity of the geometry and rheological properties, we were able to deliver an indepth examination of the gradients for each term, including a parameterspace search to ascertain optimal weighting parameters. Our results reveal a general convergence of the solutions, notwithstanding substantial variations in convergence rates subject to the weightings. Additionally, although not detailed in this paper, we have explored a number of different optimisation methods and parameters. Through this comprehensive analysis, we are confident that the problem possesses a stable solution that can be found through an appropriate combination of weighting parameters. The second set of reconstruction experiments explored convection with a stress, depth and temperaturedependent rheology at the convective vigour of Earth's mantle, demonstrating the feasibility of reconstruction studies for Earth's mantle with a nonlinear rheology. The weightings selected for this series of experiments were broadly consistent with the first set. Given the success at reproducing surface velocities and normal stresses, our findings suggest that reconstruction models of Earth's mantle can serve as a powerful means for probing changes in the landscape at Earth's surface induced by mantle dynamics (e.g. Friedrich et al., 2018; Hoggard et al., 2021; Davies et al., 2023).
In both experimental sets, we assessed the numerical efficiency of our framework by evaluating the cost ratio between forward and adjoint calculations. In the first set, our results produced a ratio of 2.01, aligning with the theoretical efficiency of 2.0 (e.g. Naumann, 2011). In the second set, where we solved the nonlinear forward equations, we observed a ratio of 1.45. This efficiency is attributed to the linearised nature of the adjoint method: even when applying nonlinear rheologies in the forward equations, the adjoint equations remain linear. We also conducted secondorder Taylor remainder convergence tests for each of the objective functional terms to validate the adjoint calculations. We note our convergence is accurate down to floatingpoint precision, consistent with results presented by Coltice et al. (2023). Our assessments demonstrate the accuracy of the derivative calculations (Figs. 4 and 9). These Taylor remainder convergence tests provide a robust basis for future validations of geodynamic adjoint frameworks.
Our experiments incorporate two significant simplifications relative to realisticEarth scenarios, which were necessary to facilitate the number of reconstruction simulations analysed: (i) the use of a 2D computational domain and (ii) the application of the Boussinesq approximation instead of more pertinent approximations such as anelasticliquid approximations (e.g. Jarvis and McKenzie, 1980). Nevertheless, the composable nature of GADOPT should alleviate any concerns regarding the extensibility of our framework to these more realistic problem sets. Our prior work in Davies et al. (2022) demonstrates the flexible nature of our approach: for example, transitioning our 2D annulus simulations to a 3D spherical shell domain can be achieved via changes to only a few lines of Python. The application of GADOPT for reconstructing Earth's mantle evolution using nonlinear rheologies and compressibility will be the subject of future investigations, although we note that the forward modelling approach has already been developed (Davies et al., 2022). Moreover, our framework is extendable to various other problems in geodynamics. These include utilising principal stress directions (e.g. Reuber et al., 2020), surface plate velocities (e.g. Ratnaswamy et al., 2015; Bocher et al., 2018) and/or residual depth measurements (e.g. Panasyuk and Hager, 2000; Spasojevic et al., 2009) to explore the mantle's rheological properties and to study the viscoelastic adjustment of Earth's surface in response to the melting of Earth's polar ice sheets (AlAttar and Tromp, 2014; Martinec et al., 2015, e.g.) and postseismic deformation following significant subduction earthquakes (e.g. Sabadini and Vermeersen, 1997).
Reconstructing past mantle states is fraught with substantial theoretical and practical challenges. In this study, we targeted some of these theoretical and practical hurdles by introducing GADOPT. Nevertheless, significant obstacles exist that are beyond the scope of this work. We predicated our work on zero uncertainty in our reference fields (i.e. the presentday temperature field and past surface velocities), thereby committing what is known as the “inverse crime” (Colton et al., 1998), a term used to describe the situation when the code employed in the inversions is also utilised to generate reference simulations. Estimation of the presentday mantle state from seismic imaging and the assumptions regarding the thermal and compositional interpretation of seismic heterogeneity are both fraught with considerable uncertainty (e.g. Styles et al., 2011; Mosca et al., 2012; Zaroli et al., 2013; Davies et al., 2015). Furthermore, plate tectonic reconstructions can be uncertain, particularly further back in time and within the Pacific region (e.g. Shephard et al., 2012; Williams et al., 2015; Tetley et al., 2019). However, the existence of seafloor spreading isochrons up to approximately 125 Ma for all major plates provides confidence in modelling relative plate movements in more recent geological periods (Seton et al., 2020). An uncertainty impact study carried out by Colli et al. (2020) posits that the presence of uncertainties causes reconstructed and reference flow histories to diverge exponentially back in time, with unrealistic structures materialising within and adjacent to thermal boundary layers. To minimise these impacts, Colli et al. (2020) advocate for terminating the optimisation after a few iterations. Here, however, the inclusion of regularisation terms in the objective functional mitigates these impacts, effectively constraining the reconstruction to a smoother solution. This becomes particularly advantageous in realEarth applications where observational constraints become sparser and more uncertain back in time. Without a smoothing term, the solution to the initial condition can contain highfrequency noise, which would diffuse over the course of the simulation. Smoothing therefore drives the solution towards a longerwavelength initial state whilst maintaining sensitivity to shorter wavelength information recorded in seismic tomography images.
Moreover, by formally introducing past surface velocities into the objective functional, we infuse sensitivity information that propagates further back in time, refining the flow trajectory to improve the accuracy of reconstructions in the upper thermal boundary layer region. This sets our approach apart from the method used in previous adjoint reconstruction simulations (e.g. Vynnytska and Bunge, 2015; Zhou and Liu, 2017; Ghelichkhan et al., 2021), where the sequentialintime nature of plate velocity assimilation can improve flow trajectories only forward in time. Despite this, it is crucial to recognise that observed surface velocities are the result of a complex force balance, including contributions from within Earth's mantle and at Earth's surface, including those at plate boundaries, such as the genesis and destruction of orogenies, which have been associated with rapid changes in plate velocities (Colli et al., 2014; Iaffaldano and Bunge, 2015). These forces are not captured in current stateoftheart spherical mantle convection models. Although our inversions utilise absolute plate velocities – thereby accounting for the majority of the forces driving surface movements – it is essential to emphasise that constraints on Earth's presentday thermal field must be prioritised as the primary data source for such inversions.
Our study lays the foundations for exploring several unresolved geodynamical questions. Previous research has shown that prescribing plate tectonic reconstruction velocities as a top boundary condition improves the precision of mantle reconstruction models and diminishes noise (e.g. Colli et al., 2015; Taiwo et al., 2023). Our framework formally incorporates these constraints through misfit terms, and future studies should compare this with other methods to find the most efficient way to integrate these valuable data. Earlier research advocates solving the reduced adjoint system, effectively considering velocities as insensitive to initial conditions (e.g. IsmailZadeh et al., 2004; Liu et al., 2008). The secondorder Taylor remainder convergence tests, examined herein, provides a robust foundation for evaluating the accuracy of such simplifications. Furthermore, our framework sets the stage for including hitherto unused observations within our inversions, such as geochemical constraints on mantle temperature and pressure (Ball et al., 2021). This stems from the design principle of composable abstractions in the software packages used in GADOPT, ensuring all components' modularity, interoperability, reusability, scalability and maintainability. Specifically, Firedrake emphasises a clear separation between using the finiteelement method and implementing it. DolfinAdjoint automates the derivation and computation of the adjoint systems using highlevel symbolic language, ensuring the same advanced strategies that are applied for the forward calculation are utilised in the adjoints. Finally, ROL offers largescale optimisation algorithms that seamlessly integrate with Firedrake and DolfinAdjoint. Their integration through GADOPT is a groundbreaking development that opens up adjoint problems to a new class of user and developer.
Transitioning from idealised forward models to datadriven simulations necessitates an inverse approach that rigorously incorporates observational constraints and their uncertainties. The adjoint method has emerged as a key technique for optimising unknown model parameters against observational data (e.g. Bunge et al., 2003). In this context, we introduce the Geoscientific ADjoint Optimisation PlaTform (GADOPT), which capitalises on cuttingedge developments in computational sciences, particularly through innovations in Firedrake (e.g. Rathgeber et al., 2016; Davies et al., 2022), DolfinAdjoint (Farrell et al., 2013a; Mitusch et al., 2019) and the Rapid Optimisation Library (ROL) (The ROL Project Team, 2022).
Our investigation, validated by two distinct sets of twin experiments, demonstrates GADOPT's efficacy in deducing the initial conditions for mantle flow and tracing its progression through space and time. These synthetic experiments leverage misfit terms to integrate both contemporary constraints across the computational domain and temporal constraints at Earth's surface. Additionally, we explored regularisation techniques to modulate the amplitude and complexity within the optimal solution. Notwithstanding the simplifications made (e.g. neglecting compressibility and 3D geometry), GADOPT's architecture allows for straightforward adaptation to more complex scenarios with minimal modifications to the Firedrake forward model (as evidenced by Davies et al., 2022), alongside automated adjoint derivation and computation via DolfinAdjoint. The employment of the secondorder Taylor remainder convergence test further corroborates our methodology, establishing a precedent for the advancement of geodynamic adjoint frameworks and facilitating prompt validation of adjoint models for more complex analyses.
Historically, geoscientific modelling frameworks were tailored to specific equations, which limited their application across disciplines. The current progress in adjointbased techniques, crucial for data assimilation, sensitivity analysis and optimisation, has markedly benefited meteorology and oceanography, though similar progress in other fields has been hindered by derivation and implementation challenges. GADOPT's modular design seeks to bridge these divides. Its components are designed for ease of assembly and reusability, promoting a culture of modular, interoperable, scalable and maintainable methods and frameworks. This philosophy ensures that GADOPT can be readily adapted to a broad spectrum of geoscientific research areas.
For the specific components of GADOPT, including the full scripts of the simulations used in this paper, see https://doi.org/10.5281/zenodo.10050733 (Gibson et al., 2023). For the specific components of the Firedrake project used in this paper, see https://doi.org/10.5281/zenodo.10047031 (firedrakezenodo, 2023).
SG and DRD conceived this study, with all the authors having significant input on the design, development and validation of the examples and cases presented. All authors contributed towards writing the manuscript.
At least one of the (co)authors is a member of the editorial board of Geoscientific Model Development. The peerreview process was guided by an independent editor, and the authors also have no other competing interests to declare.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.
Numerical simulations were undertaken on the Gadi supercomputer at the National Computational Infrastructure (NCI) in Canberra, Australia, which is supported by the Australian Commonwealth Government. The authors are grateful to the entire Firedrake, DolfinAdjoint and ROL development teams for support and advice at various points of this research. We are also grateful to Georg Reuber, Nicolas Coltice and the anonymous reviewer: their careful and constructive feedback has significantly improved the quality of this contribution.
This research has been supported by the Australian Research Data Commons (ARDC), AuScope, Geosciences Australia and the National Computational Infrastructure (NCI) under the GADOPT platform (grant no. PL031). It was also supported by the Australian Research Council (grant nos. DP170100058 and DP220100173) and the Engineering and Physical Sciences Research Council (grant no. EP/R029423/1).
This paper was edited by Boris Kaus and reviewed by Georg Reuber, Nicolas Coltice, and one anonymous referee.
Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., and Zheng, X.: TensorFlow: LargeScale Machine Learning on Heterogeneous Systems, https://www.tensorflow.org/ (last access: 30 June 2024), 2015. a
AlAttar, D. and Tromp, J.: Sensitivity kernels for viscoelastic loading based on adjoint methods, Geophys. J. Int., 196, 34–77, 2014. a
Alnæs, M. S., Logg, A., Ølgaard, K. B., Rognes, M. E., and Wells, G. N.: Unified form language: A domainspecific language for weak formulations of partial differential equations, ACM T. Math. Software, 40, 1–37, https://doi.org/10.1145/2566630, 2014. a, b
Balay, S., Gropp, W. D., McInnes, L. C., and Smith, B. F.: Efficient Management of Parallelism in Object Oriented Numerical Software Libraries, in: Modern Software Tools in Scientific Computing, edited by: Arge, E., Bruaset, A. M., and Langtangen, H. P., 163–202, Birkhäuser Press, 1997. a
Balay, S., Abhyankar, S., Adams, M. F., Benson, S., Brown, J., Brune, P., Buschelman, K., Constantinescu, E. M., Dalcin, L., Dener, A., Eijkhout, V., Faibussowitsch, J., Gropp, W. D., Hapla, V., Isaac, T., Jolivet, P., Karpeev, D., Kaushik, D., Knepley, M. G., Kong, F., Kruger, S., May, D. A., McInnes, L. C., Mills, R. T., Mitchell, L., Munson, T., Roman, J. E., Rupp, K., Sanan, P., Sarich, J., Smith, B. F., Zampini, S., Zhang, H., Zhang, H., and Zhang, J.: PETSc Web page, https://petsc.org/ (last access: 30 June 2024), 2023. a
Ball, P., Czarnota, K., White, N., Klöcking, M., and Davies, D. R.: Thermal structure of eastern Australia's upper mantle and its relationship to Cenozoic volcanic activity and dynamic topography, Geochem. Geophys. Geosyst., 22, e2021GC009717, https://doi.org/10.1029/2021gc009717, 2021. a
Bauer, S., Huber, M., Ghelichkhan, S., Mohr, M., Rüde, U., and Wohlmuth, B.: Largescale simulation of mantle convection based on a new matrixfree approach, J. Comput. Sci., 31, 60–76, 2019. a
Bauer, S., Bunge, H.P., Drzisga, D., Ghelichkhan, S., Huber, M., Kohl, N., Mohr, M., Rüde, U., Thönnes, D., and Wohlmuth, B. I.: TerraNeo–Mantle Convection Beyond a Trillion Degrees of Freedom, in: Software for Exascale Computing – SPPEXA 2016–2019, Lecture Notes in Computational Science and Engineering, edited by: Bungartz, H J., Reiz, S., Uekermann, B., Neumann, P., and Nagel, W., vol 136, Springer, Cham. https://doi.org/10.1007/9783030479565_19, 2020. a
Bocher, M., Coltice, N., Fournier, A., and Tackley, P. J.: A sequential data assimilation approach for the joint reconstruction of mantle convection and surface tectonics, Geophys. J. Int., 204, 200–214, https://doi.org/10.1093/gji/ggv427, 2016. a
Bocher, M., Fournier, A., and Coltice, N.: Ensemble Kalman filter for the reconstruction of the Earth's mantle circulation, Nonlin. Processes Geophys., 25, 99–123, https://doi.org/10.5194/npg25992018, 2018. a, b
Bower, D. J., Gurnis, M., and Seton, M.: Lower mantle structure from paleogeographically constrained dynamic Earth models, Geochem. Geophys. Geosyst., 14, 44–63, https://doi.org/10.1029/2012GC004267, 2013. a
Bozdağ, E., Peter, D., Lefebvre, M., Komatitsch, D., Tromp, J., Hill, J., Podhorszki, N., and Pugmire, D.: Global adjoint tomography: firstgeneration model, Geophys. J. Int., 207, 1739–1766, https://doi.org/10.1093/gji/ggw356, 2016. a
Bunge, H.P. and Grand, S. P.: Mesozoic platemotion history below the northeast Pacific Ocean from seismic images of the subducted Farallon slab, Nature, 405, 337–340, 2000. a
Bunge, H.P., Richards, M., and Baumgardner, J.: Mantle–circulation models with sequential data assimilation: inferring present–day mantle structure from plate–motion histories, Philos. T. Roy. Soc. Lond. A, 360, 2545–2567, https://doi.org/10.1098/rsta.2002.1080, 2002. a
Bunge, H.P., Hagelberg, C. R., and Travis, B. J.: Mantle circulation models with variational data assimilation: inferring past mantle flow and structure from plate motion histories and seismic tomography, Geophys. J. Int., 152, 280–301, https://doi.org/10.1046/j.1365246X.2003.01823.x, 2003. a, b, c, d, e, f
Carrassi, A., Bocquet, M., Bertino, L., and Evensen, G.: Data assimilation in the geosciences: An overview of methods, issues, and perspectives, WIREs Clim. Change, 9, e535, https://doi.org/10.1002/wcc.535, 2018. a
Chust, T. C., SteinleNeumann, G., Dolejš, D., Schuberth, B. S. A., and Bunge, H.P.: MMAEoS: A Computational Framework for Mineralogical Thermodynamics, J. Geophys. Res.Sol. Ea., 122, 9881–9920, https://doi.org/10.1002/2017JB014501, 2017. a
Cloetingh, S. and Haq, B. U.: Inherited landscapes and sea level change, Science, 347, 1258375, https://doi.org/10.1126/science.1258375, 2015. a
Colli, L., Stotz, I., Bunge, H.P., Smethurst, M., Clark, S. R., Iaffaldano, G., Tassara, A., Guillocheau, F., and Bianchi, M. C.: Rapid South Atlantic spreading changes and coeval vertical motion in surrounding continents: Evidence for temporal changes of pressuredriven upper mantle flow, Tectonics, 33, 1304–1321, https://doi.org/10.1002/2014TC003612, 2014. a
Colli, L., Bunge, H.P., and Schuberth, B. S. A.: On retrodictions of global mantle flow with assimilated surface velocities, Geophys. Res. Lett., 42, 8341–8348, https://doi.org/10.1002/2015GL066001, 2015. a
Colli, L., Ghelichkhan, S., Bunge, H.P., and Oeser, J.: Retrodictions of Mid Paleogene mantle flow and dynamic topography in the Atlantic region from compressible high resolution adjoint mantle convection models: Sensitivity to deep mantle viscosity and tomographic input model, Gondwana Res., 53, 252–272, https://doi.org/10.1016/j.gr.2017.04.027, 2018. a, b, c, d, e
Colli, L., Bunge, H., and Oeser, J.: Impact of model inconsistencies on reconstructions of past mantle flow obtained using the adjoint method, Geophys. J. Int., 221, 617–639, https://doi.org/10.1093/gji/ggaa023, 2020. a, b
Coltice, N., Gérault, M., and Ulvrová, M.: A mantle convection perspective on global tectonics, EarthSci. Rev., 165, 120–150, https://doi.org/10.1016/j.earscirev.2016.11.006, 2017. a, b
Coltice, N., Blessing, S., Giering, R., and Tackley, P.: Sensitivity Analysis of Global Kinematics on Mantle Structure Using Automatically Generated Adjoint Thermochemical Convection Codes, Earth ArXiv, https://doi.org/10.31223/X5N09Q, 2023. a
Colton, D. L., Kress, R., and Kress, R.: Inverse acoustic and electromagnetic scattering theory, vol. 93, Springer, 1998. a
Crawford, O., AlAttar, D., Tromp, J., Mitrovica, J. X., Austermann, J., and Lau, H. C.: Quantifying the sensitivity of postglacial sea level change to laterally varying viscosity, Geophys. J. Int., 214, 1324–1363, https://doi.org/10.1093/gji/ggy184, 2018. a
Dalcin, L. D., Paz, R. R., Kler, P. A., and Cosimo, A.: Parallel distributed computing using Python, Adv. Water Resour., 34, 1124–1139, https://doi.org/10.1016/j.advwatres.2011.04.013, 2011. a
Dannberg, J. and Gassmöller, R.: Chemical trends in ocean islands explained by plume–slab interaction, P. Natl. Acad. Sci. USA, 115, 4351–4356, https://doi.org/10.1073/pnas.1714125115, 2018. a
Davies, D. R. and Davies, J. H.: Thermallydriven mantle plumes reconcile multiple hotspot observations, Earth Planet. Sc. Lett., 278, 50–54, 2009. a
Davies, D. R., Goes, S., Davies, J. H., Schuberth, B. S. A., Bunge, H.P., and Ritsema, J.: Reconciling dynamic and seismic models of Earth's lower mantle: The dominant role of thermal heterogeneity, Earth Planet. Sc. Lett., 353–354, 253–269, https://doi.org/10.1016/j.epsl.2012.08.016, 2012. a
Davies, D. R., Goes, S., and Lau, H. C. P.: Thermally Dominated Deep Mantle LLSVPs: A Review, pp. 441–477, Springer International Publishing, Cham, ISBN 9783319156279, https://doi.org/10.1007/9783319156279_14, 2015. a
Davies, D. R., Kramer, S. C., Ghelichkhan, S., and Gibson, A.: Towards automatic finiteelement methods for geodynamics via Firedrake, Geosci. Model Dev., 15, 5127–5166, https://doi.org/10.5194/gmd1551272022, 2022. a, b, c, d, e, f, g, h, i, j, k
Davies, D. R., Ghelichkhan, S., Hoggard, M., Valentine, A. P., and Richards, F. D.: Observations and models of dynamic topography: Current status and future directions, Dynamics of Plate Tectonics and Mantle Convection, 2023, 223–269, https://doi.org/10.1016/B9780323857338.000172, 2023. a, b, c, d
Davies, G. F. and Richards, M. A.: Mantle Convection, J. Geol., 100, 151–206, https://doi.org/10.1086/629582, 1992. a
Dembo, R. S. and Steihaug, T.: TruncatedNewton algorithms for largescale unconstrained optimization, Math. Program., 26, 190–212, https://doi.org/10.1007/BF02592055, 1983. a
Farrell, P. E., Ham, D. A., Funke, S. W., and Rognes, M. E.: Automated derivation of the adjoint of highlevel transient finite element programs, SIAM J. Sci. Comput., 35, 1–27, https://doi.org/10.1137/120873558, 2013a. a, b, c
Farrell, P. E., Ham, D. A., Funke, S. W., and Rognes, M. E.: Automated derivation of the adjoint of highlevel transient finite element programs, SIAM J. Sci. Comput., 35, C369–C393, 2013b. a
Fichtner, A., van Herwaarden, D.P., Afanasiev, M., Simutė, S., Krischer, L., ÇubukSabuncu, Y., Taymaz, T., Colli, L., Saygin, E., Villaseñor, A., Trampert, J., Cupillard, P., Bunge, H. P., and Igel, H.: The collaborative seismic earth model: generation 1, Geophys. Res. Lett., 45, 4007–4016, https://doi.org/10.1029/2018GL077338, 2018. a
firedrakezenodo: Software used in `Software for gadopt v2.1.0' (Firedrake_20231027.0), Zenodo [code], https://doi.org/10.5281/zenodo.10047031, 2023. a
French, S. W. and Romanowicz, B. A.: Wholemantle radially anisotropic shear velocity structure from spectralelement waveform tomography, Geophys. J. Int., 199, 1303–1327, https://doi.org/10.1093/gji/ggu334, 2014. a
Friedrich, A. M., Bunge, H.P., Rieger, S. M., Colli, L., Ghelichkhan, S., and Nerlich, R.: Stratigraphic framework for the plume mode of mantle convection and the analysis of interregional unconformities on geological maps, Gondwana Res., 53, 159–188, https://doi.org/10.1016/j.gr.2017.06.003, 2018. a
Funke, S. W. and Farrell, P. E.: A framework for automated PDEconstrained optimisation, arXiv [preprint], arXiv:1302.3894, 2013. a
Ghelichkhan, S. and Bunge, H.P.: The compressible adjoint equations in geodynamics: Derivation and numerical assessment, GEMInternational J. Geomath., 7, 1–30, https://doi.org/10.1007/s1313701600805, 2016. a
Ghelichkhan, S. and Bunge, H.P.: The adjoint equations for thermochemical compressible mantle convection: derivation and verification by twin experiments, P. Roy. Soc. A, 474, 20180329, https://doi.org/10.1098/rspa.2018.0329, 2018. a
Ghelichkhan, S., Murböck, M., Colli, L., Pail, R., and Bunge, H.P.: On the observability of epeirogenic movement in current and future gravity missions, Gondwana Res., 53, 273–284, https://doi.org/10.1016/j.gr.2017.04.016, 2018. a
Ghelichkhan, S., Fuentes, J. J., Hoggard, M. J., Richards, F. D., and Mitrovica, J. X.: The precession constant and its longterm variation, Icarus, 358, 114172, https://doi.org/10.1016/j.icarus.2020.114172, 2020. a
Ghelichkhan, S., Bunge, H.P., and Oeser, J.: Global mantle flow retrodictions for the early Cenozoic using an adjoint method: evolving dynamic topographies, deep mantle structures, flow trajectories and sublithospheric stresses, Geophys. J. Int., 226, 1432–1460, https://doi.org/10.1093/gji/ggab108, 2021. a, b, c, d, e, f, g
Gibson, A., Davies, R., Kramer, S., Ghelichkhan, S., Duvernay, T., and Turner, R.: gadopt/gadopt: v2.1.0 (v2.1.0), Zenodo [code], https://doi.org/10.5281/zenodo.10050733, 2023. a
Giering, R. and Kaminski, T.: Recipes for adjoint code construction, ACM T. Math. Software, 24, 437–474, https://doi.org/10.1145/293686.293695, 1998. a
Giles, M. B. and Pierce, N. A.: An introduction to the adjoint approach to design, Flow Turbul. Combust., 65, 393–415, 2000. a, b, c, d
Gunzburger, M.: Adjoint equationbased methods for control problems in incompressible, viscous flows, Flow Turbul. Combust., 65, 249–272, 2000. a
Gunzburger, M. D.: Perspectives in flow control and optimization, Society for Industrial and Applied Mathematics, https://doi.org/10.1137/1.9780898718720, 2002. a, b
Gurnis, M., Turner, M., Zahirovic, S., DiCaprio, L., Spasojevic, S., Müller, R. D., Boyden, J., Seton, M., Manea, V. C., and Bower, D. J.: Plate tectonic reconstructions with continuously closing plates, Comput. Geosci., 38, 35–42, https://doi.org/10.1016/j.cageo.2011.04.014, 2012. a, b, c
Ham, D. A., Kelly, P. H. J., Mitchell, L., Cotter, C. J., Kirby, R. C., Sagiyama, K., Bouziani, N., Vorderwuelbecke, S., Gregory, T. J., Betteridge, J., Shapero, D. R., NixonHill, R. W., Ward, C. J., Farrell, P. E., Brubeck, P. D., Marsden, I., Gibson, T. H., Homolya, M., Sun, T., McRae, A. T. T., Luporini, F., Gregory, A., Lange, M., Funke, S. W., Rathgeber, F., Bercea, G.T., and Markall, G. R.: Firedrake User Manual, Imperial College London and University of Oxford and Baylor University and University of Washington, 1st edn., https://doi.org/10.25561/104839, 2023. a, b
Hansen, P. C.: Analysis of discrete illposed problems by means of the Lcurve, SIAM review, 34, 561–580, 1992. a
Hastie, T., Tibshirani, R., Friedman, J. H., and Friedman, J. H.: The elements of statistical learning: data mining, inference, and prediction, vol. 2, Springer, 2009. a
Hinze, M., Pinnau, R., Ulbrich, M., and Ulbrich, S.: Optimization with PDE constraints, vol. 23, Springer Science & Business Media, 2008. a
Hoggard, M., Austerman, J., Randel, C., and Stephenson, S.: Observing dynamic topography through space and time, AGU Geophys. Monogr. 263, Washington, DC, https://doi.org/10.1002/9781119528609.ch15, 2021. a
Hoggard, M. J., Czarnota, K., Richards, F. D., Huston, D. L., Jaques, A. L., and Ghelichkhan, S.: Global distribution of sedimenthosted metals controlled by craton edge stability, Nat. Geosci., 13, 504–510, https://doi.org/10.1038/s4156102005932, 2020. a
Horbach, A., Bunge, H.P., and Oeser, J.: The adjoint method in geodynamics: derivation from a general operator formulation and application to the initial condition problem in a high resolution mantle circulation model, GEM: Int. J. Geomath., 5, 163–194, https://doi.org/10.1007/s1313701400615, 2014. a, b
Iaffaldano, G. and Bunge, H.P.: Rapid plate motion variations through geological time: Observations serving geodynamic interpretation, Annu. Rev. Earth Planet. Sci., 43, 571–592, 2015. a, b
Iglesias, J. A., Sturm, K., and Wechsung, F.: Twodimensional shape optimization with nearly conformal transformations, SIAM J, Sci, Comput,, 40, A3807–A3830, 2018. a
IsmailZadeh, A., Schubert, G., Tsepelev, I., and Korotkii, A.: Inverse problem of thermal convection: Numerical approach and application to mantle plume restoration, Phys. Earth Planet. In., 145, 99–114, https://doi.org/10.1016/j.pepi.2004.03.006, 2004. a, b, c, d, e
Jarvis, G. T. and McKenzie, D. P.: Convection in a compressible fluid with infinite Prandtl number, J. Fluid Mech., 96, 515–583, https://doi.org/10.1017/S002211208000225X, 1980. a
Koelemeijer, P., Ritsema, J., Deuss, A., and Van Heijst, H.J.: SP12RTS: a degree12 model of shearand compressionalwave velocity for Earth's mantle, Geophys. J. Int., 204, 1024–1039, https://doi.org/10.1093/gji/ggv481, 2016. a
Kouri, D. P., Jakeman, J. D., Huerta, J. G., Smith, C. B., Walsh, T. F., Udell, M., and Uryasev, S.: RiskAdaptive Experimental Design for HighConsequence Systems: LDRD Final Report, Tech. rep., Sandia National Lab.(SNLNM), Albuquerque, NM (United States), 2021a. a
Kouri, D. P., Ridzal, D., and Tuminaro, R.: KKT preconditioners for PDEconstrained optimization with the Helmholtz equation, SIAM J. Sci. Comput., 43, S225–S248, 2021b. a
Kouri, D. P., Staudigl, M., and Surowiec, T. M.: A relaxationbased probabilistic approach for PDEconstrained optimization under uncertainty with pointwise state constraints, Computational Optimization and Applications, 85, 441–478, 2023. a
Kramer, S. C., Wilson, C. R., and Davies, D. R.: An implicit free surface algorithm for geodynamical simulations, Phys. Earth Planet. In., 194, 25–37, 2012. a
Li, D., Gurnis, M., and Stadler, G.: Towards adjointbased inversion of timedependent mantle convection with nonlinear viscosity, Geophys. J. Int., 209, 86–105, https://doi.org/10.1093/gji/ggw493, 2017. a, b, c
Lin, C.J. and Moré, J. J.: Newton's method for large boundconstrained optimization problems, SIAM J. Optim., 9, 1100–1127, https://doi.org/10.1137/S1052623498345075, 1999. a
Liu, L. and Gurnis, M.: Dynamic subsidence and uplift of the Colorado Plateau, Geology, 38, 663–666, https://doi.org/10.1130/g30624.1, 2010. a, b, c
Liu, L., Spasojević, S., and Gurnis, M.: Reconstructing Farallon plate subduction beneath North America back to the Late Cretaceous, Science, 322, 934–938, https://doi.org/10.1126/science.1162921, 2008. a, b
Liu, L., Gurnis, M., Seton, M., Saleeby, J., Müller, R. D., and Jackson, J. M.: The role of oceanic plateau subduction in the Laramide orogeny, Nat. Geosci., 3, 353–357, https://doi.org/10.1038/ngeo829, 2010. a, b
Logg, A., Mardal, K.A., and Wells, G.: Automated solution of differential equations by the finite element method: The FEniCS book, vol. 84, Springer Science & Business Media, https://doi.org/10.1007/9783642230998, 2012. a
Martinec, Z., Sasgen, I., and Velímskỳ, J.: The forward sensitivity and adjointstate methods of glacial isostatic adjustment, Geophys. J. Int., 200, 77–105, 2015. a
Merdith, A. S., Williams, S. E., Collins, A. S., Tetley, M. G., Mulder, J. A., Blades, M. L., Young, A., Armistead, S. E., Cannon, J., Zahirovic, S., and Müller, R. D.: Extending fullplate tectonic models into deep time: Linking the Neoproterozoic and the Phanerozoic, EarthSci. Rev., 214, 103477, https://doi.org/10.1016/j.earscirev.2020.103477, 2021. a, b
Mitusch, S. K., Funke, S. W., and Dokken, J. S.: dolfinadjoint 2018.1: automated adjoints for FEniCS and Firedrake, J. Open Source Softw., 4, 1292, https://doi.org/10.21105/joss.01292, 2019. a, b, c
Mosca, I., Cobden, L., Deuss, A., Ritsema, J., and Trampert, J.: Seismic and mineralogical structures of the lower mantle from probabilistic tomography, J. Geophys. Res., 117, B06304, https://doi.org/10.1029/2011jb008851, 2012. a
Moses, W. S., Narayanan, S. H. K., Paehler, L., Churavy, V., Schanen, M., Hückelheim, J., Doerfert, J., and Hovland, P.: Scalable automatic differentiation of multiple parallel paradigms through compiler augmentation, in: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC '22, IEEE Press, ISBN 9784665454445, 2022. a
Moucha, R., Forte, A. M., Mitrovica, J. X., Rowley, D. B., Quéré, S., Simmons, N. A., and Grand, S. P.: Dynamic topography and longterm sealevel variations: There is no such thing as a stable continental platform, Earth Planet. Sc. Lett., 271, 101–108, 2008. a
Müller, R. D., Seton, M., Zahirovic, S., Williams, S. E., Matthews, K. J., Wright, N. M., Shephard, G. E., Maloney, K. T., BarnettMoore, N., Hosseinpour, M., Bower, D. J., and Cannon, J.: Ocean Basin Evolution and GlobalScale Plate Reorganization Events Since Pangea Breakup, Annu. Rev. Earth Planet. Sci., 44, 107–138, https://doi.org/10.1146/annurevearth060115012211, 2016. a
Müller, R. D., Cannon, J., Qin, X., Watson, R. J., Gurnis, M., Williams, S., Pfaffelmoser, T., Seton, M., Russell, S. H. J., and Zahirovic, S.: GPlates: Building a Virtual Earth Through Deep Time, Geochem. Geophys. Geosyst., 19, 2243–2261, https://doi.org/10.1029/2018GC007584, 2018. a
Müller, R. D., Zahirovic, S., Williams, S. E., Cannon, J., Seton, M., Bower, D. J., Tetley, M. G., Heine, C., Le Breton, E., Liu, S., Liu, S., Russell, S. H. J., Yang, T., Leonard, J., and Gurnis, M.: A global plate model including lithospheric deformation along major rifts and orogens since the Triassic, Tectonics, 38, 1884–1907, https://doi.org/10.1029/2018TC005462, 2019. a, b
Müller, R. D., Flament, N., Cannon, J., Tetley, M. G., Williams, S. E., Cao, X., Bodur, Ö. F., Zahirovic, S., and Merdith, A.: A tectonicrulesbased mantle reference frame since 1 billion years ago – implications for supercontinent cycles and plate–mantle system evolution, Solid Earth, 13, 1127–1159, https://doi.org/10.5194/se1311272022, 2022. a
Müller, R. D., Cannon, J., Qin, X., Watson, R. J., Gurnis, M., Williams, S., Pfaffelmoser, T., Seton, M., Russell, S. H. J., and Zahirovic, S.: GPlates: Building a Virtual Earth Through Deep Time, Geochem. Geophys. Geosyst., 19, 2243–2261, https://doi.org/10.1029/2018GC007584, 2018. a
Naumann, U.: The Art of Differentiating Computer Programs, Society for Industrial and Applied Mathematics, https://doi.org/10.1137/1.9781611972078, 2011. a, b, c
Nerlich, R., Colli, L., Ghelichkhan, S., Schuberth, B., and Bunge, H.P.: Constraining central NeoTethys Ocean reconstructions with mantle convection models, Geophys. Res. Lett., 43, 9595–9603, https://doi.org/10.1002/2016gl070524, 2016. a
Nitsche, J.: Über ein Variationsprinzip zur Lösung von DirichletProblemen bei Verwendung von Teilräumen, die keinen Randbedingungen unterworfen sind, in: Abhandlungen aus dem mathematischen Seminar der Universität Hamburg, vol. 36, 9–15, Springer, 1971. a
NixonHill, R. W., Shapero, D., Cotter, C. J., and Ham, D. A.: Consistent Point Data Assimilation in Firedrake and Icepack, arXiv [preprint], https://doi.org/10.48550/arXiv.2304.06058, 2023. a
Nocedal, J. and Wright, S. J. (Eds.): Numerical optimization, Springer, Springer New York, NY, https://doi.org/10.1007/b98874, , 1999. a
Nocedal, J. and Wright, S. J.: Quadratic programming, Numerical optimization, 448–492, 2006. a
Panasyuk, S. V. and Hager, B. H.: Inversion for mantle viscosity profiles constrained by dynamic topography and the geoid, and their estimated errors, Geophys. J. Int., 143, 821–836, 2000. a
Panton, J., Davies, J. H., and Myhill, R.: The Stability of Dense Oceanic Crust Near the CoreMantle Boundary, J. Geophys. Res.Sol. Ea., 128, e2022JB025610, https://doi.org/10.1029/2022JB025610, 2023. a
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., and Chintala, S.: PyTorch: An Imperative Style, HighPerformance Deep Learning Library, in: Advances in Neural Information Processing Systems 32, 8024–8035, Curran Associates, Inc., http://papers.neurips.cc/paper/9015pytorchanimperativestylehighperformancedeeplearninglibrary.pdf (last access: 30 June 2024), 2019. a
Plessix, R.E.: A review of the adjointstate method for computing the gradient of a functional with geophysical applications, Geophys. J. Int., 167, 495–503, 2006. a
Poore, H. R., Samworth, R., White, N. J., Jones, S. M., and McCave, I. N.: Neogene overflow of Northern Component Water at the GreenlandScotland Ridge, Geochem. Geophys. Geosyst., 7, Q06010, https://doi.org/10.1029/2005GC001085, 2006. a
Price, M. G. and Davies, J. H.: Profiling the robustness, efficiency and limits of the forwardadjoint method for 3D mantle convection modelling, Geophys. J. Int., 212, 1450–1462, https://doi.org/10.1093/gji/ggx489, 2018. a
Rathgeber, F., Ham, D. A., Mitchell, L., Lange, M., Luporini, F., Mcrae, A. T. T., Bercea, G.T., Markall, G. R., and Kelly, P. H. J.: Firedrake: Automating the Finite Element Method by Composing Abstractions, ACT Trans. Math. Softw., 43, 1–24, https://doi.org/10.1145/2998441, 2016. a, b, c
Ratnaswamy, V., Stadler, G., and Gurnis, M.: Adjointbased estimation of plate coupling in a nonlinear mantle flow model: theory and examples, Geophys. J. Int., 202, 768–786, 2015. a
Rawlinson, N., Pozgay, S., and Fishwick, S.: Seismic tomography: a window into deep Earth, Phys. Earth Planet. In., 178, 101–135, https://doi.org/10.1016/j.pepi.2009.10.002, 2010. a
Reuber, G. S., Holbach, L., Popov, A. A., Hanke, M., and Kaus, B. J.: Inferring rheology and geometry of subsurface structures by adjointbased inversion of principal stress directions, Geophys. J. Int., 223, 851–861, https://doi.org/10.1093/gji/ggaa344, 2020. a, b
Ricard, Y.: Physics of mantle convection, Treatise on geophysics, 7, 31–87, 2007. a
Sabadini, R. and Vermeersen, L.: Influence of lithospheric and mantle stratification on global postseismic deformation, Geophys. Res. Lett., 24, 2075–2078, 1997. a
Sambridge, M. and Mosegaard, K.: Monte Carlo methods in geophysical inverse problems, Rev. Geophys., 40, 3–1, https://doi.org/10.1029/2000rg000089, 2002. a
Schwedes, T., Funke, S. W., and Ham, D. A.: An iteration count estimate for a meshdependent steepest descent method based on finite elements and Riesz inner product representation, arXiv [preprint], arXiv:1606.08069, 2016. a
Schwedes, T., Ham, D. A., Funke, S. W., Piggott, M. D., Schwedes, T., Ham, D. A., Funke, S. W., and Piggott, M. D.: Mesh dependence in PDEconstrained optimisation, Springer, https://doi.org/10.1007/9783319594835_2, 2017. a
Seton, M., Müller, R. D., Zahirovic, S., Gaina, C., Torsvik, T. H., Shephard, G., Talsma, A., Gurnis, M., Turner, M., Maus, S., and Chandler, M.: Global continental and ocean basin reconstructions since 200 Ma, EarthSci. Rev., 113, 212–270, https://doi.org/10.1016/j.earscirev.2012.03.002, 2012. a
Seton, M., Müller, R. D., Zahirovic, S., Williams, S., Wright, N. M., Cannon, J., Whittaker, J. M., Matthews, K. J., and McGirr, R.: A Global Data Set of PresentDay Oceanic Crustal Age and Seafloor Spreading Parameters, Geochem. Geophys. Geosyst., 21, e2020GC009214, https://doi.org/10.1029/2020GC009214, 2020. a
Shephard, G., Müller, R., Liu, L., and Gurnis, M.: Miocene drainage reversal of the Amazon River driven by plate–mantle interaction, Nat. Geosci., 3, 870–875, 2010. a, b, c, d
Shephard, G., Bunge, H.P., Schuberth, B., Müller, R., Talsma, A., Moder, C., and Landgrebe, T.: Testing absolute plate reference frames and the implications for the generation of geodynamic mantle heterogeneity structure, Earth Planet. Sc. Lett., 317–318, 204–217, https://doi.org/10.1016/j.epsl.2011.11.027, 2012. a
Simmons, N. A., Myers, S. C., Johannesson, G., Matzel, E., and Grand, S. P.: Evidence for longlived subduction of an ancient tectonic plate beneath the southern Indian Ocean, Geophys. Res. Lett., 42, 9270–9278, https://doi.org/10.1002/2015gl066237, 2015. a
Spasojevic, S., Liu, L., and Gurnis, M.: Adjoint models of mantle convection with seismic, plate motion, and stratigraphic constraints: North America since the Late Cretaceous, Geochem. Geophys. Geosyst., 10, Q05W02, https://doi.org/10.1029/2008gc002345, 2009. a, b, c, d, e
Stadler, G., Gurnis, M., Burstedde, C., Wilcox, L. C., Alisic, L., and Ghattas, O.: The Dynamics of Plate Tectonics and Mantle Flow: From Local to Global Scales, Science, 329, 1033–1038, https://doi.org/10.1126/science.1191223, 2010. a
Stixrude, L. and LithgowBertelloni, C.: Thermodynamics of mantle minerals  II. Phase equilibria, Geophys. J. Int., 184, 1180–1213, https://doi.org/10.1111/j.1365246X.2010.04890.x, 2011. a
Stotz, I., Iaffaldano, G., and Davies, D. R.: Pressuredriven Poiseuille flow: A major component of the torquebalance governing Pacific plate motion, Geophys. Res. Lett., 45, 117–125, 2018. a
Stotz, I. L., Iaffaldano, G., and Davies, D. R.: Late Miocene Pacific plate kinematic change explained with coupled global models of mantle and lithosphere dynamics, Geophys. Res. Lett., 44, 7177–7186, https://doi.org/10.1002/2017GL073920, 2017. a
Styles, E., Davies, D. R., and Goes, S.: Mapping spherical seismic into physical structure: biases from 3D phasetransition and thermal boundarylayer heterogeneity, Geophys. J. Int., 184, 1371–1378, https://doi.org/10.1111/j.1365246x.2010.04914.x, 2011. a
Taiwo, A., Bunge, H.P., Schuberth, B. S. A., Colli, L., and Vilacis, B.: Robust global mantle flow trajectories and their validation via dynamic topography histories, Geophys. J. Int., 234, 2160–2179, https://doi.org/10.1093/gji/ggad188, 2023. a
Talagrand, O.: Assimilation of observations, an introduction (gtspecial issueltdata assimilation in meteology and oceanography: Theory and practice), J. Meteorol. Soc. Jpn. Ser. II, 75, 191–209, https://doi.org/10.2151/jmsj1965.75.1B_191, 1997. a
Tetley, M. G., Williams, S. E., Gurnis, M., Flament, N., and Müller, R. D.: Constraining Absolute Plate Motions Since the Triassic, J. Geophys. Res.Sol. Ea., 124, https://doi.org/10.1029/2019JB017442, 2019. a
The ROL Project Team: The ROL Project Website, https://trilinos.github.io/rol.html (last access: 30 June 2024), 2022. a, b, c, d
Tijskens, E., Roose, D., Ramon, H., and De Baerdemaeker, J.: Automatic differentiation for solving nonlinear partial differential equations: an efficient operator overloading approach, Numer. Algorithms, 30, 259–301, 2002. a
Tikhonov, A. N.: Solution of incorrectly formulated problems and the regularization method., Sov. Dok., 4, 1035–1038, 1963. a
Virtanen, P., Gommers, R., Oliphant, T. E., Haberland, M., Reddy, T., Cournapeau, D., Burovski, E., Peterson, P., Weckesser, W., Bright, J., van der Walt, S. J., Brett, M., Wilson, J., Millman, K. J., Mayorov, N., Nelson, A. R. J., Jones, E., Kern, R., Larson, E., Carey, C. J., Polat, İ., Feng, Y., Moore, E. W., VanderPlas, J., Laxalde, D., Perktold, J., Cimrman, R., Henriksen, I., Quintero, E. A., Harris, C. R., Archibald, A. M., Ribeiro, A. H., Pedregosa, F., van Mulbregt, P., and SciPy 1.0 Contributors: SciPy 1.0: fundamental algorithms for scientific computing in Python, Nature Methods, 17, 261–272, https://doi.org/10.1038/s4159201906862, 2020. a
Vynnytska, L. and Bunge, H.P.: Restoring past mantle convection structure through fluid dynamic inverse theory: Regularisation through surface velocity boundary conditions, GEMInt. J. Geomath., 6, 83–100, https://doi.org/10.1007/s1313701400606, 2015. a
Wang, Z. R., Stotz, I. L., Bunge, H.P., Vilacís, B., Hayek, J. N., Ghelichkhan, S., and Lebedev, S.: Cenozoic upper mantle flow history of the Atlantic realm based on Couette/Poiseuille models: Towards paleomantleflowgraphy, Phys. Earth Planet. In., 340, 107045, https://doi.org/10.1016/j.pepi.2023.107045, 2023. a
Weismüller, J., Gmeiner, B., Ghelichkhan, S., Huber, M., John, L., Wohlmuth, B., Rüde, U., and Bunge, H.P.: Fast asthenosphere motion in highresolution global mantle flow models, Geophys. Res. Lett., 42, 7429–7435, https://doi.org/10.1002/2015GL063727, 2015. a
Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., GonzalezBeltran, A., Gray, A. J. G., Groth, P., Goble, C., Grethe, J. S., Heringa, J., ’t Hoen, P. A. C., Hooft, R., Kuhn, T., Kok, R., Kok, J., Lusher, S. J., Martone, M. E., Mons, A., Packer, A. L., Persson, B., RoccaSerra, P., Roos, M., van Schaik, R., Sansone, A.A., Schultes, E., Sengstag, T., Slater, T., Strawn, G., Swertz, M. A., Thompson, M., van der Lei, J., van Mulligen, E., Velterop, J., Waagmeester, A., Wittenburg, P., Wolstencroft, K., Zhao, J., and Mons, B.: The FAIR Guiding Principles for scientific data management and stewardship, Sci. Data, 3, 1–9, 2016. a
Williams, S., Flament, N., Müller, R. D., and Butterworth, N.: Absolute plate motions since 130 Ma constrained by subduction zone kinematics, Earth Planet. Sc. Lett., 418, 66–77, 2015. a
Wolstencroft, M., Davies, J. H., and Davies, D. R.: Nusselt–Rayleigh number scaling for spherical shell Earth mantle simulation up to a Rayleigh number of 109, Phys. Earth Planet. In., 176, 132–141, 2009. a
Wunsch, C.: The Ocean Circulation Inverse Problem, Cambridge University Press, Cambridge, ISBN 9780511629570, https://doi.org/10.1017/CBO9780511629570, 1996. a
Xu, K. and Darve, E.: Physics constrained learning for datadriven inverse modeling from sparse observations, J. Comput. Phys., 453, 110938, https://doi.org/10.48550/arXiv.2002.10521, 2022. a
Young, A., Flament, N., Williams, S. E., Merdith, A., Cao, X., and Müller, R. D.: Longterm Phanerozoic sea level change from solid Earth processes, Earth Planet. Sc. Lett., 584, 117451, https://doi.org/10.1016/j.epsl.2022.117451, 2022. a, b
Zaroli, C., Sambridge, M., Lévêque, J.J., Debayle, E., and Nolet, G.: An objective rationale for the choice of regularisation parameter with application to global multiplefrequency Swave tomography, Solid Earth, 4, 357–371, https://doi.org/10.5194/se43572013, 2013. a
Zhang, H., Constantinescu, E. M., and Smith, B. F.: PETSc TSAdjoint: a discrete adjoint ODE solver for firstorder and secondorder sensitivity analysis, SIAM J. Sci. Comput., 44, C1–C24, 2022. a
Zhong, S. and Rudolph, M. L.: On the temporal evolution of longwavelength mantle structure of the Earth since the early Paleozoic, Geochem. Geophys. Geosyst., 16, 1599–1615, https://doi.org/10.1002/2015GC005782, 2015. a
Zhong, S., Yuen, D. A., Moresi, L. N., and Knepley, M.: Numerical methods for mantle convection, Treatise on geophysics, 7, 227–252, 2007. a
Zhou, Q. and Liu, L.: A hybrid approach to data assimilation for reconstructing the evolution of mantle dynamics, Geochem. Geophys. Geosyst., 18, 3854–3868, 2017. a, b