Automatic adjoint-based inversion schemes for geodynamics: Reconstructing the evolution of Earth’s mantle in space and time

.


Introduction
Mantle convection is the "engine" driving our dynamic planet.It is the principal control on Earth's thermal and chemical evolution and underpins tectonic and geological activity at Earth's surface (e.g.Davies and Richards, 1992;Coltice et al., 2017).Through interactions with Earth's crust, it introduces heat and fluids that contribute to the formation and concentration of ore deposits (e.g.Hoggard et al., 2020).Mantle flow also induces vertical movements of Earth's surface (so-called dynamic topography; see Davies et al., 2023, for a review), leading to regional and global changes in sea level and climate (e.g.Poore et al., 2006;Moucha et al., 2008;Cloetingh and Haq, 2015).The lithosphere, consid-Published by Copernicus Publications on behalf of the European Geosciences Union.S. Ghelichkhan et al.: Automatic adjoint optimisation schemes for geodynamics ered here to be the mantle's upper thermal boundary layer, serves as a window into the form and time-dependence of mantle convection, recorded through tectonic plate motions (e.g.Iaffaldano and Bunge, 2015;Müller et al., 2016;Stotz et al., 2017Stotz et al., , 2018;;Wang et al., 2023).Although substantial progress has been made in reconstructing the history of plate tectonic motions (e.g.Seton et al., 2012;Gurnis et al., 2012;Müller et al., 2019;Merdith et al., 2021), the quest for a dynamic reference, revealing the force equilibria within the underlying mantle, remains ongoing.In other words, the veracity of plate motion reconstructions is not matched by an equivalent knowledge of the thermo-chemical structure and flow history of the underlying mantle.This is a major limitation, as it inhibits our ability to understand fundamental processes that depend on time-dependent interactions between Earth's surface and its deep interior.
The primary challenge in reconstructing past mantle flow stems from the lack of knowledge surrounding the initial state, compounded by uncertain physical and chemical parameters.Mantle convection is an initial condition problem, uniquely determined by an initial condition some time in the past: starting from some point in time, it can be uniquely modelled by solving conservation equations for mass, momentum and energy (e.g.Ricard, 2007;Zhong et al., 2007).However, the lack of knowledge on this initial condition -specifically the thermo-chemical state of Earth's mantle at some time in the past -renders reconstructions of mantle flow through conventional forward calculations intractable (Bunge et al., 2003) (Fig. 1a).Moreover, current global mantle convection models employ billions of degrees of freedom and require multiple time steps to resolve the multi-scale dynamics of Earth's mantle (e.g.Davies and Davies, 2009;Wolstencroft et al., 2009;Stadler et al., 2010;Weismüller et al., 2015;Dannberg and Gassmöller, 2018;Bauer et al., 2019Bauer et al., , 2020)).Owing to the resulting computational expense, the use of conventional geophysical inverse methods, including Monte Carlo techniques (e.g.Sambridge and Mosegaard, 2002), is considered impractical for determining the mantle's past structure, dynamics and evolution.
The initial condition problem can be partially addressed through sequential data-assimilation techniques.In essence, the objective of sequential data assimilation is to leverage all accessible information to improve predictions of mantle flow in space and time.Data assimilation is commonly achieved through sequential filtering (e.g.Wunsch, 1996), in which the model is advanced in time over the period in which observations exist.Whenever observations become available, the model is adjusted or "corrected" (e.g.Bunge and Grand, 2000;Bocher et al., 2016).The magnitude of the correction can be optimally determined using methods such as the Kalman filter (e.g.Bocher et al., 2018), with the model subsequently restarted from the updated state, and this process is repeated until all available information has been utilised (Fig. 1b).In a geodynamic context, the most commonly exploited dataset consists of palaeo-surface velocities from plate tectonic reconstruction models, primarily through a kinematic surface boundary condition in time-dependent models.In recent years, there has been a notable increase in the use of this approach (e.g.Bunge et al., 2002;Davies et al., 2012;Bower et al., 2013;Zhong and Rudolph, 2015;Nerlich et al., 2016;Young et al., 2022;Panton et al., 2023).This can be attributed to two main factors: (i) improved confidence in the validity of plate tectonic reconstruction models and their extension further back in time (e.g.Merdith et al., 2021;Young et al., 2022;Müller et al., 2022) and (ii) the enhanced accessibility of such models, facilitated via opensource community frameworks like the GPlates project (e.g.Gurnis et al., 2012;Müller et al., 2018).
Sequential data-assimilation methods, however, come with an inherent limitation: due to the sequential nature of the assimilation process, each observation is incorporated to influence the model only at later times.Consequently, information propagates from the past into the future but cannot be transmitted back into the past.This drawback poses a significant limitation, as our knowledge of the mantle at the present day is substantially more detailed than at any other time.Thus, it becomes imperative to explore approaches that explicitly carry information backwards in time, or more precisely, enable the estimation of a time-dependent model that best fits all available observational constraints.
Inverse geodynamics is a rapidly evolving field that embarks on this very idea.The foundation of this field is an optimisation approach, in which mantle convection modelling is reformulated as an inverse problem.Using inverse theory, unknown model parameters can be optimised to fit available observational data via the so-called adjoint method (e.g.Bunge et al., 2003;Ismail-Zadeh et al., 2004), through which the sensitivities of a performance measure (the so-called "objective functional"), with respect to model parameters (e.g. the choice of initial condition), can be computed.The resulting sensitivity information can be used to adjust model parameters, generating a model flow trajectory that matches observational constraints (e.g.present-day mantle thermo-chemical structure) (Fig. 1c).The geodynamic adjoint equations for reconstructing the initial condition have been derived for isochemical, incompressible (e.g.Bunge et al., 2003;Horbach et al., 2014), compressible (Ghelichkhan and Bunge, 2016) and thermo-chemical mantle flow (Ghelichkhan and Bunge, 2018).Moreover, the method has been enhanced for simultaneous recovery of initial temperature conditions and rheological parameters (Li et al., 2017) despite the inherent illposedness introduced in such a problem.Growing adoption of adjoint-based methods within a broader geodynamic context is evidenced by their application in Reuber et al. (2020) to decipher subsurface structures and rheological parameters via inversion of principal stress directions and by Crawford et al. (2018) to quantify the sensitivity of post-glacial sea level changes to lateral variations in mantle viscosity.Recent growth in computational power has led to multiple adjoint-based mantle reconstructions, with a particular fo-Figure 1. Illustration of different procedures available for estimating past mantle structure: (a) forward modelling prediction, where an unknown initial condition is estimated at t 0 , with modelling error measured as the difference between the modelled and true states.This difference is represented by the distance along the y axis, which typically grows in time.Variations in forward modelling include "nudging" of the initial condition to better match present-day structure; (b) sequential data assimilation or "Kalman filtering" -starting from an initial condition at t 0 , the forward model is run until t 1 .An analysis is subsequently undertaken from the resulting model and the available observation.The corrected model is then subsequently integrated in time until t 2 , when the next observation is available.This process is repeated until t 3 or the last time step with available observational data.The information flow diagram depicts how information is carried from both the past and present, using current data; (c) adjoint-based or 4-D variational data assimilation, which is capable of carrying information on present-day structure (e.g.images from seismic tomography) explicitly backward in time.In (c) observational data that constrain present-day mantle structure, alongside data from different points in space and time, are used to optimise the unknown initial condition.Here, all available observations between t 3 and t 0 contribute to the analysis.The true (unknown) signal is represented by the solid line.Observations (stars), predictions (dark circles) and analyses (squares) are surrounded by ellipsoids of a size proportional to the estimated model uncertainty.Modified from Carrassi et al. (2018) and Davies et al. (2023).cus on regional geological events, as recorded in the geological record of the Americas (e.g.Spasojevic et al., 2009;Liu and Gurnis, 2010;Shephard et al., 2010) and the Atlantic realm (e.g.Colli et al., 2018).To supplement these findings, Ghelichkhan et al. (2021) undertook a systematic global-scale comparison between adjoint model predictions and independent geological constraints, with implications for the expected rates of gravitational and dynamic ellipticity changes resulting from convection within Earth's mantle (Ghelichkhan et al., 2018(Ghelichkhan et al., , 2020)).
While substantial strides have been made in the application of adjoint-based methods in geodynamics, there remain widespread obstacles to the broader use of these techniques within the geodynamic modelling community.One key challenge is the complexity involved with deriving, implementing and validating adjoint models for large-scale, non-linear, time-dependent problems, core features of global mantle convection models.Owing to these difficulties, studies that have used the adjoint method to explore the thermochemical evolution of Earth's mantle often resort to a simplified strategy.This typically involves either a highly idealised treatment of mantle rheology (e.g.Colli et al., 2018;Ghelichkhan et al., 2021), omitting certain coupling terms in the adjoint equations, or doing both (e.g.Liu et al., 2008;Spasojevic et al., 2009;Liu and Gurnis, 2010;Shephard et al., 2010).Such simplifications limit the applicability of these results.The work of Li et al. (2017) is a notable exception to this trend, although its focus on a 2-D rectangular domain, specifically aimed at reconstructing the dynamics of subduction, inherently limits its applicability to global mantle convection simulations.Furthermore, previous global applications of the adjoint approach have been hampered by their reliance on legacy community codes (e.g.Bunge et al., 2003;Liu et al., 2010;Shephard et al., 2010;Colli et al., 2018;Ghelichkhan et al., 2021).These codes are not easily extensible to different geometries or approximations of the underlying physics, employ solver strategies that have since been superseded, and often have limits on model resolution due to the fully structured discretisations encoded.Moreover, the complex and extensive low-level code implementation of coupled adjoint and forward calculations remains obscured, making it difficult to extend and validate, thus restricting the types of observational datasets that can be incorporated.An example of these datasets are the (uncertain) constraints provided by plate tectonic reconstruction models that are prescribed kinematically (e.g.Spasojevic et al., 2009;Shephard et al., 2010;Zhou and Liu, 2017;Colli et al., 2018;Ghelichkhan et al., 2021) as opposed to being formally incorporated through the objective functional.The kinematic prescription of the surface velocities, however, can only improve mantle reconstructions forward in time and therefore prohibits their influence on previous system states.In light of these limitations, there is a need for a general framework that can robustly handle the rheological complexities of Earth's mantle and is easily extensible and transferable to other problems in mantle and lithosphere dynamics.Such a framework must also be https://doi.org/10.5194/gmd-17-5057-2024 Geosci.Model Dev., 17, 5057-5086, 2024 capable of utilising a variety of observational constraints to comprehensively unravel the historical evolution of Earth's interior and its diverse impacts at Earth's surface (Davies et al., 2023).
In this paper, we introduce the Geoscientific ADjoint Optimisation PlaTform (G-ADOPT), research software infrastructure that allows us to overcome these limitations.G-ADOPT is built around three state-of-the-art software libraries -(i) Firedrake, an automated system for solving a range of partial differential equations using the finite-element method (Rathgeber et al., 2016), recently validated for geodynamics (Davies et al., 2022); (ii) Dolfin-Adjoint, which automatically derives the discrete adjoint equations in a form compatible with Firedrake (Farrell et al., 2013a;Mitusch et al., 2019); and (iii) the Rapid Optimisation Library (ROL), a Trilinos package for performing highly efficient large-scale optimisation (The ROL Project Team, 2022).When combined, they provide a geodynamic inversion framework that is highly efficient, with fully consistent forward and adjoint calculations that achieve theoretical computational efficiency.
We structure our paper as follows: in Sect.2.1 we describe the geodynamic forward problem and our solution strategy using G-ADOPT.Section 2.2 describes the inverse problem considered herein, where we focus on finding the (unknown) initial condition using an objective functional that accounts for observations of surface velocity over time and the final state temperature field.This is broken into three subsections: (i) Sect.2.2.1 describes discrete and continuous approaches for obtaining the adjoint systems followed by a detailed derivation of the discrete adjoint systems; (ii) Sect.2.2.2 introduces Dolfin-Adjoint, the underlying approach utilised in G-ADOPT to compute the discrete adjoint and derivative fields; and (iii) Sect.2.2.6 provides an overview of the gradient-based optimisation approach utilised here, facilitated by ROL.We demonstrate the applicability of our approach in Sect. 3 using two sets of twin experiments with increasing complexity, where we reconstruct the spatial and temporal evolution of a reference simulation.The first set of experiments (Sect.3.1) involves a simple isoviscous simulation within an enclosed square domain, while the second set (Sect.3.2) examines convection with a non-linear (temperature, depth and strain rate dependent) rheology at Earth-like convective vigour within an annular domain, which has direct applicability to convection within Earth's mantle.We finish by discussing our results and conclusions in Sects.4 and 5, respectively.

Governing equations and boundary conditions
Mantle flow is described by conservation laws for mass, momentum and energy.We solve these equations in their simplest form, assuming incompressibility and the Boussinesq approximation.The three non-dimensional conservation equations are with the vector field u and scalar fields p and T as the principal unknowns of velocity, pressure and temperature, respectively.Table 1 summarises other symbols used in Eqs. ( 1a)-(1c) and elsewhere.In Eq. ( 1a), the strain-rate tensor ˙ (u) is given by and the Rayleigh number is defined by For this problem, we define the time interval of interest as I = [t I , t F ], with the computational domain V bounded by ∂V , and S and C denoting the top and bottom boundaries, respectively.For all simulations, free-slip and impermeable velocity boundary conditions are specified on all boundaries, whilst temperature boundary conditions are set to constant values of T S and T C at top and bottom boundaries, respectively.For the simulations considered in an enclosed square domain, natural temperature boundary conditions (zero heat flux) are specified on side walls.The set of boundary conditions, for top and bottom boundaries, are listed in Eqs.(4a)-(4d): In Eqs.(4a)-(4d), n denotes the outer normal vector and s any tangential vector.Finally, as mantle convection is an initial value problem, we require a prescribed temperature field at initial time t I : (5)  (Nitsche, 1971), which weakly enforces boundary conditions via a modification of the variational formulation.An implicit mid-point scheme is used for time integration in the energy equation.
To solve the coupled system of forward (and adjoint) equations, we employ Firedrake, which provides an automated system designed for the solution of partial differential equations using the finite-element method (Ham et al., 2023).It incorporates principles and some aspects of the code-base from the FEniCS project (Logg et al., 2012), including its use of the Unified Form Language (UFL) (Alnaes et al., 2014) for the representation of variational problems.UFL is a highlevel language utilised for the symbolic description of the governing equations in a form that closely mimics their mathematical formulation.The key advantage of UFL in this work lies in its high-level abstraction, which allows for an innovative automatic derivation approach when deriving adjoint systems.We will address the significance of UFL in automatic derivation of adjoint systems in Sect.2.2.2.
Firedrake provides an array of features that are particularly conducive to tackling problems in geophysical fluid dynamics.Key among these features is support for a variety of finite-element discretisations, including a highly efficient implementation of discretisations based on extruded meshes; programmable non-linear solves; and operator-aware solver preconditioners that can be combined in a flexible manner to create linear or non-linear systems, which are solved by PETSc (e.g.Balay et al., 1997;Dalcin et al., 2011;Balay et al., 2023).The suitability of Firedrake for geodynamics has been demonstrated via comparison with a comprehensive set of analytical solutions and community benchmarks in Davies et al. (2022).We refer the reader to this study, alongside Rathgeber et al. (2016), for a more in-depth discussion on Firedrake and its dependencies, alongside an outline of the solution strategy employed herein.
Building upon the foundations provided by Firedrake, G-ADOPT introduces an array of features tailored for cuttingedge geodynamical simulations.It integrates smoothly with Firedrake's advanced data-loading capabilities, enabling finite-element consistent point-data interpolation (Nixon-Hill et al., 2023) and facilitating the integration of diverse observational and physical datasets.Various non-linear rheological laws can be effortlessly incorporated using the symbolic representation provided by UFL.G-ADOPT provides a selection of time discretisation schemes, including secondorder Runge-Kutta and backward/forward Euler methods.Visualisation capabilities are provided through integration with the "Visualization Toolkit" (VTK), allowing comprehensive analysis in ParaView, among other visualisation packages.Compatibility with meshes generated by gmsh and seamless integration with NETGEN provide straightforward support for a range of mesh configurations and adaptive mesh refinement.G-ADOPT offers a diverse array of boundary conditions, encompassing free-and no-slip conditions, a free-surface (Kramer et al., 2012), and kinematic forcing, for example, via seamless integration with pyGPlates (Gurnis et al., 2012;Müller et al., 2018).The collection of features, elaborately detailed in the Firedrake and G-ADOPT online documentation and referenced in Davies et al. (2022) Representative reconstructions of the spatial and temporal evolution of mantle flow require knowledge of the initial condition (i.e. the thermo-chemical state of the mantle at some point in the past), the determination of which is an inverse problem.We therefore seek the best-fitting initial condition (T IC ) that results in the minimum of an objective functional that measures the difference between predictions and observations of mantle states and irregularity in the solutions.We use the following mathematical description of the objective functional: The first term in Eq. ( 6) accounts for the misfit between model predictions and temperature recorded at the final instance, whilst the second term measures the misfit in surface velocities through time.The other terms are Tikhonov regularisation components (Tikhonov, 1963;Hansen, 1992), the first of which penalises deviations from an a priori depthaveraged profile, T (i.e. the damping term), and the second penalises the gradient of these deviations to produce less complex solutions (i.e. the smoothing term).For the damping term, we employ a depth-dependent pre-factor, G(r), that is zero within the mid-mantle but transitions to one in the thermal boundary layers.In these areas, lateral material transport and diffusive processes dominate, leading to diminished sensitivity between the choice of initial condition and the final temperature field.G(r), therefore, helps to minimise the amplitude of the solution within the top and bottom thermal boundary layers, removing noise that would otherwise diffuse over the simulation.The three weighting terms, β u , β s and β d , in Eq. ( 6) are defined as Note that t is the total duration of the simulation.The integrals in Eqs. ( 7) are employed to ensure normalised objective terms relative to the final temperature misfit (first term on the right-hand side of Eq. 6).The three scaling parameters, α u , α s and α d , can be set to adjust the importance of these terms relative to the final temperature misfit.We perform a parameter search to find the best-performing combinations of α values.
For our inverse problem we are interested in optimising J described by Eq. ( 6).J depends on some "control" parameters, which in this case is the initial temperature field, and some "state" variables (i.e.surface velocity and the final temperature), which are solutions of the forward problem in Eqs.(1a)-(1c), with the forward system itself depending again on the control.To solve this problem, we define a "reduced" functional, which is a function of the initial condition, T IC , alone.This reduced functional is typically defined by first solving the forward partial differential equations (PDEs) (Eqs.1a-1c) for a given value of the control and then substituting the solutions into the expression for the functional (Eq.6).The result is a functional that depends only on the control parameters, not the state variables directly (hence the name reduced).
Non-linear optimisation methods provide the means to find the optimal T IC by minimising a reduced functional defined by J .These methods are iterative.They begin with an initial guess of T IC to generate a sequence of improved estimates (called iterates) until certain conditions, e.g.residual tolerance, in the objective functional are achieved.Crucial for the efficiency of these iterations are derivatives of Eq. ( 6) with respect to T IC .Owing to the large number of unknowns in 3-D spherical mantle convection models with Earth-like parameters, obtaining the derivative by means of classical finite differencing techniques is impractical.The adjoint method serves as a mathematically elegant and computationally efficient way to obtain the derivatives (e.g.Talagrand, 1997;Giles and Pierce, 2000;Plessix, 2006;Hinze et al., 2008).

Continuous versus discrete adjoints
Approaches to deriving, implementing and obtaining derivatives using the adjoint method primarily fall into two categories: (i) continuous, or the differentiate-then-discretise approach, and (ii) discrete, or the discretise-then-differentiate approach (Gunzburger, 2000).
The continuous approach commences with the derivation of the adjoint equations.The resemblance between the forward and resulting adjoint equations allows the adjoint PDEs to be discretised in a consistent manner and, subsequently, implemented within a numerical framework.By deriving the continuous adjoint PDEs, one can develop an understanding of the key characteristics of adjoint sensitivities, the physical implications of individual terms in the PDEs and their boundary conditions.Moreover, the continuous method affords complete autonomy in the discretisation and imple-mentation of the adjoint system, often leading to simplified but more cost-effective approximations of the solutions, as demonstrated by Ismail-Zadeh et al. (2004).
By contrast, the discrete approach relies on already discretised forward equations and then differentiates and transposes them to obtain the adjoint equations.This method's primary advantage lies in maintaining consistency in spatial and temporal discretisations, allowing for the automatic determination of the exact gradient of the (discrete) objective functional (Giering and Kaminski, 1998;Gunzburger, 2002).Such consistency ensures full convergence of second-order Newtonian optimisation methods and simplifies the debugging of adjoint programs (Giles and Pierce, 2000).For example, with the discrete approach, even minor inconsistencies in the derivative can highlight numerical or programming errors that must be rectified.It also permits the automatic creation of the adjoint program, stemming from the property that a transposed (adjoint) matrix shares the same eigenvalues with the original linear matrix, ensuring convergence for the adjoint problem's iterative solution methods (Giles and Pierce, 2000).This advantage has facilitated the development of various automatic differentiation (AD) tools in recent decades, including those used in TensorFlow (Abadi et al., 2015), Py-Torch (Paszke et al., 2019) and Enzyme (Moses et al., 2022).It is essential to note that the continuous and discrete methods are equivalent in the limit of infinite spatial and temporal resolution.However, in practical terms, it is the discrete method that typically provides more accurate gradient information.For more details on both approaches we refer the reader to Giles and Pierce (2000) and Gunzburger (2002).

Dolfin-Adjoint
Robust and efficient derivative calculations for large-scale simulations using automatic differentiation is challenging and often too slow for the purpose of large-scale optimisation problems (Naumann, 2011).This inefficiency is often attributed to the usually employed approach of treating a numerical model as a sequence of elementary instructions such as addition, multiplication or exponentiation, known as blocks.Once the AD tool establishes a sequence of blocks with their dependencies (a process often called taping), each block is individually differentiated, and one arrives at the derivative of the entire model using the chain rule.
Dolfin-Adjoint uses an innovative approach to achieve theoretical efficiency by using so-called operator overloading differentiation (Tijskens et al., 2002).By leveraging the highlevel mathematical language used by Firedrake and FEn-iCS (UFL), Dolfin-Adjoint performs the taping process at the highest abstraction level.This can result in blocks that symbolise whole PDE system solves, for which the adjoint derivation is performed at the same level of abstraction.The derived adjoint operation to a block is itself a Firedrake operation.This facilitates the generation of the low-level adjoint code using the same finite-element form compiler as the for-ward model; i.e. while the taping operation is, in essence, similar to the fundamental abstraction in automatic differentiation techniques, Dolfin-Adjoint operates at a higher level of abstraction and, accordingly, can achieve maximum efficiency and robustness.Consequently, platforms such as Firedrake and FEniCS have incorporated Dolfin-Adjoint for precise derivative computations, which are essential for solving various optimisation and inverse problems.We note the pursuit of similar automatic differentiation techniques, which have been central to the development of numerical tools like PETSc's TSAdjoint (Zhang et al., 2022) and JuliaAd-FEM (Xu and Darve, 2022).
To efficiently compute derivatives, Dolfin-Adjoint tracks and manipulates the computational operations involved in solving the forward and adjoint problem.Central to Dolfin-Adjoint's design is the Tape class, responsible for recording a series of high-level finite-element operations -such as (non-)linear solutions, functions and interpolations -that map the initial temperature field T IC into an objective functional J through a series of operations.Operations executed during the forward problem are encapsulated within instances of Block subclasses, each representing a distinct computational step.These Block instances, by keeping track of their inputs and outputs through BlockVariable instances, establish a computational graph that is used by Dolfin-Adjoint to navigate to compute adjoints.To seamlessly integrate this functionality, Dolfin-Adjoint employs operator overloading, allowing the creation of overloaded functions and data types that, to the user, behave identically to their original counterparts but are augmented to support automatic differentiation.This approach enables the overloading of data types through inheritance from both the original data type and the OverloadedType class in Dolfin-Adjoint, ensuring that each data object carries a BlockVariable for tracking its computational lineage.The elegance of Dolfin-Adjoint's design lies in its ability to abstract the computational details, allowing the scientific end user to focus on designing the forward problem and simultaneously leverage the computational efficiency of automatic differentiation.Mimicking Firedrake's strategy in utilising Dolfin-Adjoint, the forward operations in G-ADOPT are overloaded by importing the inverse module of G-ADOPT (gadopt.inverse),which will be used to populate a tape and establish a ReducedFunctional, providing the necessary functionalities for computing a functional and its derivative with respect to controls.

Discrete forward model
As explained in more detail in Davies et al. (2022), the forward geodynamical model can be described as a series of linear and non-linear solutions.Although we solve for temperature using Q 2 elements, we choose the control T IC to be in the Q 1 function space as a means to regularise the inversion problem.This means we need to project T IC to the https://doi.org/10.5194/gmd-17-5057-2024 Geosci.Model Dev., 17, 5057-5086, 2024 S. Ghelichkhan et al.: Automatic adjoint optimisation schemes for geodynamics discrete function space Q = Q 2 to obtain a temperature T 0 used in the first time step.This can be formulated as solving the following system for T 0 : where q are test functions in Q. Subsequently we solve in each time step n = 0, . ..N − 1 the following two systems for u n , p n and T n+1 : where V and W are the discrete function spaces for velocity and pressure (here with test functions v and w.T n+θ is the weighted average θ T n+1 + (1 − θ)T n .Note that we assume a strain rate and temperaturedependent rheology and thus write η = η(u, T ).This makes Eq. ( 9) a non-linear system, which we solve through Newton's method.The discrete functional is calculated as

Calculating gradients using the adjoint method
We denote the entire forward solution as z = (T 0 , . ..T N , u 0 , . ..u N − , p 0 , . . .N −1 ) so that the func-tional can be written as a function J (z, T IC ) of z and the control T IC .The forward solution itself is also dependent on T IC , as for each choice of T IC we can solve the forward model to obtain z(T IC ).We define the reduced functional Ĵ as Thus we can reformulate the PDE-constrained minimisation problem, minimise J (z, T IC ) under the constraints (8-10), as an unconstrained minimisation problem for Ĵ (T IC ).
To use efficient gradient-based optimisation algorithms we do, however, need a means of computing its gradient, for which we will employ the adjoint method.In addition to the forward solution z, we also define an adjoint solution λ = ( 0 , . . .N , φ 0 , . ..φ N − , ξ 0 . . .N−1 ), where each component is associated with one of the constraints: 0 ∈ Q with Eq. ( 8), φ n ∈ V , ξ n ∈ W with Eq. ( 9) and n+1 ∈ Q with Eq. ( 10) for n = 0, . ..N − 1.Using these we define the following sum of the constraints, with each constraint weighted by the corresponding adjoint solution λ: Since, by definition, for any choice of T IC the associated forward solution z(T IC ) satisfies all constraints, we have and thus, for any choice of λ we have If we choose λ to be the solution to the following, so-called adjoint equation, we can work out the gradient of the reduced functional: Although the adjoint Eq. ( 15) is derived symbolically from the forward model (Eqs.8-10) and solved for fully automatically by Dolfin-Adjoint, we here briefly work out the discrete adjoint equations to show that these equations can still be interpreted as the solution process of a backwards-in-time PDE, similar to the continuous adjoint approach, but that a specific time discretisation is derived, which is necessary to obtain a gradient that is consistent with the discrete forward model.Split out by component, the adjoint equations read which can be solved for n , φ n and ξ n .These equations can be solved by going backwards through the time steps n = N − 1. ..0.In each, we first solve Eq. 22 for n+1 .Starting at the last time step n = N − 1, we get where we have applied the gradient of F with respect to T N to an arbitrary perturbation δT .This allows us to interpret this equation as a weak form, tested with δT , that we can solve for N : Defining an auxiliary N +1 = t (T N − T obs ) and integrating the advection term by parts, we get which for θ = 1 we may recognise an advection-diffusion equation run backwards in time.
For n < N − 1, Eq. ( 22) contains more terms: as the energy equation in both forward time steps n and n+1 depends on T n+1 , as does the Stokes system in time step n + 1. Going backwards through the equations, however, we can still solve for n+1 , associated with time step n, as we have already solved for φ n+1 , ξ n+1 and n+2 associated with time step n + 1.Note that the ∂J /∂T n+1 term vanishes in this case, as J does not explicitly depend on intermediate temperature solutions.Similar to Eq. ( 25), we may rewrite it to • ∇u n+1 + ∇u n+1 T δT dx which we can interpret as a backward-in-time θ -weighted advection diffusion step for , with source terms associated with sensitivity of the rheology and buoyancy to temperature.Note, however, that here in the advection term we weight both and u with θ in contrast to the forward time step Eq. ( 10).
After solving for n+1 , we can solve Eq. ( 20) together with Eq. ( 21) for φ n and ξ n : for all velocity and pressure perturbations δu ∈ V and δp ∈ W .This leads to the following weak-form equation for The left-hand side is similar to the Stokes system in Eq. ( 9), except for an additional ∂η/∂u term, alongside the fact that this is now just a linear system as the rheology only depends on the forward variables u n and T n .Instead of a buoyancy term, we now have forcing terms associated with sensitivity of temperature advection and the mismatch with observed velocities at the surface.Finally, after having solved either Eq. ( 25) (n = N − 1) or Eq. ( 27) (n < N − 1) for n+1 and Eq. ( 29) for φ n and ξ n going backward through the time steps n = N − 1 → n = 0, we can solve Eq. ( 19) for 0 : which can be worked out as a projection: Finally, the gradient of the reduced functional with respect to the control is obtained by

Gradient-based non-linear optimisation
To find the solution to the inverse problem, the gradient fields from Firedrake and Dolfin-Adjoint can be redirected to an optimisation package with a Python interface (e.g.scipy.optimize;Virtanen et al., 2020).However, the majority of well-established optimisation packages are hard-coded to apply the Euclidean (l 2 ) inner product for optimisation-specific operations.l 2 , typically referred to as a sequence space, is often used in signal processing or discrete mathematics where functions are treated as sequences, i.e. discrete data, and do not represent continuous functions.The Euclidean inner product is therefore not suitable for finite-element function-based optimisation, and, unlike L 2 or Sobolev spaces, it cannot produce mesh-and/or basis-function-independent convergence (Schwedes et al., 2017).The Rapid Optimisation Library (ROL; The ROL Project Team, 2022), a Trilinos package for large-scale optimisation, resolves this issue by introducing a generic interface for data structures that can be overloaded to perform inner-product aware operations (e.g. in L 2 space) and achieve mesh-independent convergence results (Schwedes et al., 2016).ROL has been primarily used for the solution of optimal design, optimal control and inverse problems in large-scale engineering applications (Iglesias et al., 2018;Kouri et al., 2021aKouri et al., , b, 2023) ) and has a comprehensive catalogue of gradient-based optimisation algorithms.For the results presented herein, we use the Python interface for ROL in Firedrake, which we have supplemented it with additional checkpointing functionality.This allows us to checkpoint intermediate optimisation states and variables, including those related to step lengths or Hessian estimates, and subsequently restart the optimisation procedure without loss of performance.This, in particular, is relevant on modern high-performance computing facilities with strict wall time limits.
The distinguishing factor between different optimisation algorithms (in ROL or elsewhere) is the strategy used to move from one iterate to the next.Broadly speaking, there are two strategies for moving from the current iterate, x k , to a new one, x k+1 : line-search and trust-region strategies (Nocedal and Wright, 1999).A fundamental distinction between line-search and trust-region methods lies in the sequence of selecting the direction and distance (Nocedal and Wright, 2006).In line-search methods, the direction is initially fixed, and an appropriate step length is subsequently determined.
Conversely, trust-region methods commence by establishing an initial size for a trusted region (hence the name) and then simultaneously constraining the direction and step to achieve sufficient amount of improvement within this trusted region.The size of this trusted region around the current iterate is determined according to a model, m, that approximates the region around the current iterate, x k , according to a quadratic approximation.At each iteration, the accuracy of this model then is assessed based on its agreement with the actual changes in the function, f .If the new value, f (x k + p k ), is greater than the current value of f (x k ), m is not a good approximation of the objective functional around x k and the size of the trusted-region measured by the trustregion radius is therefore reduced to improve the applicability of the model.By bounding the calculations to a trusted region where model m is applicable, trust-region methods prohibit overly aggressive steps, which make them suitable for handling negative curvature situations (non-convexity optimisation problems) more gracefully compared to line-search methods.Nonetheless, faster convergence can be achieved by expanding the trust region in the case of a predictive model, ensuring robust minimisation of the objective functional.
In this study, we employ the trust-region method of Lin and Moré (1999) implemented in ROL.Lin-More is a truncated Newton method, consisting of repeated application of an iterative algorithm to approximately solve Newton's equations (Dembo and Steihaug, 1983).Lin-More can effectively handle provided bound constraints by ensuring that variables remain within their specified bounds: at each iteration, variables are classified into "active" and "inactive" sets.Variables at their bounds and not allowing descent are considered active and are fixed during the iteration.The remaining variables, which can change without violating the bounds, are inactive.The described properties render the algorithm as a robust and efficient method for solving bound-constrained optimisation problems.

Numerical experiments and results
Twin experiments serve as a means to illustrate the feasibility of geophysical inverse methods.In our experimental setup, we generate a synthetic reference simulation that advances forward in time, starting from a user-defined initial condition.We use this reference simulation to emulate a real-Earth reconstruction scenario, where the resulting temperature field at the final time, T (x, t = t F ), and the corresponding surface velocities at all times, u(x = x S , t), are stored for subsequent use as "observations" in reconstructing the initial state of the mantle and its evolution through space and time.These fields are used to mimic fundamental datasets in mantle reconstruction models, drawing parallels to 3-D models of mantle temperature inferred from seismic tomography images, as well as surface velocities derived from plate tectonic reconstructions.We examine two sets of 2-D twin experiments: (i) sim-ulations of a single upwelling in an enclosed square domain with an isoviscous rheology and (ii) convection within an annular domain, incorporating a non-linear visco-plastic rheology, as has previously been employed to generate plate-like behaviour in mantle convection models (see Coltice et al., 2017, for a review).
3.1 A single upwelling in an enclosed square domain

Forward problem
We start our experiments by reconstructing the evolution of a single upwelling plume within an enclosed square computational domain (free-slip boundary conditions on all boundaries).We assume an isoviscous rheology, incompressible flow under the Boussinesq approximation assumption and a Rayleigh number of Ra = 10 6 .The model is heated from below (T C = 1) and cooled from above (T S = 0).The initial condition is generated by a Gaussian anomaly of amplitude 0.1, centred at x 0 = (0.5, 0.2), superimposed on top of an average temperature profile generated by two error functions representing the top and bottom thermal boundary layers.Starting from this initial condition, the model is run for 80 time steps of δt = 4×10 −6 -the time required for the temperature anomaly to form a plume that reaches the domain's top boundary.
Listing 1 shows selected lines of a G-ADOPT script used to generate this reference synthetic experiment.The first step, illustrated on line 1, is to import the G-ADOPT module (which provides access to Firedrake and associated functionality).We next need a mesh, for which we use a built-in Firedrake meshing function.The computational domain is a unit square with 150 × 150 elements, loaded on lines 4-6.The function spaces, within which our solutions are defined, are specified as follows: 1.A vector function space, V, is specified for the velocity field (line 9), employing a Q 2 discretisation.
2. A scalar function space, W, is specified for pressure (line 10), utilising a Q 1 discretisation.
3. These are combined on line 11 to create the mixed function space, Z, for the Stokes (velocity and pressure) system.
4. A function space, Q, is specified for the temperature field (line 12), using a Q 2 discretisation.
We specify functions to hold our solutions on lines 14-18.
The temperature field, T, is initialised on line 20 using a symbolic expression for the coordinates from line 18.The initialisation includes a 1-D profile along the y axis (line 21) and a Gaussian anomaly, as specified above (line 22).This problem has a constant pressure nullspace, defined as Z_nullspace on line 24, which will subsequently be passed to the solver, and PETSc will seek a solution in the space orthogonal https://doi.org/10.5194/gmd-17-5057-2024Geosci.Model Dev., 17, 5057-5086, 2024 Listing 1. Selected lines from G-ADOPT code, demonstrating generation of our reference isoviscous simulation in an enclosed square domain.
to the provided nullspace.A checkpoint file is initiated on lines 26-27 to retain the necessary fields for the subsequent adjoint inversion.Important constants in this problem (Rayleigh Number, Ra, time-stepping parameters, delta_t and max_timesteps) are defined on lines 28-31, with the Boussinesq approximation specified on line 29 (later passed on to the Stokes and energy systems to determine which terms are to be assembled).Boundary conditions for velocity and temperature are specified on lines 34 and 35, respectively.The latter uses integer mesh markers to tag entities of meshes, with boundaries tagged as follows: tag 1 corresponds to the plane x = 0 (left), 2 to x = 1 (right), "bottom" to y = 0 and "top" to y = 1.We now solve the variational problem, with solver objects for the energy, energy_solver, and Stokes, stokes_solver, systems created on lines 37 and 38.For the energy system we pass in the solution field T , velocity u, the physical approximation, time step, temporal discretisation approach (i.e.implicit middle point) and boundary conditions.For the Stokes system, we pass in the solution fields z, temperature, the physical approximation, boundary conditions and the nullspace object.The solution of the two variational problems is undertaken by PETSc.The time loop is defined on lines 42-45, with the Stokes system solved on line 43, the energy equation on line 44, and velocities (for later use as time-dependent surface constraints in our adjoint inversions) checkpointed on line 45. Figure 2a to e show temporal snapshots of the reference forward simulation.We note that the final temperature field (Fig. 2e), subsequently utilised in the adjoint inversion, is also checkpointed on line 47.

Inverse problem
Only trivial changes are required to convert the forward problem outlined in Listing 1 into its corresponding adjoint, which are outlined in Listing 2. We first augment our imports with gadopt.inverse(line 2): this provides access to crucial Dolfin-Adjoint functionalities harnessed by Firedrake, enabling overloading differentiation and taping of finite-element operations.On line 4, we activate disc checkpointing of intermediate forward solutions, ensuring that these fields -otherwise retained in memory -are available as inputs for solving the adjoint equations.For this problem, we select the initial temperature as the control, symbolised as Tic on line 7, using a Q 1 function space defined on line 6.On line 8, we define the average temperature field for regularisation terms using the same Q 1 function space.Despite utilising Q 2 elements for the forward temperature computation, opting for a basis function with a lower polynomial degree for the control is advantageous, as it curtails computational expense during the internal optimisation algorithm's operations by reducing the number of degrees of freedom in the solution.Furthermore, using a lower polynomial degree reduces the complexity and regularisation requirements of the optimisation problem, helping to avoid over-fitting of the solution (e.g.Hastie et al., 2009).
On line 10, a checkpoint file is opened.The reference final temperature field is subsequently loaded on line 11, with our guess at the initial temperature condition specified on line 12 (noting that it corresponds to the terminal temperature field of the reference simulation).We specify the control for Dolfin-Adjoint on line 15, followed by projection of the initial temperature condition onto T on line 18.During execution of the time loop (lines 20-24), we solve Stokes and energy systems, after which we load velocities from the reference simulation on line 24, used when accumulating contributions to the surface velocity misfit, u_misfit, on line 24.After the time loop, we define several components of the objective functional.Specifically, we establish the damping term and its associated normalisation factor (lines 29-31), the smoothing term and associated normalisation (lines 32-33), two normalisation terms associated with final state temperature and surface velocity misfits (lines 34-35), and the misfit associated with the final temperature field (line 36).These terms are combined on lines 38-42 as our objective, using weights (α u , α d and α s ) specified on line 37, and are later utilised on line 44 to define the reduced functional.We note that values for α u , α d and α s are systematically tested herein.

Investigating the derivative
Our initial guess for T IC is set to the final "observed" temperature field T obs (i.e. the terminal temperature field of the forward model).This choice is grounded in the findings of Horbach et al. (2014) and our own tests, which demonstrate that the minimisation problem possesses a strong minimum, rendering it insensitive to the initial guess.The first optimisation iteration starts from this initial guess (Fig. 3a) and runs forward in time to arrive at the first modelled terminal temperature field T t=t F (Fig. 3b).Compared to the reference T obs (Fig. 3c), the model is further advanced in time, with the plume tail in the lower mantle narrower and hot buoyant anomalies spread throughout the upper part of the domain, generating a cold return flow and resulting in a thinning of the thermal boundary layers.
Previously, we noted that for our reconstruction simulations, the objective functional, denoted as J , encompasses four distinct terms: (i) final temperature field, (ii) surface velocity misfit, (iii) smoothing terms and (iv) damping terms.The relevance of the three latter terms is gauged by their respective weighting factors, namely, α u , α s and α d .When combining terms linearly in the objective functional, the superposition principle dictates that the gradient of the entire objective functional is essentially the cumulative sum of the gradients of its constituent terms in J .Consequently, it becomes instructive to visualise and validate the gradient of each individual term within the objective functional.
Figure 3d to g display the gradient fields corresponding to the four terms in the objective functional.The gradient for the final temperature misfit is presented in Fig. 3d: it implies that a better match to the final state temperature field is achievable through changes to the initial condition that include a major reduction in temperature in the domain's centre, complemented by an increase in temperature moving towards the domain's edges, particularly towards the domain's upper regions.In Fig. 3e, we illustrate the gradient of the cumulative surface velocity misfit with respect to T IC : it reveals sensitivities extending to the domain's base, indicating that to better align with the "observed" surface velocities, an optimal T IC https://doi.org/10.5194/gmd-17-5057-2024 Geosci.Model Dev., 17, 5057-5086, 2024 Listing 2. Changes compared to the forward script in Listing 1 required to define the reduced functional used in the adjoint problem.The three alpha parameters on line 36 denote the three weighting parameters for surface, damping and smoothing terms, respectively.
should be decreased in the middle but increased towards the domain's boundaries.For the smoothing term (Fig. 3f), the highest values emerge in areas with the most abrupt changes in T IC .Given our optimisation strategy works by moving towards corrections opposite to the gradient's direction, this subsequently results in iterative refinement and smoothing of the solution.For the damping term (Fig. 3g), the gradient aims to minimise fluctuations in the field in relation to the ambient temperature profile ( T ), adjacent to the top and bottom boundaries, implying that this can be achieved by reducing boundary layer temperatures in the centre but increasing boundary layer temperatures towards the domain's sides.

Verification of gradients: Taylor remainder convergence test
A fundamental tool used in verification of gradients is the Taylor remainder convergence test (Farrell et al., 2013b).For the reduced functional, J (T IC ), defined in Eq. ( 6) and its derivative, dJ dT IC , it can be proven that ( The expression on the left-hand side of Eq. ( 33) is termed the second-order Taylor remainder.This term's convergence rate of O(h 2 ) serves as a strong foundation for verifying any computational implementation meant for determining dJ dT IC (the adjoint code) with respect to a specific functional that computes J (T IC ) (the forward code).Given any arbitrary selection of h and δT IC , halving the value of h should decrease the magnitude of the second-order Taylor remainder by a factor of 4. Grounded in this theoretical prediction, we employ these so-called Taylor tests to confirm the accuracy of the determined gradients.
We conduct a second-order Taylor remainder test for each term of the objective functional.The results are displayed in Fig. 4, where the gradient fields are calculated for random perturbations of the initial temperature field T IC , with the amplitude of these perturbations successively halved (h/2, h/4, h/8, ...).Notably, all four Taylor remainder tests demonstrate a convergence rate of O(2.0), extending down to floatingpoint precision, defined as the smallest positive such that 1.0 + is distinguishable from 1.0.Our results demonstrate consistency with theoretical expectations, highlighting the robustness of our approach.

Efficiency
Another metric that can be used to assess the suitability of our framework for large-scale mantle convection optimisation problems is the efficiency of derivative calculations.Given the iterative nature of our inverse problem, where derivatives are computed frequently, any efficiency gain, or the lack thereof, can have profound implications for the overall computational cost and feasibility of our automated approach.The computational efficiency can be measured by  comparing the computational time of a derivative calculation to that of a forward calculation.Using this reference, a theoretical optimum is defined which measures the ratio of the time that is required to calculate one set of forward and adjoint calculations to one forward calculation.For the Stokes problem we have detailed, in which a linear rheology has been employed, this ratio is considered to be 2.0 (Naumann, 2011;Funke and Farrell, 2013).This is primarily due to the similarity of the forward and adjoint systems.For the simulations presented in this section, we achieve a ratio of 2.01, consistent with theoretical expectations, thus demonstrating the efficiency of our approach.

Optimisation
Executing an optimisation task with G-ADOPT is straightforward.Once the reduced functional is set up (see Listing 2), only a few additional lines of Python are required (see Listing 3).As we use a bounded method for our optimisation problem, we specify a set of upper and lower bounds for the algorithm on lines 3-4.Subsequently, a minimisation problem is outlined (line 6) using both the reduced functional and the designated bounds.This minimisation problem, together with the associated parameters for optimisation, are passed to the Lin-More optimisation algorithm in ROL (line 8), which is executed on line 9.Using this framework, we perform a suite of 81 different inverse simulations that aim to find the most optimal combination of the three weightings (α u , α d and α s ) that results in the best solution for T IC when compared to the reference initial temperature field.The simulations are obtained by sweeping through values in ranges of [10 −1 , 10 −3 ], [10 −2 , 10 −4 ] and [10 1 , 10 −7 ] for α d , α s and α u , respectively.Figure 5 provides an overview of the outcomes from a subset (16 out of 81) of these optimisation exercises.Minimisation of the objective functional is shown in Fig. 5a, alongside two additional metrics: (i) the misfit between the reconstructed final temperature field and T obs , termed the final misfit (Fig. 5b), and (ii) the misfit between the reconstructed initial condition, T IC , and the reference initial condition, highlighting the quality of the reconstructed initial condition (Fig. 5c).The reduction in the metrics in all cases is reported versus the cost, which is the sum of the number of forward and adjoint calculations.A consistent pattern is observed across all reconstruction simulations for these three metrics.Firstly, the solutions are unique, as all converge to a consistent initial condition following roughly 100 iterations (a cost of 200 forward and adjoint calculations).However, the trajectory to this solution varies based on the smoothing weight, denoted by α s .A significant portion of the simulations with α s = 10 −1 exhibit sub-par performance in the initial stages due to over-smoothing, with most of the best-performing simulations utilising α s = 10 −2 .Nevertheless, despite differences in convergence rates, all simulations eventually converge to similar misfits in all three metrics.Figure 6 showcases multiple iterations from the bestperforming simulation using α u = 10 −2 , α s = 10 −3 and α d = 10 −2 .As mentioned previously, the reference final condition (Fig. 6b) is used as our starting guess for the inverse simulation (Fig. 6c), which yields substantial differences in the modelled final state (Fig. 6k), as illustrated through the squared difference between reconstructed and reference temperature (Fig. 6o).Leveraging derivative information, the inverse simulation corrects the initial condition through an iterative approach.The initial conditions obtained after 20, 50 and 100 iterations are shown in Fig. 6d, e and f, respectively.These solutions reveal that most corrections occur during the initial iterations, with significant improvement in the domain's upper part by iteration 20, albeit with some remaining noise in the solution (see misfit in Fig. 6h).By iteration 100, the solution (visible in Fig. 6f and n) closely mirrors the reference initial condition, represented by misfit values that have diminished by 3 orders of magnitude (Fig. 5b and c).
Figure 7 compares the evolution of reference forward (ae) and reconstructed simulations (f-j), with differences highlighted (k-o).Furthermore, surface normal stresses are displayed as indicators of surface dynamic topography.For the reconstructed simulation (Fig. 7f), the initial condition is well captured as a Gaussian anomaly in the domain's lower section, superimposed on a depth-dependent field, similar to the reference initial condition.This initial temperature anomaly ascends, culminating at the surface after 80 time steps.The precision of the reconstruction is evident from the misfit panels, with negligible differences between reference and reconstructed simulations throughout the model's evolution (less than 0.01).This is further substantiated by a reduction, by over 4 orders of magnitude, in the objective functional for both initial and end states (Fig. 5b and c).The accuracy of reconstructed surface normal stresses is evidenced by negligible discrepancies in the associated misfit visualisation.

Convection in an annulus, incorporating a visco-plastic rheology
We next analyse a set of reconstructions which utilise a reference twin that more closely mimics Earth's geometrical and rheological characteristics.We use a 2-D annular domain, generated by extruding 128 times radially from a circular manifold consisting of 512 cells.Inner and outer radii are set to 1.22 and 2.22, respectively, ensuring unit depth and maintaining a comparable ratio between surface and cosmic microwave background (CMB) radii as Earth's mantle.We set Ra = 10 7 and adopt a composite visco-plastic rheology, with effective viscosity determined via a harmonic mean, represented as Parameters specified in Eq. ( 34) are listed in Table 2.As demonstrated by Davies et al. (2022), the changes necessary to transform our forward model from a square to an annular domain and from an isoviscous to a visco-plastic rheology are only minor, noting that Firedrake has already been validated for simulations of this nature (see Davies et al., 2022, and repository accompanying this paper for a comprehensive script).
To determine the reference simulation's initial condition, we commence from a starting state with small-scale spherical harmonic perturbations and advance forward in time until a quasi-steady state is achieved (i.e. when basal and sur-face heat fluxes are roughly in balance).The resulting state (Fig. 8.1.A and 2.A), which contains downwellings that descend from the upper thermal boundary layer and upwelling plumes that rise from the lower thermal boundary layer, forms our reference initial condition.We subsequently advance forward for a duration of t = 200×(5×10 −6 ) to reach the reference final state shown in Fig. 8.1.G.While the nondimensional time of 10 −3 translates into 256 Myr of convection, the domain velocities suggest an equivalent Earth-like time of ≈ 100 Myr (due to a marginally reduced Ra relative to estimates for Earth's mantle).The sequence presented in Fig. 8.1.A-G and 2.A-G trace the temporal development of reference temperature and viscosity fields, respectively.The viscosity field spans approximately 3 orders of magnitude, extending from low asthenospheric values of ≈ 1.0, rising to 140 within the cooler segments of the lower mantle, and decreasing to 0.4 in locations of high strain rates at surface convergent boundaries.https://doi.org/10.5194/gmd-17-5057-2024 Geosci.Model Dev., 17, 5057-5086, 2024

Verification of gradients
As with the previous Cartesian case examined, reconstruction simulations are conducted with an objective functional encompassing the misfit component related to the terminal temperature field, accumulative surface velocity misfits and regularisation terms.For consistency with the previous case, we first confirm the accuracy of the calculated gradient fields corresponding to each component in the objective functional by performing second-order Taylor remainder convergence tests.To further analyse the robustness of our results against solver tolerances, specifically concerning the final temper-   ature field and surface velocities, we conduct a set of two additional convergence tests.The first set utilises a Newton (SNES) relative solver tolerance of 10 −10 and is depicted using solid colours in Fig. 9, while the second set, using a relative tolerance of 10 −5 , is represented by semi-transparent colours.Our findings indicate that with tighter tolerances, results adhere to an O −2 trend consistent with theoretical predictions, confirming accuracy down to the smallest floatingpoint representation.Conversely, when the tolerance is relaxed to 10 −5 , this behaviour holds only up to perturbations of similar magnitude (h ≈ 10 −5 ), demonstrating a divergence from the expected O(h 2 ) trend for smaller perturbations.

Efficiency
In the experiments detailed in this section, an efficiency of 1.45 is achieved.This exceeds the previously outlined theoretical efficiency of 2.0 for the isoviscous Stokes model in Sect.3.1.5due to a major difference in the forward and adjoint momentum equations: while the forward momentum equation employs a non-linear visco-plastic rheology and requires multiple linear Newton solves per time step, the adjoint momentum remains linear.The forward model is therefore more computationally demanding, explaining the improved efficiency ratio.

Optimisation
As highlighted with our previous example, absolute and relative variations in weighting of different objective functional components can generate solutions with distinct properties, some of which provide an improved match to the reference simulation.Accordingly, it is vital to assess the consequences of these distinct weight combinations for the case considered here.To address this, we have undertaken 21 simulations, adjusting the parameters α u , α s and α d within the intervals [0.05, 0.1], [0.01, 0.1] and [0.01, 0.1], respectively, with values motivated by the results of our previous set of simulations.The collective convergence of all 21 simulations is illustrated in Fig. 10.
Our objective functional (Fig. 10a) has initial values that range between ∼ 1×10 −1 and 2×10 −1 .We consistently ob-serve a reduction of an order of magnitude or more in this measure.Notably, the steepest decline is seen within the initial 50 iterations (cost ∼ 100).The simulations denoted as X and XI exhibit the largest reduction, with a consistent reduction trajectory even when approaching iteration 200 (cost ∼ 400).
The final temperature misfit (Fig. 10b) exhibits different trends to the objective functional.In the initial iterations, the largest reductions in final misfit are observed for simulations I and II, with simulation XII trailing behind.Despite displaying a lower misfit reduction up until iterations 140 and 180 (cost ∼ 280-360), simulations X and XI eventually display a similar misfit by iteration 200 (cost ∼ 400).When we turn to reduction in the initial misfit (Fig. 10c), an entirely different trend comes to light: simulations I and II sustain their reduction until around iteration 150 (cost ∼ 300), after which they plateau.Conversely, many other simulations plateau at misfit values that are, on average, twice as high.
In our analysis, the reconstruction quality is predominantly governed by three key weighting parameters: surface velocity misfit (α u ), smoothing (α s ) and damping (α d ).These parameters calibrate the significance of different objective terms in relation to the final temperature misfit term.Thus, the primary metric to assess a reconstruction is the simultaneous reduction in misfits for both initial and final states.Incorporating the misfit associated with surface velocities enhances the quality of the reconstructions, noting that higher weightings of the surface velocities require higher values of smoothing.Among the weighting parameters, α s is shown to have the highest effect in convergence outcomes: higher values for α s lead to over-regularisation, thereby limiting the role of sensitivity information tied to misfit terms in the solution.α d offers a more varied range of effective values, which aligns with its role confined to thermal boundary layers and, consequently, its lesser impact on the overall numerical domain.Therefore, Case I, characterised by (α u , α s , α d ) = (1×10 −1 , 1×10 −1 , 1×10 −2 ), emerges as the optimal set of weighting parameters for our reconstructions.
In contrast to the reduction of 3 orders of magnitude in the misfit functions from the isoviscous experiment, our nonlinear experiment exhibits a modest reduction of O(1).The key factor contributing to this is the extended total simulation time for the non-linear experiment.Representing a far longer reconstruction period, this implies more information loss during the inversion process.Additionally, while the isoviscous experiment focused on a single temperature anomaly, this simulation tackles whole-mantle convection, with numerous and occasionally complex and highly time-dependent anomalies, reflecting the more intricate visco-plastic rheology.Nonetheless, this reduction of an order of magnitude in misfit translates into a satisfactory reconstruction of the initial condition, demonstrating the efficacy and robustness of the numerical approaches employed.This is confirmed by visual inspection of the best reconstruction model, with temperature, viscosity and surface nor-  mal stresses presented and compared to the reference case in Fig. 11 (marked case I in Fig. 10, using values of α u = 10 −1 , α s = 10 −1 and α d = 10 −2 ).At t = 0, the reconstructed temperature field exhibits upwelling and downwelling features that are reconstructed in the correct locations, although temperature anomalies are generally smoother than those of the reference case (Fig. 11.1.A).The corresponding viscosity field mirrors this smoothness despite capturing weaker convergence zones at the surface.Despite these variations, the spatial misfit is generally below 10 −2 (Fig. 11.5.A) with errors over 0.05 restricted to sharper features that are inevitably smoothed in the reconstruction process.This smoothness is also reflected in recovered surface normal stresses, with highs and lows correctly positioned, albeit at longer wavelengths than the reference case.
Given the application of a free-slip boundary condition at the surface of this simulation, a prominent outcome of this set of experiments is the emergence of sharp subducting slabs and weak zones at the top boundary as the simulation evolves (Fig. reference case.This enhancement progresses up to the final time step, where the reconstructed thermal field and surface normal stresses are indistinguishable from those in the reference simulation, reflected via an order of magnitude reduction in the spatial misfit field.

Discussion
Robust reconstructions of the spatial and temporal evolution of Earth's mantle and its diverse surface expressions are critical to scientific progress across the geosciences.It requires the construction of a digital twin: a vital instrument for analysing and revealing the complex interplay between the mantle and Earth's other systems.To this end, the adjoint method provides the necessary means for obtaining and analysing model sensitivities with respect to earlier mantle states.A burgeoning number of studies exploiting this methodology for reconstructions of mantle convec-tion have emerged in recent years (e.g.Bunge et al., 2003;Ismail-Zadeh et al., 2004;Liu et al., 2010;Spasojevic et al., 2009;Li et al., 2017;Price and Davies, 2018;Ghelichkhan et al., 2021).Nevertheless, the derivation, implementation and validation of adjoint systems for coupled, non-linear, time-dependent systems remain notoriously difficult.It is due to these difficulties that existing applications of the geodynamic adjoint method often include major simplifications, either incorporating an oversimplified treatment of mantle rheology (e.g.Colli et al., 2018;Ghelichkhan et al., 2021), neglecting certain (coupling) terms in the adjoint equations (e.g.Ismail-Zadeh et al., 2004) or both (e.g.Liu and Gurnis, 2010): they are therefore likely limited in their applicability to realistic Earth scenarios.In this study, we leverage the latest advances in scientific computing to overcome these limitations and develop G-ADOPT, an open-source numerical framework for geoscientific adjoint reconstructions, developed in full compliance with FAIR (Findable, Accessible, Interoperable, Reusable) principles (Wilkinson et al., 2016).
G-ADOPT is underpinned by three primary software elements.The first is Firedrake (Ham et al., 2023), an automated system for solving partial differential equations using the finite-element method.In our previous work (Davies et al., 2022) we examined the applicability of Firedrake to geodynamical simulations, confirming its accuracy, efficiency, extensibility and parallel scalability through comprehensive benchmarks and state-of-the-art mantle convection simulations.The second element is Dolfin-Adjoint (Farrell et al., 2013a;Mitusch et al., 2019), a system that automatically generates the discrete adjoint from forward models designed in Firedrake.Dolfin-Adjoint elevates the conventional abstraction of automatic differentiation from individual floatingpoint operations to complete systems of differential equations, leveraging the high-level mathematical abstraction of finite-element problems and their symbolic representation in UFL (Alnaes et al., 2014).The adjoint systems derived by Dolfin-Adjoint are UFL expressions and valid Firedrake inputs.Therefore, they inherit the parallel support native to the forward model, which results in optimal computational efficiency.The third element is the Rapid Optimisation Library, ROL, a Trilinos package for large-scale optimisation problems (The ROL Project Team, 2022), enhanced herein with intra-optimisation checkpointing functionality.
We have demonstrated the applicability of G-ADOPT for time-dependent geodynamic reconstructions herein.The objective functional utilised in our reconstructions is composed of two distinct misfit components.The first is a term that quantifies the misfit corresponding to the observed final state temperature field, analogous to the present-day temperature field within Earth's mantle as obtained through a combination of mantle mineralogical models (e.g.Stixrude and Lithgow-Bertelloni, 2011;Chust et al., 2017) and seismic imaging (e.g.Rawlinson et al., 2010;French and Romanowicz, 2014;Simmons et al., 2015;Bozdag et al., 2016;Koelemeijer et al., 2016;Fichtner et al., 2018).The second term corresponds to observed surface velocities, accessible through plate tectonic reconstruction models (e.g.Müller et al., 2019).Additionally, smoothing and damping terms have been incorporated to enforce regularity in our solutions.
Our study analysed two sets of reconstructions of systematically increasing complexity.We first examined the evolution of a single ascending hot anomaly in an enclosed isoviscous square domain.By taking advantage of the simplicity of the geometry and rheological properties, we were able to deliver an in-depth examination of the gradients for each term, including a parameter-space search to ascertain optimal weighting parameters.Our results reveal a general convergence of the solutions, notwithstanding substantial variations in convergence rates subject to the weightings.Additionally, although not detailed in this paper, we have explored a number of different optimisation methods and parameters.Through this comprehensive analysis, we are confident that the problem possesses a stable solution that can be found through an appropriate combination of weighting parame-ters.The second set of reconstruction experiments explored convection with a stress-, depth-and temperature-dependent rheology at the convective vigour of Earth's mantle, demonstrating the feasibility of reconstruction studies for Earth's mantle with a non-linear rheology.The weightings selected for this series of experiments were broadly consistent with the first set.Given the success at reproducing surface velocities and normal stresses, our findings suggest that reconstruction models of Earth's mantle can serve as a powerful means for probing changes in the landscape at Earth's surface induced by mantle dynamics (e.g.Friedrich et al., 2018;Hoggard et al., 2021;Davies et al., 2023).
In both experimental sets, we assessed the numerical efficiency of our framework by evaluating the cost ratio between forward and adjoint calculations.In the first set, our results produced a ratio of 2.01, aligning with the theoretical efficiency of 2.0 (e.g.Naumann, 2011).In the second set, where we solved the non-linear forward equations, we observed a ratio of 1.45.This efficiency is attributed to the linearised nature of the adjoint method: even when applying non-linear rheologies in the forward equations, the adjoint equations remain linear.We also conducted second-order Taylor remainder convergence tests for each of the objective functional terms to validate the adjoint calculations.We note our convergence is accurate down to floating-point precision, consistent with results presented by Coltice et al. (2023).Our assessments demonstrate the accuracy of the derivative calculations (Figs. 4 and 9).These Taylor remainder convergence tests provide a robust basis for future validations of geodynamic adjoint frameworks.
Our experiments incorporate two significant simplifications relative to realistic-Earth scenarios, which were necessary to facilitate the number of reconstruction simulations analysed: (i) the use of a 2-D computational domain and (ii) the application of the Boussinesq approximation instead of more pertinent approximations such as anelastic-liquid approximations (e.g.Jarvis and McKenzie, 1980).Nevertheless, the composable nature of G-ADOPT should alleviate any concerns regarding the extensibility of our framework to these more realistic problem sets.Our prior work in Davies et al. (2022) demonstrates the flexible nature of our approach: for example, transitioning our 2-D annulus simulations to a 3-D spherical shell domain can be achieved via changes to only a few lines of Python.The application of G-ADOPT for reconstructing Earth's mantle evolution using non-linear rheologies and compressibility will be the subject of future investigations, although we note that the forward modelling approach has already been developed (Davies et al., 2022).Moreover, our framework is extendable to various other problems in geodynamics.These include utilising principal stress directions (e.g.Reuber et al., 2020), surface plate velocities (e.g.Ratnaswamy et al., 2015;Bocher et al., 2018) and/or residual depth measurements (e.g.Panasyuk and Hager, 2000;Spasojevic et al., 2009) to explore the mantle's rheological properties and to study the https://doi.org/10.5194/gmd-17-5057-2024 Geosci.Model Dev., 17, 5057-5086, 2024 visco-elastic adjustment of Earth's surface in response to the melting of Earth's polar ice sheets (Al-Attar and Tromp, 2014;Martinec et al., 2015, e.g.) and post-seismic deformation following significant subduction earthquakes (e.g.Sabadini and Vermeersen, 1997).
Reconstructing past mantle states is fraught with substantial theoretical and practical challenges.In this study, we targeted some of these theoretical and practical hurdles by introducing G-ADOPT.Nevertheless, significant obstacles exist that are beyond the scope of this work.We predicated our work on zero uncertainty in our reference fields (i.e. the present-day temperature field and past surface velocities), thereby committing what is known as the "inverse crime" (Colton et al., 1998), a term used to describe the situation when the code employed in the inversions is also utilised to generate reference simulations.Estimation of the presentday mantle state from seismic imaging and the assumptions regarding the thermal and compositional interpretation of seismic heterogeneity are both fraught with considerable uncertainty (e.g.Styles et al., 2011;Mosca et al., 2012;Zaroli et al., 2013;Davies et al., 2015).Furthermore, plate tectonic reconstructions can be uncertain, particularly further back in time and within the Pacific region (e.g. et al., 2012;Williams et al., 2015;Tetley et al., 2019).However, the existence of seafloor spreading isochrons up to approximately 125 Ma for all major plates provides confidence in modelling relative plate movements in more recent geological periods (Seton et al., 2020).An uncertainty impact study carried out by Colli et al. (2020) posits that the presence of uncertainties causes reconstructed and reference flow histories to diverge exponentially back in time, with unrealistic structures materialising within and adjacent to thermal boundary layers.To minimise these impacts, Colli et al. (2020) advocate for terminating the optimisation after a few iterations.Here, however, the inclusion of regularisation terms in the objective functional mitigates these impacts, effectively constraining the reconstruction to a smoother solution.This becomes particularly advantageous in real-Earth applications where observational constraints become sparser and more uncertain back in time.Without a smoothing term, the solution to the initial condition can contain high-frequency noise, which would diffuse over the course of the simulation.Smoothing therefore drives the solution towards a longer-wavelength initial state whilst maintaining sensitivity to shorter wavelength information recorded in seismic tomography images.
Moreover, by formally introducing past surface velocities into the objective functional, we infuse sensitivity information that propagates further back in time, refining the flow trajectory to improve the accuracy of reconstructions in the upper thermal boundary layer region.This sets our approach apart from the method used in previous adjoint reconstruction simulations (e.g.Vynnytska and Bunge, 2015;Zhou and Liu, 2017;Ghelichkhan et al., 2021), where the sequentialin-time nature of plate velocity assimilation can improve flow trajectories only forward in time.Despite this, it is crucial to recognise that observed surface velocities are the result of a complex force balance, including contributions from within Earth's mantle and at Earth's surface, including those at plate boundaries, such as the genesis and destruction of orogenies, which have been associated with rapid changes in plate velocities (Colli et al., 2014;Iaffaldano and Bunge, 2015).These forces are not captured in current state-of-theart spherical mantle convection models.Although our inversions utilise absolute plate velocities -thereby accounting for the majority of the forces driving surface movements -it is essential to emphasise that constraints on Earth's present-day thermal field must be prioritised as the primary data source for such inversions.
Our study lays the foundations for exploring several unresolved geodynamical questions.Previous research has shown that prescribing plate tectonic reconstruction velocities as a top boundary condition improves the precision of mantle reconstruction models and diminishes noise (e.g.Colli et al., 2015;Taiwo et al., 2023).Our framework formally incorporates these constraints through misfit terms, and future studies should compare this with other methods to find the most efficient way to integrate these valuable data.Earlier research advocates solving the reduced adjoint system, effectively considering velocities as insensitive to initial conditions (e.g.Ismail-Zadeh et al., 2004;Liu et al., 2008).The second-order Taylor remainder convergence tests, examined herein, provides a robust foundation for evaluating the accuracy of such simplifications.Furthermore, our framework sets the stage for including hitherto unused observations within our inversions, such as geochemical constraints on mantle temperature and pressure (Ball et al., 2021).This stems from the design principle of composable abstractions in the software packages used in G-ADOPT, ensuring all components' modularity, interoperability, reusability, scalability and maintainability.Specifically, Firedrake emphasises a clear separation between using the finite-element method and implementing it.Dolfin-Adjoint automates the derivation and computation of the adjoint systems using high-level symbolic language, ensuring the same advanced strategies that are applied for the forward calculation are utilised in the adjoints.Finally, ROL offers large-scale optimisation algorithms that seamlessly integrate with Firedrake and Dolfin-Adjoint.Their integration through G-ADOPT is a ground-breaking development that opens up adjoint problems to a new class of user and developer.

Conclusions
Transitioning from idealised forward models to data-driven simulations necessitates an inverse approach that rigorously incorporates observational constraints and their uncertainties.The adjoint method has emerged as a key technique for optimising unknown model parameters against observational data (e.g.Bunge et al., 2003).In this context, we intro-duce the Geoscientific ADjoint Optimisation PlaTform (G-ADOPT), which capitalises on cutting-edge developments in computational sciences, particularly through innovations in Firedrake (e.g.Rathgeber et al., 2016;Davies et al., 2022), Dolfin-Adjoint (Farrell et al., 2013a;Mitusch et al., 2019) and the Rapid Optimisation Library (ROL) (The ROL Project Team, 2022).
Our investigation, validated by two distinct sets of twin experiments, demonstrates G-ADOPT's efficacy in deducing the initial conditions for mantle flow and tracing its progression through space and time.These synthetic experiments leverage misfit terms to integrate both contemporary constraints across the computational domain and temporal constraints at Earth's surface.Additionally, we explored regularisation techniques to modulate the amplitude and complexity within the optimal solution.Notwithstanding the simplifications made (e.g.neglecting compressibility and 3-D geometry), G-ADOPT's architecture allows for straightforward adaptation to more complex scenarios with minimal modifications to the Firedrake forward model (as evidenced by Davies et al., 2022), alongside automated adjoint derivation and computation via Dolfin-Adjoint.The employment of the second-order Taylor remainder convergence test further corroborates our methodology, establishing a precedent for the advancement of geodynamic adjoint frameworks and facilitating prompt validation of adjoint models for more complex analyses.
Historically, geoscientific modelling frameworks were tailored to specific equations, which limited their application across disciplines.The current progress in adjoint-based techniques, crucial for data assimilation, sensitivity analysis and optimisation, has markedly benefited meteorology and oceanography, though similar progress in other fields has been hindered by derivation and implementation challenges.G-ADOPT's modular design seeks to bridge these divides.Its components are designed for ease of assembly and reusability, promoting a culture of modular, interoperable, scalable and maintainable methods and frameworks.This philosophy ensures that G-ADOPT can be readily adapted to a broad spectrum of geoscientific research areas.
Code and data availability.For the specific components of G-ADOPT, including the full scripts of the simulations used in this paper, see https://doi.org/10.5281/zenodo.10050733(Gibson et al., 2023).For the specific components of the Firedrake project used in this paper, see https://doi.org/10.5281/zenodo.10047031(firedrakezenodo, 2023).
Author contributions.SG and DRD conceived this study, with all the authors having significant input on the design, development and validation of the examples and cases presented.All authors contributed towards writing the manuscript.
Competing interests.At least one of the (co-)authors is a member of the editorial board of Geoscientific Model Development.The peer-review process was guided by an independent editor, and the authors also have no other competing interests to declare.
Disclaimer.Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper.While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

Figure 2 .
Figure 2. Reference forward simulation: the initial condition is generated by superimposing a Gaussian anomaly of maximum amplitude 0.1 centred at x 0 = (0.5, 0.2) on top of an adiabatic profile.The simulation runs forward in time from this initial condition, until the plume-like feature approaches the top boundary after 80 time steps.Also visualised at the top of each snapshot are the normal stresses acting on the top boundary, which are proportional to dynamic topography.Boxed in dashed red is the final state of the simulation that is subsequently used as T obs in our synthetic adjoint simulation.

Figure 3 .
Figure 3. First forward run and associated gradients: (a) is the first guess for T IC .For all experiments in this study we choose the "observed" temperature field (i.e. the final state) as the initial guess for our optimisations.Panel (b) is the final temperature field after integrating forward in time from (a) for 80 time steps.Panel (c) is the reference final temperature field, T obs , which is used in the definition of the misfit functional.Panels (d)-(g) illustrate the gradient fields.Panel (e) is gradient of the final temperature misfit.Panel (e) is the gradient of the total surface velocity misfit.Panels (f) and (g) are for the regularising smoothing and damping terms, respectively.

Figure 4 .
Figure4.Second-order Taylor remainder test: for each gradient field, the test is performed by computing the functional and the associated gradient when randomly perturbing the initial temperature field T IC and subsequently dividing the perturbations by a factor of 2 at each level.The dashed line is the theoretical convergence rate of O(2.0).Residuals show accuracy of gradient information down to floating-point precision (defined as the smallest positive , such that 1.0 + is distinguishable from 1.0.)

Listing 3 .
Necessary changes to solve the minimisation problem.T_lb and T_ub are the lower and upper bounds for the minimisation problem, respectively.

Figure 5 .
Figure 5.An overview of optimisation results across all experiments: (a) illustrates the minimisation of the objective functional.Panel (b) depicts the final misfit, representing the difference between the reconstructed final temperature field and T obs .Panel (c) highlights the initial misfit, characterising the discrepancy between the reconstructed initial condition T IC and the reference initial condition.The bottom horizontal axis in all figures reports the cost, here defined as the sum of forward and derivative calculations.The top axis shows the approximate equivalent number of iterations, which simply is the cost divided by 2. Across all metrics, simulations exhibit a consistent pattern: they converge to the same solution approximately after 100 iterations, though the path to this solution diverges depending on the smoothing weight, α s .

Figure 6 .
Figure 6.Iterative optimisation process visualised: (a) and (b) depict the reference initial and final conditions.Panels (c)-(f) present the reconstructed initial conditions at the 0th, 20th, 50th and 100th iterations.Panels (g)-(j) highlight the misfits, representing the squared differences between the reconstructed initial temperature fields and the reference temperature.Similarly, (k)-(n) display the reconstructed final temperature fields after the 0th, 20th, 50th and 100th iterations, with their respective misfits demonstrated in (o)-(r).

Figure 7 .
Figure 7.Comparison of reference forward and reconstructed simulations over time: (a)-(e) present the evolution in the reference forward, while (f)-(j) depict the evolution in the reconstructed simulations.The misfits between the reference and reconstructed scenarios at each time step are illustrated in (k)-(o).Note that surface dynamic topography is represented by visualising the normal stresses at the top boundary.

Figure 8 .
Figure 8. Reference forward simulation spanning a duration of t = 200 × (5 × 10 −6 ), as depicted in (1.A) to (1.G).Panels (1) (upper row) detail the temporal evolution of the reference temperature field, while Panels (2) (lower row) show the viscosity field at each time.The viscosity demonstrates a variation of nearly 3 orders of magnitude: approximately 1.0 in the asthenosphere, 140 within colder lower-mantle slabs and 0.4 in convergent regions exhibiting elevated strain rates.

Figure 9 .
Figure 9. Second-order Taylor remainder test for convection with a temperature-, depth-and stress-dependent rheology.The semi-transparent markers are cases where the relative solver tolerance of Newton solves (SNES) is set to 10 −5 .With such tolerances, perturbations beyond solver tolerance exhibit divergence from the O(h 2 ) trend.When setting this tolerance to 10 −10 (larger solid markers), residuals show accuracy of gradient information down to floating-point precision.

Figure 10 .
Figure 10.Summary of optimisation outcomes from 2-D annulus experiments.Panel (a) visualises the process of objective functional minimisation.Panel (b) illustrates the final misfit, representing the misfit between the reconstructed final temperature field and T obs .Panel (c) depicts the initial misfit, indicating the difference between the reconstructed initial condition T IC and the reference initial condition.Notably, despite significant reduction in the objective functional and final misfit in simulations XI and XII, these simulations do not perform as well in terms of the initial misfit, which is a key measure in our experiments.

Figure 11 .
Figure 11.Comparing the reference (1, 2) and the best reconstruction simulation (3, 4).Temperature and viscosity fields are shown in panels (1, 3) and (2, 4), respectively.The misfit, which is the squared difference between the reconstructed and reference temperatures, is shown in panel (5).To highlight the effectiveness of the reconstruction of the evolution of surface dynamic topography, a field representing the normal stresses acting on the top boundary is visualised alongside the temperature fields in (1) and (3).The reconstruction simulation employs values of α u = 10 −1 , α s = 10 −1 and α d = 10 −2 and is marked with I in Fig. 10.

Table 1 .
Symbols used in this study.

Table 2 .
Parameters employed for 2-D annulus twin experiments.