Evaluation of iterative Kalman smoother schemes for multi-decadal past climate analysis with comprehensive Earth system models

Paleoclimate reconstruction based on assimilation of proxy observations requires specification of the control variables and their background statistics. As opposed to numerical weather prediction (NWP), which is mostly an initial-condition problem, the main source of error growth in deterministic Earth system models (ESMs) regarding the model low-frequency response comes from errors in other inputs: parameters for the small-scale physics, as well as forcing and boundary conditions. Also, comprehensive ESMs are non-linear, and only a few ensemble members can be run on current high-performance computers. Under these conditions we evaluate two assimilation schemes, which (a) count on iterations to deal with non-linearity and (b) are based on low-dimensional control vectors to reduce the computational need. The practical implementation would assume that the ESM has been previously globally tuned with current observations and that for a given situation there is prior knowledge of the most sensitive inputs (given corresponding uncertainties), which should be selected as control variables. The low dimension of the control vector allows for using full-rank covariances and resorting to finite-difference sensitivities (FDSs). The schemes are then an FDS implementation of the iterative Kalman smoother (FDS-IKS, a Gauss-Newton scheme) and a so-called FDS-multistep Kalman smoother (FDS-MKS, based on repeated assimilation of the observations). We describe the schemes and evaluate the analysis step for a data assimilation window in two numerical experiments: (a) a simple 1-D energy balance model (Ebm1D, which has an adjoint code) with present-day surface air temperature from the NCEP/NCAR reanalysis data as a target and (b) a multi-decadal synthetic case with the Community Earth System Model (CESM v1.2, with no adjoint). In the Ebm1D experiment, the FDS-IKS converges to the same parameters and cost function values as a 4D-Var scheme. For a number of iterations similar to that of the FDS-IKS, the FDS-MKS results in slightly higher cost function values, which are still substantially lower than those of an ensemble transform Kalman filter (ETKF). In the CESM experiment, we include an ETKF with Gaussian anamorphosis (ETKF-GA) implementation as a potential non-linear assimilation alternative. For three iterations, both FDS schemes obtain cost function values that are close to each other and (at about half the computational cost) lower than those of the ETKF and ETKF-GA (which have similar cost function values to each other). Overall, the FDS-IKS seems more adequate for the problem, with the FDS-MKS potentially more useful to damp increments in early iterations of the FDS-IKS.


Introduction
Earth system models (ESMs) to simulate the Earth system and global climate are usually developed using the present and recent historical climates as references, but climate projections indicate that future climates will lie outside these conditions. Paleoclimates very different from these reference states therefore provide a way to assess whether the ESM sensitivity to forcings is compatible with the evidence given by paleoclimatic records (Kageyama et al., 2018). Coupled atmosphere-ocean general circulation models (AOGCMs) and comprehensive ESMs have enabled the paleoclimate community to gain insights into internally generated and externally forced variability, and to investigate climate dynamics, modes of variability (e.g., Ortega et al., 2015; Zanchettin et al., 2015), and regional processes in detail (e.g., PAGES 2k-PMIP3 group, 2015). However, AOGCMs and comprehensive ESMs demand high computational resources, which severely limits the length and number of affordable model integrations on current high-performance computers (HPCs). Thus, the last millennium ensemble within the Coupled Model Intercomparison Project (CMIP, since CMIP6) relies on coherent modelling protocols followed by the paleoclimate modelling teams on independent HPCs (e.g., Jungclaus et al., 2017). On the other hand, the gathering and analysis of existing and new paleoclimate proxy records to create multiproxy databases also relies on collective efforts focused on specific time spans, for example, the global multiproxy database for temperature reconstructions of the Common Era (PAGES2K) (PAGES2k Consortium, 2017), or the multiproxy approach for the reconstruction of the glacial ocean surface (MARGO) (MARGO Project Members, 2009), which focuses on the Last Glacial Maximum (LGM), a period between 23 and 19 thousand years before present (BP). The quantitative fusion of comprehensive ESMs and paleoclimate proxy observations should provide deeper insight into past climate low-frequency variability, by which (here and throughout the article) we refer to variability on time scales of 30-50 years or longer (e.g., Christiansen and Ljungqvist, 2017). However, this fusion is hampered by the high computational demand of AOGCMs and comprehensive ESMs.
The issue of fusing data into models arises in scientific areas that enjoy a profusion of data and use costly models. In the geophysical community this is referred to as inverse methods and data assimilation (DA), whose aim is finding the best estimate of the state (the analysis) by combining information from the observations and from the numerical and theoretical knowledge of the underlying governing dynamical laws. Most known DA methods stem from Bayes' theorem (Lorenc, 1986), and each is made practical by making approximations (Bannister, 2017). In numerical weather prediction (NWP) the assimilation is mostly an initial condition problem. In contrast, the climate of a sufficiently long trajectory is typically much less sensitive to initial conditions, being essentially a sample of the underlying true model climate contaminated by a small amount of deterministic noise due to the finite integration interval (Annan et al., 2005b). The low-frequency errors in deterministic ESMs are thus mostly dependent on model errors, including the parameters for the small-scale physics, and errors in forcings and boundary conditions. DA has been used as a technique for low-frequency past climate field reconstruction (CFR), with real case studies such as the assimilation of marine sediment proxies of sea surface temperature (SST) in a regional ocean model of the North Atlantic at the termination of the Younger Dryas (YD) cold interval (Marchal et al., 2016), and synthetic studies, such as the assimilation of tree-ring width into an atmospheric GCM (AGCM) (Acevedo et al., 2017), the analysis of time-averaged observations (Dirren and Hakim, 2005), or the evaluation of particle filters for paleodata assimilation (Dubinkina et al., 2011). Including model parameters as control variables, early work in climate analysis was done by Hargreaves and Annan (2002), who evaluated a Markov chain Monte Carlo (MCMC) method with a simple ESM. Later, the technique of state augmentation with model parameters (e.g., Friedland, 1969; Smith et al., 2011) and an ensemble Kalman filter (EnKF) (Evensen, 1994) was used by Annan et al. (2005a) and Annan et al. (2005b) in synthetic experiments with an Earth system model of intermediate complexity (EMIC) and with an AGCM coupled to a slab ocean, respectively. The additional issue of sparsity in paleoclimate proxies was addressed by Paul and Schäfer-Neth (2005) for climate field reconstructions with an EMIC and manual tuning. More recent applied work, in part motivated by the nonlinearity of climate models, has used four-dimensional variational DA (4D-Var). Thus, Paul and Losch (2012) applied 4D-Var with a conceptual climate model, and Kurahashi-Nakamura et al. (2017) used 4D-Var with the Massachusetts Institute of Technology general circulation model (MITgcm) for ocean state estimation, considering joint initial conditions, atmospheric forcings, and an ocean vertical diffusion coefficient as control variables to analyse the global ocean state in equilibrium conditions during the Last Glacial Maximum (LGM). We share the motivation of this recent work but put the focus on deterministic and comprehensive ESMs.
As, in general, these models are not suited to automatic differentiation (AD) and the development of hand-coded tangent linear and adjoint models is out of reach (so, standard 4D-Var and related hybrid approaches such as En4DVar are not applicable), we seek assimilation strategies that take into account the nonlinearity in ESMs and the computational constraints with current HPCs for low-frequency analysis. The question remains about how one should choose the control vector for the assimilation. One possibility is to select a relatively high-dimensional control vector and to resort to ensemble methods, which involve a low-rank representation of covariances. For example, the (adjoint-free) iterative ensemble Kalman smoother (IEnKS) in Bocquet and Sakov (2014) counts on iterations to deal with nonlinearity. Also, the IEnKS has been evaluated in a synthetic study with the low-order Lorenz-95 model by state augmentation (Bocquet and Sakov, 2013). However, the ensemble (low-rank) covariances and sensitivities would be noisy because of the small ensemble size. An alternative is to resort to a low-dimensional control vector, so that its error covariance is explicitly represented, and the low dimension allows for an estimation of the sensitivity of the observation space to the control vector by conducting individual perturbation experiments of the control variables; i.e., a finite difference sensitivity (FDS) estimation. With respect to adjoint sensitivities, FDS estimation has the disadvantages that the computing cost is proportional to the dimension of the control vector and that the choice of the perturbations is critical. Perturbations that are too small lead to numerical (round-off) errors, while perturbations that are too large lead to truncation errors. An advantage is that FDS considers the full physics of the nonlinear model.
In any case, in a practical application of such a low-dimensional control vector approach (whose dimension would be imposed by computational constraints), the selection of the control variables should be done carefully. From all available model inputs, the selected control variables (given their respective background uncertainties) and the model should try to explain as much of the observed variability as possible. In turn, this assumes (a) a general need to perform sensitivity analysis beforehand, and (b) that the model has been previously comprehensively tuned. The exclusion of relatively less sensitive inputs from the control vector, and the previous tuning, would reduce possible compensation effects (i.e., that increments in the control vector due to the assimilation take partial responsibility for errors elsewhere). Nonetheless, some error compensation will always be present (for example, this is intrinsic to the common tuning of the coupled ESM, which follows the tuning of individual components) and is very difficult to deal with. A striking example is given by Palmer and Weisheimer (2011), who report how an inadequate representation of horizontal baroclinic fluxes resulted in a climate model error equal in magnitude and opposite to the systematic error caused by insufficiently represented vertical orographic wave fluxes. Thus, the selected control vector has the responsibility of embracing the model climate background uncertainty, and the updated values will likely compensate for non-accounted errors. For low-frequency climate analysis it is very likely that, after some years of model integration, the modelled climate is less sensitive to (reasonable) initial conditions than to other possible inputs. Thus, one would generally select the most sensitive parameters for the model physics and forward operators, forcings and boundary conditions for a given situation as the control vector.
Also, regarding initial conditions in a sequence of multi-decadal and longer data assimilation windows with transient forcings, there is no clear consensus about how one should approach the initialization at each DAW. For example, Holland et al. (2013) indicate that initialization had little impact (in general, limited to a couple of years) on Arctic and Antarctic sea ice predictability in the Community Climate System Model 3 (CCSM3) in a perfect-model framework. However, in a later synthetic study including assimilation with a perfect-model framework and an Earth system model of intermediate complexity, Zunz et al. (2015) obtained a similar interannual predictability (∼3 years), but noted that the initialization for the DAW can still influence the state at multi-decadal timescales (although with a larger impact of the external forcing). Among others, we mention these examples as sea ice has been related to changes in atmospheric circulation patterns and teleconnections with the tropical Pacific and Atlantic oceans (e.g., Meehl et al., 2016; Marchi et al., 2018), and these relatively fast climate dynamics can, for example, influence the onset or termination of glacial conditions and stronger climate changes. On the other hand, given the limited predictability at multidecadal timescales (and to reduce computational costs), the reinitialization in paleo-DA has often been dropped altogether, with a common initial integration being applied as the background climate for a number of DAWs. This has been named off-line assimilation (e.g., Steiger et al., 2014; Hakim et al., 2016; Klein and Goosse, 2017; Acevedo et al., 2017). The perspective of the off-line approach can be modified when model parameters are included in the control vector because, as opposed to initial conditions, the impact of model parameters does not decay with the model integration. In general, the updated model parameters from the assimilation, as well as their physically consistent climate, would then be used as (augmented) initial conditions for a subsequent DAW.

Throughout this study, all observations available during a data assimilation window (DAW) are assimilated in parallel. This has been termed four-dimensional or asynchronous data assimilation, and is also commonly referred to as the smoothing problem (Sakov et al., 2010). Here we choose the term smoother for the evaluated schemes, but they could as well be termed asynchronous (or four-dimensional) filters. This study focuses on evaluating two assimilation schemes for low-frequency past climate reconstruction. They are based on finite difference sensitivities (FDS) and low-dimensional control vectors, and rely on iterations to account for nonlinearity. The schemes are then an FDS implementation of the iterative Kalman smoother (FDS-IKS, a Gauss-Newton scheme), and an alternative named the FDS-multistep Kalman smoother (FDS-MKS, based on repeated assimilation of the observations). Other paleoclimate assimilation issues, such as sparsity and measurement-error characteristics (e.g., temporal autocorrelation), representation error, the development of forward operators for specific proxies (proxy system models) (e.g., Evans et al., 2013; Dee et al., 2016), and further complexities in the model-observation comparison (e.g., Goosse, 2016; Weitzel et al., 2018), are not addressed in this manuscript.
The rest of this article is organised as follows. In section 2, within the broader context of the joint state-parameter estimation problem, we summarise the strong constraint incremental 4D-Var formulation (Courtier et al., 1994) from a perspective where the state vector is augmented with the model parameters to arrive, under given assumptions, at the formulation of the iterative Kalman smoother (IKS) scheme as implemented here. Then, we describe the sensitivity estimation and the two schemes, the FDS-IKS and the FDS-MKS, in a concise, algorithmic format. The description of our implementation of the Gaussian anamorphosis (GA), which we applied along with an ensemble transform Kalman filter (ETKF-GA) as an alternative nonlinear assimilation approach, is also included in section 2. In section 3 we conduct an experiment with a simple 1D energy balance model (Ebm1D) and present-day NCEP/NCAR reanalysis surface air temperature as target, and in section 4 we conduct an identical twin experiment with CESM, assimilating MARGO-like data (MARGO uncertainties, timescales and locations) as an example of a paleoclimate observing dataset. Ebm1D is amenable to automatic differentiation, so we applied 4D-Var and the ETKF as benchmark schemes. CESM lacks an adjoint, and we applied the ETKF and ETKF-GA as benchmark schemes. The experimental setup, results and discussions are given in each case. We finish with conclusions in section 5.
2 Assimilation schemes

2.1 Analysis approach

The problem is to estimate the mean state (seasonal and annual means) of a past climate state along a time window for multidecadal and longer time scales. From a variational perspective, in numerical weather prediction (NWP) this would be referred to as a four-dimensional variational data assimilation (4D-Var) problem, where the initial conditions of a model integration are estimated subject to model dynamics and according to background and observation uncertainties within a data assimilation window (DAW). In NWP, the background (or prior) is normally given by a previous model forecast. In this article, time $t_k$ and its index $k$ measure time relative to the start of the DAW, which is $t_0$, using conventions similar to those of 4D-Var.
We consider a discrete nonlinear time-invariant dynamical system model
$$x_{k+1} = \mathcal{M}(x_k, \theta_k), \qquad (1)$$
where $x_k \in \mathbb{R}^n$ is the state vector at time $t_k$, and $\theta_k \in \mathbb{R}^q$ is a vector of selected inputs in addition to initial conditions (model parameters, forcings and boundary conditions).
We assume a gridded system, where specification of the model state and other inputs at time $t_k$ uniquely determines the model state at all future times, and that $\theta_k$ is constant during the DAW (i.e., $\theta \equiv \theta_{k+1} = \theta_k$). For example, within $\theta$ one can include a constant error term (a bias to be estimated as part of the assimilation) in a prescribed transient radiative constituent (e.g., a CH$_4$ time series). Then, we consider an augmented state vector $z_k = [x_k^T, \theta^T]^T$ and an augmented deterministic nonlinear dynamics operator $\mathcal{M}: \mathbb{R}^{n+q} \to \mathbb{R}^{n+q}$, such that
$$z_{k+1} = \mathcal{M}(z_k). \qquad (2)$$
Observations at time $t_k$ are represented by the vector $y^o_k \in \mathbb{R}^{p_k}$ and related to the model state by
$$y^o_k = \mathcal{H}_k(z_k) + \epsilon_k, \qquad (3)$$
where $\mathcal{H}_k: \mathbb{R}^{n+q} \to \mathbb{R}^{p_k}$ is a deterministic nonlinear observation operator that maps from the augmented state $z_k$ to the observation space, and $\epsilon_k \in \mathbb{R}^{p_k}$ is a realisation of a noise process, which consists of measurement errors and representation errors (errors due to unresolved scales and processes, observation-operator errors, and pre-processing or quality-control errors; Janjić et al., 2017). We assume $\epsilon_k$ is a Gaussian variable with mean 0 and covariance matrix $R_k$. The error covariance matrix of a state $z_k = [x_k^T, \theta^T]^T$, where the superscript "T" denotes matrix transposition, at any time $t_k$ within the DAW is
$$P_k = \begin{bmatrix} P^{xx}_k & P^{x\theta}_k \\ (P^{x\theta}_k)^T & P^{\theta\theta} \end{bmatrix}, \qquad (4)$$
where $P^{xx}_k \in \mathbb{R}^{n\times n}$ is the error covariance matrix of $x_k$, $P^{\theta\theta} \in \mathbb{R}^{q\times q}$ is the error covariance matrix of $\theta$, and $P^{x\theta}_k \in \mathbb{R}^{n\times q}$ is the error covariance between $x_k$ and $\theta$. The goal in 4D-Var is then to find the initial state $z_0$ that minimizes a non-quadratic cost function given by
$$J_1(z_0) = \frac{1}{2}\,\|z_0 - z^b_0\|^2_{P_0^{-1}} + \frac{1}{2}\,\|\hat{y}^o - \hat{\mathcal{H}}(z_0)\|^2_{\hat{R}^{-1}}, \qquad (5)$$
where $\|a\|^2_{A^{-1}} \equiv a^T A^{-1} a$. The first term (the background term, $J_b$) measures the deviation between $z^b_0$ and $z_0$, with the background-error covariance matrix $P_0$ as $L_2$ norm. The second term (the observation term, $J_o$) measures the deviation between $\hat{y}^o \in \mathbb{R}^p$ (where $p = \sum_{k=0}^{n_k} p_k$, indicating all observations throughout the DAW) and its model equivalent $\hat{y}^z \equiv \hat{\mathcal{H}}(z_0)$, using the observation-error covariance matrix $\hat{R}$ as $L_2$ norm. $\hat{\mathcal{H}}(z_0): \mathbb{R}^{n+q} \to \mathbb{R}^p$ is a generalised observation operator mapping from the augmented initial state to all the observations at any time in the DAW (i.e., $\hat{\mathcal{H}} \equiv \mathcal{H} \circ \mathcal{M}$). The maximum a posteriori estimation in (5) is also known as conditional mode estimation, or maximum of the conditional density. As presented by Lorenc (1986) from a Bayesian view, this is the maximum likelihood of the state under a Gaussian assumption for the various error terms. The cost function (5) is subject to the states satisfying the nonlinear dynamical system (2) and is known as the strong constraint variational formulation, while the additional inclusion of a term for the model error would lead to a weak constraint 4D-Var. The solution to the functional $J_1(z_0)$ is $z^a_0$, with the resulting states along the DAW referred to as the analysis. In general, an exact solution cannot be found. In the incremental formulation of 4D-Var, the solution to (5) is approximated by a sequence of minimizations of quadratic cost functions. Thus, incremental 4D-Var has first an outer loop, where $z^l_0$ provides the current approximation and initially, for $l = 1$, $z^1_0 = z^b_0$. The innovations are then given by the residual between the observations and the mapping of the initial state in the current approximation into observation space,
$$\delta\hat{y}^l = \hat{y}^o - \hat{\mathcal{H}}(z^l_0), \qquad (6)$$
where the mapping of the initial state to observation space, $\hat{\mathcal{H}}(z^l_0)$, is computed by integrating the nonlinear model along the DAW and applying the observation operators:
$$z^l_{k+1} = \mathcal{M}(z^l_k), \qquad (7)$$
$$\hat{\mathcal{H}}(z^l_0) = [\mathcal{H}_0(z^l_0)^T, \ldots, \mathcal{H}_{n_k}(z^l_{n_k})^T]^T. \qquad (8)$$
Then, incremental 4D-Var has an inner loop, where two approximations are conducted.
First, the observation operator is linearized about the current trajectory:
$$\mathcal{H}_k(z_k) \approx \mathcal{H}_k(z^l_k) + \mathbf{H}_k(z^l_k)\,(z_k - z^l_k), \qquad (9)$$
where $\mathbf{H}_k(z^l_k)$ is the Jacobian of $\mathcal{H}_k(\cdot)$ evaluated at $z^l_k$. $\mathbf{H}_k$ is referred to as the tangent linear operator in the DA literature. Second, the dynamical model is also linearized, obtaining the tangent linear model (TLM):
$$\delta z_{k+1} = \mathbf{M}_k(z^l_k)\,\delta z_k, \qquad (10)$$
where $\delta z_0 = z_0 - z^l_0$ is the increment. By considering (6) and (10), the generalised error term $\hat{\epsilon}$ in (5) for all observations in the DAW can be expressed as
$$\hat{\epsilon} \approx \delta\hat{y}^l - \hat{\mathbf{H}}^l\,\delta z_0, \qquad (11)$$
where $\hat{\mathbf{H}}^l \equiv \hat{\mathbf{H}}(z^l_0)$ is the Jacobian of $\hat{\mathcal{H}}$ evaluated at $z^l_0$. This approximation of $\hat{\epsilon}$ is introduced in (5), leading to a quadratic cost function with the increment $\delta z_0$ as argument:
$$J_2(\delta z_0) = \frac{1}{2}\,\|\delta z_0 - (z^b_0 - z^l_0)\|^2_{P_0^{-1}} + \frac{1}{2}\,\|\delta\hat{y}^l - \hat{\mathbf{H}}^l\,\delta z_0\|^2_{\hat{R}^{-1}}. \qquad (12)$$
The minimization of $J_2(\delta z_0)$ is the inner loop, which is conducted by gradient descent algorithms (e.g., Lawless, 2013) until it meets a given criterion, yielding an optimal $\delta z^{l+1}_0$. Then, the outer loop takes control, where the estimate of the initial state is updated with the estimated increment: $z^{l+1}_0 = z^l_0 + \delta z^{l+1}_0$. Incremental 4D-Var has been shown to be an inexact Gauss-Newton method applied to the original nonlinear cost function (Lawless et al., 2005).
In our context in this paper, we assume Gaussian statistics and a perfect-model framework except for the sources of model uncertainty in $z_0$. Thus, the conditional mode given by minimization of Eq. (12) is also the conditional mean (also called the minimum variance estimate), given by the explicit solution
$$\delta z^{l+1}_0 = (z^b_0 - z^l_0) + \mathbf{K}^l\,[\delta\hat{y}^l - \hat{\mathbf{H}}^l\,(z^b_0 - z^l_0)], \qquad (13)$$
where $\mathbf{K}^l$ is known as the Kalman gain matrix, given by
$$\mathbf{K}^l = P_0\,(\hat{\mathbf{H}}^l)^T\,[\hat{\mathbf{H}}^l\,P_0\,(\hat{\mathbf{H}}^l)^T + \hat{R}]^{-1}. \qquad (14)$$
So, the inner loop is omitted and the state vector is explicitly updated as
$$z^{l+1}_0 = z^b_0 + \mathbf{K}^l\,[\delta\hat{y}^l - \hat{\mathbf{H}}^l\,(z^b_0 - z^l_0)]. \qquad (15)$$
Thus, like incremental 4D-Var, the iterative approach described by Eq. (15) gives an approximation to the conditional mode or maximum likelihood of the cost function (5). Iterative methods have a long history in DA applications for nonlinear systems. Jazwinski (1970) considers local (conducted over a single assimilation cycle) and global (conducted over several assimilation cycles) iterations of the extended Kalman filter (EKF). Local iterations of the Kalman filter are designed to deal with nonlinear observation operators and non-Gaussian errors. The locally iterative (extended) Kalman filter (IKF) is a Gauss-Newton method for approximating a maximum-likelihood estimate (Bell and Cathey, 1993), and it is actually algebraically equivalent to nonlinear three-dimensional variational (3D-Var) analysis algorithms (Cohn, 1997). For the first loop, the IKF is identical to an EKF (see, e.g., Jazwinski, 1970). Later, Bell (1994) showed that the iterative Kalman smoother (IKS) represents a Gauss-Newton method to obtain an approximate maximum likelihood, as was shown later for incremental 4D-Var (Lawless et al., 2005). The IKF and the IKS circumvent the need for choosing a step size, which is sometimes a source of difficulty in descent methods.
However, as for any Gauss-Newton method, not even local convergence is guaranteed. Eq. (15) is actually akin to the formulation of the IKF, but generalized to a DAW, and is thus an IKS. It differs, though, from the IKS formulation in Bell (1994), in that (15) is a strong constraint version (the cost term for the model error is neglected) without the backward pass (see Fig. 1 in Bell, 1994). Now, it remains to be seen how one would apply a scheme such as Eq. (15) for multidecadal and longer term paleoclimate analysis.
In recent years there has been a growing effort toward the development of stochastic physical parameterizations in weather forecast and climate models. However, stochastic parameterization is still in its infancy in comprehensive ESMs. For deterministic parameterizations, on which we focus here, the model climate converges to its own dynamical attractor, and the climate of a sufficiently long model trajectory is typically much less sensitive to initial conditions than to other model inputs.
In general, ensemble methods rely on perturbations $\delta z_0$ of the control vector to estimate the sensitivity matrix $\hat{\mathbf{H}}^l$. Two general kinds of simulations and climate analyses are of interest to the paleoclimate community: the so-called equilibrium simulations and the transient ones. Equilibrium simulations are subject to solar forcing prescribed to a specific calendar year and fixed radiative constituents in the atmosphere, representing a situation when the past climate is considered stable. The goal is to evaluate the low-frequency seasonal and annual means and variability within these relatively long-term stable climate conditions (e.g., Eemian, LGM or mid-Holocene). In these simulations, the model is integrated until it reaches equilibrium conditions. Typically, starting from a standard climatology from, e.g., the World Ocean Atlas, it takes a few thousand years for an ESM integration to converge to its equilibrium (even in the deepest ocean) for stationary forcings. After an initial spin-up similar to the equilibrium conditions, transient simulations with ESMs then use the corresponding time-varying solar forcing, and normally use prescribed time series of radiative atmospheric constituents reconstructed from observations for the time window of interest, as well as transient boundary conditions. From an assimilation perspective, irrespective of whether the analysis is for equilibrium or transient forcing, the perturbation of model parameters and other inputs introduces a shock at the start of the model integration (in some way analogous to shocks in unbalanced ocean-atmosphere coupled models initialized with uncoupled data assimilation in NWP). And beyond the initial oscillations, the model climatology in the first years of model integration is not consistent with the perturbed parameters. Thus, the estimation of the sensitivities $\hat{\mathbf{H}}^l$ can be spurious in the first years of model integration, and the innovations also do not result from the model climatology. It is convenient then to set up a model integration time threshold and to disregard sensitivities earlier than this time. For such a purpose, we loosely define a model quasi-equilibrium condition as the situation in which a model climatology is in reasonable physical consistency with the control vector (initial conditions plus model parameters, etc.). With equilibrium simulations, a quasi-equilibrium time $t_q$ can, for example, be evaluated based on the convergence of the maximum meridional overturning circulation, for which each ensemble member will converge towards its own attractor. By the time this convergence is reached, the correlations between the atmosphere, surface climate, ocean mixed layer and the (paleoclimate) observation space should be fully developed. Then, low-frequency climate means (annual and/or seasonal) after the integration time to quasi-equilibrium, $t_q$, would be evaluated against the climate proxy database (e.g., MARGO for the mean annual and seasonal climate across the LGM) to obtain the sensitivity $\hat{\mathbf{H}}^l$ and the innovations $\delta\hat{y}^l$ at each iteration. This is the approach followed by the experiments in this study.

For the more general transient forcing situation, in a current DAW, the effects from a perturbed control vector and from transient forcing are entangled. As opposed to the equilibrium simulations, integration times here match physical forcing times.
Also, observations earlier than a specified $t_q$ should now be disregarded, where quasi-equilibrium here would not refer to a model state, which will be transient like the forcings, but (as above) to a situation in which the model state is physically consistent with the input given in the control vector. At each DAW, one could first estimate $t_q$ by conducting an 'equilibrium' simulation with forcings and boundary conditions prescribed to those at the start of the integration time (the DAW), and then use this estimate as a surrogate for the transient $t_q$. This would, however, increase computations. Alternatively, one could set up a $t_q$ based on previous experience and tests for equilibrium forcings.
In both cases, it is unlikely that errors in initial conditions are among the most sensitive ones out of all possible input errors for the evaluated integration times after quasi-equilibrium. Thus, for low-frequency past climate analysis, it should be generally acceptable to exclude $x_0$ from the control variables, allowing for a reduced problem. In the following, we adopt this assumption.
To simplify the notation, we then define $\mathcal{G}(\cdot)$ as a generalised (deterministic) observation operator mapping a vector $\theta$ into the observation space for times after quasi-equilibrium:
$$\hat{y}^\theta = \mathcal{G}(\theta), \quad t_k \geq t_q, \qquad (16)$$
where $t_q$ represents the model integration time to quasi-equilibrium. Instead of (5), a reduced problem is posed now by minimization of the non-quadratic cost function
$$J(\theta) = \frac{1}{2}\,\|\theta - \theta^b\|^2_{(P^b_{\theta\theta})^{-1}} + \frac{1}{2}\,\|\hat{y}^o - \mathcal{G}(\theta)\|^2_{\hat{R}^{-1}}. \qquad (17)$$
After the assimilation, a forward integration with the updated $\theta^a$ leads to its physically consistent climate estimate. The sensitivity matrix, or Jacobian of $\mathcal{G}$, is denoted as $\mathbf{G} \in \mathbb{R}^{p\times q}$. The trivial substitutions into the incremental formulation (12) and its solution (15), with estimation of $\mathbf{G}$ via finite differences, lead to the finite difference sensitivity iterative Kalman smoother (FDS-IKS), which is summarised in section 2.3. The finite difference sensitivity multistep Kalman smoother (FDS-MKS), described in section 2.4, is an alternative approach to deal with nonlinearity.
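To make the structure of the reduced problem concrete, the following minimal Python sketch shows the shape of $\mathcal{G}$ and of the cost function (17). The toy model (toy_esm) and all names here are illustrative stand-ins, not part of the actual ESM setup:

```python
import numpy as np

def toy_esm(theta, n_steps=100):
    """Stand-in for an ESM integration: a 2-variable linear recursion that
    spins up towards an attractor set by theta; returns the trajectory."""
    x = np.zeros(2)
    traj = np.empty((n_steps, 2))
    for k in range(n_steps):
        x = 0.9 * x + np.array([theta[0], theta[0] * theta[1]])
        traj[k] = x
    return traj

def G(theta, t_q=50):
    """Generalised observation operator, Eq. (16): map theta to the
    observation space, using only time-mean output after quasi-equilibrium."""
    return toy_esm(theta)[t_q:].mean(axis=0)

def cost(theta, theta_b, P_b, y_o, R):
    """Reduced non-quadratic cost function J(theta), Eq. (17)."""
    db = theta - theta_b
    do = y_o - G(theta)
    return 0.5 * db @ np.linalg.solve(P_b, db) + 0.5 * do @ np.linalg.solve(R, do)
```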

2.2 Background error covariances and sensitivity estimation
The current implementation of variational assimilation is different in each operational NWP center. A recent review of operational methods of variational and ensemble-variational data assimilation is given by Bannister (2017). However, for many geophysical models, codes are not suited to automatic differentiation and it is extremely complex to develop and maintain tangent linear and adjoint codes. This has motivated more recent research toward (adjoint-free) ensemble DA methods for high-dimensional models. Lorenc (2013) recommended using the term 'EnVar' for variational methods using ensemble covariances. Thus, Liu et al. (2008) proposed what was later called 4DEnVar, as a way to estimate the generalised sensitivities of the observation space to initial conditions based on an ensemble of model integrations within a 4D-Var formulation. Gu and Oliver (2007) introduced a scheme called the ensemble randomized maximum likelihood filter (EnRML) for on-line nonlinear parameter estimation, which was later adapted as a smoother, the batch-EnRML, by Chen and Oliver (2012). The EnRML estimates sensitivities by multivariate linear regression between the ensemble model equivalent of the observations and the ensemble perturbations to the control vector. These are thus mean ensemble sensitivities. The iterative EnKF (IEnKF) is similar to the EnRML, but instead uses an ensemble square root filter, and rescales the ensemble perturbations (deflates) before and (inflates) after propagation of the ensemble as an alternative way to estimate the sensitivities (Sakov et al., 2012). In this way the estimated sensitivities are more local about the current estimate. An extension of the IEnKF as a fixed-lag smoother led to the IEnKS in Bocquet and Sakov (2014). However, while in the 4D-Var approach the computing cost of the sensitivities with the adjoint method is independent of the dimension of the control vector in the cost function, in numerical estimates of the sensitivities the computing cost increases with the size of the control vector. The low-rank property of the ensemble covariances in ensemble methods means that sampling error problems will inevitably arise when the number of ensemble members is small in comparison with the size of the control vector. Instead, as indicated above, in the low-dimensional control vector schemes evaluated here, the background-error covariance is a full-rank explicit matrix, at the expense of computations being linearly proportional to the size of the control vector.
While the relation between $\mathbf{G}$ and $P_k$ is implied in the previous section, it is instructive to look at it in some detail. Let us consider the case of a specific observation time $t_k$. The Kalman gain matrix (disregarding the loop index, if any) for the components of the model input $\theta$, which we denote in this section as $\mathbf{K}^\theta_k$, can be expressed in the two following ways:
$$\mathbf{K}^\theta_k = (P^{x\theta}_k)^T \mathbf{H}_k^T\,[\mathbf{H}_k P^{xx}_k \mathbf{H}_k^T + R_k]^{-1} = P_{\theta\theta}\,\mathbf{G}_k^T\,[\mathbf{G}_k P_{\theta\theta} \mathbf{G}_k^T + R_k]^{-1}, \qquad (19)$$
where the first way is the standard one in Kalman smoother expressions including parameter estimation via state vector augmentation, and the second one is a parameter space formulation, which we apply here. Both are equivalent, but the covariance information in $P_k$ has been transferred to $\mathbf{G}_k$, the sensitivity matrix, in the second expression. Let us further consider the case that at $t_k$ there is a single observation $y$ of a state variable within the vector $z_k$, denoted as $x_{ky}$, and let us focus on the representer matrix for a single parameter $\theta_i$. The covariance between $\theta_i$ and the observation $y$ is expressed in both cases as
$$\sigma_{\theta_i y} = \sigma_{x_{ky}\theta_i} = \sum_{j=1}^{q} \sigma_{\theta_i\theta_j}\,\frac{\partial x_{ky}}{\partial \theta_j}, \qquad (20)$$
which, as $\frac{\partial y}{\partial \theta_j} = \frac{\partial y}{\partial x_{ky}}\frac{\partial x_{ky}}{\partial \theta_j}$, indicates that the linear equality is taken from a bottom-up approach in the parameter space formulation, where all sources of uncertainty are specifically evaluated to compose the covariance $\sigma_{x_{ky}\theta_i}$. In our experiments with the ETKF and ETKF-GA, with parameter augmentation, the first alternative in Eq. (19) is used, and the generalized sensitivity matrix $\mathbf{G}$ is not explicitly computed. So, for comparison with the finite difference sensitivity (FDS) schemes, we estimate an ensemble-based average sensitivity matrix by solving for $\bar{\mathbf{G}}$ in $\Delta Y = \bar{\mathbf{G}}\Delta\theta$, where $\Delta\theta \in \mathbb{R}^{q\times m}$ is the matrix of random model parameter perturbations drawn from $P_{\theta\theta}$ around the background values, and $\Delta Y \in \mathbb{R}^{p\times m}$ are the resulting perturbations in the observation space. In an iterative approach the sensitivities would need to be evaluated for perturbations around the current estimate.

Alternatively, finite difference sensitivity (FDS) estimation directly samples from the conditional probability density function (cpdf) of the perturbed variable, as the remaining control variables are kept at their current estimate. However, the computing requirements in FDS are linearly proportional to the size of the input vector, and its numerical estimation of derivatives is inaccurate: the associated error can be unacceptably large due to an inadequate choice of the finite differencing step size. Large perturbations increase the truncation error, which increases linearly with the perturbation magnitude, while as the magnitude of the perturbation gets smaller the accuracy of the differentiation degrades by the loss of computer precision (Dennis and Schnabel, 1996). It is possible to do more than one perturbation experiment by sampling from the cpdf for each parameter and estimating the sensitivity by univariate regression, which ameliorates the problem of non-optimal perturbations, and we include a few tests in this sense for the first experiment in this study (see Appendix A). However, it is currently computationally difficult to do more than one perturbation per control variable with a comprehensive ESM and long integrations (and one might as well resort then to ensemble approaches, full-rank in this case, if more computations were possible).
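For reference, a minimal sketch (with synthetic arrays; all names illustrative) of the ensemble-based average sensitivity estimate described above, solving $\Delta Y = \bar{\mathbf{G}}\Delta\theta$ in the least-squares sense:

```python
import numpy as np

rng = np.random.default_rng(1)
q, m, p = 4, 60, 25                     # parameters, members, observations
dTheta = rng.standard_normal((q, m))    # perturbations drawn around theta^b
G_true = rng.standard_normal((p, q))    # synthetic "true" sensitivities
dY = G_true @ dTheta + 0.01 * rng.standard_normal((p, m))

# Least-squares solve of dY = Gbar dTheta (transposed to match lstsq layout)
Gbar = np.linalg.lstsq(dTheta.T, dY.T, rcond=None)[0].T
print(np.abs(Gbar - G_true).max())      # small, since m >> q and noise is low
```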
The sensitivity estimates by forward finite differences at a loop $l$ (initially, the background) are then computed for each column $\mathbf{G}^l_{:,i}$ as:
$$\mathbf{G}^l_{:,i} = \frac{\mathcal{G}(\theta^l + \delta\boldsymbol{\theta}_i) - \mathcal{G}(\theta^l)}{\delta\theta_i}, \qquad (21)$$
where, for each parameter, $\delta\theta_i$ is a small perturbation (or variation) of the current approximation of $\theta_i$, and $\delta\boldsymbol{\theta}_i$ is the vector $\mathbf{0} \in \mathbb{R}^q$ but with element $i$ replaced by $\delta\theta_i$. As indicated, the estimation of sensitivities by local finite difference approximations results from sampling the conditional density function in the control vector space.
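A minimal sketch of the estimator (21), assuming G is any callable mapping a control vector of size q to the observation space of size p (one extra model integration per control variable):

```python
import numpy as np

def fds_jacobian(G, theta, sd, sdfac=0.001):
    """Forward finite-difference sensitivity matrix: column i is
    (G(theta + dtheta_i * e_i) - G(theta)) / dtheta_i, with the perturbation
    dtheta_i = sdfac * sd[i] scaled by the background standard deviation."""
    y0 = np.asarray(G(theta))
    J = np.empty((y0.size, theta.size))
    for i in range(theta.size):
        dth = sdfac * sd[i]
        th = theta.copy()
        th[i] += dth
        J[:, i] = (np.asarray(G(th)) - y0) / dth
    return J
```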

2.3 Finite difference sensitivity iterative Kalman smoother

The algorithm we describe here, denoted the finite difference sensitivity iterative Kalman smoother (FDS-IKS), is a Gauss-Newton scheme akin to the IKF and the IKS. The 'FDS' acronym clarifies that (a) the scheme is expressed in terms of explicit sensitivities to all variables in the control vector, and (b) these local sensitivities are estimated numerically by individual perturbation experiments for each variable in the control vector. The scheme then uses a full-rank representation of the background-error covariance matrix (hence, it is not called an ensemble method). It is a smoother rather than a filter, as it assimilates (future) observations along a DAW to update the control variables, applicable since the start of the DAW. After iterations are stopped (due to a convergence criterion or reaching a maximum iteration number), a model reintegration with the updated control vector $\theta^a$ leads to the analysis (or climate field reconstruction) over the DAW.
For any natural number $l$, the FDS-IKS provides the update given below. The sequences $\{\theta^l : l \geq 0\}$ and $\{P^l_{\theta\theta} : l \geq 0\}$ are defined inductively as follows:
$$\theta^{l+1} = \theta^b + \mathbf{K}^l\,[\delta\hat{y}^l - \mathbf{G}^l\,(\theta^b - \theta^l)], \qquad (22)$$
$$P^{l+1}_{\theta\theta} = (\mathbf{I} - \mathbf{K}^l\mathbf{G}^l)\,P^b_{\theta\theta}, \qquad (23)$$
where for notational convenience
$$\delta\hat{y}^l = \hat{y}^o - \mathcal{G}(\theta^l) \qquad (24)$$
and
$$\mathbf{K}^l = P^b_{\theta\theta}\,(\mathbf{G}^l)^T\,[\mathbf{G}^l P^b_{\theta\theta} (\mathbf{G}^l)^T + \hat{R}]^{-1}. \qquad (25)$$
Equations (22) and (25) show that, as in the IKS and incremental 4D-Var, the FDS-IKS uses the initial background error statistics $P^b_{\theta\theta}$ during all iterations. The updated $P^a_{\theta\theta}$ is just calculated in the last iteration.
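The following minimal numpy sketch implements the update equations above, reusing the fds_jacobian sketch from the previous section; stopping criteria are reduced to a fixed iteration count for illustration:

```python
import numpy as np

def fds_iks(G, y_o, theta_b, P_b, R, sd, n_iter=3, sdfac=0.001):
    """FDS-IKS: Gauss-Newton iterations with finite-difference sensitivities.
    G: observation operator; sd: background standard deviations of theta."""
    theta = theta_b.copy()
    for _ in range(n_iter):
        Gl = fds_jacobian(G, theta, sd, sdfac)  # sensitivities at theta^l, Eq. (21)
        dy = y_o - G(theta)                     # innovations, Eq. (24)
        S = Gl @ P_b @ Gl.T + R
        K = P_b @ Gl.T @ np.linalg.inv(S)       # Kalman gain, Eq. (25)
        theta = theta_b + K @ (dy - Gl @ (theta_b - theta))  # Eq. (22)
    P_a = (np.eye(theta.size) - K @ Gl) @ P_b   # Eq. (23), last iteration only
    return theta, P_a
```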

2.4 Finite difference sensitivity multistep Kalman smoother
Here, a multistep approach is conducted by inflating the observation-error covariance matrix R and recursively applying a standard Kalman smoother over the assimilation window with the inflated R and the same observations. The multistep idea of inflating R for repeated assimilation of the observations was proposed by Annan et al. (2005a) and further clarified and applied by Annan et al. (2005b) for an atmospheric GCM using the EnKF with parameter augmentation. Their approach is designed for steady-state cases, where time-averaged climate observations corresponding to a long DAW can be assumed constant along a sequence of smaller assimilation sub-windows into which the DAW is divided. The model parameters are then sequentially updated in small increments, and the loss of balance in a more general nonlinear model should be reduced with the multistep approach (Annan et al., 2005b). The inflation weights are such that in a linear case, after the predefined sequence of integrations/assimilations, the solution is identical to that of the single-step scheme along the whole DAW.

The multistep strategy was termed multiple data assimilation (MDA) by Emerick and Reynolds (2013) in the context of ensemble smoothing, and was then further developed by Bocquet and Sakov (2013, 2014) in their iterative ensemble Kalman smoother (IEnKS), where the weights for the inflation of R were applied in overlapping data assimilation windows (MDA IEnKS). We apply here the MDA strategy to a recursive formulation of the KF in terms of FDS estimates of the sensitivity of the observation space to the control vector, and we denote the scheme the finite difference sensitivity multistep Kalman smoother (FDS-MKS). As the inflation of R results in a reduced influence of the observations at each iteration, the increments in the early iterations are relatively reduced with respect to the FDS-IKS, making the FDS-MKS potentially more stable. We note, however, that the inflation of R modifies the direction of the increment in nonlinear cases. Thus it does not converge to the same (local) minimum as the FDS-IKS (or 4D-Var), but to an approximate point in the control space. In contrast with the scheme in Annan et al. (2005b), the FDS-MKS considers recursive integration along the complete DAW (and non-overlapping DAWs). Thus, it is not restricted to steady-state conditions.
The scheme considers the total increment in the state vector that would result from the linear assimilation of one specific observation, and alternatively conducts a recursive sequence of assimilations of the same observation whose sum of fractional increments equals the total increment. This is achieved by considering that the observation error variance at loop $l$ is the product of an inflation factor $\beta_l$ and the variance of the "complete" observation: $\sigma^2_{y_l} = \beta_l\,\sigma^2_y$. As the (linear) increment is inversely proportional to the observation error variance, for the total increment to be the same in both situations, the following condition must hold:
$$\sum_{l=1}^{N_l} \beta_l^{-1} = 1. \qquad (26)$$
Here, the sensitivity matrix $\mathbf{G}$ is estimated at each recursive step (iteration), similarly to the FDS-IKS. However, the error covariance $P_{\theta\theta}$ is also updated at each iteration. Hence, as indicated in (20), the covariance between any climatic variable and an input $\theta_i$ ($\sigma_{x_{ky}\theta_i}$) is also affected by the sensitivity of the climatic variable to other inputs. The FDS-MKS is a recursive direct method, as it attempts to solve the problem in a pre-specified finite sequence of iterations $N_l$. The sequences $\{\theta^l : l = 0, \ldots, N_l\}$ and $\{P^l_{\theta\theta} : l = 0, \ldots, N_l\}$ are defined inductively as follows:
$$\theta^{l+1} = \theta^l + \mathbf{K}^l\,\delta\hat{y}^l, \qquad (27)$$
$$P^{l+1}_{\theta\theta} = (\mathbf{I} - \mathbf{K}^l\mathbf{G}^l)\,P^l_{\theta\theta}, \qquad (28)$$
where for notational convenience $\delta\hat{y}^l = \hat{y}^o - \mathcal{G}(\theta^l)$ and
$$\mathbf{K}^l = P^l_{\theta\theta}\,(\mathbf{G}^l)^T\,[\mathbf{G}^l P^l_{\theta\theta} (\mathbf{G}^l)^T + \beta_{l+1}\hat{R}]^{-1}. \qquad (29)$$
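A minimal sketch of the FDS-MKS recursion, again reusing the fds_jacobian sketch above; note that, in contrast to the FDS-IKS, both theta and its error covariance are updated at each step and the observation-error covariance is inflated by beta_l:

```python
import numpy as np

def fds_mks(G, y_o, theta_b, P_b, R, sd, betas, sdfac=0.001):
    """FDS-MKS: repeated assimilation of the same observations with R
    inflated by beta_l; theta and P are both updated at each step."""
    betas = np.asarray(betas, dtype=float)
    assert np.isclose(np.sum(1.0 / betas), 1.0)  # condition (26)
    theta, P = theta_b.copy(), P_b.copy()
    for beta in betas:
        Gl = fds_jacobian(G, theta, sd, sdfac)   # sensitivities re-estimated
        dy = y_o - G(theta)
        S = Gl @ P @ Gl.T + beta * R             # inflated observation error
        K = P @ Gl.T @ np.linalg.inv(S)          # Eq. (29)
        theta = theta + K @ dy                   # Eq. (27)
        P = (np.eye(theta.size) - K @ Gl) @ P    # Eq. (28)
    return theta, P
```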

Regarding $\beta$, a possible step size approach is to set the inflation weight constant for all the iterations, which given (26) leads to $\boldsymbol{\beta} = N_l\,\mathbf{1}$, where the column vector $\mathbf{1} \in \mathbb{R}^{N_l}$ has all values set to 1. However, as the iterations proceed, the updated background covariance decreases, so the fractional increments get smaller. A more even distribution of fractional increments, with likely improved stability, can be given by decreasing inflation weights as the iterations proceed, so initial weights are relatively higher. Thus, among other possible solutions for the inflation factor at step $l$, here we adopt the expression
$$\beta_l = \frac{N_l\,(N_l + 1)}{2\,l}, \qquad (30)$$
which satisfies requirement (26). A numerical advantage of inflating $R$ for the multiple data assimilation approach is that it reduces the condition number of the matrix to be inverted in the assimilation at each iteration. A practical advantage of the FDS-MKS with respect to the FDS-IKS is that the number of iterations is predefined. With given computer resources and model computational throughput statistics, it is possible to evaluate how many ESM integrations are affordable and to set the FDS-MKS inflation weights and computing schedule accordingly. This does not mean, though, that the FDS-IKS would not converge closer to the (local) minimum of the cost function with the same iterations. Also, without specific consideration of constraint (26), the idea of inflating $R$ has been considered by previous studies as a mechanism to improve initial sampling for the EnKF (Oliver and Chen, 2008), and also to damp model changes at early iterations in Newton-like methods (Wu et al., 1999; Gao and Reynolds, 2006).
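A small numeric check of the two weighting strategies (constant weights versus the decreasing weights of the expression above, as reconstructed in Eq. 30); both satisfy condition (26):

```python
import numpy as np

N_l = 3
beta_const = np.full(N_l, float(N_l))        # constant: beta = N_l * 1
l = np.arange(1, N_l + 1)
beta_decr = N_l * (N_l + 1) / (2.0 * l)      # decreasing: higher early weights
for b in (beta_const, beta_decr):
    assert np.isclose(np.sum(1.0 / b), 1.0)  # both satisfy condition (26)
print(beta_decr)                             # [6. 3. 2.] for N_l = 3
```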

2.4.1 Early-stopped iterations for the FDS-MKS
The computational cost of the ESM integrations is much higher than that of the assimilation steps as considered in the FDS-MKS for a low-dimensional control vector. In this study, we do not evaluate adaptive strategies for the planning of the weights in the FDS-MKS. However, the evolution of the increments in the control variables along the iterations could potentially be used to guide the size of $\beta_l$ at each loop, and even to conduct an early stopping of the iterations. At each iteration, it is possible to compute the standard update using the corresponding pre-planned weight $\beta_l$, and simultaneously to compute an alternative update with early termination of the iterations by applying a completion weight $\beta^c_l$ that both terminates the iterations and fulfils the condition (26) as an alternative to $\beta_l$ in (30):
$$\beta^c_l = \left(1 - \sum_{j=1}^{l-1} \beta_j^{-1}\right)^{-1}. \qquad (31)$$
Comparison of the sequence of increments given by the fractional steps of the FDS-MKS with those of the simultaneous early-stopped solutions may be used to support replanning of the weights, and even to decide on an early stopping of the iterations, taking the update given by the completion weight as the final solution.
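A minimal sketch of the completion weight: if the pre-planned sequence is stopped at loop l, the completion weight closes the remaining fraction of condition (26) exactly:

```python
import numpy as np

def completion_weight(betas, l):
    """Completion weight at (1-based) loop l: with the fractions 1/beta_j
    already used in loops 1..l-1, beta_c closes condition (26) exactly."""
    used = np.sum(1.0 / np.asarray(betas, dtype=float)[: l - 1])
    return 1.0 / (1.0 - used)

betas = [6.0, 3.0, 2.0]              # pre-planned weights
print(completion_weight(betas, 2))   # 1/(1 - 1/6) = 1.2
```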

2.5 ETKF and Gaussian anamorphosis
The ensemble Kalman filter (EnKF) was introduced by Evensen (1994). It makes it possible to apply the Kalman filter to high-dimensional discrete systems, when the explicit storage and manipulation of the system state error covariance is impossible or impractical. The EnKF methods may be characterized by the application of the analysis equations given by the Kalman filter to an ensemble of forecasts. One of the main differences among the several proposed versions of ensemble Kalman filters is how the analysis ensemble is chosen. Ensemble square-root filters use deterministic algorithms to generate an analysis ensemble with the desired sample mean and covariance (e.g., Bishop et al., 2001; Whitaker and Hamill, 2002; Tippett et al., 2003). Here, in our experiments with global model parameters, we use the mean-preserving ensemble transform Kalman filter (ETKF), or "symmetric solution", described by Ott et al. (2004), and also referred to as the "spherical simplex" solution by Wang et al. (2004).
Still, for the (En)KF to be optimal, three special conditions need to apply: (1) Gaussianity in the prior, (2) linearity of the observation operator, and (3) Gaussianity in the additive observational error density. In order to better deal with nonlinearity, a number of studies have addressed the use of transformations of the model background and observations to obtain a Gaussian distribution, in such a way that the (En)KF can be applied under optimal conditions. This preprocessing transformation step is known as Gaussian anamorphosis (GA) (e.g., Chilès and Delfiner, 2012). The GA procedure was introduced into the context of data assimilation by Bertino et al. (2003), and has been applied for many years in the field of geostatistics (e.g., Matheron, 1973; Deutsch and Journel, 1998).
It is not standard, however, how the GA should be applied in the context of DA (Amezcua and Leeuwen, 2014). In a theoretical framework and with simple experiments, Amezcua and Leeuwen (2014) evaluated several approaches using univariate GA transformations. As a key point, they found that when any of the above conditions (1)-(3) is violated, the analysis step in the EnKF will not recover the exact posterior density in spite of any transformation. Also, they concluded that when ensemble sizes are small and the knowledge of the conditional $p_{y|x}(y|x)$ is not too precise, it is perhaps better to rely on independent marginal transformations for both a state variable $x$ and observation $y$ than on joint transforms. For field variables, one can consider them to have homogeneous distributions, so that each kind of model variable is transformed using the same monovariate anamorphosis function at all grid points of the model (e.g., Simon and Bertino, 2009, 2012), or one can apply local transformation functions at different gridpoints (Béal et al., 2010; Doron et al., 2011; Zhou et al., 2011). In both cases, the GA in these studies has been applied to the filtering problem. In the context of low-frequency past climate analysis, the temporal dimension has to be considered. For example, point (2) above would refer to the linearity in the generalized observation operator, which includes the model dynamics. Given the considerations in Amezcua and Leeuwen (2014), the sparsity of low-frequency paleoclimate records, and the lack of homogeneity in global ESM variables, here we follow the approach in Béal et al. (2010).

In our implementation of the ETKF we augmented the state vector with the model equivalent of the observations. We evaluated transformations of the control variables as well as transformations in sea surface temperature (SST), as the observed variable. We transformed the control variables marginally. Regarding SST, due to sparsity and heterogeneity, we consider that it is not possible to estimate the marginal distribution of the low-frequency paleoclimate observations with enough confidence to support a transformation. Thus, in our experiments we estimated the marginal distribution of the model equivalent of the SST observations, as derived from the background ensemble, and also used the same transformation for the SST observations. The transformation then operates on the marginals independently at each gridpoint:
$$\tilde{\xi} = \Phi_{\tilde{\xi}}^{-1}\left(P_\xi(\xi)\right), \qquad (33)$$
where $P_\xi(\xi)$ denotes the cumulative density function (cdf), and $\Phi_{\tilde{\xi}}(\cdot)$ explicitly indicates that the cdf in the transformed space is that of a Gaussian random variable. For comparison, Eq. (33) corresponds to transformation (c) in Amezcua and Leeuwen (2014). Tests with the standard ETKF are included in the two experiments below. A test with the ETKF including GA as just described is included in experiment 2, with CESM.
As indicated, here we use empirical cumulative density functions (cdf's) for the anamorphosis, based on the background ensemble. The risk of using the tails of the transformation function during the anamorphosis of the ensemble is significant, and tail estimation can highly impact the analysis. Here, we obtained linear tails following Simon and Bertino (2009, 2012), which consists of extrapolating to infinity the first and last segments of the interpolation function with the same slope. In practice, we just extrapolated, in each direction, until twice the original range of the Gaussian variable $\tilde{x}$. Then, we truncate (only) the physical coordinate in tail points of the transformation function to its physical bound in case it is exceeded. Finally, we set the first two moments of the target Gaussian variable $\tilde{x}$ to those of the original ensemble (see section 4.4 in Bertino et al., 2003).
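A minimal sketch of an empirical anamorphosis in the spirit of Eq. (33), with a simplified version of the linear-tail extension described above (the exact tail-extension length, truncation and moment-matching rules in the text are not reproduced):

```python
import numpy as np
from scipy.stats import norm

def anamorphosis(ens):
    """Empirical Gaussian anamorphosis from a 1-D ensemble: a piecewise
    linear map between the ensemble cdf and Gaussian quantiles, with
    linearly extended tails."""
    xs = np.sort(np.asarray(ens, dtype=float))
    m = xs.size
    zs = norm.ppf((np.arange(1, m + 1) - 0.5) / m)    # Gaussian quantiles
    rng_x = xs[-1] - xs[0]
    slope_lo = (xs[1] - xs[0]) / (zs[1] - zs[0])      # slope of first segment
    slope_hi = (xs[-1] - xs[-2]) / (zs[-1] - zs[-2])  # slope of last segment
    xs = np.concatenate(([xs[0] - rng_x], xs, [xs[-1] + rng_x]))
    zs = np.concatenate(([zs[0] - rng_x / slope_lo], zs, [zs[-1] + rng_x / slope_hi]))

    def fwd(x):   # physical space -> Gaussian space
        return np.interp(x, xs, zs)

    def bwd(z):   # Gaussian space -> physical space
        return np.interp(z, zs, xs)

    return fwd, bwd

ens = np.random.default_rng(2).gamma(2.0, size=60)    # skewed toy ensemble
fwd, bwd = anamorphosis(ens)
print(np.allclose(bwd(fwd(ens)), ens))                # exact round trip at knots
```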

3 Experiment 1: Ebm1D

3.1 Model description
This experiment is based on a conceptual one-dimensional, South-North, energy-balance model (Ebm1D), for which Paul and Losch (2012) (hereafter PL2012) developed an adjoint code. The control vector is made up of five model parameters (Table 1). Here we summarize the model in relation to these parameters, and the reader is referred to PL2012 for a thorough description of all model parameters and related equations. The ocean mixed layer depth, $H_o$, controls the effective heat capacity of the atmosphere-ocean system. $A$ is a constant term in the calculation of the outgoing longwave radiation $F^\uparrow_\infty$, which also depends linearly on the surface temperature and the logarithm of the ratio of the actual value of the atmospheric CO$_2$ concentration to a reference value (Eq. 6 in PL2012). Meridional heat transport is treated as a diffusive process driven by latitudinal temperature gradients, where the horizontal heat transport depends linearly on a diffusion coefficient $K(x) = K_0\,(1 + K_2 x^2 + K_4 x^4)$, where $K_0$, $K_2$, and $K_4$ are the remaining three parameters included in the control vector (Table 1), and $x = \sin\Phi$, where $\Phi$ is latitude.

3.2 Observations and cost function
As observations, we took the surface air temperature (SAT) derived from the NCEP/NCAR reanalysis data (Kalnay et al., 1996). From the reanalysis data we first calculated global zonal means of SAT. Then, we obtained SAT means for present-day climate at each grid cell (i.e., latitude) for winter (January, February and March; JFM) and summer (July, August and September; JAS) in the Northern Hemisphere. These zonal averages of SAT, $T_s$, were the target for the analysis. The means of the last 10 years, out of 100 years of model integration, were taken as the model equivalent of the observations. That is, each grid cell in the 1D model has one observation and model equivalent for winter (JFM) and similarly for summer (JAS) in the cost function, defined by:
$$J(\theta) = \frac{1}{2}\,\|\theta - \theta^b\|^2_{(P^b_{\theta\theta})^{-1}} + \frac{1}{2}\,(\hat{y}^o - \mathcal{G}(\theta))^T\,\mathbf{W} R^{-1}\,(\hat{y}^o - \mathcal{G}(\theta)),$$
where $\mathbf{W}$ is a diagonal matrix, whose diagonal is a vector of weights $w \in \mathbb{R}^p$ given to the observations. This cost function can be written as
$$J(\theta) = \frac{1}{2}\,\|\theta - \theta^b\|^2_{(P^b_{\theta\theta})^{-1}} + \frac{1}{2}\,\|\hat{y}^o - \mathcal{G}(\theta)\|^2_{\tilde{R}^{-1}}, \quad \tilde{R} \equiv \mathbf{W}^{-1} R.$$
The observational target and control variables are identical to those in PL2012, but they did not include a background term $J_b$ in the cost function. As PL2012, we assumed that observation errors are uncorrelated ($R$ is diagonal), with all observations having a standard deviation $\sigma_{T_0} = 1\,°$C. The explicit division of the norm for $J_o$ in terms of $R^{-1}$ and $w$ facilitates the comparison of scenarios as a function of increasing observational weight.
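A minimal sketch of the observation term with weights, assuming the zonal-band areas are approximated as proportional to the cosine of latitude (our reading of "proportional to the area of the zonal band" in the PD1 tests described below); all names illustrative:

```python
import numpy as np

def J_obs(y_o, y_model, w, sigma=1.0):
    """Observation term J_o = 0.5 * d^T W R^{-1} d, with W = diag(w)
    and R = sigma^2 I (uncorrelated errors, sigma = 1 degC)."""
    d = np.asarray(y_o) - np.asarray(y_model)
    return 0.5 * np.sum(w * d**2) / sigma**2

lat = np.arange(-85.0, 90.0, 10.0)      # centres of the 10-degree zonal bands
w_band = np.cos(np.deg2rad(lat))        # band area ~ cos(latitude)
w = np.concatenate([w_band, w_band])    # one weight per JFM and per JAS obs
w /= w.sum()                            # PD1: sum of weights = 1
```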

3.3 Experimental setup
As in PL2012, we set the grid resolution to 10°. We assumed a diagonal $P^b_{\theta\theta}$, with standard deviations given in Table 1, which we considered reasonable. Other than the parametric uncertainty, we considered a perfect-model assumption, which is overly optimistic in this specific case as, in addition to the 1D Earth climate representation, there are strongly simplified physics in the energy balance model. While PL2012 also assume this perfect-model framework, they point out a number of specific structural model errors. Thus, the control variables will attempt to compensate for the unaccounted error in either of the evaluated estimation approaches.
For the PD1 tests, we made the observation weights $w$ proportional to the area of the zonal band (i.e., decreasing toward the poles), with $\sum_{i=1}^p w_i = 1$, and we compared the FDS-MKS, the FDS-IKS, the ETKF ($m = 60$ members) and 4D-Var. We evaluated a two-step and a three-step FDS-MKS (a one-step FDS-MKS equals an FDS-EKS, or first iteration of the FDS-IKS).

To evaluate the resilience of the FDS schemes to large perturbations, we conducted three tests with different perturbation sizes for each FDS scheme. In each one, the perturbation applied to each of the control variables was proportional to its background standard deviation by a factor sdfac, with sdfac ∈ {0.001, 0.01, 0.1}. The evaluation of the cost function for the ETKF (as a smoother) was conducted with a single forward integration of the mean of the posterior control variables.
In the 4D-Var minimization, as PL2012, we used a variable memory quasi-Newton algorithm as implemented in M1QN3 by Gilbert and Lemaréchal (1989), and to compute the gradient we used a discrete adjoint approach with the tangent and adjoint codes generated automatically by the "Transformation of Algorithms in Fortran" tool (TAF, http://www.autodiff.org; Giering and Kaminski, 1998; Giering et al., 2005). The number of simulations can be higher than the number of iterations, as the minimizer M1QN3 takes a step size determined by a line search that sometimes reduces the initial unit step size (see Gilbert and Lemaréchal, 1989). For 4D-Var, as a stopping criterion we required a relative precision on the norm of the gradient of the cost function of $10^{-4}$.
We conducted a number of additional tests to compare the convergence of the FDS-IKS versus the FDS-MKS as higher weight is given to the observations. These were named PD2 and PD3, corresponding to $\sum_{i=1}^p w_i = 3$ and $\sum_{i=1}^p w_i = 5$, respectively. As these weights increase, the effect of the regularization by $J_b$ decreases, and one can expect the convergence of the Gauss-Newton scheme (the FDS-IKS) to be more difficult. A few of these additional tests evaluate how (or whether) the convergence of the FDS-IKS can be improved (mostly in a low-regularization situation) by increasing the number of perturbations per parameter (that is, the ensemble size). The results of these additional tests are briefly described here and expanded in Appendix A.

3.4 Results
Here we provide a succinct summary of the estimation process. A broader explanation of the model climatology in relation to the control variables is given in PL2012. The background sensitivity of the 10-yr mean surface temperature $T_s$ to the control variables is shown in Fig. 1, where, to ease comparison, the sensitivity matrix $\mathbf{G}$ is scaled by multiplying each of its columns by the assumed background standard deviation of the corresponding parameter. Fig. 1a shows mean ensemble sensitivities estimated from the background ensemble for the ETKF ("Ebm1D: ensemble sensitivity"), and Fig. 1b shows local finite difference sensitivities ("Ebm1D: FD sensitivity", sdfac = 0.001).

Table 1 (surviving fragment; entries given as (mean, standard deviation)):
Diffusion coefficients
  K0  (1.5 × 10^5, 1.5 × 10^5) m^2 s^-1   Constant term
  K2  (-1.33, 0.75)                       Second-order coefficient (North et al., 1983)^1
  K4  (0.67, 0.6)                         Fourth-order coefficient (North et al., 1983)^1
^1 References are just for the mean values.

[...] (Table 3 in PL2012). In any case, Fig. 2 shows that the 4D-Var convergence became apparently similar to that of the FDS-IKS tests from simulation 40 onwards. In Fig. 2, the three convergence series for the FDS-IKS with the three different perturbation factors [...] as in Wu et al. (1999) or Gao and Reynolds (2006). Alternatively, one can conduct a line search along the direction given by the FDS-IKS increments at each iteration. Further details for this experiment are in Appendix A, focusing on the convergence of the FDS-IKS versus the FDS-MKS as the observational weight increases.

4 Experiment 2: CESM

For the purpose of this study, it is not so important that the actual climate of the model matches that of the LGM, but that the case study is realistic from the estimation point of view. Thus we just make use of the MARGO database characteristics (location, seasonality and uncertainty), but conduct a synthetic experiment for preindustrial climate conditions. To do so, before starting the experiment we spun up CESM for 1200 yr, starting from the Levitus climatology with standard preindustrial conditions, to reach an equilibrium state. Then, we used the restart files from the end of the spin-up time to create a 60 yr control simulation (as synthetic truth), in which, in addition to the preindustrial forcings and boundary conditions, we added a flux term to the ocean, as detailed below.
To create the background ensemble we perturbed a number of parameters for the (deterministic) physics in both the ocean and the atmosphere components, as well as input greenhouse gases and an additional influx of water into the North Atlantic Ocean. As indicated in the introduction, the selected control variables are responsible for generating all the background uncertainty in a perfect-model scenario, and through the assimilation they will try to compensate for any unaccounted model error elsewhere. In a step-by-step approach, here all perturbed model parameters and forcings were included as control variables in the assimilation. An obvious (still synthetic) and very useful extension would be to perturb a wider set of model parameters and/or forcings and boundary conditions (e.g., various ice sheet configurations, or alternative freshwater influxes) and explicitly evaluate the compensation effect and climate reconstruction results by using subsets of the perturbed inputs as the control vector.
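As a sketch of this step, the following draws a background ensemble from the assumed Gaussian background statistics, enforcing physical bounds by resampling; the parameter names, values and bounds are hypothetical and not the actual CESM configuration. Bounds of this kind are also what later introduces the background deviations from Gaussianity discussed in the Gaussian anamorphosis section.

    # Minimal sketch (hypothetical names/values): drawing a background
    # ensemble of control variables from the assumed Gaussian background
    # statistics, with physical bounds enforced by resampling.
    import numpy as np

    rng = np.random.default_rng(0)
    # (mean, sd, lower bound, upper bound) -- illustrative values only
    priors = {
        "cam.cldfrc_rhminl": (0.91, 0.02, 0.80, 0.99),
        "pop.bckgrnd_vdc1":  (0.16, 0.05, 0.01, 1.00),   # cm^2/s, hypothetical sd
        "gis_freshwater":    (0.00, 0.03, -0.20, 0.20),  # Sv, hypothetical
    }

    def sample_ensemble(priors, m):
        ens = {}
        for name, (mu, sd, lo, hi) in priors.items():
            x = rng.normal(mu, sd, size=m)
            while np.any((x < lo) | (x > hi)):           # resample out-of-bounds draws
                bad = (x < lo) | (x > hi)
                x[bad] = rng.normal(mu, sd, size=bad.sum())
            ens[name] = x
        return ens

    ensemble = sample_ensemble(priors, m=60)   # one model run per member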
Here, the selected parameters for model physics and radiative constituents are relevant to the global energy budget of the Earth system, but they are not necessarily the most sensitive model inputs for multidecadal and longer scales. In real cases, the selection of control variables (if the control vector is to be kept low-dimensional) should be done carefully and generally be based on previous global sensitivity analyses.
We added an influx of water into the North Atlantic from melting of the Greenland ice sheet (GIS) to the true run and included it as a control variable. This flux was homogeneously distributed along the coast of Greenland and at the ocean surface. It is appealing to explore as a control variable because the Atlantic meridional overturning circulation (AMOC) plays a critical role in maintaining the global ocean heat and freshwater balance, and it is commonly acknowledged that North Atlantic deep-water (NADW) formation is key in sustaining the AMOC (e.g., Delworth et al., 1997), while in turn freshwater flux in the North Atlantic, along with surface wind forcing, ocean tides and convection, provides the energy for the NADW formation (e.g., Gregory and Tailleux, 2011). Adding this freshwater flux (or freshwater hosing) makes the identification of the model parameters more complicated, but it is realistic to expect that current paleoclimate melting estimates may hold some bias, and it is useful to know how the evaluated schemes deal with this possibility. In real cases, flux terms have been used in paleoclimate modelling as a means to account for model errors; that is, they relax the perfect-model assumption in a parametric way. Here, the estimated flux term attempts to correct the mean state towards the observations along with the model parameters. As far as the authors know, this is the first experiment with a comprehensive ESM that attempts (even in a synthetic way) a joint flux and model parameter estimation for climate field reconstruction, as these are more commonly seen as competing strategies. We initiated the background with control variables biased with respect to the truth and a zero-mean Greenland ice sheet freshwater flux. We used reasonable uncertainties in the control variables derived from previous publications. Separate analyses (weakly coupled assimilation) for different model components (atmosphere, ocean, land) may be inconsistent. In our setup, all observations are allowed to directly impact model parameters from any component in the Earth system model, which is known as strongly coupled data assimilation. Both the truth and the background simulations were branched from the same initial conditions, which allowed us to use relatively short integration times (60 yr) in the experiment. In a real case with steady-state forcings (e.g., estimation of the real LGM climate state by assimilating the MARGO database), the model should be integrated even longer, towards quasi-equilibrium, to ensure that errors in the initial conditions do not affect the analysis (or they should be accounted for). Also, each model equivalent of the observations has to be mapped into the corresponding spatio-temporal domain of each paleoclimate proxy observation. Similar to the previous experiment, for the FDS schemes we set the perturbation for each control variable equal to its standard deviation multiplied by a perturbation factor sdfac. For computational reasons we only tested sdfac ∈ {0.001, 0.1}.
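A minimal sketch of this one-sided FDS construction follows; run_model is a placeholder for a full ESM integration returning the model equivalents of the observations (e.g., 20-yr mean SST at the proxy locations), and the toy function below is only for illustration.

    # Minimal sketch of the one-sided finite-difference sensitivity (FDS):
    # each control variable is perturbed by sdfac times its background
    # standard deviation, one model run per perturbation.
    import numpy as np

    def fds_sensitivity(run_model, theta, sd, sdfac=0.1):
        """Return G (n_obs x n_params) about the current estimate theta."""
        h0 = run_model(theta)                   # unperturbed (control) run
        G = np.empty((h0.size, theta.size))
        for j in range(theta.size):
            dtheta = sdfac * sd[j]
            theta_p = theta.copy()
            theta_p[j] += dtheta
            G[:, j] = (run_model(theta_p) - h0) / dtheta
        return G

    # toy stand-in for the ESM plus observation operator
    run_model = lambda th: np.array([np.sin(th[0]) + th[1]**2, th[0] * th[1]])
    G = fds_sensitivity(run_model, np.array([0.3, 1.0]), sd=np.array([0.1, 0.2]))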
The cost function was as in Eq. (17); the set of control variables used for the experiment is summarised in Table 4.

Sections 4.2 and 4.3 give brief information on the atmospheric and ocean components of CESM as used in this experiment.
For the rest of the model components we used default configurations for the indicated CESM compset. Given that adjoint codes are not available for CESM, here we alternatively tested an ETKF (with m = 60 ensemble members) including Gaussian anamorphosis (ETKF-GA) as a possible nonlinear approach, which has a negligible extra cost over a standard ETKF. We also evaluated the 3-step FDS-MKS, the FDS-IKS (with a maximum of 3 iterations), and the ETKF, also with m = 60.
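For reference, the following is a minimal deterministic ETKF analysis step in parameter space, following the common square-root formulation with no localization or inflation; this is a sketch under those assumptions and may differ from the exact implementation used here.

    # Minimal sketch of a deterministic ETKF analysis step in parameter
    # space: X is the (q x m) ensemble of control vectors and HX the
    # (n_obs x m) ensemble of model equivalents, each column of which
    # requires one ESM integration.
    import numpy as np

    def etkf_update(X, HX, y, R_inv):
        m = X.shape[1]
        xm, hm = X.mean(axis=1, keepdims=True), HX.mean(axis=1, keepdims=True)
        Xp, Yp = X - xm, HX - hm                       # perturbations
        C = Yp.T @ R_inv                               # (m x n_obs)
        A = (m - 1) * np.eye(m) + C @ Yp               # inverse ensemble-space cov
        lam, Q = np.linalg.eigh(A)
        Pa_tilde = Q @ np.diag(1.0 / lam) @ Q.T        # ensemble-space analysis cov
        W = Q @ np.diag(np.sqrt((m - 1) / lam)) @ Q.T  # symmetric square-root transform
        wa = Pa_tilde @ C @ (y - hm.ravel())           # mean-update weights
        return xm + Xp @ (wa[:, None] + W)             # analysis ensemble

    # toy usage: m = 60 members, q = 3 parameters, 5 observations
    rng = np.random.default_rng(1)
    X = rng.normal(size=(3, 60)); HX = rng.normal(size=(5, 60))
    Xa = etkf_update(X, HX, rng.normal(size=5), np.eye(5))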

CAM
We used the Community Atmosphere Model version 4 (CAM4) as the atmospheric general circulation model (AGCM) component.

A comprehensive description of CAM4 can be found in Neale et al. (2011). Precipitation and the associated latent heat release drive the Earth's hydrological cycle and atmospheric circulations, and many model processes in AGCMs, including deep and shallow convection and stratiform cloud micro- and macrophysics, are responsible for the partitioning of precipitation through competition for moisture and cooperation in precipitation generation (Yang et al., 2013). Cumulus convection is a key process for producing precipitation and redistributing atmospheric heat and moisture (Arakawa, 2004), and consequently for the global radiative budget (Yang et al., 2013). Since AGCMs are unable to resolve the scales of convective processes, various convection parameterization schemes (CPSs) have been developed based on different types of assumptions. A CPS usually includes multiple tunable parameters, which are related to the subgrid-scale internal physics and are thought to have wide ranges of possible values (e.g., Yang et al., 2012). Also, the dependence of CPS parameters on model grid size and climate regime is an important issue for weather and climate simulations (Arakawa et al., 2011). In addition, AGCMs include parameterizations of macrophysics, microphysics, and subgrid vertical velocity and cloud variability to simulate subgrid stratiform precipitation.
Here we used CAM4 with the Zhang and McFarlane (1995, hereafter ZM) deep convection scheme and the Hack (1994) shallow convection scheme. For the representation of stratiform microphysics we used the scheme by Rasch and Kristjánsson (1998), a single-moment scheme that predicts the mixing ratios of cloud droplets and ice. Regarding cloud emissivity, clouds in CAM4 are grey bodies with emissivities that depend on cloud phase, condensed water path, and the effective radius of ice particles. By default, the CAM4 physics package uses prescribed gases except for water vapor. In CAM4, the principal greenhouse gases whose longwave radiative effect is included are H2O, CO2, O3, CH4, N2O, CFC11, and CFC12. CO2 is assumed to be well mixed. As the use of prescribed species distributions is computationally less expensive than prognostic schemes, for long-term paleoclimate analysis we would generally favour the use of prescribed greenhouse gases, for example as given by the recently published 156-kyr history of the atmospheric greenhouse gases by Köhler et al. (2017). Still, we acknowledge that these emerging datasets have an associated uncertainty and that it is generally appropriate to include the most influential constituents as control variables in the climate analysis so their errors can be estimated as part of the assimilation.
In this study, as perturbed parameters and control variables we selected parameters related to the ZM deep convection scheme and the relative humidity thresholds for low and high stable cloud formation. Also, within the radiative constituents, we included invariant surface values of CO2 and CH4 as control variables. Table 4 shows the control variables in both CAM and POP2, and Table 5 shows the true-run values in column x_t.
This experiment is based on equilibrium simulations. Regarding real cases in transient conditions, θ could, e.g., include a constant error term (i.e., a bias) for a transient greenhouse gas dataset used as a prescribed radiative constituent in the atmospheric model (e.g., from Köhler et al., 2017). The estimated bias would be updated for successive DAWs. One could think of more complicated autocorrelated error models for the greenhouse gas dataset (e.g., García-Pintado et al., 2013), but it seems highly unlikely to us that available proxy datasets for low-frequency climate variability can constrain errors beyond simple biases in GHG forcing. We did not evaluate any parameter related to the indirect effects of aerosols on cloud nucleation and autoconversion, even though the overall effect of aerosols on cloud albedo and cloud lifetime, and thus on climate, remains largely uncertain (Chuang et al., 2012).
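As a minimal sketch of this constant-bias error model (with illustrative numbers only, not values from Köhler et al., 2017): a scalar control variable shifts the whole prescribed series within a data assimilation window (DAW), and its analysed value would be carried over to the next DAW.

    # Minimal sketch of the constant-bias error model: theta_bias shifts a
    # prescribed transient CO2 series for the whole DAW; series values are
    # illustrative only. A more complex alternative would be an
    # autocorrelated (e.g. AR(1)) error term.
    import numpy as np

    co2_prescribed = np.array([190.0, 191.5, 193.0, 195.0])  # ppm, e.g. an ice-core record
    theta_bias = 2.4                                          # ppm, estimated by the assimilation
    co2_forcing = co2_prescribed + theta_bias                 # forcing seen by the model in this DAW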

POP2
As the ocean component, we used POP2 (Smith et al., 2010). Its subgrid-scale mixing parametrization includes horizontal diffusion and viscosity and vertical mixing. For horizontal diffusion we chose an anisotropic mixing of momentum and the Gent and McWilliams (1990) parameterization, which forces the mixing of tracers to take place along isopycnic surfaces, with activated submesoscale mixing. The main drawback of the Gent and McWilliams (1990) scheme is that it nearly doubles the running time with respect to other, simpler schemes. For vertical mixing, we chose the K-profile parameterization (KPP) of Large et al. (1994). In the KPP mixing, interior mixing coefficients (viscosity and diffusivity) are computed at all model interfaces on the grid as the sum of individual coefficients corresponding to a number of different physical processes. The first coefficients are denoted the background diffusivity κ_ω and background viscosity υ_ω (not to be confused with the "background" in assimilation terminology), which represent diapycnal mixing due to internal waves and other mechanisms in the mostly adiabatic ocean. Other coefficients are associated with shear instability mixing, convective instability, and diffusive convective instability. The background diffusivity is allowed to vary with depth, but here we assumed a depth-constant vertical diffusivity κ_ω = bckgrnd_vdc1, where bckgrnd_vdc1 is a model input parameter. The model then computes υ_ω = Pr κ_ω, where Pr is the dimensionless Prandtl number (set to Pr = 10 in the model).
As control variables in POP2 we chose the Gent–McWilliams isopycnic tracer diffusion parameter and the (depth-constant) KPP background diffusivity, both with default values for the truth. A third control variable in POP2 was the total freshwater influx from the Greenland ice sheet, which we distributed homogeneously along the coast of Greenland and only at the ocean surface.

Observations
The observational dataset is composed of point samples of climate averages for the last 20 years of the total 60-yr integration time of the true simulation. The synthetic observations were located at the horizontal locations and 10 m depth of the MARGO database, and their sampling characteristics reproduce those of MARGO. The MARGO database is a synthesis of six different proxies and is considered to represent the combined expertise of at least a sizeable fraction of the LGM paleocommunity.
The observational uncertainty was taken from the MARGO database as input to the assimilation, but we did not add any error to the synthetic observations. MARGO provides observations (or reconstructions) of near-sea-surface temperature (SST) for the Last Glacial Maximum (LGM). The proxy types on which the SST estimates are based are (a) microfossil-based: planktonic foraminifera, diatom, dinoflagellate cyst and radiolarian abundances, and (b) geochemical paleothermometers: alkenone unsaturation ratios (U^K_37) and planktonic foraminifera Mg/Ca. Details of the database are given in MARGO Project Members (2009).
In summary, MARGO provides seasonal means for Northern Hemisphere winter (January, February and March; JFM) and summer (July, August and September; JAS), as well as annual means. However, the data availability for each of the three temporal means (winter, summer, and annual) differs for each proxy type. Specifically, diatoms are only available for Southern Hemisphere summer; dinoflagellates, foraminifera, and Mg/Ca are available for the three temporal means; and U^K_37 is only available as annual means. The observation errors are assumed uncorrelated, but each individual record in MARGO contains a specific uncertainty. Mapped into the SST space, the range of standard deviations in MARGO is between 0.79 and 4.87 °C, with relatively homogeneous uncertainty ranges among proxy types. Fig. 3 shows the type and location of the proxy data. In addition to the locations and uncertainty, we emulated the temporal mean availability of the observations, with all temporal means calculated over the last 20 yr of the integration time of the true simulation. Thus, the synthetic observations in the experiment, as well as those in MARGO, impose a less restrictive constraint not only in areas in which observations are more sparse, but also in those locations for which just one season or just annual means are available.
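A minimal sketch of this synthetic observation operator follows; the array layout and index conventions are placeholders, not the actual CESM/MARGO processing code.

    # Minimal sketch of the synthetic observation operator: 20-yr seasonal
    # (JFM, JAS) or annual means of model SST, sampled at MARGO-like proxy
    # locations; which mean is used depends on the proxy type, mimicking
    # the availability in MARGO.
    import numpy as np

    SEASONS = {"JFM": [0, 1, 2], "JAS": [6, 7, 8], "ANN": list(range(12))}

    def model_equivalent(sst, iy, ix, season, n_years=20):
        """sst: monthly model SST, shape (years, 12, ny, nx); last n_years used."""
        block = sst[-n_years:, SEASONS[season], iy, ix]   # months at the site
        return block.mean()

    # e.g. a diatom site only provides the JFM (Southern Hemisphere summer) mean:
    sst = np.random.default_rng(2).normal(15.0, 1.0, size=(60, 12, 4, 4))
    y_model = model_equivalent(sst, iy=2, ix=3, season="JFM")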

Nonlinearity and Gaussian anamorphosis transformations
The need for nonlinear estimation is justified by the assumed nonlinear relationship between the control variables and the observation space. In this experiment, nonlinearity is imposed by the Earth system model, which directly generates SST, the observed variable. In a more general case of past climate analysis, the forward operator (proxy system model) can impose further nonlinearity when the observed variables are direct proxy records (e.g., foraminiferal counts, tree-ring widths, speleothems, etc.). In addition to model and forward operator nonlinearity, non-Gaussianity in the control variables also renders the (En)KF non-optimal. Here we conducted a test with the ETKF including the Gaussian anamorphosis (GA) transformation (ETKF-GA). The test transforms the control variables, whose background deviations from Gaussianity here derive from imposed bounds, and the SST in both the model equivalents of the observations and the observations themselves, with the strategy explained in section 2.6. In this section, we show an example of non-Gaussianity and nonlinearity in this experiment, which are the motivation for the iterative FDS schemes evaluated in this manuscript as well as for testing the Gaussian anamorphosis. Specific estimation results are given in section 4.5. Note that no transformation has been applied in the FDS schemes.
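A minimal sketch of such a marginal Gaussian anamorphosis is given below, under the common empirical-CDF construction (the exact transform of section 2.6 may differ): ensemble values are mapped through their empirical CDF onto standard Gaussian quantiles, and the observation is transformed with the same CDF estimated from the model equivalents, with simple clipping at the tails, which is precisely where the transform is most delicate.

    # Minimal sketch of a marginal Gaussian anamorphosis: values are mapped
    # through the ensemble's empirical CDF onto standard Gaussian quantiles;
    # the observation is transformed with the same CDF (estimated from the
    # model equivalents), with simple clipping at the tails.
    import numpy as np
    from scipy.stats import norm

    def anamorphosis(ensemble, values):
        """Map 'values' to Gaussian space via the ensemble's empirical CDF."""
        xs = np.sort(ensemble)
        m = xs.size
        cdf = (np.arange(1, m + 1) - 0.5) / m                       # plotting positions
        u = np.interp(values, xs, cdf, left=cdf[0], right=cdf[-1])  # clip tails
        return norm.ppf(u)

    rng = np.random.default_rng(3)
    hx = rng.gamma(2.0, 1.5, size=60)        # skewed ensemble of model equivalents
    hx_g = anamorphosis(hx, hx)              # transformed ensemble (approx. Gaussian)
    y_g = anamorphosis(hx, np.array([7.5]))  # observation mapped with the same CDF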
In the ETKF-GA test, we conducted the marginal Gaussian anamorphosis for all the variables in the control vector. Table 3 shows the p-values of the Shapiro–Wilk normality test (Shapiro and Wilk, 1965) conducted on the original (raw) control vector and on the anamorphosed one (ana). The second column shows that Gaussianity is improved in all cases, most of them even reaching a p-value = 1. We note that, although the GA was applied to the whole control vector, the raw samples of the background control variables also had a p-value > 0.05 in all cases. For the SST observations, we evaluated the marginal normality of the raw SST with the Shapiro–Wilk test. Then, we only applied the GA transformations at those locations where the background SST deviated significantly from Gaussianity.

The minimum relative humidity for low stable cloud formation (the CAM.cldfrc_rhminl parameter), given its background uncertainty, was shown to have a strong effect on SST in the experiment. In most locations of the global ocean the marginal background SST had a non-Gaussian but still unimodal probability density function. Especially complicated was the North Atlantic, with strongly bimodal background distributions in some locations. One of these cases is depicted in Fig. 4. As shown in Table 3, the CAM.cldfrc_rhminl parameter had a sharp increase in its background Gaussianity in the anamorphosed variable, and Fig. 4a describes the transformation of SST at a location in the North Atlantic Ocean. Specifically, this is the observation with the highest negative innovation in the experiment, and it seems to represent the most complicated case in the test. Apart from the high (negative) innovation, the raw observation falls well into the tail of the transformation, so its forward and backward transformations are rather sensitive to the treatment of the tails in the anamorphosis function, as mentioned in section 2.6 and in Simon and Bertino (2009, 2012). In addition, there is the effect of the bimodality in SST in this case. As a result, despite the improved marginal Gaussianity of both transformed variables, their joint distribution is not bivariate Gaussian, as seen in Fig. 4b. This generic possibility is also very well described by Amezcua and Leeuwen (2014). Appendix B shows an example of the more general case, where the Gaussian anamorphosis does not show specific issues. Table 5 shows the values of the control variables for the true simulation, x_t, the background (or prior), and the posterior estimates for the evaluated schemes. The ETKF resulted in J(θ_a) = [...], and the ETKF-GA in a very similar J(θ_a) = 66.8. As expected, the FDS-EKS (or first iteration of the FDS-IKS) resulted in the highest cost value, with J(θ_a) = 93.13, as it is linear (like the ETKF) and the reduced FDS exploration of the sensitivity is noisier than that of the ETKF with m = 60 members. The cost function for the three iterations of the FDS-IKS is shown in Table B1.

Sensitivities and minimization
As seen, regarding the cost function, the Gaussian anamorphosis (as implemented here) did not improve the minimization with respect to the ETKF, although the transformation served to obtain a lower value for the observational term J_o. Out of several transformation possibilities, Amezcua and Leeuwen (2014) concluded that using the cdf estimated from the ensemble model equivalents of the observations to also transform the observations (as done here, given the considerations indicated in section 2.6) was the worst option. In any case, we have also seen here a specific example showing that, even if marginal Gaussianity is improved in both the transformed control variables and the model equivalents of the observations, the joint probability is not guaranteed to be bivariate Gaussian (Fig. 4). It is also interesting to note that the slope of the scatterplot examples shown in Fig. 4 is positive for the two separate branches off an apparent switch occurring in SST (with a sharp negative step in SST around CAM.cldfrc_rhminl = 0.88) at this location in the North Atlantic. Thus, the slope of a linear regression of this relation would have been clearly negative, as opposed to the slopes in the two individual branches. This switch in SST seems to be associated with a displacement of the North Atlantic gyre and (possibly coupled) cloud development. This generally negative slope of SST with respect to CAM.cldfrc_rhminl is clearly shown at the given location in the ensemble sensitivity plot in Fig. 6. The switch also indicates that the background values and perturbation sizes are key for FDS estimation, and it supports why the corresponding SST plot for the FDS in Fig. 6 differs substantially from the ensemble sensitivity at the given location (note that the sensitivity about the background is the same for both FDS schemes, the FDS-MKS and the FDS-IKS).
Additional sensitivity plots (not shown) also indicate a positive mean ensemble sensitivity of both the total cloud cover and the large-scale precipitation with respect to CAM.cldfrc_rhminl at this location, while the sensitivity of the cloud cover to CAM.cldfrc_rhminl becomes negative for latitudes lower than 50° N in the North Atlantic. A major point here is that the Gaussian anamorphosis cannot resolve the strongly nonlinear relationship in this case. Also, the MARGO coverage is far denser in the North Atlantic than in the rest of the global ocean. Thus, the updates of the global parameters in the control vector are strongly influenced by the sensitivities in this area. The MARGO coverage in the North Atlantic can be adequate for the analysis of the glacial climate at the LGM, and here it has been useful to show that the anamorphosis, as applied here, has not been able to resolve the strong nonlinearities found in the North Atlantic area. Still, in other conditions with available proxy time series, one could possibly apply alternative transformations for the observations based on their own statistics, and results could perhaps improve. Also, a further evaluation of the specific contribution from each observation to the increments could serve to design quality controls and improved ensemble approaches. By only looking at the posterior estimates in Table 5, it does not seem possible to derive any general conclusion about the benefit of the Gaussian anamorphosis in ensemble Kalman schemes for multidecadal climate analysis in view of this experiment, but further exploration is needed.
The lower cost function values of the FDS schemes with respect to the ETKF (with/without GA) suggest a benefit in the more limited (and noisier) but iterated local sensitivity estimation. Also, the computational cost of the FDS tests was about half that of the ETKF (and ETKF-GA). Regarding the estimation of specific control variables, all of the evaluated schemes had some variables for which the estimation, starting from the background, went in the wrong direction with respect to the true values.
For example, the closest estimate to the true value of the relative humidity threshold for high stable cloud formation (cldfrc_rhminh) was given by the FDS-MKS, with a slight overshooting (0.81 versus 0.80 for the truth). This slight overshooting may have partially compensated for the effect of other control variables. Thus, the FDS-MKS estimates of the freshwater influx from Greenland went in the wrong direction, as did the estimates of the autoconversion coefficients in the Zhang–McFarlane deep convection scheme. On the other hand, the FDS-IKS had a total increment in cldfrc_rhminh in the wrong direction, but had the total increment of the Greenland influx in the right direction. The ETKF did not show any overshooting, but had some control variable increments going in the wrong direction. For the ocean background vertical diffusivity, the only two schemes for which there was some, albeit minor, improvement in the estimate were the ETKF and the FDS-IKS. Still, the improvement is so slight in these cases that it could be a random effect.

The perturbations in the FDS may have been far from optimal for sensitivity estimation regarding their effect on the model SST at the locations (including depth) of the observations for the integration times. Table B2 shows the corresponding standard deviations for the background and posterior estimates in the ensemble and iterated FDS schemes. In general, the posterior variance of the control vector was higher for the ETKF and the ETKF-GA and a bit smaller for the FDS schemes, but the relative reduction in variances with respect to the background was not systematically lower or higher for any specific scheme. In sensitivity studies, the conditional sensitivity exploration of the FDS has also been termed one-at-a-time (OAT) exploration (as opposed to the mean ensemble sensitivities) [...]. For example, both methods identified the highest sensitivity of the upward shortwave flux (solar energy reflected back to space by the atmosphere and surface) to be with respect to cldfrc_rhminl, out of 21 evaluated CAM parameters. As cldfrc_rhminl is the threshold of the relative humidity value at which low-level water vapor starts to condense into cloud droplets, this result is consistent with the role of thick stratus clouds in reflecting sunlight.
This study does not attempt to give an in-depth analysis of the assimilation results for the corresponding climate field reconstructions. However, we summarise some results of the spatial patterns shown in the climate reconstructions and give examples of sensitivities as estimated by the FDS schemes and the ETKF. Fig. 5 shows, in general, a similar absolute bias reduction for the FDS-MKS and the ETKF in both general magnitude and spatial patterns for both SST and SSS. For SST, the most problematic area is that where most of the observations come from the diatom locations in the MARGO database. A reason seems to be that observations at diatom locations are just the 20-yr means for Northern Hemisphere winter, which reduces their impact on the climate analysis with respect to other observation types. Still, in general the FDS-MKS has slightly fewer areas where the absolute bias in the SST could not be reduced. Also, the negative effect of the assimilation on SSS, regarding absolute bias reduction, which is shown for the ETKF in the North Atlantic, the Bering Strait, and some areas of the Arctic Ocean, is negligible in the FDS-MKS. While these unobserved areas (from the point of view of the MARGO database) remain largely unconstrained, the FDS-MKS seems generally more able to correct for the biases in areas with more observations while simultaneously not having a negative effect in areas far from the observations.
As an example of estimated sensitivities in the ocean, Fig. 6 shows the sensitivity of the sea surface temperature (SST) and sea surface salinity (SSS) to cldfrc_rhminl in CAM, as mean ensemble sensitivities and as FDS about the background value. In the case of SST the general pattern of both sensitivities is quite similar, except for the much more negative sensitivity shown at [...].

Vertical diffusion in the ocean determines the ocean heat uptake and in turn the air-sea heat flux and atmospheric heat transport, but also the sea surface temperature, evaporation and atmospheric moisture transport. Regions more sensitive to the ocean vertical diffusion would be coastal upwelling systems (e.g., Namibia), the equatorial oceans and the Southern Ocean in the case of upwelling, but also the North Atlantic Ocean in the case of downwelling (deep-water formation), which is key in sustaining the Atlantic meridional overturning circulation (AMOC) (e.g., Delworth et al., 1997). Thus, as a further example in the ocean, we find it also interesting to show a sensitivity example for deeper ocean layers considering the short integration time (60 years) [...], although also diverging in some areas such as Northern Europe around Scandinavia and Northern Asia. The relatively amplified sensitivity seen in the FDS plot also leads to some areas reaching a negative sensitivity in the FDS estimates, while having a low but still positive sensitivity in the ensemble plot, as happens to the south of 30° S in the Atlantic and in the Pacific around South America. Part of the differences are certainly due to the more global sensitivity explored by the ensemble, but it also remains to be seen how different perturbation sizes in the FDS would have affected its sensitivity pattern and amplitude. With different patterns, a similar comparison can be made between the ensemble sensitivity and the FDS for the convective precipitation, which in both cases shows a higher (in absolute value) sensitivity in the tropics, and a rather reasonable agreement, although very different in the tropical Eastern Pacific south of the Equator. The tropical climate may indicate the indirect effects of changes in evaporation, atmospheric convection and cloud formation. Larger differences result for the large-scale precipitation, although some similarities are also present, such as the trough stretching from the tropical Eastern Pacific, across South America, to the south of Africa, or the high sensitivity in the tropical Western Pacific. The total cloud cover shows a substantial difference between the sensitivity plots above ∼30° N. However, the delineation (0 isoline) between negative and positive values is substantially similar in both sensitivities. In general, despite the differences between the two sensitivity estimates (due to the different exploration of the control vector space), the similarities, along with the reasonable sensitivity patterns shown for the AMOC for the same ocean parameter κ_ω, point to the usefulness of the strongly coupled assimilation for low-frequency climate analysis, where observations from one component of the Earth system are allowed to influence the state and parameters of a different component.
An important last consideration is that the assimilation will just attempt to minimise the cost function (or estimate the first moments of the posterior). So the assumed background statistics in the cost function are instrumental in controlling the control vector increments in the assimilation and the resulting climate field reconstruction (CFR). In this synthetic experiment the source of errors is known, and we assume a perfect-model framework except for the assumed uncertainties in the chosen control vector.
However, in a real application the actual model errors are unknown. As described in the introduction, the control vector increments will compensate for unaccounted errors. Although the minimization can greatly reduce the value of the cost function and improve the corresponding CFR, this does not necessarily imply that the updated parameters for the model physics (or their moments), as part of the control vector, actually correspond to improved (extrapolable) model physics. For example, the use of the posterior model parameters can potentially lead to improved climate simulations for other prospective climatic conditions, but not necessarily. Thus, it is important to distinguish between the use of the assimilation methods for CFR including model parameters as control variables, and the trust one can have in the estimated model parameters for future climate projections under very different climatic conditions. A fair caveat was recently given by Dommenget and Rezny (2017) regarding the use of flux corrections as an alternative to parameter estimation in CGCMs. As they indicated, the risk of compensating errors when using parameter estimates obtained from one specific observation dataset for future projections can be eliminated by using flux corrections instead to estimate the CFR. However, as flux corrections (with specific shapes) serve as a parametric way of expressing a model error, but also try to account for errors in boundary conditions (e.g., melting in a non-modelled ice sheet), which are specific to a climatic condition, the estimated flux corrections do not serve to improve the climate projections either.
All in all, our experiment with CESM is an example of joint estimation of flux correction, model parameters and forcing errors.
The goal is to analyse the low-frequency past climate, and, at the very least, the divergence between the parameter values for the model physics estimated for multidecadal and longer past climate conditions and the values estimated for present conditions should serve to evaluate the reasons for this divergence, which points back to our introductory opening reference (Kageyama et al., 2018).

Conclusions
This study focuses on low-frequency climate field reconstruction (multidecadal and longer timescales) with comprehensive deterministic Earth system models (ESMs). Given the enormous computational requirements of this class of models, we evaluate two iterative schemes, based on reduced-order control vectors and the Kalman filter, as assimilation approaches for climate field reconstruction. The schemes use an explicit representation of the background-error covariance matrix, and the Kalman gain is based on finite-difference sensitivity (FDS) experiments. As such, the schemes are computationally limited to the estimation of a low-dimensional control vector. The underlying assumption is that a low-dimensional control vector and its background uncertainty, containing the most sensitive variables for a given climate, can encapsulate most of the modelled internal and external climate variability. The control vector can contain parameterized errors in initial conditions, parameters for the small-scale physics, as well as parameters for forcing and boundary condition errors (e.g., a bias in a time-varying radiative constituent). In general, it is expected that errors in the initial conditions are a low-sensitivity input for the low-frequency model climate response. Thus, these would generally be excluded from the control vector, which makes it relatively easier to keep the control vector low-dimensional.
The evaluated schemes are an FDS implementation of the iterative Kalman smoother (FDS-IKS, a Gauss–Newton scheme) and a so-called FDS-multistep Kalman smoother (FDS-MKS, based on repeated assimilation of the observations). We have conducted two assimilation experiments: (a) a simple 1-D energy balance model (Ebm1D, which has an adjoint code) with present-day surface air temperature from the NCEP/NCAR reanalysis data as a target, and (b) a multi-decadal synthetic case with the Community Earth System Model (CESM v1.2, with no adjoint). The methodological description and the first experiment serve to show that, under a strong-constraint minimization and perfect-model framework, the FDS-IKS should converge to the same minimum as incremental 4D-Var. Actually, in this experiment, the FDS-IKS converges substantially faster than 4D-Var and to the same minimum. The FDS-MKS does not theoretically converge to the same minimum (except in linear cases), but it is more stable than the FDS-IKS for poorly regularized cost functions.
In the second experiment with CESM, given the lack of an adjoint code, we included an ETKF (with m = 60 ensemble members) and an ETKF with Gaussian anamorphosis (ETKF-GA) as benchmarking schemes, the latter as a nonlinear estimation alternative to 4D-Var. As far as the authors know, this is also the first time that the ETKF-GA is evaluated with a comprehensive ESM for past climate multidecadal analysis. Regarding the cost function as performance criterion, the ETKF-GA was not clearly better nor worse than a standard ETKF. We have shown that the GA does not resolve the strong nonlinearities that the sensitivities may encounter at some observations; a clear example is the North Atlantic SST. These results cannot be extrapolated, for example, to shorter-term climate analyses. Also, alternative anamorphosis strategies for low-frequency analysis could be explored [...]. Regarding the FDS schemes, the FDS-MKS could be used initially to update the control vector estimate, and then be followed by FDS-IKS iterations (with the background covariance). Note that, alternatively, a line search along the same direction provided by the FDS-IKS would also be possible.
An evaluation of further possibilities is needed.
Appendix A: Experiment 1: convergence analysis for finite difference schemes

Here we give tabulated details and a summary of the convergence tests for the finite difference schemes in the experiment with the model Ebm1D. In the tables, NN denotes the number of perturbations m_θ for each control variable; including the estimate at each loop, the total ensemble size is m = m_θ × q + 1, where q is the dimension of the control vector.
The tests are designed to evaluate the resilience of the Gauss–Newton scheme (FDS-IKS) to large perturbations and decreasing regularization, and how this affects the relative performance of the Gauss–Newton scheme versus the multistep scheme (FDS-MKS). Specific cost function values are to be compared quantitatively only within a specific weight scenario (PD1, PD2, or PD3). For higher weights (PD3 being the highest), the effect of the regularization by the background term decreases, which increases the chances of the FDS-IKS not converging (STOPPED tests due to unstable model integrations). Further tests (not shown) with even higher observational weight than PD3 made it more and more difficult for the FDS-IKS to converge. In some instances, the second step of the 3-step FDS-MKS already has lower cost function values than the final estimate of the 2-step FDS-MKS. This effect increases with decreasing regularization. When the FDS-IKS starts to have convergence problems (see the PD3 tests), increasing the ensemble size by conducting 2 perturbations per parameter (akin to central finite differences) makes the FDS-IKS more stable. A higher number of perturbations (not shown) makes the scheme even more stable, but this would not be practical for long-term analyses with comprehensive ESMs.
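For concreteness, a sketch of the m_θ = 2 variant follows (central differences; run_model is again a placeholder for a full ESM integration, as in the one-sided sketch above).

    # Minimal sketch of the m_theta = 2 variant: two perturbations per
    # control variable, akin to central finite differences (total ensemble
    # size m = m_theta * q + 1 including the current estimate), which the
    # tests suggest stabilises the FDS-IKS under weak regularization.
    import numpy as np

    def fds_central(run_model, theta, sd, sdfac=0.1):
        h0 = run_model(theta)                     # current-estimate run (the "+1")
        G = np.empty((h0.size, theta.size))
        for j in range(theta.size):
            d = sdfac * sd[j]
            tp, tm = theta.copy(), theta.copy()
            tp[j] += d; tm[j] -= d
            G[:, j] = (run_model(tp) - run_model(tm)) / (2 * d)   # central difference
        return G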
Appendix B: Experiment 2

B1 Example of well-behaved Gaussian anamorphosis

As an example, Fig. B1 shows scatterplots between the parameter CAM.cldfrc_rhminl and SST for a location in the South Pacific at the limits of the Antarctic circumpolar current. The location is shown in Fig. 5. Scatterplots are shown for the original variables and for those with a Gaussian anamorphosis transformation. As opposed to Fig. 4, in this case Gaussianity is improved in the anamorphosed SST, and the scatterplots also show that an implicit pseudo-linearization appears between the two anamorphosed variables, mostly as a result of the SST transformation.
B2 Convergence of the FDS-IKS and posterior standard deviations

Table B1 shows the convergence of the cost function for the FDS-IKS in the CESM experiment. Table B2 shows the posterior standard deviations of the control vector for the iterated FDS schemes, as well as for the ETKF and the ETKF-GA.

Acknowledgements. We thank the North-German Supercomputing Alliance (HLRN) for providing HPC resources that have contributed to the research results reported in this paper. We thank two anonymous reviewers for their positive comments, which have been very helpful to improve the manuscript.