Parameter space Kalman smoothers for multi-decadal climate analysis in high resolution coupled Global Circulation Models

In climate reanalyses for multi-decadal or longer scales with coupled atmosphere-ocean General Circulation models (CGCMs) it can be assumed that the growth of prediction errors arises chiefly from imprecisely known model parameters, which have a nonlinear relationship with the climate observations (paleoclimate proxies). Also, high-resolution CGCMs for climate analysis are extremely expensive to run, which constrains the applicability of assimilation schemes. In a model framework where we assume that model dynamic parameters account for (nearly) all forecast errors at observation times, we compare 5 two computationally efficient iterative schemes for approximate nonlinear model parameter estimation and joint flux estimation (taking the specific shape of freshwater from melting in the Greenland ice sheet), and its physically consistent state. First, a trivial adaptation of the strong constraint incremental 4D-Var formulation leads to what we refer to as the parameter space iterative extended Kalman smoother (pIKS); a Gauss-Newton scheme. Second, a so-called parameter space fractional Kalman smoother (pFKS) is an alternative controlled-step line search, which can potentially be a more stable approach. While these 10 iterative schemes have been used in data assimilation, we revisit them together within the context of parameter estimation in climate reanalysis, as compared to the more general 4D-Var formulation. Then, the two schemes are evaluated in numerical experiments with a simple 1D energy balance model (Ebm1D) and with a fully-coupled Community Earth System Model (CESM v1.2). Firstly, with Ebm1D the pFKS obtains a cost function similar to the adjoint method with highly reduced computational cost, while an ensemble transform Kalman filter with an m= 60 ensemble size (ETKF60) behaves slightly worse. The pIKS 15 behaves worse than the ETKF60, but an ETKF10 (m= 10) is even worst. Accordingly, with CESM we evaluate the pKFS and the ETKF60 along with an ETKF with Gaussian Anamorphosis (ETKF-GA60). From all the options, the pFKS has the lowest cost function and seems the favored overall option under heavy computational restrictions, but the ETKF obtains better estimates of the flux term.


Introduction
The issue of fusing data into models arises in all scientific areas that enjoy a profusion of data.In the geophysical community this is referred to as inverse methods and data assimilation (DA), and it has the goal of estimating an unknown true state (Ide and Jones, 2007).Such methods can be considered as an approach for interpolating or smoothing a data set in space and time where a model acts as a dynamical constraint (Evensen, 1994a).DA is predominantly used for state estimation, combining observational data with model predictions to produce an updated model state that most accurately approximates the true system state whilst keeping the model parameters fixed.However, even with perfect initial conditions, inaccurate model parameters will lead to the growth of prediction errors.Thus to generate reliable forecasts and reanalyses, we need good estimates of both the current system state and the model parameters.Unknown parameters can be estimated as part of the assimilation by using state space augmentation (Friedland, 1969), and recent developments are being done in this sense.For example, Smith et al. (2011), by applying the technique of state augmentation, propose a hybrid sequential 3D-Var for joint state-parameter estimation.In either case, the parameter estimation problem is essentially nonlinear and may become extremely difficult to solve.Thus, care must be taken to have a consistent estimation of poorly known parameters in a model, and Evensen et al. (1998) showed that a basic rule is that all parameters which will be estimated should be added in a penalty function as weak constraints measuring their distance from a first guess in some norm.Specifically, in contrast to short-term and medium range operational weather prediction, multi-decadal and longer climate forecasts depend strongly on parameterisations rather than initial conditions.Although the model's trajectory through state space is highly sensitive to initial conditions, the climate of a sufficiently long trajectory (for example, temporal means of particular model variables) is typically much less sensitive to initial conditions, being essentially a sample of the underlying true model climate (i.e. the limit as integration time tends to infinity) contaminated by a small (and controllable) amount of deterministic noise due to the finite integration interval (Annan et al., 2005b).Other geophysical applications share this relevance of model parameters on the assimilation problem, as the estimation of distributed parameters and state for multiphase flow in petroleum reservoirs (e.g.; Gu and Oliver, 2007;Oliver et al., 2011), or hydraulic tomography for groundwater applications (e.g.; Schöniger et al., 2012).
A related issue is the enforcement of physically based conservation laws, which by default is not taken into account by (ensemble) Kalman filters.The importance of physical consistency has long been acknowledged in numerical weather prediction (NWP).Accordingly, the conservation of mass with ensemble-type Kalman filter algorithms has been the subject of recent research (e.g.; Janjíc et al., 2014, and references therein).If the control variables are the model dynamical parameters, once these parameters are estimated in the analysis, an additional confirming re-integration of the nonlinear forward model provides physically consistent estimates of the corresponding state variables along the assimilation window.
DA has been used as a technique for climate field reconstruction (CFR) in a number of studies without explicit consideration of parameter uncertainty.For example, Marchal et al. (2016) use marine sediment records of sea surface temperature (SST) and a regional model of ocean circulation with an extended Kalman filter (EKF) and a related smoother to analyse the transient movements of the North Atlantic subpolar front at the termination of the Younger Dryas (YD) cold interval.Here, we focus on the model parameter estimation problem under the assumption the errors in model parameters carry the bulk of the uncertainty for multi-decadal or longer timescales.The number of adjustable parameters in Global Circulation Models (GCM) is large, and these are known to have a significant impact on the simulated climate.Thus, perturbed physics experiments, with several configurations, are profusely used for understanding climate model responses to uncertain parameters controlling the model's physical processes and in order to design improved models.

Problem definition
In nonlinear filtering problems as well as in the linear ones, we are interested in computing the not only the conditional mean or mode, but also the conditional uncertainty.Frequently, the dynamical system depends on certain parameters whose values are imprecisely known.Our goal is to estimate the joint state-parameter mean/mode and covariance of a high-dimensional discrete representation of a dynamical process in the case that all sources of uncertainty can be assumed as coming from a small number of model parameters, θ.These parameters are considered to be uncertain but not time varying.However, the estimates of these parameters will change as new data are assimilated.
By high-dimensional state vector, we refer to not only a) that whose covariance matrix is too big to be explicitly stored and manipulated with current supercomputers, but also to b) that whose dynamic integration is also a big computational burden for the given requirement, such that ensemble integration, in order to work with a sufficient ensemble representation of the covariance matrix, is also severely limited.By small vector, we refer to that whose covariance matrix can be explicitly and efficiently stored and manipulated with today's computational capabilities.Generally, we assume that θ is a set of model parameters (or dynamical parameters), which may be regarded as random variables with known a priori statistics.The vector θ, along with dynamical parameters, may include uncertain initial and boundary conditions or a parameterized version of their uncertainty.Other than the sources of uncertainty explicitly included in θ, the model is considered a perfect representation of the (discrete) real dynamical system.Also, the non-linear forward operator H(•), which maps the state into the observation space, will usually depend on a number of parameters, which can be jointly estimated.However, we consider these parameter fixed in this study.So H is deterministic, which is the so-called perfect forward model assumption.We also assume that the model is weakly nonlinear, such that it can be linearized.
In general, with the restriction that θ is small, this problem definition limits the inclusion of initial conditions in the control vector in high-dimensional models, unless these initial conditions can be described with a reduced set of parameters.The range of problems is then narrowed down to specific cases.The specific one, motivating this study, is multi-decadal (or longer) paleoclimate reanalysis with high-resolution CGCMs.If initial conditions are reasonably unbiased, we can assume that the uncertainty in the multi-decadal climate integrations originates chiefly from the dynamical parameters for model physics after some model warming time.

Analysis approach
The problem is to fit three spatial dimensions in time.For variational schemes, this is referred to as four-dimensional variational data assimilation (4D-Var) in Numerical Weather Prediction (NWP), where the initial conditions of a model integration are estimated subject to model dynamics and according to background and observation uncertainties within a data assimilation window (DAW).The background (or prior) is normally given by a previous model forecast.In this article, time t k and its index k measure time relative to the start of the DAW, which is t 0 , using conventions similar to those of 4D-Var.Here we formulate the broader context of our estimation problem.We consider the discrete non-linear time invariant dynamical system model The vector x k ∈ R n is the state vector at time t k , θ k ∈ R q is a vector of uncertain model parameters, and M : R n → R n is its corresponding non-linear dynamics operator.We assume that specification of the model state and parameters at time t k uniquely determines the model state at all future times, and that model parameters are constant during the DAW (i.e., θ ≡ θ k+1 = θ k ).
That is, that the system can be represented on a discrete grid, and the model gives an exact description of the true behaviour of the system on the grid.We now consider an augmented state vector z, which includes x and the model parameters: so an equivalent augmented model is where M : R n+q → R n+q includes the static model parameters.Observations at time t k are represented by the vector and related to the model state by where H k : R n → R p k is a deterministic non-linear observation operator that maps from model to observation space, and k ∈ R p k is a realisation of a noise process, which consists of errors in the instrument, errors in the observation operator, and representativeness errors (those due to difference in spatial resolution between the measurement and the model state).Hk : R n+q → R p k is an equivalent augmented observation operator from the augmented state z k to the observation space.We assume k is a Gaussian variable with mean 0 and covariance matrix R k .The error covariance matrix of a state z k = [x T k , θ T ] T , where the superscript "T" denotes matrix transposition, at any time t k within the DAW is where P xx k ∈ R n×n is the error covariance matrix for x k , P θθ ∈ R q×q is the error covariance matrix of the parameter vector θ, and P xθ k ∈ R n×q is the error covariance between x k and θ.Within a Gaussian assumption for the various error terms, the joint state and parameter estimation goal in 4D-Var then to find the initial state z 0 that minimises a non-linear cost function given by  6) is subject (constrained) to the states satisfying the nonlinear dynamical system (3) and is known as strong constraint variational formulation, while the additional inclusion of model errors would lead to a weak constraint variational formulation.
The solution to the functional J 1 (z 0 ) is z a 0 , referred to as the analysis.In general, an exact solution cannot be found.In the incremental formulation of 4D-Var, the solution to the nonlinear minimization problem ( 6) is approximated by a sequence of minimizations of linear quadratic cost functions.These minimizations (inner loops) use gradient descent algorithms (e.g.; Lawless, 2013).Let us consider here the incremental approach, where z l 0 provides the current approximation, and initially, for l = 1, z 1 0 = z b 0 .The innovations are then given by the residual between the observations and the mapping of the initial state in the current approximation into observation space where the computation of the initial state mapped to observation space, Ĥ(z l 0 ), has the following way: The assumption that Hk is linear, such that where Hk (z l k ) is the Jacobian of Hk (•) evaluated at z l k , is the so-called tangent linear hypothesis (TLH), and Hk is referred to as the tangent linear operator in the DA literature.The incremental 4D-Var considers a further tangent linear (TL) approximation by including linearization of the model dynamics about the background as where δz 0 = z 0 − z l 0 is the increment.By considering ( 7) and ( 10), the generalised error term ˆ in (6) for all observations in the DAW can be expressed as where Ĥl ≡ Ĥ(z l 0 ).In the inner loop, this approximation of ˆ is introduced in (6) leading to a linear quadratic cost function as a function of the increment δz 0 Once the minimization of J 2 (δz 0 ) has met a given criterion, the outer loop takes control, where the estimate of the initial state is updated with the estimated increment: z l+1 0 = z l 0 + δz 0 .Incremental 4D-Var has been shown to be an inexact Gauss-Newton method applied to the original nonlinear cost function (Lawless et al., 2005).As we assume Gaussian R and P b 0 and a perfect-model framework except for the sources of model uncertainty in z 0 , the conditional mode given by the linear least squares minimization ( 12) is the same that the conditional mean (also called the minimum variance estimate) given by the explicit solution where K l is known as the Kalman gain matrix, given by So, the state vector is explicitly updated, without using an inner loop, as which actually is the formulation for the Iterative Kalman Smoother (IKS) with a neglected model error (see Bell, 1994).For the first loop (l = 1 ⇒ z b 0 − z l 0 = 0) Eq. ( 15) is reduced to the mean update of an extended Kalman filter (EKF) (see e.g.; Jazwinski, 1970) (but using future observations).So, in the same way that incremental 4D-Var, IKS gives an approximation to the conditional mode or maximum likelihood of the cost function (6), as shown by Bell (1994).
However, the climate of a sufficiently long trajectory is typically much less sensitive to initial conditions than the short and medium-range NWP.If model initial conditions are reasonably unbiased, after some time to a quasi-equilibrium, it is likely safe to assume that their influence on the climate forecast at multi-decadal (or longer) scales is negligible.Specifically, we assume that (a) there are no model errors apart from those considered in θ and (b) either the influence of the initial conditions on the state estimates is negligible at the observation space (given times and locations) or the initial conditions of the model state are correct (x b 0 = x t 0 , P b xx = 0, where x t 0 is the true system state).The optimization is then restricted to estimate the uncertain model parameters included in θ.If assumption (a) does not hold or none in (b) holds, the parameter estimation will attempt to compensate for these situations.For example, this does not imply that the deep ocean has reached its full equilibrium if no corresponding observations are available for the analysis.
With the above considerations, we define G(•) as a generalised (deterministic) observation operator mapping a vector θ into the observation space; G : R q → R p , which follows where t q represents the time to quasi-equilibrium; i.e. limit of the memory of the initial conditions mapped into the available observations.This leads to a reduced problem, where ( 6) is replaced by the non-linear cost function This is what we call here a parameter space formulation, as only the vector of parameters θ is then updated in the analysis.
To avoid confusion, note that this has been called the model space in other contexts (e.g.; Tarantola, 2005).Here, within the context of climate (and weather), we choose the parameter space term as it explicitly avoids the inclusion of the time-varying state vector x in the space of the solutions.Once the parameters are estimated, forward integration with the updated parameters leads to their physically consistent climate estimates.The trivial substitutions into the incremental formulation (12) and its solution (15) lead to the parameter space iterative Kalman smoother (pIKS) that we summarize later in a specific section.An alternative approximation to the non-linear problem follows then as the parameter space Fractional Kalman smoother (pFKS).
Importantly, a potential drawback of the conditional mean estimation, as given by the En(KF) methods including the EKF, arises when the density conditioned to the observations is multimodal.For long-term climate analysis this arises by the possibility of multiple equilibria (e.g.; see Evensen, 1994b;Ghil, 1997).While it the availability of dense observations should counteract this problem by constraining the attractor basin in which the state lies (Cohn, 1997), paleoclimate proxies are generally sparse.On the other hand, the IKF (as incremental 4D-Var) estimate obtain an approximate maximum likelihood estimate instead.Thus, the possibility of multiple equilibria has to kept in mind and a careful analysis of estimated states should be conducted in either case.
The following section is devoted to the estimation of G ∈ R p×q , the Jacobian of G.Both iterative schemes described later need to estimate G, or climate sensitivities to the parameter space, to obtain θ as approximation to the minimization in (17).

Estimation of background sensitivities and error covariances
G, the Jacobian of G, is needed in the iterative schemes described later.The explicit calculation of the Jacobian of complex functions can be difficult, requiring complicated derivatives if done analytically or being computationally costly if done numerically.While the relation between G and P k is implied in the previous section, it is instructive to look at it in some detail.
Let us consider the case of a specific observation time t k .The Kalman gain matrix (disregarding the loop index, if any) for the components of the model parameters, which we denote in this section as K k θ , can be expressed in the two following ways: where the first way is the standard one in Kalman smoother expressions including parameter estimation via state vector augmentation, and the second one is the parameter space formulation.Both are equivalent, but the covariance information in P k has been transferred to G k , or sensitivity matrix, in the second expression.Let us further consider the case that at t k there is a single observation y of a state variable within the vector z k , denoted as x ky , and we focus on the gain for a single parameter θ i .The covariance between θ i and the observation y is expressed in both cases as which, as ∂y ∂θj = ∂y ∂x ky ∂x ky ∂θj , indicates that the linear equality is taken from a bottom-up approach in the parameter space formulation, where all sources of uncertainty are specifically evaluated to compose the covariance σ x ky θi .Interestingly, Gu and Oliver (2007) introduced a scheme called Ensemble Randomized Maximum Likelihood Filter (EnRML) for on-line nonlinear parameter estimation, which was later adapted as a smoother, the batch-EnRML, by Chen and Oliver (2012).The EnRML and the batch-EnRML, oriented towards history matching in reservoir engineering, were also formulated in the parameter space, but designed for cases where the parameter vector is too big, such that an ensemble representation of the parameter error covariance is used.Thus, they estimated an ensemble-based average sensitivity matrix Ḡl at each iteration based on model integrations with parameter sets from a m-member multivariate random sample drawn from the full P θθ : where ∆θ l ∈ R q×m is the matrix of random model parameter perturbations drawn from P θθ , around the estimate for the current loop (i.e.; ∆θ l = θ l − θl ), and ∆Y l ∈ R p×m are the resulting perturbations in the observation space.As ∆θ l is not generally invertible, the EnRML uses a singular value decomposition (SVD) of ∆θ l to solve for Ḡl .It should be noted that the sensitivies estimated in such a way result from a sampling from the (Gaussian) background multivariate probability density function of the parameters.
Here, given the size of θ in our problem definition, we do not need to resort to ensemble covariances, and we follow the computationally less expensive and direct approach of doing one-at-a-time (OAT) sensitivity experiments to explicitly obtain the entries in G, where each parameter is perturbed independently to obtain the corresponding sensitivity to that specific parameter.As the sensitivity of the climate variables to the individual parameter perturbation is not necessarily linear, the perturbation tests may in general be conducted by generating a random sample of perturbations for the individual parameter and evaluating the model climate response to the perturbed parameter sample.Positively, this approach is not subject to the development of spurious correlations by the ensemble representation of the covariance matrix.However, the computing requirements are linearly proportional to the number of estimated parameters.Also, in contrast with the EnKF approach (as the batch-EnRML), this OAT estimation of sensitivities to each parameter is evaluated by sampling from the background univariate probability density function for each parameter conditioned to the rest of the parameters being fixed to their background mean.The corresponding model integrations, where m θ > 1 is the ensemble size for each parameter, and q the number of parameters.Also, it requires p × q linear regressions.Note this is different than the slope of the linear regression of a variable against the same parameter for a sample drawn from the whole multivariate distribution (as the EnKF), which is also affected by the sensitivity to other parameters even in a linear situation if P θθ is not diagonal, as shown in (20).
The model integrations for all the sensitivity experiments can be done in parallel as a general ensemble approach, but the parameter error covariance P θθ is explicitly represented.The integration of the model with the mean background parameters is required to calculate the innovations, and its output can be used as additional sample for each linear regression.If computational constraints are severe, in the limit the estimation of the OAT (or conditional) sensitivities reduces to a single perturbation experiment for each parameter; that is to get a local finite difference approximation.In this case, the columns G :,i of G are estimated by computing where for each parameter, δθ i is a small perturbation (or variation) to the current approximation of θ i and δθ i is the vector 0 ∈ R q but with element θ i replaced by δθ i .As indicated above, the estimation of sensitivities by local finite difference approximations is still a result from sampling the conditional distributions in the parameter space.The fact that sensitivities are evaluated at the time of the observations is relevant regarding the assumed decorrelation time between the initial conditions and the observation vector.

Parameter space iterative Kalman smoother
Iterative linear methods are now common for DA applications in nonlinear systems.As approximate nonlinear filters, Jazwinski (1970) considers local (conducted over a single assimilation cycle) and global (conducted over several assimilation cycles) iterations of the EKF.Local iterations of the Kalman filters are designed to deal with nonlinear observation operator and non-Gaussian errors.The locally iterated (extended) Kalman filter (IKF) is a Gauss-Newton method for approximating a maximumlikelihood estimate Bell and Cathey (1993), and actually it is algebraically equivalent to nonlinear three dimensional variational (3D-Var) analysis algorithms (Cohn, 1997).Later, Bell (1994) showed that the iterated Kalman smoother (IKS) represents a Gauss-Newton method to obtain an approximate maximum likelihood, as was shown later for incremental 4D-Var (Lawless et al., 2005) and has been summarised in section 3.1 above.Thus, the IKF (as the IKS) circumvents the need for choosing a step size, which is sometimes a source of difficulty in descent methods.However, being a Gauss-Newton method, not even local convergence is guaranteed.The scheme we use here, which we call the parameter space iterative Kalman smoother (pIKS), is a Gauss-Newton scheme as the IKS and the EnRML.It is formulated in the parameter space as the EnRML, but despite sensitivities are derived via numerical perturbations of the parameter space, it is not an ensemble method.The computational demand will be lower than that of the EnRML if the number of parameters to be estimated is less than the ensemble size of the EnRML.For a high number of parameters (e.g.; inclusion of distributed initial conditions in θ), explicit management of the P θθ would not be feasible and one should resort, for example, to an ensemble scheme as the batch-EnRML.For any natural number l, the pIKS provides the update The sequences {θ l : l ≥ 0} and {P l θθ : l ≥ 0} are defined inductively as follows where for notational convenience and Equations ( 23) and ( 26) show that, as in the IKS and incremental 4D-Var, the pIKS uses the initial background error statistics P b θθ during all iterations.The updated P a θθ is just based on the sensitivity evaluated at the last iteration.

Parameter space fractional Kalman smoother
Here, contrary to the pIKS, a step size is chosen with the idea of potentially being able to deal with higher nonlinearity.By fractional step assimilation we refer to a first order line search strategy, whereby once a descent direction is estimated, a step size in that direction is conducted by inflating the observational error covariance matrix R and applying a standard Kalman filter (KF) with the inflated R. The approach of inflating R and using repetitive assimilations of the same observations was termed multiple data assimilation (MDA) by Emerick and Reynolds (2013) to propose an ensemble smoother based on an iterative scheme, and was also advocated by Bocquet and Sakov (2014) in their iterative ensemble Kalman smoother (IEnKS).
We apply a parameter space formulation of the MDA concept with the KF.The scheme considers the total increment in the state vector that would result from the linear assimilation of one specific observation, and alternatively conduct a sequence of repeated assimilations of the same observation whose sum of fractional increments equals the total increment.This is achieved by considering that the observation error variance at loop l is the product of an inflation factor β l and the variance of the "complete" observation: σ 2 y l = β l σ 2 y .As the (linear) increment is inversely proportional to the observation error variance, for the total increment to be the same in both situations the condition that (σ ) −1 must be fulfilled.This leads to the constraint Without specific consideration of constraint ( 27), the idea of inflating the observation error covariance matrix has also been considered by previous studies within the context of petroleum reservoir history matching, whose parameter-dominated uncertainty is a similar situation to that of the long-term climate analysis.Thus R has been inflated to improve initial sampling for the EnKF (Oliver and Chen, 2008), and also to damp model changes at early iterations in Newton-like methods (Wu et al., 1999;Gao and Reynolds, 2006).With the specific focus on parameter estimation in climate models, the idea of inflating R for repeated assimilation of the same observations was proposed by Annan et al. (2005a) and further clarified and applied by Annan et al. (2005b) for an atmospheric GCM using the EnKF with parameter augmentation.Their approach, also formulated within a perfect-model framework, differs from the iterative scheme described here in that they use a sequence of model integrations and assimilation steps advancing in time, where the observations at each assimilation time are inflated by a factor.The scheme is such that in a linear case, after the predefined sequence of integrations/assimilations, the attained solution is identical to that of the single step scheme along the whole DAW.Thus, their approach is designed for steady-state cases, where time-averaged climate observations corresponding to a long DAW can be assumed as constant during each of the smaller assimilation windows in which the problem is divided.The model parameters are then sequentially updated in smaller increments (with respect to a single step scheme) so the loss of balance in a more general nonlinear model should be reduced with the multi-step approach (Annan et al., 2005b).On the other hand, the iterative scheme used here, where the complete DAW is considered repeatedly, is not restricted to steady-state conditions.
Here, the sensitivities between the observations and the control vector, G l , are evaluated at each fractional step, similarly to the pIKS.However, opposite to the pIKS, we will see that the parameter error covariance is also updated at each iteration.
Thus, even if the background covariance P b θθ is diagonal, as the iterations proceed covariances among the parameters develop.Hence, as indicated in (20), the covariance between any climatic variable and a parameter θ i (σ x ky θi ) is also affected by the sensitivity of the climatic variable to other parameters.We refer to this iterative scheme as parameter space fractional Kalman smoother (pFKS).Contrary to the pIKS, the analysis in the pFKS is just defined after a complete set of pre-specified iterations N l is conducted; The sequences {θ l : l = 0, . . ., N l } and {P l θθ : l = 0, . . ., N l } are defined inductively as follows where for notational convenience and Regarding β, a possible step size approach is to set the inflation weight constant for all the iterations, which given ( 27) leads to β = N l 1, where the column vector 1 ∈ R N l has all values set to 1.However, as the iterations proceed, the updated background covariance decreases so the fractional increments get smaller.With high nonlinearity between the observations and the control vector, mostly the first fraction/s could still make the process diverge (as in the IKS) by setting the model to unstable conditions for the next iteration.A more even distribution of fractional increments, with likely reduced risk of instability, can be given by decreasing inflation weights as the iterations proceed, so initial weights are relatively higher.Thus, among other possible solutions for the inflation factor at fractional step l, here we adopt the expression which satisfies requirement (27).We do not consider here more adaptive strategies.For example, the increments obtained for a truncated solution (described in the next section) could potentially be used to guide the size of β l at each loop.A further advantage of inflating (the normally diagonal) R is that it reduces the condition number of the matrix to be inverted for the KF application at each loop.

Truncated solutions for the pFKS
Given the problem definition in section 2, the computational cost for model integrations should be much higher than that of the analysis steps for each iteration of the pFKS (or the pIKS).If required, an approximate solution can be found at each iteration of the pFKS by using the weight required to fulfil the condition (27) as alternative to β l in (31): where the completion weight β c l satisfies ( 27) and serves to yield an approximate solution at any iteration l.The truncated solution to the pFKS, using β c l in (31), can be computed simultaneously to the fractional step of the pFKS, which uses β l instead.In addition to be a temporal solution, comparison of the sequences given by the fractional steps of the pFKS with the simultaneous truncated solutions may also be used to detect linearity assumptions and support early decision to stop the fractional process, taking a truncated solution as acceptable approximation to the nonlinear minimisation.Here we do not explore this, nor the further possibility of intermediate schemes between the pIKS and the pFKS as, e.g., a Levenberg-Marquardt approach.

ETKF and Gaussian anamorphosis
The ensemble Kalman filter (EnKF) was introduced by Evensen (1994a).It makes it possible to apply the Kalman filter to highdimensional discrete systems, when the explicit storage and manipulation of the system state error covariance is impossible or impractical.The EnKF methods may be characterized by the application of the analysis equations given by the Kalman filter to an ensemble of forecasts.One of the main differences among the several proposed versions of ensemble Kalman filters is how the analysis ensemble is chosen.Ensemble square-root filters use deterministic algorithms to generate an analysis ensemble with the desired sample mean and covariance (e.g.; Bishop et al., 2001;Whitaker and Hamill, 2002;Tippett et al., 2003;Ott et al., 2004).For comparison with the above iterative schemes, we use here the LETKF version of this deterministic approach, as described by (Hunt et al., 2007), including state augmentation with the vector of parameters.While we use localization for state variables, this is not shown in this paper.Thus, as we do not apply any localization for the scalar model parameters in this study, we denote it as ETKF hereafter.
Still, for the En(KF) to be optimal, three special conditions need to apply: (1) Gaussianity in the prior, ( 2) linearity of the observation operator, and (3) Gaussianity in the additive observational error density.In order to better deal with non-linearity, a number of studies have addressed the use of transformation of the model background and observation to obtain a Gaussian distribution, in a way that the (En)KF can be applied under optimal conditions.This preprocessing transformation step is known as Gaussian Anamorphosis (GA) (e.g.; Chìles and Delfiner, 2012).The GA procedure was introduced into the context of data assimilation by Bertino et al. (2003), and has been applied for many years in the field of geostatistics (e.g.; Matheron, 1973;Deutsch and Journel, 1998).
It is not standard, however, how the GA should be applied in the context of DA.The process of GA involves transforming the (augmented) state vector and observations {z, y} into new variables {z, ỹ} with Gaussian statistics.The (En)KF analysis is computed with the new variables, and the resulting analysis is mapped back into the original space.For the transformations, the GA makes use of the integral probability transform theorem.In a theoretical framework and with simple experimental analysis, Amezcua and Leeuwen (2014) evaluated several approaches to deal with the various nonlinearities using univariate GA transformations.While they proposed a bivariate method where both a state variable x and and observation y could be jointly transformed by the GA, the bivariate method requires a good knowledge of the likelihood, which it not common in real cases.Thus they concluded that when ensemble sizes are small and the knowledge of the conditional p y|x (y|x) is not too precise, it is perhaps better to rely on independent marginal transformations for both.This is likely the general case of long term climate analyses, and the approach we evaluate here along with an ETKF for the joint state-parameter estimation problem via direct augmentation of the state vector with the model parameters.
In our specific implementation of the ETKF we further augmented the state vector with the model equivalent of the observations.Then we estimated the marginal distribution of the observations based of the background statistics of these model equivalent.The transformation then operates in the marginals in independent way: where P ξ (ξ) denotes the cumulative density function (cdf), and Φ ξ (•) explicitly indicates that the cdf in the transformed space is that of a Gaussian random variable.For example, this marginal transformation approach was followed by Simon and Bertino (2009).A thorough discussion of this transformation is given by Amezcua and Leeuwen (2014).Tests with standard ETKF are included in the two experiments below.A test with ETKF including GA is included in experiment 2, with the CGCM.Here we make a comparison with their PD1 scenario; a preindustrial test in which they estimate the parameters described in Table 1.PD2012 give a thorough description of all model parameters.The target for PD1 is the surface air temperature (SAT) from the NCEP/NCAR reanalysis data (Kalnay et al., 1996).As PL2012, we integrate the model for 100 years at daily time steps, and use monthly climatic means of the last 10 years for calculating the cost function.
The cost function in PL2012 does not include any regularization for the model parameters.Thus, they just consider a term J y (θ), which is a weighted sum of squared errors (SSE) between the 10-year climatic means for both winter and summer from the observations and the model.In our implementation we include the departure from the background as a penalty, as it is more standard 4D-Var applications.So, the cost function can be written as J (θ) = J y (θ) + J b (θ), representing the two terms in (17).We assumed Gaussian background statistics of the parameters errors and a diagonal P b θθ , with standard deviations given in Table 1.After optimization with the full cost function, for comparison purposes we indicate a J y (θ) cost function term as PD2012, which by design is necessarily higher in our regularized set up.Similarly, we obtain a full J (θ) for the adjoint method by adding to the J y (θ) term in PL2012 the deviations from the background, J b (θ).The general purpose of the benchmarking is to test if parameter convergence goes in the same direction as with the specific use of an adjoint code.The prior uncertainties were taken as sensible values, and other than the parametric uncertainty we considered a perfect-model framework.In this experiment, we evaluate the convergence of pFKS and pIKS as compared with the estimate of PL2012.Specifically, based on previous experiments, we observed that the model SAT is highly sensitive to some parameters in some regions of the parameter space (see results below).Thus, in order to obtain robust estimates of the local sensitivity matrix, we used a small ensemble of perturbation experiments, with m θ = 3 members for each parameter to obtain G by linear regression, as described in section 3.2.Thus, including the background (i.e. its update at each loop) as additional sample, each linear regression was calculated with four samples.To further improve the robustness of the estimation we conducted weighted linear regressions, where the weights were inversely proportional to the values of the cost function at the corresponding loop.So statistically better models have a higher weight in the regression.For clarification, this strategy is not directly related to Particle Filters (e.g.; Ades and van Leeuwen, 2013), where the likelihood of the observations are actually used to weight the conditional density.Here, once the local sensitivity is obtained, the weights are not used elsewhere.In addition, we include a test with a standard ETKF augmented with the model parameters using two different ensemble sizes (m = 10 and m = 60).The evaluation of the cost function for the ETKF (as a smoother) is conducted with a single forward integration of the mean of the estimated parameters.
To emulate the computational constraints of large CGCMs we set up the number of iterations in the pFKS to N l = 3.Also, we include the results for the extended Kalman smoother formulated in the parameter space, pEKS, which is the first loop of the pIKS.For a broader explanation of the influence of estimated parameters on the resulting model climate we refer to PL2012.
Here we provide a succinct summary of the estimation process.Table 2 in section 4.3 compares results from the evaluated methods.

Observations
As indicated, we took SAT observations derived from the NCEP/NCAR reanalysis data.From the reanalysis data we first calculated as worldwide zonal means of SAT.Then, we obtained SAT means for preindustrial climate at each (latitude) grid cell for winter (January, February and March; JFM) and summer (July, August and September; JAS).The mean of the last 10 years of the model integration were taken as model equivalent of the observations.That is, each grid cell in the 1D model has only one value for winter (JFM) and one value for summer (JAS) in the cost function.As PL2012, we assumed that observation errors are uncorrelated (R is a diagonal matrix), and that their variance is inversely proportional to the area of the zonal band; i.e., σ 2 , where a λ is the area at the corresponding latitude band relative to a 10 • latitudinal band centered at the Equator.That is, a λ is close to one near the Equator and approaches zero toward the poles.As PL2012, we chose the base standard deviation σ T0 = 1 • C.

Results
The background sensitivity of the 10-yr mean SAT to the evaluated parameters is shown in Fig. 1, where for comparison the sensitivity matrix G is scaled by multiplying each of its columns by the assumed background standard deviation of the corresponding parameter.Each scaled column is a line in the plot.Note the original (unscaled) sensitivities in G are independent of the background standard deviation.Except for the linearized longwave radiation, which is negatively correlated with SAT at all latitudes but with a relatively low sensitivity, the rest of the parameters show a rather neutral, but negative, sensitivity in the tropical belt and a positive sensitivity increasing toward the poles (nearly symmetrical off the Equator), with weaker scaled sensitivities for the ocean mixed-layer depth, H o .The pattern is similar for summer months (not shown).
-50 0 50 Ebm1D experiment.Background sensitivity of winter surface air temperature to the background parameter vectors.Sensitivities are scaled by the standard deviation of parameters, so they are in units of temperature, and each line refers to the corresponding column G:,i.
The adjoint method took 190 iterations and 236 simulations to converge (see Table 3 in PL2012), which indicates the difficulty of approaching the (possibly local) minimum of the cost function in this scenario.More simulations than iterations of the adjoint formulation were needed, as in some instances the cost function did not decrease, and a perturbation to the parameters had to be added to help the iteration finding an alternative convergence path towards the minimum.This difficulty seems to be related with the possible overshooting of parameter values along the iterations in the quasi-Newton method used in the adjoint approach.The possibility of overshooting can lead to regions of the parameter space to which the modelled climate is too sensitive as well as excessive departure from the minimum, and the method has to fight back with these situations.This chance is, for example, evidenced in the pEKS (which equals a one-loop pFKS or the first loop of the pIKS) estimate where, as shown in Table 2, the mean estimate of the ocean mixed-layer depth reaches an excessively low value resulting from the prior sensitivity and generally negative innovation values in the first loop.Note we set P b θθ as a diagonal matrix, so no covariance between parameters affected the analysis step in the pEKS, or the pIKS iterations.On the other hand, the estimation with As expected the posterior values from the pFKS lie between the values obtained by the adjoint method and the background values, because of the regularization we imposed in the cost function J (θ).Accordingly, the cost function J y (θ) obtained by reintegration with the mean updated parameter estimates is slightly higher for the pFKS than for the adjoint.However the value of total cost function for the posterior from the pFKS was lower than for the adjoint.This does not necessarily mean that the pFKS behaved better than the adjoint regarding the final estimate of the minimum.Had the adjoint method been applied to minimize the complete cost function J (θ), instead of J y (θ), it would have very likely obtained parameter values closer to those estimate by the pFKS, and also a value of J (θ) lower than 20.49[at the expense of a higher J y (θ)].In summary, it seems that both approaches behaved similarly regarding the estimate of the posterior mode.However, the computational cost of the adjoint method was higher, as we conducted 48 model integrations [(1 + m θ × q)N l ] with the pFKS (and the pIKS), versus the 236 forward integrations plus 190 adjoint integrations for the adjoint convergence indicated in PL2012.That is, the computational cost for the pFKF was 11.5% of that of the adjoint method.We note, however, that for other scenarios PL2012 report quite higher convergence rates for the adjoint method.
An advantage of the pFKS and the pIKS with respect to the adjoint method for parameter estimation is the estimation of a posterior parameter covariance.Table 2 also indicates the posterior standard deviations for the pFKS and the pIKS.Nondiagonal values of P a θθ are not shown.Overall, the posterior standard deviations for both, the pFKS and the pIKS, are about one order less of magnitude than the respective background values, resulting from the dense observation dataset in the experiment.
To further evaluate the consistency of the estimated mean and covariance, we conducted a new ensemble model re-integration with a sample of parameters drawn from the posterior statistics for the pFKS.Values for J y (θ) for this ensemble were in the range [16.23, 17.44], and values for J (θ) were in the range [19.51, 20.59].
The pIKS, limited to three loops in this test, was not able to obtain a lower cost function than the background, as a result of the imposed limit in the number of loops and initial overshooting of the Gauss-Newton approach.As shown in Table 2, its first loop (the pEKS) obtained some extreme values, with three parameters out of the range given by the background and the adjoint analysis: H o , K 2 , and K 4 , which resulted in extremely high cost function values for this first loop.In the second and third loops, the pIKS was very able to recover from this situation, but still not reaching the low cost function values obtained by the pFKS.Further iterations of the pIKS (not shown) did not improve these estimates up to N l = 6, which can be taken as the convergence being trapped in a local minimum, as seemed to happen in the adjoint approach.The addition of perturbations to the parameters estimates along the iterations could have helped the pIKS to reach the minimum, as it did for the adjoint.
However, with our focus on big CGCMs we decided to disregard this strategy.The mean estimate given by the ETKF with m = 60 members (computationally similar to the pFKS configuration) was better than the pIKS regarding the cost function values.However, it was not as good as the pFKS.Also, the deterioration of the ETKF was very significant with the reduced ensemble size m = 10.In view of the results in this experiment, we decided to just evaluate the pFKS in the experiment with the CGCM described in the next section.The experimental setup is designed to emulate a case of past climate reconstruction with sparse observations, as the sparse availability of observations is a usual constraint in paleoclimate analysis.Specifically, we take the distribution of available observations of near sea surface temperature for the Last Glacial Maximum (LGM) from the "multiproxy approach for the reconstruction of the glacial ocean surface" (MARGO) database (MARGO Project Members, 2009).The LGM has received great attention in the paleoclimate community for its relevance to understand climate feedbacks and future climate projections.
For the purpose of this study, it is not so important that the actual climate of the model matches that of the LGM but that the case study is realistic from the estimation point of view.Thus we just make use of the MARGO database characteristics (locations and uncertainties), but conduct a synthetic experiment for preindustrial climate conditions.To do so, before starting the experiment we spun up CESM for 1200 yr starting from Levitus climatology with preindustrial conditions to reach an equilibrium state.We used then restart files from the end of the spin up time to create a 60 yr control simulation (as synthetic truth) with the same parameters as used for the spin up.
Also, we added an extra small influx of water into the North Atlantic from melting in the Greenland ice sheet (GIS) to the true simulation as additional variable to be simultaneously estimated.This flux was homogeneously distributed along the coast of Greenland and at the ocean surface.We decided to include this freshwater flux in the North Atlantic as control variable because the Atlantic meridional overturning circulation (AMOC) plays a critical role in maintaining the global ocean heat and freshwater balance, and it is commonly acknowledged that the North Atlantic deep-water (NADW) formation is key in sustaining the AMOC (e.g.; Delworth et al., 1997), while in turn freshwater flux in the North Atlantic, along with surface wind forcing, ocean tides and convection, provides the energy for the NADW formation (e.g.; Gregory and Tailleux, 2011).
Adding this freshwater flux from the GIS (or freshwater hosing) as simultaneous uncertain variable makes the identification of the model parameters more complicated, but we understand it is realistic to expect that current melting estimates for past climate can hold some bias and it is useful to know how the various estimation methods evaluated deal with this possibility.Also, the addition of this melting can be considered as the inclusion of a (constant) flux correction term.So, the estimated flux term attempts to correct the mean state towards the observations along with the model parameters.The experiment serves as an example of joint estimation of parameters and flux terms for CFR.Then we initiated the background with biased parameters with respect to the truth and a zero mean freshwater flux from GIS.We used standard deviations for the selected parameters as reasonable values derived from previous publications, and all simulations were branched from the same simulation than the truth.Thus, as initial conditions are the same for both the true and the runs with perturbed physics, it is guaranteed that the initial conditions do not have an influence in the analysis, which allows to use short integration times for the experiment.In a real situation integration times should be longer to ensure that errors in the initial conditions will not affect the analysis (or they should be accounted for, otherwise).
Separate re-analyses for different model components (atmosphere, ocean, land) may be inconsistent.Thus, in our coupled setup, observations from any component of the Earth system are allowed to directly impact model parameters from any component.This is known as multi-component data assimilation experiment.As other GCMs, the atmosphere and ocean components of CESM include a variety of empirical parameterizations to model processes occurring at subgrid space and time scales.As in the previous experiment, we considered as uncertain a number of physical parameters, in both the atmosphere and ocean, relevant to the global energy budget of the Earth system.The set of control variables used for the experiment is summarised in Table 3, and sections 5.2 and 5.3 give a brief information of the CAM and POP2 components of CESM as used in this experiment.As indicated, in view of the experiment in section 4, in order to save computing time we decided to just evaluate a 3-loop pFKS, along a pEKS.Also, we evaluated the ETKF with 60 members (ETKF 60 ), and the ETKF including Gaussian anamorphosis (ETKF 60 -GA) as alternative non-linear approach.

CAM
Cumulus convection is a key process for producing precipitation and redistributing atmospheric heat and moisture (Arakawa, 2004).Then, precipitation and the associated latent heat release drive the Earth's hydrological cycle and atmospheric circulations.Also, cumulus convection can affect the distribution of clouds and, consequently, the global radiative budget (Yang et al., 2013).However, since GCMs are unable to resolve the convective processes, various convection parameterization schemes (CPSs) have been developed based on different types of assumptions.CPS usually includes multiple tunable parameter, which are related to the subscale internal physics and are thought to have wide ranges of possible values (Yang et al., 2012, e.g.;).Also, the dependence of CPS parameters on model grid size and climate regime is an important issue for weather and climate simulations (Arakawa et al., 2011).The CPSs are generally developed with the assumption that the scale of a convective cloud is much smaller than the grid scale, which allows one to formulate the statistical effects of cloud ensembles.When a locally unstable condition exists, precipitation will be generated if the convective updrafts can penetrate the unstable layers to some height (Zhang and McFarlane, 1995).In addition, atmospheric GCMs also include parameterization of macrophysics, microphysics, and subgrid vertical velocity and cloud variability to simulate the subgrid stratiform precipitation.Many model processes, including deep convection and stratiform cloud micro-and macrophysics, are responsible for the partitioning of precipitation through competition for moisture and cooperation for precipitation generation (Yang et al., 2013).
Here we used the Community Atmosphere Model version 4 (CAM4) with the Zhang-McFarlane (ZM) deep convection scheme.For represention of stratiform microphysics we used the scheme by Rasch and Kristjánsson (1998); a single-moment In this study we decided to perturb parameters related to the deep convection in the ZM scheme and the relative humidity thresholds for low and high stable cloud formation.Also, we included the volume mixing ratio of greenhouse gases CO 2 and CH 4 as control variables.We did not evaluate any parameter in relation with indirect effects of aerosol to cloud nucleation and autoconversion, despite the overall effect of aerosol to cloud albedo and cloud lifetime, and so to climate, remains largely uncertain (Chuang et al., 2012).Table 3 shows the evaluated parameters in both CAM and POP2, and Table 4 shows the parameter values for the true simulation in column x t .

POP2
Several vertical mixing parameterizations are available within the POP2 model.Here, we used the K-profile parameterization (KPP) of Large et al. (1994).For KPP, we assumed a depth-constant vertical diffusivity and viscosity.Vertical viscosity (υ) is implemented in the KPP mixing as the function υ = P r κ, where P r, is the dimensionless Prandtl number (set to P r = 10 by default in the model), and κ is vertical diffusivity, for which we set the default value in POP2 as the true value (POP2.bckgrnd_vdc1 in Table 3).
For horizontal mixing we chose an anisotropic mixing of momentum, and the Gent and Mcwilliams (1990) parameterization, which forces mixing of tracers to take place along isopycnic surfaces, with activated submesoscale mixing.The main drawback in the Gent and Mcwilliams (1990) scheme is that it nearly doubles the running time with respect to other simpler schemes.
The third control variable in POP2 was the freshwater influx from the GIS, which we distributed homogeneously along the coast of Greenland and only at the ocean surface.The true value, x t , for the POP2 control variables is shown in Table 4.

Observations
The observational dataset is composed of point samples of climate averages for the last 20 years out of a total 60 yr integration time of a true simulation.To emulate a realistic past climate estimation problem, the synthetic observations were located at the sampling points of the MARGO database (MARGO Project Members, 2009), and the sampling characteristics reproduce those of the MARGO.The MARGO database is a synthesis of six different proxies and is considered to represent the combined expertise of at least a sizeable fraction of the LGM paleocommunity.Along with model simulations, it has been recently used for comparison with model ensembles (Hargreaves et al., 2011), or within the adjoint method for dynamical reconstruction of the global ocean state during the LGM with the adjoint method (Kurahashi-Nakamura et al., 2017).The observational uncertainty was taken from the MARGO database as input to the assimilation, but we did not add any error to the synthetic observations.MARGO provides observations (or reconstructions) of near sea surface temperature (SST), evaluated at 10 m depth, for the Last Glacial Maximum (LGM).The proxy types on which the SST estimates are based are a) microfossil-based: planktonic foraminifera, diatom, dinoflagellate cyst and radiolarian abundances, and b) geochemical palaeothermometers: alkenones unsaturation ratios (U K 37 ) and planktonic foraminifera Mg/Ca.Details of the database are given in MARGO Project Members (2009).
In summary, MARGO provides seasonal means for January, February and March (JFM, or winter observations), and July, August and September (JAS, or summer observations), as well as annual means.However, the availability of data for each of the three temporal means (winter, summer, and annual) is different for each proxy type.Specifically, diatoms are just available for winter; dinoflagelates, foraminifera, and Mg/Ca are available for the three temporal means; and U K 37 are only available as annual means.The observation errors are assumed uncorrelated, but each individual record in MARGO contains a specific uncertainty.Mapped into the SST space, the range of standard deviations in MARGO is within 0.79 and 4.87 • C, with relatively homogeneous uncertainty ranges among proxy types.Figure 2 shows the type and location of the proxy data.In addition to the locations and uncertainty, we emulated the temporal mean availability of the observations, with all temporal means calculated over the last 20 yr of the integration time of the true simulation.Thus, the synthetic observations in the experiment, as well as those in MARGO, impose a less rectrictive constraint not only in areas in which observations are more sparse, but also in those locations for which just one season or just annual means are available.Below we see the impact of the uneven observation coverage on the analyses.

Results
Table 4 shows the model parameters for the true simulation, x t , the prior (or background) parameters, x b , and the analysis results, or updated paramater estimates, x a , for the evaluated filters, as well as the value of the cost function J (θ) for the background and each estimation.The observational term in the cost function, J y (θ), is also shown to give an indication of the relative contribution from each term in the cost function to its total value.The component J y (θ) of the cost function is calculated by reintegration of the mean analysis parameters in the ensemble for the ETKF 60 and the ETKF 60 -GA, and by reintegration of the explicit parameter estimates in the pEKS and the pFKS.
In general, all schemes obtained a substantial reduction in the value of the cost function with respect to the background, which had a J (θ) = 373.39.Within the evaluated schemes, the pFKS obtained the lowest values for both J y (θ) (50.24) and J (θ) (51.85) with a substantial reduction with respect to the other schemes.The pEKS was the scheme with higher cost function (66.83), which was expected as, in addition to be linear, it is the scheme with a more reduced exploration of the parameter space and corresponding sensitivity estimation.The cost function for the EKTF 60 +GA is very similar to that of the EKTF 60 with no transformation.So, the transformation given by the anamorphosis does not show a general benefit over a standard ETKF for this kind of problem.An analysis for higher temporal resolution (e.g., annual) could, however, have shown different results.Still, it is possible that a more detailed screening of nonlinearities could have led to just apply the anamorphosis to some selected parameters and improved results.For example, Table 4 shows that the ETKF 60 -GA approaches the true parameter values clearly more closely than the ETKF 60 in the case of the CAM minimum relative humidity for high stable cloud formation, cldfrc_rhminh, and the autoconversion coefficient over land in deep convection; zmconv_c0_lnd.
However, for the autoconversion coefficient over the ocean, zmvonv_c0_ocn, the ETKF gets closer to the truth.Thus, it does not seem possible to derive any general conclusion about the benefit of the Gaussian anamorphosis for multidecadal climate analysis at the view of this experiment.
The lower cost function value of the pFKS with respect to the ETKF 60 (with/without GA) suggests a benefit in the more limited but repetitive sensitivity estimation and incremental approach of the pFKS, which allows for mild nonlinear relationships, with respect to a (linear) single step but more thorough exploration of the mean sensitivities of the ETKF 60 .Again, an increase in the ensemble size of the ETKF could change this results, as it is also possible that an increase in the number of steps in the pFKS could further reduce the cost function value.However, in the experiment, the computing effort of the pFKS was already half of the ETKF 60 , which has to be taken into account regarding the estimation possibilities.
Regarding the estimation of specific parameters, all of the evaluated schemes had some parameters for which the estimation, starting from the background, went in the wrong direction with respect to the true values.For example, while the closest estimate to the true value of the relative humidity threshold for high stable cloud formation (cldfrc_rhminh) was given by the pFKS, with a slight overshooting (0.81 versus 0.80 for the truth).It may have been that this slight overshooting has partially compensated for the effect of other parameters.Thus, the pFKS estimates of the freshwater flux from the GIS went in the wrong directions, as were the estimates for the autoconversion coefficients in the Zhang-McFarlane deep convection scheme.
The ETKF 60 did not show any overshooting, but had some parameter increments going in the wrong direction.For the ocean background vertical diffusitivity, the only scheme for which there was some, albeit minor, improvement in the estimate was the EKTF 60 .Still, the improvement is so light in this case that is could be random.It is possible that the parameter perturbations imposed for the estimation of the Jacobian were too low to have a significant effect on the model sea surface temperature (SST) at the locations, including depth, of the observations given the short integration times.Table 5 shows the corresponding standard deviation for the background and the analyses.The column for the EKTF 60 +GA has been omitted as it was similar to that of the EKTF 60 , and it is not shown to lead to improved parameter estimates.In general, the spread of the posterior parameter ensemble was higher for the EKTF 60 , while the pEKS and the pFKS had a more similar reduction in the spread with respect to the background, with the pFKS slighty higher general reduction.
The OAT (or conditional) sensitivity exploration by the pEKS affects the estimation of the various parameters in different degrees.For comparison, with a prescribed SST, Covey et al. (2013)  More generally, they found that for both OAT and MOAT sampling, most of the parameters received a similar rank sensitivity, despite the MOAT perturbations occur through a baseline traversing the multidimensional parameter space.
This study does not attemp to give an in-depth analysis of the assimilation results for the corresponding CFR.However, we summarise some results of the spatial patterns shown in the climate reconstructions and give examples of sensitivities as estimated by the pFKS and the ETKF 60 .Figure 3 shows, in general, a similar absolute bias reduction for the pFKS and the ETKF 60 in both general magnitude and spatial patterns for both SST and SSS.For SST, the most problematic area is that where most of the observations come from the diatom locations in the MARGO database.A reason for that seems to be that observations for diatoms locations are just the 20 yr means for winter (seasonal mean of January, February and March), which reduces its impact on the climate analysis with respect to other observation types.Still, in general the pFKS has slightly less areas where the absolute bias in the SST could not be reduced.Also the negative effect of the assimilation, regarding absolute bias reduction, on SSS which is shown for the ETKF 60 for the North Atlantic, the Bering strait, and in some areas of the Artic ocean is negligible in the pFKS.While these unobserved areas (from the point of view of the MARGO database) remain largely unconstrained, the pFKS seeems generally more able to correct for the biases in areas with more observations and simultaneously not having a negative effect in areas far from the observations.As an example of estimated sensitivities in our CGCM experiment, Fig. 4 shows the sensitivity of the sea surface temperature (SST) and sea surface salinity (SSS) to cldfrc_rhminl in CAM, as mean sensitivity estimated by the background ensemble for the ETKF 60 and conditional sensitivity in the first loop of the pFKS (i.e. also the pEKS).In the case of SST the general pattern of both sensitivities is quite similar, except for the negative sensitivity shown at the North Atlantic ocean above 50 • of latitude for the ETKF 60 .The second loop of the pFKS (not shown) shows a similar pattern to the first loop.However the sensitivity in the third loop (not shown), approaches more the sensitivity of the ETKF 60 and also shows a similar negative sensitivity area, in extent and magnitude, in the North Atlantic.Something similar happens to the sensitivity estimates for the SSS, where the conditional sensitivity estimates in the first and second loops of the pFKS are reasonably similar to the mean one from the ensemble background for the ETKF 60 , with the major differences being in the Artic ocean, around the Bering strait, and the North Atlantic.The third loop of the pFKS (not shown) also shows the more homogeneous band of sensitivity between the coasts of Canada and Europe shown for the ETKF 60 .As a further example, we find also interesting to show a sensitivity example for the deeper ocean considering the short integration time (60 years) of the experiment.For this integration time the deep ocean is far from reaching equilibrium.Figure 5 depicts the sensitivity of the Atlantic meridional overturning circulation (AMOC) to the background vertical diffusity (κ; POP2 parameter bckgrnd_vdc1) for the ETKF 60 and the three iterations of the pFKS.It can be seen that the general pattern have some similarities in all case, but also noticeable differences.The first iteration of the pFKS (pFKS.f01) is quite close to the mean estimate of the ETKF 60 , except for the high sensitivity area around 30 • N and 1.5 km deep, which is mostly missing in the pFKS.f01.The pattern of the sentitivity in the second iteration (pFKS.f02) is similar to that in the first iteration but with much higher values.The third loop again has reducded sensitivities and keeps missing the deeper high sensitivity area given by the mean sensitivity of the background ensemble for the ETKF 60 .
This warrants further study to analyse with more detail why, despite being sampling different regions of the parameter space, the conditional parameter sampling of the pFKS does not show in any case the deeper high value area estimated by the mean sensitivity.
An important last consideration is that the assimilation will just attempt to minimise (or get the first moments) of the cost function.So the assumed background uncertainties and the background term in the cost function are instrumental in controlling the parameter increments in the assimilation and the resulting CFR.In this synthetic experiment the source of errors is known, and we assume a perfect-model framework expect for the assumed uncertainties in the chosen parameters.However, online parameter estimation with real data does not know about real model errors.Potentially, the parameters can have a compensating effect for non-accounted errors.Although the minimization of the cost function via parameter estimation can highly reduce the value of a cost function and improve the corresponding CFR, it does not necessarily imply that the updated parameters (or their moments) actually correspond to improved model physics.That is, the updated model parameters can potentially lead to improved climate simulations for other prospective climatic conditions, but not neccessarily.Thus, it is important to distinguish between the use of the assimilation methods for CFR including parameter estimation, and the trust one can have in the estimated model parameters for future climate projections under very different climatic conditions.A fair caveat note was recently given by Dommenget and Rezny (2017) regarding the use of flux corrections as an alternative to parameter estimation in CGCMs.
As they indicated, the compensating error risk when using parameter estimates for one specific observation dataset for future projections can be eliminated by using flux corrections instead to estimate the CFR.Still, the flux corrections do not improve the climate projections either.Also parameter estimates may be strictly needed, for example, when model configurations which have not been scientifically validated are newly evaluated, as the parameterisation of the model physics may change, e.g., for different model resolutions.All in all, our experiment with CESM is an example of joint estimation of flux correction and model parameters.The flux correction (to represent some specific model error) must have some specific shape.If properly estimated, it can potentially be used for CFR in unobserved climates (e.g., future projections).The estimation of the flux correction in our example has not succeed for the pFKS.However, the ETKF 60 , despite its higher cost function value, has very adequately estimated the freshwater flux from GIS, in a consistent way with its background uncertainty and the true observations.More study is also neeed to provide deeper insight into the specific factor leading to the different estimates.

Conclusions
This study focuses on the online estimation of model parameters for multi-decadal past climate reanalysis (or climate field reconstructions, CFR) with CGCMs, under the assumption that the main sources of model uncertainty can be encapsulated in a relatively small number of model parameters.By parameters we also refer to jointly estimated flux terms if these are deemed a) uncertain and b) having a significant impact on the model climate.Given this assumption and the restrictions of The first scheme is the iterated Kalman filter formulated in the parameter space, which is here termed the parameter-space iterative Kalman smoother (pIKS).The second scheme is a fractional (by inflation of the observation error covariance) and multiple assimilation of the observations in an iterative fashion, which is here termed as parameter-space fractional Kalman smoother (pFKS).In a first simple experiment, with a one-dimensional energy balance model, for which an adjoint model is available, the pFKS obtained the lowest cost function of the two schemes, with cost function values comparable to that of a 4D-Var scheme (or adjoint method) and with a much reduced computational effort.The pIKS showed instead an unstable behavior resulting in a substantially higher cost function.Thus, in a second experiment with the CESM model as example of CGCM, only the pFKS was evaluated.Given the lack of an adjoint code for CESM, we included an ETKF with 60 members (ETKF 60 ) and an ETKF 60 with Gaussian Anamorphosis (ETKF 60 -GA) for benchmarking.As far as the authors know, this is also the first time that the ETKF-GA is evaluated for online parameter estimation with a comprehensive coupled climate model as CESM.Regarding the cost function as performace criterion, the ETKF 60 -GA was not clearly better than a standard ETKF 60 , although this might change in climate analyses with a higher temporal resolution.The pFKS (with 3 iterations) again obtained lower values of the cost function than the ETKF 60 and with half the computational cost.However, the results are not strongly conclusive about whether the 3-loop pFKS or the ETKF 60 behaved significantly different in the study case.While the ETKF uses a more thorough sampling of the parameter space for the evaluation of the mean sentivities, the simpler pFKS seems more able to account for a (mild) nonlinear relation between the modelled climate and the model parameters.Thus, apart from Geosci.Model Dev.Discuss., https://doi.org/10.5194/gmd-2018-48Manuscript under review for journal Geosci.Model Dev. Discussion started: 6 March 2018 c Author(s) 2018.CC BY 4.0 License.where ŷ ∈ R p (with p = n k k=0 p k ) is a column vector of all observations throughout the DAW, R is the corresponding observation error covariance matrix, P b 0 is the error covariance of the background z b 0 , and Ĥ(z 0 ) : R n+q → R p is a generalised observation operator maping from the augmented initial state to all the observations variables and times.The maximum a posteriori estimation in 6 is also known as conditional mode estimation, or maximum of the conditional density.Ĥ(•) differs from the observation operator H k (•), which maps the model state at time t k to observation variables at time t k .The cost function ( Geosci.Model Dev.Discuss., https://doi.org/10.5194/gmd-2018-48Manuscript under review for journal Geosci.Model Dev. Discussion started: 6 March 2018 c Author(s) 2018.CC BY 4.0 License.
Geosci.Model Dev.Discuss., https://doi.org/10.5194/gmd-2018-48Manuscript under review for journal Geosci.Model Dev. Discussion started: 6 March 2018 c Author(s) 2018.CC BY 4.0 License.column, or estimate of the local (conditional) sensitivity of the observations to that parameter, in G is then the slope of the linear regression of the ensemble model equivalent of the observations against the parameter sample.This requires m θ × q Geosci.Model Dev.Discuss., https://doi.org/10.5194/gmd-2018-48Manuscript under review for journal Geosci.Model Dev. Discussion started: 6 March 2018 c Author(s) 2018.CC BY 4.0 License.

4
Experiment 1: 1D energy balance model 4.1 Experimental setup Paul and Losch (2012) conduct an evaluation of the adjoint method (4D-Var) for parameter estimation in a conceptual climate model.While our focus here is on high-resolution CGCMs, as a first test we conducted an extension of their analysis via the comparison of one of their scenarios with the adjoint method with the iterative schemes described here.The model is a onedimensional energy-balance model, where the meridional resolution was set to 10 • .The reader is referred to Paul and Losch (2012) (PL2012 hereafter) for a description of the model and their adjoint scheme.PL2012 evaluated the adjoint approach with several climate conditions and uncertain parameter scenarios, including preindustrial and Last Glacial Maximum (LGM) climate states.Then, with the model constrained by the preindustrial and LGM parameter estimates, they conduct climate projections under several CO 2 forcing scenarios.
Geosci.Model Dev.Discuss., https://doi.org/10.5194/gmd-2018-48Manuscript under review for journal Geosci.Model Dev. Discussion started: 6 March 2018 c Author(s) 2018.CC BY 4.0 License.In this specific case the perfect-model assumption is overly optimistic, as in addition to the 1D Earth climate representation, there are strongly simplified physics of the used energy balance model.While PL2012 also assume this perfect-model framework in their strong constraint 4D-Var implementation, they point out to a number of specific structural model errors.Thus, it is clear that the parameter estimation will attempt to compensate for the unaccounted uncertainties in either of the evaluated estimation approaches.
Geosci.Model Dev.Discuss., https://doi.org/10.5194/gmd-2018-48Manuscript under review for journal Geosci.Model Dev. Discussion started: 6 March 2018 c Author(s) 2018.CC BY 4.0 License. the pFKF, via inflation of R, never reached these extreme low values for the ocean mixed layer depth in any iteration.The parameter values estimated with pFKS converged all in the same direction that they did with the adjoint method.
Geosci.Model Dev.Discuss., https://doi.org/10.5194/gmd-2018-48Manuscript under review for journal Geosci.Model Dev. Discussion started: 6 March 2018 c Author(s) 2018.CC BY 4.0 License. 5 Experiment 2: CESM 5.1 Experimental setup The experiment 2 is a synthetic test where we use a coupled ocean-atmosphere-land GCM; the Community Earth System Model (CESM1.2) with the following components: the Community Atmosphere Model version 4 (CAM4), the Parallel Ocean Program version 2 (POP2), the Community Land Model (CLM4.0), the Community Ice CodE (CICE 4) as sea-ice component, the River Transport Model (RTM) and the CESM coupler CPL7.Land ice is set as boundary conditions.We use a ∼ 4 • horizontal resolution regular finite volume (FV) grid for the atmospheric and land components, a FV grid with a displaced pole centered at Greenland ∼ 3 • (version 7) for the ocean and sea ice components, and a 0.5 • FV grid for the river runoff component.For comparison with recent ensemble experiments with CESM, this is a coarser set up than the CESM Last Millennium Ensemble (Otto-Bliesner et al., 2016).
scheme that predicts the mixing ratios of cloud droplets and ice.Regarding cloud emisivity, clouds in CAM 4.0 are grey bodies with emissivities that depend on cloud phase, condensed water path, and the effective radius of ice particles.
evaluated the sensitivity of the radiation balance at the top Geosci.Model Dev.Discuss., https://doi.org/10.5194/gmd-2018-48Manuscript under review for journal Geosci.Model Dev. Discussion started: 6 March 2018 c Author(s) 2018.CC BY 4.0 License.of the atmosphere in CAM to a large number of input parameters with both an OAT exploration, similar to the pEKS, and an alternative Morris one-at-a-time (MOAT) sampling.For example, they found with both methods that the highest sensitivity of the upward shortwave flux (solar energy reflected back to space by the atmosphere and surface) was with respect to cldfrc_rhminl, out of 21 evaluated CAM parameters.As cldfrc_rhminl is the threshold of the relative humidity value at which low-level water vapor starts to condense into cloud droplets, their result is consistent with the role of thick stratus clouds in reflecting sunlight.

Figure 3 .
Figure3.CESM experiment.Absolute bias reduction for SST and SSS as a result of a new integration with the parameters estimated with the ETKF60 and the pFKS.The statistics are the absolute bias between the background and the truth minus the absolute bias between the analysis and the truth.Thus, positive values are a net bias reduction.

Figure 4 .
Figure 4. CESM experiment.Sensitivity of SST [ • C] and SSS [PSU] to the minimum relative humidity for low stable cloud formation (CAM.clrfrc_rhmin) estimated from the ETKF60 background ensemble and the first iteration of the pFKS.

Figure 5 .
Figure 5. CESM experiment.Sensitivity of the Atlantic meridional overturning circulation (AMOC) to the background vertical diffusion parameter (κ; POP2.bckgrnd_vdc1) of the KPP scheme in the ocean model estimated from the ETKF60 background ensemble and during the three iterations of the pFKS.

Table 2 .
Parameter estimation results for Ebm1D model.

Table 5 .
CESM estimates of standard deviations.