
Abstract. A tool for multidimensional variational analysis (divand) is presented. It allows the interpolation and analysis of observations on curvilinear orthogonal grids in an arbitrarily high-dimensional space by minimizing a cost function. This cost function penalizes the deviation from the observations, the deviation from a first guess and abruptly varying fields, based on a given correlation length (potentially varying in space and time). Additional constraints can be added to this cost function, such as an advection constraint which forces the analysed field to align with the ocean current. The method naturally decouples disconnected areas based on topography and topology. This is useful in oceanography, where disconnected water masses often have different physical properties. Individual elements of the a priori and a posteriori error covariance matrices can also be computed, in particular the expected error variance of the analysis. A multidimensional approach (as opposed to stacking two-dimensional analyses) has the benefit of providing a smooth analysis in all dimensions, although the computational cost is increased. Primal (problem solved in the grid space) and dual formulations (problem solved in the observational space) are implemented, using either direct solvers (based on the Cholesky factorization) or iterative solvers (conjugate gradient method). In most applications the primal formulation with the direct solver is the fastest, especially if an a posteriori error estimate is needed. However, for correlated observation errors the dual formulation with an iterative solver is more efficient. The method is tested using pseudo-observations from a global model. The distribution of the observations is based on the positions of the Argo floats. The benefit of the three-dimensional analysis (longitude, latitude and time) compared to the two-dimensional analysis (longitude and latitude) and the role of the advection constraint are highlighted. The tool divand is free software distributed under the terms of the General Public Licence (GPL) (http://modb.oce.ulg.ac.be/mediawiki/index.php/divand).


Introduction
Deriving a complete gridded field from a set of observations is a common problem in oceanography. In situ observations are generally sparse and inhomogeneously distributed. While satellite observations typically have a better spatial and temporal coverage than in situ data (but measure only surface properties), they also present gaps due to, e.g., the presence of clouds (in the case of thermal sea-surface temperature and optical surface properties of the ocean). Since the problem is generally under-determined if the gridded field is to be derived from the observations alone, a first guess is introduced. The data analysis problem is also closely related to data assimilation, where the observations are used in combination with a first guess coming from a model. Several interpolation methods have been developed and presented in the scientific literature. Direct linear interpolation of the observations is rarely an option for ocean observations, which are affected by noise and are not necessarily representative (e.g. a measurement at a specific time might not be representative of a monthly average). Moreover, the gradient of a linearly interpolated field is not necessarily continuous. Current interpolation methods therefore take the uncertainty of the observations into account in one way or another. Most interpolation methods for uncertain observations can be classified as methods based on optimal interpolation (including kriging) or on variational analysis.
For optimal interpolation methods (Gandin, 1965; Bretherton et al., 1976), the error covariance of the first guess is generally specified directly by analytical functions (Robinson, 1996; Guinehut et al., 2002; Roberts-Jones et al., 2012). When satellite or model data are used, this error covariance can also be specified by its eigenvalues/eigenvectors (Kaplan et al., 1997; Rayner et al., 2003; Beckers et al., 2006) or by an ensemble (Evensen, 2007). Applications to multiple spatial and/or temporal dimensions are common (Høyer and She, 2007; Buongiorno Nardelli et al., 2010) to ensure continuity of the solution along those dimensions. Analytical functions for the error covariance are generally based on the distance between two given points. However, decoupling water masses separated by land while maintaining a spatially smooth field over the ocean is difficult in optimal interpolation.
Auxiliary variables can be used as additional dimensions in order to improve the realism of the covariance function. Buongiorno Nardelli (2012), for instance, used temperature (based on satellite sea surface temperature) as an additional dimension to generate a sea surface salinity analysis using optimal interpolation. This innovative application shows that dimensions are not necessarily restricted to space and time and that other related variables can be used to extend the notion of space and distance.
In variational analysis, a cost function is introduced which penalizes non-desirable properties of the analysed field, in particular the deviation from the observations and from the background estimate, as well as abrupt changes of the analysed field. The variational approach is equivalent to the optimal interpolation formulation, but instead of specifying the error covariance of the first guess directly, the inverse of this matrix is parameterized: rather than imposing that two adjacent grid cells are correlated, it is required that gradients (and higher-order derivatives) are small (Brasseur et al., 1996; Brankart and Brasseur, 1996). Decoupling water masses separated by land is natural in variational analysis, as it can be included using boundary conditions on the spatial derivatives. Variational analysis in three or four dimensions is common in the context of data assimilation (Rabier et al., 2000; Dobricic and Pinardi, 2008; Moore et al., 2011b), where it is generally a multivariate analysis. In four-dimensional variational assimilation, the analysed field has in general three dimensions (e.g. longitude, latitude and depth for initial conditions, or longitude, latitude and time for forcing fields), but the numerical model and its adjoint are involved to derive the relationship between the initial conditions (or forcing fields) and observations at different time instances. However, most data analysis applications gridding observations with variational methods are limited to two dimensions: either two horizontal dimensions (e.g. Troupin et al., 2010) or vertical transects (Yari et al., 2012). Three- or four-dimensional (space and time) fields are then obtained by assembling individual analyses. An inhomogeneous data distribution might then lead to spurious abrupt variations along the additional dimensions, which require ad hoc filtering of the assembled field. The presented tool implements a monovariate analysis; however, the information from additional environmental variables can be incorporated as additional constraints.
The variational approach is also attractive for problems where it is easier to formulate physical properties of the underlying field in terms of constraints than in terms of correlations/covariances. For a two-dimensional surface current analysis, for example, one can impose that the horizontal divergence is small (Legler and Navon, 1991; Bennett et al., 1997; Yaremchuk and Sentchev, 2009) by adding a corresponding term to the cost function. This kind of constraint would be more difficult to implement in an optimal interpolation method.
On the other hand, in optimal interpolation one can quite easily derive the error variance of the analysed fields, which is more difficult, but feasible, for variational methods (Troupin et al., 2012). Optimal interpolation in the local approximation can also be applied quite efficiently on distributed-memory parallel computing architectures.
The aim of this manuscript is to implement and test a variational analysis tool that can operate in an arbitrarily high-dimensional space and with a cost function that can easily be extended with additional constraints. The benefit of this method is assessed in comparison to assembled two-dimensional analyses, using an advection constraint that aligns the analysed field with a given vector field.
Different approaches can be used to model the background error covariance. In the smoothing-norm spline approach (Wahba and Wendelberger, 1980; Wahba, 1990; McIntosh, 1990; Brasseur and Haus, 1991; Brasseur et al., 1996; Troupin et al., 2012), a norm is defined using squares of derivatives up to a certain degree. The background error covariance can also be specified by solving diffusion equations iteratively over a pseudo-time dimension. The solution can be obtained using explicit (Bennett et al., 1996; Weaver et al., 2003; Lorenzo et al., 2007; Muccino et al., 2008; Moore et al., 2011b) or implicit stepping schemes (Bennett et al., 1997; Weaver and Courtier, 2001; Chua and Bennett, 2001; Zaron et al., 2009; Carrier and Ngodock, 2010). Another commonly used technique is based on a recursive filter (Lorenc, 1992; Hayden, 1995; Purser et al., 2003a, b; Martin et al., 2007; Dobricic and Pinardi, 2008). Mirouze and Weaver (2010) have shown that this approach is actually equivalent to the method based on the implicit diffusion operator. For variational data assimilation, the approach based on diffusion operators (or a recursive filter) is commonly used, as it allows one to define the background error covariance (or its inverse) in terms of square root matrices, which are often used for preconditioning (e.g. Lorenc, 1997; Haben et al., 2011). Like DIVA (Data-Interpolating Variational Analysis), the n-dimensional tool divand uses smoothing-norm splines to define the background error covariance, in combination with a direct solver which does not require preconditioning. In the case of DIVA the skyline method (Dhatt and Touzot, 1984) is used, while divand uses the supernodal sparse Cholesky factorization (Chen et al., 2008; Davis and Hager, 2009). Other, iterative methods (which require preconditioning) have also been implemented.
The variational inverse method (Brasseur and Haus, 1991) implemented in the tool DIVA computes the minimum of the cost function in two dimensions using a triangular finite-element mesh (Brasseur et al., 1996; Brankart and Brasseur, 1996, 1998; Troupin et al., 2012). A web interface has also been developed for this tool (Barth et al., 2010). We present an extension to n dimensions, called divand. To simplify the testing and the implementation in an arbitrarily high-dimensional space, a regular curvilinear mesh is used for divand.
In Sect. 2, the formulation of the method in an n-dimensional space and the derivation of the analytical kernel functions for an infinitely large domain are introduced. The relationship between the highest derivative needed in the formulation and the dimension of the domain is shown. Section 3 presents the different implemented algorithms. Simple numerical tests are performed in Sect. 4 to show the consistency of the numerical results with the analytical solutions of Sect. 2. Implementation details and capabilities of the tool are given in Sect. 5. The tool is also tested in a realistic configuration to reconstruct the global temperature in Sect. 6.

Formulation
Variational inverse methods aim to derive a continuous field which is close to the observations and satisfies a series of a priori constraints. In particular, the field should be "smooth". It is therefore important to quantify the "smoothness" of a field. While the interpolated field should be close to the observations, it should not necessarily pass through all observations, because observations have errors and often do not represent exactly the same information as the analysed field. A cost function is formulated which includes these constraints:

J[\varphi] = \sum_{j=1}^{N_d} \mu_j \left[ d_j - \varphi(\mathbf{x}_j) \right]^2 + \| \varphi - \varphi_b \|^2 + J_c(\varphi),

where d_j are the measurements at the locations x_j, mu_j is their weight, N_d is the number of observations and phi_b is a background estimate of the field.
The term J_c(phi) represents optional additional constraints which will be specified later.
In order to define the norm, a length scale L_i is introduced in every dimension of the domain. These length scales form the diagonal elements of the matrix L:

\mathbf{L} = \mathrm{diag}(L_1, L_2, \ldots, L_n).

Based on these length scales, we define the following scaled differential operators for the gradient and the Laplacian:

\tilde{\nabla} = \mathbf{L} \nabla, \qquad \tilde{\nabla}^2 = \nabla \cdot \left( \mathbf{L}^2 \nabla \right).

A scalar product of two functions f and g is defined using the scaled gradient and Laplacian:

\langle f, g \rangle = \frac{1}{c} \sum_{i=0}^{m} \alpha_i \int_D \left( \tilde{\nabla}^i f \right) \cdot \left( \tilde{\nabla}^i g \right) \, d\mathbf{x},

where the operator applied i times alternates between the scaled Laplacian (for even i) and the scaled gradient of the iterated Laplacian (for odd i), and the product under the integral is a scalar or dot product accordingly.
This scalar product is closely related to the spline norm defined by Wahba and Wendelberger (1980) and McIntosh (1990) for two dimensions.
Equation (5) corresponds to the covariance inner product x^T B^{-1} y, where the vectors x and y represent the fields f and g and the matrix B is the background error covariance (Gaspari and Cohn, 1999).
We denote by m the order of the highest derivative in this scalar product. The parameter c is a normalization coefficient that will be chosen later. The coefficients alpha_i are generally taken to be positive so that the cost function is guaranteed to have a finite minimum.
The norm used in the background constraint of Eq. (1) is defined using this scalar product by

\| \varphi \|^2 = \langle \varphi, \varphi \rangle.

As the scalar product (5) is symmetric and the norm is positive for all fields phi, the scalar product defines a covariance function (Gaspari and Cohn, 1999). If the field phi is discretized on a grid and all elements are grouped into the vector x, the cost function can be written as

J(\mathbf{x}) = (\mathbf{x} - \mathbf{x}^b)^T \mathbf{B}^{-1} (\mathbf{x} - \mathbf{x}^b) + (\mathbf{H}\mathbf{x} - \mathbf{y}^o)^T \mathbf{R}^{-1} (\mathbf{H}\mathbf{x} - \mathbf{y}^o) + J_c(\mathbf{x}),

where we have also regrouped all observations into the vector y^o and the discretized background field into the vector x^b. H is a discrete local interpolation operator allowing the gridded field to be compared with the observations at the data locations. The observation operator is therefore linear and can be represented efficiently by a sparse matrix.

This cost function is commonly used in optimal interpolation, where the matrices B and R are the error covariances of the background estimate and of the observations respectively. The scalar product in Eq. (5) defines the matrix B, and the diagonal matrix R is composed of the inverses of the data weights, 1/mu_j.
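To make the matrix form of the cost function concrete, the following minimal Octave/MATLAB sketch evaluates it for a tiny illustrative problem. All matrices and values here (the tridiagonal stand-in for B^{-1}, the observation operator, the weights) are made up for illustration and are not divand code.

% Minimal sketch (illustrative values, not divand code):
% J(x) = (x-xb)'*Binv*(x-xb) + (H*x-yo)'*Rinv*(H*x-yo)
n  = 5;                                   % number of grid points
xb = zeros(n,1);                          % background estimate
Binv = spdiags([-ones(n,1) 2*ones(n,1) -ones(n,1)], -1:1, n, n);  % stand-in for B^{-1}
H  = sparse([1 2], [2 4], [1 1], 2, n);   % observation operator (2 observations)
yo = [0.8; -0.3];                         % observations
mu = [4; 4];                              % data weights (inverse error variances)
Rinv = spdiags(mu, 0, 2, 2);              % R^{-1} is diagonal here

x = xb;                                   % evaluate the cost of the background itself
J = (x - xb)' * (Binv * (x - xb)) + (H*x - yo)' * (Rinv * (H*x - yo));
disp(J)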

Kernel
The so-called reproducing kernel K(x, y) associated with Eq. (5) is defined by (McIntosh, 1990)

\langle f(\cdot), K(\cdot, \mathbf{y}) \rangle = f(\mathbf{y}) \quad \text{for any function } f,

and will be helpful in understanding the covariance structure of B.
If the domain is infinitely large (D = R^n) and the correlation lengths L_i are constant in all dimensions, we can derive the function K analytically. First we assume that the correlation lengths L_i are all equal to one; the more general case with arbitrary (but constant) values of L_i will be derived later. The derivation follows the PhD thesis of Brasseur (1994), where the kernel is derived for two-dimensional problems. Substituting the definition of the scalar product from Eq. (5) into Eq. (8) and integrating by parts, one obtains

\frac{1}{c} \int_D f(\mathbf{x}) \left[ \sum_{i=0}^{m} (-1)^i \alpha_i \tilde{\nabla}^{2i} K(\mathbf{x}, \mathbf{y}) \right] d\mathbf{x} = f(\mathbf{y}).

As this last equation must be true for any function f(x), the expression in brackets must be equal to the Dirac function (times c):

\sum_{i=0}^{m} (-1)^i \alpha_i \tilde{\nabla}^{2i} K(\mathbf{x}, \mathbf{y}) = c \, \delta(\mathbf{x} - \mathbf{y}).

Since the kernel is translation invariant, we can set y = 0 without loss of generality. By applying the Fourier transform, we obtain

\left( \sum_{i=0}^{m} \alpha_i |\mathbf{k}|^{2i} \right) \hat{K}(\mathbf{k}) = \frac{c}{(2\pi)^{n/2}},

where \hat{K}(\mathbf{k}) is the Fourier transform of the kernel K(x):

\hat{K}(\mathbf{k}) = \frac{1}{(2\pi)^{n/2}} \int_{\mathbb{R}^n} K(\mathbf{x}) \, e^{-i \mathbf{k} \cdot \mathbf{x}} \, d\mathbf{x}.

The kernel K(x) can thus be found by using the inverse Fourier transform:

K(\mathbf{x}) = \frac{c}{(2\pi)^{n}} \int_{\mathbb{R}^n} \frac{e^{i \mathbf{k} \cdot \mathbf{x}}}{\sum_{i=0}^{m} \alpha_i |\mathbf{k}|^{2i}} \, d\mathbf{k}.

In particular, the value of the kernel at x = 0 corresponds to the integral

K(0) = \frac{c}{(2\pi)^{n}} \, \frac{2 \pi^{n/2}}{\Gamma(n/2)} \int_0^{\infty} \frac{k^{n-1}}{\sum_{i=0}^{m} \alpha_i k^{2i}} \, dk,

where the integration variables were transformed to n-dimensional polar coordinates and the integration was performed over all angles. The kernel is in fact nothing else than the correlation function one would use to construct B yielding the same result in optimal interpolation as with the variational approach (Wahba and Wendelberger, 1980). We naturally choose the value of c such that K(0) = 1:

c = (2\pi)^{n} \, \frac{\Gamma(n/2)}{2 \pi^{n/2}} \left( \int_0^{\infty} \frac{k^{n-1}}{\sum_{i=0}^{m} \alpha_i k^{2i}} \, dk \right)^{-1}.

Assuming all alpha_i >= 0, the integral on the right-hand side is defined if m > n/2. This condition thus links the number of dimensions n and the order of the highest derivative needed in the formulation. The Fourier transform of the kernel is a radial function depending on the norm of the wave number k. The inverse Fourier transform of a radial function is also a radial function, which can be derived with the Hankel transform (Appendix A):

K(r) = r^{1-n/2} \int_0^{\infty} \hat{K}(k) \, J_{n/2-1}(kr) \, k^{n/2} \, dk,

where J_nu(r) is the Bessel function of the first kind of order nu. To continue the analytical derivations we must make assumptions about the coefficients alpha_i. We assume that the coefficients alpha_i are chosen as binomial coefficients,

\alpha_i = \binom{m}{i}, \qquad \text{so that} \qquad \sum_{i=0}^{m} \alpha_i |\mathbf{k}|^{2i} = \left( 1 + |\mathbf{k}|^2 \right)^m.
In this case, the norm can also be expressed as the inverse of an implicit diffusion operator (Weaver and Mirouze, 2013), which is commonly used in variational data analysis in one dimension (Bennett et al., 1997; Chua and Bennett, 2001) or in two or three dimensions (Weaver and Courtier, 2001; Chua and Bennett, 2001; Zaron et al., 2009; Carrier and Ngodock, 2010).
The Fourier transform of the kernel, \hat{K}_{n,m}(\mathbf{k}), for the highest derivative m and dimension n can then be written as

\hat{K}_{n,m}(\mathbf{k}) = \frac{c_{n,m}}{(2\pi)^{n/2}} \, \frac{1}{\left( 1 + |\mathbf{k}|^2 \right)^m}.

Using this expression in Eq. (9), the radial part of the kernel K_{n,m}(r) becomes

K_{n,m}(r) = \frac{c_{n,m}}{(2\pi)^{n/2}} \, r^{1-n/2} \int_0^{\infty} \frac{J_{n/2-1}(kr) \, k^{n/2}}{\left( 1 + k^2 \right)^m} \, dk.

The normalization coefficient is now noted c_{n,m}, as it depends on the dimension n and the order of the highest derivative m. By integrating by parts, a recursion relationship relating the kernels with different values of n and m can be derived, using the following equation relating Bessel functions of the first kind of different orders:

\frac{d}{dr} \left[ r^{-\nu} J_{\nu}(r) \right] = - r^{-\nu} J_{\nu+1}(r).

Since K_{n,m}(x) is one for x = 0, the normalization coefficients are related by

c_{n+2,\,m+1} = 4\pi m \, c_{n,m},

while the corresponding normalized kernels coincide, K_{n+2,m+1}(r) = K_{n,m}(r). The recurrence relationship therefore shows that it is sufficient to calculate the kernel K_{n,m} and the normalization coefficient for n = 1 and n = 2. For n = 1, the solution of the integral in Eq. (10) is

K_{1,m}(r) = \frac{2^{1-\nu}}{\Gamma(\nu)} \, r^{\nu} K_{\nu}(r), \qquad c_{1,m} = 2\sqrt{\pi} \, \frac{\Gamma(m)}{\Gamma(m - \tfrac{1}{2})}, \qquad \nu = m - \tfrac{1}{2},

where K_nu(r) is the modified Bessel function of the second kind of order nu. For n = 2, the solution of Eq. (10) is

K_{2,m}(r) = \frac{2^{2-m}}{\Gamma(m-1)} \, r^{m-1} K_{m-1}(r), \qquad c_{2,m} = 4\pi (m-1).

Using the recursion relationship with the solution for n = 1 (n = 2), one can derive the kernel and c_{n,m} for every odd (even) value of n. After simplifications, it follows that for any n (odd and even) and m, the normalized kernel K_{n,m} and the corresponding normalization factor c_{n,m} can be written as

K_{n,m}(r) = \frac{2^{1-\nu}}{\Gamma(\nu)} \, r^{\nu} K_{\nu}(r), \qquad c_{n,m} = (4\pi)^{n/2} \, \frac{\Gamma(m)}{\Gamma(\nu)},

with nu = m - n/2. Non-isotropic covariance functions are often introduced by using coordinate stretching (Derber and Rosati, 1989; Carton and Hackert, 1990). We can finally derive the case of correlation lengths different from one, using the change of variables x -> L^{-1} x:

K(\mathbf{x}) = K_{n,m}\!\left( | \mathbf{L}^{-1} \mathbf{x} | \right), \qquad c = |\mathbf{L}| \, c_{n,m},

where |L| is the determinant of the diagonal matrix L:

|\mathbf{L}| = L_1 L_2 \cdots L_n.

The kernels and normalization coefficients can be expanded further for particular values of n and m, leading to the entries of Tables 1 and 2. Our results agree with the solution derived by Brasseur et al. (1996) for the case of two dimensions, n = 2 and m = 2.
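The closed-form kernel above can be evaluated directly. The following Octave/MATLAB snippet is a small illustration (not part of divand) that computes the normalized kernel for given n and m and checks two special cases: the classical two-dimensional case n = m = 2, where it reduces to rho*K_1(rho), and n = m = 1 (nu = 1/2), where it reduces to exp(-rho).

% Illustrative check (not divand code): normalized kernel
% K_{n,m}(rho) = 2^(1-nu)/gamma(nu) * rho^nu * besselk(nu,rho), nu = m - n/2 > 0.
kernel_nm = @(rho, n, m) 2.^(1 - (m - n/2)) ./ gamma(m - n/2) ...
                         .* rho.^(m - n/2) .* besselk(m - n/2, rho);

rho = 1.5;
disp(kernel_nm(rho, 2, 2))    % general formula with n = 2, m = 2 (nu = 1)
disp(rho * besselk(1, rho))   % classical 2-D kernel rho*K_1(rho): same value
disp(kernel_nm(rho, 1, 1))    % n = 1, m = 1 (nu = 1/2) gives exp(-rho)
disp(exp(-rho))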
The correlation function belongs to the Matérn family of spatial correlations (Matérn, 1986; Guttorp and Gneiting, 2006; Weaver and Mirouze, 2013). These functions have also been previously derived from a smoothing operator based on the Laplacian, gamma_n (1 - L^2 nabla^2)^{-m}, applied to a field phi(x), where gamma_n is a normalization constant, L the smoothing length scale, m = nu + n/2 as before and nu a smoothness parameter. For nu = 3/2 and nu = 5/2, one recovers the well-known second-order and third-order auto-regressive functions (Daley, 1991; Gaspari and Cohn, 1999; Gneiting, 1999). A formulation based on the diffusion operator allows for a straightforward square-root decomposition if m is even (Weaver and Courtier, 2001), since the operator can then be split into the product of two identical self-adjoint operators of the form (1 - L^2 nabla^2)^{-m/2}. This square-root decomposition is mainly used in connection with a conjugate gradient solver as a preconditioner (e.g. Haben et al., 2011).
In the divand tool a distinction is made between the actual dimension n and the effective dimension. The effective dimension is the number of dimensions with a non-zero correlation length. Setting a correlation length to zero decouples the corresponding dimension from the others; this is used to emulate the results that one obtains by "stacking" two-dimensional analyses, as has been done previously (e.g. Troupin et al., 2010). The normalization coefficient used in this case is based on the effective dimension. This ensures that one obtains exactly the same results as a stacked 2-D interpolation when analysing data in a 3-D domain with a zero correlation length in the third dimension.

Additional constraints
In addition to the observation constraint and the smoothness constraint, an arbitrary number of additional constraints can be included in divand. These constraints are characterized by a symmetric positive definite matrix Q_i, a matrix C_i and a vector z_i:

J_c(\mathbf{x}) = \sum_i \left( \mathbf{C}_i \mathbf{x} - \mathbf{z}_i \right)^T \mathbf{Q}_i^{-1} \left( \mathbf{C}_i \mathbf{x} - \mathbf{z}_i \right).
These additional constraints can be re-absorbed into the definitions of the operator H, the matrix R and the vector y^o:

\mathbf{H} \rightarrow \begin{pmatrix} \mathbf{H} \\ \mathbf{C}_1 \\ \mathbf{C}_2 \\ \vdots \end{pmatrix}, \qquad \mathbf{y}^o \rightarrow \begin{pmatrix} \mathbf{y}^o \\ \mathbf{z}_1 \\ \mathbf{z}_2 \\ \vdots \end{pmatrix}, \qquad \mathbf{R} \rightarrow \begin{pmatrix} \mathbf{R} & & & \\ & \mathbf{Q}_1 & & \\ & & \mathbf{Q}_2 & \\ & & & \ddots \end{pmatrix}.

With these definitions the cost function has again the familiar form

J(\mathbf{x}) = (\mathbf{x} - \mathbf{x}^b)^T \mathbf{B}^{-1} (\mathbf{x} - \mathbf{x}^b) + (\mathbf{H}\mathbf{x} - \mathbf{y}^o)^T \mathbf{R}^{-1} (\mathbf{H}\mathbf{x} - \mathbf{y}^o).

A common example of an additional constraint is the so-called advection constraint (Wahba and Wendelberger, 1980; Brasseur and Haus, 1991; Troupin et al., 2012). An anisotropic background error covariance can be achieved by penalizing variations of the analysis along a given vector field:

J_c(\varphi) = \int_D \left( \mathbf{v} \cdot \nabla \varphi \right)^2 \, d\mathbf{x},
where v is a vector field with n components. Such a constraint is useful in a geophysical context to force a field to be close to a stationary (or time-dependent) solution of the advection equation, in which case the vector field v is related to the ocean current. The diffusion term is not included, as it is generally small for geophysical applications and since the background constraint already acts similarly to a diffusion.
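As an illustration of how an additional constraint (C, z, Q) is re-absorbed into the observation terms described above, the following minimal Octave/MATLAB sketch augments H, y^o and R; the matrices C, z and Q here are made up purely for illustration and are not the operators divand constructs.

% Illustrative augmentation (not divand code): absorb a constraint (C, z, Q)
% into the observation terms H, yo and R.
n  = 6;                                  % state size
H  = sparse(1:2, [2 5], 1, 2, n);        % original observation operator
yo = [1.0; 0.5];                         % original observations
R  = speye(2) * 0.25;                    % observation error covariance

C = spdiags([-ones(n,1) ones(n,1)], [0 1], n-1, n);  % e.g. a difference operator
z = zeros(n-1, 1);                       % target value of the constraint
Q = speye(n-1) * 0.1;                    % weight (acts like an error covariance)

Haug = [H; C];                           % augmented observation operator
yaug = [yo; z];                          % augmented "observations"
Raug = blkdiag(R, Q);                    % augmented (block-diagonal) covariance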

Minimization and algorithms
As the cost function is quadratic, one can obtain its minimum analytically. The derivation of the minimum is well known (Courtier et al., 1998) and is included here for completeness. If x^a is the minimum of the cost function J, a small variation delta x of x^a does not change the cost function to first order in delta x. Noting by the superscript T a transposed matrix or vector, this condition reads

\delta J = 2 \, \delta\mathbf{x}^T \left[ \mathbf{B}^{-1} (\mathbf{x}^a - \mathbf{x}^b) + \mathbf{H}^T \mathbf{R}^{-1} (\mathbf{H}\mathbf{x}^a - \mathbf{y}^o) \right] = 0.

As delta x is arbitrary, the expression multiplying delta x^T must be zero. The optimal state x^a is thus given by

\mathbf{x}^a = \mathbf{x}^b + \mathbf{P} \mathbf{H}^T \mathbf{R}^{-1} \left( \mathbf{y}^o - \mathbf{H}\mathbf{x}^b \right),

where we have introduced the matrix P,

\mathbf{P} = \left( \mathbf{B}^{-1} + \mathbf{H}^T \mathbf{R}^{-1} \mathbf{H} \right)^{-1}.
The interpretation of this matrix becomes clear when the cost function is rearranged as

J(\mathbf{x}) = (\mathbf{x} - \mathbf{x}^a)^T \mathbf{P}^{-1} (\mathbf{x} - \mathbf{x}^a) + \text{terms independent of } \mathbf{x}.

The matrix P thus represents the error covariance of the analysis x^a (Rabier and Courtier, 1992; Courtier et al., 1994).

Primal formulation
The primal formulation of the algorithm follows directly from Eqs. (19) and (20). The matrices P and B are never formed explicitly; the tool works only with the inverses of these matrices, noted P^inv and B^inv respectively. In order to give the algorithm in a compact form close to the mathematical equations, we introduce the backslash operator by

\mathbf{A} \backslash \mathbf{B} \equiv \mathbf{A}^{-1} \mathbf{B},

which is equivalent to solving the system AX = B for the matrix (or vector) X. With this notation, the primal algorithm reads

\mathbf{P}^{\mathrm{inv}} = \mathbf{B}^{\mathrm{inv}} + \mathbf{H}^T \left( \mathbf{R} \backslash \mathbf{H} \right),
\qquad
\mathbf{x}^a = \mathbf{x}^b + \mathbf{P}^{\mathrm{inv}} \backslash \left( \mathbf{H}^T \left( \mathbf{R} \backslash \left( \mathbf{y}^o - \mathbf{H}\mathbf{x}^b \right) \right) \right).

Different approaches have been implemented to solve the analysis equation in its primal formulation, involving either a direct solver, a factorization or the conjugate gradient method.
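The primal algorithm translates almost literally into Octave/MATLAB code thanks to the backslash operator. The sketch below solves the analysis equation for a small made-up one-dimensional problem; the matrices are illustrative stand-ins and not the operators divand actually builds.

% Minimal sketch of the primal solve (illustrative matrices, not divand code).
n  = 200;
e  = ones(n,1);
Binv = spdiags([-e 2.01*e -e], -1:1, n, n);                  % stand-in for B^{-1} (sparse, SPD)
H    = sparse(1:10, round(linspace(5, n-5, 10)), 1, 10, n);  % 10 point observations
R    = speye(10) * 0.1;                                      % diagonal observation errors
xb   = zeros(n,1);                                           % background
yo   = ones(10,1);                                           % observations

Pinv = Binv + H' * (R \ H);                    % inverse analysis error covariance
xa   = xb + Pinv \ (H' * (R \ (yo - H*xb)));   % analysis (direct sparse solve)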

Direct solver
The solver based on SuiteSparse (Chen et al., 2008; Davis and Hager, 2009) can be used directly if the matrix R \ H is sparse, which is in particular the case when R is diagonal. P^inv is then also a sparse matrix which can be stored efficiently. This approach is useful when no error field needs to be computed. The direct solver can also be applied to a non-diagonal R, but this is prohibitive in terms of computational cost and memory.

Factorization
The inverse of the a posteriori error covariance matrix, P^inv, is factorized into the following product:

\mathbf{P}^{\mathrm{inv}} = \mathbf{Q}_P \mathbf{R}_P^T \mathbf{R}_P \mathbf{Q}_P^T,

where R_P is an upper triangular matrix and Q_P is a permutation matrix (chosen to preserve the sparse character of R_P).
Once the matrix P^inv is factorized, the product between P and any vector x can be computed efficiently by

\mathbf{P}\mathbf{x} = \mathbf{Q}_P \left( \mathbf{R}_P \backslash \left( \mathbf{R}_P^T \backslash \left( \mathbf{Q}_P^T \mathbf{x} \right) \right) \right).

This approach is useful if the error field is required, since a large number of products between P and given vectors must then be computed. Determining all elements of P would be prohibitive, but individual elements (such as the diagonal elements) are computed by

P_{ij} = \mathbf{e}_i^T \mathbf{P} \, \mathbf{e}_j,

where e_i is the ith basis vector. The tool divand returns a matrix-like object allowing one to compute any element of P or the product of P with a given vector. The latter product is useful if one wants to derive the expected error of an integrated quantity such as a transport along a section or any other weighted sum.
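Continuing the previous sketch, such a factorized form can be obtained with the sparse Cholesky factorization available in Octave/MATLAB (chol with a fill-reducing permutation); the code below only illustrates the principle and is not divand internals.

% Illustrative factorization of Pinv (continuing the sketch above).
[Rp, p, Qp] = chol(Pinv);          % Qp'*Pinv*Qp = Rp'*Rp, Rp upper triangular
assert(p == 0)                     % p == 0 means Pinv is positive definite

Px = @(x) Qp * (Rp \ (Rp' \ (Qp' * x)));   % product P*x by two triangular solves

% expected error variance of the analysis at grid point i:
i   = 100;
ei  = sparse(i, 1, 1, n, 1);       % i-th basis vector
Pii = ei' * Px(ei);                % diagonal element P(i,i)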

Conjugate gradient method
The conjugate gradient method (Golub and Van Loan, 1996) is commonly used in variational data assimilation (Moore et al., 2011a). This method provides an iterative solution to the linear system

\mathbf{A}\mathbf{x} = \mathbf{b},

where the vector b and the symmetric and positive definite matrix A are given here by

\mathbf{A} = \mathbf{B}^{\mathrm{inv}} + \mathbf{H}^T \mathbf{R}^{-1} \mathbf{H} = \mathbf{P}^{\mathrm{inv}}, \qquad \mathbf{b} = \mathbf{B}^{\mathrm{inv}} \mathbf{x}^b + \mathbf{H}^T \mathbf{R}^{-1} \mathbf{y}^o.

The conjugate gradient algorithm is applied to solve for x^a. For large domain sizes, the matrix A tends to be ill-conditioned (e.g. Haben et al., 2011) and the use of a preconditioner is necessary. In variational data assimilation it is common to define the covariance matrix implicitly by B = VV^T, where V is a sequence of operators enforcing physical consistency, e.g. using balance relationships and empirical orthogonal functions, and spatial smoothness, e.g. using a projection in spectral space (Dobricic and Pinardi, 2008; Bannister, 2008). Preconditioning with V^{-1} amounts to transforming the background error covariance matrix into the identity matrix (Lorenc et al., 2000; Cullen, 2003; Haben et al., 2011). The matrix V is not necessarily square because some modes of variability might be excluded (Lorenc et al., 2000); in this case the inverse of V is interpreted as the pseudo-inverse. Theoretical bounds for the condition number in idealized conditions have been derived for this type of preconditioning (Haben et al., 2011).
A similar preconditioner is also implemented in divand, using a Cholesky decomposition of the matrix B^inv into the square root matrix B^{-1/2}:

\mathbf{B}^{\mathrm{inv}} = \left( \mathbf{B}^{-1/2} \right)^T \mathbf{B}^{-1/2}.

The matrix B^{-1/2} is an upper triangular matrix, and the product of its inverse with a vector can be computed efficiently by back substitution. If the observation error covariance matrix R is diagonal and if the observation operator H is a sparse matrix, then the inverse of the analysis error covariance matrix, P^inv, is also sparse. In this case, the matrix P^inv can also be decomposed using a modified incomplete Cholesky decomposition (Jones and Plassmann, 1995; Benzi, 2002).
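The same system can also be solved iteratively; the sketch below uses the built-in pcg function with a preconditioner built from the Cholesky factor of B^inv, in the spirit of the B^{-1/2} preconditioning described above. It reuses the illustrative matrices of the previous sketches and is not taken from divand itself.

% Illustrative preconditioned conjugate gradient solve of Pinv*xa = b.
b = Binv*xb + H' * (R \ yo);           % right-hand side of the analysis equation
U = chol(Binv);                        % Binv = U'*U, U upper triangular (B^{-1/2})

% pcg solves A*x = b with the preconditioner M = M1*M2:
[xa_cg, flag, relres, iter] = pcg(Pinv, b, 1e-6, 500, U', U);
fprintf('flag %d, relative residual %g after %d iterations\n', flag, relres, iter);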
These two preconditioners have been tested for a unit square domain in two dimensions discretized on a regular N x N grid. Observations are evenly distributed on a 10 x 10 grid. The signal-to-noise ratio is set to 1 and the correlation length scale is set to 0.1.
Figure 3 shows the number of iterations needed to reach a residual lower than 10^-6 for different domain sizes N (ranging from 10 to 100), using either no preconditioner, a preconditioner based on the square root decomposition of B^inv, or the modified incomplete Cholesky decomposition with a drop tolerance of zero (MIC(0)). Even for the largest domain, the computation of the preconditioner matrices themselves takes less than 0.15 s, and their contribution to the total CPU time is negligible.
The faster convergence obtained with the preconditioner based on B^{-1/2}, compared to using no preconditioner, is expected (Lorenc, 1997; Gauthier et al., 1999; Haben et al., 2011). The preconditioning using the modified incomplete Cholesky decomposition leads to the smallest number of iterations in this test case. This is attributed to the fact that the MIC(0) preconditioner also takes into account the error reduction due to the observations (the term H^T R^{-1} H in the Hessian matrix of the cost function), which is not the case when the square root matrix B^{-1/2} is used as preconditioner. It should be noted that these results thus also depend on the data distribution and on the signal-to-noise ratio of the observations. In particular, the fewer observations are available, or the higher the error variance of the observations, the better the preconditioning based on B^{-1/2} performs (Haben et al., 2011). The tool divand has been written such that the user can provide a custom preconditioner based on the background error covariance B, the observation error covariance R and the observation operator H.

Dual formulation
Using the Sherman-Morrison-Woodbury formula (Golub and Van Loan, 1996), the solution can also be written in the dual formulation, also called the Physical-Space Statistical Analysis System (Courtier, 1997; Cohn et al., 1998; Štajner et al., 2001; Dee and da Silva, 2003):

\mathbf{x}^a = \mathbf{x}^b + \mathbf{B}\mathbf{H}^T \left( \mathbf{H}\mathbf{B}\mathbf{H}^T + \mathbf{R} \right)^{-1} \left( \mathbf{y}^o - \mathbf{H}\mathbf{x}^b \right).

In this formulation, all methods implemented in divand are based on the conjugate gradient method, which solves iteratively the following equation for y:

\mathbf{C}\mathbf{y} = \mathbf{y}^o - \mathbf{H}\mathbf{x}^b,

where C is a symmetric and positive definite matrix specified in operator form:

\mathbf{C} = \mathbf{H}\mathbf{B}\mathbf{H}^T + \mathbf{R}.

Once the vector y is known, the analysis x^a is obtained by

\mathbf{x}^a = \mathbf{x}^b + \mathbf{B}\mathbf{H}^T \mathbf{y}.

If R is a non-diagonal matrix, one can use a preconditioner based on the diagonal elements of R, noted R̃.

The preconditioner matrix is

\mathbf{M} = \mathbf{H} \left( \mathbf{P}^{\mathrm{inv}} \right)^{-1} \mathbf{H}^T + \tilde{\mathbf{R}}.

The matrix M can be efficiently factorized in sparse matrices,

\mathbf{M} = \mathbf{Q}_M \mathbf{R}_M^T \mathbf{R}_M \mathbf{Q}_M^T,

where R_M is an upper triangular matrix and Q_M is a permutation matrix.
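In the dual formulation the matrix HBH^T + R is typically not formed explicitly; it is applied as an operator inside the conjugate gradient iteration. The sketch below illustrates this with pcg and a function handle, reusing the small example above together with a dense stand-in for B (purely illustrative; divand keeps B implicit).

% Illustrative dual solve (continuing the small example above).
B    = inv(full(Binv));                   % explicit B only for this tiny example
Cfun = @(y) H * (B * (H' * y)) + R * y;   % applies (H*B*H' + R) to a vector

d = yo - H*xb;                            % innovation vector
[y, flag] = pcg(Cfun, d, 1e-6, 200);      % solve (H*B*H' + R) * y = d
xa_dual = xb + B * (H' * y);              % analysis from the dual solution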

Factorization
If the conjugate gradient algorithm requires a large number of iterations, it is useful to factorize the matrix P^inv in order to accelerate the product of P with a vector, which requires solving a linear system:

\mathbf{P}^{\mathrm{inv}} = \mathbf{Q}_P \mathbf{R}_P^T \mathbf{R}_P \mathbf{Q}_P^T,

where, as before, R_P is an upper triangular matrix and Q_P is a permutation matrix. All products of P times a vector x are then computed efficiently by

\mathbf{P}\mathbf{x} = \mathbf{Q}_P \left( \mathbf{R}_P \backslash \left( \mathbf{R}_P^T \backslash \left( \mathbf{Q}_P^T \mathbf{x} \right) \right) \right).


Numerical tests

Numerical kernel
It has been verified in numerical tests that the code reproduces the analytical kernels well. The domain is one-dimensional for nu = 1/2, 3/2 and 5/2, and two-dimensional for nu = 1 and 2. Every direction ranges from -10 to 10 and is discretized with 201 grid points. A hypothetical observation located at the centre of the domain is analysed with a signal-to-noise ratio of 1 and a correlation length scale of 1.
Theoretically the value of the analysed field should be 1/2 at the centre. The radial solution multiplied by two is therefore compared to the analytical solution for an infinite domain (Fig. 1). In all tests, the scaling and structure of the numerical kernels correspond well to the analytical solutions. To obtain this correspondence, the grid must resolve the correlation length well. Qualitative tests have shown that the numerical kernels match the analytical functions well as long as the grid spacing is one fourth (or less) of the correlation length.
The kernels differ by the rate at which they decrease to zero. It is important to consider this aspect when comparing correlation lengths from analyses using a different number of derivatives. Table 3 shows the value of r for which the analytical kernels are equal to 1/2. These values can be used as proportionality coefficients to make the kernels more comparable (Fig. 1).
The kernel for nu = 1/2 has a discontinuous derivative at r = 0, which makes this function unfit for practical use. A simple one-dimensional analysis with m = 1 (i.e. nu = 1/2) illustrates the problem (Fig. 2). As there is no penalty on the second derivative, the analysis has a discontinuous derivative at every observation location (black dots). Using higher-order derivatives solves this problem. This example also shows that the analyses with higher-order kernels (nu = 3/2 and nu = 5/2) are very similar. In these numerical experiments, the correlation length L is the inverse of the values in Table 3. The problem appears in all configurations where nu = 1/2, in particular also in three-dimensional analyses when the highest derivative in the cost function is a Laplacian. This is surprising at first, because the first derivative is discontinuous despite the fact that the cost function penalizes the second derivative. By default, the cost function in divand therefore includes derivatives up to the order 1 + n/2 (rounded upwards) to ensure that the analysis has a continuous derivative.

Benchmark
The tool divand is written such that it can run on both MATLAB and GNU Octave. In general, interpreted languages tend to be slow on explicit loops. Interpreters like MATLAB can reduce this performance penalty by using just-in-time compilation (which is currently not available in Octave). MATLAB therefore tends to be faster than GNU Octave on performance benchmarks (e.g. Chaves et al., 2006; Leros et al., 2010). Explicit loops are avoided in divand, except for the loop over the number of dimensions, which is quite short (typically 1-4 dimensions).
The run-time performance of divand was tested with MATLAB (version R2012b) and GNU Octave (version 3.6.4) using two implementations of the BLAS library (either GotoBLAS2 version 1.13 or Intel's Math Kernel Library (MKL) version 10.1). The benchmark was performed on an Intel Xeon L5420 CPU (using a single core) with 16 GB of memory. The domain is a square, and the correlation length used is inversely proportional to the number of grid points in one dimension.
Observations come from an analytical function and are located at every fifth grid point (in both dimensions). In all cases the correctness of the analysed field was verified by comparing the interpolated field to the original analytical function. The benchmark was repeated five times, and the median run time is shown in Table 4 for the primal algorithm with Cholesky factorization and in Table 5 for the dual algorithm with conjugate gradient minimization. For the primal algorithm, all tested versions perform equally well, with a slight advantage for Octave for domain sizes of 500 x 500 and 600 x 600. Profiling of the code shows that, for the primal algorithm, most of the time is spent in the Cholesky factorization using the CHOLMOD library included in SuiteSparse in both MATLAB and Octave, which explains the similar results.
In general, the dual algorithm is much slower than the primal algorithm with Cholesky factorization. However, it should be remembered that the latter would be impractical in some cases (in particular with spatially correlated observation errors). The difference between the interpreters is more pronounced for the dual case.

The benchmarks show that MATLAB is faster for small domain sizes (30 x 30 and 50 x 50), while GNU Octave (with GotoBLAS2 or MKL) outperforms MATLAB for larger domain sizes. For small domain sizes, the preparation of the matrices to invert represents a significant fraction of the total run time, for which GNU Octave tends to be slower due to the lack of a just-in-time compiler. For larger domains, the cost is dominated by matrix operations, which are faster in Octave in our case. In our benchmark, the Math Kernel Library is slightly faster in Octave than the GotoBLAS2 library.

Implementation
The divand tool is implemented such that it allows analyses in an arbitrarily high-dimensional space. The grid must be an orthogonal curvilinear grid, and some dimensions might be cyclic. Internally, the n-dimensional arrays, e.g. for the analysed field, are represented as a "flat" vector. To implement the background constraint, the following basic operations are implemented as sparse matrices:
- differentiation along a given dimension;
- grid staggering along a given dimension;
- trimming the first and last elements of a given dimension.
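To give a flavour of this kind of sparse building block, the following Octave/MATLAB snippet constructs a one-dimensional differentiation operator and a staggering (two-point averaging) operator as sparse matrices; the grid size, spacing and test field are arbitrary illustrative values and this is not the divand implementation itself.

% Illustrative 1-D building blocks as sparse matrices (not the divand code itself).
n  = 8;          % grid points along one dimension
dx = 0.5;        % grid spacing

% differentiation along the dimension: n values -> n-1 differences at interfaces
D = sparse([1:n-1 1:n-1], [1:n-1 2:n], [-ones(1,n-1) ones(1,n-1)], n-1, n) / dx;

% staggering: average the n-1 interface values back to the n-2 interior centres
S = sparse([1:n-2 1:n-2], [1:n-2 2:n-1], 0.5*ones(1,2*(n-2)), n-2, n-1);

phi  = (0:n-1)'.^2 * dx^2;   % sample field phi = x.^2 on the grid
dphi = S * (D * phi);        % centred estimate of d(phi)/dx at interior points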
All other differential operators are derived as products and sums of these basic operations. For instance, the advection constraint is implemented by performing successively the following operations: differentiation along the ith dimension for i = 1, ..., n (operator D_i), staggering the result back to the centre of the grid cell (operator S_i), removing land points (operator P) and multiplying by a diagonal matrix formed by the ith velocity component at the remaining grid points (operator diag(v_i)). All of these operations are represented as sparse matrices, and the resulting matrix A is thus also sparse:

\mathbf{A} = \sum_{i=1}^{n} \mathrm{diag}(\mathbf{v}_i) \, \mathbf{P} \, \mathbf{S}_i \, \mathbf{D}_i.

The additional constraint corresponding to the advection constraint thus has the following form:

J_c(\mathbf{x}) = \left( \mathbf{A}\mathbf{x} \right)^T \left( \mathbf{A}\mathbf{x} \right) = \mathbf{x}^T \mathbf{A}^T \mathbf{A} \, \mathbf{x}.

This example illustrates that, once the basic operators have been implemented, more complex operators can be derived at a relatively high level.
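Continuing the one-dimensional sketch above, the composition of such operators into an advection-like constraint can be illustrated as follows; the land-point removal operator P is omitted here for brevity and the velocity values are made up, so this is only a simplified illustration of the principle.

% Continuing the 1-D sketch above: advection-like constraint A = diag(v) * S * D
% (the land-removal operator P is omitted; hypothetical velocity values).
v = linspace(0.2, 1.0, n-2)';          % velocity at the n-2 interior grid points
A = spdiags(v, 0, n-2, n-2) * S * D;   % sparse constraint matrix

x  = phi;                              % any state vector of length n
Jc = (A*x)' * (A*x);                   % quadratic advection penalty x'*(A'*A)*x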
Consistent error calculations are also possible with divand, in order to estimate the error variance of the analysed field. This error variance reflects, among other factors, the distribution of the observations, the correlation length and the background error variance. However, the accuracy of the error estimate of the analysed field depends crucially on the validity of the background and observation error covariances.
The tool introduces new matrix objects which implement several matrix operations (such as multiplication, multiplication by the inverse, and extraction of diagonal elements). These new matrix objects include:
- a matrix specified by its inverse and potentially factorized (for B and P);
- a matrix specified by an analytical function (for R and C_i);
- a matrix of the form C + BB^T, which can be inverted using the Sherman-Morrison-Woodbury formula (for R);
- a matrix composed of block matrices (for additional constraints).
By adding these new matrix objects, one can code the algorithm in a compact way which stays close to the original mathematical formulation. For instance, the product of the analysis error covariance matrix P and a vector x can simply be coded as P * x, and the matrix multiplication method of the matrix object P implements the multiplication using the factorized form of Eq. (26).

Realistic test case
The interpolation was tested using pseudo-observations of temperature from a global ocean model. The results for the year 2007 from the ORCA2 model (Mathiot et al., 2011), with a spatially varying resolution (generally close to 2° resolution), are used. Making a reconstruction with model data allows one to compare the analysed field to the original model data and thus to assess the quality of the analysis. The positions of the pseudo-observations are the real positions of the Argo observations from the year 2007. Observations are extracted from the daily model results and the analysis targets monthly means. This mimics the common setup with real observations, where measurements are instantaneous while the analyses represent a mean over a given time period. Only surface data are reconstructed, either for every month separately (2-D analyses) or for all 12 months together (3-D analysis). The analysis is compared to a monthly model climatology for the year 2007 and the RMS (root mean square) difference is calculated. This approach is similar to the twin experiments used to test data assimilation schemes (e.g. Nerger et al., 2012). The procedure has also been used before in the context of data analysis, e.g. by Guinehut et al. (2002) and Buongiorno Nardelli (2012). In the following test cases, the solver based on the Cholesky factorization is used (Chen et al., 2008; Davis and Hager, 2009). The accuracy of the result is verified by checking that the gradient of the cost function at the analysis is close to the floating-point precision. The central question of this test case is to assess the benefit of a 3-D analysis (longitude, latitude and time) compared to a 2-D analysis (longitude and latitude). Afterwards, different variants of the analysis are also tested, in particular the advection constraint. The parameters of the analysis (signal-to-noise ratio, spatial and temporal correlation lengths and strength of the advection constraint) are optimized to ensure the comparison of every approach in its best possible configuration. Analyses are compared to a model climatology obtained by averaging the daily model output.
The following experiments use the direct solver with a sparse Cholesky factorization.

2-D analysis
All observations from the same month are considered as data from the same time instance. The correlation length is chosen to be identical in both horizontal dimensions. The signal-to-noise ratio and the correlation length are optimized by an "exhaustive search" of these parameters. The search is implemented by varying the signal-to-noise ratio between 1 and 30 (60 equally distributed values) and the correlation length between 100 and 3000 km (also 60 equally distributed values). In total, 3600 analyses have thus been carried out for this case. The RMS difference (space and time average) between the analysis and the reference model climatology is minimal for a signal-to-noise ratio of 14 and a correlation length of 1072 km (Fig. 4). The global RMS error of this analysis is 1.1501 °C (Table 6). This experiment serves as the baseline for the other experiments; improvements will be expressed using the mean square skill score with this experiment as a reference.
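Such an exhaustive search can be organized as a simple double loop. In the sketch below, do_analysis is a hypothetical, user-supplied function returning the RMS difference to the reference climatology for a given correlation length and signal-to-noise ratio; it stands in for a call to the analysis tool and is not part of divand (here it is replaced by a dummy so that the sketch runs).

% Hypothetical parameter scan (do_analysis is a placeholder, not a divand function).
do_analysis = @(L, snr) 1 + abs(L/1e6 - 1) + abs(snr - 14)/30;  % dummy stand-in

snr_values = linspace(1, 30, 60);          % signal-to-noise ratios to test
len_values = linspace(100e3, 3000e3, 60);  % correlation lengths in metres

rms = zeros(numel(len_values), numel(snr_values));
for i = 1:numel(len_values)
  for j = 1:numel(snr_values)
    rms(i,j) = do_analysis(len_values(i), snr_values(j));  % RMS vs. reference
  end
end

[rms_min, k] = min(rms(:));
[ibest, jbest] = ind2sub(size(rms), k);
fprintf('best: L = %g km, snr = %g, RMS = %g\n', ...
        len_values(ibest)/1e3, snr_values(jbest), rms_min);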
Figure 5 (top left panel) shows the RMS error averaged over time for every spatial grid point. This RMS field essentially reflects the data coverage: areas with poor coverage can have RMS errors of 3 °C and more. Due to the sparsity of Argo data in coastal areas (relative to the near-shore scales of variability), the RMS error is generally highest near the coast.
As a variant of the 2-D analysis, the stationary advection constraint is added to the cost function:

J_c(\varphi) = \int_D \left( \mathbf{v} \cdot \nabla \varphi \right)^2 \, d\mathbf{x},
where v = (a_s u, a_s v). The vector (u, v) represents the monthly averaged model currents, and the coefficient a_s determines how strongly the advection constraint is enforced. It is instructive to visualize the impact of a point observation with and without the advection constraint; the correction by a point observation is in fact directly related to the background error covariance. Figure 6 shows the impact of an observation located at 72° W, 36.9° N (white cross). Without the advection constraint, the covariance is mostly isotropic. The slight deviation from isotropy is due to the proximity of the coastline. The location with the largest impact is marked by a white circle. One could expect the location with the largest impact to coincide with the location of the observation, and this is actually the case for an observation far away from the coastline. However, the background error variance is not spatially uniform. In the interior of the domain, every grid point is constrained by its four direct neighbours (or more if derivatives higher than the Laplacian are used). This is not the case at the boundary of the domain, and boundary points are thus less constrained. Therefore the background error variance is higher near the coast. With the advection constraint, the covariance is elongated along the path of the Gulf Stream (downstream and upstream). This is a desirable effect, since tracers in the ocean tend to be uniform along ocean currents. With the advection constraint, the variance is relatively uniform in space near the location of the observation, and thus the location of maximum impact coincides with the position of the observation.
The 2-D analysis with the advection constraint thus has three parameters in total: the signal-to-noise ratio, the spatial correlation length and the strength of the advection constraint. These parameters are optimized with the Nelder-Mead algorithm (Nelder and Mead, 1965) by minimizing the RMS difference between the analysis and the reference climatology.

3-D analysis
In a first test, all observations from the same month are again considered as coming from the same time instance. Although this is not necessary for a 3-D analysis, it simplifies the comparison with the previous 2-D case, where the information of the actual day is not taken into account. The signal-to-noise ratio, the spatial correlation length and the temporal correlation length are optimized using the non-linear Nelder-Mead minimization as before. The RMS error is minimal, at 0.9822 °C, for a signal-to-noise ratio of 27, a spatial correlation length of 1373 km and a temporal correlation length of 4.9 months. Extending the analysis from 2-D to 3-D improves the skill (mean square skill score) by 27 %.
For every spatial grid point, the time-averaged RMS error is computed (Fig. 5). To facilitate the comparison with the 2-D case, the difference of these RMS values is also shown (Fig. 5, bottom panel): red shows areas where the 3-D analysis is better and blue areas where the 2-D analysis is more accurate. The RMS error is generally reduced by the 3-D analysis in coastal areas where few observations are present. However, a small degradation is also observed in the open ocean. This is attributed to the fact that the signal-to-noise ratio and the correlation length are optimized globally. It is probable that space-dependent parameters (distinguishing for example between open-ocean and near-shore conditions) would improve the analysis even more.
In a second test, the actual days of the pseudo-measurements are used in the analysis (noted fractional time in Table 6), which slightly improves the analysis. The small increase of the signal-to-noise ratio is consistent with the fact that, by providing the exact date (instead of the month), the observations are more coherent. The relatively small improvement is related to the fact that the optimal temporal correlation length is 4.9 months, which is larger than one month (the time resolution of the original 3-D experiment).
The analysis is also performed using the advection constraint based on the model currents. Since the time dimension is included, it is possible to use the non-stationary advection constraint

J_c(\varphi) = \int_D \left( \mathbf{v} \cdot \nabla \varphi \right)^2 \, d\mathbf{x},

where the gradient now also includes the time derivative, v = (a_s u, a_s v, a_s) and a_s determines the strength of the advection constraint. Figure 7 shows the impact of a point observation in the 3-D case. Without the advection constraint, the covariance is essentially uniform, with a small modulation due to the proximity of the coastline. With a time-dependent advection constraint, a distinction between upstream and downstream is made if two different time steps are considered: the location of the observation is more strongly connected to the upstream area at previous time instances and to the downstream area at the following time instances. The time-dependent covariances with the advection constraint can thus relate different time instances, taking the advection into account. As in the 2-D case, the advection constraint is introduced with a proportionality coefficient a_s allowing one to tune the strength of this effect. The calibration of this parameter is related to, among other factors, the overall significance of advection compared to other processes and the accuracy of the current field. In this test case, the advection parameter is determined, together with the spatial and temporal correlation scales and the signal-to-noise ratio, by minimizing the difference between the analysis and the reference climatology, as in Sect. 6.1. The optimal values of the analysis parameters are shown in Table 6. To compare the different variants, a skill score relative to the 2-D case (without advection) has been computed:

\mathrm{skill} = 1 - \frac{\overline{\left( \varphi^a - \varphi^{\mathrm{ref}} \right)^2}}{\overline{\left( \varphi^a_{\mathrm{2D}} - \varphi^{\mathrm{ref}} \right)^2}},

where the overbar denotes the space and time average, phi^a is the analysis under consideration, phi^a_2D the reference 2-D analysis and phi^ref the model climatology (for the 3-D analysis, for example, 1 - (0.9822/1.1501)^2 is approximately 0.27, i.e. the 27 % quoted above). It follows that the inclusion of the advection constraint in the 2-D analysis improves the skill by 29 %. It is surprising that this improvement is of similar amplitude to the improvement obtained by including the time dimension, as the latter requires the solution of a 12-times larger system (the number of months). Including the exact date of a measurement instead of its month leads to only a small improvement. The best analysis is obtained when using a 3-D domain in combination with the advection constraint, leading to an improvement of 44 %. Including the advection constraint again has the beneficial effect of increasing the optimal signal-to-noise ratio, as the observations are more coherent along flow lines.

Conclusions
A variational analysis tool has been developed and tested with a realistic data distribution from Argo, but with pseudo-observations extracted from a model. This makes it possible to compare the analyses to the model climatology and to compare different analyses quantitatively. Parameters are optimized by minimizing the difference between the analysis and the model climatology. In practice, however, a cross-validation data set is needed for such an optimization, and it should ideally be homogeneously distributed. An improvement of the 3-D analysis (longitude, latitude and time) over the 2-D analysis (horizontal only) was shown. A relatively large reduction of the RMS error was also observed when including the advection constraint (stationary in the 2-D case and time-dependent in the 3-D case). However, it should be noted that the current fields used here are dynamically coherent with the tracer fields, as they come from the same model. In a realistic setup with real observations, an improvement similar to the one reported here would require quite accurate current fields. The source code of the tool divand is available at http://modb.oce.ulg.ac.be/mediawiki/index.php/divand and is distributed under the terms of the General Public Licence. The toolbox will also be made available through the Octave extension repository Octave-Forge.

Fig. 1. Solid lines show the analytical kernels for different values of nu and the dots show the numerical kernel (left); analytical kernels with scaled r_h (right).

Fig. 3. Number of iterations as a function of the domain size (in one dimension). The number of grid points is the domain size squared.

Fig. 4. RMS difference between the reference climatology and the analysis for different values of the signal-to-noise ratio and the correlation length. A non-linear colormap is used to enhance detail near the minimum.

Fig. 5. Top left: RMS difference (averaged over time) between the 2-D analysis and the model reference climatology. Top right: idem for the 3-D analysis. Bottom: difference of the RMS error of the 2-D analysis minus the RMS error of the 3-D analysis.

Fig. 6. 2-D case: background error covariance (left panels) relative to the location marked by a cross and surrounding grid points, and background variance (right panels). The upper (lower) panels correspond to the case without (with) advection constraint. The circle indicates the grid point with the highest covariance.

Fig. 7. 3-D case: background error covariance without (upper row) and with advection constraint (lower row) for a data point located at the cross and at month 6.

Table 1. Kernel as a function of the non-dimensional radius rho = |L^{-1} x| >= 0 for different values of the dimension n and the highest derivative m.

Table 3. Radial distance where the kernel equals 1/2 for different values of nu.

Table 4. Run time in seconds for different domain sizes for MATLAB (R2012b) and Octave version 3.6.4 (using GotoBLAS2 or MKL) for the primal algorithm with Cholesky factorization.

Table 5. Run time in seconds for different domain sizes for MATLAB (R2012b) and Octave version 3.6.4 (using GotoBLAS2 or MKL) for the dual algorithm.

Table 6. Summary of all experiments with the optimal parameter values.