The ABC-DA system (v1.4): a variational data assimilation system for convective-scale assimilation research with a study of the impact of a balance constraint

Abstract. Following the development of the simplified atmospheric convective-scale "toy" model (the ABC model, named after its three key parameters: the pure gravity wave frequency A, the controller of the acoustic wave speed B, and the constant of proportionality between pressure and density perturbations C), this paper introduces its associated variational data assimilation system, ABC-DA. The purpose of ABC-DA is to permit quick and efficient research into data assimilation methods suitable for convective-scale systems. The system can also be used as an aid to teach and demonstrate data assimilation principles.
ABC-DA is flexible and configurable, and is efficient enough to be run on a personal computer. The system can run a number of assimilation methods (currently 3DVar and 3DFGAT have been implemented), with user-configurable observation networks. Observation operators for direct observations and wind speeds are part of the current system, and these can, for example, be expanded relatively easily to include operators for Doppler winds. A key feature of any data assimilation system is how it specifies the background error covariance matrix. ABC-DA uses a control variable transform method to allow this to be done efficiently. This version of ABC-DA mirrors many operational configurations by modelling multivariate error covariances with uncorrelated control parameters, each with special uncorrelated spatial patterns.
The software developed performs (amongst other things) model runs, calibration tasks associated with the background error covariance matrix, testing and diagnostic tasks, single data assimilation runs, and multi-cycle assimilation/forecast experiments, and it also has associated visualisation software.
As a demonstration, the system is used to tackle a scientific question concerning the role of geostrophic balance (GB) to model background error covariances between mass and wind fields. This question arises because although GB is a very useful mechanism that is successfully exploited in larger-scale assimilation systems, its use is questionable at convective scales due to the typically larger Rossby numbers where GB is not so relevant. A series of identical twin experiments is done in cycled assimilation configurations. One experiment exploits GB to represent mass-wind covariances in a mirror of an operational set-up (with use of an additional vertical regression (VR) step, as used operationally). This experiment performs badly where error accumulates over time. Two further experiments are done: one that does not use GB and another that does but without the VR step. Turning off GB impairs the performance, and turning off VR improves the performance in general. It is concluded that there is scope to further improve the way that the background error covariance matrices are represented at convective scale. Ideas for further possible developments of ABC-DA are discussed.

Introduction
The grid sizes of limited-area models for operational weather forecasting have become small enough to allow some convective processes to be resolved explicitly (Clark et al., 2016; Yano et al., 2018). Some leading operational models include the COSMO (COnsortium for Small-scale MOdelling) model (Baldauf et al., 2011), used at MeteoSwiss (1.1 km grid size) and at the Deutscher Wetterdienst (DWD) (2.8 km grid size); the AROME (Application of Research to Operations at Mesoscale) model (Brousseau et al., 2016), used at Météo-France (1.3 km grid size); the UKV (UK Variable resolution) model (Tang et al., 2013) (1.5 km grid size); and the WRF (US Weather Research and Forecasting) model (Schwartz and Liu, 2014) (3 km grid size). Each of these systems is invaluable in the forecasting of fine-scale weather, including that associated with convective storms, and has its own data assimilation (DA) system to estimate its initial conditions from new observations and a background state.
Apart from the capability to assimilate new high-resolution observation types, such as radar reflectivity and Doppler radial wind, the DA systems are still based on those designed with synoptic- and planetary-scale phenomena in mind. The convective-scale DA problem needs to account for effects that can often be safely ignored or treated approximately when dealing with large scales. These include certain dynamical properties of background state errors (namely non-hydrostatic and non-geostrophic contributions, vertical motion, multiple phases of water, strong inhomogeneity and flow dependence, and non-Gaussianity), certain properties of observation errors (namely cross-correlations), and other features associated with a small grid size (e.g. feature misalignment). There are also challenges associated with assimilating new observation types (as mentioned above, including the large volumes of data needed), the short DA time window (often 1 h or less), the compatibility of lateral boundary conditions from a coarser parent model, and questions concerning the appropriateness of allowing DA to simultaneously modify the larger-scale flows present in the convective-scale problem.
The properties of background state errors are of particular concern to this paper, although the DA system to be described can be equally applied to study other aspects of convective-scale DA, such as the exploration of strategies for high-resolution observation networks (such as those from Doppler wind instruments), or indeed some of the advanced DA methods mentioned below. In DA, the background state is traditionally assumed to be subject to random error, which is distributed according to a Gaussian distribution described by a multivariate error covariance matrix (the "B-matrix", e.g. Bannister, 2008a). Given that B is too large to store explicitly, in variational DA (Var) it is represented in the form of a "model" (Bannister, 2008b). One important means of representing B in a way that naturally adapts to the flow conditions is to derive a matrix implicitly from an ensemble of forecasts, which are often produced anyway for probabilistic forecasting purposes. This is the basis of the ensemble Kalman filter (e.g. Houtekamer and Zhang, 2016) and En-Var (pure ensemble-variational) formulations (e.g. Liu et al., 2008). Although information from an ensemble in principle follows the dynamical properties of the model to be reflected in the B matrix used at the analysis time, the result is often corrupted by sampling error due to the small ensemble sizes (usually a few tens of members). For this reason, the B matrix used operationally is often still modelled according to physical insight. That insight, though, is based on traditional assumptions of (for instance) geophysical balance, whose applicability is questionable at convective scales.
Studying convective-scale DA in operational systems is burdened severely by the cost and complexity of these systems. The DA system described in this paper has been designed in the same spirit as that of the convective-scale toy model (the ABC model; Petrie et al., 2017), i.e. with an emphasis on low cost and simplicity. This DA system (together with the model code, hereafter called ABC-DA) is a multi-featured Var system suited to the ABC model. ABC-DA is actually a suite of software used not only to perform DA itself (in cycling mode if required) but also to calibrate the B matrix from sets of forecast data, to flexibly generate randomly perturbed data such as synthetic observations from a truth run (which can then be assimilated), to compute a sample of covariances implied from a chosen B-matrix model, and to perform a set of validation tests. The suite also includes sample plotting codes to help visualise and monitor the outputs, a script to build the executables, sample run scripts, and detailed user documentation. The ABC-DA numerical codes are written in Fortran 90, scripts are written in Linux Bash, and the plotting code is written in Python 2. Certain open-source software libraries are also required to compile the code. This paper presents the scientific documentation for the system, with examples and pointers to how a user can access the software. Finally, a short study of ABC-DA is presented to investigate the impact of balance constraints in the formulation of the convective-scale B matrix. The paper is structured as follows. In Sect. 2 the ABC model is described, in Sect. 3 the ABC-DA system is outlined, in Sect. 4 the ABC-DA system is described in detail, in Sect. 5 a brief study of the role of geostrophic balance is presented, and in Sect. 6 the paper is summarised.

The model equations
The ABC model comprises a set of simplified partial differential equations for a two-dimensional spatial grid (x and z) plus time (t), which are based on the Euler equations. This section summarises the ABC model, and the reader is directed to Petrie et al. (2017) for the details. The model equations are as follows:

∂u/∂t + B u·∇u + C ∂ρ̃′/∂x − fv = 0,
∂v/∂t + B u·∇v + fu = 0,
∂w/∂t + B u·∇w + C ∂ρ̃′/∂z − b′ = 0,        (1)
∂ρ̃′/∂t + B ∇·(ρ̃ u) = 0,
∂b′/∂t + B u·∇b′ + A²w = 0,

where u = (u, v, w)^T is the wind vector (comprising zonal, meridional, and vertical wind components, respectively); ρ̃ is the scaled density variable (akin to pressure); b is the buoyancy variable (akin to temperature); f is the Coriolis parameter; g is the acceleration due to gravity; and A, B, and C are tunable parameters (see below). Primed variables indicate perturbations from a reference state defined as b(x, z, t) = g + b₀(z) + b′(x, z, t) and ρ̃(x, z, t) = 1 + ρ̃′(x, z, t). The model supports a range of motions, namely balanced (Rossby-like) modes and unbalanced (gravity and acoustic) modes, which have been studied in detail in Petrie et al. (2017). There are three tunable parameters: A is the pure gravity wave frequency, controlling the gravity wave speeds in the model; B modulates the advective and divergent terms in the equations, controlling the acoustic wave speeds; and C specifies the equation of state that relates pressure and density perturbations, p′ = C ρ₀ ρ̃′, where ρ₀ is a density scaling constant. In the linearised equations, only the product BC (and not B and C individually) affects the characteristics of the flow, so C also controls the acoustic wave speeds. In numerical integrations of the non-linear equations (Eq. 1), the effect of scaling C is found to be virtually indistinguishable from scaling B in terms of the patterns of forecast perturbations.

Properties of the ABC model equations
Equation (1) conserves total mass and energy, although this exact property is lost when the equations are discretised for numerical integration. The equations approximate to geostrophic and hydrostatic balance (GB and HB, respectively) when the Rossby number, Ro = U/(fL), is small (where U is the characteristic zonal wind speed and L is the characteristic horizontal length scale of the motion). The GB relations are

∂(Cρ̃′)/∂x − fv′ = 0,        (2a)
fu′ = 0,        (2b)

and the HB relation is

∂(Cρ̃′)/∂z − b′ = 0.        (3)

These balance relations are well satisfied for motion at the large scales where Ro is small, and they are used in traditional Var schemes to model the covariances between mass, wind, and temperature perturbations in background errors. We will revisit these later in the paper.
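The GB relation is simple to verify numerically. The sketch below (illustrative values and variable names, not part of the ABC-DA code) builds a scaled-density perturbation on a periodic row of grid points, derives the meridional wind that satisfies the discrete form of Eq. (2a), and confirms that the balance residual vanishes:

```python
import numpy as np

# Hypothetical illustration: construct a wind field in discrete geostrophic
# balance, f*v' = d(C*rho')/dx, on a periodic grid, then check the residual.
f, C = 1.0e-4, 1.0e4          # Coriolis parameter and an assumed C value
nx, dx = 360, 1500.0          # periodic horizontal grid (as in the ABC grid)
x = np.arange(nx) * dx
rho = 1e-3 * np.sin(2.0 * np.pi * x / (nx * dx))   # scaled density perturbation

def ddx(field, dx):
    """Centred finite difference with periodic wrap-around."""
    return (np.roll(field, -1) - np.roll(field, 1)) / (2.0 * dx)

v_balanced = C * ddx(rho, dx) / f     # wind in geostrophic balance with rho

# the GB residual vanishes (to rounding) by construction
residual = f * v_balanced - C * ddx(rho, dx)
assert np.max(np.abs(residual)) < 1e-12
```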

Discretisation and integration
As reported in Petrie et al. (2017), the continuous equations (Eq. 1) have been discretised in time and space; the current implementation uses a 360 × 60 (horizontal × vertical) element grid with a grid box size of 1500 m × 250 m. Variables are stored on an Arakawa C grid in the horizontal and a Charney-Phillips grid in the vertical (see Fig. 1 of Petrie et al., 2017), and periodic boundary conditions are imposed in the horizontal to avoid the need for a driving model to provide lateral boundary conditions. The integration scheme used is the split-explicit forward-backward scheme of Cullen and Davies (1991) with a main time step of Δt = 4 s.
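To illustrate the forward-backward idea used in such split-explicit schemes, the following minimal sketch (all names and parameter values are illustrative assumptions, not the ABC-DA Fortran code) steps a 1-D linearised acoustic/gravity system with the first variable updated forward and the second updated using the new value:

```python
import numpy as np

# Minimal forward-backward sketch for du/dt = -g dh/dx, dh/dt = -H du/dx,
# of the kind handled by the fast part of a split-explicit scheme.
nx, dx, dt = 360, 1500.0, 4.0
g, H = 9.81, 1.0e4
cfl = np.sqrt(g * H) * dt / dx        # wave-speed Courant number

def ddx(a):
    """Centred periodic x-derivative."""
    return (np.roll(a, -1) - np.roll(a, 1)) / (2.0 * dx)

x = np.arange(nx) * dx
h = np.exp(-((x - x.mean()) / (20 * dx)) ** 2)   # initial "blob"
u = np.zeros(nx)

for _ in range(100):
    u = u - dt * g * ddx(h)    # forward step: uses the old h
    h = h - dt * H * ddx(u)    # backward step: uses the new u

assert np.isfinite(h).all() and cfl < 1.0
```

With these values the Courant number is about 0.84, within the stability range of the forward-backward scheme, so the integration remains bounded.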

Future developments of the model
The current version of the ABC model does not include moist processes. There is much that can be learned about convective-scale DA from a dry model, but assimilating and forecasting moisture fields is a major reason for convective-scale forecasting (Sun et al., 2014; Bannister et al., 2020). It is planned to upgrade the model to permit the advection of one or more water variables and allow condensation and evaporation processes to affect the flow. The assimilation of moisture is a complex task, and so a moist ABC-DA system is expected to be very useful in that line of research.
3 Overview of the ABC-DA system

Like the ABC model, the DA system is intended to be low cost and easy to run when compared to an operational-scale system, yet mirror many of the features and options available in operational systems. In this section we review the principles on which ABC-DA is based, which includes a definition of the mathematical notation, but we leave it to Sect. 4 to describe the details.

Variational data assimilation
Var systems construct a scalar functional (called a cost function, J) that is minimised with respect to the state x by considering observations made over a time window (indicated by an integer time index) t ∈ [0, T]:

J[x] = (1/2)(x − x_b)^T B^{−1}(x − x_b) + (1/2) Σ_{t=0}^{T} (y(t) − y^m(t))^T R_t^{−1} (y(t) − y^m(t)).        (4)

In Eq. (4), x represents all variables of the model state at t = 0 (here u, v, w, ρ̃′, and b′) at each location in the domain, x_b is a special state at t = 0 called the background state (normally a short forecast from the previous DA), y(t) is the collection of observations at time t, and y^m(t) is the model's version of the observations computed from x. Let there be n elements in x and p_t elements in y(t). Here we use the convention that quantities like x without a time argument imply the value at t = 0. The model's observations are found in two steps. Firstly, the state is found at time t using the model propagator: x(t) = M_{0→t}(x), which is the result of integrating Eq. (1) from times 0 to t, and then the model's observations are found using the observation operator at this time: y^m(t) = H_t(x(t)). This DA method is known as 4DVar (Le Dimet and Talagrand, 1986). B and R_t are covariance matrices (e.g. Kalnay, 2002) pertaining to errors in the background state and in the observations at time t, respectively. Mathematically, B (an n × n matrix) and R_t (a p_t × p_t matrix) may be thought of as the metrics in which deviations are measured in the cost function (i.e. measures of the precision to which the background and observations are known). The analysis x_a is the special state that minimises J.
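The structure of Eq. (4) can be demonstrated with a small linear-Gaussian toy problem. The sketch below (dimensions, operators, and variable names are illustrative assumptions, with an identity "model" so that only the t = 0 term survives) evaluates the two cost terms and checks that the closed-form best linear unbiased estimate does reduce J relative to the background:

```python
import numpy as np

# Toy evaluation of a (3D) Var cost function of the form of Eq. (4).
n, p = 5, 3
rng = np.random.default_rng(0)
B = np.diag(np.full(n, 0.5))             # background error covariance
R = np.diag(np.full(p, 0.1))             # observation error covariance
H = rng.standard_normal((p, n))          # linear observation operator
x_b = rng.standard_normal(n)             # background state
y = H @ x_b + 0.1 * rng.standard_normal(p)   # synthetic observations

def cost(x):
    jb = 0.5 * (x - x_b) @ np.linalg.solve(B, x - x_b)
    jo = 0.5 * (y - H @ x) @ np.linalg.solve(R, y - H @ x)
    return jb + jo

# closed-form minimiser (Kalman/BLUE update) for comparison
K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)
x_a = x_b + K @ (y - H @ x_b)
assert cost(x_a) <= cost(x_b)   # the analysis cannot cost more than x_b
```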
The B and R_t matrices are important as they can have a profound effect on the way that observations combine with the background to yield the analysis. It is a particular challenge to use a B matrix that is relevant to the uncertainties in convective-scale forecasts. As B is usually a much larger matrix than R_t (operationally by orders of magnitude), it cannot practically be stored explicitly. Instead the B matrix is modelled, which is usually done via the technique of control variable transforms (CVTs); see Sect. 3.4. As this modelling process is a major component of any DA system, and may require new thinking for convective-scale systems, much of the design of ABC-DA is concerned with how B is modelled. As a starting point for ABC-DA, the approach that is currently implemented is a conventional one (to mirror typical current operational configurations that were designed around global systems, Sect. 4.2, but still applied to convective-scale systems). This approach, though, can be adapted to accommodate new convective-scale strategies that are discussed at the end of the paper.

The incremental formulation of the problem
If M_{0→t} and H_t are linear functions, then Eq. (4) is a quadratic function of x and may be minimised using efficient algorithms such as a conjugate gradient-based method (Golub and Van Loan, 1996; Lewis et al., 2006). The model M_{0→t}, though, is a non-linear operator, and many observations require non-linear observation operators (such as measurements of wind speed, top-of-atmosphere radiance, etc.). This leads to a non-quadratic cost function, which may have multiple minima. Furthermore, there are no general efficient minimising algorithms for such problems. In order to simplify the problem, the cost function is minimised by breaking it down into a sequence of quadratic problems by iteratively linearising M_{0→t} and H_t. This is incremental Var (Courtier et al., 1994).
Suppose that x_r(t) is a reference trajectory satisfying x_r(t) = M_{t−1→t}(x_r(t − 1)), 1 ≤ t ≤ T. A perturbation to this trajectory (δ prefix) at t is approximately related to a perturbation at t − 1 via the linear operation δx(t) ≈ M_{t−1→t} δx(t − 1), where the full states are x(t) ≈ x_r(t) + δx(t) and x(t − 1) ≈ x_r(t − 1) + δx(t − 1). M_{t−1→t} is the tangent linear (or Jacobian) of M_{t−1→t} and is mathematically representable by an n × n matrix. We assume that

M_{t−1→t}(x_r(t − 1) + δx(t − 1)) ≈ x_r(t) + M_{t−1→t} δx(t − 1),

provided that δx(t − 1) is sufficiently small. Similarly, we suppose that the observation values computed from x_r(t) satisfy y^mr(t) = H_t(x_r(t)). A perturbation to these reference observations is approximately related to a perturbation in x_r(t) via the linear operation δy(t) = H_t δx(t). H_t is the tangent linear (or Jacobian) of H_t and is mathematically representable by a p_t × n matrix. These approximations may be summarised as the following:

y(t) − y^m(t) ≈ d(t) − H_t M_{0→t} δx,        (5)

where

δx = x − x_r,        (6a)
d(t) = y(t) − y^mr(t).        (6b)

When Eq. (5) and the definition δx_b = x_b − x_r are substituted into Eq. (4), J becomes a functional of the perturbation δx instead of the full state x:

J[δx] = (1/2)(δx − δx_b)^T B^{−1}(δx − δx_b) + (1/2) Σ_{t=0}^{T} (d(t) − H_t M_{0→t} δx)^T R_t^{−1} (d(t) − H_t M_{0→t} δx).        (7)

This is the incremental form of 4DVar. J[δx] is exactly quadratic, allowing efficient algorithms to be used to minimise it to yield the special state δx_a. The iterations required to minimise Eq. (7) form an inner loop. The full cost function (Eq. 4) is minimised by updating the reference state, x_r → x_r + δx_a, and repeating the inner loops. These iterations form the outer loop. In the first outer loop, x_r is typically set to x_b. This inner and outer loop procedure, though, does not necessarily find the global minimum of Eq. (4), which can lead to complications in highly non-linear systems with a long DA time window (e.g. Fabry and Sun, 2010).
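The inner/outer loop procedure can be sketched in one dimension. In the following illustration (all numbers and operator choices are hypothetical, not from ABC-DA) the observation operator is non-linear, each outer loop relinearises about the reference state, and the quadratic inner problem has a closed-form solution:

```python
import numpy as np

# Incremental (Gauss-Newton) sketch: minimise a non-quadratic cost via a
# sequence of quadratic problems linearised about the reference state x_r.
x_b = np.array([1.0])
y = np.array([4.0])
sig_b2, sig_o2 = 1.0, 0.01        # background and observation error variances

def h(x):  return x ** 2          # non-linear observation operator
def H(x):  return 2.0 * x         # its Jacobian (tangent linear)

x_r = x_b.copy()
for outer in range(10):                        # outer loop
    d = y - h(x_r)                             # innovation at the reference
    Ht = H(x_r)
    dx_b = x_b - x_r
    # inner loop: 1-D quadratic problem solved in closed form
    dx_a = (dx_b / sig_b2 + Ht * d / sig_o2) / (1.0 / sig_b2 + Ht ** 2 / sig_o2)
    x_r = x_r + dx_a                           # update the reference state

assert abs(h(x_r)[0] - y[0]) < 0.1             # analysis draws close to the obs
```

Since the observation is weighted far more strongly than the background here, the outer loops converge towards the state whose squared value matches the observation.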
Because M_{0→t} is a difficult operator to derive, the approximation that M_{0→t} = I is often made. This leads to an approximate method called 3DFGAT (3DVar First Guess at Appropriate Time; Lee et al., 2004; Lawless, 2010). For applications when it is too expensive to use M_{0→t} in the DA loops, a further approximation is made that x(t) = x over the window. This is called 3DVar, although many systems that specify 3DVar actually use 3DFGAT.

The observations, their operators, and their error statistics
The observation operator H t (and H t ) is built to suit the range of observations assimilated. The components of H t and H t represent the model's version of the observations. Typical examples include simple bi-linear interpolation of grid values to an observation location, the computation of model wind speed as the root of the sum of squares of the wind components, or the evaluation of top-of-atmosphere radiance by a radiative transfer equation. The ABC-DA system currently implements observations of the first two kinds, but the system is flexible enough to support observations of any required model variable at arbitrary times and positions. The R t matrices are taken to be diagonal, and have specified variances.
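The two kinds of operator currently implemented can be mimicked in a few lines. The sketch below (grid geometry and names are illustrative assumptions, not the Fortran implementation) shows bilinear interpolation to an observation location and wind speed as the root of the sum of squares of the components:

```python
import numpy as np

def bilinear(field, dx, dz, x_ob, z_ob):
    """Interpolate field[i, k] (x index i, height index k) to (x_ob, z_ob)."""
    i, k = int(x_ob // dx), int(z_ob // dz)
    ax, az = x_ob / dx - i, z_ob / dz - k          # fractional positions
    return ((1 - ax) * (1 - az) * field[i, k] + ax * (1 - az) * field[i + 1, k]
            + (1 - ax) * az * field[i, k + 1] + ax * az * field[i + 1, k + 1])

def wind_speed(u, v, w):
    """Model wind speed from the three wind components."""
    return np.sqrt(u ** 2 + v ** 2 + w ** 2)

# a field linear in x is reproduced exactly by bilinear interpolation
fld = np.fromfunction(lambda i, k: 2.0 * i, (10, 10))
val = bilinear(fld, 1500.0, 250.0, 2250.0, 125.0)   # x halfway between i=1 and 2
assert abs(val - 3.0) < 1e-12
assert abs(wind_speed(3.0, 4.0, 0.0) - 5.0) < 1e-12
```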
The system could be adapted to extend any of these aspects, e.g. to include more complicated observation operators (such as radiative transfer models or operators for Doppler winds) or to account for correlated observation errors.

Modelling B with control variable transforms
The B matrix is meant to represent the covariances of errors in x_b. Operational-scale DA systems all share the challenge of determining and using B given that this n × n matrix is too large to manipulate (or even store) and is in any case unknowable. Most practical variational methods use control variable transforms (CVTs) to simplify this problem. Consider a vector δχ, which is an alternative representation of δx via the relation

δx = U δχ.        (8)

δχ is called a control vector, and U is the CVT. The CVT is a powerful way of accounting for cross-correlations between background errors of model variables (including spatial and multivariate components). δx and δχ have different assumed statistical properties: model space errors have covariance ⟨δx δx^T⟩_b = B, and control variables are taken to be uncorrelated and have unit variance, ⟨δχ δχ^T⟩_b = I, where the b subscript indicates expectation over hypothetical background error samples. When Eq. (8) is substituted into Eq. (7), J becomes a functional of δχ:

J[δχ] = (1/2)(δχ − δχ_b)^T(δχ − δχ_b) + (1/2) Σ_{t=0}^{T} (d(t) − H_t M_{0→t} U δχ)^T R_t^{−1} (d(t) − H_t M_{0→t} U δχ).        (9)

Notice that the B matrix has effectively disappeared from this formulation because of the assumed statistical properties of δχ. This is a considerable simplification, although the problem of defining the CVT remains. The δχ cost function is minimised (giving the special vector δχ_a), which then leads to δx_a via Eq. (8). This is equivalent to minimising Eq. (7) with a background error covariance matrix equal to

B_implied = U U^T,        (10)

which is known as the implied background error covariance matrix. Apart from not needing to know the B matrix explicitly, the minimisation problem (Eq. 9) is found to be numerically better conditioned than Eq. (7), leading to a more efficient and accurate minimisation. It is common to work backwards here: first a CVT is proposed (based on physical principles such as those discussed in Bannister (2008b) and later in this paper), and then its ability to generate reasonable background error covariance structures is studied by looking at the implied covariances.
This can be done by studying either U U^T or the analysis increments of single-observation DA experiments. Constructing U is one way of doing background error covariance modelling. We do this by defining new parameters and their spatial covariances via a proposed form like U = U_p U_s (Bannister, 2008b). Here U_p is the parameter transform (where U_p^{−1} transforms model variables to alternative parameters that are assumed uncorrelated, using sets of balance operators as in Parrish and Derber, 1992; Gauthier et al., 1999), and U_s is the spatial transform (which transforms each parameter's field to modes that are assumed to be uncorrelated, such as Fourier modes). U_s can itself be decomposed into separate horizontal, vertical, and scaling parts, e.g. U_s = Σ U_v U_h (see Sect. 4.2.3). More complicated sequences of transforms are also possible, e.g. based on wavelets (Deckmyn and Berre, 2005). A property of the CVT approach is that B can be modelled even if it is singular.
3.5 The gradient of J and minimising the cost function

Equation (9) is minimised by iteratively adjusting δχ until a convergence criterion is met, indicating that a point close to the minimum of J[δχ] has been found. The gradient vector, ∇_δχ J, is used with a conjugate gradient algorithm to perform this task. ∇_δχ J is found by differentiating J[δχ]:

∇_δχ J = δχ − δχ_b − U^T Σ_{t=0}^{T} M_{0→t}^T H_t^T R_t^{−1} (d(t) − H_t M_{0→t} U δχ),        (11)

where d(t) is the difference between the observations at time t and the model's version of them based on the reference state (Eq. 6b). Equation (11) requires the Jacobians H_t and M_{0→t}, the CVT U, and their adjoint counterparts. The evaluation of Eq. (11) can be made more efficient by the following standard algorithm.
1. Set the reference state at t = 0 to the background state: x_r = x_b.

2. Do the outer loop.

   (f) Do the inner loop.

      i. Integrate the perturbation trajectory over the time window, 1 ≤ t ≤ T, with the linear forecast model: δx(t) = M_{t−1→t} δx(t − 1).

      vi. Compute the gradient as follows: ∇_δχ J = δχ − δχ_b + U^T λ(0).

      vii. Use the conjugate gradient algorithm to adjust δχ to reduce the value of J.

      viii. Compute the new increment in model space using the CVT: δx = U δχ.

      ix. Go to step 2f i until the inner-loop convergence criterion is satisfied.

   (g) Update the reference state: x_r → x_r + δx.

   (h) Go to step 2a until the outer-loop convergence criterion is satisfied. At convergence, set x_a = x_r.

3. Run a non-linear forecast from x_a for the background of the next cycle and longer forecasts if required.
The full procedure (adapted for 3DFGAT, where the linear model is omitted) is shown graphically in Sect. 4.7.
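The control-space minimisation at the heart of the inner loop can be sketched with small dense matrices (illustrative stand-ins for the ABC-DA operators; the gradient expression follows Eq. 11 with M = I, as in 3DFGAT). A textbook conjugate gradient solver drives the gradient of the quadratic J[δχ] to zero:

```python
import numpy as np

# Inner-loop sketch: minimise the quadratic control-space cost whose gradient
# is  grad J = (dchi - dchi_b) - (HU)^T R^{-1} (d - HU dchi).
rng = np.random.default_rng(2)
n, p = 6, 4
U = np.tril(rng.standard_normal((n, n))) + 3 * np.eye(n)   # a CVT stand-in
H = rng.standard_normal((p, n))                             # linearised obs op
Rinv = np.eye(p) / 0.04
d = rng.standard_normal(p)            # innovation d = y - y_mr
dchi_b = np.zeros(n)                  # first outer loop (x_r = x_b)

HU = H @ U
A = np.eye(n) + HU.T @ Rinv @ HU      # Hessian of J in control space
b = dchi_b + HU.T @ Rinv @ d          # minus the gradient at dchi = 0

def cg(A, b, iters=50):
    """Textbook conjugate gradients for symmetric positive definite A."""
    x = np.zeros_like(b)
    r = b - A @ x
    pdir = r.copy()
    for _ in range(iters):
        Ap = A @ pdir
        alpha = (r @ r) / (pdir @ Ap)
        x = x + alpha * pdir
        r_new = r - alpha * Ap
        if r_new @ r_new < 1e-30:
            break
        pdir = r_new + ((r_new @ r_new) / (r @ r)) * pdir
        r = r_new
    return x

dchi_a = cg(A, b)
grad = (dchi_a - dchi_b) - HU.T @ Rinv @ (d - HU @ dchi_a)
assert np.linalg.norm(grad) < 1e-6    # gradient vanishes at the minimiser
dx_a = U @ dchi_a                     # map back to model space via the CVT
```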

System tests
The system has a special test suite to check aspects of operators that are coded. Operators that have an adjoint counterpart are subject to an adjoint test to demonstrate that the adjoint has been coded correctly. This includes the linearised observation operators and components of U. Many of these operators are subdivided into constituents that are tested separately (e.g. interpolation, halo swapping, and Fourier transforms). For a coded operator, A, with input v_in and its coded adjoint A^T, the adjoint test computes the left- and right-hand sides of the following formula, which must agree to machine precision to gain confidence that the coded adjoint is correct:

(A v_in)^T (A v_in) = v_in^T [A^T (A v_in)],        (12)

where a random vector v_in will normally suffice. The CVT needs to be inverted in the gradient algorithm when using more than one outer loop (step 2a in Sect. 3.5) and in calibrating the B matrix (Sect. 4.3). These operators are subject to an inverse test to demonstrate that the inverse has been coded correctly. This is done by reading in a perturbation state and then passing it through A A^{−1}. The result is output, which can be compared with the original field that was read in.
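The adjoint test is easily demonstrated outside the Fortran suite. In the sketch below (an illustrative operator, not one of ABC-DA's), A is a periodic centred x-derivative, whose adjoint is its negative, and the general inner-product form of the adjoint identity ⟨A v, w⟩ = ⟨v, A^T w⟩ is checked with random vectors:

```python
import numpy as np

# Adjoint test: <A v, w> must equal <v, A^T w> to machine precision.
n, dx = 360, 1500.0
rng = np.random.default_rng(3)

def ddx(v):
    """The "coded" operator A: centred periodic x-derivative."""
    return (np.roll(v, -1) - np.roll(v, 1)) / (2.0 * dx)

def ddx_adj(w):
    """Its coded adjoint (the operator is skew-symmetric: A^T = -A)."""
    return -ddx(w)

v = rng.standard_normal(n)
w = rng.standard_normal(n)
lhs = np.dot(ddx(v), w)
rhs = np.dot(v, ddx_adj(w))
assert abs(lhs - rhs) < 1e-12 * max(1.0, abs(lhs))
```

A deliberately broken adjoint (e.g. dropping the minus sign) makes the two sides disagree at leading order, which is exactly what the test is designed to expose.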
A test that the gradient of the cost function (as computed for the minimisation) is valid can be confirmed in a gradient test, which is also provided as part of the test suite. The gradient test estimates progressively more accurate finite-difference approximations to the gradient, and it checks that they converge to the analytically computed gradient (Eq. 11). Other tests are possible that have not been included in this version of ABC-DA, e.g. checks of the innovation statistics, namely

⟨(y(t) − y^mb(t))(y(t) − y^mb(t))^T⟩ = H_t B H_t^T + R_t

(where y^mb(t) is the background's version of the observations and the angled brackets indicate an average over a large number of DA cycles with the same observation network).
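The logic of the gradient test can be reproduced with a small quadratic stand-in for the cost function (all quantities below are hypothetical): finite-difference estimates of the directional derivative should approach the analytic value as the step size shrinks.

```python
import numpy as np

# Gradient test sketch: finite-difference approximations to J along a random
# direction converge to the analytic directional derivative.
rng = np.random.default_rng(4)
n = 8
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # SPD "Hessian" of a quadratic cost
b = rng.standard_normal(n)

def J(x):     return 0.5 * x @ A @ x - b @ x
def gradJ(x): return A @ x - b       # analytic gradient

x0 = rng.standard_normal(n)
direc = rng.standard_normal(n)
exact = gradJ(x0) @ direc
errs = []
for h in [1e-1, 1e-2, 1e-3]:
    fd = (J(x0 + h * direc) - J(x0)) / h      # one-sided finite difference
    errs.append(abs(fd - exact))

assert errs[0] > errs[1] > errs[2]   # error shrinks as the step shrinks
```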

Scientific and technical configuration of ABC-DA v1.4
This section describes the current scientific configuration of the ABC-DA system. It also contains some technical information and can be read in conjunction with the user documentation available on GitHub (https://github.com/rossbannister/ABC-DA_1.4da/blob/master/docs/Documentation.pdf, last access: 24 August 2020), where more information is available, including names of the executables to be run, the namelist variables that have to be set, and the input and output file names.
References are made to this document in the sections below in the form of GitHubDoc §x. The code is divided into master programs that perform specific tasks. The relevant variables are set in a namelist file (filenames, options, switches, and parameters), and then the relevant executable is run. The available master routines are listed in GitHubDoc §2, and instructions on how to download and build the code are found in GitHubDoc §3.

Construction of a model state and making a forecast
The initial conditions for a model run may be generated using the program Master_PrepareABC_InitState (GitHubDoc §4.1). The code can take a slice from a specific Met Office Unified Model (UM) file, or it can generate a simple idealised pressure "blob" of specified position and size (or a combination of these). When initial fields are taken from the UM, they need to be adjusted to make them compatible with the ABC model. This involves a number of steps: (i) adjusting the fields towards the E and W edges of the domain to be consistent with the periodic boundary conditions; (ii) adding a constant to the v field to force its integral over each level to zero to allow GB in Eq. (2a) (the balance condition, Eq. 2b, though, is not enforced, to allow for some imbalance); (iii) computing ρ̃ from v with Eq. (2a) and then adding a constant to force its integral over each level to zero; (iv) computing b to satisfy the HB condition (Eq. 3); and (v) setting w so that the 3D winds have zero divergence. A forecast can be made from these initial conditions using the program Master_RunNLModel (GitHubDoc §4.2) by numerically integrating Eq. (1).

The CVTs (B matrix) implemented
ABC-DA has a variety of options implemented to model the B matrix using control variable transforms (CVTs, Sect. 3.4), and this section describes the current implementation. The transforms are most easily understood by describing first the inverse CVTs (since they allow the different "spaces" to be defined, starting in model space and working towards the control space). The CVT operators defined are used in many of the programs mentioned in later sections.

The inverse parameter transform, U_p^{−1}

Recall that U_p^{−1} transforms a perturbation in the model variables (Eq. 1) to alternative parameters that are assumed to be uncorrelated. It is needed primarily to calibrate the CVT (Sect. 4.3). The input fields in this procedure are the perturbations δu, δv, δw, δρ̃′, and δb′; the output fields are (in the version of the code documented) δψ (streamfunction), δχ_vp (velocity potential), δρ̃_u (unbalanced scaled density), δb_u (unbalanced buoyancy), and δw_u (unbalanced vertical wind). All input and output fields are a function of longitude and height. This is the algorithm for U_p^{−1}.

1. Compute the streamfunction:

δψ = ∇_x^{−1} δv.        (13)

The operator ∇_x^{−1} is defined as ∇_x^{−1} δv = (∂/∂x)^{−2} ∂(δv)/∂x, which is based on application of the Helmholtz theorem (see Petrie et al., 2017, Sect. 4.1).

2. Compute the velocity potential (again based on the Helmholtz theorem):

δχ_vp = ∇_x^{−1} δu.        (14)

Note that in Eq. (13) δψ depends only on the meridional wind and in Eq. (14) δχ_vp depends only on the zonal wind. This is unlike a system that has latitude dependence, where δψ and δχ_vp would each depend on both δu and δv as per the Helmholtz theorem.
3. Compute the GB scaled density:

δρ̃_b = α (f/C) δψ,        (15)

which follows from application of the Helmholtz theorem for this system, (δu, δv) = (∂δχ_vp/∂x, ∂δψ/∂x), applied to the GB equation (Eq. 2a). The value α = 1, unless the system is configured to turn off GB in this transform, in which case α = 0.
4. Compute the balanced scaled density after it has been vertically regressed:

δρ̃_br = R_ρ δρ̃_b.        (16)

The vertical regression (VR) operator R_ρ has the form

R_ρ = C_{δρ̃ δρ̃_b} C_{δρ̃_b δρ̃_b}^{−1},        (17)

where C_{δρ̃_b δρ̃_b} is the correlation matrix between a previously computed population of δρ̃_b perturbations with itself, and C_{δρ̃ δρ̃_b} is the correlation matrix between δρ̃_b and δρ̃′. The justification for the use of R_ρ is given in Appendix B. The system can be configured to turn off this step (and it is not used anyway if α = 0 in step 3).

5. Compute the unbalanced scaled density:

δρ̃_u = δρ̃′ − δρ̃_br.        (18)
6. Compute the HB buoyancy:

δb_b = β L_hb δρ̃′.        (19)

The operator L_hb is defined as L_hb δρ̃′ = C ∂δρ̃′/∂z, as in Eq. (3). The value β = 1, unless the system is configured to turn off HB, in which case β = 0.

9. Compute the unbalanced vertical wind:

δw_u = δw − γ δw_b

(δw_u = δw if γ = 0). Some of these steps may be omitted according to user options, as specified above. The above steps may be written more compactly as a "super matrix", Eq. (22). It is noted here that this particular form of transform is not necessarily the most appropriate form for convective-scale systems; e.g. GB in step 3 and HB in step 6 may not be relevant. There is, however, expected to be some GB at the larger scales represented and HB at even shorter scales. Furthermore, these relationships are still used in some operational systems, so their inclusion in this study is justified. The use of other balance relationships is possible, including statistical balance relationships (e.g. Derber and Bouttier, 1999; Chen et al., 2013; Bannister et al., 2020). An alternative balance relationship that may be applicable at convective scale is mentioned in the summary.
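The first steps of this transform can be sketched for one periodic row of the grid. In the illustration below (Python stand-in, not the Fortran implementation; parameter values are assumptions) the pseudo-inverse derivative ∇_x^{−1} is realised spectrally with the k = 0 mode set to zero, since a constant offset is unrecoverable, and the GB scaled density then follows from Eq. (15):

```python
import numpy as np

# Sketch of steps 1 and 3 of the inverse parameter transform on a periodic
# 1-D row: streamfunction via a spectral (d/dx)^{-1}, then drho_b from GB.
nx, dx = 360, 1500.0
f, C, alpha = 1.0e-4, 1.0e4, 1.0
k = 2.0 * np.pi * np.fft.fftfreq(nx, d=dx)      # wavenumbers

def inv_ddx(field):
    """Spectral (d/dx)^{-1} with the mean (k = 0) mode removed."""
    spec = np.fft.fft(field)
    with np.errstate(divide="ignore", invalid="ignore"):
        spec = np.where(k == 0.0, 0.0, spec / (1j * k))
    return np.real(np.fft.ifft(spec))

x = np.arange(nx) * dx
dv = np.cos(2.0 * np.pi * x / (nx * dx))        # a meridional wind perturbation
dpsi = inv_ddx(dv)                              # streamfunction: d(dpsi)/dx = dv
drho_b = alpha * (f / C) * dpsi                 # GB scaled density (Eq. 15)

# round-trip check: differentiating dpsi spectrally recovers dv
dv_back = np.real(np.fft.ifft(1j * k * np.fft.fft(dpsi)))
assert np.allclose(dv_back, dv, atol=1e-10)
```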

4.2.2 The forward parameter transform, U_p

U_p transforms perturbations of parameters to model space. This transform (and its adjoint) is used at each iteration of the Var algorithm. The input fields in this procedure are the parameter field perturbations δψ, δχ_vp, δρ_u, δb_u, and δw_u; the output fields are δu, δv, δw, δρ, and δb. This is the algorithm for U_p.
1. Compute the zonal wind based on the Helmholtz theorem: δu = ∂δχ_vp/∂x.

2. Compute the meridional wind (also based on the Helmholtz theorem): δv = ∂δψ/∂x.

3. Compute the balanced scaled density δρ_b (Eq. 15).
9. Compute the total vertical wind. Again, some of these steps may be omitted according to user options, as set out in Sect. 4.2.1. The above steps may be written more compactly as the super matrix of Eq. (27). Using Eqs. (22) and (27), it may be confirmed that U_p U_p^{-1} = I. The adjoint of Eq. (27) is constructed directly from the code.
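The property U_p U_p^{-1} = I can be illustrated with a stand-in: like U_p, any transform in which each output depends only on fields already computed earlier in the step list is (block) lower-triangular and hence exactly invertible. The matrix below is a random unit lower-triangular stand-in, not the actual super matrix of Eq. (27).

```python
import numpy as np

# A unit lower-triangular stand-in for the parameter transform: each "field"
# depends only on fields computed before it, so the transform is exactly
# invertible and the inverse has the same triangular structure.
rng = np.random.default_rng(1)
n = 5                                     # stand-in for 5 parameters/fields
Up = np.tril(rng.standard_normal((n, n)), k=-1) + np.eye(n)
Up_inv = np.linalg.inv(Up)                # also unit lower-triangular
identity_check = Up @ Up_inv              # should be the identity matrix
```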

4.2.3 The inverse spatial transform, U_s^{-1}

In the current configuration, the spatial transform comprises separate horizontal (U_h), vertical (U_v), and scaling (Σ) transforms. The order of these transforms may vary: U_s^{-1} = U_h^{-1} U_v^{-1} Σ^{-1} is the first ordering, and U_s^{-1} = U_v^{-1} U_h^{-1} Σ^{-1} is the second. The first ordering is called the "classic transform order" (CTO, since this was the transform order in the first Met Office Var system; Wlasak and Cullen, 2014), and the second is called the "reversed transform order" (RTO). Σ is a diagonal matrix of background error standard deviations of the parameters, as a function of longitude and height (although options are implemented to allow the standard deviation to be a function of height only or a constant for each parameter). After the parameters have been divided by Σ, the problem remains one of modelling the covariances between spatial points in space.
There are separate spatial operators for each parameter defined in Sect. 4.2.1, so strictly we should define the overall spatial transforms as block-diagonal forms, but we instead adopt a casual way of describing the transforms to avoid getting bogged down in notation. Depending on the context, these transforms may represent all parameters at once (as done in Sect. 3.4), single parameters, or individual horizontal levels or vertical columns (as is done below).
U_h and U_v have the same generic form, as follows:

U^{-1} = G Λ^{-1/2} F^T, (30a)
U = F Λ^{1/2} G^T, (30b)

where F is the (exact or assumed) matrix of eigenvectors (columns of F) of the covariance matrix that is being modelled, Λ is the diagonal matrix of eigenvalues, and G is any orthonormal square matrix (G^T G = I) of the same dimensions as Λ^{1/2}. In Eq. (30a), F^T projects a state onto the eigenvectors (mutually uncorrelated by definition), Λ^{-1/2} scales the projections so they have unit variance, and G is an arbitrary rotation. If the complete CVT had this form (but also incorporating Σ), then the implied covariance would, by Eq. (10), be UU^T = F Λ^{1/2} G^T G Λ^{1/2} F^T = F Λ F^T, where F Λ F^T is the eigenvalue decomposition of the covariance matrix in question. In this illustration, the CVT is an exact representation of the covariances, but when this procedure is applied in practice, it is only an approximate covariance model, e.g. due to the separation of the horizontal and vertical transforms or to the application of approximate eigenvectors. The structure of the actual implied covariances can be investigated with the software suite (Sect. 4.4).
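The eigenvector construction of Eq. (30) can be checked numerically. In this sketch B is a random symmetric positive-definite matrix standing in for a covariance matrix; with the exact eigenpairs, the implied covariance UU^T reproduces B for any orthonormal G:

```python
import numpy as np

# Sketch of U = F Lambda^{1/2} G^T: with exact eigenvectors/values of B,
# the implied covariance U U^T equals B, whatever orthonormal G is used.
rng = np.random.default_rng(2)
n = 6
A = rng.standard_normal((n, n))
B = A @ A.T + n * np.eye(n)            # stand-in SPD "covariance" matrix

lam, F = np.linalg.eigh(B)             # B = F Lambda F^T
G, _ = np.linalg.qr(rng.standard_normal((n, n)))  # rotation with G^T G = I

U = F @ np.diag(np.sqrt(lam)) @ G.T            # forward transform
U_inv = G @ np.diag(1.0 / np.sqrt(lam)) @ F.T  # inverse transform
B_implied = U @ U.T                            # implied covariance
```

In the practical system the covariance model is only approximate, because F is assumed (plane waves horizontally) and the transforms are separated by direction.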
In the ABC-DA system, we use the following for F and G in Eq. (30).
-For the horizontal transform U_h^{-1}, F is called F_h, whose columns comprise horizontal plane waves of the form ∼ exp(ikx), where i = √−1 and k is the wavenumber (each column of F_h is a different k). In this context, F_h^T represents a horizontal Fourier transform. This makes the assumption that the eigenvectors of the horizontal covariance matrix are plane waves, and the eigenvalues in Λ_h are their variances. This is equivalent to assuming horizontal error covariances that are homogeneous (see Bartello and Mitchell, 1992; Berre, 2000; Bannister, 2008b). G is set to I in the horizontal transform.
-For the vertical transform U_v^{-1}, F is called F_v, whose columns are the eigenmodes of a vertical covariance matrix (labelled with ν; see below), and Λ_v represents their variances. G is set either to F_v (to give a symmetric vertical transform) or to I, depending on user choice.
The spaces that these operators work in depend on the chosen order of the transforms and on whether the vertical transform is symmetric or not. The following summarises these options (which apply to each parameter separately) and can be read with Fig. 1, which shows how the transforms change the horizontal and vertical co-ordinates.
-For the CTO, U −1 v operates vertically on a field that is a function of x and z. The vertical eigenvectors or values are those of a pre-computed horizontally averaged vertical covariance matrix, and so these matrices themselves are not dependent on horizontal position in ABC-DA.
-For the symmetric vertical transform option, the output of U_v^{-1} is also a field that is a function of x and z. The horizontal transform, U_h^{-1}, then operates horizontally on such a field. The horizontal eigenvalues are those of pre-computed horizontal covariance matrices (one for each z in ABC-DA). The output of U_h^{-1} is a field that is a function of k and z (see Fig. 1a). This combination of options allows a different horizontal covariance to be specified for each vertical level.
-For the non-symmetric vertical transform option, the output of U_v^{-1} is a field that is a function of x and vertical eigenmode index ν. The horizontal transform, U_h^{-1}, then operates horizontally on such a field. The horizontal eigenvalues are those of pre-computed horizontal covariance matrices (one for each ν in ABC-DA). The output of U_h^{-1} is a field that is a function of k and ν (see Fig. 1b). This combination of options allows a different horizontal covariance to be specified for each vertical mode, effectively allowing horizontal and vertical length scales to be associated.
Figure 1. Representations of the vertical direction include model levels (labelled with z_1, z_2, etc.) and vertical modes (ν_1, ν_2, etc.). Representations of the horizontal direction include model grid points (x_1, x_2, etc.) and Fourier modes (k_1, k_2, etc.). Moving from left to right indicates the forward transform and from right to left indicates the inverse transform. The horizontal transform is always done independently for each vertical co-ordinate (z_i to z_i or ν_i to ν_i), and the vertical transform is always done independently for each horizontal co-ordinate (x_i to x_i or k_i to k_i, as guided by the colours). The co-ordinates used in the control space in each option are indicated on the leftmost panels.

-For the RTO, U_h^{-1} operates horizontally on a field that is a function of x and z. The horizontal eigenvalues are those of pre-computed horizontal covariance matrices (one for each z in ABC-DA). The output of U_h^{-1} is a field that is a function of k and z. The vertical transform, U_v^{-1}, uses a separate set of vertical eigenvectors/values for each k.
-For the symmetric vertical transform option, the output of U −1 v is also a field that is a function of k and z (see Fig. 1c).
-For the non-symmetric vertical transform option, the output of U −1 v is a field that is a function of k and ν (see Fig. 1d).
We would expect no difference between the implied covariances of the symmetric and non-symmetric vertical transform options in the reversed case, although both options exist in the code.
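The CTO inverse can be sketched for a single parameter as three array operations. Everything here is a stand-in (random standard deviations and a random vertical covariance), and the division by the horizontal eigenvalue square roots is omitted; the point is how the field changes co-ordinates from (x, z) to (k, ν), as in Fig. 1b:

```python
import numpy as np

# Sketch of the CTO inverse spatial transform (non-symmetric vertical
# option, G = I): divide by the standard deviations, apply the vertical
# eigenvector transform to each column, then a Fourier transform along
# each resulting mode "level".  Sigma and F_v are stand-ins.
rng = np.random.default_rng(3)
nz, nx = 6, 16
field = rng.standard_normal((nz, nx))       # parameter perturbation on (z, x)
sigma = 1.0 + rng.random((nz, 1))           # stand-in std devs (height only)

Bv = rng.standard_normal((nz, nz))
lam_v, F_v = np.linalg.eigh(Bv @ Bv.T + nz * np.eye(nz))  # vertical eigenpairs

scaled = field / sigma                           # Sigma^{-1}
vert = np.diag(lam_v ** -0.5) @ F_v.T @ scaled   # U_v^{-1}, per column
chi = np.fft.rfft(vert, axis=1) / np.sqrt(nx)    # horizontal FT, per mode
# chi is now a function of wavenumber k and vertical mode index nu
# (the remaining Lambda_h^{-1/2} scaling of U_h^{-1} is omitted here)
```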

4.2.4 The forward spatial transform, U_s
The forward spatial transforms follow from the inverses defined in Sect. 4.2.3, namely U_s = Σ U_v U_h for the CTO and U_s = Σ U_h U_v for the RTO. The adjoint operators follow in a straightforward manner.

4.3 Calibrating the CVTs (B matrix)
The CVTs comprise many sub-matrices that need to be determined in a calibration procedure. The operators to be determined are the regression operator R_ρ (part of the parameter transform mentioned in Sect. 4.2.1 and 4.2.2) and Σ, Λ_h, F_v, and Λ_v (parts of the spatial transforms mentioned in Sect. 4.2.3 and 4.2.4). The number of pieces of information to be determined in this procedure is explored in Appendix A. These matrices are determined from model training data in five stages, all using the program Master_Calibration (GitHubDoc §4.4), and they are stored in a covariance file, which is produced by this routine. It is impossible to use a genuine sample of forecast errors to calibrate the B matrix, so instead we use ensembles of forecast perturbations, which are considered proxies of forecast error (Buehner, 2005; Pereira and Berre, 2006). The five calibration stages are described here, and example outputs are shown for a standard set-up (experiment GB+VR+, to be described in Sect. 5.1).

4.3.1 Generate a population of training data from UM fields
This is calibration run stage 1 (Master_Calibration is run with the namelist variable CalibRunStage set to 1). This takes data from one or more UM files (one or more ensembles of forecasts) and extracts multiple longitude and height slices from these files to construct an effective "super ensemble". These are each adjusted to make them compatible with the ABC model (as in Sect. 4.1), followed by a short forecast of specified length (in our examples 1 h). These procedures are intended to give the ensemble members properties of the ABC model rather than of the Unified Model from which they came, although the degree to which this has been achieved is not demonstrated. The super ensemble is output from this stage. Also specified at this stage are the model parameters (in the example to be described, A = 0.02 s^-1, B = 0.01, and C = 10,000 m^2 s^-2) and user settings for the transform options mentioned in Sect. 4.2 (here, unless stated otherwise, the control options use GB (α = 1), Sect. 4.2.1 step 3; VR, step 4; HB (β = 1), step 6; no anelastic balance (γ = 0), step 8; the CTO, Sect. 4.2.3; the non-symmetric vertical transform, Sect. 4.2.3; and parameter standard deviations that are a function of vertical level only). These are all output in a provisional covariance file (netCDF format). At this stage, the file is blank apart from containing information on these options for future reference. These user options are read from this file in later stages of the calibration, when the above-mentioned matrices are computed and output to this covariance file. Technical information is given in GitHubDoc §4.4.1, and the ensemble members can be plotted using the Python program specified there.
In this paper, a super ensemble of 260 members is used. Appendix A shows that this is more than adequate to determine the spatial transform matrices and the vertical regression matrix.

4.3.2 Generate a population of forecast perturbations
This is calibration run stage 2 (Master_Calibration is run with the namelist variable CalibRunStage set to 2). The forecasts output from stage 1 are converted to means and perturbations from the means. See GitHubDoc §4.4.2, which includes plotting information.

4.3.3 Compute the vertical regression matrix R_ρ
This is calibration run stage 3 (Master_Calibration is run with the namelist variable CalibRunStage set to 3). The perturbations from stage 2 are used to calculate populations of δρ_b (Eq. 15). The vertical correlations between δρ_b and itself and between δρ and δρ_b are then used to compute R_ρ in the way specified in point 4 of Sect. 4.2.1 (see also Appendix B). R_ρ is then output to the covariance file created in stage 1. See GitHubDoc §4.4.3, which includes plotting information. Figure 2a and b compare example δρ and δρ_b fields. The large-scale pattern of these fields is similar, with δρ_b being smoother and of lower magnitude than δρ, indicating that the unbalanced contributions to δρ are at smaller scales (as expected). Panel (c) shows an example R_ρ matrix, and panel (d) shows its effect on δρ_b, demonstrating its ability to modify values and vertical scales.

4.3.4 Perform the inverse parameter transform on the forecast perturbations
This is calibration run stage 4 (Master_Calibration is run with the namelist variable CalibRunStage set to 4). The perturbations from stage 2 are transformed to parameters using the procedure represented by U_p^{-1} (Sect. 4.2.1) and then output. The mean states found from stage 2 are also used in some of the calculations; e.g. ρ_0 is used in step 8 of that procedure. See GitHubDoc §4.4.4, which includes plotting information.

4.3.5 Calibrate the spatial transforms for each parameter
This is calibration run stage 5 (Master_Calibration is run with the namelist variable CalibRunStage set to 5). The perturbations from stage 4 are used to diagnose the matrices Σ, F_v, Λ_v, and Λ_h for each of the five control parameters. The blue lines of Fig. 3 show the background error standard deviations (diagonal elements of Σ) for the control experiment described above (see the figure caption for a succinct summary). The parameters show some variability with height, although none of the parameters show variations of orders of magnitude. Note that the heights in the ABC model do not correspond with those in the real atmosphere. The large variability of w (panel e) at model heights around 13 or 14 km, for instance, does not lie in the ABC model's stratosphere, and in any case, without radiative forcing and an ozone layer in the ABC model, we would not expect signatures of the stratosphere to be present.
The blue lines in Fig. 4 show the square root of the eigenvalues of the vertical covariance matrix (diagonal elements of Λ_v^{1/2}). The vertical covariance matrices of each parameter are found over each super-ensemble member and over each longitude. Each y axis in Fig. 4 is the (integer) vertical mode index. The eigensolver sorts these into ascending eigenvalue, so the physical meaning of the modes can be unclear. For instance, examining the eigenvectors by eye (not shown), for parameters δψ (panel a), δχ_vp (panel b), and δρ_u (panel c), the vertical modes of low index have vertical profiles that are generally rapidly oscillating and have more weight in the lower model levels than in the upper ones. As the vertical mode index increases, the oscillations become less rapid and tend to have weight over the entire depth of the model atmosphere. There is no obvious trend concerning the vertical modes of δb_u (panel d) and δw (panel e).
The values in Fig. 4 are of comparable magnitude between parameters because the vertical error covariance matrices are formed from the populations after they have been normalised with Σ^{-1} (see Eqs. 31 and 32). Calibrating the vertical transform involves (for the CTO) constructing a single global vertical covariance matrix or (for the RTO) constructing one vertical covariance matrix for each wavenumber. The eigenvalues and eigenvectors follow from this procedure. The eigenvectors of the horizontal covariance matrix are assumed to be plane waves. In the case of the CTO with a non-symmetric vertical covariance model, for example, the horizontal variances follow from passing each ensemble perturbation, δx, through the partial inverse transform to give δη. Here each δη is a field which is a function of k and ν. This ensemble is then used to estimate the variances of δη, which form the diagonal elements of Λ_h(k, ν). Figure 5 shows the square root of the horizontal variances (diagonal elements of Λ_h^{1/2}). In the particular configuration that has a non-symmetric vertical transform, there is a different horizontal variance spectrum for each vertical mode, so the axes in Fig. 5 are wavenumber, k, and vertical mode index, ν. The spectra for δψ (panel 5a) and δχ_vp (panel 5b) are similar and represent perturbations with long horizontal correlation length scales (most weight is at small wavenumbers). Their similarities are explained by the fact that the fields are related to the horizontal winds in the same way: δv = ∂δψ/∂x and δu = ∂δχ_vp/∂x. The spectra for δρ_u (panel 5c) show that perturbations in this parameter have shorter horizontal length scales than in panels (5a) and (5b), and the length scale reduces with higher vertical modes. This suggests that perturbations with broader vertical scales (generally higher vertical mode index) tend to have shorter horizontal scales.
The spectra for the remaining parameters, δb_u (panel 5d) and δw (panel 5e), show that perturbations of these parameters are coarser still, and there is no obvious pattern with vertical mode index. The same data plotted in Fig. 5c (for δρ_u) are also plotted in Fig. 6a, but with transformed co-ordinates to facilitate further interpretation. In Fig. 6a the variance square roots are shown as a function of horizontal length scale, l_h, and vertical length scale, l_v; l_h is related to the horizontal wavenumber index k (an integer) as l_h = L_h/k (where L_h = 540 km is the horizontal domain length, and k = 0 is excluded since it gives an infinite l_h), and l_v is related to the vertical eigenmode index for δρ_u in the way outlined in Appendix C. For reference, panel (6b) shows l_v for each vertical mode of δρ_u, which confirms that higher ν is broadly associated with deeper structures. Furthermore, modes with deeper structures have fewer nodes (not shown). Incidentally, this correspondence between ν and l_v is also found for the vertical modes of δψ, which represent the balanced scaled density increments.
Returning to panel (6a), the Rossby radius of deformation, L_R, has also been plotted (the black curve). This has been estimated using a shallow-water interpretation, i.e. L_R = √(g l_v)/f (e.g. Cullen, 2006). The Burger number is defined as Bu = L_R/l_h, and it follows that all of the represented variability in these unbalanced modes has large Bu. Dynamical theory says that for l_h ≪ L_R (i.e. Bu ≫ 1), the rotational wind is a potential vorticity (PV)-like variable; for l_h ≫ L_R (Bu ≪ 1), the mass variable becomes a PV-like variable (e.g. Cullen, 2003; Katz et al., 2011). PV is often associated with balanced flow, so this theory suggests that the unbalanced mass variable (δρ_u) should vanish (and hence have vanishing variance) for Bu ≫ 1. Figure 6a shows that for 10^1 km ≲ l_h ≲ 10^3 km, the variance of δρ_u does decrease as Bu increases, showing some consistency with the dynamical theory at synoptic scales. It further suggests that a rotational wind variable, like δψ, is a reasonable variable to capture the balanced part of the flow at synoptic scales.
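The scale diagnostics above amount to simple arithmetic, sketched below with illustrative values (only L_h = 540 km comes from the text; g, f, and l_v are assumptions of the example):

```python
import numpy as np

# Worked example of the scale diagnostics of Fig. 6: horizontal length scale
# l_h = L_h / k, shallow-water Rossby radius L_R = sqrt(g * l_v) / f, and
# Burger number Bu = L_R / l_h.  g, f, and l_v are illustrative values.
g = 9.81                    # m s^-2 (assumed)
f = 1.0e-4                  # s^-1, mid-latitude Coriolis parameter (assumed)
L_h = 540.0e3               # m, horizontal domain length from the text

k = np.arange(1, 6)         # horizontal wavenumber indices (k = 0 excluded)
l_h = L_h / k               # horizontal length scales
l_v = 2000.0                # m, an illustrative vertical length scale

L_R = np.sqrt(g * l_v) / f  # Rossby radius for this l_v (about 1400 km)
Bu = L_R / l_h              # Burger number for each horizontal scale
```

With these assumed values, Bu > 1 for every resolved horizontal scale, illustrating the statement that the represented variability has large Bu.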

4.4 The implied B matrix
The covariance model is completely described by (i) the parameter transform presented in Sect. 4.2.2, (ii) the results presented in Sect. 4.3.5, and (iii) other statistics like the vertical eigenvectors themselves (not shown). The software suite has a program to compute the implied covariances (Master_ImpliedCov; see GitHubDoc §4.6), which uses information in the CVT file output by the calibration procedures. The program acts with the combination of matrices UU^T on a vector which is zero apart from one element (which we call the source point), which is set to unity. (In the example below, this element corresponds to δρ near the centre of the domain.) Completing the matrix algebra, the implied super matrix (for fields δu, δv, δw, δρ, δb) is found from Eq. (10), U = U_p U_s, and Eq. (27); to save space, only the upper triangular part of the covariance matrix is shown in Eq. (33). (The rest of the matrix is the adjoint of the corresponding transpose element.) In Eq. (34a), the implied spatial covariances B_s^p are defined for convenience, where p is short for each control parameter. Recall that the factors α, β, and γ are set to 1 or 0 to turn on or off geostrophic, hydrostatic, or anelastic balance in the covariance model (Sect. 4.2). The diagonal blocks of Eq. (33) are the implied covariances of the model variables. The structure of the terms in Eq. (33) reveals the way that the covariances have been modelled. For example, the implied covariance of δρ is the (4,4) block, which comprises a balanced contribution (a term that depends upon the covariance structure of δψ) and an unbalanced contribution (the term B_s^{ρ_u}, which is independent). Other variables have a similar decomposition.
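What Master_ImpliedCov does can be mimicked in a few lines: applying UU^T to a delta function at the source point extracts one column of the implied B matrix. The CVT here is a random stand-in rather than the calibrated ABC-DA transform:

```python
import numpy as np

# Sketch of the implied-covariance computation: apply U U^T to a vector
# that is zero except for a 1 at the source point, yielding one column of
# the implied B matrix.  U is a random stand-in CVT.
rng = np.random.default_rng(5)
n = 20
U = 0.3 * rng.standard_normal((n, n))      # stand-in control variable transform

source = np.zeros(n)
source[n // 2] = 1.0                       # delta function at the source point

column = U @ (U.T @ source)                # adjoint CVT, then forward CVT
B_full = U @ U.T                           # full implied B (for comparison)
```

This adjoint-then-forward application is why the implied covariances can be visualised without ever forming B explicitly in the full system.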
This equation is too complicated to illustrate the implied geographical covariance structure, so a selection of the implied covariances (Eq. 10) are computed and shown graphically in the top row of Fig. 7 (panels 7a, b, c). All panels represent the covariances between a source point (δρ at the position of the cross) and δρ (panel 7a), δv (panel 7b), and δb (panel 7c). For comparison, the bottom row shows the respective raw sample covariances found from the population of 260 forecasts used to calibrate the covariance model (found using the program Master_RawCov; see GitHubDoc §4.7). We regard the signals contained in the raw covariances as a rough (low-rank) guide to the covariances that should ideally be modelled by the CVT.
The autocovariances in panel (7a) have a region of positive correlations about the source point, with regions of negative correlation towards the east, west, above, and below. This broad structure is apparent in the raw covariance panel (7j), although the implied values are 2 to 4 times larger. The covariances between δρ and δv in panel (7b) show dipole patterns in the horizontal due to GB imposed according to Eq. (2a). This is done in the CVT via the (2,4) block of Eq. (33), which spreads out the source point with B_s^ψ R_ρ^T, evaluates the horizontal gradient, and then multiplies by αf/C. Physically, panel (7b) is the meridional wind response (an anti-cyclone) to a positive density perturbation at the cross, and these covariances mean that this pattern would appear in the v analysis increments when assimilating a single ρ measurement at the cross. The pattern in panel (7b) is evident in the raw covariance in panel (7k), but the implied version is slightly too large. Finally, the covariances between δρ and δb in panel (7c) show the hydrostatic response to the density perturbation, which is represented by the (5,4) block of the CVT. There are two contributions to the δρ-δb covariances: one involves the route δρ → δψ → δb (exploiting GB and HB), and the other involves the route δρ → δρ_u → δb (exploiting only HB). A closer inspection of Eq. (33) reveals that the (5,4) block is equal to βL_hb (proportional to the vertical derivative operator; see the line after Eq. 18) acting on the (4,4) block, which is the δρ response. Figure 7 shows that panel (7c) is indeed related to the vertical derivative of panel (7a). The structure of panel (7c) comprises complicated bands of positive and negative covariances, and again this pattern would appear in the b analysis increments when assimilating a single ρ measurement at the cross. The structure in panel (7c) is evident in the raw covariance panel (7l), suggesting that HB is followed by the forecasts to some extent, although the implied covariances are up to about 3 times larger.

4.5 Observations and observation operators
ABC-DA currently supports observation operators for direct (point) observations of u, v, w, ρ, b, a conserved tracer, √(u² + v²), and √(u² + v² + w²). These correspond to observation codes 1-8, respectively. Observations are read in via a text file, which specifies the time of each observation, its longitudinal and height position, the observation code, the observation value itself, and its error standard deviation. The format of the input file is specified in GitHubDoc §4.8. The observation operators employ a bi-linear interpolation, and observation errors are assumed to be uncorrelated.
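A sketch of the direct-observation operator, assuming a regular grid: bi-linear interpolation to the observation location, plus the horizontal wind-speed operator (observation code 7). The grid spacings and fields are illustrative; because the test fields below are linear in x and z, bi-linear interpolation recovers them exactly:

```python
import numpy as np

# Sketch of a direct-observation operator: bi-linear interpolation of a
# gridded field to an observation location, then the wind-speed operator
# sqrt(u^2 + v^2).  Grid spacings and field values are illustrative.

def bilinear(field, x, z, dx, dz):
    """Interpolate field[ix, iz] on a regular grid to the point (x, z)."""
    i, j = int(x // dx), int(z // dz)
    wx, wz = (x - i * dx) / dx, (z - j * dz) / dz
    return ((1 - wx) * (1 - wz) * field[i, j] + wx * (1 - wz) * field[i + 1, j]
            + (1 - wx) * wz * field[i, j + 1] + wx * wz * field[i + 1, j + 1])

dx, dz = 1500.0, 250.0                     # illustrative grid spacings (m)
x_grid = np.arange(10) * dx
z_grid = np.arange(8) * dz
u = x_grid[:, None] * 0.001 + z_grid[None, :] * 0.004   # linear test field
v = 2.0 + 0.0 * u                                       # constant test field

x_ob, z_ob = 3750.0, 625.0                 # observation location
u_ob = bilinear(u, x_ob, z_ob, dx, dz)
v_ob = bilinear(v, x_ob, z_ob, dx, dz)
speed_ob = np.hypot(u_ob, v_ob)            # observation code 7
```

The adjoint of this operator (needed by Var) simply distributes the observation-space sensitivity back onto the four surrounding grid points with the same weights.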
The user may extend the range of observation operators available by assigning a new observation code and adding a new branch in the observation operator and adjoint routines. Furthermore, observations are assigned a batch number. This is not used in the current code, but it may be used by the user to allow for correlated errors within each batch.

4.6 Generating data for twin experiments
The program Master_MakeBgObs (see GitHubDoc §4.8) is available to generate an observation network, to generate synthetic random observations given an observation network (for later assimilation), and to generate a random background state that is consistent with the background errors as specified in a CVT file. Further details are as follows.
-An observation network is a specification of a grid of observation times, locations, and error standard deviations, as well as which synthetic observations to make (but not the observation values themselves). For example, the user may wish to generate synthetic observations of s = √(u² + v²) in an N_x^s × N_z^s grid between x_1^s and x_2^s and between z_1^s and z_2^s, and synthetic observations of ρ in an N_x^ρ × N_z^ρ grid between x_1^ρ and x_2^ρ and between z_1^ρ and z_2^ρ. Running Master_MakeBgObs with the option to generate this observation network (specified in an input file) will produce a text file containing entries for each individual observation (where it should be made, when it should be made, which quantity should be measured, and at what precision; see GitHubDoc §4.8). This file could be produced manually, but this software option exists to make the task less tedious. Once produced, this file can be edited by removing, adding, or altering individual observation points.

Figure 7. A selection of covariances (panel columns) found in different ways (rows). All covariances are between the source point (δρ at the cross) and δρ (left column), δv (middle column), and δb (right column) at the field locations. The rows correspond to implied covariances when GB and VR are used in the CVT (GB+VR+, top row), implied covariances when GB is not used (GB−, second row), implied covariances when GB is used and VR is not used (GB+VR−, third row), and covariances calculated directly from the population used to calibrate each CVT (bottom row). Red (blue) indicates positive (negative) covariances.
-Pseudo observations are generated according to the output from the above procedure and an ABC model initial condition (truth) file. Running Master_MakeBgObs in this mode will run a forecast from the initial conditions, generate "true observations" from the forecast, and will then add observation noise to each observation. The output from this stage is a file of synthetic observations which can then be assimilated (Sect. 4.7). The true observations are also output within this data file, which can be used later for error analysis. The user may wish to generate synthetic observations for an identical twin experiment by using the same model parameters (i.e. A, B, C) as will be used in the DA or may wish to run a non-identical twin experiment with different parameters. As with the procedure for generating the observation network, this file can be edited or instead created manually from scratch. For example, the user may want to add, remove, or change individual observations in order to investigate the effect on the assimilation.
-A valid background state may be generated with the program Master_MakeBgObs. The truth state (conventionally that used to generate the synthetic observations) is perturbed with a random state δx = Uδχ (Eq. 8), where δχ is drawn from N (0, I). The information used to describe the CVT, U, is found from a specified CVT file.
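The three generation tasks can be sketched together. H, U, the truth state, and all dimensions below are stand-ins; only the structure (truth → true observations → noisy observations, and x_b = x_t + Uδχ with δχ drawn from N(0, I)) follows the description above:

```python
import numpy as np

# Sketch of twin-experiment data generation: true observations come from
# applying the observation operator to a truth state, synthetic observations
# add Gaussian noise, and the background perturbs the truth consistently
# with B = U U^T.  H, U, and the truth are random stand-ins.
rng = np.random.default_rng(6)
n, n_obs = 30, 12

x_truth = rng.standard_normal(n)                   # stand-in truth state
H = np.zeros((n_obs, n))
H[np.arange(n_obs), rng.choice(n, n_obs, replace=False)] = 1.0  # point obs

sigma_o = 0.1                                      # obs error std dev
y_true = H @ x_truth                               # "true observations"
y = y_true + sigma_o * rng.standard_normal(n_obs)  # synthetic observations

U = 0.2 * rng.standard_normal((n, n))              # stand-in CVT, B = U U^T
delta_chi = rng.standard_normal(n)                 # N(0, I) control vector
x_b = x_truth + U @ delta_chi                      # background consistent with B
```

Keeping y_true alongside y mirrors the system's output of the true observations in the data file for later error analysis.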

4.7 Data assimilation
The program that performs a DA cycle is called Master_Assimilate; see GitHubDoc §4.9. The inputs to this program are (i) a background state (e.g. as generated in Sect. 4.6), (ii) a set of observations (e.g. as generated in Sect. 4.6), and (iii) a description of the B matrix (in a CVT file as found using the calibration procedure in Sect. 4.3). The assimilation methods currently available are 3DVar and 3DFGAT, although the code is flexible enough to allow other methods (e.g. 4DVar, EnVar; Bannister, 2017) to be developed.
Figure 8. The outer and inner loops of the variational 3DFGAT procedure. The main inputs into the algorithm are the background state, x_b, and the observations, y. The algorithm inside the pink box is the inner loop. The small red arrows represent switches in the data flow. The initial positions (shown) are changed after the first iteration. This algorithm incorporates cycling, where the forecast made (on the right-hand side) is fed back (on the left-hand side) for the next assimilation window.

The algorithm for 3DFGAT (the preferred option in the current version) is shown in Fig. 8. The reference state, x_r, is set to the background, x_b, which is then used to compute the innovations, d(t), at their appropriate times within the DA window. The control space perturbation, δχ, is set to zero. d(t) and δχ enter the inner-loop minimisation (the pink box), where the residuals, r(t), are computed. These are used in the calculation of ∇_δχ J_o, where J_o is the observation term of the cost function. This is added to the background contribution, ∇_δχ J_b, giving the total cost function gradient. This is used with a descent algorithm (in our case a conjugate gradient algorithm), which provides an increment to δχ. The switch is flipped (red arrow inside the pink box) so that the inner-loop procedure is repeated with the updated δχ. Once converged, the reference state is incremented by Uδχ. This ends the first outer-loop iteration. If further outer-loop iterations are required, the switch is flipped (red arrow outside of the pink box) so that the incremented x_r is fed into the innovation calculation. The x_r at the final outer loop is the analysis, x_a. In addition to the analysis and the analysis increment, there are many diagnostic outputs from the DA, as detailed in GitHubDoc §4.9.
This includes the reference states at each outer loop, the gradient ∇_δx J_o, the values of the cost function with iteration, and versions of the observations file with extra information. The last file is output for each outer loop and includes the model observation values computed from x_r (the x_r corresponding to the last outer loop is x_a), the innovation, d(t), the residual, r(t), and the gradient of the observation term in the cost function with respect to the model observation, R_t^{-1} r(t). These diagnostics can be plotted using the Python program specified in GitHubDoc §4.9. A selection of example diagnostics is shown in Sect. 5 in connection with the study on the effect of the GB conditions in the CVT.
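The inner loop can be sketched in control space. For linear(ised) observation operators, minimising J(δχ) = ½ δχ^T δχ + ½ (HUδχ − d)^T R^{-1} (HUδχ − d) is equivalent to solving the linear system (I + U^T H^T R^{-1} H U) δχ = U^T H^T R^{-1} d, done below with a plain conjugate-gradient iteration; all operators are small random stand-ins:

```python
import numpy as np

# Sketch of one inner loop of the incremental algorithm in control space.
# U, H, R, and d are small random stand-ins for the CVT, the linearised
# observation operator, the obs-error covariance, and the innovations.
rng = np.random.default_rng(7)
n, n_obs = 20, 8
U = 0.3 * rng.standard_normal((n, n))      # stand-in CVT
H = rng.standard_normal((n_obs, n))        # stand-in obs operator
Rinv = np.eye(n_obs) / 0.1**2              # uncorrelated obs errors
d = rng.standard_normal(n_obs)             # innovations y - H(x_r)

A = np.eye(n) + U.T @ H.T @ Rinv @ H @ U   # Hessian of J in control space
b = U.T @ H.T @ Rinv @ d

dchi = np.zeros(n)                         # first guess: zero increment
r = b - A @ dchi
p = r.copy()
for _ in range(50):                        # conjugate-gradient descent
    Ap = A @ p
    a = (r @ r) / (p @ Ap)
    dchi += a * p
    r_new = r - a * Ap
    if np.linalg.norm(r_new) < 1e-10:
        break
    p = r_new + ((r_new @ r_new) / (r @ r)) * p
    r = r_new

dx = U @ dchi                              # analysis increment in model space
```

In the full 3DFGAT this solve is wrapped in the outer loop of Fig. 8, with the innovations recomputed from the updated reference state.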

4.8 Cycling
A (provided) Bash script controls the schedule of tasks to perform DA cycling; see GitHubDoc §5. The background state for the first cycle is made using the procedure in Sect. 4.6, but subsequent backgrounds are found from forecasts originating from each analysis. The script also propagates the true state from one cycle to the next for generation of the synthetic observations and for subsequent monitoring.
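The cycling schedule reduces to a simple loop, sketched here with stand-in model, gain, and observation steps (the real script instead invokes Master_Assimilate and the ABC model executables each cycle):

```python
import numpy as np

# Sketch of the cycling schedule: each cycle assimilates observations into
# the background, forecasts the analysis to give the next background, and
# propagates the truth for the next cycle's synthetic observations.
# The model M, the gain K, and the obs step are all stand-ins.
rng = np.random.default_rng(8)
n, n_cycles = 10, 4
M = 0.95 * np.eye(n)                           # stand-in (stable) forecast model

x_truth = rng.standard_normal(n)
x_b = x_truth + 0.5 * rng.standard_normal(n)   # initial background

for cycle in range(n_cycles):
    y = x_truth + 0.1 * rng.standard_normal(n) # synthetic obs of full state
    K = 0.5 * np.eye(n)                        # stand-in gain matrix
    x_a = x_b + K @ (y - x_b)                  # "assimilation" step
    x_b = M @ x_a                              # forecast -> next background
    x_truth = M @ x_truth                      # propagate truth for next obs
```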
Summary statistics of cycling experiments and comparative diagnostics between pairs of cycling experiments can be computed and plotted using Python programs provided. The study in Sect. 5 shows many of these diagnostics, which may be regarded as examples.

5 An investigation of balance in modelling the B matrix
Most Var systems began their lives serving global models on relatively coarse grids. In mid- and high-latitude regions, and when only relatively large scales are represented, a significant portion of the flow fields are related via geostrophic and hydrostatic balance (GB and HB). Most Var systems use this property to imply relationships between variables, which impact the background error covariance statistics, and a similar approach is often followed in systems that use much finer grids (as is done in Sect. 4.2 for ABC-DA). Since GB becomes a less important constraint at smaller scales (Berre, 2000; Bannister et al., 2011; Petrie et al., 2017), a research question arises concerning whether using such a constraint in the B model is necessary, or indeed harmful, in convective-scale DA.
Some regional Var systems, such as those for the ALADIN (Aire Limitée Adaptation Dynamique Développement InterNational; Berre, 2000) and WRF (Weather Research and Forecasting model; Chen et al., 2013) models, couple mass and wind fields via a linear regression operator that is trained from sample data. The degree of balance represented by these operators will be appropriate to the training sample. Other systems, such as the current configuration of ABC-DA (Sect. 4.2), the Met Office system (Ingleby, 2001; Bannister, 2008b), and HIRLAM (HIgh Resolution Limited Area Model; Gustafsson et al., 2001), impose analytic GB operators. Analytic relationships are "cleaner" and more physical than regressions, but their current forms are not so adaptive to the underlying conditions. Geophysical balances are sensitive not only to scale and latitude but also to precipitation, where heavy precipitation leads to a weakening of the balance between mass and wind (e.g. Caron and Fillion, 2010; Chen et al., 2016). Few, if any, studies have examined the effect of switching off the GB constraint completely in the B matrix in convective-scale systems. This is one way to affect the implied covariances (α = 1 → 0 in Eq. 33). Other suggestions have been made to change the nature of the wind coupling in B at convective scale. Sun et al. (2016), for example, compared using ψ,χ-based uncorrelated control parameters with u,v-based ones (with the associated change of the CVT). Their comparison of these two alternative sets of wind control parameters was not straightforward, as their ψ,χ-based set used a linear regression-type balance with mass variables while their u,v-based set did not. This did not allow for a clean separation of the effect of the change of parameters from the effect of turning off the balance constraint.

Description of the DA experiments
In this section we perform three DA experiments, each with its own configuration as described below.
- Experiment GB+VR+ mirrors the operational configuration of the Met Office DA system. In this configuration, δρ is decomposed into a GB part, δρ_b, and an unbalanced part, δρ_u (α = 1 in Eqs. 27 and 33), and the VR operator, R_ρ, is used to modify the calculation of the balanced scaled density perturbation increments from the streamfunction increment. δb is decomposed into an HB part, δb_b, and an unbalanced part, δb_u (β = 1).
δw is analysed as though it were completely unbalanced (γ = 0). The ordering of the vertical and horizontal transforms is the CTO, and the non-symmetric vertical transform is used. As a reminder, the implied covariances for GB+VR+ are shown in the top row of Fig. 7.
- Experiment GB− differs from GB+VR+ in that there is no balanced component of δρ (α = 0, β = 1, and γ = 0). Since there is no GB operator, the VR step is irrelevant. The implied covariances for GB− are shown in the second row of Fig. 7. When calibrated with this configuration, the δρ−δρ covariances (panel d) are almost the same as for GB+VR+ (see the (4,4)th block of Eq. 33 to see how α affects the δρ−δρ covariances). The δρ−δv covariances (panel e) are zero, as expected, as GB was the mechanism used to couple these two variables in the CVT (see the (2,4)th block of Eq. 33). As δρ_b = 0 in this experiment, δρ_u = δρ, which is consistent with the variances of δρ_u (red line in Fig. 3c for GB−) being larger than those of the blue line (GB+VR+). The δρ−δb covariances in Fig. 7f are similar to those for GB+VR+ (panel c), showing that the effect of GB on these statistics is small.
- Experiment GB+VR− differs from GB+VR+ in that the GB operator is retained but the VR step is not used. The implied covariances for GB+VR− are shown in the third row of Fig. 7. When calibrated with this configuration, the δρ−δρ covariances (panel g) are slightly smaller than for GB+VR+. The effect on δv (panel h) is noticeably smaller than in GB+VR+. In GB+VR+, R_ρ has the effect of increasing the magnitude of the balanced increment (the regressed balanced density is larger in magnitude than δρ_b on average) and thus decreasing the variance of δρ_u. This is seen with the blue line of Fig. 3c (the variance of δρ_u in GB+VR+) being smaller than the green line (GB+VR−), and it explains why the geostrophic response in Fig. 7b is larger than in Fig. 7h.

For each experiment, the initial background is found by perturbing the initial truth in a way that is consistent with the B matrix of each experiment (see the last bullet point in Sect. 4.6). All other aspects of the assimilation are identical in all three experiments: 3DFGAT with a 1 h window is used for 30 DA cycles. The observations, which are generated from the truth run with an error standard deviation of 0.0015, are identical in all cases. There are 2520 observations of ρ in each cycle, spread over a 20 × 18 grid of points broadly covering the domain and over seven times (t = 0, 600, 1200, 1800, 2400, 3000, 3600 s). Only one outer loop of 3DFGAT is executed per cycle, with 100 inner-loop iterations. This comparison is intended to be only a brief demonstration of the ABC-DA system, and we leave a more in-depth study to later papers.
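The role of α in these experiments can be illustrated with a toy control variable transform (the operators below are random stand-ins, not the ABC-DA GB operator): setting α = 0 removes the off-diagonal coupling block in U, and with it the implied δρ−δv cross-covariances.

```python
import numpy as np

def implied_B(alpha, n=4):
    """Toy CVT: model increments [delta_v; delta_rho], control
    parameters [delta_psi; delta_rho_u] (mutually uncorrelated).
    The balanced part of delta_rho is alpha * G(delta_psi), so the
    implied covariance is B = U U^T with coupling block alpha * G."""
    rng = np.random.default_rng(0)
    G = rng.standard_normal((n, n)) * 0.1   # stand-in balance operator
    U = np.block([[np.eye(n),  np.zeros((n, n))],
                  [alpha * G,  np.eye(n)]])
    return U @ U.T

n = 4
cross_on  = implied_B(1.0, n)[:n, n:]   # delta_v / delta_rho block, GB on
cross_off = implied_B(0.0, n)[:n, n:]   # same block with alpha = 0
```

With α = 1 the off-diagonal block of B = UU^T is non-zero (a δρ observation would increment δv); with α = 0 it vanishes, mirroring panel (e) of Fig. 7.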

Results for experiment GB+VR+
Diagnostics collected over all 30 cycles for each variable in this experiment are shown in Fig. 11. The left panels show how the root-mean-squared (rms) values of the fields vary in time for three versions of the fields: the truth (solid lines), the analysed fields (dotted), and a free forecast starting from the original background (dashed). For all fields except w, the assimilation overestimates the rms field values, although the assimilation does often follow the oscillations in the true trajectory. The overestimation in the assimilation is nearly always worse than for the free background for v, is sometimes worse and sometimes better for u and b′, but is nearly always better for ρ (recall that only observations of ρ are assimilated). The right panels show the rms errors (RMSEs) of the analysed (dotted) and free (dashed) fields, which we do know in such twin experiments. In the first few cycles the analyses are improved compared to the free run in all variables, but there comes a time when the assimilation becomes worse than the free run for most fields (especially v and b′). The assimilation stays better than the free run for ρ over all 30 cycles, but the background error covariances imposed between ρ−v and ρ−b′ (panels b and c of Fig. 7) do not appear to be of benefit to v and b′, which is surprising. Note that there are no assimilation jumps in the u field, because it is neither an observed variable nor correlated to ρ in the B matrix.
At t = 0, the two trajectories of the same colour in each right panel represent background (dashed) and analysis (dotted) errors and so can be compared. For all variables except u and w, these two values differ at t = 0, with smaller analysis errors than background errors. This indicates that the DA does add improvement early in the cycling. Even though the analyses are closer to the observations than the backgrounds, the errors in the assimilated variables do grow with time, and most of the analysis increments (the jumps at the yellow cycle boundaries) act to increase the RMSE throughout the experiment. There are a number of possible reasons for this degradation, e.g. a poor B matrix formulation for this model (including the use of balance constraints in the CVT), an inappropriate population of forecasts used to calibrate the B matrix, the use of 3DFGAT rather than 4DVar, or the particular choice of observation network for this convective-scale system. Given that the assimilation performs well at t = 0 (where the error in the initial background is consistent with the B matrix formulation by design), it could be that, as the cycling progresses, the nature of the actual background errors becomes very different to that assumed by the B matrix. Confirming this hypothesis would require recalibrating the B matrix, which is outside the scope of the present paper. We focus now on the two remaining experiments described in Sect. 5.1 to see if improvements can be made by adjusting parts of B.

Results for experiment GB−
Diagnostics for the GB− experiment are shown in Fig. 12. Comparing the rms values (left) with the GB+VR+ experiment, the analysis rms values become further from the truth for u, v, and b′. Note that although the true trajectories are exactly the same as those of GB+VR+, the free trajectories are different, as the initial background is consistent with a different B matrix. Again comparing to GB+VR+, the analysis RMSEs (right panels) are generally worse for GB−, but the assimilation is still able to control the ρ field. This suggests that the B matrix for GB− is still not consistent with errors in the 1 h background forecasts over the cycles. Note that there are no assimilation jumps in either the u or v fields, because they are not correlated to ρ in the B matrix.

Results for experiment GB+VR−
VR is designed to accompany the GB operator, and the configuration that included these two together in the CVT may have contributed to the relatively poor performance in Sect. 5.2. Here we run the experiment that includes the GB step in the CVT but not the VR. Diagnostics for the GB+VR− experiment are shown in Fig. 13. Again, the true state is the same as in the previous two experiments, but the free background forecast is different due to the different B matrix from which the initial background state is sampled. The rms values (left panels) of the assimilation are generally closer to the truth than in GB+VR+ (apart from u, where they are about the same). The RMSE values (right panels) are also smaller as a fraction of the error in the free run (or at least are smaller for a longer time), although the errors in v do still eventually fail to be controlled by the DA.

Figure 11. The rms fields of (a) u, v, w, (b) ρ, and (c) b′ for the control experiment, GB+VR+. The solid lines are for the truth trajectory, the dotted lines are for the assimilation trajectory, and the dashed lines are for the free trajectories starting from the initial background. The vertical yellow lines mark the times of the cycle boundaries, where the assimilation trajectory is the analysis at the start of each cycle and the background (for the next cycle) at the end.
The absence of VR does have the effect of weakening the effective strength of the GB between δρ and δv; compare panels (b) and (h) of Fig. 7. This is because, overall, R_ρ enhances the balanced scaled density (and so the balanced scaled density is diminished when removing it). It is possible, then, that it is not the adoption of GB that results in the relatively poor performance of the GB+VR+ assimilation in this system but the use of the VR.

Summary
This article is a documentation of ABC-DA, a Var system for use with the convective-scale ABC model (Petrie et al., 2017) to efficiently test new ideas for convective-scale DA. The DA currently has incremental 3DVar and 3DFGAT implemented, although expansion to 4DVar, ensemble-variational, and hybrid methods is possible. The DA system has an expandable observation operator, which is currently configured to assimilate observations of all model variables, and 2D (horizontal) and 3D wind speeds. The choice of observation network is not fixed, and it is possible to specify arbitrary observation networks in space and time. The B matrix is modelled in a very compact form by a control variable transform (CVT), which is implemented with a range of options. The current multivariate options include the option to use geostrophic balance (GB) to couple δρ and δv, hydrostatic balance (HB) to couple δρ and δb, and anelastic balance to couple δu and δw. All of these balances are diagnostic relationships using analytical (rather than statistical) operators. A vertical regression (VR) step is also included (which works with the geostrophic step) to mirror the operational set-up of the Met Office's DA system. While the model increments are δu, δv, δw, δρ, and δb, the control parameters are δψ, δχ, δρ_u, δb_u, and δw_u, which are assumed to be mutually uncorrelated. The transforms that model the spatial covariances for each control parameter are formed of vertical and horizontal parts, which model covariances in these respective directions. There are further options that control how each of these spatial transforms works, including the order in which they are used.
ABC-DA comprises component suites to perform the following tasks:

- run a pure model forecast of any length starting from specified initial conditions;
- investigate linear properties of the equations of motion;
- generate an initial ensemble of states (e.g. for calibration of the B matrix);
- calibrate the parameters that describe components of the CVT to produce a CVT file (this specifies the B matrix for use in the DA program);
- test the components of the system (e.g. adjoint tests of CVT operators);
- generate "raw" covariance maps using a specified population of states;
- generate implied covariance maps from a specified CVT file;
- generate sample observations (consistent with a specified R matrix and truth) and a background state (consistent with a specified B matrix and truth);
- perform a cycle of Var with a specified background state, observations, and CVT file;
- perform a run of cycled DA/forecast steps where the analysis of one cycle is used to make a background forecast for the next.
The number-crunching code is written in Fortran 90/95, the scripts are written in Bash, and the example plotting routines are written in Python 2. A selection of diagnostics of the CVT is provided in this paper, and a brief study is made to illustrate the system's performance when GB (and VR) is switched on and off in the CVT. GB is used to model the background error coupling between δρ and δv, which is zero when GB is not used, and it has a classical dipole pattern (δv response to a point perturbation in δρ ) when it is used. Exploiting this balance constraint (or not) is investigated as there is a question of whether it has validity in the description of convective-scale background error covariances. VR has the effect of strengthening this geostrophic relationship, because it increases (decreases) the variance of the balanced (unbalanced) contributions to δρ .
Experiments are performed using a set of three 3DFGAT twin experiments by assimilating observations over a 30 h forecast taken as the common truth. The three experiments are (i) using GB and VR (GB+VR+), (ii) not using GB (GB−), and (iii) using GB but not VR (GB+VR−). Each experiment has a different CVT (i.e. a different U) and hence a different implied background error covariance matrix, B_ic = UU^T. An initial background state is constructed for each experiment by perturbing the truth at t = 0 with Uδχ, where δχ is a sample from N(0, I) (equivalent in model space to choosing a sample from N(0, B_ic)). The DA/forecast is cycled hourly over 30 cycles with 2520 observations of ρ per cycle. The results are summarised as follows.
-The GB+VR+ experiment does not result in a systematic reduction of errors in the model variables. In fact, for some variables (namely v and b ), it leads to an increase in error of the analysis above the error level of the free background.
-The GB− experiment represents a degradation in performance of the unobserved variables in that the errors are above the error level of the free background for more time than in GB+VR+.
-The GB+VR− experiment represents the configuration that is the best of those tested. The errors in all variables except v are kept below the error level of the free background.
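The construction of the initial backgrounds used above, perturbing the truth with Uδχ where δχ ~ N(0, I), can be sketched as follows (with a small arbitrary U standing in for the real CVT). The sample covariance of many such perturbations converges towards B_ic = UU^T, which is why the draw is equivalent to sampling from N(0, B_ic) in model space.

```python
import numpy as np

def sample_background(x_truth, U, rng):
    """Draw a background by perturbing the truth with U @ dchi,
    dchi ~ N(0, I): a model-space sample from N(x_truth, B_ic)."""
    dchi = rng.standard_normal(U.shape[1])
    return x_truth + U @ dchi

# verify the implied statistics with a small stand-in U
rng = np.random.default_rng(1)
U = np.array([[2.0, 0.0],
              [1.0, 1.0]])
B_ic = U @ U.T
dchi = rng.standard_normal((2, 100000))
perts = U @ dchi                          # 100000 perturbations at once
B_est = perts @ perts.T / perts.shape[1]  # sample covariance -> B_ic
```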
It is surprising that errors in one of the variables in particular (v) cannot be properly corrected in these experiments. It is postulated that this outcome is not due to an inappropriate B matrix (for instance in Figs. 11 and 13) but due to the use of 3DFGAT. Separate tests with 3DVar and with all observations made at the analysis times, rather than spread throughout the analysis window (not shown), suggest that observations ofρ can reduce the error in v. We have learned from these experiments that the nature of the B matrix can have a significant impact on the performance of the DA. Application of GB in such a convective-scale system has a complicated effect: while experiment GB+VR− is better than experiment GB− overall, the application of GB does systematically increase errors in the v field over the 30 cycles, and "too much" GB (as in experiment GB+VR+) enhances this effect. Similar studies may be done by turning on and off hydrostatic and anelastic balances.
These results highlight a difficulty of the convective-scale DA problem, and they raise more questions about how this system could be improved and how it could contribute to advances in DA. A possible way forward might be to keep the current formulation of the CVT but to calibrate it with data from a different source, either as an ensemble, as differences between pairs of forecasts valid at the same time (the popular National Meteorological Center method; Parrish and Derber, 1992; Berre et al., 2006), or from a single long forecast (the so-called "Canadian quick" method; Polavarapu et al., 2005). Other ways forward might be to replace the traditional geophysical balances with a new balance relationship that may be more suitable at convective scales, such as the diagnostic equation for non-hydrostatic pressure outlined in Sect. 4.3 of Pielke (2002). The CVT may also be adapted so that the DA system acts as an ensemble-variational (EnVar) system, where the analysis increments are expressed as linear combinations of ensemble perturbations (e.g. Lorenc, 2003; Buehner, 2005), as a hybrid-EnVar system (a hybrid of the existing B matrix and the EnVar; e.g. Wang et al., 2007), or as part of a hybrid gain formulation (e.g. Penny, 2014). These approaches imply flow-dependent background error covariances, but each would require a separate ensemble DA system parallel to the variational one and would also require localisation techniques to mitigate the rank deficiency of the ensemble. These are entirely possible developments given appropriate resources. Studying the B matrix is the main application that the author has in mind, although the system could equally well be used to study impacts of new high-resolution observations.

Appendix A: Number of pieces of information to make up a covariance model
This appendix shows the number of pieces of information that specify the covariance model used in this paper. Since only analytic balance relationships are used (rather than statistically derived ones, which would themselves require calibration from data), only components of the spatial transform (U_s) and the vertical regression (R_ρ) require calibration. Referring to Eq. (31) or Eq. (32), this includes determination of the standard deviation matrix (Σ) and components of the vertical (U_v = F_v Λ_v^{1/2}) and horizontal (U_h = F_h Λ_h^{1/2}) transforms. The vertical regression, R_ρ = C_{δρ δρ_b} C_{δρ_b δρ_b}^{-1}, requires determination of a non-symmetric and a symmetric correlation matrix. Table A1 shows this information, concluding that 78 030 pieces of information are needed. In this paper, a super-ensemble of 260 members is used to estimate this background error covariance model. This provides 260 × 5 × 360 × 60 = 28 080 000 pieces of information, vastly more than the number of elements needed for the covariance model. Compare this to the n(1 + n)/2 elements needed to determine a full background error covariance matrix, where n = 5 × 360 × 60 is the number of elements in the model state vector. This evaluates to 5 832 054 000, requiring at least 54 000 ensemble members to estimate these elements. While this is possible, it is not practical at present.

Table A1. The numbers of components of the spatial transforms that need to be estimated from data. The numbers of elements in the third column are for the standard ABC system of 360 longitudes and 60 levels. Note that for Λ_h^{1/2}, 180 wavenumbers are specified rather than 360, as the negative wavenumbers are assumed to have the same variances as the positive ones.

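The counting in this appendix can be reproduced with a few lines of arithmetic (grid sizes taken from the text):

```python
# Standard ABC grid: 5 variables, 360 longitudes, 60 levels.
n_fields, n_x, n_z = 5, 360, 60
n = n_fields * n_x * n_z           # length of the model state vector

full_B = n * (n + 1) // 2          # elements of a full symmetric B matrix
members = 260                      # super-ensemble size used in this paper
info = members * n                 # pieces of information the ensemble provides

members_needed = full_B // n       # members needed (at the very least) for full B
```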

Appendix B: Vertical regression of balanced scaled density
The direct calculation of the geostrophically balanced scaled density from Eq. (2a) (in practice found via the streamfunction, as in Eq. 15) leads to difficulties with regard to the vertical continuity of the resulting δρ_b(x, z, t) (Lorenc et al., 2000). If a function that is arbitrarily chosen but independent of x, i.e. p(z, t), is added to δρ_b, then δρ_b + p is also a solution of Eq. (2a). This potentially allows damaging jumps with short vertical scales to appear. According to geostrophic adjustment theory, the streamfunction (and hence the scaled density computed from it) is not balanced at short vertical scales (e.g. Cullen, 2003), and so the output from Eq. (15) is smoothed in the vertical. This is done by modifying the balanced scaled density with a regression operator R_ρ.
From here on in this appendix, we drop the ρ subscript on R_ρ for brevity, as subscripts will be used to indicate matrix elements. We also, for brevity, drop the tilde, prime, and δ notations on the scaled density fields, remembering that the fields are implicitly perturbations. One strategy for R is to introduce vertical length scales similar to those in the forecasts of total scaled density. We therefore design R to minimise the following penalty:

J = (1/N) Σ_{i=1}^{N} |R ρ_b^(i) − ρ^(i)|²,   (B1)

where ρ_b^(i) and ρ^(i) are the ith members of an N-member population of balanced and total scaled density perturbations, respectively. To find the R that minimises J, we solve the equation ∂J/∂R = 0, which is a matrix equation, with element (a, b) being ∂J/∂R_ab = 0.

It is convenient to expand the elements of Eq. (B1):

J = (1/N) Σ_{i=1}^{N} Σ_α ( Σ_β R_αβ [ρ_b^(i)]_β − [ρ^(i)]_α )²,

where Greek subscripts indicate matrix or vector elements. Differentiating with respect to R_ab gives

∂J/∂R_ab = (2/N) Σ_{i=1}^{N} Σ_α ( Σ_β R_αβ [ρ_b^(i)]_β − [ρ^(i)]_α ) δ_αa [ρ_b^(i)]_b
         = (2/N) Σ_{i=1}^{N} ( Σ_β R_aβ [ρ_b^(i)]_β − [ρ^(i)]_a ) [ρ_b^(i)]_b,

where δ_αa in the second line is the Kronecker delta function. Putting this result back into matrix notation gives

∂J/∂R = (2/N) Σ_{i=1}^{N} ( R ρ_b^(i) − ρ^(i) ) [ρ_b^(i)]^T.

Setting this to zero to solve for the optimal regression matrix gives

R (1/N) Σ_{i=1}^{N} ρ_b^(i) [ρ_b^(i)]^T = (1/N) Σ_{i=1}^{N} ρ^(i) [ρ_b^(i)]^T.

The outer-product matrix (summed and divided by N) on the left-hand side is identified as the covariance matrix of the balanced scaled density perturbations with themselves (C_{ρ_b ρ_b}), and the corresponding term on the right-hand side is the covariance matrix between the total and the balanced scaled density perturbations (C_{ρ ρ_b}). The optimal regression matrix is then

R = C_{ρ ρ_b} C_{ρ_b ρ_b}^{-1}.   (B2)
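The optimal regression can be checked numerically: when the two populations are related by an exactly linear operator, the sample-covariance formula recovers that operator (this is a sketch of the calculation, not ABC-DA's Fortran implementation):

```python
import numpy as np

def regression_operator(rho_b, rho):
    """R = C_{rho rho_b} @ C_{rho_b rho_b}^{-1}, where each column of
    rho_b / rho is one member of an N-member population of balanced /
    total scaled density perturbations."""
    N = rho_b.shape[1]
    C_bb = rho_b @ rho_b.T / N   # C_{rho_b rho_b}
    C_tb = rho @ rho_b.T / N     # C_{rho rho_b}
    return C_tb @ np.linalg.inv(C_bb)

rng = np.random.default_rng(2)
nz, N = 5, 50                    # levels and population size (toy values)
R_true = rng.standard_normal((nz, nz))
rho_b = rng.standard_normal((nz, N))
rho = R_true @ rho_b             # population with an exact linear relation
R_est = regression_operator(rho_b, rho)
```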

Appendix C: Estimation of characteristic length scales
There is a well-defined length scale associated with a plane wave of the form exp(iκx) (κ is the wavenumber, x is the position), which is the wavelength l = 2π/κ. For a periodic domain of length L, the allowed wavenumbers are κ = 2kπ/L (k is the wavenumber index, an integer), and so the wavelength is L/k. There is no similar analytical relationship between the vertical mode index, ν, and the vertical length scale, l_v, but it is possible to estimate the characteristic length scale as follows.
Suppose that f(z) is an oscillatory function, e.g. a particular vertical eigenvector (column of F_v) of one of the vertical covariance matrices introduced in the main text. The Fourier transform of f(z) is taken and the wavenumber index of the dominant spectral component, m, is found. This translates to a vertical length scale of l_v = L_v/m. This is the procedure used to make Fig. 6b. We note that the Fourier transform assumes periodicity, which is not a property of the fields in the vertical direction, but since we are after only a rough estimate of l_v, we assume that this inconsistency does not have a major impact.
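This length-scale estimate can be sketched as follows; for a mode with exactly m oscillations over the domain depth, the procedure returns l_v = L_v/m:

```python
import numpy as np

def vertical_length_scale(f, L_v):
    """Rough characteristic length scale of an oscillatory vertical
    mode f(z) on a domain of depth L_v: take the discrete Fourier
    transform (periodicity is assumed, as noted in the text), find
    the dominant wavenumber index m > 0, and return L_v / m."""
    spec = np.abs(np.fft.rfft(f))
    spec[0] = 0.0                 # ignore the mean (m = 0) component
    m = int(np.argmax(spec))
    return L_v / m

# a mode with three full oscillations over the domain depth
L_v = 12.0
z = np.linspace(0.0, L_v, 64, endpoint=False)
f = np.sin(2 * np.pi * 3 * z / L_v)
```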
Code and data availability. The model and data assimilation system are written in Fortran 90/95, and the plotting code is written in Python. This software is open source and freely available in a GitHub repository (Bannister, 2019): https://doi.org/10.5281/zenodo.3531926. A sample of initial condition data is also freely available at https://doi.org/10.5281/zenodo.3946359.
Competing interests. The author declares that there is no conflict of interest.