Recent research in data assimilation has led to the introduction of the parametric Kalman filter (PKF): an implementation of the Kalman filter in which the covariance matrices are approximated by a parameterized covariance model. In the PKF, the dynamics of the covariance during the forecast step rely on the prediction of the covariance parameters. Hence, the design of the parameter dynamics is crucial, yet deriving it by hand can be tedious. This contribution introduces a Python package, SymPKF, able to compute PKF dynamics for univariate statistics when the covariance model is parameterized by the variance and the local anisotropy of the correlations. The ability of SymPKF to produce the PKF dynamics is shown on a nonlinear diffusive advection (the Burgers equation) over a 1D domain and on the linear advection over a 2D domain. The computation of the PKF dynamics is performed at a symbolic level, but an automatic code generator is also introduced to perform numerical simulations. A final multivariate example illustrates the potential of SymPKF to go beyond the univariate case.

The Kalman filter (KF)

While the equations of the KF are simple linear algebra, the large dimension of the linear spaces encountered in the
realm of data assimilation makes the KF impossible to handle, and this is particularly true for the forecast step.
This limitation has motivated approximations of the covariance matrix that make the KF tractable. For instance, in the
ensemble method

One of the major limitations of the PKF is the design of the parameter evolution equations. Although not difficult from a mathematical point of view, this step requires the calculation of many terms, which is tedious and error-prone when done by hand. To facilitate the derivation of the parametric dynamics and to certify the correctness of the resulting system, a symbolic derivation of the dynamics would be welcome.

The goal of the package SymPKF 1.0

The paper is organized as follows. The next section provides the background
on data assimilation and introduces the PKF. Section

Dynamics encountered in geosciences are given as a system of partial differential equations (PDEs):

Because of the sparsity and the error of the observations,
the forecast

We now detail how the error covariance matrix evolves during the forecast by considering the formalism of the second-order nonlinear Kalman filter.

A second-order nonlinear Kalman filter (KF2) is a filter that extends the Kalman filter (KF)
to nonlinear situations in which the error covariance matrix evolves tangent-linearly
along the trajectory of the mean state and the dynamics of this mean are governed
by the fluctuation–mean interacting dynamics

Because of the uncertainty in the initial condition, the state

To perform the Reynolds averaging of Eq. (

By setting

Note that the tangent-linear dynamics along the ensemble-averaged dynamics in Eq. (

Now it is possible to detail the dynamics of the error covariance from the
dynamics of the error, which tangent-linearly evolve along the mean state

In the discretized form, the dynamics of the error in Eq. (

Gathering the dynamics of the ensemble mean given by the fluctuation–mean interaction in
Eq. (

Similarly to the KF, the principal limitation of the KF2 is the numerical
cost associated with the covariance dynamics in Eq. (8):
living in a discrete world, the numerical cost of Eq. (8)
dramatically increases with the size of the problem.
As an example, for the dynamics of a simple scalar field discretized
with

We now introduce the parametric approximation of covariance matrices, which aims to reduce the cost of the covariance dynamics in Eq. (8).

The parametric formulation of covariance evolution reads as follows.
If

Hence, starting from the initial condition

In practice, the parametric covariance models considered in the PKF are such that
the number of parameters in

The cost of the PKF can be compared with other low-rank methods such as
the reduced-rank Kalman filter

But the frugality of the covariance model is not the only criterion. For instance,
the first variational data assimilation systems considered a covariance model
based on the diagonal assumption in spectral space

Hence, a covariance model adapted for the PKF should be able
to represent realistic correlations and be such that the dynamics of
the parameters can be computed, e.g., a covariance model defined by parameters in grid points.
To do so, we now focus on the PKF applied to a particular family of covariance models,
whose parameters are defined in grid points by the variance and the anisotropy fields:

This part introduces a particular family of covariance models parameterized by
the fields of variances and of the local anisotropy tensor: the VLATcov models

From now on, we will focus on the forecast error statistics, so the superscript

The forecast error being unbiased,

Note that it is useful to introduce the local aspect tensor

What makes the metric tensor attractive, either at a theoretical or at a practical level,
is that it is closely related to the normalized error

Hence, using the notation introduced in Sect.

To put some flesh on the bones, two examples of VLATcov models are now presented.

We first consider the covariance model based on the heterogeneous diffusion operator of

Another example of a heterogeneous covariance model is the
heterogeneous Gaussian covariance model:

At this stage, all the pieces of the puzzle are put together to build the PKF dynamics. We have covariance models parameterized from the variance and the local anisotropy, which are both related to the error field: knowing the dynamics of the error leads to the dynamics of the VLATcov parameters. This is now detailed.
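The link between the local anisotropy and the correlation function can be made concrete on a simple case. The following SymPy sketch is not part of SymPKF; it assumes a homogeneous 1D Gaussian correlation of length scale `L` (an illustrative symbol) and the usual second-order expansion of the correlation around zero separation, from which a metric `g` and an aspect `s = 1/g` are read off.

```python
import sympy as sp

# Illustrative symbols (assumptions, not taken from SymPKF):
# delta_x is the separation, L the correlation length scale.
delta_x = sp.Symbol("delta_x", real=True)
L = sp.Symbol("L", positive=True)

# Homogeneous Gaussian correlation function in 1D.
rho = sp.exp(-delta_x**2 / (2 * L**2))

# Metric read off as minus the curvature of the correlation at the origin.
g = -sp.diff(rho, delta_x, 2).subs(delta_x, 0)

# Aspect taken as the inverse of the metric (1D case).
s = 1 / g

# Second-order Taylor expansion of the correlation around delta_x = 0.
taylor = sp.series(rho, delta_x, 0, 3).removeO()
```

In this special case the metric is the inverse squared length scale, so the aspect is simply `L**2`, and the expansion matches `1 - g*delta_x**2/2`.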

When the dynamics of the error

Following the discussion in Sect.

Note that Eq. (16) can be formulated in terms of
aspect tensors thanks to the definition in Eq. (

Hence, the PKF forecast step for a VLATcov model is given by either the system in
Eq. (16) (in metric) or by its aspect tensor formulation thanks to
Eq. (

In the following section, we present the splitting method that allows the PKF dynamics to be expressed by bringing together the dynamics of each of the physical processes, calculated individually.

When there are several processes in the dynamics in Eq. (

While the theoretical background is provided by the Lie–Trotter formula for Lie derivatives, the well-known idea of time splitting follows easily from a first-order Taylor expansion of an Euler numerical scheme.
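The Euler argument can be checked symbolically on a toy case. The sketch below is an illustration under simplifying assumptions (a scalar state `x0` and two linear processes with rates `a` and `b`, all hypothetical names): it compares one Euler step of the full dynamics with two successive Euler steps, one per process.

```python
import sympy as sp

# Hypothetical scalar toy problem: dx/dt = a*x + b*x.
a, b, dt, x0 = sp.symbols("a b dt x0")

# One explicit Euler step using the full trend.
full_step = x0 + dt * (a + b) * x0

# Splitting: Euler step for the first process, then for the second one,
# starting from the intermediate state.
after_first = x0 + dt * a * x0
split_step = after_first + dt * b * after_first

# The two updates agree at first order in dt; the gap is a dt**2 term.
gap = sp.expand(split_step - full_step)
```

The difference reduces to `a*b*dt**2*x0`, so the split and unsplit updates coincide up to first order in the time step.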

The computation of dynamics,

Appendix

As a consequence, calculating the parametric dynamics of
Eq. (

However, although the calculation of the system in Eq. (16) is straightforward,
as it is similar to the calculation of Reynolds equations

Then, once the dynamics of the parameters are established, it remains to design a numerical code to test whether the uncertainty is indeed well represented by the PKF dynamics. Again, the design of a numerical code is not necessarily difficult, but with numerous terms the risk of introducing an error is significant.

To facilitate the design of the PKF dynamics and the numerical evaluation, the
package SymPKF
has been introduced to compute the VLATcov parameter dynamics and
to generate a numerical code used for the investigations

In order to introduce the symbolic computation of the PKF for the VLATcov model,
we consider an example: the diffusive nonlinear advection in
the Burgers equation, which reads

The definition of the dynamics relies on the formalism of SymPy as shown in
Fig.

Sample of code and Jupyter notebook outputs for the definition of the Burgers dynamics using SymPKF.

In this example, the dynamics consist of a single equation defined as an instance of
the class

A preprocessing of the dynamics is then performed to determine several important quantities
to handle the dynamics: the prognostic fields
(functions for which a time derivative is present), the diagnostic fields (functions
for which there is no time derivative in the dynamics), the constant functions (functions
that only depend on the spatial coordinates), and the constants (pure scalar terms that
are not a function of any coordinate). This preprocessing is performed when
transforming the dynamics into an instance of the class

The prognostic quantities being known, it is then possible to perform the computation of the PKF dynamics, as discussed now.
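The kind of preprocessing described above can be sketched with plain SymPy. The code below does not use the SymPKF classes: the helper `classify` and its dictionary keys are hypothetical, and the Burgers equation is written in its standard viscous form, which is assumed here.

```python
import sympy as sp
from sympy.core.function import AppliedUndef

t, x, kappa = sp.symbols("t x kappa")
u = sp.Function("u")(t, x)

# Standard viscous Burgers equation (assumed form).
burgers = sp.Eq(sp.Derivative(u, t),
                -u * sp.Derivative(u, x) + kappa * sp.Derivative(u, x, 2))

def classify(equations, time, coordinates):
    """Hypothetical helper: sort the symbols of a PDE system by role."""
    functions, symbols, prognostic = set(), set(), set()
    for eq in equations:
        functions |= eq.atoms(AppliedUndef)   # all unknown functions
        symbols |= eq.free_symbols            # all plain symbols
        for derivative in eq.atoms(sp.Derivative):
            # A function with a time derivative is prognostic.
            if time in derivative.variables:
                prognostic.add(derivative.expr)
    time_dependent = {f for f in functions if time in f.args}
    return {
        "prognostic": prognostic,
        "diagnostic": time_dependent - prognostic,
        "constant_functions": functions - time_dependent,
        "constants": symbols - {time, *coordinates},
    }

fields = classify([burgers], t, [x])
```

For the Burgers equation this yields a single prognostic field `u(t, x)` and a single constant `kappa`, with no diagnostic field and no constant function.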

Thanks to the preprocessing, we are able to determine the VLATcov parameters
needed to compute the PKF dynamics, which are the variance and the anisotropy tensor
associated with the prognostic fields. For the Burgers equation, the VLATcov parameters
are the variance

As discussed in Sect.

Since the computation of the second-order fluctuation–mean interaction dynamics relies
on the expectation operator, an implementation of this expectation operator has been
introduced in SymPKF: it is defined as the class
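To see where the expectation operator enters, the expansion behind the fluctuation–mean interaction can be reproduced with plain SymPy. This is only a sketch that does not use the SymPKF expectation class: the names `u_bar` (ensemble mean), `e` (zero-mean error) and `eta` (bookkeeping parameter) are illustrative, and the Burgers trend is assumed in its standard viscous form.

```python
import sympy as sp

t, x, eta, kappa = sp.symbols("t x eta kappa")
u_bar = sp.Function("u_bar")(t, x)   # ensemble-averaged state
e = sp.Function("e")(t, x)           # error, assumed to have zero mean

def trend(u):
    """Right-hand side of the (assumed) viscous Burgers equation."""
    return -u * sp.diff(u, x) + kappa * sp.diff(u, x, 2)

# Expand the trend around the mean state, order by order in eta.
expanded = sp.expand(trend(u_bar + eta * e))
order0 = expanded.coeff(eta, 0)   # trend of the mean state alone
order1 = expanded.coeff(eta, 1)   # tangent-linear trend of the error
order2 = expanded.coeff(eta, 2)   # quadratic term feeding back on the mean

# The quadratic term is -e * d(e)/dx = -(1/2) * d(e**2)/dx.
identity_gap = sp.simplify(order2 + sp.diff(e**2, x) / 2)
```

The first-order term vanishes under the expectation because the error has zero mean, while the second-order term becomes minus one half of the spatial derivative of the variance, assuming the expectation commutes with the spatial derivative. This is how the variance feeds back on the mean in a second-order filter.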

Then, the symbolic computation of the second-order fluctuation–mean interaction dynamics
in Eq. (

Sample of code and Jupyter notebook outputs:
systems of partial differential equations given in metric and in aspect forms
produced by SymPKF when applied to the Burgers equation (Eq.

Hence, from SymPKF, for the Burgers equation, the VLATcov PKF dynamics given in the aspect tensor read as

While the Burgers equation only contains two physical processes, i.e., the
nonlinear advection and the diffusion, the resulting PKF dynamics
in Eq. (

In this example, the splitting strategy has not been considered to simplify
the computation of the PKF dynamics. However, it can be done by considering the
PKF dynamics for the advection

Illustration of the splitting strategy that can be used to compute the PKF dynamics and applied here for the Burgers equation: PKF dynamics of the Burgers equation can be obtained from the PKF dynamics of the advection (first cell) and of the diffusion (second cell).

Thanks to the symbolic computation using the expectation operator, as implemented
by the class

An important point is that terms such as

The interesting property of these terms is that they can be reworded as
spatial derivatives of terms in the form
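The kind of rewording at play can be illustrated with two elementary identities, checked here with plain SymPy rather than with SymPKF. The sketch assumes a 1D normalized error `epsilon` (illustrative name); the interpretation given after the code further assumes unit variance of the normalized error, a metric defined as the expectation of the squared first derivative, and an expectation that commutes with spatial derivatives.

```python
import sympy as sp

t, x = sp.symbols("t x")
eps = sp.Function("epsilon")(t, x)   # normalized error (illustrative name)

d1 = sp.diff(eps, x)
d2 = sp.diff(eps, x, 2)
d3 = sp.diff(eps, x, 3)

# eps * d2(eps) = d/dx(eps * d1(eps)) - d1(eps)**2
gap_order2 = sp.simplify(eps * d2 - (sp.diff(eps * d1, x) - d1**2))

# eps * d3(eps) = d/dx(eps * d2(eps)) - (1/2) * d/dx(d1(eps)**2)
gap_order3 = sp.simplify(eps * d3 - (sp.diff(eps * d2, x) - sp.diff(d1**2, x) / 2))
```

Under the stated assumptions, taking the expectation of the first identity gives minus the metric, and the second one gives minus three halves of the spatial derivative of the metric, so both terms are expressed from the VLATcov parameters alone.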

Substitution dictionary computed in
SymPKF to replace terms such as

The term

A naïve closure for the PKF dynamics in Eq. (

For the Burgers equation, a closure for

In particular, it would be interesting to search for a generic way to design closures that leverage the symbolic computation, which could be plugged into the PKF dynamics
computed from SymPKF at a symbolic level. To do so, we propose an
empirical closure
that leverages a data-driven strategy to hybridize machine learning
with physics, as proposed by

The construction of the proposal relies on the symbolic computation shown in
Fig.

Example of a symbolic computation leading to a proposal for the closure of the unknown terms of order

The first step is to consider an analytical approximation for the correlation function.
For the illustration, we consider the local correlation function
to be well approximated by the quasi-Gaussian function

Then, the identification with the Taylor expansion in Eq. (

Whether a closure has been obtained analytically or empirically, it remains to compute the closed PKF dynamics to assess their performance. To do so, a numerical implementation of the system of partial differential equations has to be introduced. As for the computation of the PKF dynamics, the design of a numerical code can be tedious, with a risk of introducing errors in the implementation due to the numerous terms occurring in the PKF dynamics. To facilitate the research on the PKF, SymPKF comes with a Python numerical code generator, which provides an end-to-end investigation of the PKF dynamics. This code generator is now detailed.

Introduction of a closure and automatic generation of a numerical code in SymPKF.

While a compiled language with appropriate optimization would be important for
industrial applications, we chose to implement a pure Python code generator,
which offers a simple research framework for exploring the design of PKF
dynamics.
It would have been possible to use a code generator already based
on SymPy (see, e.g.,

The finite difference takes the form of an operator
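As an illustration of what a generated trend may look like, the following NumPy sketch discretizes the derivatives with second-order centered differences. It is not the code produced by SymPKF: the function names are hypothetical, a periodic 1D domain is assumed, and the Burgers trend is taken in its standard viscous form.

```python
import numpy as np

def ddx(field, dx):
    """Centered first derivative on a periodic 1D grid."""
    return (np.roll(field, -1) - np.roll(field, 1)) / (2.0 * dx)

def d2dx2(field, dx):
    """Centered second derivative on a periodic 1D grid."""
    return (np.roll(field, -1) - 2.0 * field + np.roll(field, 1)) / dx**2

def burgers_trend(u, dx, kappa):
    """Discrete trend of the (assumed) viscous Burgers equation."""
    return -u * ddx(u, dx) + kappa * d2dx2(u, dx)

# Periodic grid on [0, 1) and a smooth test field.
n = 256
x = np.linspace(0.0, 1.0, n, endpoint=False)
dx = x[1] - x[0]
u = np.sin(2.0 * np.pi * x)

trend = burgers_trend(u, dx, kappa=1e-3)
```

Using `np.roll` keeps the periodic boundary implicit, so the same operator applies at every grid point without special cases at the edges.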

For instance, Fig.

The numerical integration is handled through the inheritance
mechanism: the class

UML diagram showing the inheritance mechanism implemented in
SymPKF: the class
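The inheritance pattern can be sketched in a few lines of pure Python. The class names below (`TimeSteppingBase`, `GeneratedDecay`) are hypothetical and do not reproduce the SymPKF classes: the base class owns the time schemes, while the subclass, standing in for a generated code, only supplies the trend.

```python
import math
from abc import ABC, abstractmethod

class TimeSteppingBase(ABC):
    """Hypothetical base class holding the time schemes, not the physics."""

    @abstractmethod
    def trend(self, t, state):
        """Return d(state)/dt; supplied by the subclass."""

    def euler(self, t, state, dt):
        return state + dt * self.trend(t, state)

    def rk4(self, t, state, dt):
        k1 = self.trend(t, state)
        k2 = self.trend(t + 0.5 * dt, state + 0.5 * dt * k1)
        k3 = self.trend(t + 0.5 * dt, state + 0.5 * dt * k2)
        k4 = self.trend(t + dt, state + dt * k3)
        return state + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

    def forecast(self, state, t0, n_steps, dt, scheme="rk4"):
        step = getattr(self, scheme)   # select the scheme by name
        t = t0
        for _ in range(n_steps):
            state = step(t, state, dt)
            t += dt
        return state

class GeneratedDecay(TimeSteppingBase):
    """Stand-in for a generated model: dx/dt = -rate * x."""

    def __init__(self, rate):
        self.rate = rate

    def trend(self, t, state):
        return -self.rate * state

model = GeneratedDecay(rate=1.0)
final_state = model.forecast(1.0, t0=0.0, n_steps=100, dt=0.01)
```

With this separation, a code generator only has to emit the `trend` method, and every time scheme of the base class becomes available to the generated model.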

Thanks to the end-to-end framework proposed in SymPKF, it is possible to perform
a numerical simulation based on the PKF dynamics in Eq. (

Illustration of a numerical simulation of the PKF dynamics in
Eq. (

In order to show the skill of the PKF applied to the
Burgers equation, when using the closure of P18, an ensemble validation is now performed.
Note that the code generator of SymPKF can be used for arbitrary dynamics, e.g.,
the Burgers equation itself. Hence, a numerical code solving the Burgers equation is
rendered from its symbolic definition. Then an ensemble of

While this example shows an illustration of SymPKF in a 1D domain, the package also applies in 2D and 3D domains, as presented now.

In order to illustrate the ability of SymPKF to apply in a 2D or 3D domain, we
consider the linear advection of a scalar field

The calculation of the parametric dynamics is handled by the class

Sample of code and Jupyter notebook outputs:
system of partial differential equations produced by SymPKF
when applied to the linear advection in Eq. (

Output of the computation by SymPKF of the PKF dynamics for
the simple multivariate periodic chemical reaction, corresponding to the
right-hand side of Eq. (

We do not introduce any numerical simulation of the PKF dynamics in Eq. (

This example illustrates a 2D situation and shows the multidimensional capabilities of SymPKF.
Similarly to the simulation conducted for the Burgers equation, it is possible to
automatically generate a numerical code able to perform numerical simulations of the dynamics
in Eq. (

Before concluding, we would like to present a preliminary application of SymPKF in a multivariate situation.

SymPKF can be used to compute the prediction of the variance and the anisotropy in a multivariate situation.

Note that one of the difficulties with the multivariate situation is that the number of equations increases linearly with the number of fields and the dimension of the domain; e.g., for a 1D (2D) domain and two multivariate physical fields, there are two ensemble-averaged fields, two variance fields, and two (six) metric fields. Of course this is not a problem when using a computer algebra system as done in SymPKF.
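The counts quoted above can be reproduced with a small helper. The function name is hypothetical; the sketch only assumes that each field carries one mean, one variance and one symmetric metric tensor, whose number of independent components in dimension d is d(d+1)/2.

```python
def vlatcov_field_count(n_fields, dim):
    """Hypothetical helper counting the fields predicted per category."""
    # A symmetric dim x dim tensor has dim*(dim+1)/2 independent components.
    metric_per_field = dim * (dim + 1) // 2
    return {
        "mean": n_fields,
        "variance": n_fields,
        "metric": n_fields * metric_per_field,
    }

count_1d = vlatcov_field_count(n_fields=2, dim=1)
count_2d = vlatcov_field_count(n_fields=2, dim=2)
```

For two fields this gives two metric fields in 1D and six in 2D, in agreement with the figures given in the text.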

To illustrate the multivariate situation, only a very simple example is introduced.
Inspired by chemical transport models encountered in air quality, we
consider the transport over a 1D domain of two chemical species, whose concentrations
are denoted by

To go further, some research is still needed to explore the
dynamics and the modeling of the multivariate cross-covariances.
A possible
direction is to take advantage of the multivariate covariance model based on
the balance operator as often introduced in variational data assimilation

To conclude, this example shows the potential of SymPKF to tackle the multivariate situation. Moreover, the example also shows that SymPKF is able to perform the PKF computation for a system of partial differential equations. However, all the equations should be prognostic; SymPKF is not able to handle diagnostic equations.

This contribution introduced the package SymPKF, which can be used to conduct research on the parametric Kalman filter prediction step for covariance models parameterized by the variance and the anisotropy (VLATcov models). SymPKF provides an end-to-end framework: from the equations of dynamics to the development of a numerical code.

The package was first introduced by considering the nonlinear diffusive advection dynamics in the Burgers equation. In particular, this example shows the ability of SymPKF to handle abstract terms, e.g., the unclosed terms formulated with the expectation operator. The expectation operator implemented in SymPKF is a key tool for the computation of the PKF dynamics. Moreover, we showed how to handle a closure and how to automatically render numerical codes.

For univariate situations, SymPKF applies in a 1D domain as well as in 2D and 3D domains. This has been shown by considering the computation of the PKF dynamics for the linear advection equation on a 2D domain.

A preliminary illustration with multivariate dynamics showed the potential of SymPKF to handle the
dynamics of multivariate covariance. But this point has to be further
investigated, and this constitutes the main perspective of development.
Moreover, to perform a multivariate assimilation cycle with the PKF, the
multivariate formulation of the PKF analysis state is needed. A first investigation
of the multivariate PKF assimilation has been proposed by

In its present implementation, SymPKF is limited to computation with prognostic equations. It is not possible to consider dynamics based on diagnostic equations, while these are often encountered in atmospheric fluid dynamics, e.g., the geostrophic balance. This constitutes another topic of research development for the PKF, facilitated by the use of symbolic exploration.

Note that the expectation operator as introduced here can be used to compute Reynolds equations encountered in turbulence. This opens new perspectives for the use of SymPKF for other applications that could be interesting, especially for automatic code generation.

In this section we show that using a splitting strategy is possible for the design of
the parametric dynamics. For this, it is enough to show that given dynamics written
as

Due to the linearity of the derivative operator, the TL dynamics resulting from Eq. (

To conclude, the computation of the parametric dynamics for Eq. (

In this section we prove the property.

Now Property 1 can be proven by the following induction,
assuming that the property is true for all patterns of degree strictly lower than
the degree

Without loss of generality we assume

The SymPKF package is free and open-source. It is distributed under the CeCILL-B free software license.
The source code is provided through a GitHub repository at

OP introduced the symbolic computation of the PKF dynamics, and OP and PA imagined an end-to-end framework for the design of the PKF dynamics from the equation of the dynamics to the numerical simulation thanks to an automatic code generation. OP developed the codes.

The authors declare that they have no conflict of interest.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We would like to thank Sylwester Arabas and the two
anonymous referees for their fruitful comments, which have
contributed to improving the paper.
The UML class diagram has been generated with UMLet

This research has been supported by the French national program LEFE/INSU (Étude du filtre de KAlman PAramétrique, KAPA).

This paper was edited by Sylwester Arabas and reviewed by two anonymous referees.