The pygeodyn package is a sequential geomagnetic data assimilation tool written in Python. It provides access to the core surface dynamics, constrained by geomagnetic observations, by means of a stochastic model anchored to geodynamo simulation statistics. The pygeodyn package aims to offer a user-friendly and flexible data assimilation algorithm. It is designed to be tunable by the community in several ways: the possibility to use embedded data and priors or to supply custom ones, parameters tunable through configuration files, and documentation adapted to several user profiles. In addition, output files are directly supported by the package webgeodyn, which provides a set of visualization tools to explore the results of computations.

The magnetic field of the Earth is generated by the motion of liquid metal in the outer core, a process called the “geodynamo”.
To tackle this complex problem, direct numerical simulations (DNSs) have been developed to model the coupling between the primitive equations for heat, momentum and induction in a rotating spherical shell.
With the development of computing power, DNSs capture more and more of the physics thought to be at work in the Earth's core.

Many efforts have been devoted to the improvement of observable geodynamo quantities: the magnetic field above the surface of the Earth and its rate of change with respect to time, the so-called secular variation (SV). The launch of low-orbiting satellite missions (Ørsted, CHAMP, Swarm) dedicated to magnetic field measurements indeed represented a huge leap in the quality and coverage of measured data.

In this context, the development of geomagnetic data assimilation (DA) algorithms is timely. DA consists of the estimation of a model state trajectory using (i) a numerical model that advects the state in time and (ii) measurements used to correct its trajectory.
DA algorithms can be split into two main families:

variational methods, which minimize a cost function based on a least-squares approach (in a time-dependent problem, one ends up tuning the initial condition by means of adjoint equations); and

statistical methods, which are applications of Bayes' rule to obtain the most probable model state given the observations and their uncertainties (under Gaussian assumptions for the prior uncertainties and model distributions, this comes down to estimating a best linear unbiased estimate, or BLUE).

In this article, we present a Python package called pygeodyn devoted to geomagnetic DA based on a statistical method, namely an augmented-state Kalman filter.

The aim of pygeodyn is to provide the community with a tool that can easily be used or built upon. It is made to ease the updating of data and the integration of new numerical models, for instance to test them against geophysical data. This way, it can be compared with other existing DA algorithms.

The paper is organized as follows:
Sect.

In order to support the use of DA in geomagnetism, the package is designed with the following characteristics in mind.

It is written in Python 3, now a widespread language in the scientific community thanks to the NumPy and SciPy suites.

The installation procedure only requires Python with NumPy and a Fortran compiler (other dependencies are installed during the setup).

Online documentation describes how to install, use and build upon the package.

Algorithm parameters can be tuned through configuration files and command line arguments.

Algorithms are designed to be modular in order to allow for the independent use of their composing steps.

Extension of the features is eased by readable open-source code (following PEP 8) that is documented inline and online with Sphinx.

The source code is versioned on a Git repository, with tracking of bugs and development versions with clear release notes.

Unit and functional tests are routinely run by continuous integration pipelines. Most of the tests use the Hypothesis library.

Logging of algorithm operations is done by default with the

Direct integration of parallelization is possible using the MPI standard (message-passing interface).

Lengthy computations (such as Legendre polynomial evaluations) are performed by Fortran routines wrapped in Python.
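For illustration, an associated Legendre polynomial evaluation of the kind delegated to Fortran can be prototyped in pure Python with SciPy; this is only a sketch of the mathematical operation, not pygeodyn's actual routine.

```python
import numpy as np
from scipy.special import lpmn

# Evaluate associated Legendre functions P_n^m(cos(theta)) up to degree and
# order 3 at a given colatitude (a stand-in for the wrapped Fortran code).
theta = np.deg2rad(45.0)
n_max = 3
P, dP = lpmn(n_max, n_max, np.cos(theta))  # both arrays have shape (m+1, n+1)

# P[m, n] holds P_n^m; for instance P_1^0(cos(theta)) = cos(theta)
print(P[0, 1])
```

In practice the Fortran implementation avoids the Python-level overhead of repeated evaluations inside the forecast and analysis loops.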

Output files are generated in the HDF5 binary file format, which is highly versatile (with a slicing syntax close to NumPy's) and efficient in both access time and file size.
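As a hedged illustration of why this format is convenient, the snippet below writes and reads an ensemble array with h5py; the dataset name and attribute are purely illustrative, not pygeodyn's actual output layout.

```python
import numpy as np
import h5py

# A hypothetical ensemble of core-state coefficients: 10 members, 4 coefficients
ensemble = np.random.default_rng(0).normal(size=(10, 4))

# Write the array to an HDF5 file, with a unit attribute attached
with h5py.File("example_output.h5", "w") as f:
    f.create_dataset("computed/MF", data=ensemble)
    f["computed/MF"].attrs["units"] = "nT"

# Read it back; slicing works like NumPy indexing
with h5py.File("example_output.h5", "r") as f:
    data = f["computed/MF"][:]

print(data.shape)  # (10, 4)
```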

The output format is directly supported by the visualization package webgeodyn for efficient exploration of the computed data (see Sect.

The package was designed for several user types.

The user will use the supplied DA algorithms with the supplied data. The algorithms can be modified through the configuration files, so this requires almost no programming skill.

The user will use the supplied DA algorithms but wants to supply their own data. In this case, the user needs to follow the documentation to implement the reading methods for the data.

The user wants to design their own algorithm using the low-level functions implemented in the package. The how-to is also documented, but it requires some experience in Python programming and object-oriented structures.

The documentation (available online at

DA algorithms are to be supplied in the form of subpackages for pygeodyn. The intention is to have interchangeable algorithms and be able to easily expand the existing algorithms. In the version described in this article, we provide a subpackage

The model states are stored as a subclass of NumPy array called
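A minimal sketch of how a model state can be stored as a NumPy array subclass is given below; the class name, metadata field and behaviour are illustrative assumptions, not the actual pygeodyn implementation.

```python
import numpy as np

class CoreState(np.ndarray):
    """Illustrative NumPy ndarray subclass carrying a metadata attribute."""

    def __new__(cls, input_array, measure="MF"):
        obj = np.asarray(input_array).view(cls)
        obj.measure = measure  # e.g. magnetic field ("MF") or core flow ("U")
        return obj

    def __array_finalize__(self, obj):
        # Propagate metadata through views, slices and ufunc results
        if obj is None:
            return
        self.measure = getattr(obj, "measure", None)

state = CoreState(np.zeros(8), measure="MF")
print(state.measure, state.shape)  # MF (8,)
```

Subclassing keeps all NumPy operations available on the state while letting each array remember which physical quantity it represents.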

The sequential DA algorithm is composed of two kinds of operations.

are performed every

are performed every

From a technical point of view, algorithm steps take

The forecast step consists of time stepping

Drift matrices are estimated in different ways depending on the characteristics of the considered geodynamo priors.
In the case in which the geodynamo series do not allow for the derivation of meaningful temporal statistics

In the case in which geophysically meaningful temporal statistics can be extracted from geodynamo samples, time cross-covariance matrices,

The first step of the forecast is to compute
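Schematically, one step of a first-order autoregressive (AR-1) forecast of the kind used here can be sketched in NumPy as follows; the drift matrix, noise amplitude and time step are illustrative assumptions, not pygeodyn's actual values.

```python
import numpy as np

rng = np.random.default_rng(42)

def ar1_step(x, A, L, dt):
    """One Euler step of the AR-1 process dx = -A x dt + dW.

    A is the drift matrix and L an (illustrative) Cholesky factor of the
    noise covariance, so that dW ~ N(0, dt * L @ L.T).
    """
    return x - dt * (A @ x) + np.sqrt(dt) * (L @ rng.normal(size=x.shape))

n = 4
A = np.eye(n) / 100.0   # illustrative drift (decay over ~100 time units)
L = 0.1 * np.eye(n)     # illustrative noise factor
x = np.zeros(n)
for _ in range(50):     # advance the state by 50 steps of dt = 1
    x = ar1_step(x, A, L, 1.0)
print(x.shape)  # (4,)
```

In the package, each ensemble member is advanced independently by such a step, which is what makes the forecast embarrassingly parallel.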

The analysis step takes as input the ensemble of forecast core states

First, a BLUE of an ensemble of realizations of

Second, a BLUE of an ensemble of realizations of the augmented-state
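For a single realization, a BLUE takes the familiar Kalman form x_a = x_f + K (y - H x_f) with gain K = P Hᵀ (H P Hᵀ + R)⁻¹. The sketch below uses toy matrices chosen for illustration only.

```python
import numpy as np

def blue_analysis(x_f, y, H, P, R):
    """Best linear unbiased estimate of the state given observations y."""
    S = H @ P @ H.T + R              # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)   # Kalman gain
    return x_f + K @ (y - H @ x_f)

# Toy example: observe only the first of two state components
x_f = np.array([1.0, 2.0])           # forecast state
H = np.array([[1.0, 0.0]])           # observation operator
P = np.eye(2)                        # forecast error covariance
R = np.array([[1.0]])                # observation error covariance
y = np.array([3.0])                  # observation

x_a = blue_analysis(x_f, y, H, P, R)
print(x_a)  # [2. 2.]  (the observed component moves halfway toward y)
```

With equal forecast and observation variances, the analysis splits the innovation evenly, while the unobserved component is left unchanged because P is diagonal here.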

Computations can be launched by running

Command line arguments of

The pygeodyn configuration file sets the values of quantities used in the algorithm (called parameters). This configuration file is a text file containing three columns: one for the parameter name, one for the type and one for the parameter value. We refer to Table
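A configuration file of this three-column form, and a minimal parser for it, might look like the sketch below; the parameter names are illustrative, not necessarily those used by pygeodyn.

```python
# Hypothetical three-column configuration: name, type, value on each line
CONFIG_TEXT = """\
Lb int 14
dt_f float 0.5
prior_type str coupled_earth
"""

TYPES = {"int": int, "float": float, "str": str}

def parse_config(text):
    """Parse 'name type value' lines into a typed parameter dictionary."""
    params = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, type_name, value = line.split()
        params[name] = TYPES[type_name](value)
    return params

print(parse_config(CONFIG_TEXT))
# {'Lb': 14, 'dt_f': 0.5, 'prior_type': 'coupled_earth'}
```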

The first is the number of coefficients to consider for the core state quantities and the Legendre polynomials that are used to evaluate the Gaunt–Elsasser integrals that enter

The second is time spans: starting time

More precisely, 1 January 1980.

The third is the parameters of the AR-1 processes used in the forecasts.

The fourth is parameters for using a principal component analysis (PCA) for the core flow.
By setting

The fifth is the initial conditions of the algorithm.
By setting

The sixth is parameters describing the types of input data (priors and observations) that are presented in more detail in the next section.
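Regarding the fourth group above, a PCA reduction of the core flow can be sketched with a plain singular value decomposition; the snapshot matrix below is synthetic and the dimensions are illustrative, not pygeodyn's internals.

```python
import numpy as np

# Synthetic ensemble of flow coefficient snapshots: 200 samples, 30 coefficients
rng = np.random.default_rng(1)
snapshots = rng.normal(size=(200, 30))
centered = snapshots - snapshots.mean(axis=0)

# Principal components via SVD of the centered snapshot matrix
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
explained = s**2 / np.sum(s**2)   # fraction of variance per component

# Project onto the leading components to work in a reduced space
n_pc = 10
reduced = centered @ Vt[:n_pc].T
print(reduced.shape)  # (200, 10)
```

Working in the reduced space shrinks the covariance matrices handled during forecasts and analyses, at the price of truncating the flow representation.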

Parameters available in a pygeodyn configuration file.

Priors are composed of a series of snapshot core states that are used to estimate the background states and the cross-covariance matrices.
The mandatory priors are those for the magnetic field

The aforementioned snapshots currently come from geodynamo simulations, meaning that the covariance matrices for

Observations are measurements of the magnetic field and of the SV at a set of dates.
These observations are used in the analysis step to perform the BLUE of the core state (see Sect.

In the code, observation data are to be supplied with the observation operator and errors in the form of an

For advanced users, pygeodyn provides the possibility to define custom prior and observation types by supplying new data-reading methods in the dedicated pygeodyn modules. Defining a custom prior type allows for the use of custom geodynamo simulation data to compute covariance matrices that will be used in the forecasts and analyses steps. Similarly, a new observation type can be supplied with custom observation data that will be used to estimate the core state in the analysis step.
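The exact reading interface is defined in the pygeodyn documentation; schematically, a custom observation reader boils down to a function returning arrays of dates, values and errors. The function below is purely illustrative (its name, file layout and return convention are assumptions, not the package's API).

```python
import numpy as np

def read_custom_observations(path):
    """Illustrative reader: load dates, observed values and errors.

    Assumes a whitespace-separated text file whose columns are a decimal
    year, the observed coefficients and, last, the observation error.
    """
    raw = np.loadtxt(path)
    dates = raw[:, 0]       # decimal years
    values = raw[:, 1:-1]   # observed coefficients
    errors = raw[:, -1]     # observation errors
    return dates, values, errors
```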

In other words, an advanced user can completely control the input data of the algorithm to test new magnetic observations and/or new numerical models and derive new predictions from them.

To reduce computation time, supplied algorithms use MPI to perform forecasts (Sect.

The results are displayed in Fig.

Evolution of the runtime with respect to the number of MPI processes (see text for details). Dots are the observed runtimes (in hours) and the dashed line is a fit of the form

The pygeodyn output files are directly supported by the web-based visualization package webgeodyn, also developed in our group. The source code of webgeodyn is hosted at its own Git repository (

The webgeodyn package implements a web application with several modes of representation to explore, display and diagnose the products of the reanalyses.

Quantities at a given time can be displayed at the core surface in the

Example map for the magnetic field and the flow at the core surface, obtained using webgeodyn, for the model calculated by

In the

Time series for the SV spherical harmonic coefficient

Computed data can be easily compared with the geomagnetic observations used for the analysis in the

Time series of the three components of the SV (in spherical coordinates) at the Kourou observatory (French Guyana, location in dark red on the globe on the left) using webgeodyn.
The green line (Barrois_VO_2018e) is from the core surface reanalysis by

In addition to the three examples illustrated above, the webgeodyn package also makes it possible to display and export Lowes–Mauersberger spatial spectra, as well as cross sections at the core surface as a function of time and longitude (or latitude) for a given latitude (or longitude).
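Such a spectrum follows the standard definition R_n = (n+1) (a/r)^(2n+4) Σ_m (g_n^m² + h_n^m²); a minimal sketch of its computation from Gauss coefficients is given below (the input layout is illustrative, not webgeodyn's data model).

```python
import numpy as np

def lowes_spectrum(gh, n_max, radius_ratio=1.0):
    """Lowes-Mauersberger spectrum R_n = (n+1)(a/r)^(2n+4) sum_m (g^2 + h^2).

    gh: dict mapping (n, m, 'g'|'h') to a Gauss coefficient in nT.
    radius_ratio: a/r, equal to 1 at the Earth's surface and larger at depth.
    """
    R = np.zeros(n_max + 1)
    for (n, m, _), coef in gh.items():
        R[n] += coef**2
    for n in range(1, n_max + 1):
        R[n] *= (n + 1) * radius_ratio**(2 * n + 4)
    return R[1:]  # degrees 1..n_max

# Toy example: a pure axial dipole g_1^0 = -30000 nT at the Earth's surface
spectrum = lowes_spectrum({(1, 0, "g"): -30000.0}, n_max=1)
print(spectrum)  # [1.8e+09]  (= 2 * 30000**2)
```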

We presented the Python toolkit pygeodyn that allows users to

calculate models of the flow at the core surface from SV Gauss coefficient data,

calculate models of the flow and the magnetic field at the core's surface from measurements of the magnetic field and its SV above the Earth's surface, and

represent and analyze the results via the web interface webgeodyn.

The underlying algorithm relies on AR-1 stochastic processes to advect the model in time. It is anchored to statistics (in space and optionally in time) from free runs of geodynamo models. It furthermore accounts for errors of representativeness due to the finite resolution of the magnetic and velocity fields.

This Python tool has been designed with several purposes in mind, among which are

tests of the Earth-likeness of geodynamo models,

comparisons with alternative geomagnetic DA algorithms,

the production of magnetic models under some constraints from the core dynamics, and

the education of students on issues linked to core dynamics and the geomagnetic inverse problem.

Version 1.1.0 of pygeodyn is archived on Zenodo at

The scientific development of the presented package was done by NG and LH. The technical development (including tests and packaging) was done by LH and FT. All authors contributed to the writing of the paper.

The authors declare that they have no conflict of interest.

We thank Julien Aubert for supplying the geodynamo series used as priors and Christopher Finlay and Magnus Hammer for providing the

This work is partially supported by the French Centre National d'Etudes Spatiales (CNES) in the context of ESA's Swarm mission.

This paper was edited by Josef Koller and reviewed by Ciaran Beggan and one anonymous referee.