This paper presents the first known application of multi-model ensembles to the forecasting of the thermosphere. A multi-model ensemble (MME) is a method for combining different, independent models. The main advantage of using an MME is to reduce the effect of model errors and bias, since it is expected that the model errors will, at least partly, cancel. The MME, with its reduced uncertainties, can then be used as the initial conditions in a physics-based thermosphere model for forecasting. This should increase the forecast skill since a reduction in the errors of the initial conditions of a model generally increases model skill. In this paper the Thermosphere–Ionosphere Electrodynamic General Circulation Model (TIE-GCM), the US Naval Research Laboratory Mass Spectrometer and Incoherent Scatter radar Exosphere 2000 (NRLMSISE-00), and Global Ionosphere–Thermosphere Model (GITM) have been used to construct the MME. As well as comparisons between the MMEs and the “standard” runs of the model, the MME densities have been propagated forward in time using the TIE-GCM. It is shown that thermospheric forecasts of up to 6 h, using the MME, have a reduction in the root mean square error of greater than 60 %. The paper also highlights differences in model performance between times of solar minimum and maximum.

NASA predicts that, by 2030, orbital collisions could become frequent enough
to cause a cascade

One way to decrease the errors in satellite orbit forecasts is to reduce
errors in thermospheric density forecasting. It has been previously suggested
that ensemble modelling could improve space weather forecasts

The idea of improving model forecasts by combining two or more independent
models is based on a short note by

An MME relies on the idea that model forecasting can be improved by combining
independent models

It is clear that an MME cannot give a result better than the best individual
model in all circumstances. For a hypothetically perfect model of a system,
forming an MME will always add worse information. However, in reality such
perfect models do not exist, and a successful MME should use independent,
skilful models. It is important to use independent models since models with
similar error characteristics can find such characteristics amplified in the
MME. It is impossible for the MME to be worse than all of the individual
models

Although an MME may reduce the reported thermospheric density errors, it
cannot alone forecast densities and thus cannot be directly used to improve
satellite orbit forecasts. Errors in the forecasts given by thermospheric
models are due to approximations in the modelled physics and uncertainties in
the initial and boundary conditions.

One can construct an MME using a variety of different approaches, but they fall into two main categories: equal and unequal weightings.

There are a number of difficulties in constructing an MME. These include how
the models should be combined and the fact that different models do not all
share common output variables. A further problem is that there may not be
observational data for each parameter, making it difficult to assess model
performance for all parameters. One way to resolve the latter problem is to
not take model performance into account and use an equally weighted average.
Such a simple method for MME generation has been shown to increase model
skill in climate studies. For example,

Alternatively, the MME can use different weights for each model. There are
different approaches for estimating the weights to be applied to individual
models. These include a least-squares minimization of differences between the
model and observations

In the absence of existing MME work in the thermospheric literature, a sample
mean square error (MSE) has been used in this work:
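As an illustrative sketch only, the following shows how a sample MSE and skill-based weights might be computed. It assumes the standard definition of the sample MSE (the mean of the squared model-minus-observation differences) and assumes that the weights are the inverse MSE values normalized to sum to one. The function names and the numbers are hypothetical and are not taken from this study.

```python
def sample_mse(model, obs):
    """Mean of the squared model-minus-observation differences."""
    if len(model) != len(obs) or not obs:
        raise ValueError("series must be non-empty and of equal length")
    return sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs)


def inverse_mse_weights(models, obs):
    """Weights proportional to 1/MSE, normalized to sum to one (an assumption).

    A model with an MSE of exactly zero would need special handling.
    """
    inverse = {name: 1.0 / sample_mse(series, obs) for name, series in models.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def weighted_mme(models, weights):
    """Point-by-point weighted average of the model series."""
    names = list(models)
    length = len(models[names[0]])
    return [sum(weights[n] * models[n][i] for n in names) for i in range(length)]


# Illustrative numbers only, not values from the study.
obs = [1.0, 2.0, 3.0, 4.0]
models = {
    "model_a": [1.5, 2.5, 3.5, 4.5],  # constant bias of +0.5, so MSE = 0.25
    "model_b": [0.0, 1.0, 2.0, 3.0],  # constant bias of -1.0, so MSE = 1.0
}
weights = inverse_mse_weights(models, obs)
mme = weighted_mme(models, weights)
```

In this toy case the two biases have opposite signs, so the weighted combination has a smaller MSE than either model. Passing equal weights to `weighted_mme` gives the simple average instead.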

For this study three atmospheric density models have been used: NRLMSISE-00,
Thermosphere–Ionosphere Electrodynamic General Circulation Model (TIE-GCM), and Global Ionosphere–Thermosphere Model (GITM). NRLMSISE-00 is an empirical density model, whereas GITM and
TIE-GCM are physics-based models. The models are driven using standard
geophysical indices: F10.7, which is the solar flux at a wavelength of
10.7 cm at the Earth's orbit and is used as a proxy for solar output, and Kp
or Ap, which indicate the severity of the magnetic disturbances in near-Earth
space. Physics models of the ionosphere–thermosphere often suffer from
biases. These can usually be attributed to the uncertainties in the model
parameters, which have a large impact on the final results

NRLMSISE-00 is a global, empirical model of the
atmosphere. It uses the 81-day average of F10.7, the daily F10.7 solar flux
value of the previous day, and 3-hourly Ap to model the density and
temperature of atmospheric components

The model outputs number densities of helium, atomic oxygen, molecular
oxygen, atomic nitrogen, molecular nitrogen, hydrogen, and argon, as well as
total mass density and the temperature at a given altitude. NRLMSISE-00 has
been shown to offer a noticeable improvement over MSISE-90

Test scenario descriptions. The CHAMP average altitudes and average F10.7 values are taken from across the 5-day test scenarios.

The National Center for Atmospheric Research (NCAR) TIE-GCM is a three-dimensional
model of the coupled thermosphere ionosphere system

The model takes as input the daily F10.7, the 81-day F10.7 average and the
Ap. It uses either the Weimer or Heelis models for the ionospheric electric
fields at high latitudes

GITM is a physics-based
three-dimensional global model that solves the full Navier–Stokes equations
for density, velocity, and temperature for a number of neutral and ion
species

To solve the continuity, energy and momentum equations, GITM uses an
advection solver, whilst the ion momentum equation is solved assuming a
steady state

The performance of each model is compared against the atmospheric density
fields derived from the CHAMP satellite

Three separate test scenarios were used during this study
(Table

Ap, F10.7, and DST index values for the three test scenarios. The spikes in Ap for the 2009 and 2001 test scenarios appear to be due to geomagnetic storms.

To compare NRLMSISE-00, TIE-GCM, and GITM with CHAMP, the output of each model
was spatially mapped to the CHAMP position using tri-linear interpolation. The
model files were output every 30 min and the CHAMP observation closest to
the model time was used. Figure
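The mapping of model output to the satellite position can be sketched as below. This is a minimal illustration, assuming a regular latitude, longitude, and altitude grid stored as nested lists and times given in minutes; it ignores longitude wrap-around. The function names are hypothetical and do not come from any of the models' own tools.

```python
from bisect import bisect_right


def _bracket(axis, value):
    """Return the lower cell index and fractional offset of value on a sorted axis."""
    if not axis[0] <= value <= axis[-1]:
        raise ValueError("point lies outside the grid")
    i = min(bisect_right(axis, value) - 1, len(axis) - 2)
    return i, (value - axis[i]) / (axis[i + 1] - axis[i])


def trilinear(grid, lats, lons, alts, lat, lon, alt):
    """Tri-linearly interpolate grid[i][j][k] (lat, lon, alt order) to one point."""
    i, u = _bracket(lats, lat)
    j, v = _bracket(lons, lon)
    k, w = _bracket(alts, alt)
    value = 0.0
    # Sum over the eight corners of the enclosing cell.
    for di, wu in ((0, 1 - u), (1, u)):
        for dj, wv in ((0, 1 - v), (1, v)):
            for dk, ww in ((0, 1 - w), (1, w)):
                value += wu * wv * ww * grid[i + di][j + dj][k + dk]
    return value


def nearest_in_time(obs_times, model_time):
    """Index of the observation closest in time to a model output."""
    return min(range(len(obs_times)), key=lambda n: abs(obs_times[n] - model_time))


# Illustrative grid holding a field that is linear in each coordinate.
lats, lons, alts = [0.0, 10.0], [0.0, 20.0], [300.0, 400.0]
grid = [[[la + 2 * lo + 3 * al for al in alts] for lo in lons] for la in lats]
value = trilinear(grid, lats, lons, alts, 2.5, 5.0, 350.0)
index = nearest_in_time([0, 10, 20, 45], 30)
```

Tri-linear interpolation reproduces a field that is linear in each coordinate exactly, which makes such a field a convenient check of an implementation.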

Modified Taylor diagram

The NRLMSISE-00 empirical model results, as expected, show a reasonable mean
approximation to the observed state, with the least bias of the tested
models. However, the model shows a larger variability in its output than the
CHAMP observations. GITM shows a negative bias with a very small standard
deviation compared to the observations (Fig.

CHAMP, GITM, TIE-GCM, and NRLMSISE-00 reported neutral densities for
the first test scenario (2009; solar minimum). The fast oscillations are due
to CHAMP's orbit (

The results from the second test scenario (2008; solar minimum) are similar to
the first (Fig.

CHAMP, GITM, TIE-GCM, and NRLMSISE-00 reported neutral densities for the second test scenario (2008; solar minimum).

CHAMP, GITM, TIE-GCM, and NRLMSISE-00 reported neutral densities for the third test scenario (2001; solar maximum).

Finally, the third test scenario (2001; solar maximum) has results that are
considerably different to the other two test scenarios. The reported neutral
densities compared to the CHAMP observations can be seen in
Fig.

Neutral density values of the three MMEs for the first test scenario, equally weighted, quiet-time weighted, and all-times weighted.

The results from these test scenarios show that the models suffer from errors and biases, and are unable to exactly match the observed density field from CHAMP. In order to provide better forecasting abilities, MMEs can be used to combine the model output to minimize the impact of model errors and bias.
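The bias, standard deviation, and correlation used to compare each model with the observations can be computed from paired series, as in the minimal sketch below. It assumes population (not sample) standard deviations and a Pearson correlation; the exact conventions used for the modified Taylor diagrams are not stated here, and the function name and numbers are illustrative.

```python
import math
from statistics import mean, pstdev


def summary_stats(model, obs):
    """Bias, ratio of standard deviations, and Pearson correlation."""
    if len(model) != len(obs) or len(obs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    m_bar, o_bar = mean(model), mean(obs)
    sd_model, sd_obs = pstdev(model), pstdev(obs)
    covariance = mean([(m - m_bar) * (o - o_bar) for m, o in zip(model, obs)])
    return {
        "bias": m_bar - o_bar,                 # mean model minus mean observation
        "sd_ratio": sd_model / sd_obs,         # above 1: model varies too much
        "correlation": covariance / (sd_model * sd_obs),
    }


# Illustrative series only: the model is exactly 2 * obs + 1.
obs = [1.0, 2.0, 3.0, 4.0]
model = [3.0, 5.0, 7.0, 9.0]
stats = summary_stats(model, obs)
```

A model can correlate perfectly with the observations, as in this toy case, and still carry a large bias and the wrong variability, which is why the three statistics are considered together.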

As described in Sect.

Modified Taylor diagram for the three MMEs: equal, quiet-time
weighted, and all-time weighted as well as GITM, TIE-GCM, and NRLMSISE-00
(MSIS) compared to the CHAMP observations for the first test scenario.
Details of how to read the diagram are described in
Fig.

Model skill and associated weighting (calculated by the inverse of
model skill, Eq.

The model skills and weighting of each model, for each test scenario, are
given in Table

For the first test scenario a further weighting scheme was used whereby,
before calculating the MSE, the model time series were restricted to times of
low geomagnetic activity.

Figure

Neutral density values of the two MMEs for the second test scenario, equally weighted and all-times weighted.

Modified Taylor diagram for the two MMEs: equal and all-time
weighted as well as GITM, TIE-GCM, and NRLMSISE-00 (MSIS) compared to the
CHAMP observations for the second test scenario. Details of how to read the
diagram are described in Fig.

The time series and modified Taylor diagram for the second test scenario are
shown in Figs.

Neutral density values of the two MMEs for the third test scenario, equally weighted and all-times weighted.

NRLMSISE-00, TIE-GCM, and GITM model outputs. mmr is the mass mixing ratio.

Modified Taylor diagram for the two MMEs: equal and all-time
weighted as well as GITM, TIE-GCM, and NRLMSISE-00 (MSIS) compared to the
CHAMP observations for the third test scenario. Details of how to read the
diagram are described in Fig.

It has been shown, in these test scenarios, that combining model results
leads to increased skill at matching the CHAMP-derived data. In the following
section this reduced uncertainty in atmospheric densities is used to provide
the initial conditions of a forecast run of a model. Such an approach has
been previously shown to increase climate model forecast skill

The objective is to use the MME, with its reduced uncertainties, as the
initial conditions for TIE-GCM. With the better initial conditions, it is
expected that the forecast skill of TIE-GCM will be increased. In order to
use an MME as the initial conditions for a physics-based model (i.e. TIE-GCM)
more than just the combined neutral density is required. The MME of each
density required by TIE-GCM (Table

To combine densities, temperatures, and velocities from multiple models, the data must be interpolated to common latitude, longitude, and altitude grids. Therefore, NRLMSISE-00 and GITM grids were tri-linearly interpolated to the TIE-GCM grid. The grids were then combined to form an MME. Since TIE-GCM uses pressure levels instead of altitude grids the MME values needed to be mapped back onto pressure levels. TIE-GCM provides a mapping between the pressure levels and geometric height for a given time step. This mapping was used in reverse to morph the altitude grids to TIE-GCM readable pressure levels.
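The final remapping step can be illustrated for a single vertical column. The sketch below assumes that the geometric height of every pressure level is already known for the time step in question and that linear interpolation in height is adequate. The names `alt_grid`, `mme_column`, and `level_heights` are hypothetical and do not correspond to TIE-GCM variable names.

```python
def interp_1d(x, xs, ys):
    """Linear interpolation of ys(xs) at x; xs must be strictly increasing."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("height lies outside the altitude grid")
    for k in range(len(xs) - 1):
        if x <= xs[k + 1]:
            t = (x - xs[k]) / (xs[k + 1] - xs[k])
            return (1 - t) * ys[k] + t * ys[k + 1]


def column_to_pressure_levels(alt_grid, mme_column, level_heights):
    """Sample an altitude-gridded MME column at the height of each pressure level."""
    return [interp_1d(h, alt_grid, mme_column) for h in level_heights]


# Illustrative column: altitudes in km, arbitrary MME values.
alt_grid = [100.0, 200.0, 300.0]
mme_column = [10.0, 20.0, 40.0]
level_heights = [150.0, 250.0, 300.0]  # assumed heights of three pressure levels
on_levels = column_to_pressure_levels(alt_grid, mme_column, level_heights)
```

Repeating this for every latitude and longitude column, and for every field, would give an MME state on the pressure levels.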

For the new TIE-GCM run, the model was restarted using the MME state-vector
as the initial condition. TIE-GCM was then run for 6 h with the model
output recorded every 30 min. After the 6 h period, TIE-GCM was again
restarted using the MME grid for the next 6 h period. For the forecast
run, the model only used the values of Kp and F10.7 corresponding to the
initial conditions; i.e. they were not updated at each time step, but they
were updated every 6 h. This was so that a true forecast could be simulated.
The equally weighted MME uses no prior information so can be treated as a true
forecast. However, it should be noted that when using the weighted MME, a true
forecast is not obtained since the weighted MME is generated using the
information from the CHAMP observations. Figure

Flow chart of the procedure for running TIE-GCM using the MME as its initial conditions for a 6 h forecast.
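The cycling procedure can be summarized in a short sketch. This is not the TIE-GCM interface: `step_model` is a hypothetical stand-in for advancing the model, and the state and drivers are reduced to simple values. It shows only the control flow, assuming a restart from the MME state every 6 h, drivers held fixed within each window, and output every 30 min.

```python
def cycle_forecast(mme_states, drivers, step_model, window_min=360, output_min=30):
    """Run consecutive forecast windows, each restarted from an MME state.

    mme_states and drivers are dicts keyed by window start time in minutes.
    """
    outputs = {}
    for start in sorted(mme_states):
        state = mme_states[start]   # restart from the MME initial condition
        fixed = drivers[start]      # drivers are not updated within the window
        for minute in range(output_min, window_min + 1, output_min):
            state = step_model(state, fixed, output_min)
            outputs[start + minute] = state
    return outputs


def toy_step(state, fixed_drivers, minutes):
    """Hypothetical stand-in for a model step: linear growth at a fixed rate."""
    return state + fixed_drivers["rate"] * minutes


# Illustrative inputs only.
mme_states = {0: 1.0, 360: 5.0}
drivers = {0: {"rate": 0.0}, 360: {"rate": 1.0}}
forecast = cycle_forecast(mme_states, drivers, toy_step)
```

Each window reads its drivers once, at the window start, which mirrors the requirement that no information from later in the window is used.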

Procedure for finding the TIE-GCM forecast using an MME as its
initial conditions. The “run TIE-GCM MME forecast” process refers to the
procedure described in Fig.

Using the MME densities to initialize a run of TIE-GCM will alter the outputs
of the model. It was expected that over time the two versions would converge.
However, this does not seem to happen over the 6 h window used here. This
is likely due to the fact that the model biases have a longer timescale than
6 h. Figure

Differences between the standard TIE-GCM model run and a TIE-GCM run using the MME as its initial conditions (at time 0). It takes over 70 h for the two runs to start to converge again.

Modified Taylor diagram for NRLMSISE-00 (MSIS), TIE-GCM, GITM, and
for TIE-GCM using the MMEs (equal, quiet-time weighted, and all-time weighted)
for its initial conditions every 6 h, compared with the CHAMP
observations. Details of how to read the diagram are described in
Fig.

Top panel shows the neutral density from the CHAMP observations and the original TIE-GCM run. The subsequent panels then show the CHAMP observations with each of the new TIE-GCM outputs using the MMEs as the initial conditions every 6 h.

Figure

Using the MME densities as the starting densities for TIE-GCM provides a
clear improvement compared to the original run of TIE-GCM. The reported
densities show very low bias and have variability close to the observations.
In particular, the post-storm period is modelled very accurately in all but
the quiet-time-weighted MME. The average MME and all-times-weighted initial
conditions for TIE-GCM improve upon the original TIE-GCM correlation. Each
of the TIE-GCM MME runs significantly improved the bias, and all but the
quiet-time-weighted run improved the standard deviation of the model. The new TIE-GCM run
(using the average MME) offers an improvement in all tested parameters
compared to the neutral density MME calculated after the models were run
(Fig.

None of the contributing models, nor the MMEs, model the peak of the storm
period (

The RMSE of the original TIE-GCM run and running the model with the MME as the initial conditions. The 95 % confidence intervals are also reported.
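For reference, an RMSE, its percentage reduction, and an interval estimate can be computed as in the sketch below. The method used to obtain the reported confidence intervals is not stated here, so the percentile bootstrap shown is purely an assumption, and one that treats the samples as independent, which a density time series may not satisfy. All names and numbers are illustrative.

```python
import math
import random


def rmse(model, obs):
    """Root mean square of the model-minus-observation differences."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))


def percent_reduction(rmse_reference, rmse_new):
    """Percentage by which rmse_new is lower than rmse_reference."""
    return 100.0 * (rmse_reference - rmse_new) / rmse_reference


def bootstrap_ci(model, obs, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for the RMSE (assumed method)."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    n = len(obs)
    samples = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample pairs with replacement
        samples.append(rmse([model[i] for i in idx], [obs[i] for i in idx]))
    samples.sort()
    tail = (1.0 - level) / 2.0
    lower = samples[int(tail * n_boot)]
    upper = samples[min(int((1.0 - tail) * n_boot), n_boot - 1)]
    return lower, upper


# Illustrative numbers only, not values from the study.
obs = [1.0, 2.0, 3.0, 4.0]
original_run = [3.0, 4.0, 5.0, 6.0]   # constant bias of +2.0
restarted_run = [1.5, 2.5, 3.5, 4.5]  # constant bias of +0.5
reduction = percent_reduction(rmse(original_run, obs), rmse(restarted_run, obs))
interval = bootstrap_ci(restarted_run, obs)
```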

The RMSE for each TIE-GCM MME run as well as the original TIE-GCM run
compared to the CHAMP observations are shown in
Table

Figures

Modified Taylor diagram for NRLMSISE-00 (MSIS), TIE-GCM, GITM, and
for TIE-GCM using the MMEs (equal and all-time weighted) for its initial
conditions every 6 h, compared with the CHAMP observations. Details of
how to read the diagram are described in Fig.

Top panel shows the neutral density from the CHAMP observations and the original TIE-GCM run. The subsequent panels then show the CHAMP observations with each of the new TIE-GCM outputs using the MMEs as the initial conditions every 6 h.

Modified Taylor diagram for NRLMSISE-00 (MSIS), TIE-GCM, GITM, and
for TIE-GCM using the MMEs (equal and all-time weighted) for its initial
conditions every 6 h, compared with the CHAMP observations. N.B. The
two markers for the MMEs overlap each other. Details of how to read the
diagram are described in Fig.

Top panel shows the neutral density from the CHAMP observations and the original TIE-GCM run. The subsequent panels then show the CHAMP observations with each of the new TIE-GCM outputs using the MMEs as the initial conditions every 6 h.

Finally Figs.

It has been shown that the use of the MME as the initial conditions in
TIE-GCM improves the model's forecast skill considerably during solar minimum.
The RMSE is reduced by approximately 60 % (

The work presented in this study shows the possibility of using multi-model
ensembles (MMEs) to enhance the forecast skill of thermospheric models. Three
models were used: an empirical model (NRLMSISE-00) and two physics-based
models (TIE-GCM and GITM). The models' output densities have been compared
with density fields derived from CHAMP; model performance relative to the
observations varies with the test scenario. To
improve the density estimation, an MME averaging technique has been applied
and tested. Two approaches for the MME were used: a simple average MME, where
all models have the same weight, and a weighted MME, where each model is
weighted according to its skill. Three different test scenarios have been
used, two during solar minimum and one during solar maximum. The results show
a significant improvement in both solar minimum cases. The MME was then used
to initialize one of the physics-based models (TIE-GCM) to try to improve
its forecast skill. During the solar minimum test scenarios, using the MME to
initialize TIE-GCM shows a reduction in RMSE in neutral density of

The results of this study show that the physics models suffer from large
biases, as was discussed in Sect.

Figure

A number of improvements could be implemented in generating the MME. First,
a separate “training” data set should be used to generate the model weights
to make a fairer test. A weighting scheme that varies based on longitude,
latitude, height, and time, could also be implemented, as in

The CHAMP
data were collected from

This research was conducted, in part, under
the Integrated Modelling of Perturbations in Atmospheres
for Conjunction Tracking (IMPACT) project at Los Alamos National Laboratory. More information is available at