As an alternative to using the standard multi-model ensemble (MME) approach to combine the output of different models to improve prediction skill, models can also be combined dynamically to form a so-called supermodel. The supermodel approach enables a quicker correction of the model errors. In this study we connect different versions of SPEEDO, a global atmosphere-ocean-land model of intermediate complexity, into a supermodel. We focus on a weighted supermodel, in which the supermodel state is a weighted superposition of different imperfect model states. The estimation, or “training”, of the optimal weights of this combination is a critical aspect in the construction of a supermodel. In our previous works two algorithms were developed: (i) a cross pollination in time (CPT)-based technique and (ii) a synchronization-based learning rule (synch rule). Those algorithms have so far been applied under the assumption of complete and noise-free observations. Here we go beyond this assumption and consider the more realistic case of noisy data that do not cover the full system's state and are not taken at each model's computational time step. We revise the training methods to cope with this observational scenario, while still being able to estimate accurate weights. In the synch rule an additional term is introduced to maintain physical balances, while in CPT nudging terms are added to let the models stay closer to the observations during training. Furthermore, we propose a novel formulation of the CPT method allowing the weights to be negative. This makes it possible for CPT to deal with cases in which the individual model biases have the same sign, a situation that hampers constructing a skillful weighted supermodel based on positive weights. With these developments, both CPT and the synch rule have been made suitable to train a supermodel consisting of state-of-the-art weather and climate models.

Climate models are continuously improving over time. This is made evident by the succession of the Coupled Model Intercomparison Project (CMIP), which is currently in its sixth stage

Given a set of imperfect models, one can combine them so that their combination has a greater forecast skill than each individual model independently. A common approach is to use the multi-model ensemble (MME)

Along this line, in the supermodel approach models are combined during the simulation by sharing their own tendencies or states with each other, and not just their outputs as with the MME. This amounts to creating a new virtual model, the supermodel, that can potentially have better physical behavior than the individual models. By combining the models dynamically into a supermodel, model errors can be reduced at an earlier stage, potentially mitigating error propagation and correcting the dynamics. This is particularly helpful since the climate system is not linear, which causes initial errors to spread over different variables and regions. The simulated climate statistics of the supermodel are therefore expected to be superior to those obtained from the combination of biased models. The supermodel not only improves the statistics of the simulated climate as in the MME, it can also give an improved model trajectory if the models are adequately synchronized. This could be essential to predict a specific sequence of weather or climate events. Given that the individual model trajectories in an MME are “free” to evolve according to each of the model dynamics, their averaging may result in an overall cancellation of the individual variabilities.

The supermodel approach was originally developed using low-dimensional dynamical systems

The SPEEDO experiments in

The paper is structured as follows. Section

This section recalls the general structure of a weighted supermodel as defined in

In

This leads us to redefine a weighted supermodel by combining individual models at every arbitrary

The coupled model SPEEDO consists of an atmospheric component (SPEEDY), that exchanges information with a land (LBM) and an ocean-sea-ice component (CLIO). Detailed descriptions of SPEEDO can be found in

The SPEEDO equations can be formally and compactly written as

A supermodel based on SPEEDO is formed by combining imperfect atmosphere components SPEEDY through a weighted superposition of the states of the imperfect models. All imperfect atmospheres are each coupled to the same ocean and land model. Figure

Schematic representation of the SPEEDO climate supermodel based on two imperfect atmospheric models. The two atmospheric models exchange water, heat and momentum with the perfect ocean and land model. The ocean and land model send their state information to both atmospheric models. The atmospheric models exchange state information in order to combine their states
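The weighted superposition of atmospheric states described above can be illustrated with a minimal sketch. It is not the SPEEDO implementation: the function name `supermodel_state`, the array layout and the toy numbers are assumptions made for illustration only, and the weights are assumed to sum to 1 over the models for every variable.

```python
import numpy as np

def supermodel_state(states, weights):
    """Weighted superposition of imperfect model states (illustrative only).

    states  : array-like, shape (n_models, n_vars), one row per imperfect model
    weights : array-like, shape (n_models, n_vars), one weight per model and variable
    """
    states = np.asarray(states, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Assumption: for every variable the weights sum to 1 over the models.
    if not np.allclose(weights.sum(axis=0), 1.0):
        raise ValueError("weights must sum to 1 over the models")
    return (weights * states).sum(axis=0)

# Two hypothetical atmospheres with three state variables each (toy numbers).
x1 = np.array([2.0, 10.0, -1.0])
x2 = np.array([4.0, 6.0, 1.0])
w1 = np.array([0.25, 0.5, 0.5])   # weights of model 1; model 2 gets 1 - w1
x_super = supermodel_state([x1, x2], [w1, 1.0 - w1])
```

In this toy setting the combined state `x_super` would play the role of the multi-model weighted average that is passed to the shared ocean and land components.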

All the atmospheric components of the individual imperfect models receive the same state information from ocean and land. Nevertheless, each atmosphere calculates its own water, heat and momentum exchange. Conversely, the ocean and land components receive the multi-model weighted average of the atmospheric states. This supermodel construction is inspired by the interactive ensemble approach originally devised by

We can now write the SPEEDO weighted supermodel equations as

During the training for the supermodel based on SPEEDO, we regard the atmospheric model with standard parameter values as truth

Schematic representation of the SPEEDO supermodel training

We follow a similar experimental setup as in the precursor study by

Parameter values of perfect and imperfect models.

The impact of perturbing parameters on the models' climate (i.e., their long term behavior) is assessed on the basis of 40-year long simulations initiated on 1 January 2001. Table

Global mean average difference between the imperfect models and the perfect model, calculated over the last 30 years of the simulation.

The synch rule was originally conceived for parameter optimization in

We have extended the use of the synch rule to the training of supermodels

The sensitivity of the training results to the nudging strength

The CPT learning approach is based on an idea proposed by

The training phase of CPT starts from an observation. From the same initial state, the imperfect models run for a predefined cross pollination time,

After the training, a CPT trajectory is obtained as a combination of different imperfect models, and we count how often each model has produced the best prediction of a particular component of the state vector during the training. These frequencies are then used to compute weights
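The step from selection frequencies to weights can be sketched as follows. This is a simplified illustration under stated assumptions, not the code used in the experiments: the function `cpt_weights`, the use of the absolute error as distance and the toy data are hypothetical, and the sketch only covers the counting, not the restart of all models from the best state after each cross pollination time.

```python
import numpy as np

def cpt_weights(predictions, observations):
    """Turn 'closest model' counts into weights (illustrative only).

    predictions  : shape (n_steps, n_models, n_vars), model predictions at the
                   end of each cross pollination interval
    observations : shape (n_steps, n_vars), observations at the same times
    """
    predictions = np.asarray(predictions, dtype=float)
    observations = np.asarray(observations, dtype=float)
    # Assumption: distance to the observation is the absolute error per variable.
    errors = np.abs(predictions - observations[:, None, :])
    best = errors.argmin(axis=1)                      # shape (n_steps, n_vars)
    n_models = predictions.shape[1]
    counts = np.stack([(best == m).sum(axis=0) for m in range(n_models)])
    # Normalize the counts so that the weights sum to 1 for every variable.
    return counts / counts.sum(axis=0)

# Toy example: 4 intervals, 2 models, 1 variable, observations equal to zero.
obs = np.zeros((4, 1))
pred = np.array([[[0.1], [0.3]],
                 [[0.2], [0.1]],
                 [[0.5], [0.6]],
                 [[0.1], [0.4]]])
weights = cpt_weights(pred, obs)   # model 1 is closest in 3 of the 4 intervals
```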

The CPT training method has been derived from a linear model assumption. Suppose we have two imperfect models with differential equations:

Weather and climate models are chaotic instead of linear. The key to success is, however, not the dynamical nature of the models, i.e., whether they are linear or nonlinear, but the trade-off between the data sampling time and the regime of evolution of the differences among the individual model trajectories in between subsequent data times. If enough observations are available during training, the difference between the imperfect models between subsequent observation times can be described as quasi-linear, therefore still making it possible for the CPT training to work well. The obtained weights will not be perfect and possibly not as optimal as weights obtained with a cost function minimization approach. On the other hand, the results in

In

In

For long training periods and/or noisy data, an iterative method might not be enough to let the CPT trajectory adequately follow the observations during training. A simple solution is to use a form of nudging towards the observations, similar to what is done in the synch rule. The equations for the CPT trajectory

Before we start training the supermodel, we need to decide when, where and how to let the models exchange their information in order to create a weighted supermodel.

Following

As long as there are enough observations to capture the global behavior of the different models, spatially sparse observations are not expected to be an issue when constructing a weighted supermodel. Given that we focus here on the data sparsity in time, in the experiments we assume that all grid points are observed.

The prognostic variables exchanged between models are temperature, vorticity and flow divergence. The weights for the fluxes from atmosphere to ocean and to land are given by the average of the weights for the three prognostic variables. The SPEEDO time step during training is set to

Following

The codes for both training methods of CPT and the synch rule in the experiments in this paper are integrated into the SPEEDO code. After the individual models have made their individual time steps, their states are exchanged between the models with coupling routines. Once all models have shared their knowledge, they can calculate the new supermodel state and the update of the weight according to the training method. The SPEEDO CPT and synch rule supermodel training code is available in

In

We adapt the synch rule such that the weights are constrained to sum to 1. This is achieved by using the tendency of the individual imperfect model

Too little nudging towards the observations during training may lead to large errors between the imperfect models and the observations. In this case, the updates of the weights might go in a different direction than anticipated. The imperfect models and the observations might be in different phases, resulting in a converse sign of the synchronization error

In the first experiment of

Weights for the supermodel trained by the synch rule with an observation available at every second time step and the same amount of nudging towards the observations as in

When the weights converge towards stable values as in Table

There can also be too much nudging towards observations. In this case, a link with data assimilation can be made, where one has to find a middle ground between noisy observations and the model. Too much nudging towards the observations during training can result again in a converse sign of the synchronization error

In this section we assess to what extent observations can be noisy and sparse in time before the CPT or the synch rule training methods are no longer able to produce weights close to the optimum. To systematically evaluate this, we choose 4 different observation frequencies

Weights for the supermodel trained by CPT and the synch rule. The standard deviation over the year (CPT) or the standard deviation over the last 10 weeks of training (synch rule) is given in parentheses. The weights are only given for model 1, for model 2 the weight equals 1

Figure

Weights for the supermodel trained by CPT and the synch rule. The horizontal lines (continuous for model 1, dashed for model 2) indicate the weights obtained by CPT and synch rule training in

From

Forecast quality as measured by the RMSE between the truth and a model with a perturbed initial condition. The control is the difference between the perfect model and the perfect model with a perturbed initial condition. The pink and orange lines show the supermodels trained by perfect observations (s-CPT/synch perf obs) and the supermodels trained by observations available every 24 h (s-CPT/synch noisy obs), respectively, with the highest noise level used in this paper.
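For reference, an RMSE-based forecast quality measure of this kind can be computed along the following lines. This is a generic sketch: the function name `rmse`, the array layout (time along the first axis, grid points along the last) and the toy values are assumptions, and no area weighting is applied.

```python
import numpy as np

def rmse(forecast, truth):
    """Root mean square error over the last axis (e.g., over grid points)."""
    forecast = np.asarray(forecast, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return np.sqrt(np.mean((forecast - truth) ** 2, axis=-1))

# Toy example: 3 forecast times, 4 grid points, truth equal to zero.
truth = np.zeros((3, 4))
forecast = np.array([[0.0, 0.0, 0.0, 0.0],
                     [1.0, -1.0, 1.0, -1.0],
                     [2.0, 2.0, -2.0, -2.0]])
error_growth = rmse(forecast, truth)   # one RMSE value per forecast time
```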

Imperfect models 1 and 2 complement each other in important physical variables such as temperature and wind. Model 1 tends to overestimate their global average values, while model 2 underestimates them. Together they form a convex hull

CPT training does not automatically produce negative weights, since the weights are based on the frequency by which the imperfect models are chosen. Nevertheless, CPT training can give negative weights too, although with boundary restrictions. In the standard CPT training, one chooses whether one of the imperfect models is the closest to the observations, or in addition, whether the supermodel is closest in the iterative method. To obtain negative weights one can also choose a predefined combination of the imperfect models, for example:

In this experiment we choose

Weights for the supermodel trained by CPT allowing for negative weights. The standard deviation over the year is given in parentheses.
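Why negative weights help when both model biases share a sign can be seen from a small worked example. It rests on a strong simplification that is an assumption of this sketch: each model is reduced to a single scalar bias, and the supermodel bias is taken to be the weighted sum of the individual biases with weights summing to 1. The function `zero_bias_weights` and the bias values are hypothetical.

```python
def zero_bias_weights(bias1, bias2):
    """Weights (w1, w2) with w1 + w2 = 1 and w1*bias1 + w2*bias2 = 0.

    Illustrative only: assumes a scalar, linearly combining bias per model.
    """
    if bias1 == bias2:
        raise ValueError("equal biases cannot be cancelled by weights summing to 1")
    w1 = bias2 / (bias2 - bias1)
    return w1, 1.0 - w1

# Both hypothetical models are too warm, by 1 and 3 units respectively.
b1, b2 = 1.0, 3.0
w1, w2 = zero_bias_weights(b1, b2)
combined_bias = w1 * b1 + w2 * b2
```

With weights restricted to the interval from 0 to 1, the combined bias in this toy case stays between 1 and 3. Removing it requires extrapolating beyond the better model, which gives the worse model a negative weight.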

The statistics of a 40-year supermodel run with the weights from Table

Global mean average difference between the supermodel with negative weights and the perfect model, calculated over the last 30 years of the simulation.

We have shown the potential of the CPT and synch rule training methods to train a weighted supermodel on the basis of noisy and sparse time observations. The CPT training method is based on “crossing” different model trajectories and thus generating a larger ensemble of possible trajectories. The synch rule adapts the weights to the individual models on the fly during the training, such that the supermodel synchronizes with the observations. In our previous work

To handle noisy and sparse time data, we use nudging in both methods: this choice proved to be pivotal to ensure correct updates of the weights. For the synch rule the nudging strength was increased while for CPT the nudging term was not present in the original formulation and has been introduced here.
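The effect of a nudging term can be illustrated with a scalar toy model. The sketch below is not the SPEEDO formulation: the function `nudged_run`, the forward Euler step, the choice to hold the most recent observation between observation times, and all parameter values are assumptions made for illustration.

```python
import numpy as np

def nudged_run(x0, obs, tendency, strength, dt, obs_every):
    """Integrate dx/dt = tendency(x) + strength * (obs - x) with forward Euler.

    Assumption: a new observation arrives every `obs_every` steps and the most
    recent one is held fixed in between.
    """
    x = float(x0)
    last_obs = obs[0]
    trajectory = np.empty(len(obs))
    for k in range(len(obs)):
        if k % obs_every == 0:
            last_obs = obs[k]          # a new, noisy observation arrives
        x = x + dt * (tendency(x) + strength * (last_obs - x))
        trajectory[k] = x
    return trajectory

def imperfect_tendency(x):
    # Hypothetical imperfect model that wrongly relaxes towards zero.
    return -x

rng = np.random.default_rng(0)
truth = np.ones(200)                                   # toy "true" state
obs = truth + 0.05 * rng.standard_normal(truth.size)   # noisy observations
free = nudged_run(1.0, obs, imperfect_tendency, strength=0.0, dt=0.01, obs_every=4)
nudged = nudged_run(1.0, obs, imperfect_tendency, strength=9.0, dt=0.01, obs_every=4)
```

In this toy case the free run drifts away from the truth, while the nudged run stays close to it. The nudged state is still offset from the truth and inherits part of the observation noise, which hints at the trade-off between too little and too much nudging.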

For the synch rule it is necessary that the sum of the weights remains equal to 1 in order to maintain physical balances. In the noise-free framework of

The CPT and the synch rule both update the weights based on the difference between the model trajectories and the observations, and on the difference between the imperfect model tendencies. Despite using similar ingredients, CPT and the synch rule give different results for sparse and noisy observations. In particular, the synch rule trajectory seems to diverge slightly earlier from the observations than the CPT. A possible reason could be the different use of the models' tendencies. With CPT, the imperfect models run unconstrained from the data in the period,

Despite the application of the iterative method and nudging, the CPT and synch rule may still struggle to stay very close to the observations. To increase the chance of obtaining a proper trajectory, one could work with an augmented ensemble of trajectories. This ensemble could consist of trajectories starting from slightly different initial conditions, or trajectories that emerge from a model near the closest model to an observation. One could make a comparison with the particle filter method (see e.g.,

Both training methods seem in principle more suitable for short rather than longer timescales, since for both training rules it is important that the imperfect models stay close enough to the observations. For longer timescales this can be difficult. Despite the action of the nudging, the models can drift out of phase with the data as time evolves. If the observations are lost, the “closest” model in the CPT training is not necessarily the one that contributes most to improving the supermodel dynamics. If during synch rule training the supermodel loses the observations, a new, non-optimal equilibrium for the weights can be found, as we have seen in Sect.

Until now the distance between the models, and between the models and the data, has been measured by the RMSE. If one is training a supermodel for improved skill on longer timescales, reproducing specific climatological features may be more important than a small RMSE. In that case the distance between observations and models can be defined in a different way. For example, if the imperfect models suffer from an erroneous double intertropical convergence zone (ITCZ), one can increase the weight of the model which is on average closer to a single ITCZ. Additionally, one can define different weights for different periods of time, for example seasonally dependent weights. Despite these possibilities in adapting the training methods, there are some conditions that need to be fulfilled when CPT or the synch rule are used on longer timescales. The methods only work if the models can compensate for each other. For example, when both models have been spun up for a sufficient amount of time and are stable in state space, neither CPT nor the synch rule can give useful weights. In the case of CPT, the model that is on average closest to the observations will be repeatedly chosen. For the synch rule, the average model tendency will be zero over a sufficient amount of time, hence there will be no update of the weights on average over time. Therefore, for both training methods the imperfect models must not already reside on their own attractor, and the tendency towards their attractor needs to be visible.

To make the training methods suitable for state-of-the-art models, it needs to be taken into account that such models can differ in grid resolution and time step. In this paper, the imperfect model states are replaced during training in both methods: in the synch rule by the new supermodel state, and in CPT by the state of the closest model. To apply the training methods to state-of-the-art models, techniques from data assimilation can be used to combine the states in a dynamically consistent manner

The general form of the synch rule as given in

The exact version of the SPEEDO model code with the CPT and synch rule training integrated that is used to produce the results used in this paper is archived on Zenodo (

FS conceived the study, carried out the research and led the writing of the paper. AC provided input for the interpretation of the results and the writing.

The contact author has declared that neither they nor their co-author has any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This research has been supported by the H2020 European Research Council (grant no. STERCP (648982)) and Trond Mohn Foundation under project number BFS2018TMT01. The preparation of this paper was partially supported under NSF Grant 2015618.

Alberto Carrassi has been funded by the UK Natural Environment Research Council award NCEO02004.

This paper was edited by Julia Hargreaves and reviewed by two anonymous referees.