Introduction
The provision of reliable seasonal forecasts is an important area of climate
science, and understanding the limitations and quantifying the uncertainty of
such forecasts remains a key challenge. Operational seasonal
forecasting, although once limited to a handful of research centres, is now a
regular activity across the globe. Much recent focus has been given to the
skill and reliability of seasonal climate predictions. Dynamical
(process-based) forecast systems are arguably the most important tool in
producing predictions of seasonal climate at continental and regional scales.
Such systems are based on numerical models that represent dynamical processes
in the atmosphere, ocean and land surface in addition to the linear and
non-linear interactions between them. However, the development of dynamical
systems is a continuous challenge; climate models are inherently complex and
computationally demanding and often contain considerable errors and biases
that limit model skill in particular regions and seasons.
As an alternative to dynamical forecast systems, empirical approaches aim to
describe a known physical relationship between regional-scale anomalies in a
target variable (the predictand) – say, temperature or precipitation – and
preceding climate phenomena (the predictors). In its simplest form, an
empirical forecast may be based on persistence in which observations of a
given variable at some lead time are taken as the forecast for that variable.
Such forecasts have frequently performed better at short lead times than
those simply prescribed by the long-term climatology, particularly so in the
tropics. More sophisticated statistical methods include analog forecasting
and regression-based techniques, which may in
turn take predictive information from spatial patterns using, for instance,
empirical orthogonal functions (EOFs) (e.g. ),
maximum covariance analysis (MCA) (e.g. ), and linear
inverse modelling (LIM) (e.g. ). Empirical predictions
for the phase and strength of the El Niño–Southern Oscillation (ENSO) have
historically shown comparable skill to those produced by dynamical systems
(e.g. ). Additionally, an
inherent advantage of empirical methods is the ease with which knowledge of
climate variability gained from analysis of up-to-date observations can be
incorporated into a prediction system , which in turn
facilitates the development of new methodologies and statistical techniques
.
Empirical forecasts serve as a baseline for dynamical models and can also be
used to improve forecasts by limiting the effects of dynamical model
biases. However, differences in the development and output of dynamical and
empirical statistical approaches make systematic comparison troublesome, and
understanding the relative skill of each forecast type is challenging. Recent
attempts have been made to develop empirical benchmark systems for
multiple variables, such as land and sea surface temperature, on decadal
timescales (e.g. ), concluding that the usefulness of
such systems merits further development. While comparison of dynamical and
empirical systems for seasonal forecasts is not novel, a systematic global
comparison for multiple variables, including probabilistic measures, has been
lacking. A key potential benefit of such comparison is the identification of
regions where empirical models are skilful and may be able to provide useful
forecast information to complement the output of dynamical systems.
Supplementing dynamical forecasts with empirical forecasts is of great
importance in situations where dynamical systems are known to have
weaknesses. It has also been shown that combining the output of empirical and
dynamical systems can produce marked improvement over single-system forecasts
(e.g. ).
A fundamental criticism of empirical systems is the question of their
applicability in a future, perturbed climate. In other words, to what extent
will the predictor–predictand relationships underpinning a statistical model
remain stationary under climate change? It has been found that, within
the statistical uncertainties, no changes could be detected in ENSO
teleconnections. It has also recently been noted that the temporal
evolution of seasonal climate should be considered as forced not only by the
internal variability of the climate system but also by changes in
concentrations of greenhouse gases and aerosols as a result of anthropogenic
activities. Such external forcings are considered in climate change
simulations and also to an increasing extent in the field of decadal
prediction (e.g. ). Current seasonal forecast systems
now include these forcings , but the resulting
trends are sometimes not realistic.
Here we present and validate a simple empirical system for predicting
seasonal climate across the globe. The prediction system, based on multiple
linear regression, produces probabilistic forecasts for temperature and
precipitation using a number of predictors based on well-understood physical
relationships. The global equivalent CO2 concentration is an indicator of the climate
change signal and is used as the primary predictor in all forecasts.
Additional predictors describing large-scale modes of variability in
the climate system, starting with ENSO, and local-scale information are
subsequently selected on the basis of their potential to provide additional
predictive power. The system presented will have two purposes: (a) to serve
as a benchmark for assessing and comparing the skill of dynamical forecast
systems, and (b) to act as an independent forecast system in combination with
predictions from dynamical systems. Key to achieving these goals will be the
system's implementation in a quasi-operational framework with empirical
forecasts made on a monthly basis and the availability of a set of hindcasts.
The method implemented here constitutes a relatively simple approach to
empirical forecasting. The global and automated nature of the prediction
system calls for the underlying empirical method to be parsimonious in terms
of the predictive sources used to construct it. The statistical model and the
selection of predictors are thus based on physical principles and processes
to the fullest extent possible, so as to extract the maximum predictive power
first from the long-term trend associated with the climate change signal and
then from as few additional predictors as necessary, thereby minimising the
risk of overfitting. The final system will also be
sufficiently flexible to facilitate its future development. Such development
may involve inclusion of additional predictors should more complete and
reliable data sets become available, or the application of the system to
alternative predictands including those relating to the magnitude and
frequency of extreme events.
Producing empirical forecast output in a format similar to that of dynamical systems is
crucial when designing a framework for robust comparison. A weakness of
current dynamical–empirical system comparison is the general lack of a common
set of validation measures. Whereas dynamical systems inherently provide
output in the form of ensemble forecasts, which may be validated in
probabilistic terms, validation of empirical systems does not always extend
beyond deterministic measures, such as bias, root-mean-squared error (RMSE) and correlation
. Here, the uncertainties are explicitly parametrised as an
ensemble of forecasts and we employ a rigorous validation framework designed
to assess both the deterministic and probabilistic aspects of the forecast
system.
The remainder of the paper is structured as follows. Section
describes the prediction system in full, including the
observational data used for empirical model fitting and validation. An
analysis of the potential usefulness of the predictors is given in
Sect. . The skill of the prediction system is then assessed in
Sect. with a discussion and outlook given in Sect. .
Prediction system outline
Key to achieving the goals set out in Sect. is the
development of an automated forecast system that can be applied globally and,
in principle, for any number of predictands. For these reasons, the
regression-based prediction system developed here is relatively simple in
comparison with more sophisticated statistical models, with emphasis given to
a basis of physical processes and the avoidance of overfitting.
Our system incorporates a multiple linear regression approach for estimating
seasonal (3-month) surface air temperature (SAT) and precipitation (PREC)
as a function of global and local atmospheric and oceanic fields. The
approach used assumes the predictand time series x to consist of two
components:
x=xext+xint,
where xext is the response to externally forced low
frequency variability associated with anthropogenic activity and
xint represents the internal variability independent of changes
in external forcing . We seek first to utilise the
predictive information in xext which is assumed to be linearly
dependent on the global CO2-equivalent concentration (CO2EQV), based on
historical estimates until 2005 and according to Representative Concentration
Pathway (RCP) 4.5 thereafter, which constitutes the net forcing of greenhouse
gases, aerosols, and other anthropogenic emissions .
Secondly, we seek to identify a set of predictors that best represents
xint. The predictand time series x may be modelled as a
function of a set of predictors:
x=α+βC+∑i=1n(ΦiFi)+ϵ,
where C is CO2EQV at a given lead time and F is a set of n
additional predictors at the same lead time that describes xint.
The regression parameters β and Φ are those required to transform
C and F, respectively, α is the constant regression term and
ϵ is the set of residuals specific to the model fit. In this case,
predictors are taken from the previous 3-month season at a lead time of
1 month (e.g. the forecast for the season March–April–May is estimated
using predictors from November–December–January). An independent regression
model is calibrated at each grid point. Whereas CO2EQV is included as a
predictor by default, all additional predictors are included on the basis of
their predictive potential, which is determined by a predictor selection
procedure prior to model fitting. In the remainder of this section we
(a) identify potential predictors and describe the sources of both predictor and
predictand data (Sect. ) and (b) provide further
details on the predictor selection approach, the model fitting procedure and
the validation framework (Sect. ).
Potential predictors
As additional predictors F, we consider first of all variables that
describe large-scale modes of variability. ENSO is the most important of
these in terms of its contribution to the skill of seasonal predictions,
particularly in the tropics
. Circulation
and precipitation patterns in the tropical Pacific associated with ENSO sea surface temperature (SST)
anomalies are subsequently linked to climate variability in other parts of
the globe . In addition, modes of variability in other
tropical oceans, including the tropical Atlantic and Indian basins, are known
to contribute substantially to variability in SAT and PREC, particularly in
surrounding regions . Many such phenomena are linked in
some way to ENSO, although variability in the Indian Ocean Dipole (IOD) is
known to occur independently . Similarly, the Pacific Decadal
Oscillation (PDO), defined as the leading empirical orthogonal function
of North Pacific monthly SST anomalies, is considered as a representation of
variability on interdecadal timescales that is not otherwise apparent in
interannual ENSO variability . Drought occurrence in the
United States is known to be linked to the phase of both PDO and Atlantic
Multidecadal Oscillation (AMO). Atmospheric anomalies, including
troposphere–stratosphere interactions, are also known to have predictive
potential. The Quasi-Biennial Oscillation (QBO)
has recently been considered in a multiple regression model for predicting
European winter climate . With this in mind, the following
indices are considered as predictors: NINO3.4 (representative of ENSO), PDO,
AMO, IOD and QBO. The system is designed to be flexible enough for the
inclusion of additional predictors in the future.
Description of predictor variables and their sources.
CO2EQV: CO2-equivalent concentrations
NINO3.4: Calculated from SST fields from HadISST
PDO: University of Washington (http://research.jisao.washington.edu/pdo/)
QBO: At 30 hPa from the reconstruction of
AMO: Calculated by ; based on HadSST
IOD: Calculated from SST fields from HadISST
LSST: HadSST3
CPREC: GPCC Full Data Reanalysis version 6
External forcing and global modes of variability are not the only source of
skill in seasonal forecasts. Many studies, including those based on dynamical
systems, have found links between local climate and variations in preceding
nearby climate phenomena (e.g. ). The simplest of these is persistence; that
is, the value of the predictand (either
SAT or PREC) for the same location at some lead time. Here, we seek to elicit
predictive information from persistence (PERS) and other variables that vary
from grid point to grid point in addition to the set of large-scale modes of
variability described above. For coastal locations in particular, we seek to
maximise the potential of short-term memory contained within neighbouring sea
surface temperatures to provide greater predictability than PERS at the
specified lead time. We derive a local sea surface temperature (LSST) index
for each predictand grid cell, defined as the mean of the k nearest grid
cells containing SST information. Here, k=5 throughout the analysis
although this value could of course be altered or optimised for
region-specific analysis. Finally, as a proxy for soil moisture, which has
been shown to impact on local temperature (e.g. ), we also
consider accumulated rainfall (CPREC) as a potential predictor.
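The LSST computation described above can be sketched as follows; a minimal illustration assuming a regular latitude–longitude grid with NaNs marking cells without SST information (the function name and grid handling are illustrative, not taken from the paper):

```python
import numpy as np

def lsst_index(sst, lat, lon, point_lat, point_lon, k=5):
    """Local SST (LSST) index for one predictand grid cell: the mean of
    the k nearest grid cells that contain valid SST data (NaN = no data).
    The paper uses k = 5; the grid handling here is illustrative."""
    lat2d, lon2d = np.meshgrid(lat, lon, indexing="ij")
    valid = ~np.isnan(sst)
    lats, lons, vals = lat2d[valid], lon2d[valid], sst[valid]

    def to_xyz(la, lo):
        # Unit-sphere coordinates, so chord distance orders cells correctly
        la, lo = np.radians(la), np.radians(lo)
        return np.stack([np.cos(la) * np.cos(lo),
                         np.cos(la) * np.sin(lo),
                         np.sin(la)], axis=-1)

    dist = np.linalg.norm(to_xyz(lats, lons) - to_xyz(point_lat, point_lon),
                          axis=-1)
    return vals[np.argsort(dist)[:k]].mean()
```

As noted in the text, k could be altered or optimised for region-specific analysis simply by changing the keyword argument.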
Further details of the sources of predictor data are given in Table .
Our list of predictors is not exhaustive. Much recent
work has sought to identify predictability arising from the extent of sea ice
and snow covered land, the reflective and insulative attributes of which are
relevant for SAT and PREC in several regions of the extra-tropics (e.g.
). However, these
variables are not considered for the present system due to the absence of
sufficiently long and reliable data sets, although some effects are
effectively captured by persistence. The design of the prediction system
facilitates inclusion of additional predictors should high quality
observational or reanalysis data become available.
Model fitting and validation
Global observational data sets provide the predictand (SAT and PREC) fields
required for model calibration and validation. SAT is taken from the
reconstruction of the Hadley Centre–Climatic Research
Unit Version 4 (HadCRUT4) , which uses kriging to account
for missing data in unsampled regions. PREC is taken from the Global
Precipitation Climatology Centre (GPCC) Full Data Reanalysis version 6
for the period 1901–2010 combined with additional data for the
period 2011–2013 taken from the GPCC monitoring product following bias
correction.
Correlation between SAT and the set of predictors with a 1 month
lead time (1961–2013) for (a) MAM, (b) JJA,
(c) SON, and (d) DJF. Correlation between CO2EQV is shown
in the top left panel; subsequent panels show correlation between
predictand–predictor pairs following removal of the CO2EQV trend. Stippling
is used to indicate significance at the 95 %
level.
Analysing the degree of additional predictive skill offered by each predictor
will form an important precursor to the implementation of the system. A
two-step predictor selection procedure is used to determine the fewest
predictors necessary to provide the greatest predictive skill. The
selection procedure may be considered “offline” in the sense that it is
implemented prior to model fitting. In the first step, global maps of linear
correlation between predictand–predictor pairs form a basis for a physical
understanding of the factors governing variability. Predictors that show good
potential and do not exhibit collinearity with other predictors are included
in the second step: the selection of predictors to be passed to the empirical
forecast model itself.
Correlation between PREC and the set of predictors (1961–2013) for
(a) MAM, (b) JJA, (c) SON, and (d) DJF.
As in Fig. , correlation with CO2EQV is shown in the
top left panel; subsequent panels show correlation between
predictand–predictor pairs following removal of the CO2EQV trend. Stippling
is used to indicate significance at the 95 %
level.
To achieve this, the linear trend associated with CO2EQV is first of all
removed from both the predictand x and the set of predictors F by fitting
the models
x=α1+β1C+ϵx
and
Fi=α2+β2C+ϵFi,
where α1, β1 and α2, β2 are the
respective regression parameters for each model fit and ϵx and
ϵFi are the time series of residuals that equate to the
detrended predictand and predictors, respectively. Correlation is performed
between ϵx and each of the N predictors within the set
ϵFi (where i = 1, 2, …, N). Predictors that exhibit
significant (at the 90 % level) correlation are identified. The two-step
approach is designed to avoid overfitting, which would lower skill scores,
and to ensure that the empirical model is built on physical principles to the
fullest extent. The first step is to an extent qualitative and undertaken
only once for each predictand, i.e. for each predictand there is an agreed
set of potential predictors independent of season or location. However, the
fully quantitative second step is performed independently at each grid point
and for each season. Following the selection of predictors, all significant
predictors are then entered into a multiple linear regression along with
CO2EQV; Eq. () is thus modified:
x=α+βC+∑i=1k(ΦiFiS)+ϵ,
where FS is the subset of k predictors from F that meet
the significance criteria outlined in the selection procedure. An estimate
for the unknown predictand x^ at forecast time t may be determined:
x^t=α+βCt+∑i=1k(ΦiFitS).
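The two-step procedure can be sketched as follows for a single grid point; a minimal Python illustration in which `fit_empirical_model`, its argument names and data layout are hypothetical (the paper's actual implementation is not specified):

```python
import numpy as np
from scipy import stats

def fit_empirical_model(x, co2, candidates, alpha=0.10):
    """Sketch of the two-step procedure at one grid point: (i) remove the
    CO2EQV trend from the predictand and candidate predictors and keep those
    whose residual correlation with the detrended predictand is significant
    at the 90 % level; (ii) fit a multiple linear regression with CO2EQV
    plus the selected predictors. Names and structure are illustrative."""
    def detrend(y):
        slope, intercept = np.polyfit(co2, y, 1)
        return y - (intercept + slope * co2)

    ex = detrend(x)
    selected = [name for name, f in candidates.items()
                if stats.pearsonr(ex, detrend(f))[1] < alpha]

    # Design matrix: constant term, CO2EQV, then the selected predictors
    X = np.column_stack([np.ones_like(co2), co2] +
                        [candidates[name] for name in selected])
    coefs, *_ = np.linalg.lstsq(X, x, rcond=None)
    residuals = x - X @ coefs  # later resampled for the forecast ensemble
    return coefs, selected, residuals
```

A forecast for time t is then obtained by evaluating the fitted regression with the predictor values observed one season earlier, as in the prediction equation above.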
A key component of the empirical prediction system is the provision of
probabilistic output. The residuals ϵ from the regression fit in
Eq. () are randomly sampled (with replacement) and
subsequently used to generate a forecast ensemble. The jth member of the
ensemble x^ens at forecast time t is thus given by
x^t,jens=x^t+ϵj,
where ϵj is a randomly sampled member of ϵ. Sampling of the residuals is
performed 51 times, reflecting the typical ensemble size of an operational
dynamical forecast system. The ensemble allows
for the calculation of probabilistic skill scores and will provide a basis
for full comparison with the output of dynamical systems. It is anticipated
that future development of the system will consider more complex methods of
ensemble generation.
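The residual-resampling step can be sketched as follows (the function name and fixed seed are illustrative; the seed is for reproducibility only):

```python
import numpy as np

def make_ensemble(x_hat, residuals, n_members=51, seed=0):
    """Probabilistic forecast: add residuals sampled with replacement to
    the deterministic estimate x_hat. 51 members mirror the typical
    operational ensemble size mentioned in the text."""
    rng = np.random.default_rng(seed)
    return x_hat + rng.choice(residuals, size=n_members, replace=True)
```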
The model is calibrated and validated in a hindcast framework using a causal
approach: hindcasts are produced for 1961–2013 using data from 1901 up to
each hindcast start date. The causal approach was chosen instead of a
leave-one-out framework in order to replicate the set of observational data
that would have been available for each hindcast were it produced in real
time. The predictor selection procedure, in addition to being
location-specific, is also implemented independently for each hindcast. In
other words, for a given grid point, a given predictor would only be included
in the regression model for hindcasts with fitting periods during which it
demonstrates predictive potential, allowing for the maximum value to be taken
from predictor information in the fairest way. It is also important to note
that, in setting the earliest hindcast to 1961, we seek to limit the impact
of poor quality available predictand and predictor data in the early 20th
century. Additionally, to ensure robustness, the multiple linear regression
model requires a complete predictand–predictor time series of at least 30
years in the fitting period for a forecast to be produced.
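The causal hindcast loop can be sketched as follows; `fit_func` and its return convention are hypothetical stand-ins for the regression fit described above:

```python
import numpy as np

def causal_hindcasts(years, x, co2, predictors, fit_func, min_years=30):
    """Causal hindcast loop: for each target year, calibrate only on data
    strictly before the hindcast date, and skip the forecast when fewer
    than min_years years are available. `fit_func(x, co2, predictors)` is
    assumed to return a callable mapping (co2_t, predictors_t) to a
    forecast value (a hypothetical convention for this sketch)."""
    hindcasts = {}
    for t, year in enumerate(years):
        if t < min_years:
            continue  # require at least 30 years in the fitting period
        past = slice(0, t)  # only information from before the hindcast date
        model = fit_func(x[past], co2[past],
                         {name: f[past] for name, f in predictors.items()})
        hindcasts[year] = model(co2[t],
                                {name: f[t] for name, f in predictors.items()})
    return hindcasts
```

Because the fitting window ends strictly before each hindcast date, every hindcast uses only the observations that would have been available in real time, as the text requires.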
Both the deterministic and probabilistic aspects of the prediction system
must be systematically validated using a number of measures. Global maps of
correlation between hindcast estimates and observations provide a view on the
degree of representation of temporal variability. Verification scores
originally developed in the context of numerical weather prediction,
including the root-mean-squared error skill score (RMSESS) and the continuous
rank probability skill score (CRPSS) (e.g. ), provide a
quantification of the degree of bias and the skill of the probability
distribution produced by the ensemble, respectively. Such verification
measures are also used to determine skill scores that describe forecast skill
against a reference ensemble forecast. The reference forecast is produced by
random sampling of the climatology, i.e. the observations for each year in
the fitting period.
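The probabilistic verification can be sketched with the standard ensemble CRPS estimator; the skill-score convention (1 = perfect, 0 = no improvement over the climatology reference) follows the description above:

```python
import numpy as np

def crps_ensemble(members, obs):
    """CRPS of one ensemble forecast against a scalar observation, using
    the standard estimator  mean|e_i - y| - 0.5 * mean|e_i - e_j|."""
    e = np.asarray(members, dtype=float)
    return (np.mean(np.abs(e - obs))
            - 0.5 * np.mean(np.abs(e[:, None] - e[None, :])))

def skill_score(score, score_ref):
    """Skill score against a reference (climatology) forecast:
    1 indicates a perfect forecast, 0 no improvement, < 0 worse."""
    return 1.0 - score / score_ref

# RMSESS and CRPSS follow the same pattern, with the reference scores
# computed from an ensemble randomly drawn from the observed climatology:
#   RMSESS = 1 - RMSE(hindcast) / RMSE(climatology)
#   CRPSS  = 1 - mean CRPS(hindcast) / mean CRPS(climatology)
```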
Analysis of potential predictors
Surface air temperature
The surface air temperature (SAT) shows a clear trend almost everywhere,
which is assumed to be proportional to the forcing of greenhouse gases,
described by CO2EQV. Separate spatially varying aerosol forcings have not yet
been implemented. As mentioned in Sect. , this trend is
treated differently from the other predictors in the sense that it is always
included in the empirical model; other predictors are considered only in
cases where they appear to add value (following step one of the predictor
selection process). Figure shows seasonal
correlation between SAT and CO2EQV in the top left panels. Subsequent panels
show the correlation derived from predictor–predictand pairs (following
removal of the linear trend associated with CO2EQV). Correlation between SAT
and CO2EQV is in general strongly positive across the majority of the globe
and particularly so when the response of SAT to the internal variability of
the climate system is known to be small compared to the response to the
signal associated with anthropogenic forcing, for example in the Northern
Hemisphere during spring (MAM) and summer (JJA). Additionally, correlation
between SAT and CO2EQV is in general strongly positive throughout tropical
land masses at all times of year.
Correlation between SAT hindcasts and observations (1961–2013) for
(a) MAM, (b) JJA, (c) SON, and (d) DJF.
The top left panel shows correlation between observations and SAT hindcasts
constructed using the CO2 equivalent as the sole predictor; stippling is
used to indicate significance at the 95 % level. Subsequent panels show
differences in correlation following the inclusion of additional
predictors.
Among the indices describing variability in the climate system, NINO3.4 shows
the second strongest relationship with SAT; the importance of ENSO in
governing variability in temperatures across the tropics is highlighted by
correlations stronger than ±0.5 in parts of South America, Africa and
northern Australia in addition to the tropical Pacific and Indian oceans.
ENSO-based relationships in extra-tropical land regions are less apparent,
although positive correlations in the northern half of the North American
continent and negative correlations around the Gulf of Mexico show the
well-known ENSO influence on winter (DJF) and spring (MAM) SAT
. Very low correlations are
found across Europe.
Correlation between observations and SAT hindcasts generated using
the regression model with all predictors (1961–2013). Stippling is used to
indicate significance at the 95 % level.
The PDO and IOD correlation patterns are very similar to those for NINO3.4.
Much of the signal associated with PDO is likely captured by NINO3.4.
However, inclusion of PDO alongside NINO3.4 in the prediction system may
yield additional skill in the northern Pacific as a result of enhanced
cyclonic circulation around the deepened Aleutian Low associated with a
positive, warm PDO phase (Liu and Alexander, 2007). The AMO correlation
patterns clearly act independently of ENSO and feature correlations
throughout the high northern latitudes and the North Atlantic but curiously
not so much in western Europe . The PDO, IOD and
AMO indices are all included in the prediction system.
Correlations associated with the QBO are weak, with the notable exception of
northern and central Russia during the boreal autumn (not shown). In
agreement with , we found no significant correlation for
winter in Europe at a 1-month lead time. This is surprising given the
link found in previous work between the QBO and the Arctic Oscillation (AO),
and thus its influence on European surface climate, although the authors
suggest that predictability requires a shorter optimal lead time than that
used here . The QBO is therefore not included in the
prediction system.
Root-mean-squared error skill score (RMSESS) and the continuous rank
probability skill score (CRPSS) of the SAT hindcasts expressed as a skill
score against a climatology ensemble forecast (1961–2013). For CRPSS,
stippling is used to indicate significance at the 95 % level following a
one-sided t test.
Persistence (PERS) shows strong correlations in some key regions and is
particularly important for high latitude seas in the Northern Hemisphere
during winter, reflecting the latent heat of melting of the sea ice. Over
land, however, there are relatively few regions associated with strong
correlation outside of the tropics. Correlation is greater than 0.4 in parts
of western Europe (MAM), south-east Europe (JJA), central North America (JJA)
and parts of central Asia (JJA). However, aside from these examples, the
memory of land surface temperature outside of the tropics does not appear to
extend to the predictor period.
Local SST (LSST) produces correlations similar to those of persistence over
the oceans but offers no skill over most continental regions. LSST may prove
beneficial in coastal regions, but this is not clear at the present
spatial resolution. Both predictors are made available for selection in the
SAT forecast system. The relationship between antecedent precipitation
(CPREC) and SAT is in general weak, but correlations are around 0.3 in
northern Europe during spring (MAM), most likely representing the connection
of a mild, wet winter to a mild spring. There is negative correlation
(although not significant) during summer (JJA) in parts of Europe, which
suggests that CPREC is partly able to represent the link between soil
moisture and SAT at this time of year shown in previous work
. The correlation is also strong (negative) in parts of
Australia and south-east Asia, in addition to southern Africa (MAM) and
northern South America (DJF and MAM).
Correlation between PREC hindcasts and observations (1961–2013) for
(a) MAM, (b) JJA, (c) SON, and (d) DJF.
The top left panel shows correlation between observations and PREC hindcasts
constructed using the CO2 equivalent as the sole predictor; stippling is
used to indicate significance at the 95 % level. Subsequent panels show
differences in correlation following the inclusion of additional
predictors.
Precipitation
Correlation between PREC and the predictors is shown in Fig. . As expected, the response of PREC to the trend in
CO2EQV is not as strong as that of global temperature. Increased PREC in
northern high latitudes during the boreal winter has a known association with
global warming . However, the response of precipitation
to global warming is not yet visible above the noise in much of the
mid-latitudes and these regions are associated with low correlation at all
times of the year. Notable exceptions are significant negative correlation in
northern Africa (all times of year) and significant positive correlation in
Greenland, northern Europe and Asia (MAM, SON and DJF).
The strong correlation exhibited between NINO3.4 and PREC in many parts of
the world provides an important basis for predictability. In addition to
ENSO-related changes in tropical precipitation patterns, there are a number
of known links with precipitation in the extra-tropics
, although only a weak one in MAM is found in
Europe . Correlation patterns for the PDO (not shown) are
again similar to those for NINO3.4. For the IOD, correlations of around 0.3 exist in
eastern Africa during autumn (SON) and winter (DJF), but again these patterns
are very similar to those for NINO3.4. Correlation of IOD and PREC following
removal of the NINO3.4 signal (not shown) indicates an ENSO-independent
relationship, particularly during DJF in eastern Africa, which is supported by
the findings of previous work , and also parts of Europe.
In the absence of known links between the phase of PDO and precipitation
anomalies that are independent of ENSO, PDO is not considered for inclusion
in the prediction system. QBO is also omitted on the basis that there are few
areas of correlation of statistical significance (not shown). AMO on the
other hand produces significant correlation in regions influenced by the
Atlantic where NINO3.4 does not, including the Sahel (JAS, visible in JJA and
SON) and eastern South America (JJA). The AMO–PREC relationship does not appear
to extend to extra-tropical regions; there are no discernible areas of strong
correlation in Europe or eastern North America. This contrasts with the
strong link previously identified between the AMO and JJA precipitation in
Europe during the 1990s . The use of long-term time series,
correlations rather than composites, and the absence of temporal filtering here
results in lower correlations.
For PERS, there are a number of regions, particularly in the extra-tropics,
where significant correlation offers potential for predictability. The most
obvious of such correlation is during DJF and MAM in the mid- to
high-latitudes of the Northern Hemisphere; the persistence of dry (wet)
conditions during autumn in much of central Eurasia is an indicator for
similar conditions during winter and into spring. There are relatively few
regions where LSST is significantly correlated with PREC. These include the
western United States (MAM) and south-east Asia where SST has variability
that is independent from ENSO and adds to the skill in dynamical systems
. It remains unclear to what extent LSST may offer
additional value to this empirical prediction system.
Prediction system development and validation
For each hindcast between 1961 and 2013, and for each season and grid point,
predictors are selected on the basis of the significance of the (detrended)
correlation with the predictand for the fitting period. For validation,
causal hindcast estimates are compared with observations to determine the
skill of the deterministic and probabilistic aspects of the prediction
system.
Surface air temperature
Following the assessment of potential predictors (step one of the predictor
selection process), the following were chosen in addition to CO2EQV for
inclusion in the prediction system: NINO3.4, PDO, AMO, IOD, PERS, LSST, and
CPREC. Hindcasts were produced with each predictor added in turn and verified
against observations. Figure shows the correlation
between observations and a hindcast constructed using the CO2 equivalent only
(top left panel) and the incremental correlation attained by including
additional predictors cumulatively (subsequent panels). The
observation–hindcast correlation following the inclusion of all
predictors is given in Fig. . Note that these are
the correlations of a causal system that uses only information from before
the hindcast date; the values are therefore much lower than the full
correlations of Fig. . If a correlation is
spurious – i.e. there was no physical connection but the predictor was
included because the correlation exceeded the 90 % significance criterion
(this happens by chance at 10 % of the grid points without a connection) – the
hindcast skill is degraded by the inclusion of that predictor, visible as the
light-blue background in the panels of Fig. . The first step of the
predictor selection process is designed to minimise such cases.
Correlation between observations and PREC hindcasts generated using
the regression model with all predictors (1961–2013). Stippling is used to
indicate significance at the 95 % level.
The correlation of observations with hindcasts estimated using CO2EQV
(Fig. ) only is much lower than that with hindcasts estimated
as a function of all potential predictors (Fig. ).
This is because over the first half of the hindcast period the trend is not
yet very strong and does not contribute to the skill. This measure therefore
underestimates the skill expected in forecasts, which are made at a time
when the trend plays a much larger role, although this also depends on the
reference period chosen for the forecasts.
The inclusion of NINO3.4 clearly adds value across the Pacific and in parts
of the tropics. There are no land-based areas where either PDO or IOD
add value, but AMO does improve correlation substantially in the North
Atlantic and in parts of northern (SON) and eastern (JJA) Europe, although
its inclusion degraded the hindcasts in eastern Europe in DJF. The addition
of PERS improves correlation in only a handful of locations. LSST, while
important to correlation over some parts of the ocean and hence for islands
and coastal regions not resolved by our coarse data sets, adds little value
further from the coast. As suggested in Fig. , CPREC adds little value
globally except in parts of Australia.
The final model shows good skill in many regions of the globe
(Fig. ). Key areas of high correlation include
the majority of the tropics where the dominance of ENSO on interannual
variability is greatest. Correlation is strong at all times of year
throughout much of northern South America, central and southern Africa, and
south Asia. Strong correlation is also found in important extratropical
regions, including much of Europe except during SON. Correlation is strong in
much of western and central Europe during the spring and summer (MAM until
ASO). Over North America, the skill depends strongly on the season, varying
from slightly negative skill (due to overfitting) during SON to good skill in
large parts during MAM. Global patterns of RMSE skill scores are broadly
similar; regions of strong correlation are generally associated with small
differences from observations (Fig. ; left
panels).
Root-mean-squared error skill score (RMSESS) and the continuous rank
probability skill score (CRPSS) of the PREC hindcasts expressed as a skill
score against a climatology ensemble forecast (1961–2013). For CRPSS,
stippling is used to indicate significance at the 95 % level following a
one-sided t test.
Global maps of CRPSS exhibit broad patterns of skill similar to those for
correlation (Fig. , right panels). The highest
skill scores (relative to the climatology-based forecast) are found in the
tropics and are evident during all seasons. In Europe, skill is again
greatest during spring and summer, although some parts of eastern Europe and
Scandinavia are associated with negative skill scores. Very little of North
America is associated with high skill; indeed, the prediction system fails to
outperform the climatology-based forecast over the majority of the eastern
and southern United States. This lack of skill is known to extend to
dynamical forecasts, particularly during winter (e.g. ).
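The two verification measures used above can be written down compactly. The sketch below is a hedged illustration: the RMSE skill score is expressed relative to a climatology reference, and the CRPS uses the standard ensemble estimator E|X − y| − ½ E|X − X′|; function names and the exact form of the climatology reference are our assumptions.

```python
import numpy as np

def rmsess(forecast, obs, clim):
    """RMSE skill score: 1 - RMSE(forecast)/RMSE(climatology).
    Positive values mean the forecast beats climatology."""
    rmse_f = np.sqrt(np.mean((forecast - obs) ** 2))
    rmse_c = np.sqrt(np.mean((clim - obs) ** 2))
    return 1.0 - rmse_f / rmse_c

def crps(ensemble, y):
    """Ensemble CRPS: E|X - y| - 0.5 E|X - X'|."""
    ens = np.asarray(ensemble, dtype=float)
    term1 = np.mean(np.abs(ens - y))
    term2 = 0.5 * np.mean(np.abs(ens[:, None] - ens[None, :]))
    return term1 - term2

def crpss(fc_ens, clim_ens, obs):
    """CRPS skill score of forecast ensembles vs a climatology ensemble."""
    c_f = np.mean([crps(f, y) for f, y in zip(fc_ens, obs)])
    c_c = np.mean([crps(c, y) for c, y in zip(clim_ens, obs)])
    return 1.0 - c_f / c_c
```

A sharp ensemble centred near the observations yields CRPSS close to 1; an ensemble no better than the climatological spread yields values near or below 0.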
Precipitation
The following predictors were included in the PREC prediction system:
NINO3.4, AMO, PERS, and LSST. Figure shows total and
incremental correlation results in the same format as Fig. for SAT. Using CO2EQV as a sole predictor fails to
yield any notable regions of significant correlation, with the exception of
parts of northern Eurasia during winter (DJF). As for SAT, we would expect
the forecast skill to be greater than the hindcast skill, given that a large
portion of the hindcasts were made before the trend became important. The
addition of NINO3.4 increases hindcast–observation correlation in many parts
of the tropics, particularly during the boreal autumn (SON) and winter (DJF).
In spite of some evidence for a relationship with PREC in parts of Eurasia as
shown in Fig. , AMO fails to add any improvement
to the empirical model's skill except in northeastern Brazil and to some
extent the Sahel. The same is largely true for PERS and LSST, suggesting that
almost all skill is captured by NINO3.4 and, to some extent, the climate
change signal.
For the final model, high correlation (> 0.6) is limited to south-east
Asia and northern parts of South America (between ASO and JFM) (Fig. ).
Another area of high correlation is found in south-east South America during
the austral spring (SON–DJF). However, the
RMSE for the hindcast is rarely an improvement on that derived from the
climatology (Fig. , left panels). In addition,
there are only a few areas where the hindcast produces a positive CRPSS,
which would indicate an improvement on the ensemble forecast derived from the
climatology (Fig. , right panels). This leads
us to conclude that, while the deterministic component of the system is able
to reproduce some aspects of seasonal precipitation variability, the system
in its present guise does not perform well probabilistically outside of a
few limited areas.
Discussion and outlook
A global empirical system for seasonal climate prediction has been developed
and validated. Multiple linear regression was chosen as the basis of the
system; a simple predictor selection scheme sought to maximise the predictive
skill of a number of predictors describing global-scale modes of variability
and local-scale information alongside that of the climate change signal.
Probabilistic hindcasts of surface air temperature (SAT) and precipitation
(PREC) have been produced using prediction models based on multiple linear
regression and validated against observations using correlation and skill
scores. The prediction system shows good skill in many regions. For SAT, the
trend and interannual variability are well represented throughout the tropics
and in a number of extra-tropical regions, including parts of Europe,
particularly during spring and summer, southern Africa and eastern Australia.
Skill associated with the probabilistic component of the seasonal predictions
shows similar spatial patterns. For PREC, few areas of notable skill are
found outside of regions with known ENSO teleconnections and,
probabilistically, the system does not perform better than a climatological
ensemble throughout most of the world.
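The core of the system summarised above can be sketched in a few lines: a multiple linear regression of the predictand on the selected predictors, with a probabilistic forecast formed from the regression mean and the residual spread. This is a minimal, hypothetical illustration under a Gaussian-error assumption; the function names, ensemble size, and sampling scheme are ours, not necessarily those of the operational system.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares fit of y on X (with intercept); returns the
    coefficients and the residual standard deviation used as spread."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dof = max(len(y) - A.shape[1], 1)
    sigma = np.sqrt(np.sum(resid ** 2) / dof)
    return coef, sigma

def probabilistic_forecast(coef, sigma, x_new, n_members=51, seed=0):
    """Gaussian ensemble: regression mean plus residual-spread noise."""
    mean = coef[0] + np.dot(coef[1:], x_new)
    rng = np.random.default_rng(seed)
    return mean + sigma * rng.normal(size=n_members)
```

In a causal hindcast setting the fit would use only data from before the hindcast date, and the procedure would be repeated per season and grid point.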
As outlined in Sect. , the system presented here has been
designed to serve both as a benchmark for dynamical prediction systems and as
an independent forecast system to be combined with dynamical output to
produce more robust forecasts. Concerning the second purpose, it is
important to identify seasons and regions where dynamical systems lack skill
and to determine whether our system might add value in such instances. In general,
dynamical system skill is limited to regions that are strongly linked to
ENSO; in extra-tropical regions, where seasonal variability in the
atmospheric state is governed to a greater extent by random internal
variability, skill is inevitably lower than in the tropics
. The good skill in many parts of Europe,
particularly for forecasts of SAT, is an encouraging property of our system
and a detailed comparison with dynamical European forecasts is forthcoming.
The inclusion of locally varying predictors, in combination with predictors
describing large-scale modes of variability, provides a basis to elicit more
skill than can be attained using global indices alone.
An important outcome of this work is the system's implementation in a
quasi-operational framework and the provision of regular forecasts. Monthly
forecasts are generated for each forthcoming 3-month season and made
publicly available through the KNMI (Royal Netherlands Meteorological Institute) Climate Explorer along with uncertainty
parameters and updated hindcast validation. The system's framework also
permits testing of empirical prediction methods other than linear
regression, such as neural networks that may capture non-linear aspects of
the climate system. Additionally, as mentioned in Sect. , the current list of predictors considered for inclusion is
not exhaustive and there is scope to better exploit the predictive
information in other locally varying predictors. Further avenues for system
development include region-specific and case-based analyses and application
to alternative predictands from century-long reanalyses or those describing
extreme events. Focus will also be given to alternative methods of ensemble
generation using, for instance, derived uncertainty in regression parameters
and spatial patterns.