<p>A framework of ensemble forecast verification tools founded on the concept of information entropy is discussed. Its measures share a common yardstick, namely that of "correlation". With these measures, calibration is deduced from the balance between ensemble sharpness and resolution. Because they share the same units, these features can be displayed in a single diagram both for continuous time series from Gaussian processes and for exceedance probabilities; the latter are usually tested with the reliability term of the Brier score. The sharpness and resolution terms allow the use of the same vocabulary of over- and underdispersion that is established for frequency histograms. The concept is based on the fact that the mutual information (MI) of two Gaussian processes is directly related to Pearson's anomaly correlation. Further, MI can be written as the expected Kullback-Leibler divergence between the conditional probability of the observations given the model forecasts and the unconditional probability of the observations. Thus, MI is a measure of resolution. The mean of the <i>UTILITY</i> defined by Kleeman (2002) is the corresponding measure of sharpness. For Gaussian processes the mean <i>UTILITY</i> is very close to the ratio of the variance of the ensemble mean to the mean ensemble variance (<i>ANOVA</i>), which is the analysis of variance factor when time is taken as the treatment. The ensemble spread score (<i>ESS</i>) (Palmer et al., 2006) is shown to be a measure of calibration if model and observed data are scaled with their respective means and standard deviations. For exceedance probabilities the resolution term of the divergence score (Weijs et al., 2010) is already defined as an MI term; it is complemented here with a mean <i>UTILITY</i> formed in the same way as the resolution term but from the forecasts only. The entropy terms are then rescaled to the "correlation" yardstick. The concept is applied to temperature data from the German project on decadal climate prediction, Mittelfristige Klimaprognose (MiKlip). 
Both over- and underdispersion are found for the 2 m temperature forecasts. An increase of ensemble sharpness of surface ocean temperature with lead year in the Southern Ocean hints at model-data inconsistencies at some ocean locations. Finally, empirical orthogonal functions (EOFs) of Northern Hemisphere annual mean surface temperature are determined for ERA-40/ERA-Interim and for the MiKlip hindcasts. For both data sets the respective first EOF represents the low-frequency temperature development. The time coefficients of these EOFs are used to compare resolution and sharpness of continuous data and of exceedance probabilities in one diagram.</p>
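<p>The link between mutual information and correlation stated above can be illustrated with a minimal Python sketch. It relies on the standard result that two jointly Gaussian variables with Pearson correlation r have MI = -0.5 ln(1 - r^2) (in nats). The function names, the use of the inverse of this relation as the rescaling to the "correlation" yardstick, and the sample-variance convention in the <i>ANOVA</i> ratio are assumptions made for illustration; the definitions used in the paper itself may differ in detail.</p>

```python
import numpy as np


def gaussian_mi(r):
    """Mutual information (nats) of two jointly Gaussian variables
    with Pearson correlation r (standard bivariate-Gaussian result)."""
    r = np.asarray(r, dtype=float)
    return -0.5 * np.log(1.0 - r**2)


def mi_to_correlation(mi):
    """Inverse of gaussian_mi: map an entropy-type measure (nats) back to
    a 'correlation' scale. Assumed here as one plausible rescaling."""
    mi = np.asarray(mi, dtype=float)
    return np.sqrt(1.0 - np.exp(-2.0 * mi))


def anova_ratio(ens):
    """Variance (over time) of the ensemble mean divided by the
    time-mean ensemble variance.

    ens: array of shape (n_time, n_member).
    Sample variances (ddof=1) are an assumption of this sketch.
    """
    ens = np.asarray(ens, dtype=float)
    ens_mean = ens.mean(axis=1)                  # ensemble mean per time step
    spread = ens.var(axis=1, ddof=1).mean()      # mean ensemble variance
    return ens_mean.var(ddof=1) / spread


if __name__ == "__main__":
    # Round trip: correlation -> MI -> correlation
    print(mi_to_correlation(gaussian_mi(0.6)))
    # Toy ensemble: 3 time steps, 2 members
    print(anova_ratio([[0.0, 2.0], [2.0, 4.0], [4.0, 6.0]]))
```

<p>In this sketch a large ratio means the ensemble mean varies strongly relative to the spread of the members (a sharp ensemble), and comparing such a sharpness measure with the resolution measure on the same scale is what allows over- and underdispersion to be diagnosed.</p>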