Coupled general circulation models are of paramount importance for quantitatively assessing the magnitude of future climate change. Usual methods for validating climate models include the evaluation of mean values and covariances, but less attention is directed to the evaluation of extremal behaviour. This is a problem because many severe consequences of climate change are due to climate extremes. We present a method for model validation in terms of extreme values based on classical extreme value theory. We further discuss a clustering algorithm to detect spatial dependencies and tendencies for concurrent extremes. To illustrate these methods, we analyse precipitation extremes simulated by the Alfred Wegener Institute Earth System Model (AWI-ESM) global climate model and by other models that take part in the Coupled Model Intercomparison Project Phase 6 (CMIP6) and compare them to the reanalysis data set CRU TS4.04. The clustering algorithm presented here can be used to identify regions of the climate system that merit further in-depth analysis, and there may also be applications in palaeoclimatology.

Coupled general circulation models are frequently utilised to quantitatively assess the magnitude of future climate change. Validating these models by simulating different climate states is essential for understanding the sensitivity of the climate system to both natural and anthropogenic forcing. Usual methods for validating climate models include the evaluation of mean values and covariances and the comparison of empirical cumulative distribution functions. These analyses can also be conducted over seasonal and annual averages (climatologies) or along latitudinal and longitudinal transects

In particular, the concurrent occurrence of climate extremes at different locations may have especially large impacts on agriculture

A particular challenge for the analysis of extreme events is the fact that extreme events are typically rare and that it is therefore difficult to build informative statistics based solely on the extreme events themselves. Two common approaches are used to overcome this issue: peaks over threshold and block maxima. In the peaks-over-threshold approach, a fixed threshold is selected. The distribution of the data exceeding this threshold can then be approximated by a generalised Pareto distribution if some additional assumptions are fulfilled
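The two approaches can be illustrated with a minimal Python sketch. The function names, the synthetic gamma-distributed monthly series, and the choice of the 95 % quantile as threshold are assumptions made for illustration only; they are not taken from the analysis in this article.

```python
import numpy as np


def block_maxima(series, block_size):
    """Return the maximum of each consecutive block of length block_size."""
    series = np.asarray(series, dtype=float)
    n_blocks = series.size // block_size  # an incomplete trailing block is dropped
    blocks = series[: n_blocks * block_size].reshape(n_blocks, block_size)
    return blocks.max(axis=1)


def threshold_excesses(series, threshold):
    """Return the amounts by which values exceed a fixed threshold."""
    series = np.asarray(series, dtype=float)
    return series[series > threshold] - threshold


# Synthetic monthly values for 85 years (illustrative only, not real data).
rng = np.random.default_rng(0)
monthly = rng.gamma(shape=2.0, scale=30.0, size=12 * 85)

annual_max = block_maxima(monthly, block_size=12)  # one maximum per year
excesses = threshold_excesses(monthly, np.quantile(monthly, 0.95))
```

With monthly data and a block size of 12, the block maxima are annual maxima, which would then be the input to a GEV fit, while the excesses would be the input to a generalised Pareto fit.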

In this work, we will evaluate the performance of the fully coupled Alfred Wegener Institute Earth System Model (AWI-ESM1.1LR)

In this article, our main focus is on the AWI-ESM, and we present our methods using data from this model. We also present a measure of model accuracy with regard to extremal precipitation and apply it to a set of different CMIP6 models. In the main text, results are discussed for the AWI-ESM and for the model identified as the most accurate according to this measure. The results for the other CMIP6 models investigated are presented in the Supplement to this paper.

Model validation in terms of precipitation extremes is already an active research topic.

Applying clustering algorithms to climate data is also not a new approach. For example, clustering has been used to define climate zones in the United States

The article is structured as follows. After introducing the data sets in Sect.

The observational data are reanalysed monthly precipitation data in

The Coupled Model Intercomparison Project (CMIP), coordinated by the Working Group on Coupled Modelling (WGCM) of the World Climate Research Programme (WCRP), has the goal of supporting and facilitating the analysis of climate model data by providing a set of common standards regarding the formatting and availability of model output. Additionally, in order to enhance model comparability, all models participating in CMIP are required to run a set of standardised experimental setups (Diagnostic, Evaluation and Characterization of Klima experiments; DECK experiments) and a simulation of the historical climate from 1850 to 2014 (the historical simulations that we also use in our analysis). CMIP is divided into different phases, reflecting the advancements of climate modelling; the current phase (CMIP6) started in 2016. More information on CMIP can be found in

In our analysis, we restrict the timeframe of the model data to the years 1930 to 2014, as in the observational data. We investigate monthly precipitation (the sum of convective precipitation and large-scale precipitation) in millimetres per month. We use bilinear interpolation to scale the reanalysis data to the grid of the atmospheric component of the climate model and take into account only those interpolated grid points that correspond to locations with given observed data, excluding the oceans and the regions with incomplete data mentioned above.
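The regridding step can be sketched as follows. This is a minimal illustration that assumes regular latitude and longitude axes in ascending order and missing observations encoded as NaN. The helper name `regrid_bilinear`, the toy grids and the masking rule (keep a target point only if all surrounding source points carry data) are assumptions, not the processing chain used in this article.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator


def regrid_bilinear(src_lat, src_lon, src_field, dst_lat, dst_lon):
    """Bilinearly interpolate a 2-D field to a target grid.

    Target points whose surrounding source points include missing values
    (NaN) are set to NaN, so that only locations backed by data are kept.
    """
    valid = np.isfinite(src_field).astype(float)
    filled = np.where(valid == 1.0, src_field, 0.0)  # placeholder for gaps

    lat2d, lon2d = np.meshgrid(dst_lat, dst_lon, indexing="ij")
    points = np.column_stack([lat2d.ravel(), lon2d.ravel()])

    def interpolate(field):
        f = RegularGridInterpolator((src_lat, src_lon), field, method="linear",
                                    bounds_error=False, fill_value=np.nan)
        return f(points).reshape(lat2d.shape)

    values = interpolate(filled)
    coverage = interpolate(valid)  # equals 1 only where all used nodes are valid
    return np.where(coverage > 1.0 - 1e-9, values, np.nan)


# Toy source grid with one missing node (illustrative values only).
src_lat = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
src_lon = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
src_field = 2.0 * src_lat[:, None] + 3.0 * src_lon[None, :]
src_field[0, 0] = np.nan

dst_lat = np.array([-1.5, 0.25])
dst_lon = np.array([0.5, 2.75])
regridded = regrid_bilinear(src_lat, src_lon, src_field, dst_lat, dst_lon)
```

Restricting the time axis to 1930 to 2014 would be a separate boolean selection on the year coordinate, applied before the regridding.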

In this subsection, the time series of each spatial location (henceforth referred to as grid point) is investigated separately, and all operations and analyses described are therefore conducted for each grid point.
Since the focus of this work is not on evaluating the effects of long-term trends, we apply a seasonal–trend decomposition using Loess

The theoretical foundation for the application of the GEV distribution is as follows. For a random variable

To estimate the distribution parameters, we use the method of probability-weighted moments developed by

We also use the parametric bootstrap method with

To compare the performance of different CMIP6 models, we introduce average weighted quantile difference (AWQD) as a measure for the accuracy of the extremal precipitation. For this measure, the absolute differences between model and observational

To compare the spatial distributions of climate extremes, we introduce a hierarchical clustering algorithm (using average linkage) to determine regions with similar extremal behaviour. This approach is similar to the idea proposed in

Assume that two real-valued random variables

Note that the extremal coefficient is invariant under rank transformations and that it does not depend on the values of the GEV parameters of the marginal distributions. In fact, in

To choose a suitable number of clusters, we consider an approach by

The empirical mean

Quantile–quantile (Q–Q) plots comparing the empirical mean values

We start by calculating the empirical mean and standard deviation of the annually maximised data for each grid point, as can be seen in Fig.

As pointed out by

When fitting the GEV distributions to the data and applying KS tests to check the goodness of fit, the hypothesis that the data follow a GEV distribution with the estimated parameters is not rejected at almost all grid points in both the observational and climate model data. The exceptions are parts of the Sahara and some isolated points.
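A goodness-of-fit check of this kind can be sketched for a single grid point as follows. Note that this sketch swaps in SciPy's maximum-likelihood fit for brevity, whereas the parameters in this article are estimated with probability-weighted moments; the function name `fit_gev_and_test` and the synthetic sample are assumptions. SciPy parameterises the GEV with a shape `c` whose sign is opposite to the common shape parameter convention, so the sketch reports `-c`.

```python
import numpy as np
from scipy import stats


def fit_gev_and_test(annual_maxima):
    """Fit a GEV distribution and run a KS test against the fitted model."""
    x = np.asarray(annual_maxima, dtype=float)
    # Maximum-likelihood fit (an illustrative stand-in, not the PWM estimator).
    c, loc, scale = stats.genextreme.fit(x)
    ks = stats.kstest(x, "genextreme", args=(c, loc, scale))
    return {
        "location": loc,
        "scale": scale,
        "shape": -c,  # SciPy's c has the opposite sign to the usual shape
        "ks_statistic": ks.statistic,
        "p_value": ks.pvalue,
    }


# Synthetic annual maxima for one grid point (illustrative only).
rng = np.random.default_rng(1)
sample = stats.genextreme.rvs(c=-0.1, loc=100.0, scale=20.0, size=85,
                              random_state=rng)
result = fit_gev_and_test(sample)
```

Because the parameters are estimated from the same sample that is tested, the standard KS p-value is only approximate and tends to be too large; a simulation-based calibration is one way to account for this.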

The

The estimated GEV parameters location

The three GEV parameters estimated are location, scale and shape, with location and scale very roughly corresponding to mean and variance, and the shape parameter yielding information about the degree of heavy-tailedness. The estimated parameter values are shown in Fig.

Difference between AWI-ESM model and observational GEV parameter estimates: location parameter

Difference between the

We apply the hierarchical clustering algorithms using the two dissimilarity measures

The number of clusters for the AWI-ESM climate model and observational data determined with the L method (

The results of the L method seem to depend rather strongly on the data set investigated and the value of

To exemplify the differences and similarities in the clusterings, we take a closer look at Europe in the

Clustering of AWI-ESM model data

The average weighted quantile difference (AWQD) of the 27 CMIP6 models considered plotted against the model resolution (number of model grid points in units of

GEV parameters estimated using the EC-Earth3-Veg-LR climate model

Difference in the

The number of clusters for the EC-Earth3-Veg-LR climate model and the observational data determined with the L method (

Clustering of the EC-Earth3-Veg-LR model data

For the AWI-ESM, we calculated an AWQD of

We presented approaches and methods to validate climate model outputs by comparing their extremal behaviour to that of observational data. To illustrate these methods, we compared precipitation extremes between the AWI-ESM and the CRU TS4.04 data set of reanalysed observations. After an analysis of empirical statistical parameters, we fitted GEV distributions to the data and analysed the differences in the estimated parameters. We then analysed the spatial concurrence of extremes using a hierarchical clustering approach and a dissimilarity measure derived from bivariate copula theory. While the empirical statistics are similar for many parts of the world, we can also identify larger regions in which the climate model overestimates or underestimates the empirical means and standard deviations. Misestimations of the mean often go hand in hand with a similar misestimation of the standard deviation (heteroscedasticity), but for the standard deviation a stronger tendency for underestimation can be observed. Misestimations of means and standard deviations translate into a misestimation of extreme values, and this can be confirmed by the comparison of the fitted GEV distribution parameters and the

The cluster analysis based on spatial dependencies and the occurrence of concurrent extremes shows that the clusters identified in the model data and in the observational data generally agree well. The number of clusters is also similar in general, with a slight tendency for a higher cluster number in the model data. Since it is mostly large-scale weather events and teleconnections that contribute to concurrent climate extremes, this may indicate that the basic physical behaviour underlying them is in general well captured by the AWI-ESM. Further analyses can be conducted to investigate in detail the reasons for different clusterings over selected regions.
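For readers who wish to experiment with this kind of cluster analysis, the following Python sketch shows average-linkage hierarchical clustering on a rank-based pairwise dissimilarity. The F-madogram used here, $\nu = \tfrac{1}{2}\,\mathrm{E}|F(X) - G(Y)|$, is an assumption chosen for illustration and is not necessarily one of the dissimilarity measures used in this article; for bivariate max-stable pairs it is related to the extremal coefficient by $\theta = (1 + 2\nu)/(1 - 2\nu)$. The function names and the synthetic data are likewise hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import rankdata


def madogram_dissimilarity(maxima):
    """Pairwise F-madogram for an array of shape (n_years, n_points)."""
    x = np.asarray(maxima, dtype=float)
    n_years = x.shape[0]
    # Rank transform each grid point's series to pseudo-uniform margins.
    u = rankdata(x, axis=0) / (n_years + 1)
    # Mean absolute rank difference for every pair of grid points.
    diff = np.abs(u[:, :, None] - u[:, None, :])
    return 0.5 * diff.mean(axis=0)


def cluster_points(maxima, n_clusters):
    """Average-linkage clustering cut into a given number of clusters."""
    d = madogram_dissimilarity(maxima)
    z = linkage(squareform(d, checks=False), method="average")
    return fcluster(z, t=n_clusters, criterion="maxclust")


# Two synthetic regions, each driven by its own common signal (illustrative).
rng = np.random.default_rng(2)
n_years = 85
signal_a = rng.gumbel(size=(n_years, 1))
signal_b = rng.gumbel(size=(n_years, 1))
maxima = np.hstack([
    signal_a + 0.05 * rng.normal(size=(n_years, 3)),
    signal_b + 0.05 * rng.normal(size=(n_years, 3)),
])
labels = cluster_points(maxima, n_clusters=2)
```

The dense pairwise array in `madogram_dissimilarity` grows with the square of the number of grid points, so a global grid would require computing the dissimilarities in chunks.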

In addition to the AWI-ESM, several other CMIP6 models are also analysed. A comparison of model accuracy, measured using the average weighted quantile difference (AWQD), shows a tendency for models with a higher resolution (a larger number of grid points) to capture extremal behaviour better.

In this work, a clustering algorithm based on bivariate extremal coefficients is used to perform a spatial analysis of extreme values. Extremal coefficients are also used to model multivariate spatial distributions of extremal precipitation using max-stable processes. This method was first developed by

The clustering approach presented here focuses on the comparison of extremal events at different locations, thereby supplementing the analyses of climate extremes that are often focused on extremes at a specific location

The CRU TS4.04 reanalysis data are available at

The supplement related to this article is available online at:

The initial concept was created by TD and GL. JC led the writing of the paper and implemented the statistical data diagnostics. TD contributed to statistical methodology. GL contributed to the climatological analysis. All authors read and approved the manuscript.

The contact author has declared that neither they nor their co-authors have any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The authors are grateful to Manfred Mudelsee for constructive discussions and helpful suggestions. The authors also would like to thank

Justus Contzen is funded through the Helmholtz School for Marine Data Science (MarDATA) (grant no. HIDSS-0005). Gerrit Lohmann receives funding through “Ocean and Cryosphere under climate change” in the Program “Changing Earth – Sustaining our Future” of the Helmholtz Society and PalMod through the Bundesministerium für Bildung und Forschung (grant no. 01LP1917A). The article processing charges for this open-access publication were covered by the Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research (AWI).

This paper was edited by Julia Hargreaves and reviewed by Qingxiang Li and Anna Kiriliouk.