Towards an objective assessment of climate multi-model ensembles – a case study: the Senegalo-Mauritanian upwelling region

Climate simulations require very complex numerical models. Unfortunately, they typically present biases due to parameterizations, choices of numerical schemes, and the complexity of many physical processes. Beyond improving the models themselves, a way to improve the performance of the modeled climate is to consider multi-model combinations. In the present study, we propose a method to select the models that yield a multi-model ensemble combination that efficiently reproduces target features of the observations. We used a neural classifier (self-organizing maps), associated with a multi-correspondence analysis to identify the models that best represent some target climate property. We can thereby determine an efficient multi-model ensemble. We illustrated the methodology with results focusing on the mean sea surface temperature seasonal cycle in the SenegaloMauritanian region. We compared 47 CMIP5 model configurations to available observations. The method allows us to identify a subset of CMIP5 models able to form an efficient multi-model ensemble. The future decrease in the SenegaloMauritanian upwelling proposed in recent studies is then revisited using this multi-model selection.


Introduction
In this study, we present a methodology aimed at selecting a coherent sub-ensemble of the models involved in the Climate Model Intercomparison Project Phase 5 (CMIP5) that best represents specific observed characteristics. While the future evolution of the global climate is subject to great changes and great uncertainty (Collins et al., 2014), the most com-mon way to predict the evolution of the climate is to run climate models that include fully coupled atmosphere-oceancryosphere-biosphere modules. Due to their low resolution and the fact that they use different parameterizations of the physics, use numerical schemes and sometimes include or neglect different processes, these models have some marked biases in specific regions. They also have different responses to an imposed increase in atmospheric greenhouse gases, which partly explain their mean climate biases. This variety of models allows us to assess the uncertainty of present climate representation when compared to observations and, by studying their dispersion, to roughly estimate the uncertainty of the response to future climate change.
For several generations of climate models, it has been shown that for a large variety of variables the multi-model average generally agrees better with observations of presentday climate than any single model (Lambert and Boer, 2001;Phillips and Gleckler, 2006;Reichler and Kim, 2008;Santer et al., 2009;Tebaldi and Knutti, 2007). Several studies also suggest that the most reliable climate projection is given by a multi-model averaging (Knutti et al., 2010), rather than, for example, averaging different projections performed with a single model run with different initial conditions. This result relies on the assumption that if choices of parameterizations or specific numerical schemes are made independently for each model, then the errors might at least partly compensate, resulting in a multi-model average that is more skillful than its constitutive terms (Tebaldi and Knutti, 2007). The significant gain in accuracy can be explained by the fact that the errors specific to each model compensate each other in the averaging procedure used to build the multi-model mean.
J. Mignot et al.: Towards an objective assessment of climate multi-model ensembles However, the number of general circulation models (GCMs) available for climate change projections is increasing rapidly. For example, the CMIP5 archive (Taylor et al., 2012), which was used for the fifth IPCC Assessment Report (Stocker et al., 2013), contains outputs from 61 different GCMs, and 70 contributions are expected for CMIP6. It thus becomes possible -and probably necessary -to select and/or weight the models constituting such an average. Recent work has suggested that weighting the multi-model averaging procedure could help to reduce the spread and thus uncertainty of future projections. Such an approach has been applied extensively to the issue of climate sensitivity (Fasullo and Trenberth, 2012;Gordon et al., 2013;Huber and Knutti, 2012;Tan et al., 2016). Valuable improvement of model selection has also been found in studies of the carbon cycle (Cox et al., 2013;Wenzel et al., 2014), the hydrological cycle (Deangelis et al., 2015;O'Gorman et al., 2012), the Antarctic atmospheric circulation (Son et al., 2010;Wenzel et al., 2016), extratropical atmospheric rivers (Gao et al., 2016), atmospheric and ocean heat transports (Loeb et al., 2015), European temperature variability (Stegehuis et al., 2013) and temperature extremes (Borodina et al., 2017).
The present paper works towards the elaboration of an objective method to select models according to their performance for a specific phenomenon. Here, we use the Senegalo-Mauritanian upwelling area as a case study to construct an efficient climate multi-model combination together with its related confidence interval in order to anticipate the effect of climate warming by the end of the century in this region. The Senegalo-Mauritanian upwelling has been the focus of increasing attention over recent years. The very productive waters associated with the upwelling have a strong economic impact on fisheries in Senegal and Mauritania and a crucial societal importance for local populations. It is therefore important to predict the evolution of the dynamics and the physics of the upwelling in the forthcoming decades, due to the effect of climate warming and its consequences for biological productivity, which may impact the fisheries. The Senegalo-Mauritanian upwelling lies at the southern end of the Canarian upwelling system, which has a relatively weak seasonality and is maximum in summer. By contrast, the Senegalo-Mauritanian upwelling presents a well-marked seasonal variability. Its intensity is stronger in boreal winter, and it disappears in summer with the northward progression of the intertropical convergence zone (ITCZ). Due to the enrichment of the sea surface layers with nutrients upwelled from deep layers, it drives an important phytoplankton bloom that is observed on ocean color satellite images (Demarcq and Faure, 2000;Farikou et al., 2015). The maximum intensity of this bloom occurs in March-April (Farikou et al., 2015;Faye et al., 2015;Ndoye et al., 2014). Its important seasonal cycle is also associated with mesoscale patterns whose variability has been recently studied by several oceanographic campaigns (Capet et al., 2017;Faye et al., 2015;Ndoye et al., 2014) and theoretical work (Sirven et al., 2019). Sylla et al. (2019) have recently shown that the intensity of the sea surface temperature (SST) seasonal cycle along the coasts of Senegal and Mauritania was a good marker of the upwelling in this specific region in climate models. They have used this index together with other more dynamical indices to predict that the upwelling will decrease by about 10 % of its present-day amplitude by the end of the 21st century. Nevertheless, their study also highlighted a large uncertainty due to model biases in this region. The method we have developed selects a subset of the CMIP5 ensemble based on the capability of the climate models to reproduce the SST seasonal cycle observed during the historical period in key sub-regions. These sub-regions are identified by a neural classifier. The method leads us to rank the different models and to determine an efficient multi-model combination for the analysis of the Senegalo-Mauritanian upwelling and projections of its behavior in global warming conditions.
The paper is structured as follows: Sect. 2 presents the different climate models and the climatological observations used in the study, together with the region of interest. The classification method is described in Sect. 3 and applied to the extended region. Section 4 presents a qualitative analysis able to group the different climate models into clusters presenting similar performances. Section 5 investigates the results of the method applied over a smaller area, more focused over the upwelling region. Section 6 uses the two multi-model clusters defined in Sects. 4 and 5, respectively, to tentatively predict the representation of the Senegalo-Mauritanian upwelling changes under global warming. Conclusions are given in Sect. 7.
2 Climate models and region of interest

Data
This study is based on the CMIP5 (Coupled Model Intercomparison Project Phase 5) database. We use the output of 47 simulations listed in Table 1. The models are evaluated over the historical period defined as  by comparing their output to observations. The mean seasonal cycle of SST anomalies over this period is constructed for each model grid point as the difference between the monthly mean temperature and the mean annual temperature. When several members of historical simulations are available for a specific model configuration, they are averaged together. However, this has practically no impact on the estimated mean seasonal cycle (not shown). The mean climatological cycle of the CMIP5 models under study is evaluated against the Extended Reconstructed Sea Surface Temperature data set (ERSST-v3b, Smith et al., 2008), averaged over the same time period. This data set was produced by NOAA at 2 • spatial resolution. It is derived from the International Comprehensive Ocean-Atmosphere Dataset with missing data filled in by statistical methods. This data set is used as the target to be reproduced and is denoted "observation field" hereafter. In order to deal with data at the same resolution, all model outputs as well as the observation fields were regridded on a 1 • resolution regular grid prior to analysis. A previous study (Sylla et al., 2019) has compared the performance of this data set as compared to the gridded SST data set from the Met Office Hadley Centre HadISST (Rayner, 2003). The main results regarding the future of the upwelling were shown to be independent of the validation data set, primarily because the models' biases and the inter-model differences were much larger than the differences between the validation data sets. The methodological and oceanographic results presented in this study are thus expected to depend only very weakly on the target data set.
In Sect. 6, the model selections are used to characterize the response of the upwelling to climate change. This response is characterized in terms of SST anomalies as well as wind intensity. For wind intensity, the simulated wind stress is compared to the TropFlux reanalysis. This data set combines the ERA-Interim reanalysis for turbulent and longwave fluxes and ISCCP (International Satellite Cloud Climatology Project) surface radiation data for shortwave fluxes. This wind stress product is described and evaluated in Praveen Kumar et al. (2011).

The Senegalo-Mauritanian upwelling region
In this study, we evaluate the ability of the different climate models to represent the Senegalo-Mauritanian upwelling. Following Sylla et al. (2019), we consider the intensity of the seasonal cycle of the SST anomaly to be a marker of the upwelling variability and localization. This variable is shown in Fig. 1 for the eastern tropical Atlantic. This figure confirms that the Senegalo-Mauritanian coast stands out with a very strong seasonal SST cycle as compared to similar latitudes in the open ocean. This results from the cold SST generated by the strong winds occurring in winter. The Senegalo-Mauritanian upwelling is confined in a small region on the order of 100 km off the western coast of Africa. It is part of a complex and fine-scale regional circulation system recently revisited by Kounta et al. (2018). Since the grid mesh of most of the climate models is on the order of 1 • (∼ 100 km), this regional circulation is poorly resolved, which favors a relatively large-scale analysis of the upwelling representation in climate models. The Senegalo-Mauritanian upwelling is also embedded in a large-scale oceanic circulation pattern, encompassing the North Equatorial Counter Current flowing eastward in the southern part of the region and the return branch of the subtropical gyre in the northwestern part. Therefore, we firstly study the representation of the SST seasonal cycle intensity in the different climate models over a relatively large region that includes part of the Canary Current in the north and the Guinea Dome in the south. The socalled "extended region" is defined by a rectangular box extending from 9 to 45 • W and from 5 to 30 • N (Fig. 1). In a second step, we will proceed to the same analysis and classification of the models within a much more focused (hereafter zoomed) region, namely [16-28 • W and 10-23 • N] (Fig. 1). All the results below will be first shown for the extended region. Comparison with the focused region will be done in Sect. 4.

Comparing observations and models: a methodological approach
The methodology we have developed is based on the ability of the climate models to adequately reproduce the climatology of the seasonal cycle of the SST anomalies as observed during the last 3 decades in key sub-regions of the studied domain. These key sub-regions are determined from the similarity of their physical and statistical characteristics through an unsupervised classification, which clusters pixels with similar observed seasonal SST climatology. We chose to deal with a neural classifier, the so-called self-organizing map (SOM hereafter) developed by Kohonen (2013), followed by a hierarchical ascendant clustering (HAC, Jain and Dubes, 1998). This method leads to a dynamically interpretable classification. The SOM determines a vector quantization of the data set, which compresses the initial data set into a rela- tively small number of reference vectors. Doing so allows us to take the nonlinearities of the data set into account and to filter the noise, which can make the classification spurious. This reduced number of dataset vectors enables an HAC to determine the highly nonlinear borders between the different SOM clusters. This procedure has been used with success in several studies (Farikou et al., 2015;Jouini et al., 2016;Niang et al., 2003Niang et al., , 2006Sawadogo et al., 2009). Note that the use of an HAC directly on the initial data set would not be efficient in the present study because the number of degrees of freedom (here the grid points of the initial domain) is too large for this method to work efficiently. In the present section, we describe the methodology we developed to score the different climate models with respect to the observations. In Sect. 4, we will tentatively group the different climate models into blocks with the same behavior by using a multiple correspondence analysis (MCA).

The unsupervised classification method
The first step of the methodology was to decompose the selected region into different classes (the key sub-regions mentioned above) by using a neural network classifier, the so-called SOM (Kohonen, 2013). This algorithm constitutes a powerful nonlinear unsupervised classification method. It has been commonly used to solve environmental problems (Hewitson and Crane, 2002;Jouini et al., 2013Jouini et al., , 2016Liu et al., 2006;Reusch et al., 2007;Richardson et al., 2003). The SOM aims at clustering vectors (here the 12 SST seasonal anomalies) of a multidimensional database (D) (here the grid points of the studied domain) into classes represented by a fixed network of neurons (the SOM). The SOM is defined as an undirected graph, usually a two-dimensional rectan-gular grid. This graphical structure is used to define a discrete distance (denoted by δ) between the neurons of the map and thereby identify the shortest path between two neurons. Moreover, the SOM enables the partition of D in which each cluster is associated with a neuron of the map and is represented by a prototype that is a synthetic multidimensional vector (the referent vector w). Each vector z of D is assigned to the neuron whose referent w is the closest, in the sense of the Euclidean norm (EN), and is called the projection of the vector z onto the map. A fundamental property of a SOM is the topological ordering provided at the end of the clustering phase: two neurons that are close on the map represent data that are close in the data space. In other words, the neurons are gathered in such a way that if two vectors of D are projected onto two "relatively" close neurons (with respect to δ) on the map, they are similar and share the same properties. The estimation of the referent vectors w of a SOM and the topological order is achieved through a minimization process using a learning dataset base, here from the observations. The cost function to be minimized is of the form where c ∈ SOM indices the neurons of the SOM, χ is the allocation function that assigns each element z i of D to its referent vector w χ(z i ) and δ(cχ (z i )) is the discrete distance on the SOM between a neuron c and the neuron allocated to observation z i . K T a kernel function parameterized by T (where T stands for "temperature" in the scientific literature dedicated to SOM) that weights the discrete distance on the map and decreases during the minimization process. At the end of the learning process, the classification can be visualized onto the SOM and interpreted in terms of geophysics.

Classification of the observations
In the present problem we chose to classify the annual cycles of the SST seasonal anomalies observed in the Senegalo-Mauritanian upwelling. The study was made on the "extended region" constituted of 25 × 36 = 900 pixels, but this enlarged region covers a part of the African continent, and 157 pixels are in fact over land. That means that we have truly 743 ocean pixels to deal with. We consider a time period of 30 years [1975 to 2005] extracted from the ERSST-V3b database. For a given grid point and a given year and month, the monthly anomaly is the SST of the pixel for which we have subtracted the mean of the considered year. The climatological mean of the anomaly is then computed for each grid point by averaging each climatological month over the 30 years. Thus, the learning data set D is a set of 743 12-component vectors z, each component being the mean monthly anomaly computed as above. We denote as "SST seasonal cycle" the vector z in the following. We used a SOM to summarize the different SST seasonal cycles present in the "extended region". We found that 120 prototypes (or neurons) can accurately represent the 743 vectors of D. This reduction (or vector quantization) is made by using a rectangular SOM of 30 × 4 neurons.
We then reduced the number of neurons in order to facilitate their interpretation in terms of geophysical processes. For this, we applied a HAC using the Ward dissimilarity (Jain and Dubes, 1988). We grouped the 120 neurons of the SOM into a hierarchy that can contain between 1 and 120 clusters. Then the different classifications proposed by the HAC were applied to the geographical region: each SST seasonal cycle of each grid point of the region is assigned to a neuron and consequently to a cluster (assignment process), thereby defining the so-called region clusters. The problem is then to choose a number of clusters that adequately synthesizes the geophysical phenomena over the region. This was done by looking at the different possible classifications and choosing one representing the major characteristics of the upwelling region. In Fig. 2a, we observe that when we partition the SOM into seven clusters, the associated seven region clusters are constituted of contiguous pixels in the geographic map and that two clusters (6, 7) are within the upwelling region and present a well-marked seasonal cycle. For each region cluster, we estimated the monthly mean of the SST seasonal cycle and the associated spread captured by the neurons constituting this region cluster.
The typical SST climatological cycles for each region cluster are presented in Fig. 2b together with their related error bars. We note that the region clusters are well identified, their typical climatological annual cycles of SST being well separated. Furthermore, the seven region clusters are spatially coherent and have a definite geophysical significance.
For the extended region under study, 7 therefore appears to be an adequate cluster number, since this number balances a clear partition of the clusters on the HAC decision tree with a clear physical significance to each region cluster. The Senegalo-Mauritanian coastal upwelling is associated with clusters 7 and 6. Cluster 2 corresponds to deep tropical waters associated with the equatorial countercurrent. Cluster 1 corresponds to surface waters of the Gulf of Guinea. Cluster 3 corresponds to the offshore tropical Atlantic, and cluster 5 has extratropical characteristics. Cluster 4 is a transition between 3 and 5. As expected, the equatorial regions (clusters 1 and 2) have a very weak seasonal cycle, which increases towards the extratropics (clusters 3 to 5). The upwelling regions (clusters 6 and 7) are characterized by an exceptionally strong seasonal variability.

Classification of the climate models on the extended upwelling region
The aim is now to find the model(s) that best fit the "observation field". A heuristic manner is to compare the pattern of the different region clusters of the CMIP5 models with respect to those of the "observation field" through a sight-evaluating process. This kind of approach has been proposed in Sylla et al. (2019), and we indeed immediately see that some models better fit the "observation field" than others. Nonetheless, this method remains very subjective.
In the following, we present a more objective approach. We use the previous classification to objectively estimate how each CMIP5 model fits the "observation field" and its seven region clusters. For this, we projected the SST annual cycle of each CMIP5 model grid point of the extended region onto the SOM learned with the observations (Sect. 3.2) using the assignment procedure described in this section. Each grid point thus corresponds to a cluster of the SOM and is represented on the geographical map by its corresponding color. Doing so, we can represent each CMIP5 model by the geographical pattern of the seven clusters partitioning the SST seasonal cycle of its grid points. The geographical maps representing the 47 models and their associated clusters are plotted in Fig. 3. This graphical visualization is easier to compare than the original characteristics (amplitude and phase) of the annual cycle at each grid point of a model since each grid point can only take one discrete value among seven. This representation immediately allows identification of the model biases and the models that best reproduce the cluster regions identified in the observations. A huge amount of information could in principle be extracted from these maps, both from individual modeling groups, to understand the representation of this region by the models and the origins of possible biases, and from experts of the area, to understand the difficulties of the climate models in representing the SST seasonal cycle in this region.
For a more quantitative assessment, we counted the number of grid points of a region cluster for a given CMIP5 model matching the same region cluster of the "observation field". We then computed the ratio between that matching number and the number of pixels of the region cluster of the considered model. That number is noted in the color bar for each region cluster in Fig. 3. We denote R m,i the ratio for the region cluster i and the model m, where i = 1, . . ., 7 is the number of the region cluster and m = 1, . . ., 47 is the number of the model (see Table 1). We note that R m,i ≤ 1. Doing so, each model m is represented by a seven-dimensional vector R m , each component being the ratio of a region cluster. We estimated the total skill of a model by averaging the seven ratios. Note that this procedure gives the same weight to each region cluster whatever its number of grid points and its proximity to the upwelling region. In the following, the skill is presented as a percentage: the higher the skill, the better the fit. In Fig. 3, the 47 CMIP5 models are ranked by their total skill, which is indicated above each panel beside the model name. The model skills are very diverse, ranging from 79 % to 28 %. This figure also shows that the models presenting the best total skill are also those representing thoroughly the upwelling region. Some models represent the large-scale structure in the eastern tropical Atlantic (Region-clusters 3, 4, and 5) very well, but not the upwelling (33-GISS-E2-R and 34-   (Table 1), and its skill given as a mean percentage (see text). The models are ordered according to their skill in decreasing order. The seven region clusters (or SOM clusters) are defined by applying an HAC to the SOM output learned with the observation field. They are represented by different colors. The numbers in the color bar on the right of each panel represent the skill for each region cluster. The observation field is shown in the bottom right panel and the numbers in front of the color bar reference the region cluster.
GISS-E2-R-CC, for example). Others represent pretty well the upwelling region clusters (Region-clusters 6 and 7), but not the large-scale structures of the SST seasonality (13-CSIRO-Mk-3-6-0 and 6-CMCC-CESM, for example). None of these models is ranked among the best models, with a score greater than 60 %. As indicated above, this representation gives a very synthetic view of the structure of the sea-sonality of the SST cycle in each of the models, potentially a very useful guide for climate modelers to identify rapidly major biases.

Qualitative analysis of the climate models
In order to further progress in the selection of the models, the 47 climate models and the observation field were then analyzed by using an MCA. MCA is a multivariate statistical technique that is conceptually similar to principal component analysis (PCA) but applies to categorical rather than continuous data. Similarly to PCA, it provides a way of displaying a set of data in a two-dimensional graphical form.
In the following, we apply a MCA to the (47, 7) matrix R = [R m,i ] whose elements represent the skills of the clusters of the models shown in front of the color bars in Fig. 3: the rows m represent the 47 different models, the columns i the seven region clusters. The MCA, like the PCA, projects the initial matrix onto a new basis in such a way that the new axes are the matrix eigenvectors (PC), the inertia of each axis being the corresponding eigenvalues. According to the theory, the MCA matrix analysis of R gives i − 1 = 6 independent PCs. Each model is thus now associated with a sixdimensional vector on which it has a specific weight. The MCA uses for this analysis the χ 2 distance. In Fig. 4, we present the projection of the models and the "region clusters" in the plane formed by the two first axes (x = PC1 and y = PC2) of the MCA. These two axes represent 70 % of the total inertia. Each model is represented by a small circle and each region cluster by a purple square. We also projected the observation field (green diamond) onto that plane. To have a more precise view of the topology, it would be necessary to consider the projection onto the five other PCs, which represent 30 % of the inertia.
In the (PC1, PC2) plane, the shorter the distance between two models, the more similar the distribution of their regioncluster skills. Proximity between a model and a region cluster leads us to affirm that this region cluster is well represented by that model. Clearly, some models adequately represent the southern part of the extended region (Region-clusters 1, 2 or 3), where the SST seasonal cycle is weak, and are very distant from the upwelling regions (Region-cluster 6 and Regioncluster 7) whose large SST cycle is poorly reproduced. In this group of models, one recognizes the model 16-IPSL-CM5A-MR, at the extreme bottom of Fig. 4, close to Region-clusters 4 and 5, consistently with Fig. 3. At the other end of this group of models, model 23-HadCM3 for example is located very close to Region-cluster 1. Figure 3 indeed shows that most of its grid points over the region of interest have a seasonal cycle resembling the one found in the offshore tropical ocean. Another group of models is located in the center of this plan, thus at an optimal distance of each of the observed region clusters, and not far from the overall position of the observations (diamond). We recognize in this group of models those that have a high skill in Fig. 3. The positioning of the observations (green diamond in Fig. 4) with respect to the models indeed allows selection of those that best represent the observation field. The representation given in Fig. 4 allows understanding of the drawback of the different models with respect to the seven modes of SST cycles.
As indicated in the introduction, the main objective of the methodology is to select an ensemble of models that represents at best the upwelling behavior with respect to the observations and to use this ensemble to predict the impact of climate change on the Senegalo-Mauritanian upwelling with some confidence. The problem is now to determine a subset of models which has a better skill than Model-All, in other words minimize the distance to the observations. As the number of models is small enough, we chose to cluster them by an HAC according to their projections onto the six axes provided by the MCA and select the optimal jump in the hierarchical tree (Jain and Dubes, 1988). We recall that the HAC (hierarchical ascending clustering) is a bottom-up algorithm for dataset clustering. The key operation in hierarchical bottom-up clustering is to repeatedly combine the two nearest (according to a certain distance) clusters into a larger cluster. The HAC starts from individuals and combines them according to their similarity (with respect to the chosen distance) to obtain new clusters. The process is repeated up to get one cluster only (the full data set). This algorithm is visualized through a tree-like diagram, the so-called connection tree or dendrogram: the branches of the connection tree represent the connections between the clusters (Fig. 5). From Fig. 5, we obtain four homogeneous groups which are well separated (group 1, 2, 3, and 4). They are plotted with different colors in Fig. 4. We denote as Model-group 1, Modelgroup 2, Model-group 3, and Model-group 4 these multimodel ensembles hereafter. Note that Fig. 4 shows the projection of the individual models onto the first two axes of the MCA. The fact that only two axes are shown here can introduce some bias into the visualization, and this figure must be considered with some caution.
Through MCA+HAC, we thus grouped the models into clusters, using the χ 2 distance, according to their proximity to the observations and their internal similarity. For each group, we computed a multi-model average whose outputs are the mean of the outputs of its different members, and we analyzed it according to the same procedure (projection of the SST seasonal cycle and assignment to a region cluster) used for each individual model. In addition, we introduced the full multi-model average (Model-All in the following), which is the multi-model ensemble, which averages the 47 CMIP5 model outputs. Model-All was also projected in the MCA plane, and it is represented by a red star in Fig. 4. Comparison of the four model groups with Model-All and the observations are presented in Fig. 6. This figure visually highlights the dominance of Model-group 4 for the reconstruction of the SST seasonal cycles of the different region clusters for the extended region. This is particularly clear for Region-clusters 6 and 7, which are those located in the upwelling region (Fig. 2). Model-group 3 seems to group models characterized by an equatorward shift of the main structures, since Region-cluster 1 of tropical waters is not repro- Figure 4. Projection of the CMIP5 models (colored circles) and the observation field (green diamond) defined by their cluster skill vectors onto the first two axes of the MCA. The seven region clusters of the observation field are represented by purple squares. The colors of the circles denote the four groups of models obtained after an HAC was performed on the seven MCA components of the models. The projection of the full multi-model mean (47 models) is represented by a red star. We note that some bias can be introduced into this projection since the projection onto the other axes can be of importance. duced and Region-clusters 4 and 5 of extratropical waters are overestimated. Figure 4 indeed shows that this model group is very close to Region-clusters 4 and 5, which correspond to the extratropical and transition geographical regions. Modelgroup 2 misrepresents the region of the Canary upwelling. Model-group 1 overestimates the SST seasonal cycle in all the tropical open Atlantic. These two last model groups overestimate Region-cluster 1, again consistently with their position in Fig. 4. A detailed physical interpretation of the model groups is nevertheless beyond the scope of this paper. Clearly Model-All represents the SST seasonal cycle of the offshore ocean, but it proposes a very poor representation of the upwelling region.
Two models (models 7 and 25) have a better skill than Model-group 4 and Model-All. These two models are very close to the observations on the first two axes of the MCA (Fig. 4). It is easily seen that Model-group 4 and the projection of Model-All onto this plane are farther than that of model 7 and model 25 from the observation projection. This explains the lower performance of these two multi-models as compared to models 7 and 25. In the present case, the method permits one to determine the best models (model 7 and model 25) and to outline the best multi-model (Modelgroup 4) whose skill is better than any model with a probability of 95 % (number of models whose skill is smaller than the skill of Model-group 4 with respect to the total number of models). Projection of the models onto the other planes of the MCA should confirm this interpretation. One could then question the use of Model-group 4 rather than model 7 or model 25 individually. Furthermore, we argue that multimodel averages are in general more robust for climate studies than the use of a single model that can have good performance for a very specific set of constraints but not for neighboring ones. The following section will partly justify this point.

Analysis of the climate models over a zoomed upwelling region
The classification presented above relies largely on the ability of the models to represent the offshore seasonal cycle of the SST. In the following, we propose to test the classification over a much more reduced area in order to focus the analy- Figure 5. HAC dendrogram. The horizontal line displays the 47 CMIP5 models, each model being associated with its seven-component skill vector. As the dendrogram represents a hierarchy of clusters, the numbers on the y axis give the distance between two clusters. We note an optimal "jump" on this graph: the level 1.5 on the vertical axis (materialized by a horizontal black line) is associated with four well-separated clusters corresponding to four model groups that are very different. sis on the upwelling area. This "zoomed upwelling region" is shown in Fig. 1. As for the extended region, we partitioned the observations of the zoomed upwelling region with a SOM (ZSOM in the following) followed by a HAC. We then applied a new MCA to regroup the climate models. We did a similar analysis to that performed in Sect. 4. We obtained four new well-separated region clusters denoted ZRegion clusters. Figure 7 shows the four ZRegion clusters obtained from ERSSTv3b observations together with their associ-ated mean SST seasonal cycle. Again, the ZRegion clusters are spatially coherent. The upwelling area is now decomposed into three ZRegion clusters (ZRegion-clusters 2, 3, and 4). This new decomposition thus refines the study performed for the extended region: ZRegion-cluster 1 represents the offshore ocean; its grid points typically have a SST seasonal cycle amplitude of 4 • C, very similar to Regioncluster 4 in the classification performed over the extended region (Fig. 2). ZRegion-cluster 4 identifies the core of the Senegalo-Mauritanian region, with grid points characterized by the greatest amplitude of the SST seasonal cycle of the domain, typically 6.5 • C. It is interesting to note that an additional upwelling ZRegion cluster (ZRegion-cluster 3) appears south of ZRegion-cluster 4. Indeed, several studies have shown that the Cape Verde peninsula, located around 15 • N, separates the upwelling region into two distinct areas with a different behavior north and south of this peninsula (Sirven et al., 2019;Sylla et al., 2019). The location of the separation between ZRegion-clusters 3 and 4 is determined with some uncertainty due to the coarse resolution (1 • ) of the ocean models. ZRegion-cluster 3 is marked by a time shift of the seasonal cycle: the warmest season seems to occur approximately 1 month earlier than in the other regions, as clearly seen in Fig. 7a (yellow curve in June). Due to a classification using a much larger region, such a characteristic does not appear in the extended area study. The physical interpretation of the SST seasonal cycle of this ZRegion cluster is beyond the scope of the present study, but one can suspect a role of the ITCZ seasonal migration covering these grid points earlier than further north. Finally, ZRegion-cluster 2 is a transition between the large-scale ocean and the upwelling region.
As for the extended region, we applied a MCA to the (47 × 4) matrix R = [R m,i ] whose elements represent the skills of the four clusters (i) of the 47 models. This MCA was followed by a HAC leading the definition of five ZModel groups. The members of each group are given in the Appendix. Figure 8 shows the ZRegion cluster obtained in the zoomed area by projecting these five ZModel groups and Model-All onto the ZSOM and their associated performances. ZModel-group 1 is the worst performing one: only 25 % of the grid cells fall into the same class as for the observations. The structure of this model group shows that it is characterized by a homogeneous amplitude of the seasonal cycle over the whole domain, suggesting a largely reduced upwelling: only one grid point at the coast has an enhanced SST seasonal cycle as compared to the large-scale tropical ocean. ZModel-group 2 is the best performing one: 66 % of the grid points are assigned to the correct class and the general picture indeed represents a four-class picture fairly consistent with the observed structure (Fig. 7). Important biases yet remain. In particular, ZRegion-clusters 2 and 4 characterizing the upwelling extend too far offshore. The three other ZModel groups are intermediate. A relatively reduced upwelling area, with an underestimated SST seasonal cycle, characterizes ZModel-groups 3 and 4. ZModel-group 5 corresponds to a shift of the upwelling region towards the north. Model-All also shows a strongly reduced seasonal cycle, with a large number of pixels in the intermediate ZRegioncluster 3 and very few in ZRegion-cluster 4. ZRegion cluster 3, representing the southern part of the Senegalo-Mauritanian upwelling, does not appear in the pattern of Model-All.
It is notable that all the models forming ZModel-group 2 are included in Model-group 4. For a more precise assessment, we can also project the entire Model-group 4, identified as the best multi-model ensemble over the extended region, onto the ZSOM (Fig. 9b). We notice that the performance of Model-group 4 remains high on this projection, indicating some robustness of this multi-model ensemble. Moreover, this ensemble now outperforms the single best model identified over the extended region (Fig. 9a). This result gives further confidence in the use of multi-model averages, illustrating that one single model can be very skillful over a specific region or for a specific analysis, but multi-model averages are more robust across various analyses and/or regions.
6 Impact of climate change on the Senegalo-Mauritanian upwelling 6.1 Representation of the upwelling in the CMIP5 climate model clusters In this section, we compare the representation of the Senegalo-Mauritanian upwelling system given by the two best model groups identified above (model group 4 and ZModel group 2). For this evaluation, we use two of the five indices used by Sylla et al. (2019) to evaluate the full database, namely the intensity of the SST seasonal cycle and the offshore Ekman transport at the coast. The former is specific to the seasonal variability of the Senegalo-Mauritanian upwelling system, and it has been used for the classification. The latter is more general and, although it has recently been shown to partly represent the volume of the upwelled waters (Jacox et al., 2018), it is extensively used in the scientific literature to characterize upwelling regions (Cropper et al., 2014;Rykaczewski et al., 2015;Wang et al., 2015). Note also that, following Sylla et al. (2019), evaluation is performed on the period . This period slightly differs from the classification period, but the SST seasonal cycle is not significantly different (not shown). Figure 10 compares the amplitude of the SST seasonal cycle as represented in the observations, Model-All, Modelgroup 4 and ZModel-group 2 identified above. Consistently with Figs. 6 and 8, Model-All dramatically underestimates the upwelling signature in terms of the SST seasonal cycle as compared to the observations. Model-group 4 and ZModelgroup 2 yield improved results: the area of an enhanced SST seasonal cycle is larger in both latitude and longitude, with  stronger SST amplitude values. This confirms the efficiency of the selection operated above. Nevertheless, ZModel-group 2 yields a realistic SST amplitude pattern along the coast, but it extends too far offshore. Furthermore, in ZModel-group 2, the subtropical area (in green in Fig. 10) extends too far towards the south, in particular in the western part of the basin. The tropical area, characterized by limited amplitude of the seasonal cycle of SST (deep blue in Fig. 10), is shifted to the south as compared to the observations. In other words, the large-scale thermal -and thus probably dynamical -structure of the region is poorly represented in ZModel-group 2. Finally, Model-group 4 is the least biased one.
The intensity of the wind stress parallel to the coast, inducing offshore Ekman transport and consequently an Ekman pumping at the coast, is generally considered to be the main driver of the upwelling. We therefore also tested the representation of this driver in the different model groups. The idea is to evaluate the impact of the model selection performed above on the representation of an independent variable by the model groups. Figure 11 shows the latitude-time evolution of the meridional oceanic wind stress, considering that the coast in the studied region is oriented approximately meridionally, so that the offshore Ekman transport is mainly zonal. Note that in Fig. 11, southward winds have positive values, so that they correspond to a westward Ekman transport fa-  vorable to upwelling. Panel a shows that the observed meridional wind stress is, all year long, favorable to the upwelling north of 20 • N. At these latitudes, the meridional wind stress is stronger in summer. Conversely, between 12 and 20 • N, in the latitude band of the Senegalo-Mauritanian upwelling, the wind blows southward with a very weak intensity in summer, and it even changes direction in the southern part of this latitude band. It is favorable to the upwelling in winterspring, which explains why the Senegalo-Mauritanian upwelling occurs during this season with a maximum of intensity in March-April (Capet et al., 2017;Farikou et al., 2015).
The main bias of Model-All (Fig. 11b) is due to the fact that the wind stress never reverses between 12 and 20 • N. It weakens in the southern part of the Senegalo-Mauritanian latitude band, i.e., south of the Cape Verde peninsula (15 • N), but does not become negative. North of the Cape Verde peninsula, it also blows from the north in summer, so that the Senegalo-Mauritanian upwelling lacks seasonality. This bias is corrected in Model-group 4 and ZModel-group 2 ( Fig. 11c and d) that are, in this aspect, more realistic than Model-All. Model-group 4 shows a slight extension of the time and latitude range where the oceanic wind stress reverses sign. This constitutes an improvement. The southward wind is nevertheless too strong in winter on the [12-20 • N] latitude band as well as further south from December to March. These two remaining biases are further reduced in ZModel-group 2. This latter model yields the most realistic seasonal cycle of meridional oceanic wind stress on the latitude band under study. This is consistent with a very localized model selection, as the wind index is itself localized along the coast.
To conclude, Model-group 4 and ZModel-group 2 perform in general better than Model-All in reproducing the major, characteristic features of the Senegalo-Mauritanian upwelling. This result confirms the relevance of the multimodel selection we have presented above. Applying the methodology over a relatively large region allows better constraint of the spatial extent and pattern of the SST signature of the upwelling than the reduced area. The latter however yields a better representation of the wind seasonality along the coast.

Response of the Senegalo-Mauritanian upwelling to global warming
In this section, we examine the response of the upwelling system given by the different multi-model groups we selected to global warming. For this, we compared the two indices analyzed above in present-day and future conditions. The present-day conditions are taken as above as the climatological average of historical simulations over the period . The future period is taken as the climatological average of the RCP8.5 scenario over the period . Figure 12 shows the difference of the SST seasonal cycle amplitude between these two periods. The general behavior is that the SST cycle amplitude will reduce in the upwelling region. Sylla et al. (2019) showed that this is primarily due to a warming of the winter temperature, thus suggesting that the upwelling signature at the surface will reduce. On the other hand, this figure shows that the upwelling signature will increase along the Canary Current, which flows along the coast of Morocco, as well as in the subtropical part of our domain. This behavior is observed in the three multi-model ensembles. However, the two selected model groups suggest a weaker decrease in the SST seasonal cycle in the upwelling region than the one given by Model-All. ZModel-group 2 shows an even weaker decrease mainly confined in the southern part of the upwelling region. This result echoes findings of Sylla et al. (2019) based on another indicator of the upwelling imprint on the SST: they showed that the difference between the SST at the coast and offshore is expected to decrease more in the southern part of the Senegalo-Mauritanian upwelling system (SMUS) than in the north. We hypothesize that the study conducted on the reduced area permits separation of the Senegalo-Mauritanian upwelling system into two clusters, a northern one (ZRegion 4) and a southern one (ZRegion-3) (Fig. 8), which enables us to distinguish this specific response.
The meridional wind stress also generally weakens under climate change in the [12-20 • N] latitude band (Fig. 13), suggesting a general reduction of the upwelling intensity. From December to March, this is particularly true in the southernmost region of the Senegalo-Mauritanian band, consistent with the results of Sylla et al. (2019). The wind pattern inferred from the two model groups (Fig. 13b and c) present a higher seasonal variability than those of Model-All (Fig. 13a). The winter reduction of the southward wind stress is slightly more confined to the southern region in ZModel-group 2, especially at the end of the upwelling season (March-April), when the upwelling intensity is the strongest. This may be consistent with the reduced seasonal cycle in the southernmost part of the upwelling identified above.

Discussion and conclusion
This paper proposed a novel methodology for selecting efficient climate models in a specific area with respect to observations and according to well-defined statistical criteria. The present study has specifically focused on the ability of climate models to reproduce the ocean SST annual cycle observed in specific sub-regions of the studied domain during the period 1975-2005 as reported in the ERSST_v3b data set. These sub-regions were defined by a neural classifier (SOM) as clusters with similar seasonal SST cycle anomalies with respect to some statistical characteristics and were therefore named region clusters. They correspond to ocean areas with well-marked oceanographic specificities.
We then checked the ability of the different climate models to reproduce the region clusters defined on the observation data set with a SOM. The better a climate model fits the clusters computed with the SST observation, the higher the skill of the model. To evaluate this, we defined geographical regions in the different CMIP5 climate models by projecting the SST annual cycle anomalies of each model grid point onto the SOM. Each grid point is associated with a cluster on the SOM and consequently with a region cluster on the geographical map. We built a similarity criterion by counting the number of grid points in a region cluster of a given model matching the same region cluster defined by processing the observation field. We then computed the ratio between that matching number and the number of pixels of the region cluster of the model under study. We estimated the total skill of a model by averaging the seven ratios associated with the seven region clusters. Note that this procedure presents the advantage of giving the same weight to each region cluster whatever its number of grid points and its proximity to the upwelling region. This procedure respects the clustering done by the SOM since the different clusters have an equal weight in the skill computation. In its present definition, the total skill is a number between 0 and 1: the higher the skill, the better the fit. Other measures of the total skill of a model  Evolution of the amplitude of the SST seasonal cycle at the end of the 21st century. The figure shows the difference between the seasonal cycle amplitude averaged over the period (2080-2100) following the RCP8.5 scenario and the amplitude averaged over the period  in the historical simulations. A positive value (red) means that the seasonal cycle is more marked over the period 2080-2100. group could nevertheless be defined depending on the objective of the study. One may compare the skill of individual models over a specific region cluster of interest or analyze the pattern of skill in one specific model and its sensitivity to possible various parameterization schemes. The extraction of information embedded in the vector skill whose seven components are the skills associated with the seven sub-regions and the resulting efficient multi-model combination imply the use of advanced statistical tools such as the MCA. Moreover, the vector skill also contains information behavior of models in terms of large offshore ocean circulation as well as in the upwelling region. It could thus be used to diagnose the deficiencies of some climate models with respect to the modeling of physical processes. Another contribution of the MCA is the visualization of the 47 models and the observations on the plane constituted by the first two MCA axes, which represents 70 % of the information embedded in the data. The similarities of the climate models with respect to the observations and the region clusters can be clearly visualized. The "mean" skill associated with each climate model and proposed in this study is easy to use but is far less infor- mative than the vector skill whose seven components are the skills associated with the seven sub-regions.
Such a multi-model ensemble selection allows sampling of a set of models in order to obtain a more realistic climatology over the region of interest. The response of the upwelling to climate change given by the different multi-model ensembles is quite robust in the sense that they give similar qualitative answers. However, a too-selective ensemble of models may lead to noisy patterns. A compromise thus has to be found: a large number of models leads to smoothed biases and unrealistic patterns but also damps the characteristics of the selection. On the other hand, selecting the most realistic models may yield spurious biases in the ensemble mean.
As discussed in the introduction, different criteria have been used for extracting some efficient models from the CMIP5 models used for climatic studies. The most common parameter is the average annual surface mean temperature of the grid points of the region under study. However, Knutti et al. (2006) used the seasonal cycle in surface temperature, represented by the seasonal amplitude calculated as summer June-August (JJA) minus winter December-February (DJF) temperatures. This criterion is more informative than the annual mean temperature since the amplitude of the seasonal variability is an important criterion characterizing the validity of a climate model. In the present work, we used a more informative criterion which is formed from the monthly temperature cycle anomaly represented by a 12-component vector, each component representing the average monthly temperature of the year we consider. This new criterion allows account to be taken of the amplitude and phase of seasonal variability, while the Knutti et al. (2006) criterion only takes into account the amplitude of the seasonal variability. Note however that it implies a good geophysical knowledge of the region under interest, in order to determine the rele-vant region clusters after the SOM. It is also very specific to the Senegal-Mauritania upwelling region. Furthermore, Sylla et al. (2019) extensively discussed the possible differences among several indices, aiming at characterizing the upwelling and the need to use some of them to have a complete understanding of this coastal phenomenon. This conclusion is probably general to any physical process of the climate system. In the present study, the model selection is based on only one signature of the SMUS. Several possibilities can be envisaged to improve the resolution of this problem, such as merging several indices like SST, temperature at several depths, wind vector or ocean currents. This approach could also allow a selection of models based on the representation of several distinct regional behaviors. In spite of several subjective choices, including the studied domain and the statistical metrics, we argue that this method is a step towards an objective selection of models, based on a quantitative assessment rather than a qualitative analysis of maps of performance.
The methodology is general and can be adapted to any climate or oceanographic phenomenon. Different applications of the multi-model selection strategy proposed in the present study can also be envisaged. Firstly, from a purely modeling point of view, the projection of the models onto the SOM (or ZSOM) and the results of the HAC yield a very enlightening description of a given model behavior in terms of region clusters of the area under study. Such a procedure could advantageously be used by individual modeling groups to identify, analyze and therefore hopefully reduce their model biases in a targeted region. Secondly, from a physical point of view, an identified model group can be used to analyze the targeted region (here the SMUS) in terms of processes, with the advantages of a subset of models which have been selected from quantitative criteria. Such an application has been briefly il-lustrated by showing how the selected model group represents an important additional characteristic of the SMUS not used for the selection, namely Ekman pumping. A promising reduction of biases of the full multi-model mean ensemble has been identified, opening possibilities for process studies based on this sub-ensemble of the CMIP5 database. A third application of the selection lies in the prediction of the future climate. Here, we have shown that selected multi-model ensembles may provide a more precise description of the future behavior of the SMUS. It may nevertheless be important to note that these conclusions are based on the assumption that the CMIP5 models, which have been selected according to their present-day characteristics, are the most reliable in terms of future projections, which can be questioned and refined (Lutz et al., 2016;Reifen and Toumi, 2009).
As discussed in the introduction, the concept of "model democracy", suggesting that all models should be equally considered in multi-model ensembles, is now strongly questioned . The present study proposes a promising way to improve the quality of multi-model ensembles in terms of model selection. Deep advances in the field of multi-model analysis and selection can be expected from the emerging topic of climate informatics (Monteleoni et al., 2016), as has been shown through the present study. Machine learning can indeed provide efficient tools to make the best out of the extraordinary but imperfect tools that are the climate models and multi-model intercomparison efforts.