Added value of the EURO-CORDEX high-resolution downscaling over the Iberian Peninsula revisited. Part I: Precipitation

João António Martins Careto, Pedro Miguel Matos Soares, Rita Margarida Cardoso, Sixto Herrera and Jose Manuel Gutiérrez
Instituto Dom Luiz, Faculdade de Ciências, Universidade de Lisboa, 1749-016, Portugal
Meteorology Group, Dept. of Applied Mathematics and Computer Science, Universidad de Cantabria, Santander, Spain
Meteorology Group, Instituto de Física de Cantabria, CSIC-University of Cantabria, Santander, Spain

permitting simulations from the WCRP-CORDEX Flagship Pilot Studies focused over the Alps (Coppola et al., 2020; Ban et al., 2021). The evaluation and added value of higher resolution simulations constitute an important step to gauge their quality and usefulness. Soares and Cardoso (2018) proposed a new metric to quantify the added value of higher resolutions with respect to their forcing or lower resolution counterpart simulations. This metric is based on the ability of models to represent the observed probability density functions (PDFs). It relies on a distribution added value (DAV), which can be applied either to the full PDF or to PDF sections, thus enabling an easy evaluation of extremes or of any section of the PDF.
In the past, the EURO-CORDEX hindcast simulations were extensively evaluated, revealing gains for the main meteorological variables (Kotlarski et al., 2014; Casanueva et al., 2016a; 2016b; Prein et al., 2016; Soares and Cardoso, 2018; Herrera et al., 2020; Cardoso and Soares, 2021; Careto et al., 2021). Kotlarski et al. (2014) assessed temperature and precipitation at monthly and seasonal timescales for the hindcast simulations, reporting slight improvements of EURO-CORDEX relative to ENSEMBLES (Van der Linden and Mitchell, 2009). Overall, the models were able to capture the space-time variability of the European climate. However, when considering averages over large subdomains and at the seasonal timescale, the higher-resolution simulations did not reveal noticeable improvements. Prein et al. (2016) also assessed precipitation for both resolutions of the hindcast EURO-CORDEX (50 km and 12 km) and found improvements, mostly in regions characterized by complex terrain and in summertime precipitation, due to the better-resolved convective features. More recently, Herrera et al. (2020) assessed precipitation and temperature for an ensemble of 8 hindcast EURO-CORDEX RCMs over the Iberian Peninsula.
The authors report a good spatial agreement between models and observations, namely for temperature. On the other hand, this agreement decreases when extremes are considered. Nevertheless, the authors also report a larger uncertainty related to observations for precipitation relative to temperature.

The first to quantify the added value of the EURO-CORDEX hindcast runs were Soares and Cardoso (2018), who evaluated 5 RCMs for precipitation at both resolutions (50 km and 12 km), considering their probability density functions with a station-based dataset as the observational benchmark. This study reported relevant added value of the RCMs against the driving ERA-Interim reanalysis (Dee et al., 2011). Nonetheless, when comparing both resolutions, the improvements are not as significant, with the exception of extreme precipitation. More recently, other studies such as Cardoso and Soares (2021), Soares et al. (2021) and Careto et al. (2021) used the DAVs technique to assess the added value for other variables, simulations and domains.
Precipitation from the EURO-CORDEX historical-period simulations was also assessed for specific regions (Torma et al., 2015; Soares et al., 2017; Ciarlo et al., 2020). For instance, Torma et al. (2015) evaluated precipitation over an alpine area, whereas Soares et al. (2017) assessed the same variable for Portugal. Both studies describe the ability of the higher-resolution runs to simulate the mean spatial and temporal patterns of precipitation, as well as their distributions. More recently, Ciarlo et al. (2020) assessed the added value of all available EURO-CORDEX and CORDEX-CORE (Gutowski et al., 2016) simulations for precipitation, also considering a probability density function metric. The authors found added value, particularly at the tail of the distributions; however, they also report a significant uncertainty in the results linked to the observational datasets.
In this study, the DAV metric is used to assess the added value of precipitation for all available 12 km resolution simulations from the EURO-CORDEX Hindcast (1989-2008) and Historical (1971-2005) set. The added value is then computed comparing the
The next section introduces the data and a description of the methods considered. The results and discussion are presented in the following section. Finally, the main conclusions are drawn in the last section.
Both datasets are comparable, yet statistically different. Since a large number of stations were considered, particularly for precipitation, IGD should reproduce more realistically the climate of the Iberian Peninsula.

EURO-CORDEX
The aim of CORDEX is to develop a coordinated ensemble of high-resolution regional climate projections to provide detailed climate data for all land regions of the world, at user-relevant scales, and to support climate change impact and adaptation research (Gutowski et al., 2016). EURO-CORDEX (Jacob et al., 2014) is a branch of the international CORDEX initiative and consists of a multi-model ensemble of simulations at 50 km, 25 km or 12 km resolution for a European domain. These simulations consist of Hindcast runs for the 1989-2008 period, forced by the ERA-Interim reanalysis (Dee et al., 2011), and Historical/Scenario simulations driven by the Intergovernmental Panel on Climate Change Coupled Model Intercomparison Project Phase 5 (IPCC-CMIP5) GCMs covering the 1971-2100 period. All simulations are available at the Earth System Grid Federation portal (Williams et al., 2011; https://esgf.llnl.gov/).

The information regarding the simulations used is summarized in Table S1 for the Hindcast and Table S2 for the Historical simulations. For all models, the added value is computed for the common Iberian Peninsula domain shown in Fig. 1. Prior to all computations, all RCM data was conservatively interpolated (Schulzweida et al., 2009) onto this observational domain, while the observations were interpolated onto each low-resolution grid. Thus, the EURO-CORDEX regional models are evaluated on the 0.1° regular grid, while the GCMs or ERA-Interim (0.75°) are evaluated at their native resolutions (see Table S2 for each GCM resolution).

Distribution Added Value
The Distribution Added Value (DAV) is a metric put forward by Soares and Cardoso (2018) that directly quantifies the gains or losses of using higher- against lower-resolution models based on their probability density functions (PDFs), with an observational dataset as reference. DAV uses the PDF skill score proposed by Perkins et al. (2007) to measure the similarity between two PDFs. To compute this metric, a PDF must first be built from the data. In this work, two slightly different methods are considered for building the PDFs to assess the daily precipitation from the EURO-CORDEX models.
In the first method, the precipitation values are accumulated within each bin, thus returning a precipitation intensity distribution. The second method considers the number of events that fall into each bin, thus returning a precipitation frequency distribution. Then, a normalization is carried out by dividing each bin by the sum of all bins (Gutowski et al., 2007; Boberg et al., 2009; 2010). With this normalization, one can more accurately compare the results between seasons or regions (Soares and Cardoso, 2018), and changes in the PDF are also identified more straightforwardly (Gutowski et al., 2007). Each bin has a width of 1 mm/day to avoid excessively fine and potentially noisy steps in both methodologies, thus satisfying the criteria proposed by Wilks (1995). All DAVs are computed by considering only the wet days, i.e., days with precipitation equal to or above 1 mm, as models tend to overestimate the days with very small precipitation amounts (Boberg et al., 2009; 2010; Soares and Cardoso, 2018). For either methodology, the score is given by the sum of the minimum value obtained at each bin between the model PDF and the observational PDF:

$$S_m = \sum_{i=1}^{n} \min\left(\mathrm{PDF}_{m,i}, \mathrm{PDF}_{obs,i}\right) \qquad (1)$$

where n is the number of bins of the PDFs, m denotes the high- or the low-resolution simulation and obs is the observational PDF.
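The two PDF constructions and the score described above can be sketched in Python with NumPy. This is only an illustrative sketch, not the authors' code: the function names `build_pdf` and `perkins_score`, the half-open binning convention, the discarding of values above 300 mm, and the sample values are all assumptions made for the example.

```python
import numpy as np

def build_pdf(precip, kind="frequency", lo=1.0, hi=300.0, width=1.0):
    """Return a normalized PDF of wet-day precipitation on fixed-width bins.

    kind="frequency": count the events that fall into each bin.
    kind="intensity": accumulate the precipitation amount within each bin.
    (Hypothetical helper; bin limits follow the 1-300 mm range in the text.)
    """
    precip = np.asarray(precip, dtype=float)
    wet = precip[(precip >= lo) & (precip <= hi)]   # keep wet days only
    edges = np.arange(lo, hi + width, width)        # 1, 2, ..., 300
    if kind == "frequency":
        hist, _ = np.histogram(wet, bins=edges)
    elif kind == "intensity":
        hist, _ = np.histogram(wet, bins=edges, weights=wet)
    else:
        raise ValueError("kind must be 'frequency' or 'intensity'")
    hist = hist.astype(float)
    total = hist.sum()
    # Normalize by the sum of all bins; an empty PDF stays all zeros.
    return hist / total if total > 0 else hist

def perkins_score(pdf_model, pdf_obs):
    """Sum over bins of the minimum of the two normalized PDFs."""
    return float(np.minimum(pdf_model, pdf_obs).sum())

# Made-up daily values in mm, for illustration only.
obs = [0.2, 1.5, 1.7, 2.3, 10.4]
model = [0.5, 1.2, 2.1, 2.8, 3.3]

score_freq = perkins_score(build_pdf(model), build_pdf(obs))
score_int = perkins_score(build_pdf(model, "intensity"),
                          build_pdf(obs, "intensity"))
print(score_freq, score_int)
```

For these sample values the frequency-based score is 0.5: the two PDFs share a quarter of their mass in the 1-2 mm bin and another quarter in the 2-3 mm bin, and nothing elsewhere.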

For precipitation, the limits are bounded between 1 and 300 mm, roughly corresponding to the maximum precipitation rate in IGD.
Subsequently, the DAV metric is computed as follows:

$$DAV = 100 \times \frac{S_{hr} - S_{lr}}{S_{lr}} \qquad (2)$$

with the subscript hr denoting the high resolution and lr the low resolution. The DAVs return the fraction or percentage of gains or losses of value by downscaling the low-resolution runs. With the normalization of the PDFs, the contribution from each bin to the overall score of a particular model is more relevant for the lower bins, decreasing when approaching the tails of the distribution.
If for a specific bin there is no model or observation data, then the contribution of that bin is 0. By definition, the maximum value of S is 1; because the PDFs are normalized, a model that overestimates the observed PDF in one section will inevitably underestimate it in another. Both situations lower the score of individual models. DAVs is a versatile metric with the advantage that it can be computed for PDF sections, which is useful for characterizing the added value for extremes. In this study, the added value assessment is performed by considering not only the whole PDF but also an extreme precipitation PDF section, where only values above the observational 95th percentile are accounted for. Since the resolution difference between observations and the high-resolution models is approximately 0.01°, this threshold is computed from the observations at the original resolution, while for the low-resolution driving models, the percentile is obtained from the interpolated observations.
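The DAV computation and the selection of the extreme section of the PDF can be illustrated with a short Python sketch. The helper names `dav` and `tail_values` are hypothetical, the scores are made-up numbers, and taking the 95th percentile over the observed wet days only is an assumption of this sketch; the text does not state which sample the percentile is computed from.

```python
import numpy as np

def dav(score_hr, score_lr):
    """Distribution added value in percent: 100 * (S_hr - S_lr) / S_lr."""
    if score_lr <= 0:
        raise ValueError("the low-resolution score must be positive")
    return 100.0 * (score_hr - score_lr) / score_lr

def tail_values(values, obs_reference, q=95.0, wet_threshold=1.0):
    """Keep the values above the q-th percentile of the observed wet days.

    Assumption: the percentile is taken over wet days (>= wet_threshold).
    Returns the selected values and the threshold that was used.
    """
    obs_reference = np.asarray(obs_reference, dtype=float)
    obs_wet = obs_reference[obs_reference >= wet_threshold]
    threshold = np.percentile(obs_wet, q)
    values = np.asarray(values, dtype=float)
    return values[values > threshold], threshold

# Made-up scores: the high-resolution PDF is closer to the observations.
print(dav(0.90, 0.75))   # positive, about +20 %: added value
# Made-up scores: the high-resolution PDF is further from the observations.
print(dav(0.60, 0.75))   # negative, about -20 %: detrimental effect

# Tail selection on a synthetic series 1, 2, ..., 100 mm.
series = np.arange(1.0, 101.0)
extremes, threshold = tail_values(series, series)
print(threshold, extremes)
```

The same observational threshold is applied to both the model and the observed series before their PDFs are rebuilt on the remaining bins, so that the extreme-section scores of the high- and low-resolution runs are comparable.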
For the DAVs assessment, a regional approach is first considered by pooling together all data from the Iberian Peninsula, thus computing the added value for the entire domain. Secondly, a spatial approach is followed, where all data within each grid cell of the low-resolution simulation is pooled together, returning a spatial view of the DAVs instead. It should be noted that the overall Iberian value is not the mean of the spatial DAVs. Although the results should be similar, different behaviours are to be expected, and care must be taken when comparing the results.
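The difference between the regional and the spatial approach can be made concrete with a small Python sketch on synthetic data. Everything here is an assumption for illustration: the gamma-distributed series, the four "grid cells", the helper names, and the simplification of using one observational series for both resolutions (in the study the observations are interpolated to each low-resolution grid).

```python
import numpy as np

EDGES = np.arange(1.0, 301.0, 1.0)   # 1 mm bins between 1 and 300 mm

def freq_pdf(x):
    """Normalized frequency PDF; values outside 1-300 mm are ignored."""
    counts, _ = np.histogram(x, bins=EDGES)
    total = counts.sum()
    return counts / total if total else counts.astype(float)

def score(model, obs):
    """Sum over bins of the minimum of the model and observed PDFs."""
    return np.minimum(freq_pdf(model), freq_pdf(obs)).sum()

def dav(hr, lr, obs):
    """DAV in percent of the high- against the low-resolution series."""
    s_hr, s_lr = score(hr, obs), score(lr, obs)
    return 100.0 * (s_hr - s_lr) / s_lr

# Synthetic daily precipitation, shape (cells, days); not real data.
rng = np.random.default_rng(0)
n_cells, n_days = 4, 2000
scales = np.array([4.0, 6.0, 8.0, 12.0])[:, None]
obs = rng.gamma(0.8, scales, size=(n_cells, n_days))
hr = rng.gamma(0.8, 0.9 * scales, size=(n_cells, n_days))
lr = rng.gamma(0.8, 0.5 * scales.mean(), size=(n_cells, n_days))

# Regional approach: pool every cell and day into a single PDF.
regional_dav = dav(hr.ravel(), lr.ravel(), obs.ravel())

# Spatial approach: one PDF, and therefore one DAV, per grid cell.
spatial_dav = np.array([dav(hr[c], lr[c], obs[c]) for c in range(n_cells)])

print("regional DAV:", regional_dav)
print("spatial DAVs:", spatial_dav, "mean:", spatial_dav.mean())
```

Because the score is a nonlinear function of the pooled PDF, the regional value is in general not equal to the average of the per-cell values, which is why the two views should be compared with care.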

Hindcast (1989-2008)
This subsection presents the results for the EURO-CORDEX Hindcast (1989-2008) simulations, obtained by applying the DAVs metric to precipitation and precipitation extremes. All results have the IGD as a reference. The precipitation PDFs are shown in the supplementary material; Figure S1 shows the PDFs behind the results displayed in Fig. 2. Two different approaches are followed, one based on a precipitation intensity PDF (left panels in Fig. S1) and the other on a precipitation frequency PDF (right panels in Fig. S1). Overall, the high-resolution RCM simulations capture the observed PDFs better than their lower-resolution counterparts. This behaviour suggests an expected and overall added value of the high-resolution runs relative to the coarser resolution over the Iberian Peninsula domain, for both the annual and seasonal timescales. Although differences are visible for both methodologies, the precipitation intensity PDF reveals a larger spread between low and high resolution, particularly at the lower bins. Thus, one can anticipate a generally stronger added value (positive DAV), due to a closer match of the RCM PDFs to observations relative to the low-resolution runs. Moreover, low-resolution runs tend to overestimate the lower rainfall bins considerably and, as a consequence of the normalization, the higher bins, roughly above 15 mm/day, are underestimated. The same occurs for the high-resolution runs, but to a lesser degree, hence they reproduce the observed PDF more reliably.
In  the whole PDF, with significant gains at the annual scale for 11 models, which have a DAV equal to or above 10%. Of these, the CCLM, ETHZ, CNRM63, and SMHI models stand out, surpassing 18%. The CNRM53, ICTP, and IPSL RCMs show lower gains, ranging from ~5 to 10%. In winter, spring and autumn the models roughly reproduce the DAV values seen at the annual scale; however, this is not the case for summer. Of all seasons, summer has the lowest performance, with CNRM53, DHMZ, and IPSL in particular showing a detrimental effect ranging from -0.2% to -6%. In fact, summer is the season where models display

Previous studies such as Soares and Cardoso (2018) and Ciarlo et al. (2020) reported more relevant gains when looking into extremes, with a few exceptions. Instead, the results in Fig. 2c reveal lower DAVs compared to the whole PDF case (Fig. 2a), yet still significant. Still in Soares and Cardoso (2018) Fig. S1. However, the low-resolution models cannot reproduce such high precipitation rates, which in the end results in added value, despite the overestimation.

For the precipitation frequency extremes (Fig. 2d), the DAV values are almost always slightly lower than for precipitation intensity (Fig. 2c), but larger than in Fig. 2b. The similarity between both methodologies at the yearly and seasonal scales is clear: summer and autumn reveal more significant gains, while for spring and winter most RCMs display gains close to 10%. The exceptions are DMI, GERICS, MPI, and MOHC, which show more limited gains, namely for winter and autumn. Figure S2 from the supplementary material displays a slightly different approach from that in Fig. 2. In this case, all model data was
clear impact on the results shown in Fig. 2a. Moreover, DHMZ, which had the minimum value in the summer regional overview, displays some points in Fig. 3a with a slightly detrimental effect, ~-10%.
In contrast with Fig. 3a and following the results from relative precipitation frequency in Fig. 2b, Fig. 3b shows an overall smaller added value, following a very similar inter-model difference. While for precipitation intensity, values easily surpassed 50%, namely near the coast, here the gains are more limited, going up to 30% in some coastal sites. Moreover, for the same models which had

Historical (1971-2005)
This section applies the same metric to the Historical simulations, covering the 1971-2005 period. In this case, the same RCM can be forced by different GCMs; however, the results do not necessarily agree. In fact, following the values from Fig. 4, the different performances are more closely tied to the driving GCMs than to the high-resolution models themselves. This points to a weak or even absent relationship among the runs of a single RCM forced by different GCMs. Moreover, any comparison with the previous Hindcast results (Figs. 2 and 3) is hindered, not only owing to these differences, but also due to different
The normalized precipitation with respect to the whole PDF is shown in Fig. 4a

amongst all GCM-RCM pairs. Following the previous results, the MOHC driven RCMs also show weak gains and even some detrimental effects in 4 RCMs at the annual scale. From these same RCMs, at least one season also displays losses up to -6.9%.
Nevertheless, MOHC-ETH and MOHC-KNMI are still able to reveal some added value.
The different behaviour across the downscaling of each GCM group may be related not only to their resolution but also to the performance and quality of the GCMs themselves, mainly within the lateral boundary forcing zone. For instance, Brands et al. (2013)

NCC or IPSL_LR clearly display added value. The resolution of the GCMs can also play a major role in the added value of precipitation. Although the NOAA GCM reveals a good performance (McSweeney et al., 2015), it also has one of the lowest resolutions, which may be a reason behind the gains found in Fig. 4a for the NOAA-GERICS pair.
The next panel displays the precipitation frequency relative to the whole PDF (Fig. 4b). As with the Hindcast simulations (Fig. 2), the frequency approach reveals limited gains, but with similar inter-model differences, correlating well with the precipitation intensity approach. Thus, the overall results for precipitation frequency are weaker and closer to 0%. In other words, the negative values are not as strong; for instance, the losses for the MPI2-driven RCMs are slightly less significant than in Fig. 4a. At the other end of the spectrum, models forced by the NCC GCM still reveal a more significant added value, albeit weaker than for precipitation intensity. The exception is winter, where 6 RCMs display weaker and slightly negative DAVs.
stronger added values, thus showing higher variability. Mirroring Fig. 4a, the NCC-forced RCMs reveal a very significant added value at the annual scale, derived from the strong signal found seasonally, particularly for winter. At the same time, 4 RCMs from this group stand out for having gains above 30% for almost all seasons. The other GCM-RCM pairs do not reveal such expressive added value. Nevertheless, 2 RCMs driven by ICHEC1, 6 by ICHEC2, and all models forced by the IPSL GCMs display at least one season with percentages above 20%. Of these, 3 GCM-RCM pairs reveal gains above 30% for a single season. On the contrary, the DMI RCM forced by both ICHEC1 and ICHEC2 reveals weaker gains, namely for spring and summer. Moreover, the IPSL-MR RCMs display weak percentages for spring and even losses for winter, which overshadow the gains found in summer and autumn. Of the CNRM-driven RCMs, 5 models show percentages similar to those found in Fig. 4a. However, CNRM-DMI and CNRM-GERICS display losses at the annual scale, derived from a stronger detrimental effect for summer and winter, respectively.
with either significant added value or losses, which would be masked otherwise.
Figure 5 displays the results for precipitation intensity. Overall, important gains are found for all models, even for those which underperformed at the Iberian Peninsula scale in Fig. 4a. Nevertheless, a more significant added value is found in coastal areas relative to inland points. This also occurred for the Hindcast simulations and is mainly due to a better representation of the land-sea circulations.
Moreover, RCMs forced by IPSL-LR, NCC, and NOAA reveal significant gains at most grid points, corroborating the results from Fig. 4a. On the contrary, RCMs driven by GCMs whose gains were not as relevant, such as CNRM, ICHEC1, ICHEC2, or IPSL-MR, all display points with limited gains and sometimes small losses at interior sites, thus lowering the joint performance.
Nevertheless, these pairs reveal substantial added value, in particular on the Mediterranean coast. Similar to the previous cases, models forced by all three MPI GCM versions reveal a similar behaviour, with significant gains in coastal areas and lower values in the interior. However, these results contrast with the DAVs found in Fig. 4a, namely for models forced by MPI2, i.e., when assessing the precipitation at a more local scale, the gains become even more evident. Lastly, models forced by

Discussion and conclusions
In this study, the performance of RCMs from the Hindcast (1989-2008) and Historical (1971-2005) simulations is assessed with respect to their PDFs, by using a distribution added value metric proposed by Soares and Cardoso (2018). This assessment has

was first interpolated to the 0.1° resolution of the observations, while the low-resolution models are assessed at their native resolution.
Two slightly different approaches were considered here, one following a precipitation intensity PDF and the other a precipitation frequency PDF. Between both, the results reveal very similar inter-model differences; however, a stronger signal is found for precipitation intensity. Nevertheless, all RCMs reveal a significant added value, particularly in the representation of extremes, where the global models have more difficulty in describing the higher precipitation rates. This result is expected and shows the importance of considering regional models with higher resolution. However, in some isolated cases, the RCMs instead display a neutral or even slightly detrimental effect. On the other hand, the spatialization of the DAVs, in particular for extreme precipitation, revealed significant added value within the entire Iberian Peninsula. These gains are more relevant for coastal sites, possibly owing to the better representation of the land-sea boundary.

https://doi.org/10.5194/gmd-2021-207 Preprint. Discussion started: 6 August 2021 © Author(s) 2021. CC BY 4.0 License.

Previous works warned about the uncertainty arising from interpolation procedures (Ciarlo et al., 2020). Interpolating the GCMs to higher resolutions could generate unrealistic values, whereas upscaling the high-resolution data degrades the spatial information, affecting primarily the tail of the distributions (Torma et al., 2015; Prein et al., 2016). To gauge these differences, a second methodology was investigated, where all data were interpolated to the 0.1° resolution of the observations. In this case, the overall DAVs revealed more significant added value. These results hint that the effect of upscaling the high-resolution PDF, which brings it closer to ERA-Interim and the CMIP5 GCMs, is stronger than the effect of generating spurious values when interpolating lower-resolution datasets. Nonetheless, since unrealistic values may be created, the associated uncertainty is higher.
While the DAVs metric allows the gains or losses from downscaling the global models to be quantified, no relationship is found when the same RCM is forced by multiple GCMs. Yet, a strong connection is observed for high-resolution models driven by the same GCM. These differences could be primarily attributed to the individual resolution of each GCM, but also to the performance of the GCM along the lateral forcing regions of the EURO-CORDEX domain. In this sense, downscaling lower-resolution GCMs will yield higher DAVs, although other effects could also play a relevant role, such as model configuration or the parameterizations used. Conversely, if a specific GCM performs well, the regional models will have difficulty obtaining added value. Nevertheless, the gains obtained from the use of higher-resolution RCMs are paramount, owing not only to the finer detail in the representation of the variables themselves, but also to the improved description of orography and land-ocean-atmosphere feedbacks, which all have important impacts on precipitation.

Data availability
All model and observational datasets are publicly available. The regional and global model data is available through the Earth