Can high-resolution GCMs reach the level of information provided by 12-50 km CORDEX RCMs in terms of daily precipitation distribution?

In this study, we evaluate PRIMAVERA high-resolution (25-50 km) Global Climate Models (GCMs) relative to CORDEX Regional Climate Models (RCMs) over Europe (12-50 km resolutions). This is the first time such an assessment has been performed for regional climate information using ensembles of GCMs and RCMs at similar horizontal resolutions. We perform this exercise for the distribution of daily precipitation contributions to rainfall bins over Europe under current climate conditions. Both ensembles are evaluated against national gridded observations that are of high quality in terms of resolution and station density. We show that PRIMAVERA GCMs simulate distributions very similar to those of CORDEX RCMs, which CMIP5 GCMs cannot do because of their coarse resolutions. PRIMAVERA and CORDEX ensembles generally show similar strengths and weaknesses. They are of good quality in summer and autumn in most European regions, but tend to overestimate precipitation in winter and spring. PRIMAVERA show improvements in the latter bias by reducing mid-rain-rate biases in Central and Eastern Europe. Moreover, CORDEX simulate less light rainfall than PRIMAVERA in most regions and seasons, which improves this common GCM bias. Finally, PRIMAVERA simulate less heavy precipitation than CORDEX in most regions and seasons, especially in summer, and appear to be closer to observations. However, when we apply an averaged precipitation undercatch error of 20%, CORDEX become closer to these synthetic datasets.

https://doi.org/10.5194/gmd-2019-370 Preprint. Discussion started: 4 March 2020. © Author(s) 2020. CC BY 4.0 License.


Introduction
Climate models are essential tools to provide information on the evolution of climate quantities, their variability and interactions with various components of the Earth System. There have been two main streams of development in the climate modelling community: Global Climate Models (GCMs) and Regional Climate Models (RCMs). GCMs are complex models that account for interactions at the global scale between various components of the Earth System (e.g. atmosphere, ocean, sea ice, vegetation). They are designed to balance model resolution, physics complexity and computational requirements, and are therefore commonly run at coarse spatial resolution. RCMs are complex models that dynamically downscale GCM results to obtain fine climate information at the regional scale. The main advantages of the dynamical downscaling approach are that: 1) RCMs are computationally cheaper and use a higher horizontal resolution than state-of-the-art GCMs over the region of interest. As a result, RCMs provide a more detailed representation of complex topography and land-sea contrast (e.g. Torma et al., 2015). 2) Physical processes are based on parameterization schemes that are developed at the resolutions of the RCM (12-50 km). At such resolutions, these may be more appropriate than GCM schemes that are developed at much coarser resolutions (100-300 km) (e.g. Giorgi and Mearns, 1999; Prein et al., 2016; Sørland et al., 2018). 3) RCMs' parameterization schemes are specifically tuned to simulate the regional climate as realistically as possible compared to observations (e.g. Bellprat et al., 2016), while it is not possible to apply region-specific tuning in GCMs (Mauritsen et al., 2012; Hourdin et al., 2017). 4) Each RCM can downscale various GCMs to simulate many different large-scale climate conditions at the domain boundaries. This ability to be used in large ensembles is an important step to evaluate the RCM ensemble spread and better constrain the model uncertainties. To provide reliable information on climate mean, variability and change to end users at the regional to local scales, RCMs are therefore considered to be useful tools to supplement the so-called global Earth System Models (such as those used for the Coupled Model Intercomparison Projects, CMIP5 and CMIP6), which are more complex and use lower resolutions.
Since the end of the 1980s, dynamical downscaling has been used to provide regional climate projections (Dickinson et al., 1989; Giorgi, 2019), and has become a well-accepted and extensively used approach to produce climate change information at the local scale (refer to various national climate assessment reports, e.g. Kjellström et al., 2016; Fealy et al.,  Projections (https://www.metoffice.gov.uk/research/approach/collaboration/ukcp/index)). The Coordinated Regional Climate Downscaling Experiment (CORDEX) is an international coordinated effort to produce multi-model regional climate change scenarios (Giorgi et al., 2009; Gutowski et al., 2016). CORDEX started in 2009 with the main goal of developing a framework that provides consistent high-resolution climate information at the regional scale, which would be more directly usable than the information provided by GCMs. By systematically evaluating regional climate downscaling techniques, it also aims to provide a solid scientific basis for impact assessments. Last but not least, it aims to promote interaction and communication between the Global Climate Modelling community, the Regional Climate Modelling community and end users to better support adaptation activities (Giorgi et al., 2009).

The CORDEX initiative (Giorgi et al., 2009) has primarily focused its effort on downscaling CMIP5 GCMs (150-200 km resolution) using RCMs at 50 km (CORDEX-44) resolution. As computational resources have become more available, resolutions in RCMs have been further increased to 12 km (CORDEX-11) over Europe (Kotlarski et al., 2014; Vautard et al., in prep) and 25 km (CORDEX-22) over other domains of the globe. This effort follows the CORE protocol (https://www.cordex.org/experiment-guidelines/cordex-core) and aims to provide a core set of comprehensive and homogeneous regional climate projections across many domains that can support IPCC AR6 assessments, to investigate the impact of model resolution, and to better constrain the model ensemble spread (Gutowski et al., 2016). The horizontal resolutions of 12 km over Europe and 25 km over all domains were chosen as a compromise between what is computationally possible for different modeling groups and the expected added value compared to GCMs. These community efforts within CORDEX have proved to be very useful to provide reliable climate information in terms of temperature, precipitation and wind means and extremes (e.g. Kotlarski et al., 2014; Prein et al., 2016; Glisan et al., 2019), as well as their projected climate change signals over different parts of the globe (e.g. Gao et al., 2008; Jacob et al., 2014; Rajczak and Schär, 2017). Overall, RCMs have been shown to improve the representation of mean climate compared to their driving GCMs, particularly over orography owing to their higher resolutions (Torma et al., 2015; Giorgi et al., 2016; Sørland et al., 2018). When the RCM resolution is refined from 50 km to 12 km, there is an improvement in terms of spatial and temporal distributions, particularly in mean and extreme precipitation in mountainous regions (Torma et al., 2015; Prein et al., 2016), due to the improved representation of orography. Summer seasons also tend to be better simulated in CORDEX-11 because the larger scales of convection are captured by the better resolved-scale dynamics (Prein et al., 2016). In addition, CORDEX-11 improves over CORDEX-44 in simulating amplitudes and historical trends of far extreme fall "Mediterranean events", whose intensity has increased by about 20% in the past 60 years or so (Luu et al., 2018). Overall, however, climate mean and variability do not change significantly by going from 50 to 12 km (e.g. Kotlarski et al., 2014; Casanueva et al., 2016; Jury et al., 2019).
Whether it is at 50 or 12 km resolution, the quality of RCM simulations has been shown to be linked to the internal skill of the RCM itself, which can be assessed by evaluating reanalysis-driven simulations (e.g. Kotlarski et al., 2014), but also to the quality of their driving GCMs (Giorgi and Mearns, 1999; Rummukainen, 2010; Diaconescu and Laprise, 2013; Hall,  example, summer convection contributes largely to regional water budgets, particularly precipitation extremes, where RCMs' ability to simulate these events has been demonstrated (Prein et al., 2016). Radiation or surface wind speed biases of RCM simulations downscaling GCMs were clearly driven by RCM biases, and GCMs appear not to contribute (Vautard et al., 2019). The argument that RCMs are performing poorly because of unrealistic large-scale circulation in the GCMs has unfortunately not facilitated the communication between the two communities (Schiermeier, 2010; Kerr, 2011, 2013), which have continued to evolve on separate paths. This dependency may be relaxed by the recently developed 2-step nesting convection-resolving model simulations (e.g. 2-4 km resolution), and many studies have already shown their advantages (Prein et al., 2013; Ban et al., 2015; Prein et al., 2015; Giorgi et al., 2016; Berthou et al., 2018; Schär et al., 2019). Although these convection-resolving simulations can be run at decadal scale, they are still too expensive to provide multi-model ensembles of centennial climate change projections, and end users therefore have to rely on CMIP and CORDEX projections for adaptation activities.
In parallel to the development of the RCMs, GCMs have mainly been developed in terms of complexity, by adding more components of the Earth System into the models. Over the past decade, resolutions in GCMs have also increased from about 300 km for CMIP3, used in the 4th Assessment Report (AR4; Randall et al., 2007) of the Intergovernmental Panel on Climate Change (IPCC), to about 150 km for CMIP5, used in AR5 (Flato et al., 2013). CMIP6 simulations (Eyring et al., 2016), which are analysed in the coming AR6, have recently been completed at about 100 km resolution. These large simulation ensembles have been extensively evaluated (e.g. Kumar et al., 2014) and their projections taken into consideration for the various IPCC Assessment Reports. A new high-resolution model intercomparison project, HighResMIP (Haarsma et al., 2016), has recently emerged due to the constant progress in computing power. HighResMIP calls for atmosphere and coupled GCMs at resolutions of 50-25 km, in addition to more standard CMIP-type resolutions, in order to understand the role of horizontal resolution in global climate simulation for model mean bias, variability and extremes. HighResMIP simulations have just finished and analyses on the role of resolution in these ensembles are currently underway. The benefits of resolution in GCMs have been investigated in the past in a non-coordinated way with single or small groups of models (e.g. Jung et al., 2012; Kinter III et al., 2013; Mizielinski et al., 2014), and these studies have drawn similar conclusions regarding the emergence of weather-type systems that feed back on the global climate system. For example, increasing resolution in GCMs plays a role in the simulation of the global hydrological cycle, which tends to be more intense but partitioned more realistically over land and ocean compared to observations due to stronger transport of atmospheric moisture (Demory et al., 2014) and a better representation of orography. Coupling the atmosphere with ocean eddy-permitting models tends to improve the climate mean state and variability (e.g. Minobe et al., 2008; Shaffrey et al., 2009; Roberts et al., 2016). Synoptic-scale dynamics are better resolved in GCMs with increasing resolution, which improves the representation of mid-latitude eddy-driven jet variability, extra-tropical cyclones and associated extreme precipitation (Catto et al., 2010; Haarsma et al., 2013; Schiemann et al., 2018; Baker et al., 2019), as well as blocking events (Matsueda and Palmer, 2011; Berckmans et al., 2013). Intensity of tropical cyclones in GCMs also increases with resolution, and their interannual variability is better captured (e.g. Zhao et al., 2009; Roberts et al., 2015), but the resolution in GCMs is still not high enough to capture the most intense tropical cyclones. All these weather-type processes can affect regional climate variability (e.g. Haarsma et al., 2013), so better simulating them can potentially lead to more realistic climate information and trustworthy climate change projections at the regional scale (e.g. Matsueda and Palmer, 2011). This question would be particularly important in regions where the water budget is partly driven by synoptic systems, such as tropical cyclones over East Asia (e.g. Guo et al., 2017) and Central America (e.g. Franco-Diaz et al., 2019), and frontal systems and eddy-driven jet interactions with topography over Europe (e.g. Woollings et al., 2010; Catto et al., 2012; Baker et al., 2019).
There are currently two sets of RCM and GCM ensembles at similar resolutions: the 12-50 km CORDEX RCMs and the 25-50 km HighResMIP GCMs. So far, the effect of increasing resolution in RCMs and GCMs has been assessed relative to their lower-resolution counterparts. With HighResMIP and CORDEX, it is the first time that GCM and RCM climate ensembles can be assessed against each other. Two questions emerge: 1) Can high-resolution GCMs reach the level of regional climate information that is provided by state-of-the-art RCMs, developed at higher resolution than state-of-the-art GCMs and specifically calibrated to simulate the climate of the region of interest? 2) Can we better constrain the spread of information by considering various model sources? Considering these two ensembles together would give some insights for planning future climate ensembles to improve climate projections and risk assessments at the regional scale.

In this study, we make use of the various RCM and GCM coordinated efforts (CMIP, CORDEX, HighResMIP) to investigate the spread of information given by various products, whether they are from low-resolution GCMs (CMIP5), high-resolution GCMs (HighResMIP), low-resolution RCMs (EUR-44) or high-resolution RCMs (EUR-11). We would like to determine, for instance, whether HighResMIP GCMs, due solely to their increased resolution, provide information at the regional scale that is comparable to CORDEX. In other words, is the potential improvement of large-scale drivers of European climate with high-resolution GCMs as beneficial as the local tuning of regional models? This would enable us to inform end users on the kind of information they can expect by considering different products. We focus our efforts on the daily precipitation distribution over European regions under current climate conditions. Section 2 presents the data used as well as the method employed to evaluate the daily precipitation distribution. Section 3 presents the results and includes various sensitivity tests related to the impact of resolution and the effect of regridding model data on coarser grids. Section 4 presents several sensitivity tests regarding the method itself. Section 5 discusses the results and concludes with an opening towards the need for RCM and GCM communities to strengthen collaboration and communication.

We use the ocean-atmosphere coupled GCMs developed and run within the EU-Horizon 2020 PRIMAVERA project (https://www.primavera-h2020.eu), which is a European contribution to HighResMIP. PRIMAVERA uses the HighResMIP protocol (Haarsma et al., 2016), which is different from CMIP (e.g. different aerosols; refer to Haarsma et al., 2016, for details). As PRIMAVERA simulations are still running, we use those that were available at the time of the study. So far, PRIMAVERA simulations consist of 6 GCMs (Table 1). Most high-resolution simulations include one member only, but where there are more (such as the IFS-HR, which provides 6 members), we consider one per model in order to apply equal weights to each model.

CORDEX RCMs
Over Europe, we use the CMIP5-driven EUR-44 and EUR-11 CORDEX simulations (please refer to the EURO-CORDEX simulation list here: https://euro-cordex.net/imperia/md/content/csc/cordex/20180130-eurocordex-simulations.pdf) run at 0.44° (about 50 km) and 0.11° (about 12 km) resolution. Daily precipitation model data have been extracted from the Earth System Grid Federation (ESGF) servers (as summarised in Table 2). We focus our analysis on the EUR-44 simulations because their resolution roughly corresponds to the resolutions used by PRIMAVERA GCMs, which allows a clean comparison between the two ensembles. However, we evaluate the roles of resolution, regridding, and ensemble size in daily precipitation distribution with equivalent pairs from EUR-11.

CMIP5 GCMs
To investigate the added value of CORDEX RCM simulations to CMIP5 GCMs, we constrain our study to the subset of CMIP5 GCMs used to force CORDEX simulations (Table 2, second column), available on the ESGF servers. However, we examine the robustness of our findings by also analysing the entire ensemble of CMIP5 simulations. Taking the full set changes the ensemble spread but the main conclusions of the study regarding CMIP5 remain the same (not shown).

We perform our analysis either on the full CORDEX and PRIMAVERA ensembles or on reduced ensembles. Reduced ensembles correspond to PRIMAVERA GCMs and CORDEX RCMs that downscale CMIP5 GCMs based on the same GCM family, for example the PRIMAVERA MPI-ESM1-2-XR GCM and the EUR-44 RCA4, CCLM4, CCLM5 and REMO2009 that downscaled MPI-ESM-LR (coloured blue in Table 2). Also, within CORDEX, a reduced ensemble (shaded dark in Table 2) is defined to compare results from EUR-44 and EUR-11.

Observations
Over Europe, we make use of the best available observational datasets (Fig. S1). These are mostly national datasets, such as such as the Mediterranean region and the Eastern Europe, we also make use of E-OBS, although the quality is most likely lower (Prein and Gobiet, 2017). All the observation datasets used are listed in Table 3.

The advantages of using such national datasets are that they are available at high resolutions (5-20 km) and that they are based on a very dense station network, which minimizes the effect of precipitation undersampling (Prein and Gobiet, 2017). These data are therefore considered to be the best available over Europe. Nevertheless, there are drawbacks, particularly related to the lack of precipitation undercatch correction. This issue can be particularly important for falling snow over mountains, but also in other places when precipitation is associated with strong winds (rain does not fall vertically into the gauges, which creates an error depending on wind speed and drop size). The lack of correction can introduce errors of 3-20% on average and up to 40-80% in high latitudes and mountainous regions (Prein and Gobiet, 2017). To overcome this problem, we use a method similar to Kotlarski et al. (2014) and Rajczak and Schär (2017), and assume a mean estimate of the undercatch error of 20% over all regions. All observations are therefore scaled by a factor of 1.2 over all grid points and over the entire time series, which gives us a rough estimate of observational uncertainties. We refer to the result as a synthetic observational dataset.
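The uniform scaling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `synthetic_observations` and the use of NumPy arrays are hypothetical and are not taken from the study; only the factor of 1.2 comes from the text.

```python
import numpy as np

# Mean undercatch error of 20% assumed in the text, applied as a uniform factor.
UNDERCATCH_FACTOR = 1.2


def synthetic_observations(obs_precip, factor=UNDERCATCH_FACTOR):
    """Scale observed precipitation by a constant factor.

    obs_precip: array-like of precipitation values (any shape, e.g. time x lat x lon).
    The same factor is applied at every grid point and time step.
    """
    obs = np.asarray(obs_precip, dtype=float)
    return obs * factor
```

Because the factor is constant in space and time, dry days remain dry and the whole distribution is stretched towards higher rain rates, which is why this gives only a rough estimate of observational uncertainty.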

Moreover, Prein and Gobiet (2017) advised to consider as many observational datasets as possible for regional analyses.
Most datasets, however, are either available only at resolutions much coarser than 50 km, and therefore cannot be used for evaluating the ensembles at such resolution, or not available at daily timescales. We have done a test using GPCP v2, available daily at 1 degree resolution. The distribution shows almost no intense precipitation over most regions and in most seasons (not shown).

Period
To match the observation time periods with the PRIMAVERA and CORDEX ensembles, we focus our analyses on the present-day period 1971-2005 over Europe.

Domains
We divide the European domain into subregions according to the areas covered by national observational datasets (Fig. S1).

Over the sub-regions covered by E-OBS, we consider the PRUDENCE regions (Christensen and Christensen, 2007).

We look at the daily precipitation distribution in each sub-region (Fig. S1). We use a method similar to that of Berthou et al. (2019), based on the ASoP1 diagnostics tool developed by Klingaman et al. (2017). We calculate the daily precipitation distribution in terms of the actual contribution from 100 different intensity bins to mean precipitation. In order to account for the high frequency of low intensity precipitation events and the low frequency of high intensity events, we use an exponential bin distribution, as described by Berthou et al. (2019; see their Fig. S5). To calculate the contribution to mean precipitation, each bin frequency is multiplied by its average rate. This way, mean precipitation is split into contributions from different rates. We use a logarithmic scale on the x-axis, so the area under the curve is directly proportional to the mean. Fig. 1 shows the resulting distribution for PRIMAVERA, EUR-44 and observations over the British Isles region (refer to Fig. S1 for the domain) in summer (JJA). Note that this type of histogram contains information about both mean precipitation (the area under the curve) and precipitation distribution. In this example, we see that EUR-44 tend to simulate more mean and intense summer precipitation over the UK, while PRIMAVERA have a lower mean, which is closer to observations for this area. However, PRIMAVERA tend to simulate too much drizzle (a common bias among GCMs; Dai, 2006; Stephens et al., 2010), while EUR-44 do not show such behaviour and are therefore closer to observations. These results are summarised in the pie plot (right panel of Fig. 1) for all seasons (DJF, MAM, JJA, SON).
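The contribution of each intensity bin to mean precipitation can be sketched as follows. This is a minimal illustration, not the ASoP1 implementation: the exact bin edges of Berthou et al. (2019) are not given here, so the geometrically spaced edges, the range of 0.1-500 mm/day, and the function names are assumptions made for the example.

```python
import numpy as np


def exponential_bin_edges(n_bins=100, p_min=0.1, p_max=500.0):
    """Hypothetical bin edges (mm/day), geometrically spaced so that bins
    are narrow at low rain rates and wide at high rain rates."""
    return np.geomspace(p_min, p_max, n_bins + 1)


def bin_contributions(daily_precip, edges):
    """Contribution of each bin to mean precipitation (mm/day).

    For each bin: (frequency of days in the bin) x (mean rate of those days).
    Days outside the bin range (e.g. dry days) count towards the total
    number of days but fall into no bin.
    """
    p = np.asarray(daily_precip, dtype=float).ravel()
    p = p[np.isfinite(p)]                      # drop missing values
    counts, _ = np.histogram(p, bins=edges)    # days per bin
    sums, _ = np.histogram(p, bins=edges, weights=p)  # rain total per bin
    freq = counts / p.size
    mean_rate = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    return freq * mean_rate
```

Summing the contributions over all bins recovers the mean precipitation of the pooled sample, provided every wet day falls inside the bin range, which is the property that makes the area under the curve proportional to the mean.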

The intercomparison of the model ensembles is performed as follows:
1) All datasets are regridded on the EUR-44 rotated pole grid, using a first-order conservative remapping. Then the precipitation data is pooled from each region and season. This step is repeated for every model and observational dataset.
2) The ensemble mean is calculated for each bin and a bootstrap resampling is used 1000 times on each model ensemble (CORDEX and PRIMAVERA) to establish a confidence interval around the ensemble mean (the 10% confidence interval is plotted with shaded colours around the ensemble mean in Fig. 1). For the observations, the bootstrap resampling is done on single years, therefore reflecting inter-annual variability.
3) A p-value is calculated on the difference between the two ensembles for each bin (plotted in grey crosses in Fig. 1). We apply a 10% threshold on each bin to validate that the two ensembles are significantly different (p-value < 0.1).
4) We group the bins into 3 precipitation intensity intervals (low: 1-10 mm/day; mid: 10-60 mm/day; high: >60 mm/day). We evaluate for each interval the percentage of bins over which the ensembles differ.
5) If the ensembles differ over more than 90% of the bins in that interval, the part of the pie corresponding to the season, region and precipitation interval is coloured (Fig. 1, right panel).
6) If the ensembles differ over more than 90% of the bins, we determine which one is less significantly different from the observations using the same metric between the observational spread (inter-annual variability) and each ensemble spread. If an ensemble has at least 10% less difference with the observations than the other, then its first letter is added to that part of the pie (P and C stand for PRIMAVERA and CORDEX, respectively). If the two ensembles are both close to observations (both differ by less than 30% with the observations), then we add an "=" sign to the pie section.
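Steps 2 to 4 can be sketched in Python as below. This is a hedged illustration, not the code used in the study: the text does not specify how the p-value is computed, so the two-sided bootstrap p-value used here is an assumption, and all function names are hypothetical. The inputs are arrays of per-model bin contributions with shape (models, bins).

```python
import numpy as np


def bootstrap_means(contrib, n_boot=1000, seed=None):
    """Resample models with replacement and return the ensemble mean of
    each resample, with shape (n_boot, n_bins)."""
    rng = np.random.default_rng(seed)
    contrib = np.asarray(contrib, dtype=float)
    n_models = contrib.shape[0]
    idx = rng.integers(0, n_models, size=(n_boot, n_models))
    return contrib[idx].mean(axis=1)


def bootstrap_p_values(means_a, means_b):
    """Assumed two-sided p-value per bin, from the sign of the difference
    between resampled ensemble means."""
    diff = np.asarray(means_a) - np.asarray(means_b)
    frac_ge = (diff >= 0).mean(axis=0)
    frac_le = (diff <= 0).mean(axis=0)
    return np.minimum(1.0, 2.0 * np.minimum(frac_ge, frac_le))


def percent_bins_differing(p_values, bin_centres, lo, hi, alpha=0.1):
    """Percentage of bins with centre in [lo, hi) where p < alpha."""
    p_values = np.asarray(p_values)
    centres = np.asarray(bin_centres)
    in_interval = (centres >= lo) & (centres < hi)
    if not in_interval.any():
        return 0.0
    return 100.0 * float((p_values[in_interval] < alpha).mean())
```

With these pieces, a pie section would be coloured when `percent_bins_differing` exceeds 90 for the corresponding interval, for example `lo=1, hi=10` for the low-intensity interval.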

These steps are performed for every season, region and intensity interval, and plotted as shown in Fig. 1. The pie plot is therefore a way to synthesize information for the comparison between CORDEX and PRIMAVERA (section 3.3).
We focus our analyses on DJF (December-February) and JJA (June-August), which show the largest differences. The

The bootstrap resampling gives a spread of ensemble means, and not of models themselves. Because PRIMAVERA has only 6 models, some of the means of the bootstrap resampling will be close to individual model behaviour, so the ensemble spread of the bootstrap can be relatively larger than with the 26 CORDEX models, where the bootstrap resampling is much less likely to pick individual models. What we are assessing with the bootstrap resampling is whether the ensemble means are different if we resample the models within them.

The resampling in the observations is done on inter-annual variability but the bootstrapping also represents a spread of averages, which is directly comparable with the spread of ensemble means, rather than comparing inter-annual variability with inter-member spread directly through standard deviations. We show that the main conclusions are not sensitive to the use of bootstrapping or interquartile range in section 3.4.

In order to evaluate the robustness of our results, we have performed several sensitivity analyses to evaluate:
• the role of regridding by comparing EUR-11 results on their native grid or regridded on the EUR-44 grid
• the role of model ensemble size by comparing all models versus a reduced ensemble
• the sensitivity of the results to the bootstrapping methodology by considering an inter-quartile (25%-75%) ensemble spread
• the sensitivity of the results to the significance threshold by considering p-value=0.01, 0.05 and 0.1 on different percentages of each interval
• the sensitivity of the results to the definition of the bins (size and distribution)
These analyses are discussed in Section 4 of the manuscript.
 Table 2).
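As an illustration of the third sensitivity test, the inter-quartile ensemble spread per bin can be computed directly from the per-model contributions. The sketch below is an assumption about how such a spread might be used: the text does not state the comparison rule, so treating two ensembles as different in a bin when their 25%-75% ranges do not overlap is a hypothetical choice, as are the function names.

```python
import numpy as np


def interquartile_spread(contrib):
    """25th and 75th percentiles across models for each bin.

    contrib: array of shape (n_models, n_bins).
    """
    q25, q75 = np.percentile(np.asarray(contrib, dtype=float), [25, 75], axis=0)
    return q25, q75


def ranges_overlap(lo_a, hi_a, lo_b, hi_b):
    """True in each bin where the two [lo, hi] ranges overlap."""
    return (np.asarray(lo_a) <= np.asarray(hi_b)) & (np.asarray(lo_b) <= np.asarray(hi_a))
```

Unlike the bootstrap, which describes the uncertainty of the ensemble mean, this spread describes the models themselves, so it is generally wider for a given ensemble.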

All data are plotted on the models' native grids, but use a common mask, regridded on each model grid, to define the regions.
The British Isles region is not included because the resolutions of most CMIP5 GCMs are too low for significant results.
Moreover, here we only focus on the differences between CORDEX and CMIP5, so observations are not included.
In winter (Fig. 2), there is a clear shift in the precipitation distribution going from CMIP5 to CORDEX (EUR-44 and EUR-11) over all regions (results from other regions can also be seen in Fig. S2). EUR-44 and EUR-11 simulate an overall decrease in low intensity precipitation and an increase in high intensity precipitation. Moreover, EUR-11 tend to show a decrease in mid-rate precipitation compared to EUR-44. The shift towards more intense precipitation can be seen in all regions but is particularly clear over coastal and orographic regions (MD, SC, AL, IP), which is presumably due to the increase in resolution (Prein et al., 2016).

In summer (Fig. 3), these findings are still valid between CMIP5 and CORDEX. CMIP5 simulate very little high intensity precipitation, while their mid-rate precipitation is much larger than in CORDEX. This finding may be attributed to the finer grid box (meaning the rain rates are those of a smaller area), the better representation of orography and coastlines that may enhance the triggering of summer convective precipitation, the use of convective schemes which are more appropriate at the resolution of the RCMs, or the tuning of parameterization schemes. The differences between EUR-44 and EUR-11 are reduced, which suggests that such a resolution jump does not strongly influence summer precipitation when convection parameterization is used. This is also seen in other regions (Fig. S3).

Analyses have also been performed on all CMIP5 GCMs available on ESGF (not shown). We have found that the ensemble mean (area under the curve) is slightly lower when considering all CMIP5 models, but the distribution does not shift, so our above conclusions do not change.

When comparing EUR-44 and PRIMAVERA (Fig. 4 and 5), we find that the two ensembles are relatively similar, as opposed to how CORDEX compares with CMIP5 (Fig. 2 and 3). Resolution is therefore the most important factor in capturing a realistic distribution of daily precipitation contribution to each rain rate. Overall there is no systematic difference between CORDEX and PRIMAVERA, but the two ensembles show different distributions, depending on region and season.

As for CMIP5, PRIMAVERA still overestimate low intensity precipitation in all seasons and regions, although to a lesser extent. In summer, PRIMAVERA have significantly less heavy rain rates than EUR-44 in all regions, which is more in agreement with observations, but both are within the observational range when considering a 20% rainfall undercatch (Fig. 5). PRIMAVERA also tend to simulate mid rain rates closer to observations in winter (Fig. 4) and transitional seasons, but this is mostly because of a sub-selection of GCMs within PRIMAVERA, except in the centre and east of the domain, where conclusions are robust to the reduced ensemble. Heavy precipitation tends to be lower than in EUR-44. When compared to observations, PRIMAVERA tend to be more realistic in the mid and heavy rain rates. However, when considering a 20% undercatch error, this is not systematically true, particularly for the most intense rain rates in JJA (Fig. 5).
To summarise our results, we have gathered the precipitation distribution comparisons onto a common figure, as explained in section 2.7. Figure 6a shows the results of the comparison of the two ensembles for each region, season and bin rate interval (low/mid/heavy rain rates). Fig. 6b shows the same figure but for the ensembles reduced to the GCM families shared by the ensembles (4 GCMs, 17 RCMs, see Table 2).
For all regions, EUR-44 and PRIMAVERA ensemble means significantly differ (the part of the pie is coloured) from each other for the most intense rainfall rates in summer (JJA). EUR-44 indeed generally show a heavier precipitation tail in all regions, which is often significantly larger than PRIMAVERA (e.g. IP, CA and AL regions; Fig. 5). PRIMAVERA show less contribution from these strong precipitation events, in better agreement with the observations in most regions except the Alps. This conclusion is the most robust one and remains true when the strictest criterion of difference is applied (Fig. 7  observations is assumed, EUR-44 are closer to the observed estimate (dashed line in Fig. 5 and Fig. S5). In other seasons, EUR-44 also have significantly larger contributions from intense precipitation compared to PRIMAVERA in many regions.
They are in general further away from observations but closer to the synthetic observations accounting for precipitation undercatch.

When the same GCMs are used (Fig. 6b), the differences in the medium bins are only found in the centre or east of the domain (FR, CE, CA), which is potentially where EUR-44 are less influenced by the boundary conditions provided by the GCMs. In these regions, PRIMAVERA tend to simulate less contribution from these medium bins. This is in better agreement with the observations, even if the 20% undercatch error is taken into account (Fig. 4 and Fig. S4).

Both ensembles are the furthest away from the observations for the medium rain rates in winter (Fig. 4 and S4) and in spring (not shown), mostly overestimating precipitation. They are in best agreement with the observations in summer and autumn for these bins (Fig. 5 and S5), except in the CA region where they both underestimate summer rainfall.
[...] differences also for medium rainfall in BI, CE and CA in at least two seasons. The Alps is a region which is quite sensitive to the threshold definitions, particularly in summer regarding the comparison between the ensembles and the observations. This is because observations lie in between PRIMAVERA and CORDEX for most rain rates (Fig. 5). However, the pie charts do not consider the possible observational precipitation undercatch error, which would benefit EUR-44 in this region. The quality of the observations is therefore of particular importance in orographic regions such as the Alps. The distribution for the lowest rain rates in CORDEX is generally close to the observed distribution, while PRIMAVERA tend to have too large a contribution from low intensity precipitation, as described earlier. This result, however, depends on the threshold value. For relaxed thresholds (bottom left panels), the two ensembles differ more than for strict thresholds (top and right panels).

Sensitivity of results to bin definitions
Fig. S6 shows the same pie charts but using different underlying bin definitions: an exponential distribution with 100 bins or 200 bins, or a regular 1 mm/day bin definition. The results are weakly dependent on the bin definition, except for the Alps and the Mediterranean regions, for which more precipitation intervals are different when using the regular bin definition. Regular bins also favour EUR-44 in the Alps in summer (as discussed in the previous paragraph).
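To make the bin sensitivity concrete, the following minimal sketch computes the share of precipitation falling in each bin under two edge definitions. It is an illustration under stated assumptions, not the code used in this study: the geometric spacing chosen for the "exponential" bins, the 1 mm/day lower bound, the 200 mm/day upper bound, the normalisation by the binned total, and all function names are hypothetical.

```python
import bisect

def exponential_edges(n_bins, lo=1.0, hi=200.0):
    """Geometrically spaced edges from lo to hi (an assumed form of 'exponential' bins)."""
    ratio = (hi / lo) ** (1.0 / n_bins)
    return [lo * ratio ** i for i in range(n_bins + 1)]

def regular_edges(width=1.0, lo=1.0, hi=200.0):
    """Fixed-width edges (mm/day) from lo to hi."""
    n_bins = int(round((hi - lo) / width))
    return [lo + i * width for i in range(n_bins + 1)]

def bin_contributions(daily_precip, edges):
    """Share of the binned precipitation total in each bin [edges[i], edges[i+1])."""
    amounts = [0.0] * (len(edges) - 1)
    for p in daily_precip:
        # Days below the first edge or at/above the last edge are ignored (assumption).
        if edges[0] <= p < edges[-1]:
            amounts[bisect.bisect_right(edges, p) - 1] += p
    total = sum(amounts)
    return [a / total for a in amounts] if total > 0 else amounts

# Hypothetical daily values (mm/day) for one region and season.
daily = [0.0, 0.5, 2.0, 3.0, 10.0, 50.0]
exp_100 = bin_contributions(daily, exponential_edges(100))
reg_1mm = bin_contributions(daily, regular_edges(1.0))
```

With geometric edges the bins widen towards high intensities, so rare heavy events are pooled into fewer, broader bins than with regular 1 mm/day edges. This may be one reason why regions with heavy tails respond differently to the bin choice.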

Sensitivity of results to the bootstrap method
We chose to use a confidence interval based on bootstrapped ensembles. We therefore analyse the variability of the ensemble mean when the ensemble is randomly changed rather than the intermember spread itself. We argue that this method allows a fairer comparison with the observation mean bootstrapped on interannual variability. Through this approach, we can evaluate whether the ensemble means are significantly different, rather than whether the ensembles themselves are different. To evaluate the sensitivity of our results to that choice, and to assess the robustness of our conclusions from the pie charts, we also show the median and interquartile range of the distributions for PRIMAVERA, EUR-44, and the observations, each individual year being considered as "one member" (Fig. 8). We find that our first main result, that the two ensembles differ for heavy precipitation in summer, is still valid: the median of PRIMAVERA is outside the interquartile range of EUR-44 above 60 mm/day in all regions. Our second conclusion, that PRIMAVERA and EUR-44 (when driven by the same GCM family as PRIMAVERA) differ most in the centre of the domain, is also robust (e.g. Fig. S9 for winter).
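The idea of bootstrapping the ensemble mean can be sketched as follows. This is a hedged illustration only: the number of resamples, the 95% percentile interval, the non-overlap criterion, and the function names are assumptions and are not taken from this study. Each input value stands for one member's contribution to a given bin (or, for the observations, one year's contribution).

```python
import random

def bootstrap_mean_interval(values, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile interval of the mean when members are resampled with replacement."""
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(n_resamples):
        sample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo_idx = int((alpha / 2) * (n_resamples - 1))
    hi_idx = int((1 - alpha / 2) * (n_resamples - 1))
    return means[lo_idx], means[hi_idx]

def intervals_disjoint(a, b):
    """True when two (low, high) intervals do not overlap (assumed difference criterion)."""
    return a[1] < b[0] or b[1] < a[0]

# Hypothetical contributions of one heavy-rain bin for two small ensembles.
ci_a = bootstrap_mean_interval([0.10, 0.11, 0.12, 0.10, 0.11])
ci_b = bootstrap_mean_interval([0.20, 0.21, 0.19, 0.22, 0.20])
differ = intervals_disjoint(ci_a, ci_b)
```

Because the interval describes the mean rather than the members, it is narrower than the intermember spread, which is why the median and interquartile range are a useful complementary check.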

Sensitivity of results to the choice of EUR-44 or EUR-11
We showed earlier (Section 3.1) that EUR-11 and EUR-44 show similar distributions over most areas, except where there is complex topography (orography or coastal regions) and for intense precipitation. These results were shown on the models' native grids. This analysis has the benefit of showing the actual ability of the models on their own grid, but it also takes into account technical aspects related to different land-sea contrasts that may introduce noise into the results. To further investigate the role of resolution in CORDEX simulations, we analyse the daily precipitation distribution for EUR-11, EUR-44 and observations on a common EUR-44 grid. These analyses also serve to assess the robustness of our results between PRIMAVERA and EUR-44 using a larger EUR-11 ensemble. The results are shown in Fig. 9 and 10 (as well as Fig. S7 and S8 for other regions). When shown on a common grid, we find that EUR-11 and EUR-44 show similar results, particularly in the low to mid rain rates. However, EUR-11 simulate more intense precipitation than EUR-44 over orographic and coastal regions, particularly in winter. There are differences depending on whether EUR-11 are regridded on the EUR-44 grid or not (not shown), which are expected, but these results show that the main findings between PRIMAVERA and CORDEX still hold, even when considering EUR-11.
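As a rough illustration of what moving a fine field onto a coarser grid involves, the sketch below averages non-overlapping blocks of a daily field. It is not the regridding method of this study, which is not described in this section: block averaging is used here as a simple stand-in for conservative remapping, and the factor of 4 and the assumption that the two grids are perfectly aligned are hypothetical simplifications.

```python
def block_average(field, factor=4):
    """Average non-overlapping factor x factor blocks of a 2D field (list of rows)."""
    n_rows = len(field) // factor
    n_cols = len(field[0]) // factor
    coarse = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            block = [
                field[i * factor + di][j * factor + dj]
                for di in range(factor)
                for dj in range(factor)
            ]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse

# Hypothetical 4 x 8 daily precipitation field (mm/day) on the fine grid.
fine = [[float(c) for c in range(8)] for _ in range(4)]
coarse = block_average(fine)  # one row of two coarse cells
```

Averaging lowers local maxima, so the coarsening would need to be applied to each daily field before the distribution is computed, not to the distribution itself. This is consistent with the expectation that regridded EUR-11 differ from EUR-11 on their native grid.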

Discussion and conclusion
In this study, we have considered present-day simulations from high-resolution (25-50 km) PRIMAVERA GCMs, following the HighResMIP protocol, and from CORDEX RCMs (available at 12-50 km resolutions) to evaluate their simulated daily precipitation distribution over Europe. This study is the first attempt to evaluate GCM and RCM ensembles provided at similar horizontal resolutions at the regional scale.
Our results show that CMIP5-driven EUR-44 and PRIMAVERA atmosphere-ocean coupled simulation ensembles give equivalent regional climate information in terms of daily precipitation distribution and its contribution to precipitation intervals. The differences in their precipitation distribution are generally small (Fig. 4-5) and much smaller than differences between CORDEX and CMIP5, where the value of CORDEX is indisputable (Fig. 2-3). The CMIP5 model ensemble shows rather different distributions, particularly shifted to smaller precipitation intensities, as expected from their coarse grids.
PRIMAVERA and CORDEX ensembles are of good quality in summer and autumn (except in the CA region), but tend to overestimate precipitation in winter and spring. However, there are some precipitation intervals, seasons and regions for which the two ensembles significantly differ. A large difference between the two ensembles is found for heavy precipitation (in all regions in summer, and in some regions in other seasons). PRIMAVERA have less heavy rainfall than EUR-44, and tend to agree better with raw observations, while EUR-44 are closer to synthetic observational datasets when a 20% undercatch error is considered. Moreover, EUR-11 partially correct this overestimation of heavy precipitation seen in EUR-44. European summer precipitation is mostly driven by local convective precipitation, which is not explicitly simulated in state-of-the-art RCMs and GCMs. At such resolutions (at best 12 km), convection is parameterized. In RCMs, such parameters are commonly set by expert tuning or objective calibration to simulate a mean climate as close as possible to observations over the region of interest in hindcast simulations (using reanalysis boundary forcing; e.g. Bellprat et al., 2016).
It is not possible to perform such tuning in GCMs. GCMs are commonly tuned to balance top-of-the-atmosphere radiation globally or to better represent specific processes, but cannot be tuned over a specific region (Hourdin et al., 2017). A hypothesis to explain this excess in rainfall in the CORDEX ensemble is that most RCMs do not use the semi-implicit semi-Lagrangian numerics commonly used in GCMs that allow for longer time steps. Using shorter time steps tends to increase both mean and extreme precipitation (C. Zeman, personal communication). PRIMAVERA GCMs tend to have more light precipitation than EUR-44, and too much compared to the observations, although this result is not as robust as the former one. It is possible that expert tuning of the convective scheme and land-surface scheme in RCMs has a positive effect towards reducing this "drizzling" problem.

The advantage of EUR-11 over EUR-44 is mostly found in winter, when precipitation strongly depends on the interaction of large-scale circulation with orography. Otherwise the differences are rather small when aggregated over a region (Fig. 9-10).
Another conclusion is that when considering only shared GCM families between the two ensembles, differences in the bulk of the distribution (medium rain rates) are mostly found in the central and eastern parts of the European domain, in autumn, winter and spring (Fig. 6b). PRIMAVERA tend to reduce precipitation overestimation in these regions and seasons compared to EUR-44. This could be linked with better simulation of blocking frequency in PRIMAVERA GCMs [...]
This study is a first effort to evaluate the quality of regional climate information provided by GCM and RCM ensembles of similar horizontal resolutions. We have only investigated daily precipitation distribution, and such an exercise needs to be continued with the mean, variability and extremes of other fields (temperature, winds). Nevertheless, the results are very promising, in particular as the two ensembles have similar performance. PRIMAVERA and CORDEX, whether EUR-11 or EUR-44, should therefore be considered equally credible, depending on the user's needs, such as those aggregated over a domain. For studies at the local scale or over orography, however, a higher resolution model dataset, such as EUR-11, would inevitably give more detailed spatial information (e.g. Kotlarski et al., 2014; Prein et al., 2015).
The performance of PRIMAVERA was not necessarily expected because these GCMs were developed at a coarser resolution, and only their resolution was increased. The tuning was performed on their low-resolution counterparts, so little additional tuning was performed at these high resolutions (see Roberts et al., in revision, for changes in models when increasing resolution), as opposed to RCMs, which are developed at a higher resolution and potentially tuned at each resolution. The fact that PRIMAVERA results exhibit moderate improvements over CMIP5-driven CORDEX simulations for precipitation over Europe is also an important result of this study, which is consistent with the results of Iles et al. (2019), who used a very different method to compare GCMs and RCMs at different resolutions. It indicates that the potential improvement of large-scale dynamics in GCMs due to higher resolution does not have a strong influence on precipitation improvement, which is largely driven by downscaling.
The added value of RCMs to CMIP5 GCMs is also an important result, and it emphasizes the importance of a well designed, well evaluated model chain when using dynamical downscaling as a method to obtain higher resolution climate data. We show here that considering climate information from various sources is crucial.
We have also taken into account the issue associated with observational uncertainty. To reduce as much as possible the uncertainties linked to observations, we have used national gridded observational datasets. Although of very high quality, these are still not fit for a thorough evaluation of climate models at the regional scale, particularly over orography, and we had to roughly correct the observations by adding an averaged 20% to account for precipitation undercatch. This is not ideal but is believed to be fairer when evaluating higher resolution models.
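The effect of such a rough correction can be illustrated with a small sketch. It assumes, as a simplification not confirmed by the text, that the averaged 20% is applied as a uniform multiplicative factor to every daily value; the 20 mm/day threshold used to define "heavy" days and the function names are hypothetical.

```python
def apply_undercatch(daily_precip, undercatch=0.20):
    """Synthetic observations: scale every daily value by (1 + undercatch)."""
    return [p * (1.0 + undercatch) for p in daily_precip]

def heavy_share(daily_precip, threshold=20.0):
    """Share of the precipitation total that falls on days at or above threshold (mm/day)."""
    total = sum(daily_precip)
    if total == 0:
        return 0.0
    return sum(p for p in daily_precip if p >= threshold) / total

# Hypothetical observed daily values (mm/day).
obs = [1.0, 5.0, 18.0, 30.0]
synthetic = apply_undercatch(obs)
raw_share = heavy_share(obs)              # 30 / 54
corrected_share = heavy_share(synthetic)  # 57.6 / 64.8
```

In this toy case the 18 mm/day value crosses the fixed threshold once scaled, so the heavy-rain share of the synthetic observations increases. Under these assumptions, this illustrates why a model with a heavier tail can be closer to the synthetic observations than to the raw ones.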
In this study, we have only focused on present-day simulations. Comparing future climate projections between the two ensembles may be more difficult because the results would depend on other factors independent of the models themselves, such as the lack of a common protocol (e.g. greenhouse gas and aerosol forcings) between RCMs and GCMs.

The impact of aerosol forcings on the climate projections is currently being investigated (Boé et al., in revision; Gutierrez et al., in revision).
We have limited our study to Europe, which has the advantage of having a large RCM ensemble.
-1950 hist-1950 hist-1950 hist-1950 hist-1950 hist-1950
Ensemble member r1i1p1f1 r1i1p2f1 r1i1p1f2 r1i1p1f1 r1i1p1f1 r1i1p1f1
Table 1: Information about the PRIMAVERA high-resolution GCMs used in this study, including their spatial resolution (for full details, refer to https://www.primavera-h2020.eu/modelling/our-models/). The ones listed in bold are of the same family as the CMIP5 GCMs downscaled by CORDEX.
1971-2005 1971-2005 1971-2005 1971-2003 1971-2005 1971-2005
Table 3: Information about the observational datasets used in this study (refer to Fig. S1 for the coverage). The time period is the one considered in this study, not the available period of each observational dataset.
Figure 6: Map using the method described in Fig. 1: for each season (clockwise from the top: summer, autumn, winter, spring, see right panel of Fig. 1), region, and precipitation intensity interval (low rain rates=inner part, mid rain rates=middle part, high rain rates=outer part), a colour indicates that the CORDEX and PRIMAVERA ensembles are significantly different, a "P" or "C"