Model intercomparison of COSMO 5.0 and IFS 45r1 at kilometer-scale grid spacing

The increase in computing power and recent model developments allow for the use of global kilometer-scale weather and climate models for routine forecasts. At these scales, deep convective processes can be partially resolved explicitly by the model dynamics. Next to horizontal resolution, other aspects such as the applied numerical methods, the use of the hydrostatic approximation, and time step size are factors that might influence a model's ability to resolve deep convective processes. In order to improve our understanding of the role of these factors, a model intercomparison between the nonhydrostatic COSMO model and the hydrostatic Integrated Forecasting System (IFS) from ECMWF has been conducted. Both models have been run with different spatial and temporal resolutions in order to simulate two summer days over Europe with strong convection. The results are analyzed with a focus on vertical wind speed and precipitation. Results show that even at around 3 km horizontal grid spacing the effect of the hydrostatic approximation seems to be negligible. However, time step proves to be an important factor for deep convective processes, with a reduced time step generally allowing for higher updraft velocities and thus more energy in vertical velocity spectra, in particular for shorter wavelengths. A shorter time step also causes an earlier onset and peak of the diurnal cycle. Furthermore, the amount of horizontal diffusion plays a crucial role for deep convection, with more diffusion generally leading to larger convective cells and higher precipitation intensities. The study also shows that for both models the parameterization of deep convection leads to lower updraft and precipitation intensities and biases in the diurnal cycle, with a precipitation peak which is too early.


Introduction
The earth's atmosphere is home to processes ranging from scales as large as the planet itself, such as the trade winds, down to scales of angstroms (10^−10 m), such as Rayleigh scattering of sunlight by an air molecule. Explicitly resolving all these processes in an atmospheric model is virtually impossible, even in the distant future. But the ever greater availability of computing power allows us to at least come closer by reducing the grid spacing in numerical weather prediction and climate models step by step (Schulthess et al., 2019; Neumann et al., 2019). One of the processes that can nowadays be resolved is deep convection: the rise of buoyant plumes, strong enough to break through temperature inversions and to reach as high as the tropopause. Given that there is enough moisture in the air, the plumes can form towering cumulonimbus clouds and cause heavy thunderstorms. On a larger scale, deep convection is an important process for the redistribution of heat, moisture, and momentum, with a subsequent large impact on the general circulation of the atmosphere (Houze and Betts, 1981; Held and Soden, 2006).
Atmospheric models with a grid spacing of around 4 km and smaller have been considered to at least partially resolve deep convection (Weisman et al., 1997; Romero et al., 2001; Done et al., 2004), while models with coarser resolutions have to rely on a parameterization of deep convection. One of the drawbacks of parameterized deep convection is a known phase error in the diurnal cycle of precipitation: the convection is too closely coupled to the phase of solar radiation and thus peaks too early (Yang and Slingo, 2001; Betts and Jakob, 2002; Guichard et al., 2004; Dai and Trenberth, 2004), even though more recent developments by Bechtold et al. (2014) have shown some improvements in this regard. Coarse models with parameterized deep convection also tend to overestimate precipitation frequency but underestimate precipitation intensity (Dai and Trenberth, 2004; Sun et al., 2006; Stephens et al., 2010). The explicit treatment of deep convection usually leads to a better representation of the diurnal cycle (Hohenegger et al., 2008; Dirmeyer et al., 2012; Ban et al., 2014; Pearson et al., 2014; Leutwyler et al., 2017), a better spatial representation of rainfall (Kendon et al., 2012; Prein et al., 2013), and more realistic hourly intensities of extreme precipitation events (Prein et al., 2013; Ban et al., 2014; Fosser et al., 2014; Kendon et al., 2019). More details about the benefits of kilometer-scale climate models can be found in several review articles (Prein et al., 2015; Schär et al., 2020).
Nonetheless, deep convection is not yet fully resolved with grid spacings of 1-4 km. To fully resolve deep convection, one would require a grid spacing of around 250 m or less (Bryan et al., 2003; Lebo and Morrison, 2015; Jeevanjee, 2017). But even though the structural convergence of updrafts and clouds is not yet reached at kilometer scale, many domain-averaged and integrated properties related to a large ensemble of convective cells (i.e. mean diurnal cycle, spatial distribution of precipitation, clouds, diabatic heating, convective transport of mass, heat and water vapour) have been shown to converge already at a grid spacing of around 4 km (Panosetti et al., 2018, 2019). This so-called bulk convergence makes the explicit treatment of deep convection at kilometer scale an attractive practice, as it can bring the aforementioned improvements without paying the huge computational costs associated with fully resolving deep convection. Recent work by Vergara-Temprado et al. (2020) has shown that the explicit treatment of deep convection may already be beneficial even at relatively coarse grid spacings of up to 25 km for selected metrics such as hourly precipitation statistics and the representation of the diurnal cycle over nonorographic regions.
Another subject that is often associated with weather and climate models at kilometer scale is the use of the hydrostatic approximation in the governing equations. The hydrostatic approximation assumes the vertical accelerations to be small compared to the buoyancy force. This is normally the case when the horizontal length scale of the flow is much larger than the vertical length scale. With the hydrostatic approximation, vertical velocity can be derived from the continuity equation and thus becomes a diagnostic variable. The resulting system of equations is simpler and usually computationally less expensive to solve, which makes it an attractive option for models as long as the hydrostatic approximation is still suitable. For example, the nonhydrostatic version of the Integrated Forecasting System (IFS) model from the European Centre for Medium-Range Weather Forecasts (ECMWF) is about 80% more expensive than the corresponding hydrostatic version at a grid spacing of around 9 km (Wedi et al., 2009).
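To illustrate what "diagnostic" means here, the following minimal sketch integrates the hydrostatic continuity equation in pressure coordinates, d(omega)/dp = -div, from the model top downward. It is not taken from either model: the function name, the boundary condition omega = 0 at a model top placed at p = 0, and the linear divergence profile are assumptions chosen only for illustration.

```python
import numpy as np

def diagnose_omega(divergence, pressure):
    """Diagnose pressure vertical velocity omega (Pa/s) on each level.

    Integrates d(omega)/dp = -divergence downward from the top level,
    assuming omega = 0 at the top (illustrative boundary condition).
    """
    divergence = np.asarray(divergence, dtype=float)
    pressure = np.asarray(pressure, dtype=float)
    layer_thickness = np.diff(pressure)
    # Trapezoidal rule: mean divergence of each layer times its thickness.
    layer_mean_div = 0.5 * (divergence[1:] + divergence[:-1])
    omega = np.zeros_like(pressure)
    omega[1:] = -np.cumsum(layer_mean_div * layer_thickness)
    return omega

# Hypothetical column: divergence aloft, convergence near the surface.
surface_pressure = 1.0e5                      # Pa (assumed)
pressure = np.linspace(0.0, surface_pressure, 11)
div_amplitude = 1.0e-5                        # 1/s (assumed)
divergence = div_amplitude * (1.0 - 2.0 * pressure / surface_pressure)

omega = diagnose_omega(divergence, pressure)
```

In this sketch the vertical motion follows entirely from the horizontal wind divergence at the same time level. No prognostic equation for vertical momentum is solved, which is the property discussed in the text.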
There is no real consensus in the scientific community about the horizontal resolution at which the hydrostatic approximation is no longer suitable. For example, Ross and Orlanski (1978) performed a two-dimensional simulation of an idealized cold front and found no big differences between the hydrostatic and the nonhydrostatic setup for a grid spacing of 20 km, while Orlanski (1981) found significant differences for a similar case with 8 km grid spacing. According to calculations by Daley (1988), models with a grid spacing of 25 km or smaller should already use the nonhydrostatic set of equations. But then again, Dudhia (1993) found only small differences between the hydrostatic and the nonhydrostatic solution for a cold front with a grid spacing of 6.5 km. Kato and Saito (1995) performed idealized moist convection simulations with grid spacings of 20 km, 10 km, and 5 km and found that the hydrostatic model without parameterized deep convection overdevelops updrafts and overestimates convective precipitation amount and area. These results were later confirmed for a real-world case by Kato (1996). Kato (1997) recommends the use of a moist convective parameterization (e.g. Manabe et al., 1965) when using a hydrostatic model with around 10 km grid spacing and the use of a nonhydrostatic model for a grid spacing of 5 km. Recent global real-world simulations with the hydrostatic IFS at a grid spacing of 1.45 km by Dueben et al. (2020) and Wedi et al. (2021) produced realistic results and did not show deficiencies that could be directly attributed to the invalidity of the hydrostatic assumption at this resolution.
Several studies have looked primarily at the vertical velocities of hydrostatic and nonhydrostatic models at different resolutions. A perhaps counter-intuitive behavior of the hydrostatic regime is the development of too high vertical wind velocities at resolutions where the hydrostatic assumption is no longer valid. This is due to the fact that the vertical wind velocity is directly diagnosed from the horizontal velocities and there is no nonhydrostatic process limiting the vertical mass flux. Simulations of a squall line with horizontal grid spacings ranging from 20 km to 1 km by Weisman et al. (1997) showed the hydrostatic model overestimating the maximum vertical velocity at grid spacings of 4 km and lower. Jeevanjee (2017) found an overestimation of vertical velocities of the hydrostatic model at grid spacings smaller than 2 km in radiative-convective-equilibrium simulations over sea with grid spacings ranging from 16 km to 0.0625 km. Dueben et al. (2020) performed global simulations with IFS using the hydrostatic and nonhydrostatic equations at 1.45 km grid spacing, where the updraft velocities were quite similar when using a timestep of 30 s.
Not only the system of equations, but also the applied numerical methods are important when it comes to understanding the model behavior. The two models used for this study are very different in this regard: while the hydrostatic IFS model is a spectral model with a semi-Lagrangian semi-implicit scheme, the nonhydrostatic COSMO model is an Eulerian model with a split explicit scheme in the horizontal and an implicit scheme in the vertical dimension. These differences in design have direct implications on the conditions for numerical stability and the associated timestep of the models. Thanks to the semi-Lagrangian treatment of advection, the timestep in IFS is not limited by the Courant-Friedrichs-Lewy (CFL) condition (Courant et al., 1928) but by the Lipschitz condition.
The Lipschitz condition requires the timestep to be smaller than the reciprocal of the absolute maximum value of the wind shear in each direction (Pudykiewicz et al., 1985; Staniforth and Côté, 1991). It ensures that the trajectories do not intersect each other (Smolarkiewicz and Pudykiewicz, 1992) and is less restrictive regarding timestep than the CFL condition, allowing an atmospheric model to remain stable and deliver accurate results even with CFL numbers higher than 4 (Staniforth and Côté, 1991). This implies that IFS can be run with a rather long timestep and still remain stable, even though the CFL condition might be violated at some locations with high wind speed. In contrast, COSMO uses an Eulerian explicit approach for horizontal advection and thus the timestep has to be small enough to not violate the horizontal CFL condition at any location in order to guarantee stability. Evidence that the semi-Lagrangian semi-implicit scheme not only provides numerical stability for such high CFL numbers, but also produces reasonable results in real-world scenarios is provided by the day-to-day forecasts of ECMWF, which use a competitive timestep of 450 s at 9 km grid spacing with IFS.
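The difference between the two stability criteria can be made concrete with a small calculation. The sketch below uses the operational IFS values quoted above (450 s at 9 km grid spacing). The jet-level wind speed of 100 m/s and the wind shear of 1e-3 1/s are hypothetical values chosen for illustration, and a CFL number of 1 is used only as a nominal Eulerian limit, since the actual limit depends on the time integration and advection schemes.

```python
def cfl_number(wind_speed, timestep, grid_spacing):
    """Advective CFL (Courant) number u * dt / dx."""
    return wind_speed * timestep / grid_spacing

def lipschitz_number(wind_shear, timestep):
    """Lipschitz number |du/dx| * dt; it must stay below 1."""
    return wind_shear * timestep

timestep = 450.0        # s, operational IFS timestep quoted in the text
grid_spacing = 9000.0   # m, operational IFS grid spacing quoted in the text
wind_speed = 100.0      # m/s, assumed jet-level wind (illustrative)
wind_shear = 1.0e-3     # 1/s, assumed maximum horizontal shear (illustrative)

cfl = cfl_number(wind_speed, timestep, grid_spacing)
lipschitz = lipschitz_number(wind_shear, timestep)

# Largest timestep that keeps the CFL number at or below 1 for this wind.
max_timestep_cfl_one = grid_spacing / wind_speed

print(f"CFL number:       {cfl:.2f}")
print(f"Lipschitz number: {lipschitz:.2f}")
print(f"Timestep for CFL = 1: {max_timestep_cfl_one:.0f} s")
```

With these assumed values the CFL number is 5 while the Lipschitz number stays well below 1, so the semi-Lagrangian criterion is satisfied even though an explicit Eulerian scheme limited to a CFL number of 1 would need a timestep of 90 s or less.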
Compared to the many studies addressing spatial resolution in atmospheric models, the sensitivity to temporal resolution has received relatively little attention. Several studies identified timestep as a very important factor when it comes to precipitation patterns (Williamson and Olson, 2003), precipitation intensity (Mishra and Sahany, 2011), or tropospheric circulation. However, these studies were carried out with relatively coarse resolution (∆x > 100 km) with parameterized deep convection and are therefore hardly comparable to the resolutions used in this study. Fuhrer et al. (2018) recommend a timestep smaller than 40-60 s at around 1 to 2 km grid spacing. The global simulations with IFS from Dueben et al. (2020) with a 1.45 km grid spacing showed improvements in the representation of deep convective processes for both the hydrostatic and the nonhydrostatic version when reducing the timestep from 120 s to 30 s. It is probably difficult to give a generally valid recommendation for timestep size, as many different processes are affected by it. Next to the dynamics, the parameterization of subgrid-scale processes, its call frequency, and the type of coupling to the model dynamics (see for example Ubbiali et al., 2021) also have to be considered. For instance, Barrett et al. (2019) performed idealized simulations using COSMO with 1 km grid spacing and found a 53% reduction in precipitation with a two-moment microphysics scheme when the timestep was increased from 1 s to 15 s. These changes were attributed to the timestep dependence of the amount of supersaturation with respect to liquid in strong updrafts and the corresponding sensitivity of the cloud microphysics parameterization to this value in combination with the sequential-update splitting coupling. In the current study we use bulk microphysics schemes, which have a much smaller sensitivity with respect to the timestep.

Deep convection is a dynamical process that often happens very locally, involving only a few grid points in kilometer-scale models. The dynamics and concentration of moist variables at such scales are largely affected by diffusion. Diffusion may serve many purposes, such as eliminating numerical noise, increasing model stability, absorbing vertically propagating gravity waves at the model top, or also emulating cumulative effects of unresolved subgrid-scale processes (see Jablonowski and Williamson, 2011, for an overview of diffusion). Next to implicit diffusion, which is inherently caused by the numerical methods, most models apply some form of explicit diffusion. A significant amount of diffusion is also caused by subgrid-scale parameterizations and orography filtering. All these aspects can lead to very different model behavior in terms of dissipation, which then might again influence deep convection. Ricard et al. (2013) conducted a case study over southwest France with the nonhydrostatic limited-area model AROME (Météo-France), which uses the same dynamics as the nonhydrostatic version of IFS, at 2.5 km grid spacing in order to determine the influence of horizontal diffusion on convective cells. They compared the results to the Eulerian research model Meso-NH (Lafore et al., 1998) and found that AROME develops larger convective cells than Meso-NH, with a tendency of the cells to structure into circular patterns with too strong outflow at the surface induced by cooling from precipitation evaporation, especially with additional explicit diffusion. In order to prevent too high precipitation intensities and unrealistic divergent winds at the edges of the cold outflow, Malardel and Ricard (2015) introduced a correction of the interpolation weights of the semi-Lagrangian scheme for IFS, AROME, and HARMONIE (HIRLAM consortium), which all use the same dynamics.
These new interpolation weights improve the conservation property of the scheme, effectively increasing diffusion in the convergent parts of the flow and reducing diffusion in the divergent parts of the flow. They performed idealized and real-case experiments of convective systems, in which the new interpolation weights led to a significant reduction of extreme precipitation. While the experiments shown in the paper were performed with the nonhydrostatic version of the dynamics of IFS, the operational hydrostatic version showed the same improvements.
In order to improve our understanding of the role of some of the aforementioned factors in the representation of deep convection, we here present a model intercomparison between COSMO and IFS, addressing the following key questions:
1. What are the main differences between COSMO and IFS in the representation of deep convective precipitation? What do the precipitation patterns, precipitation intensities, and the diurnal cycle of precipitation look like at different resolutions,
It has to be mentioned that comparing two so fundamentally different and complex models in simulating a real-world case makes it intriguing but also very difficult to confidently attribute any disparities in the examined results to a specific process or its associated handling in the model. Nevertheless, some assumptions can be made based on findings in previous studies and our knowledge of the different model properties. We would also like to emphasize that this study is not intended to be a performance comparison of the two models. The models have not been specifically tuned for the respective resolution setups and, with only two days of hourly data, the sample size is too small to draw any conclusions regarding the general forecast quality. Differences in the results are expected, as the specific weather situation under consideration was difficult to predict and the general setup of the models is different. In particular, IFS is initialized from its own analysis and then run globally, while COSMO is driven by the IFS operational analysis. So the aim of the study is mainly to show differences and similarities of these two very distinct modelling approaches with different configurations in their treatment of deep convective processes, and through this extend our knowledge on this subject.

COSMO
The COSMO model (Baldauf et al., 2011) was originally developed for numerical weather prediction, but has been extended to also run in climate mode (Rockel et al., 2008).
COSMO is a regional model and operates on a grid with rotated latitude-longitude coordinates. It uses a split explicit third-order Runge-Kutta discretization (Wicker and Skamarock, 2002) in combination with a fifth-order upwind scheme for horizontal advection, and an implicit Crank-Nicolson scheme for vertical advection. Parameterizations used in this version include subgrid-scale orography (SSO) by Lott and Miller (1997), a radiation scheme based on the δ-two-stream approach (Ritter and Geleyn, 1992), a single-moment cloud microphysics scheme (Reinhardt and Seifert, 2006), a turbulent kinetic energy based parameterization for the planetary boundary layer (Raschendorfer, 2001), an adapted version of the convection scheme by Tiedtke (1989), and a multi-layer soil model with a representation of groundwater. Explicit horizontal diffusion is applied by using a monotonic 4th-order linear scheme acting on model levels for wind, temperature, pressure, specific humidity, and cloud water content (Doms and Baldauf, 2018). An orographic limiter helps avoid excessive vertical mixing around mountains.
For the standard experiments in this paper, the explicit diffusion from the monotonic 4th-order linear scheme is set to zero.
However, we apply Smagorinsky diffusion (Smagorinsky, 1963) to the horizontal wind components for all experiments in order to enhance the numerical stability of the scheme in the presence of horizontal shear instabilities. For this project we use a refactored version of COSMO 5.0, which is able to run on hybrid GPU-CPU architectures (Fuhrer et al., 2014). The model extension was a joint effort between MeteoSwiss, the ETH-based Center for Climate Systems Modeling (C2SM), and the Swiss National Supercomputing Center (CSCS). It has been used for climate studies with 2.2 km grid spacing over Europe (Ban et al., 2014; Leutwyler et al., 2017) and is also capable of running on a near-global domain at this resolution (Fuhrer et al., 2018).
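The combination of a third-order Runge-Kutta time integration with fifth-order upwind advection mentioned above can be sketched for one-dimensional linear advection on a periodic grid. This is an illustrative toy problem, not COSMO code: it assumes a constant positive wind, omits the fast-wave (split explicit) substeps and the implicit vertical scheme, and all names and grid parameters are hypothetical.

```python
import numpy as np

def advection_tendency(q, wind, dx):
    """Flux-form tendency -d(wind * q)/dx with a fifth-order upwind flux.

    Valid for constant wind > 0 on a periodic grid (assumption of this sketch).
    """
    # Flux at the right cell face i+1/2, biased toward the upwind side.
    flux = wind / 60.0 * (
        2.0 * np.roll(q, 2) - 13.0 * np.roll(q, 1) + 47.0 * q
        + 27.0 * np.roll(q, -1) - 3.0 * np.roll(q, -2)
    )
    # Difference of the right-face and left-face fluxes.
    return -(flux - np.roll(flux, 1)) / dx

def rk3_step(q, wind, dx, dt):
    """Three-stage Runge-Kutta step in the form of Wicker and Skamarock (2002)."""
    q_stage1 = q + dt / 3.0 * advection_tendency(q, wind, dx)
    q_stage2 = q + dt / 2.0 * advection_tendency(q_stage1, wind, dx)
    return q + dt * advection_tendency(q_stage2, wind, dx)

n_points = 100
dx = 1.0
wind = 1.0
dt = 0.5                                  # Courant number 0.5
x = np.arange(n_points) * dx
q_initial = np.exp(-0.5 * ((x - 50.0) / 8.0) ** 2)

q = q_initial.copy()
n_steps = 200                             # one full revolution of the domain
for _ in range(n_steps):
    q = rk3_step(q, wind, dx, dt)

max_error = np.abs(q - q_initial).max()
mass_change = abs(q.sum() - q_initial.sum())
```

Because the scheme is explicit and Eulerian, the Courant number in this sketch has to stay below the stability limit of the scheme, which is the timestep restriction discussed in the introduction.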

IFS
The Integrated Forecasting System (IFS) is the model used by the European Centre for Medium-Range Weather Forecasts (ECMWF) for its daily data assimilation and subsequent global forecasts. It is a hydrostatic model but can also be run using a nonhydrostatic extension which was originally developed for the ARPEGE/Aladin models (Bubnová et al., 1995; Bénard et al., 2010). IFS is a spectral transform model where temperature, wind, and surface pressure are represented in spectral space with spherical harmonics basis functions, transformed at every timestep to a corresponding grid point space on a cubic-octahedral reduced Gaussian grid (Wedi, 2014). Notably, all water substance variables only exist in grid point space. Semi-Lagrangian advection, physical parameterizations, and nonlinear terms are calculated in grid point space. The horizontal gradients and the Laplacian operator for horizontal wave propagation are then efficiently calculated in spectral space. The transformation between grid point space and spectral space is done by a Fast Fourier Transformation (FFT) in longitude and a (Fast) Legendre Transformation (FLT) in latitude. The spectral transforms do not scale linearly with the number of grid points and also require global communications, which means that at very high resolution the spectral transforms become a computational bottleneck of the model (Wedi et al., 2013; Schär et al., 2020). One of the reasons why the global IFS is still mostly run in hydrostatic mode is that the nonhydrostatic version uses a predictor-corrector approach that leads to more spectral transforms per timestep and is therefore substantially slower. Besides the differences in the timestepping scheme, the nonhydrostatic spectral version of IFS also uses a different method for the vertical discretization than the hydrostatic IFS (finite differences instead of finite elements).
Recently, a nonhydrostatic core for IFS based on a finite-volume discretization (IFS-FVM) has been developed (Kühnlein et al., 2019). IFS-FVM does not require spectral transforms and achieves a higher strong scaling computational efficiency compared to the spectral model at higher resolutions. Nevertheless, the hydrostatic spectral model version of IFS is still very competitive in terms of time-to-solution even at kilometer scale (e.g. Kühnlein et al., 2019; Schulthess et al., 2019; Dueben et al., 2020; Wedi et al., 2021). While it would certainly be interesting to include all three different IFS model versions into this intercomparison, the differences between them are substantial and we consider an intercomparison among them beyond the scope of this work. A detailed comparison of the spectral hydrostatic IFS with the spectral nonhydrostatic IFS can be found in Dueben et al. (2020), who performed global simulations at 1.45 km grid spacing with these model versions in different configurations. In the present study, we have decided to focus only on the operational spectral hydrostatic model version of IFS. Therefore, when we use the term IFS in this paper, we generally refer to the operational spectral model. IFS uses an adapted version of the convection scheme by Tiedtke (1989) with improvements regarding tropical variability (Bechtold et al., 2008) and diurnal cycle (Bechtold et al., 2014). Other parameterizations include a Monte-Carlo Independent Column Approximation (McICA) for radiation (Barker et al., 2008; Hogan et al., 2017) and the land surface hydrology scheme HTESSEL (Balsamo et al., 2011). IFS applies explicit diffusion to the prognostic variables in spectral space (temperature, wind, surface pressure) with an operator that mimics spectral viscosity after Gelb and Gleeson (2001).
Furthermore, some diffusion comes implicitly from the interpolation required by the semi-Lagrangian scheme in grid point space (notably for the water variables and tracers), as well as from the spectral truncation due to the transformation from grid point space to spectral space, which acts like a 4∆x spectral filter in case of a cubic grid. The simulations for this project were performed with the global atmospheric model of the IFS based on Science Version 45r1. IFS documentation of the different model cycles can be found on the ECMWF website (https://www.ecmwf.int/en/publications/ifs-documentation, last access: 1 February 2021).
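Why horizontal gradients are cheap to compute in spectral space can be illustrated with the longitudinal part of the transform. The sketch below is not IFS code: it covers only a single latitude circle with equally spaced points, uses NumPy's FFT, and leaves out the Legendre transform in latitude and the reduced Gaussian grid. The function name and test field are hypothetical.

```python
import numpy as np

def zonal_derivative_spectral(field):
    """Derivative with respect to longitude (radians) on a periodic circle.

    Transforms to Fourier space, multiplies each coefficient by i * m
    (m = zonal wavenumber), and transforms back to grid point space.
    """
    n_lon = field.size
    # With sample spacing 1/n_lon, fftfreq returns integer wavenumbers.
    wavenumbers = np.fft.fftfreq(n_lon, d=1.0 / n_lon)
    coefficients = np.fft.fft(field)
    return np.real(np.fft.ifft(1j * wavenumbers * coefficients))

n_lon = 64
longitude = 2.0 * np.pi * np.arange(n_lon) / n_lon
field = np.sin(3.0 * longitude)           # hypothetical wavenumber-3 field

derivative = zonal_derivative_spectral(field)
exact = 3.0 * np.cos(3.0 * longitude)
max_error = np.abs(derivative - exact).max()
```

For a field that is fully resolved by the retained wavenumbers, the derivative is exact up to rounding error, whereas a finite-difference stencil in grid point space would introduce a truncation error.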

Model intercomparison
The simulations cover two days from 29 May 2018 00:00 UTC to 30 May 2018 with heavy thunderstorms over Europe. Both models use one day lead time (28 May) and are initialized with ECMWF operational analysis data at a horizontal grid spacing of ∼ 9 km. Since COSMO uses a different soil model, the soil in COSMO was initialized with an average value from May/June after a 5-year spinup with COSMO 12 km by Vergara-Temprado et al. (2021). IFS is run globally,   Table 1.

Horizontal diffusion experiment
For this experiment, COSMO has been run for the same case as above, but with a varying amount of explicit horizontal diffusion. This will give us some idea about the influence of horizontal diffusion on the model results and might explain some characteristic differences between IFS and COSMO. In COSMO, 4th-order diffusion is applied by introducing an additional operator on the right-hand side of the prognostic equation

$$\frac{\partial \psi}{\partial t} = S(\psi) - \alpha_4 \nabla^4 \psi,$$

where $\psi$ is the prognostic variable and $S$ represents all physical and dynamical source terms for $\psi$. The prognostic variables on which horizontal diffusion is applied are wind, temperature, pressure, specific humidity, and cloud water content. The default diffusion coefficient $\alpha_4$ depends on the horizontal and temporal resolution such that $\alpha_4 = (\Delta x/2)^4/\Delta t$. This coefficient can be multiplied with a factor, which we will hereafter call diff, in order to apply more or less smoothing to the mentioned variables. A value of diff = 1 means that the diffusion coefficient remains unchanged and corresponds to the default value $\alpha_4$. Any value of diff smaller than one decreases the explicit diffusion coefficient and any value larger than one increases the explicit 4th-order diffusion strength. In our default setup for the intercomparison, COSMO has been run with no explicit 4th-order linear horizontal diffusion, which means diff was set to zero. For this experiment, the 2.2 km setup with a timestep of $\Delta t = 20$ s has been used, but with values of diff ranging from 0 to 4 with an increment of 0.5.
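As a worked example of the numbers involved, the following sketch evaluates the default coefficient (∆x/2)^4/∆t for the 2.2 km, 20 s setup and lists the scaled coefficients for the diff values used in the experiment. Treating 2.2 km as an exact grid spacing of 2200 m is an assumption, and the function and variable names are hypothetical.

```python
def default_diffusion_coefficient(grid_spacing, timestep):
    """Default 4th-order coefficient (dx / 2)**4 / dt in m^4 / s."""
    return (grid_spacing / 2.0) ** 4 / timestep

grid_spacing = 2200.0   # m, assuming the 2.2 km setup is exactly 2200 m
timestep = 20.0         # s

alpha_4 = default_diffusion_coefficient(grid_spacing, timestep)

# Scaling factors diff = 0, 0.5, ..., 4 as used in the diffusion experiment.
diff_factors = [0.5 * step for step in range(9)]
scaled_coefficients = {diff: diff * alpha_4 for diff in diff_factors}

for diff, coefficient in scaled_coefficients.items():
    print(f"diff = {diff:3.1f}  ->  coefficient = {coefficient:.4e} m^4/s")
```

Under these assumptions the default coefficient is about 7.3e10 m^4/s, and the experiment spans coefficients from zero (the default intercomparison setup) to four times that value.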

Observations
Three datasets are used for the evaluation of the model results: IMERG, RADKLIM, and IDAWEB. Comparing model results with observational data is a difficult undertaking. Next to the differences in spatial sampling (i.e. point measurement vs. grid cell averages), observations also suffer from several deficiencies (see below) and therefore different observational datasets often provide substantially different results, which is also the case in this study. Thus, observations should only be taken as a point of reference and not the absolute truth.

IMERG
The Integrated Multi-satellitE Retrievals for GPM (IMERG) dataset (Huffman et al., 2019b) provides worldwide, half-hourly precipitation data on a 0.1° × 0.1° grid by using a set of algorithms to combine satellite data and rain gauge observations into one product (Huffman et al., 2019a). IMERG incorporates satellite data from as many satellites as possible, i.e. not only the ones under the direction of the Global Precipitation Measurement (GPM) mission, in a flexible framework. The satellite data consist of passive microwave (PMW) estimates from various low-Earth-orbit platforms and infrared (IR) estimates from geosynchronous-Earth-orbit satellites, as well as active radar data from the GPM satellites. The rain gauge data stem from the Global Precipitation Climatology Centre (GPCC), which is operated by the German Weather Service (DWD, Deutscher Wetterdienst). The specific product that has been used by IMERG for the time period of this case study is the GPCC Monitoring Product V6 (Schneider et al., 2018). This product is based on monthly SYNOP and CLIMAT data from 7000-9000 rain gauges worldwide. IMERG adjusts the accumulated monthly precipitation totals from GPCC with a gauge correction algorithm by Legates and Willmott (1990) and then calibrates the gridded multi-satellite estimate with these values. For this study, the Final version of IMERG has been used and the half-hourly values were added up to hourly values in order to be consistent with the model output frequency.
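The aggregation from half-hourly to hourly values can be sketched as follows. This is a generic illustration and not the processing code used for the study: it assumes that the input array holds half-hourly accumulations in mm with time as the first axis and that the series starts at a full hour. If the input were precipitation rates in mm/h instead, the two half-hourly values would have to be averaged rather than summed. The function name and sample data are hypothetical.

```python
import numpy as np

def half_hourly_to_hourly(half_hourly_precip):
    """Sum consecutive pairs of half-hourly accumulations along axis 0."""
    data = np.asarray(half_hourly_precip, dtype=float)
    n_steps = data.shape[0]
    if n_steps % 2 != 0:
        raise ValueError("Need an even number of half-hourly time steps.")
    # Group the time axis into (hour, half) and sum over the two halves.
    paired = data.reshape(n_steps // 2, 2, *data.shape[1:])
    return paired.sum(axis=1)

# Hypothetical sample: 4 half-hourly fields on a 2 x 2 grid, values in mm.
sample = np.arange(4.0).reshape(4, 1, 1) * np.ones((1, 2, 2))
hourly = half_hourly_to_hourly(sample)
```

The hourly fields then cover the same (hh-1):00 to hh:00 intervals as the hourly model output.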

RADKLIM
RADKLIM (Radarklimatologie) is a radar-derived and gauge-adjusted precipitation product from the German Weather Service (DWD, Deutscher Wetterdienst) that is provided on a 1100 × 900 grid over Germany with 1 km grid spacing (Winterrath et al., 2017). It uses measurements from 17 C-band weather radars (for the evaluated period only 16 radars have been in use) and approximately 2000 rain gauge stations. The method is based on the disaggregation of daily precipitation estimates from rain gauges into hourly values using radar-based estimates (Paulat et al., 2008; Wüest et al., 2010). The specific product that has been used for this study is the RW product, which uses the weighted mean of two different gauge calibration methods, from the reanalysis version 2017.002. RADKLIM delivers hourly accumulated precipitation values for the hour from (hh-1):50 to hh:50. This represents a slight shift compared to the model and averaged IMERG outputs, which are available for the (hh-1):00 to hh:00 intervals. This 10-minute time shift is neglected in this study. RADKLIM works very similarly to RADOLAN (Bartels et al., 2004), but unlike RADOLAN it is not a real-time product. RADKLIM includes more rain gauge stations for the calibration (∼ 2000 compared to ∼ 1300) and also possesses a more sophisticated radar artefact correction process. Therefore, RADKLIM should deliver more accurate values than RADOLAN.
Radar-based estimates of rainfall allow for a high resolution in space and time, but they are also associated with some uncertainties. Sources of errors include cluttering from other objects, attenuation, variability of the relation between reflectivity and rainfall rate (Z-R relation), beam blockage, range degradation, vertical variability of the precipitation system (i.e. bright band), and vertical air motions that increase or decrease raindrop fall speed (Villarini and Krajewski, 2010). RADKLIM uses a sophisticated radar artefact correction process which, together with the rain gauge calibration, should minimize the uncertainty due to such artefacts. Nevertheless, some artefacts might still exist, which has to be kept in mind when using the data.
An intercomparison between RADOLAN, which is very similar to RADKLIM, and IMERG can be found in Ramsauer et al. (2018), who found total precipitation in IMERG to be generally higher than in RADOLAN but lower in mountainous regions. This underestimation of precipitation by IMERG at higher altitudes, in contrast to the general overestimation in flatter terrain, was also found by Wang et al. (2019), who evaluated IMERG with a dense rain gauge station network in Lhasa. They also showed that the performance of IMERG overall decreases with increasing elevation.

IDAWEB
IDAWEB is a web-portal operated by MeteoSwiss which provides hourly precipitation measurements from roughly 1000 rain gauges from different institutions in Switzerland. Unlike IMERG or RADKLIM, it is not a gridded dataset. But due to its provides decent coverage at high altitude in the Alpine region. IDAWEB also incorporates rain gauge measurements from a few stations outside of Switzerland, but as most of them are quite isolated, these stations were ignored for this study.

Like satellite-based or radar-based products, rain gauge observations also involve uncertainties. They suffer from various errors such as evaporation, splashing, and most importantly wind effects, which usually result in a low bias. The mean undercatch for Switzerland in summer is estimated to be 7%, with exposed stations having roughly twice the bias of well-protected sites (Sevruk, 1985; Richter, 1995). A good overview of errors and error correction can be found in Sevruk (2005). While IMERG uses a gauge correction algorithm by Legates and Willmott (1990), no comparable gauge correction algorithm is applied for RADKLIM or the IDAWEB station data. However, the undercatch for heavy summer precipitation is assumed to be rather small, and incorporating such possible observation errors into this analysis would be beyond the scope of this study.

Model intercomparison
3.1.1 Precipitation pattern
We start by showing an example of the spatial precipitation distribution. Figure 1 shows accumulated hourly precipitation between 17:00 and 18:00 UTC on 29 May 2018 from different model runs and the multi-satellite product IMERG. While the location of precipitation is generally similar to the observations, there are distinct differences visible between COSMO and IFS. Most obvious is the larger amount of light precipitation in IFS compared to COSMO. Additionally, the cell structure in COSMO is more fine-grained than in IFS. These two model characteristics hold true throughout all resolutions and can also be seen in Fig. 2, which shows the precipitation patterns at the same time for the RADKLIM domain. Both runs with parameterized deep convection clearly produce less heavy precipitation than the ones with explicit deep convection. Looking at Fig. 1, one could argue that IFS 9 km (450 s) without deep convection parameterization is the closest to IMERG with regard to cluster size and location. However, this is a momentary snapshot, and while the IFS 9 km operational configuration is well tuned, there is evidence of shortcomings and ongoing work to improve cluster size and location in the IFS model with deep convection parameterization. For the RADKLIM domain in Fig. 2, COSMO (PAR) produces much less precipitation, which is partially due to a shift in timing (precipitation falls too early in the course of the day), but also due to a generally strong underestimation of precipitation over Germany with this setup, especially for the first day (see Fig. 1 and also Table 2).
When looking at the observations only, RADKLIM and IMERG agree well on the location of the precipitation. There are, however, visible differences in intensity and spatial extent. While some of these differences might come from the different measurement and processing techniques, differences will also be caused by the much higher spatial resolution of RADKLIM.

Precipitation intensity
The cumulative frequencies of hourly precipitation within the European domain for all 48 hours are depicted in Fig. 3.
(a) It could be that the original timestep in IFS is too large to properly represent deep convective processes associated with such high vertical wind velocities (see also Sect. 3.1.5). For instance, with a 120 s timestep (IFS 2.9 km) and assuming a mid-tropospheric vertical wind speed of 20 m s$^{-1}$, an air parcel would traverse the troposphere in merely 3-4 timesteps. The large timestep thus implies inaccuracies, as the air parcel's trajectory and its forcing by diabatic heating cannot be fully accounted for.

So while the semi-Lagrangian scheme prevents the model from developing instabilities, the large timestep will likely affect the convective mesoscale dynamics, truncate extreme updrafts, and thus allow fewer heavy precipitation events. As COSMO does not show such a timestep sensitivity, one could argue that the timestep in the COSMO simulations is already small enough and thus has no significant effect on convective mesoscale dynamics. However, COSMO also develops higher vertical velocities with a smaller timestep (see Sect. 3.1.5), even though this is a bit less pronounced than in IFS. So it is hard to imagine that this truncation of vertical updrafts is the only reason for the strong timestep dependence of precipitation in IFS.
(b) By halving the timestep, any subgrid-scale parameterization scheme will be called twice as often, which may affect precipitation. Notably, parameterizations such as cloud microphysics, shallow convection, or vertical mixing could experience timestep sensitivity, which could affect convective processes. Barrett et al. (2019) show a strong timestep sensitivity of total precipitation in an idealized setup with COSMO in combination with a two-moment microphysics scheme (see above). For our simulations, COSMO and IFS both use a single-moment scheme, which shows little timestep sensitivity of total precipitation but some sensitivity regarding precipitation location and magnitude in Barrett et al. (2019). While we do not see an impact in COSMO, timestep sensitivity of subgrid-scale parameterizations could affect IFS, where the absolute differences in timestep size are generally larger.
(c) One possibility that has been investigated is the sensitivity of the interpolation error in the semi-Lagrangian scheme to the timestep. In semi-Lagrangian schemes, the accumulation of errors is also a function of the timestep and the error of the spatial interpolation procedure (Bonaventura, 2004). Halving the timestep leads to twice as many interpolations, which can potentially increase the total interpolation error, lead to more damping, and thus increase the amount of implicit diffusion. In Sect. 3.2, it is shown that more diffusion leads to more heavy precipitation events, which could thus explain the timestep sensitivity in IFS. However, we do not see increased damping in IFS with a smaller timestep. This becomes obvious when looking at the
these configurations seems to produce unreasonable values. In general, the distributions of the runs with higher resolutions are closer to the observations.
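The mechanism behind hypothesis (c) can be illustrated with a toy problem. The following is a minimal sketch, not the IFS scheme: it advects a short sine wave on a periodic 1D grid with a semi-Lagrangian step that uses linear interpolation (an assumption made for brevity; operational schemes typically use higher-order interpolation). The grid size, wavelength, and Courant numbers are hypothetical values chosen for illustration. The same total displacement is covered once with a full timestep and once with a halved timestep, so the second run performs twice as many interpolations.

```python
import math


def advect_semi_lagrangian(field, courant, n_steps):
    """Advect a periodic 1D field with linear-interpolation semi-Lagrangian steps.

    `courant` is the displacement per step in grid cells (it may exceed 1).
    """
    n = len(field)
    whole = math.floor(courant)  # whole grid cells travelled per step
    alpha = courant - whole      # fractional part that requires interpolation
    for _ in range(n_steps):
        # The departure point of cell i lies between cells i-whole-1 and i-whole.
        field = [
            (1.0 - alpha) * field[(i - whole) % n]
            + alpha * field[(i - whole - 1) % n]
            for i in range(n)
        ]
    return field


def amplitude(field):
    """Amplitude of a sampled sine wave, estimated from its RMS value."""
    mean_square = sum(v * v for v in field) / len(field)
    return math.sqrt(2.0 * mean_square)


n_points = 64
wavelength = 8  # in grid cells; short waves are damped most by interpolation
wave = [math.sin(2.0 * math.pi * i / wavelength) for i in range(n_points)]

# Same total displacement (20 grid cells) with a full and a halved timestep.
amp_full = amplitude(advect_semi_lagrangian(wave, courant=2.5, n_steps=8))
amp_half = amplitude(advect_semi_lagrangian(wave, courant=1.25, n_steps=16))

print(f"remaining amplitude, full timestep:   {amp_full:.3f}")
print(f"remaining amplitude, halved timestep: {amp_half:.3f}")
```

In this idealized setting the halved timestep leaves less of the initial amplitude, which is the implicit-diffusion effect the hypothesis relies on. As stated above, no such increase in damping is seen in the IFS runs themselves.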

Diurnal cycle over land
Several studies have already shown that parameterized deep convection leads to a premature diurnal cycle in COSMO (Hohenegger et al., 2008; Ban et al., 2014; Leutwyler et al., 2017; Vergara-Temprado et al., 2020). Figure 5 shows that this also applies in this study. Compared to COSMO, the phase of IFS with parameterized deep convection is shifted a bit towards the later hours, which could be a result of the improvements from Bechtold et al. (2014) to the parameterization scheme by Tiedtke (1989). But still, both IFS and COSMO show a significant phase shift with parameterized deep convection. Convective precipitation lasts longer in IFS than in COSMO for all configurations. Compared to the observations from IMERG, convective precipitation in IFS lasts too long for both days, whereas in COSMO it seems to be about right for the first day but too short for the second day. A very interesting aspect is the dependence of the diurnal cycle on spatial and temporal resolution. All runs, except the ones that already use a rather small timestep (COSMO 4.4 and 2.2), show an earlier development and also decay of
550 m. In their studies, the peak of the diurnal cycle then shifts back, again more towards the evening, with 1.1 km and 550 m.
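To make the notion of a phase shift in the diurnal cycle concrete, the sketch below averages an hourly series by hour of day and reports the hour of the precipitation peak. It is only an illustration: the function names are hypothetical, the series is assumed to start at 00 UTC and to contain whole days, and the two triangular test series are synthetic rather than model output.

```python
def mean_diurnal_cycle(hourly_values):
    """Average a series of hourly values by hour of day (series starts at hour 0)."""
    if len(hourly_values) % 24 != 0:
        raise ValueError("expected a whole number of days of hourly values")
    n_days = len(hourly_values) // 24
    return [
        sum(hourly_values[day * 24 + hour] for day in range(n_days)) / n_days
        for hour in range(24)
    ]


def peak_hour(cycle):
    """Hour of day at which the mean diurnal cycle reaches its maximum."""
    return max(range(len(cycle)), key=lambda hour: cycle[hour])


def synthetic_series(peak, n_days=2):
    """Synthetic triangular diurnal signal peaking at `peak` (illustration only)."""
    one_day = [max(0.0, 1.0 - abs(hour - peak) / 6.0) for hour in range(24)]
    return one_day * n_days


# Two hypothetical 48-hour series with an afternoon and a midday peak.
later_cycle = mean_diurnal_cycle(synthetic_series(peak=16))
earlier_cycle = mean_diurnal_cycle(synthetic_series(peak=12))

shift = peak_hour(later_cycle) - peak_hour(earlier_cycle)
print(f"peak hours: {peak_hour(later_cycle)} and {peak_hour(earlier_cycle)}")
print(f"phase shift: {shift} h")
```

With only two simulated days, such a mean cycle is an average over two samples per hour, so small differences in peak hour should be read with care.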

It is possible that, by increasing the resolution even further, a similar effect could be seen in this case. The shift of the diurnal cycle with higher temporal resolution for the explicit COSMO 12 km is also consistent with the results from Panosetti et al.

Total precipitation
Total precipitation during the two days has been analyzed for four domains: The whole European domain, the land part of the European domain, the RADKLIM domain, and the IDAWEB stations. The results are summarized in Table 2.
For the whole European domain, all COSMO runs show clearly less precipitation than IMERG. And while all COSMO simulations with explicit deep convection produce about the same amount of precipitation, the one with parameterized deep convection is clearly an outlier with even less precipitation. The IFS runs with explicit deep convection show about the same amount of precipitation as IMERG. Here too, the run with parameterized deep convection shows significantly less precipitation than the explicit ones, similar to the results of the global simulations with IFS by Dueben et al. (2020). This is most probably due to the missing medium-to-heavy precipitation in the parameterized runs, as shown in Sect. 3.1.2.

If one looks only at the precipitation over land, COSMO is much closer to IMERG, while the values from IFS are clearly larger. The effect of parameterized deep convection for both models is the same as for the whole European domain but even more distinct, as the larger part of deep convection happens over land. Moreover, IFS shows a clear sensitivity to timestep, with the amount of precipitation increasing with decreasing timestep. One of the properties of hydrostatic systems is supposed to be the overestimation of convective precipitation amount and area compared to nonhydrostatic systems (Kato and Saito, 1995; Kato, 1997). When looking at total precipitation over land and comparing it with values from IMERG, it looks like IFS is overestimating convective precipitation. Additionally, the overestimation seems to get worse with increasing resolution, which is consistent with findings by Kato (1997). However, it is not clear whether this effect can be purely attributed to the hydrostatic core, as there are other factors to consider, notably the subgrid-scale parameterizations.
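The distinction between the whole domain and its land part amounts to applying a mask before averaging. The following sketch shows one way such a domain-mean total could be computed from hourly fields. It assumes equally sized grid cells and a boolean land mask, and it uses a tiny hypothetical 2 x 2 grid. It is not the procedure used to produce Table 2, which may differ in weighting and regridding.

```python
def domain_mean_total(hourly_fields, mask=None):
    """Mean accumulated precipitation (mm) over the selected grid cells.

    hourly_fields: list of 2D lists, one per hour, of hourly precipitation in mm.
    mask: optional 2D list of booleans; True marks cells to include (e.g. land).
    Grid cells are assumed to have equal area.
    """
    n_rows = len(hourly_fields[0])
    n_cols = len(hourly_fields[0][0])
    total = 0.0
    n_cells = 0
    for j in range(n_rows):
        for i in range(n_cols):
            if mask is not None and not mask[j][i]:
                continue
            # Accumulate all hours for this cell.
            total += sum(field[j][i] for field in hourly_fields)
            n_cells += 1
    if n_cells == 0:
        raise ValueError("mask selects no grid cells")
    return total / n_cells


# Hypothetical data: two hourly fields on a 2 x 2 grid, one sea cell.
fields = [
    [[1.0, 0.0], [2.0, 0.0]],
    [[3.0, 0.0], [0.0, 4.0]],
]
land = [[True, False], [True, True]]

whole_domain = domain_mean_total(fields)
land_only = domain_mean_total(fields, mask=land)
print(f"whole domain: {whole_domain:.2f} mm, land only: {land_only:.2f} mm")
```

On a real model grid, cell areas vary with latitude, so an area-weighted mean would be the more careful choice.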

Total precipitation in the RADKLIM domain and at the IDAWEB stations has to be interpreted cautiously, as both domains are rather small and the simulations cover only 48 hours. But the numbers support the findings from the European domain in the sense that IFS seems to overestimate precipitation while COSMO generally underestimates it. Also, the precipitation-reducing effect of parameterized deep convection is visible for both domains.
850 hPa level is almost symmetric in COSMO and for the coarser IFS runs, the updrafts at the 500 hPa level become much stronger than the downdrafts. This property is consistent with a cross section of a multicell thunderstorm produced by COSMO in Fig. 7. At 850 hPa, the updrafts are not really that strong due to the proximity to the planetary boundary layer. At 500 hPa, we are well above the level of free convection and the updraft velocities become very high in this area. In general, downdrafts are more frequent than updrafts in both models and on both levels, but they never develop the strength of the deep convective updrafts. Especially in IFS, the values for the downdrafts are quite low.
The profound impact of deep convection parameterization on the vertical motions in the atmosphere can be seen in the two panels on the left-hand side of Fig. 6. Both the updraft and downdraft velocities are much smaller with parameterized deep convection. This effect is most pronounced for the IFS updrafts at 500 hPa, where the parameterization leads to very low updrafts, which is also consistent with the lack of heavy precipitation in this configuration (see Fig. 3). Both models show some sensitivity to horizontal resolution, and the updraft velocities at 500 hPa are also comparable between the respective horizontal resolutions. The timestep sensitivity seems to be more pronounced in IFS at both levels, which we interpret as resulting from the larger vertical motion in combination with a large timestep. Nevertheless, the updraft velocities for the 4.5 km and 2.9 km runs of the hydrostatic IFS are similar to those of the nonhydrostatic COSMO runs with 4.4 km and 2.2 km grid spacing, respectively. Hence, the presumption that the vertical velocities could become unrealistically high due to the violation of the hydrostatic assumption at these resolutions cannot be confirmed. It is not clear to what extent the large timestep of IFS influences these results, but results from Dueben et al. (2020)

3.1.6 Energy spectra
While kinetic energy spectra are generally not used as a measure of a model's skill, they can be useful in order to determine whether a model is able to reproduce the observed dynamics of the atmosphere (Skamarock, 2004). Observational analysis from Nastrom and Gage (1985) showed a transition of the kinetic energy spectra from a $k^{-3}$ dependence at the large scale, characteristic of two-dimensional turbulence, to a $k^{-5/3}$ dependence at the mesoscale, with $k$ being the wavenumber. These results have been confirmed by other studies such as Lindborg (1999) and Cho et al. (1999a, b). The two upper panels of Fig. 8 Dueben et al. (2020).
COSMO seems to conserve a bit more kinetic energy at smaller wavelengths, while IFS shows stronger dissipation at these scales. An earlier examination of kinetic energy spectra produced by IFS identified subgrid-scale parameterizations (notably surface drag and momentum vertical mixing) as the major contributors to dissipation in IFS. It also found that differences in orographic filtering affect the energy transfer. It is likely that the differences between COSMO and IFS in dissipation rate for smaller wavelengths are caused by a combination of different factors.
So while timestep seems to have little influence on the horizontal kinetic energy spectra, it certainly has an influence on the vertical wind spectra, as the lower right panel of Fig. 8 shows. For most runs, a reduction in timestep leads to significantly more energy throughout all wavelengths. This is most pronounced for the pairs of simulations with larger timesteps, but even for the runs with smaller timesteps it leads to an increase in power, especially for the smaller wavelengths. Similar to the horizontal kinetic energy spectra, COSMO conserves more energy at the smaller wavelengths than IFS. The effect of parameterizing deep convection seems to be even more drastic for the vertical than for the horizontal winds: the amplitude is clearly reduced throughout all wavelengths for both models.
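Statements about spectral slopes such as $k^{-3}$ or $k^{-5/3}$ refer to the slope of a straight line in log-log space. As a minimal sketch of how such a slope can be estimated from a computed spectrum, the function below performs an ordinary least-squares fit of log power against log wavenumber. The function name and the idealized power-law test spectra are hypothetical and serve only to show that the fit recovers a known exponent. A real analysis would restrict the fit to a wavenumber band away from the dissipation range.

```python
import math


def fit_spectral_slope(wavenumbers, power):
    """Least-squares slope of log(power) against log(wavenumber)."""
    xs = [math.log(k) for k in wavenumbers]
    ys = [math.log(p) for p in power]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    variance = sum((x - x_mean) ** 2 for x in xs)
    return covariance / variance


# Idealized spectra following the two reference power laws.
wavenumbers = [float(k) for k in range(1, 65)]
mesoscale = [k ** (-5.0 / 3.0) for k in wavenumbers]
large_scale = [k ** -3.0 for k in wavenumbers]

slope_meso = fit_spectral_slope(wavenumbers, mesoscale)
slope_large = fit_spectral_slope(wavenumbers, large_scale)
print(f"mesoscale slope:   {slope_meso:.3f}")
print(f"large-scale slope: {slope_large:.3f}")
```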
The amplitude and shape of the power spectral densities of w from the explicit runs seem to mostly agree with values from other observational and numerical studies (Bacmeister et al., 1996; Gao and Meriwether, 1998; Callies et al., 2016; Schumann, 2019; Panosetti et al., 2019). The spectra of the configurations used in this study all follow a slope of roughly $k^{-1/5}$. This seems to be a realistic value (Liu, 2019), even though there is quite a bit of variability in the aforementioned studies, indicating that there is probably some dependence on the specific weather situation, altitude, and regional climate considered.
This observed increase of convective cell size and heavy precipitation with additional horizontal diffusion is very similar to the results in Ricard et al. (2013) with AROME. Malardel and Ricard (2015) increased diffusion in the convergent parts of the flow and reduced it in the divergent parts in order to improve the conservation property of the scheme used in IFS, AROME, and HARMONIE, which led to a decrease of heavy precipitation. Unlike in Malardel and Ricard (2015), the additional diffusion is applied everywhere in our experiment. It is not clear how the results would change with COSMO if, for example, diffusion were applied only to the convergent or divergent part of the flow, but answering that is beyond the scope of this study and would require further investigation.

The effect of horizontal diffusion
One of the most important conclusions from the COSMO diffusion experiments is the evidence that horizontal diffusion in the governing equations does not act to simply smooth the precipitation field (which would weaken and broaden all cells, but not significantly change their number). Rather, it appears that diffusion affects the dynamics more fundamentally: with higher diffusion, the available CAPE is consumed by substantially fewer but broader updrafts (Fig. 10). In terms of peak vertical velocity, however, the cells weaken, and one wonders why the peak hourly precipitation rates increase so strongly (Fig. 9). We think that the increase of peak precipitation is owed to an accumulation effect. As the cells are much broader, the precipitation footprint at the surface takes longer to pass over an affected grid point. Evidence for such accumulation effects can be seen in the elongated hourly precipitation signatures in Fig. 10. Figure 11 shows the power spectral densities of the COSMO diffusion experiments, and it is obvious that more diffusion leads to more damping near the short-wave cutoff. In fact, the spectra from COSMO with substantial explicit diffusion look quite similar to the ones obtained from IFS with 2.9 km grid spacing in Fig. 8.
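The proposed accumulation effect can be illustrated with a deliberately simple model. The sketch below assumes a rain cell with a Gaussian cross section that moves at constant speed over a fixed grid point, and it integrates the rain rate over a one-hour window centred on the passage. The Gaussian shape, the propagation speed, and the widths and peak rates of the two cells are all hypothetical values chosen for illustration, not quantities taken from the simulations.

```python
import math


def hourly_accumulation(peak_rate, sigma_km, speed_kmh, window_h=1.0):
    """Rain (mm) collected at a fixed point while a Gaussian cell passes overhead.

    The rain rate at the point is peak_rate * exp(-(speed*t)^2 / (2*sigma^2)),
    with peak_rate in mm/h, and is integrated analytically over a window of
    length window_h centred on the time of passage.
    """
    transit_scale_h = sigma_km / speed_kmh  # time scale of the passage
    z = window_h / (2.0 * math.sqrt(2.0) * transit_scale_h)
    return peak_rate * transit_scale_h * math.sqrt(2.0 * math.pi) * math.erf(z)


# Hypothetical cells: a narrow, intense one and a broad, weaker one.
narrow = hourly_accumulation(peak_rate=60.0, sigma_km=2.0, speed_kmh=30.0)
broad = hourly_accumulation(peak_rate=40.0, sigma_km=6.0, speed_kmh=30.0)

print(f"narrow cell: {narrow:.1f} mm in one hour")
print(f"broad cell:  {broad:.1f} mm in one hour")
```

In this toy setting the broad cell yields the larger hourly sum despite its lower peak rate, which is consistent with the hypothesis but does not by itself confirm it for the model runs.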

Conclusions
IFS produces more light precipitation than COSMO in all configurations and generally produces more precipitation overall. For both models, parameterized deep convection leads to more light precipitation but less medium-to-heavy precipitation. With explicit deep convection, the cumulative frequencies in COSMO are quite constant with regard to horizontal resolution and timestep. This is not the case for IFS, which shows an increasing amount of heavy precipitation with increasing resolution. However, the deciding factor for the precipitation frequencies in IFS seems to be the timestep. IFS runs with a smaller timestep all lead to significantly more heavy precipitation than the respective runs with a larger timestep. It is not entirely clear how much this behavior is an effect of timestep on the resolved dynamics or on the subgrid-scale parameterizations and their coupling. It is possible that a combination of these factors contributes to this timestep sensitivity of precipitation intensities. The comparison of model results with the three observational datasets IMERG, RADKLIM, and IDAWEB showed that both models' runs with explicit deep convection seem to be in the range of realistic values when it comes to precipitation intensities.
In contrast, both runs with parameterized deep convection failed to reproduce the medium-to-heavy precipitation that could be observed during these two days and thus also produced significantly less precipitation.

Resolution and timestep size also have an effect on the diurnal cycle of precipitation over land. Higher spatial and temporal resolution seems to lead to an earlier onset and peak of precipitation. While we see a convergence of the diurnal cycle already at 4.4 km grid spacing in COSMO, IFS only shows signs of convergence at the highest resolution with 2.9 km grid spacing, most probably still due to the relatively large timestep sizes of 120 s and 60 s. Furthermore, this study also reinforces the evidence that parameterized deep convection leads to a much earlier onset and peak in the diurnal cycle. However, apart from the two coarsest runs (COSMO 12 km and IFS 9 km) with explicit deep convection, all runs seem to have a too early phase in the diurnal cycle when compared with observations from the multi-satellite product IMERG.
The redistribution of heat and moisture due to parameterized deep convection has a distinct effect on the vertical velocities, leading to lower values for the downdrafts and especially the updrafts. From the runs with explicit deep convection, the respective updraft values at the 500 hPa level were quite similar between the nonhydrostatic COSMO and the hydrostatic IFS.

This indicates that the hydrostatic approximation at a grid spacing of around 2-3 km still works well and does not lead to too high updraft values. However, the downdraft values in IFS are significantly lower than in COSMO throughout almost all simulations. This could be a characteristic of hydrostatic models (see for example Dueben et al., 2020) or also be caused by enhanced diffusion in IFS compared to COSMO. However, this would require further investigation.
The influence of timestep on wind velocities does not seem to be crucial for the horizontal winds, and both models show almost no change in the spectra of horizontal kinetic energy with different timesteps. The vertical winds, however, are clearly influenced by the timestep. This is visible in changes in the spectra and also in the frequency distributions, where a large timestep seems to suppress high vertical velocities. The importance of resolving all these high velocities, compared to the significant additional computational costs involved with a smaller timestep, is up for debate and probably also depends on the application and purpose of the simulation.
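Why a large timestep might limit strong updrafts can be made tangible with the parcel argument used for the IFS precipitation intensities: an air parcel rising at 20 m s$^{-1}$ crosses the troposphere in only a few timesteps. The sketch below repeats this back-of-the-envelope arithmetic for the IFS timesteps mentioned in this study (450 s, 120 s, and 60 s). The tropospheric depth of 10 km is an assumption, as is the constant updraft speed, and the function name is hypothetical.

```python
def timesteps_to_cross(depth_m, updraft_ms, timestep_s):
    """Number of model timesteps a parcel needs to rise through `depth_m`."""
    return depth_m / (updraft_ms * timestep_s)


TROPOSPHERE_DEPTH_M = 10_000.0  # assumed depth, for illustration only
UPDRAFT_MS = 20.0               # updraft speed quoted in the text

steps = {
    timestep_s: timesteps_to_cross(TROPOSPHERE_DEPTH_M, UPDRAFT_MS, timestep_s)
    for timestep_s in (450.0, 120.0, 60.0)
}

for timestep_s, n_steps in steps.items():
    print(f"timestep {timestep_s:5.0f} s: about {n_steps:.1f} timesteps")
```

With a depth between 8 km and 10 km, the 120 s timestep gives roughly 3 to 4 timesteps, in line with the estimate given earlier.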

Increasing horizontal diffusion in COSMO leads to more medium and heavy precipitation, making the precipitation frequency profile in this range look similar to the ones from IFS. Furthermore, more horizontal diffusion also leads to a reduction of downdraft velocities at the 500 hPa level, thus also making the vertical velocity profiles of COSMO and IFS more alike.
The added diffusion generally leads to fewer convective cells while increasing the horizontal extent of these cells. This could be a reason why the hydrostatic approximation still seems to work quite well even at a grid spacing of around 2-3 km, as the relatively large horizontal width of the cells might prevent them from entering the nonhydrostatic regime where the vertical extent of the buoyant cells becomes larger than the horizontal extent. But while this sensitivity to dissipation certainly would need a more detailed investigation, it seems to explain some of the characteristic differences between COSMO and IFS.
Given the significant structural differences between the two models, it is very difficult to confidently attribute differences in the shown results to specific model properties. While this study is able to give some indications, it also stimulates further research. For example, is the sensitivity in heavy precipitation of IFS with regard to timestep size mainly a dynamical effect (with the normal timestep being too large to properly resolve the updrafts) or rather an effect from the increased calling frequency of the subgrid-scale parameterization schemes? It would be intriguing to only vary the timestep of the dynamics while leaving the timestep for the physical parameterizations constant (or the other way around) in order to be able to answer this question. A hypothesis in this study is that we see an increase in heavy precipitation with more horizontal diffusion mainly due to an accumulation effect of the larger convective cells and not necessarily due to heavier instantaneous precipitation. This could be verified by a similar test configuration as for our diffusion experiment, but with a focus on instantaneous precipitation rates with a high output frequency. Regarding the validity of the hydrostatic approximation, the current study supports a view that it is still suitable for a grid spacing of around 2-3 km. But is this mainly due to the rather diffusive behavior of IFS with its relatively large convective cells? A study with more focus on convective cell size rather than grid spacing could probably answer this (for example similar to Miyamoto et al., 2013; Jeevanjee, 2017). What is the role of differences in timestep, numerical methods, or subgrid-scale parameterizations between the hydrostatic IFS and the nonhydrostatic COSMO when it comes to updraft velocities? Investigating this question in more detail would require a study with two models that have nearly identical numerics and use the same physical parameterizations. For IFS, a prerequisite for such a comparison would be to try bringing the nonhydrostatic spectral IFS closer to the operational hydrostatic IFS with regard to the timestepping scheme, the vertical discretization, and the physics-dynamics coupling.
Code and data availability. Model codes developed at ECMWF are the intellectual property of ECMWF and its member states, and therefore the IFS code is not publicly available. Access to a reduced version of the IFS code may be obtained from ECMWF under an OpenIFS licence (see http://www.ecmwf.int/en/research/projects/openifs for further information, last access: 28 January 2021). The particular version of the COSMO model used in this study is based on the official version 5.0 with many additions to enable GPU capability and is available under license (see http://www.cosmo-model.org/content/consortium/licencing.htm for more information, last access: 28 January 2021). Most of these developments have been reintegrated into the mainline COSMO version in the meantime. COSMO may be used for operational and for research applications by the members of the COSMO consortium. Moreover, within a license agreement, the COSMO model may be used for operational and research applications by other national (hydro-)meteorological services, universities, and research institutes. ECMWF operational analysis data, which have been used for initial (IFS and COSMO) and lateral boundary conditions (COSMO), are available at https://www.ecmwf.int/en/forecasts/dataset/operational-archive (last access: 28 January 2021). The model output data from IFS and COSMO used for the figures in this work, as well as the initial conditions for the soil in COSMO, are available under https://doi.org/10.5281/zenodo.4479130.
Author contributions. CZ, NPW, and CS designed the experiments. CZ performed the COSMO model simulations and NPW the corresponding IFS simulations. CZ performed the analysis of model output and observations with supervision from CS and NB, and technical support from PDD and NPW. NPW, PDD, and CS were strongly involved in the discussion of the results. CZ wrote the paper with input from all other co-authors.
the GPU-accelerated version of COSMO. We would like to thank Elmar Weigl and Marcus Paulat (DWD) for their assistance concerning the RADKLIM dataset, and Pirmin Kaufmann (MeteoSwiss) for his help regarding the IDAWEB observations. Peter D. Dueben gratefully acknowledges funding from the Royal Society for his University Research Fellowship and the ESIWACE2 project. ESIWACE2 has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 823988.