Evaluation of the COSMO model (v5.1) in polarimetric radar space – Impact of uncertainties in model microphysics, retrievals, and forward operator

Sensitivity experiments with a numerical weather prediction (NWP) model and polarimetric radar forward operator (FO) are conducted for a long-duration stratiform event over northwestern Germany, to evaluate uncertainties in the partitioning of the ice water content and assumptions of hydrometeor scattering properties in the NWP model and FO, respectively. Polarimetric observations from X-band radar and retrievals of hydrometeor classifications are used for comparison with the multiple experiments in radar and model space. Modifying two parameters (Dice and Tgr) responsible for the production of snow and 5 graupel, respectively, was found to improve the synthetic polarimetric moments and simulated hydrometeor population, while keeping the difference in surface precipitation statistically insignificant at model resolvable grid scales. However, the model still exhibited a low bias in simulated polarimetric moments at lower levels above the melting layer (-3 to -13◦C) where snow was found to dominate. This necessitates further research into the missing microphysical processes in these lower levels (e.g., fragmentation due to ice-ice collisions), and use of more reliable snow scattering models to draw valid conclusions. 10


Introduction
Polarimetric radar networks provide an unprecedented database to evaluate and improve cloud microphysical parameterizations in numerical weather prediction (NWP) models. With the increasing availability and use of such modern remote sensing observations (also satellites, radiometers etc.) for NWP model validation, evaluation, and data assimilation, there is an increasing demand for cloud microphysics parameterization schemes to realistically approximate cloud microphysical processes and 15 hydrometeor properties such as size distributions, partitioning into different classes/types, bulk densities, fall speeds etc. This is key for consistent forward simulations of cloud-related quantities across different measurement platforms and different parts of the electromagnetic spectrum, because these radiative transfer calculations depend critically on the particle properties and their spatial distributions. Errors herein can lead to inconsistencies in simulated measurements for the same cloud, which may cause adverse effects, for example, in data assimilation. Even with a single device like a polarimetric radar, there can be such forward operator (FO), and the radar retrievals used for this study. A first model evaluation with current default configurations of microphysics and FO is presented in Sect. 3. Sensitivity studies with model and FO are discussed in Sect. 4. Finally, overall discussion and conclusions are presented in Sect. 5 and Sect. 6, respectively.

70
Spatially and temporally high-resolution polarimetric weather radar measurements provide the undisputed core information for an in-depth evaluation of NWP models. In addition to improvements in quantitative precipitation estimation, polarimetric radars provide insights into precipitation microphysics and the 3D distribution of hydrometeors. This study exploits measurements of the polarimetric X-band Doppler radar (BoXPol) in the city of Bonn, Germany. It is installed at 50.73052 • N, 7.0716638 • E on a 30-m tall building next to the Institute for Geosciences, Department of Meteorology, at the University of Bonn at 99.9 m 75 MSL (see also Diederich et al., 2015). Trömel et al. (2014) and Ryzhkov et al. (2016) introduced so-called quasi-vertical profiles (QVPs) to present polarimetric radar observations in a time versus height format. To generate QVPs, the azimuthal median is calculated from standard conical scans measured at higher elevation angles and the range coordinate is transformed into height. Here we use the 18 • elevation scan with 100 m range resolution. One key advantage is the inherent noise reduction from the averaging process which some-80 times enables the detection and quantification of small but meaningful vertical gradients in the polarimetric radar variables.
Furthermore, QVPs are well suited to study the temporal evolution of processes and to directly compare with other mostly vertically pointing sensors and, in particular, with model simulations. QVPs represent the average conditions within the cone spanned by the radar scan with decreasing resolution with height. If the precipitation is not uniform within the cone, errors from the averaging process are larger, but the advantage of a significant noise reduction clearly dominates for microphysical 85 studies of widespread stratiform rain as presented in this paper.
The variables-horizontal reflectivity Z H , differential reflectivity Z DR , and cross-correlation coefficient ρ hv are masked where ρ hv < 0.7 to exclude clutter and bins without significant weather signal from the attendant QVP calculations. Calibration offsets for Z H and Z DR are estimated following Diederich et al. (2015) and Pejcic et al. (2021b). After differential phase Φ DP is masked where ρ hv < 0.95 and Z H < 0 dBZ, the measurements are smoothed with a median filter using a 1.1 -km moving 90 window (i.e. including 11 range bins). The melting layer detection algorithm of Wolfensberger et al. (2016) is applied to each ray of smoothed Φ DP and the identified bins are removed. A linear interpolation of Φ DP across the melting layer is performed that excludes the component of backscattering differential phase δ and enables the estimation of an average specific differential phase K DP in this region. Least-squares fitting on a moving 3.1 -km window is used to estimate K DP based on the smoothed Φ DP without melting layer contamination. Finally, the QVP of K DP is calculated using the same azimuthal median approach. 95 We study an event of widespread stratiform rain passing the Bonn area and monitored by BoXPol on 16 Nov 2014 between 00:00 and 10:00 UTC. Fig. 1 shows the QVPs during this time, illustrating only moderate reflectivities Z H around 20 to 25 dBZ near the surface and attendant Z DR values typical of rain. The melting layer is observed at around 1.5 km and significant Z H values are observed up to 6 km. Enhanced Z DR values up to 0.4 dB indicate pristine crystals near the cloud top, while decreasing Z DR together with increasing Z H toward lower levels indicates ongoing aggregation or riming processes. Enhanced K DP values 100 above the melting layer are especially observed in the first half of the observation period, pointing towards enhanced number concentrations of ice particles and thus ice water content (IWC), which is in line with higher Z H below the melting layer and thus higher rain rates in the first half of the observation period. ρ hv is mostly close to 1 except for in the melting layer where values decrease to 0.92 to 0.95, in line with statistics of polarimetric variables performed in stratiform precipitation observations at X band (Trömel et al., 2019). Pristine crystals together with decreasing signal-to-noise ratio also results in a 105 ρ hv reduction near the cloud top.

The COSMO Model (v5.1)
This study evaluates the partitioning of total ice water content in the Consortium of Small-scale Modelling (COSMO) model (Steppeler et al., 2003;Baldauf et al., 2011) with a 2-moment bulk microphysics scheme (Seifert and Beheng, 2006) (henceforth SB2M). The extended version of the SB2M is used which includes a separate hail class described in Blahak (2008) ;Noppel 110 et al. (2010);van Weverberg et al. (2014). An additional dynamic saturation adjustment was turned on to reduce the time step sensitivity of model microphysics and precipitation (Barrett et al., 2019). More details about the dynamical core and other physical schemes are available from Baldauf et al. (2011).
SB2M predicts the mass densities ρ q and number densities ρ n of cloud droplets, rain, cloud ice, snow, graupel and hail, which are the zeroth and first moments of the particle size distributions (PSD) that is assumed to follow a modified Gamma with x being the particle mass and parameters µ and ν determining the shape of the distribution. The specific hydrometeor mass q and specific number n can be derived by q = ρ q /ρ and n = ρ n /ρ with ρ being the total density (air, vapor and hydrometeors). The size-mass and velocity-mass relations of different hydrometeors are parameterized by power laws with (maximum) particle diameter D, terminal fall velocity v T and parameters a g , b g , a v and b v .
The shape parameters µ and ν of the MGD remain constant for each hydrometeor class and N 0 and λ can be diagnosed from the two prognostic moments. However, if rain is below cloud base in the sedimentation-evaporation regime, it µ depends on 125 the mean diameter (Seifert, 2008;van Weverberg et al., 2014) to better capture the strong effects of these two processes on the shape of the rain size distribution.
To mitigate unphysical effects on the mean spectral particle mass x = q/n coming from the separate advection and sedimentation of q and n, it is very important to impose some minimum and maximum allowable mass limits for x (x min and x max ) at relevant places during the model time stepping. This is done by clipping n so that x stays within [x min , x max ]. For reference, 130 all fixed parameters which were used in this study are summarized in Table 1.
The cloud droplet nucleation parameterization is based on the lookup table of Segal and Khain (2006), which parameterizes the number of activated cloud droplets just above cloud base depending on the updraft speed w, ambient cloud nuclei (CN) concentration, mean and standard deviation of an assumed log-normal dry CN size distribution, and aerosol solubility, based on 1D rising-parcel simulations with a very detailed bin microphysical scheme. For this study, continental aerosol with CN 135 concentration N CN = 1700 × 10 6 m −3 , log-normal standard deviation ln(σ s ) = 0.2, mean radius of aerosol size distribution R 2 = 0.03 µm and solubility = 0.7 is used. w is chosen to be the prognostic grid scale updraft because the table values already include subgrid effects of a disturbed turbulent flow. Similarly, the ice nucleation parameterization is based on Kärcher and Lohmann (2002) and Kärcher et al. (2006). The large-scale concentration of aerosols of this parameterization for heterogeneous ice nucleation are chosen as N dust = 162 × 10 3 m −3 , N soot = 15 × 10 7 m −3 , and N organics = 177 × 10 7 m −3 .

140
For ice-phase processes, interactions between different hydrometeors involving collisions (e.g., riming, aggregation, icemultiplication) play an important role in the partitioning of the IWC. These interactions are parameterized using collision integrals and collision/sticking efficiencies (Seifert and Beheng, 2006), which are activated according to certain particle mean size and temperature thresholds. In this study, we focus on two of these parameters, motivated by the case study below.
The first parameter is the critical mean diameter of cloud ice particles D ice for conversion to snow through aggregation of 145 cloud ice. If the mean cloud ice size is larger than D ice , self collection leads to the production of snow, otherwise ice remains as ice. a gi and b gi are the size-mass parameters for cloud ice and q i and n i are its specific mass and number, respectively. Perturbations in D ice affect the ice to snow partitioning and also the size of the resulting snow particles.

150
The second parameter is the temperature threshold T gr below which the production of graupel by riming of cloud ice and snow with supercooled rain is allowed. If T < T gr , a certain part of the rimed cloud ice and snow particles are converted to the graupel class, otherwise no cross-class transfer happens. This pathway for graupel production may also be slightly controlled by D ice , because larger snow exhibits more riming.
As with most bulk cloud microphysical schemes without a prognostic melted fraction, SB2M most likely systematically 155 underestimates the distances which melting particles may fall until melting completely (i.e., the melting layer thickness). This is because SB2M instantaneously transfers the amount of meltwater formed during one model timestep from cloud ice, snow, graupel, and hail to the rain class. As a consequence, the melting hydrometeors shrink too fast, fall too slowly, and completely melt too quickly in the SB2M. We acknowledge this limitation in the SB2M scheme.

160
Evaluating output of NWP models with observations requires their data in a consistent and comparable parameter space, typically either in model space (e.g., hydrometeor mass and number densities) or observation space (e.g., radar reflectivity and further polarimetric radar variables). Conversion of the numerical model output into radar observables is done by means of a polarimetric radar forward operator. Particularly for model evaluation, it is important that the model and forward operator (FO) are consistent regarding parameters that affect the forward modeled observables. Many of these, including the phase 165 partitioning of hydrometeors during melting, the shape and orientation of particles, and the heterogeneous microstructure of frozen hydrometeors, are insufficiently constrained by the direct model output, and assumptions need to be made. When implicit assumptions exist in the numerical model, it is advantageous to ensure that the FO makes use of equivalent assumptions.
In this study, we apply the Bonn Polarimetric Radar forward Operator (B-PRO v2.0) (Xie et al., , 2021. B-PRO is a research-oriented polarimetric radar FO, which has been built onto an early, non-polarimetric version of EMVORADO (Zeng 170 et al., 2016), the German Weather service's operational radar FO. Below we explicate the B-PRO implementation and settings directly related to polarimetry. Further aspects are detailed in Appendix A.
B-PRO calculates and outputs polarimetric radar parameters on the spatial grid given by the numerical model field input.
That is, no beam integration and antenna pattern are taken into account, and hence the output Z H and Z DR are to be considered as unattenuated (or perfectly attenuation-corrected) variables. In addition, no simulated measurement errors are included, 175 making the output variables take on their intrinsic values. A software interface between the COSMO model and B-PRO ensures consistent microphysics, specifically the same hydrometeor class-dependent particle size distributions and size-mass relations used in the SB2M in Eq. (1) and 2 and Table 1.
The shape and orientation of the hydrometeors, which are the primary properties that characterize anisotropic scattering, are not at all constrained by the COSMO model. Instead, different parameterizations are applied by the FO. Apart from 180 the cloud liquid class, hydrometeors are modeled as homogeneous oblate spheroids. Their shape is described by the aspect ratio (AR), i.e. the ratio of the spheroids' semi-minor and semi-major axes. The spheroids are assumed to have no preferential orientation, with their maximum cross section parallel to the horizon on average, and with canting angles α out of the horizontal following a Gaussian distribution with a specified width σ canting , i.e. µ α = 0 • and σ α = σ canting (see Ryzhkov et al., 2011).
The applied hydrometeor class-dependent parameterizations for the frozen hydrometeors are given in Table 2. For rain, the 185 AR parameterization of Brandes et al. (2002) and σ canting = 10 • is used following Ryzhkov et al. (2011). The shapes and applying the T-matrix method for particles with a fixed orientation (Mishchenko, 2000) for canting angle α = 0 • , with the properties of particles with an orientation distribution then derived using the angular moments method of Ryzhkov et al. (2011, 190 2013a).
Compared to Xie et al. (2016), several modifications to B-PRO have been implemented within this study. Melting particle shape and orientation calculations have been adapted to consistently follow the approach of Ryzhkov et al. (2011). For cloud ice, the fairly spherical and unoriented crystals (AR = 0.9 -0.7, σ canting = 40°) resulting in insignificant polarimetric signatures have been replaced by more oriented (σ canting = 12°) and more non-spherical particles following a shape parameterization by 195 Andrić et al. (2013). Also, the range of sizes considered for cloud ice, representing single crystal particles in COSMO, has been largely extended (from the original upper integration limit, D upper int , of 200 µm to 4 mm) consistent with COSMO (mean) size limits of cloud ice and providing a better coverage of particle sizes contributing to the cloud ice bulk properties. The considered size range of snow, representing aggregated ice particles in COSMO, was also found to not sufficiently cover the sizes contributing significantly to bulk scattering (both regarding single particle scattering properties as well as the PSD-200 predicted number density). Therefore, the size ranges of snow, graupel, and hail were extended. The calculation of effective density and volume-equivalent diameter of the spheroids, which are inputs to the particle effective refractive index and T-matrix calculations, respectively, from D, x, and AR have been revised and corrected for both frozen and melting hydrometeors.
A summary of the shape and orientation parameterizations used as a baseline in this study (B-PRO def ) is given in Table 2.
The table further details implementations or choices for further FO parameters like the melting scheme, effective medium 205 approximation (EMA; see also Appendix A), and particle size ranges.

Hydrometeor Classification Algorithm (HCA)
In this study we use the HCA of Pejcic et al. (2021a), hereafter referred to as HCA-Pejcic. In this two-step method, agglomerative hierarchical clustering (Ward, 1963;Grazioli et al., 2015;Ribaud et al., 2019) is first applied to the polarimetric radar observations and ∆ z (i.e., the difference between observation height and 0°C level). With the sigmoid transformation used 210 in Besic et al. (2016), ∆ z separates liquid and solid regions with a smooth transition around the freezing level height. The resulting clusters are identified based on state of the art HCAs (Dolan and Rutledge, 2009;Dolan et al., 2013;Zrnic et al., 2001;Straka et al., 2000;Evaristo et al., 2013) and merged into categories comparable to the model hydrometeor classes for rain, snow, cloud ice, graupel, hail, and wet snow. In the second step, a modified method of Besic et al. (2018) is used to derive hydrometeor percentage (HP). Here, we use not only the centroids (mean values of the polarimetric moments) of the clusters 215 as done originally, but we also calculate the covariance of the five dimensional observations. Instead of the exponential distribution used by Besic et al. (2018), we use a multivariate normal distribution for the determination of the HP. This allows us to use the calculated centroids and covariances to determine the shape of the individual hydrometeor probability functions in five dimensions without parameterization. Furthermore, the membership function-based HCA of Zrnic et al. (2001)  hydrometeor categories of vertically aligned crystals, horizontally aligned crystals, wet snow, dry snow, graupel/hail, rain/hail, hail, large drops, heavy rain, moderate rain and light rain. HCA-Dolan uses categories of big drops/melting hail, hail, high density graupel, low density graupel, vertically aligned ice, wet snow, aggregates, ice crystals, rain, and drizzle. In these two methods, theoretically calculated membership functions are determined for each hydrometeor type and for each polarimetric variable and a temperature variable. As explained in Zrnic et al. (2001) and Dolan and Rutledge (2009), the dominant classes 225 are then determined by the highest score calculated over the membership functions.

Problem description
The study is set up over the Bonn radar domain (Shrestha, 2021). For consistent comparison to the QVPs from the radar observations, the outputs of the model and FO are also postprocessed to obtain synthetic QVPs using conical scans with 18 • elevation angle along the vertically stretched model grid. This is achieved by generating a one grid cell-wide circular mask for each model level, whose diameter increases with height as a function of elevation angle, and using this mask (each containing a minimum of eight grid cells) to estimate the median value of the model 240 or FO data at that level.
Fig. 3a-c shows the QVPs of modeled ice hydrometeors from the CTRL run of COSMO. The hydrometeor population here is dominated by snow, which is primarily produced by self-collection of cloud ice that grows rapidly via aggregation. As the hydrometeors fall downwards, the mixing ratio of cloud ice (q i ) decrease gradually, while the snow mixing ratio (q s ) increases rapidly until the melting layer. The melting layer here is defined as the region around the 0°C isotherm located around 1.5 km.

245
At this height, the cloud ice and snow aggregates in the presence of cloud water and rain also produce considerable amounts of graupel via riming. The graupel mixing ratio (q g ) peaks at this height and then gradually decreases as it melts producing rain, while falling downwards to the surface. The QVPs of modeled rain for the CTRL run are shown in Fig. 4a . The rain mixing ratio (q r ) increases gradually below the melting layer towards the surface as the meltwater fraction from graupel is transferred directly to rain.

Evaluation of synthetic radar observations
Synthetic radar observations of the CTRL run are derived by running the FO with the B-PRO def setup using the model outputs.
To minimize the FO computational cost, the domain was cropped to only cover the QVP extent, which is simply governed by the maximum diameter of the conical mask near the top of the precipitating system. FO calculations were performed for each 5 min step during 00:00 -10:00 UTC, and QVPs were produced from the FO radar variable fields using a conical mask (see The cloud ice-dominated upper levels show reflectivities up to 10 dBZ, similar to the observations (see Fig. 1a). The increase of Z H with decreasing height through the snow-dominated layers and towards the melting layer is somewhat stronger compared to the observations, with Z H reaching up to 30 dBZ just above the melting layer compared to ≤25 dBZ in the observations.
The maximum Z H in the melting layer agrees fairly well with the observations, but the melting layer appears wider in the 260 model-simulated radar data compared to the observations. This pattern continues below the melting layer, where the synthetic Z H is significantly higher and high Z H appear over much longer times.
Cloud ice, located at heights >5 km, shows a clear polarimetric signature with Z DR from 0.2 up to 2 dB, K DP up to 0.2 • /km, and ρ hv slightly decreased below 1. This is qualitatively in agreement with the observations; quantitative comparisons are not meaningful at these heights due to significant uncertainties in the observations. The snow-dominated layers are characterized  The matter of synthetic ρ hv in most regions being very close to 1 while observations show reduced values can likely be explained by shortcomings in the FO assumptions on hydrometeor shape and orientation. ρ hv describes the correlation between the horizontally and vertically polarized returned radar signals and is sensitive to particle shape, composition, and orientation; hence, it provides information about the diversity of the scattering particles within the observed radar volumes (Kumjian, 2018). That is, overestimating ρ hv indicates too little variability of assumed hydrometeor shapes and orientations in the FO.

275
Assuming hydrometeors of all categories and sizes to be homogeneous spheroids, with identical shapes at least for all particles of one category and size, as is the state of the art for the majority of polarimetric radar FOs, is clearly a simplification of the shape and microstructure variability of real-world hydrometeors. Simplifications are typical and necessary in modeling, however, the persistent overestimation of ρ hv suggests that the current assumptions oversimplify and produce too little variety in the structure of particles, and therefore fail to sufficiently reproduce observed ρ hv levels. Similar issues have been identified, 280 e.g., by Ryzhkov et al. (2013b). This current lack of forward modeling ability regarding ρ hv limits the applicability of synthetic ρ hv in data assimilation and in observation-equivalent based model evaluation.
The lack of polarimetric signatures at the snow-dominated heights may be due to issues with the partitioning of cloud ice and snow in these layers, but also due to the forward operator struggling to correctly model the scattering properties of snow aggregates. This is analyzed in depth in Sect. 4.1 and Sect. 4.2, respectively.

285
Large regions of high Z H together with extremely high Z DR and comparably low ρ hv (breaking the general pattern of synthetic values overestimating the observed one) below the melting layer suggest issues with the underlying COSMO modeled hydrometeor fields. Comparison to Fig. 4a indicates that the "curtains" of high Z H and Z DR do not coincide with high rain mixing ratios, and hence are likely not due to rain. Instead, as suggested by the frozen hydrometeor mixing ratios (Fig. 3a-c) and their mean sizes (not shown), these are caused by graupel. This has also been confirmed by no-graupel runs of the FO (not 290 shown) and has been observed during testing to occur independently of the choice of melting scheme settings in B-PRO. That is, the FO results strongly suggest an overestimation of the graupel occurrence below the ML.  (Fig. 6), because K DP values became too uncertain at altitudes above (Fig. 1).

295
When comparing the dominant hydrometeor types, it can be seen that mainly snow is identified at temperatures above zero degrees. HCA-Zrnic and HCA-Pejcic show predominantly cloud ice at the upper edges of the precipitation and only between 06:00 and 07:00 UTC; from 09:00 UTC onward, cloud ice is also classified down to the height of the melting layer. HCA-Dolan classifies no cloud ice except for very small isolated areas between 05:00 UTC and 07:00 UTC at approx. 2.5 km. In contrast to snow, only very small amounts of graupel are classified. Thereby, HCA-Pejcic shows between 01:00 and 04:00 UTC 300 and 07:00 and 09:00 UTC, graupel occurred directly above the melting layer, which can be only partially confirmed with the sagging of the melting layer (very visible in Fig. 1 at ρ hv and Z DR ) as an indicator of aggregation and riming. At the same time steps, smaller portions of graupel are classified by HCA-Zrnic. HCA-Dolan classifies rain above the 0°C isotherm in these areas and HCA-Zrnic in the middle of the melting layer. Considering the scores calculated via the membership function in HCA-Zrnic and HCA-Dolan (not shown here), there are only minor differences between the scores for graupel, rain and wet 305 snow in the same areas. Fig. 7 shows the hydrometeor percentages derived from the HCA-Pejcic. Below the melting layer, all HCA classify rain and the mixing ratio for rain is constant at 100% and changes with height to wet snow. The mixing ratio of graupel reaches up to 3 km altitude at 01:00 UTC, but with only small proportions below 40%. The proportions of cloud ice reach down to 3 km altitude between 01:00 and 03:00 UTC. In general, the different HCA indicate the dominance of cloud ice above 4 km, snow aggregates from 1.5 km to 4 km (with sporadic appearances of cloud ice), wet snow in the melting layer, 310 and rain drops below the melting layer. Comparison between the above radar retrievals and the CTRL simulations show two main differences in the partitioning of the IWC above the melting layer: 1) Excessive graupel production above the melting layer which extends from 1 to 2 km, and 2) a low concentration of cloud ice above 4 km and an absence of sporadic increases in cloud ice concentration above the melting layer. While the above deficiencies in the modeled hydrometeors in the CTRL run can be also observed in the synthetic radar variables compared to radar observations, with a thick bright band in the melting 315 layer, stronger reflectivity above the melting layer, and a lack of polarimetric signatures in the snow-dominated region, the contribution of possible errors in the assumptions used in the FO can also not be neglected. So, additional sensitivity studies with the model and FO are conducted to better understand the uncertainties in the model and the FO, which are discussed in detail in the following sections.  Table 3 summarizes the list of sensitivity experiments conducted to account for uncertainties in the partitioning of the IWC.

Sensitivity studies
The experiments include using different combinations of D ice and T gr . In the SB2M scheme, D ice controls the aggregation of cloud ice, thereby affecting the production of snow and depletion of cloud ice. Aggregation of cloud ice is the primary source 325 of snow production above 6 km, which then further grows in size by aggregation of snow or between snow and cloud ice below. Similarly, T gr controls the riming of cloud ice and snow with supercooled rain drops, and hence affects the production of graupel and depletion of snow above the melting layer.

Results
First, the model precipitation from the sensitivity runs was compared to the rain gauges available over Bonn, Germany. In 330 general, the 10-hour accumulated model precipitation for all runs is similar compared to the gauge measurements (see Fig. 8).
However, it is important to note that any perturbations in the model parameters can influence the spatial pattern of the precipitation. This can strongly influence the grid-scale comparison of precipitation between model and point observations. So, the domain average precipitation is also shown in Fig. 8a. A t-test was also conducted to check whether the difference in domain average precipitation between the CTRL and sensitivity runs were statistically significant (see Table 4). At native grid reso-335 lution, the difference is statistically significant (p < 0.05), while at actual model resolvable scales (e.g, 10 ∆x used here), the difference is statistically insignificant (p > 0.05). However, both similarities and differences in the microphysical processes for rain production in the sensitivity runs can be observed. For example, q r for EXP1 is qualitatively similar to the CTRL run, but EXP2 and EXP3 differ (Fig. 4). For EXP2 and EXP3, a steady sharp increase in q r near the melting layer can be observed, while the q r gradient changes with time and appears more disorganized in CTRL and EXP1 runs. Fig. 3d-f shows the QVPs of modeled frozen hydrometeors for model sensitivity experiment EXP3. The increase in D ice for cloud ice self-collection, as used in EXP1 and EXP3, substantially alters the hydrometeor population above 4 km. Cloud ice now dominates above this height, while snow aggregates dominate below until the melting layer. The change in the partitioning of cloud ice and snow aggregates in the mid-levels (>4 km), however, has no significant effect on the graupel mixing ratio below. The decrease in T gr for graupel production, as applied in EXP2 and EXP3, only prohibits most of the graupel formation 345 near the melting layer. Subsequently, the snow hydrometeors stretch further downward relative to the CTRL run and below the melting layer, eventually melting and producing rain. The absence of graupel production also has no effect on the hydrometeor partitioning in the upper layers. However, the change in the source of ice hydrometeors to form rain drops via melting directly modulates the q r profiles below the melting layer (see Fig. 4). This could be attributed to the differences in the sedimentation velocity and time-scales of melting for graupel and snow.

350
The change in the partitioning of frozen hydrometeors also leads to changes in the partitioning of cloud water and rain near the melting layer. Fig. 9 shows half-hourly averaged QVPs of hydrometeor mixing ratios at 0400 UTC for the CTRL and the sensitivity experiments. For EXP2 and EXP3, there is a general decrease in cloud water mixing ratio (q c ) with reference to the CTRL run. In the absence of graupel production, the sharp increase in q r near the melting layer due to melting of snow can also be clearly observed for EXP2 and EXP3. In the time-averaged QVPs, the change in the partitioning of the cloud ice and snow 355 aggregates in the mid-levels is also clearly visible between the CTRL and the sensitivity experiments. For the QVPs in Fig. 9, the mean size of graupel around the vicinity of the melting layer is around 1.5 -2 mm for CTRL and EXP1. The mean size of rain is qualitatively similar in all runs (<1 mm), except that it increases rapidly near the melting layer for EXP2 and EXP3.
Also, in these experiments snow aggregates stretch further downward below the melting layer as discussed above, with further increase in mean size (from 2 to 3.5 mm).

Effects in FO output
QVPs of synthetic radar observations for model experiment run EXP3 using the B-PRO def setup are shown in Fig. 10. Since the effects of D ice and T gr changes are mostly independent and occur at different heights, their individual effects on the FO modeled observations can be, and are in the following, discussed for the FO output from the combined model experiment EXP3 only.

365
The change of the dominant hydrometeor from snow to cloud ice in the mid-levels resulting from the increase in D ice between the CTRL and EXP1/3 runs leads to very little changes in Z H . However, the change in partitioning of cloud ice and snow above 4 km (increased amounts of cloud ice and reduction of snow) cause significantly intensified polarimetric signals with large regions of Z DR > 2 dB and K DP > 0.4 • /km. The reduction (or removal) of graupel resulting from the changes in T gr largely remediates the "curtains" of high Z H , extreme 370 Z DR , and depressed ρ hv below the melting layer. It leads to a more well-defined, distinct melting layer signature and only leaves streaks of enhanced polarimetric signals, now exclusively resulting from rain, both much better in line with the observations. However, it also eliminates the melting layer signal in K DP . The latter might be a result of too little graupel now existing around the 0 • C level. It can, though, also be due to the already discussed lack of polarimetric signals from snow.

Setup
Weakly or unconstrained assumptions in the FO introduce uncertainties in the forward modeled observation parameters, also known as forward model error and are considered as one type of representation error in data assimilation (Janjić et al., 2018).
Forward operator errors translate into errors and biases of the simulated radar variables, challenging the use of FOs in both model evaluation and data assimilation. In order to study the forward operator uncertainty, specifically with respect to the polarimetric radar parameters, we set up a variety of B-PRO runs with perturbed assumptions in the polarimetry-relevant microphysics, i.e. the shape and orientation parameterizations.
The shape and orientation of rain drops are considered to be comparably well known, hence rain is not considered in this polarimetry sensitivity study. Also, the studied case does not contain any significant amounts of hail (neither in CTRL nor the different experiment model run setups), therefore analysis of hail sensitivity is also skipped. For the remaining ice hydrometeor 385 classes both shape and orientation have been varied, described by the parameterizations of the oblate spheroid aspect ratios and the canting angle distribution width, respectively. For all three classes of ice, the shapes and orientation reported in the literature, derived from a range of methods, including in-situ imaging of particles (e.g. Garrett et al., 2012), and over different atmospheric situations, vary strongly. From the literature, we have compiled a set of AR and σ canting values that cover these reported ranges. For AR, we have selected three parameterizations providing a high, medium, and low AR (i.e. high(er) to 390 low(er) sphericity) case for each hydrometeor class. While AR is often parameterized as a function of particle size, σ canting is mostly estimated as a constant value (one exception is Wolfensberger and Berne (2018), who described the σ canting of snow and graupel slightly decreasing with size). Here, we use a set of two constant σ canting per hydrometeor class, a high and a low value, corresponding to weaker or stronger degrees of orientation. Polarimetric FO calculations have been performed for each combination of AR and σ canting . A summary of all setups is given in Table 5.

395
In contrast to the synthetic QVP for the CTRL and EXP3 cases shown above, where B-PRO was run over the QVP extent, the sensitivity B-PRO calculations have only been performed for a single model column at the grid point location of the BoXPol radar, due to the high computational costs of the polarimetric FO. Since the precipitation system for this case study is horizontally homogeneous, the resulting single-column profiles are in general in good agreement with the full-domain QVPs (compare Fig. 11), and conclusions drawn from comparing single-column FO results with QVPs can be considered robust.

Results
Results of the B-PRO sensitivity calculations are shown in Fig. 11  Reflectivities Z H are found to be insensitive to the changes in all of the hydrometeor classes shape and orientation assumptions. As expected from scattering theory, polarimetric signals increase with increasing non-sphericity (i.e. lower AR) and higher degree of orientation (i.e. lower σ canting ). Shape and orientation effects on the radar variables used here are difficult to disentangle since both decreasing AR (increasing non-sphericity) and decreasing σ canting (i.e. higher degree of orientation) 410 generally lead to an increase in Z DR and K DP . This renders independent evaluations or retrievals of shape and orientation challenging. Besides, the combined effects of AR and σ canting are not necessarily linear, but might instead amplify each other. This is, for example, observed in the case of snow (Fig. 11, middle row and Fig. 12, bottom row), where the change in Z DR 13 https://doi.org/10.5194/gmd-2021-188 Preprint. Discussion started: 6 September 2021 c Author(s) 2021. CC BY 4.0 License. and K DP between the AR mid + σ high case and the AR low + σ low case (orange solid to green dashed) are clearly higher than the changes from AR mid + σ high to AR low + σ high (orange solid to green solid) and AR mid + σ high to the AR mid + σ low (orange solid 415 to orange dashed) combined. The Z DR and K DP for the snow AR low + σ low case are significantly higher than for all other snow sensitivity setups, which cluster closely together.
The range of effects from variations of AR and σ canting within observed limits are very different between the different hydrometeors. Cloud ice exhibits nearly no polarimetric signals for the AR high cases, but provides almost excessive values compared to observations with Z DR > 1 dB and K DP > 0.1 • /km for the AR low + σ low case, which corresponds to the B-PRO def 420 setting. The effect of AR and σ canting variations on snow polarimetric signals is rather small. Solely, the AR low + σ low provides somewhat enhanced signals (Z DR ≈ 0.2 dB, K DP ≈ 0.05 • /km), but still remaining clearly below the observed polarimetric signals (Z DR ≈ 0.3 dB, K DP ≈ 0.1 • /km) at heights where snow is expected to be the dominating hydrometeor class.
Since shape and orientation do not noticeably affect Z H , general differences between synthetic and real observations in Z H remain. While the bright band and below-ML Z H are matched fairly well, the FO clearly overestimates reflectivities in the 425 dendritic growth and aggregation layers and underestimates them in the layers where cloud ice, i.e., pristine crystals dominate.
In the upper levels, both the observed Z DR and K DP in the cloud ice-dominated layers fall within the range covered by the different cloud ice shape and orientation parameterizations, with the best fit to the AR mid + σ low and AR low + σ high case (as far as can be judged extrapolating by eye). However, all cloud ice paramaterizations largely overestimate ρ hv with noticeable ρ hv reductions only for AR low + σ high and AR low + σ low (the latter excluding EXP3, though). The snow-dominated lower levels (-3 430 to -13 • C) are characterized by a strong underestimation of Z DR and an even stronger underestimation of K DP . Only the snow's AR low +σ low setup raises the polarimetric signals somewhat, but it is still insufficient to get close to the observed value. The ρ hv in lower levels also remain very close to 1 for all snow AR and σ canting settings. At the heights just above the ML, where graupel exists (i.e. in CTRL only), assuming a lower AR and especially a lower σ canting can raise Z DR and K DP towards or beyond the observed values. Specifically the graupel AR mid +σ high and AR low +σ high setups provide Z DR that is in good agreement with the 435 observations, but still underestimate K DP strongly, while AR low + σ low provides K DP close to the observations but also largely overestimates Z DR .
The Z H bright band signature agrees well between the real and synthetic observations. For the CTRL case (Fig. 11) in particular, the absolute values of Z H match well while the width of the layer is wider in the synthetic data. On the other hand, the bright band is equally narrow in the observations and synthetic data for the EXP3 case (Fig. 12), but the synthetic observations 440 overestimate the bright band peak by about 5 dBZ. In the polarimetric variables, generally the ML-related peak occurs slightly below where the observations show it. For EXP3, i.e. where practically no graupel occurs, the Z DR ML signature matches well with the observation apart from the placement at slightly lower height. Also, the below-ML Z DR values are in satisfactory agreement with the observations independent of the snow AR and σ canting applied. Synthetic K DP in general underestimates the observed value, but the snow AR low + σ high setup provides at least a well-pronounced ML signature. An ML signature is also 445 seen in ρ hv , but is clearly too weak independent of the applied AR and σ canting parameterizations. For the CTRL case, the ML signatures are dominated by the polarimetric signals of graupel. The flank of increasing Z DR with decreasing height is matched well for setups using graupel σ high . However, the Z DR increase continues to lower heights than in the observations, leading to significantly stronger Z DR ML peaks (2 dB compared to observed 1 dB). Setups using graupel σ low even peak at 3 dB. The synthetic K DP ML values are overestimated for σ low setups, but fit comparatively well for the σ high setups. The differences in 450 ρ hv are small between the AR and σ canting settings. Despite a more pronounced ρ hv decrease compared to the EXP3 case, ρ hv remains too high compared to the observations. Below about 1 km, no more differences occur in the radar variables of the different shape and orientation settings. This is because the FO's melting scheme predicts all ice hydrometeors to have melted there and having taken on the shape and orientation properties of rain drops. These melted hydrometeors, however, still follow the size distributions of their original 455 hydrometeor categories. That is, in the CTRL case, where the model still predicts significant amounts of fairly large-sized graupel, the FO models scattering properties of large liquid-graupel drops (with rain-like AR and σ canting ) that would not exist in reality, but would break up and form smaller drops. These large virtual drops dominate the polarimetric signals below the ML, causing values of Z DR that are far too high and values of ρ hv that are clearly too low.

460
At model-resolvable grid scales, the difference in domain average precipitation in the sensitivity runs with reference to control runs were found to be statistically insignificant. However, the partitioning of the IWC and the source of rain drops was found to vary, with EXP3 matching qualitatively well with the retrieved hydrometeor classification from the BoXPol. The low amount of cloud ice simulated above 4 km in the CTRL run is consistent with the earlier findings from Fridlind et al. (2017). This could be compensated for the SB2M scheme by increasing D ice (EXP2,3), but this change was found to have negligible effects on the 465 microphysical processes below (aggregation and riming). An increase in the temperature threshold for riming of ice and snow in the presence of supercooled raindrops independently reduced the graupel production (EXP2,3) near the melting layer. This also had no effect on the dominating aggregation process above. Hence, EXP3 is qualitatively similar to a linear combination of EXP1 and EXP2.
In the radar space, EXP3 also produced polarimetric moments much closer to the observations; i.e. the melting layer signature 470 is better represented and the curtains of high Z H and Z DR values below the melting layer are prevented. However, EXP3 also produces high Z DR and K DP above 4 km, indicating that mid to high values of cloud ice AR would better approximate the observed polarimetric moments at this level. Regarding the excessive polarimetric signals below the ML for the CTRL run, it is possible that the FO melting scheme could be partly responsible as it virtually produces huge, physically impossible, rainlike drops of melted graupel. However, preliminary tests with varied melting scheme setups (not shown) have not provided 475 significantly better results. Rather, delayed melting by increasing the T max parameter leads to a widening or smearing out of the melting layer signatures. Other sources of uncertainty are the shape and orientation assumptions of melting graupel. Instead of a linear transition towards the description for rain drops, independent parameterizations could be developed for melting graupel, e.g., assuming more spherical and less oriented graupel particles albeit without a physical basis. The problem lies in the prediction of significant graupel amounts of large mean sizes by the model that survive as graupel far below the expected 480 and observed melting layer.
Both the CTRL and sensitivity runs, however, produced a low bias in the polarimetric signal at lower levels (2.5 to 4.5 km, i.e. -3 to -13 • C), where snow aggregates dominate. The observed K DP and Z DR at this level are around 0.1 -0.4 • /km and 0.3 -0.5 dB, while the synthetic K DP and Z DR range from 0 -0.05 • /km and 0 -0.1 dB, respectively. None of the alternative shape and orientation setups for snow could provide sufficiently strong polarimetric signals to reproduce the observed signals at these 485 heights. Snow particles could be assumed to take on even more non-spherical shapes and higher degrees of orientation to create stronger signals and tune them closer to the observed values. However, this would require going clearly outside of the reasonable and in-situ observed range of AR and σ canting . Also, it is not possible to find consistent tunings that are valid for different wavelengths and observation techniques. Concerning the EMA for the ice-air mixture material of dry snowflakes, Bohren and Huffman (1983) showed that for mixtures of weakly (like ice) and non-dielectric (like air) substances, many different EMA for-490 mulas agree to first order, i.e., variations of the EMA are not expected to have a significant effect here. Applying methods like the Discrete Dipole Approximation (DDA) that solve for the scattering properties of irregularly shaped and heterogeneously structured particles, it has been demonstrated that internally homogeneous particles like (soft) spheroids, plates, or columns are not suitable proxies for fluffy, low-density hydrometeors like dendritic crystals or aggregates (Schrom and Kumjian, 2018).
That is, to improve snow polarimetric signals predicted by forward operators, more realistic, less homogeneous particle mod-495 els, and hence other scattering methods than the T-Matrix, need to be applied. Another cause of the low bias in the synthetic polarimetric signals at lower levels could lie in missing cloud ice that also appears sporadically as the dominant hydrometeor in the radar retrievals. The observed high K DP values with modest Z DR have also been reported for dendritic growth layer (between -10 to -20 • C) for tall clouds with colder tops (-40 C) and identified to be dominated by isometric ice crystals with AR around 0.6 (Griffin et al., 2018). However, here we observe high K DP at a relatively warmer temperature regime (-3 to -500 13 • C), also probably associated with isometric ice crystals (as evidenced by high ρ hv ). Such a peak in ice crystal concentration near -5 • C was also reported earlier by Stewart et al. (1984) for stratiform clouds. Hence, the low bias in the model simulated synthetic polarimetric signals could be possibly due to missing additional secondary ice production (SIP) parameterizations in SB2M (e.g., Takahashi et al. 1995;Fridlind et al. 2007;Korolev and Leisner 2020;. Of the identified possible SIP mechanisms, ice hydrometeor collision and breakup (leading to increase in cloud ice and reduction 505 in snow aggregates) appears to be one of the probable missing mechanism for SIP in the absence of supercooled water in the lower levels.
Finally, as discussed in the Appendix, an additional source of outstanding uncertainty not examined in these sensitivity tests is the chosen mixing formula and resultant dielectric constant. Varying the topology of the constituent phases, particularly for three-component (i.e., air-ice-water) particles within the melting layer, has been shown to have a dramatic impact on 510 the simulated reflectivity (Fabry and Szyrmer, 1999;Battaglia et al., 2003) and Z DR and K DP (Carlin, 2018). These effects can act in concert with the aforementioned variability due to AR and σ canting , and while some topologies are more physically plausible than others, it is not always clear which mixing formula best approximates the scattering properties of melting ice hydrometeors. Future work should further explore these sensitivities in an effort to constrain the degrees of freedom of the simulated polarimetric variables.

515
The model was generally found to underestimate the polarimetric signals in the lower levels (-3 to -13 • C), where snow aggregates dominated the simulated hydrometeor population. Sensitivity studies with different combinations of aspect ratios and widths of each hydrometeor's canting angle distribution in the FO could also not explain this model bias, indicating 1) shortcomings in the FO and requirement of more reliable snow scattering models to draw valid conclusions from that, and/or 2) 520 missing additional secondary ice production parameterization in the model.
In the absence of in-situ measurements using aircraft, this study shows the potential of polarimetric radar observations and radar retrievals based on high-resolution X-band radars for evaluating and improving the simulated cloud microphysical process by NWP models. For the model used in this study, improvements in the synthetic polarimetric signatures were found to be sensitive to uncertainty in the prescribed D ice and T gr , while still constraining the accumulated precipitation at model 525 resolvable scales. The latter is an important achievement for operational weather forecasting models to improve the simulated cloud microphysical processes in the model, without further degrading the forecast surface precipitation.
Future studies should make additional use of multi-frequency spectral radar polarimetric observations, which would further allow us to investigate the evolution of the ice particle size distributions , along with numerical modeling to better understand the biases in modeled IWC partitioning.

530
Code and data availability. The COSMO model is distributed to research institutions free of charge under an institutional license issued by the Consortium COSMO and administered by DWD. For more information see http://www.cosmo-model.org/content/consortium/licencing.html.
The COSMO license also includes access to lateral boundary data data provided by DWD. COSMO-DE analysis data used for the initial and lateral boundary conditions data for the COSMO model experiments in this study can be downloaded from the DWD database (https://www.dwd.de/DE/leistungen/pamore/pamore.html). The radar forward operator B-PRO is based on source code derived from the 535 COSMO model, hence redistribution is limited by the COSMO license. B-PRO v2.0 used in this study is available to registered collaborators from https://git2.meteo.uni-bonn.de/git/pfo.
Modifications to the COSMO and B-PRO source codes for the sensitivity studies along with the scripts used to setup, run, and process output of COSMO and B-PRO as well as processed COSMO and B-PRO data, rain-gauge data, polarimetric radar data from BoXPol, retrievals of hydrometeor classification and scripts to produce the figures in this work are available from https://doi.org/10.5281/zenodo.5218717 540 .

Appendix A: Forward Operator Details
While focusing on polarimetry, radar FOs also require additional assumptions regarding non-polarimetric factors like the melting parameterization and the EMA applied. These are already known to impact modeled reflectivities, but can also affect the polarimetric parameters.

545
In B-PRO, frozen hydrometeors (i.e., cloud ice, snow, graupel, hail) are modeled as ice-air mixtures, where the air fraction is derived from the model-provided size-mass relation and the hydrometeor shape assumptions applied in the FO. B-PRO includes a melting model: above a given, hydrometeor class-specific temperature T meltbegin , ice hydrometeors start to melt and turn into water-ice-air mixtures. For the modeling of the effective refractive index of the ice-air and water-ice-air mixture particles, a selection of effective medium approximations (EMA) are available, namely 2-and 3-component Maxwell Garnett, 550 Bruggemann, and Debye mixing rules which are popular in the meteorological radar community (see Blahak, 2016). Note that different EMAs have been theoretically derived under widely different assumptions and for different scattering phenomena (Bohren and Huffman, 1983) and lead to a widespread range of possible solutions for radar signals for the same hydrometeors (see Blahak, 2016, for effects on reflectivity). Because it is often unclear which EMA "best" suits which situation and which radar parameter(s), it should perhaps be treated in the sense of an ensemble based on many different EMA choices.

555
As explained in Sect. 2.2, the COSMO model and its SB2M scheme instantaneously transfer meltwater from the ice hydrometeor classes into rain. Taken literally, that implies that ice class hydrometeors are always (completely) frozen, and no mixed-phase hydrometeors exist. When only assuming pure phase hydrometeors, however, radar forward operators have issues producing realistic melting layer signatures. Some FOs try to compensate for the instantaneous meltwater shedding by artificially redistributing the rain present in a grid box back to the melting hydrometeors, assuming that all of the rain in the grid 560 box comes from melting and no shedding occurs until melting is complete (Wolfensberger and Berne, 2018;Jung et al., 2008).
While being an attractive and modern concept, this might systematically overestimate the mean sizes of melting particles and lead to artificial discontinuities of the polarimetric parameters at the lower edge of the melting layer (Wolfensberger and Berne, 2018). B-PRO instead assumes a mass fraction of the ice hydrometeors to be melted, i.e. turns a part of what the model predicts as unmelted ice into (unshedded) liquid water. In contrast to the redistribution approach above, this ensures continuity of the 565 radar signals over the melting layer edges but might lead to an underestimation of the mean sizes, and hence of the radar signals, of the melting hydrometeors. For practical purposes, a possible lack of "bright-band" may be artificially compensated by choosing an EMA from the available options which produces stronger melting signatures. The B-PRO melting model, inherited from EMVORADO (see Blahak, 2016), predicts a temperature-and particle size-dependent meltwater mass fraction for the ice hydrometeor classes once the ambient temperature exceeds a specified class threshold T meltbegin . The melt fraction decreases 570 with increasing D for constant T and grows exponentially with T at constant D. All particles of a given hydrometeor class are considered completely melted once a given temperature T max is reached. By default, the melting scheme is dynamic, i.e., T max is within specifiable limits determined from the model temperature and hydrometeor fields in each model column.
Hydrometeor scattering properties are calculated for single particles over a range of sizes per hydrometeor class, applying the size-dependent shape and melting fraction values. Bulk scattering properties per hydrometeor class are obtained by integrating 575 the monodisperse scattering properties over the particle size distribution provided by COSMO applying the Simpson quadrature rule.
The temperature thresholds governing the degree of melting of the particles as well as the EMA for each hydrometeor type can be controlled by the user through a (FORTRAN) namelist file. For this study, we defined a FO setup as a baseline, B-PRO def , using reflectivity observations from BoXPol as well as synthetic reflectivities from EMVORADO (Zeng et al., 2016)  as reference. The B-PRO default melting scheme parameters (see Blahak, 2016) are more suitable for convective situations.
For example, the default T meltbegin for graupel and hail are -10 • C to reflect the effects of wet growth in convective updrafts, but would not be suitable for stratiform clouds. Hence, for the study baseline setup in the present study the melting model parameters have been adapted for stratiform situations. T meltbegin for all hydrometeors is set to 0 • C. Also, the dynamic melting fraction scheme has been switched off, and T max is replaced by a fixed, hydrometeor-class specific value, except for snow.

585
Snow applies the dynamic scheme, but in the case studied here it practically behaves as if T snow max was set to 3 • C. In addition, the EMA of dry and wet snow, which are modeled as a homogeneous single-layer spheroid in B-PRO but as a 2-layered sphere in EMVORADO, has been chosen to roughly reproduce the melting snow reflectivities as predicted by EMVORADO. All settings are documented in Table 2. We gratefully acknowledge the computing time (project HBN33) granted by the John von Neumann Institute for Computing (NIC) and pro-600 vided on the supercomputer JUWELS at Jülich Supercomputing Centre (JSC). The post-processing of model output data and input/output for FO was done using the NCAR Command language (Version 6.4.0). We would like to thank the city of Bonn for providing the rain gauge data.