A process-based evaluation of the Intermediate Complexity Atmospheric Research Model (ICAR) 1.0.1

. The veriﬁcation of models in general is a non-trivial task and can, due to epistemological and practical reasons, never be considered as complete. As a consequence, a model may yield correct results for the wrong reasons, i.e. by a different chain of processes than found in observations. While in the atmospheric sciences guidelines and strategies exist to maximize the chances that models are correct for the right reasons, these are mostly applicable to full-physics models, such as numerical weather prediction models. The Intermediate Complexity Atmospheric Research (ICAR) model is an atmospheric model em- 5 ploying linear mountain wave theory to represent the wind ﬁeld. In this wind ﬁeld atmospheric quantities, such as temperature and moisture are advected and a microphysics scheme is applied to represent the formation of clouds and precipitation. This study conducts an in-depth process-based evaluation of ICAR, employing idealized simulations to increase the understanding of the model and develop recommendations to maximize the probability that its results are correct for the right reasons. To contrast the obtained results from the linear-theory-based ICAR model to a full-physics model, idealized simulations with the 10 Weather Research and Forecasting (WRF) model are conducted. The impact of the developed recommendations is then demonstrated with a case study for the South Island of New Zealand. The results of this investigation suggest three modiﬁcations to improve different aspects of ICAR simulations. The representation of the wind ﬁeld within the domain improves when the dry and the moist Brunt-Väisälä frequencies are calculated in accordance to linear mountain wave theory from the unperturbed base state rather than from the time-dependent perturbed atmosphere. Imposing boundary conditions at the upper boundary 15 different to the standard zero gradient boundary condition is shown to reduce errors in the potential temperature and water vapor ﬁelds. Furthermore, the results show that there is a lowest possible model top elevation that should not be undercut to avoid inﬂuences of the model top on cloud and precipitation processes within the domain. The method to determine the lowest model top elevation is applied to both the idealized simulations as well as the real terrain case study. Notable differences between the ICAR and WRF simulations are observed across all investigated quantities such as the wind ﬁeld, water vapor 20 and hydrometeor distributions, and the distribution of precipitation. The case study indicates a large shift in the precipitation maximum for the ICAR simulation employing the developed recommendations in contrast to an unmodiﬁed version of ICAR. The cause for the shift is found in inﬂuences of the model top on cloud formation and precipitation processes in the ICAR simulations. Furthermore, the results show that when model skill is evaluated from statistical metrics based on comparisons to surface observations only, such analysis may not reﬂect the skill of the model in capturing atmospheric processes such as 25 gravity waves and cloud formation. Additionally, for ICAR four centermost west-east south-north averaged and the average found as representative of the 5 In the central west-east cross the south-north The effect of the Brunt-Väisälä frequency calculation method is investigated with a comparison of the u 0 and w 0 ﬁelds obtained from ICAR-N and ICAR-O simulations to the ﬁelds given by the analytical expressions in equations (4) and (5). Non-linear effects on the wind ﬁeld are investigated by a comparison of ICAR to WRF. Differences between the models’ and the analytical 10 solution are quantiﬁed with the bias B and the mean absolute error MAE (MAE, Wilks, 2011b, chap. 8). Since WRF uses a occur: A snow shower with the majority of snow falling upwind of the ridge in ICAR-N and a snow and graupel shower in WRF with the portion precipitating leeward of the ridge. While these results are obtained for one particular they indicate that the linearisation of the ﬁeld has the potential to signiﬁcantly alter the distribution of precipitation in a study domain. This could have drastic consequences for the results of studies relying on ICAR to provide precipitation ﬁelds 25 for, i.e. applications in hydrology or glaciology.

eled processes and the associated computational demand. However, even with a firm basis in natural laws, such models may generate results that match measured data but arrive at them through a causal chain differing from that inferred from observations ("right, but for the wrong reason"; e.g. Zhang et al., 2013). Additionally, the reason for a matching result may even be found in unphysical artifacts introduced by the numerical methods of these models (e.g. Goswami and O'Connor, 2010). In acknowledgment of the fundamental limitation of verification, best practices and strategies have been outlined to maximize the 15 probability that the results obtained from a model are correct for the right reasons (e.g. Schlünzen, 1997;Warner, 2011). Most of these criteria, however, apply to full physics-based models such as regional climate models or numerical weather prediction models that are expected to model atmospheric processes comprehensively.
The Intermediate Complexity Atmospheric Research model (ICAR; Gutmann et al., 2016) employed in this study is intended to 20 be a simplified representation of atmospheric dynamics and physics over mountainous terrain. With a basis in linear mountain wave theory, it is a computationally efficient alternative to full physics regional climate models such as the Weather Researching and Forecasting (WRF; Skamarock et al., 2019) model. Compared to simpler linear-theory-based models of orographic precipitation (e.g. Smith and Barstad, 2004), ICAR allows for a spatially and temporally variable background flow, a detailed vertical structure of the atmosphere and employs a complex microphysics scheme. However, for instance, precipitation induced 25 by convection or enhanced by non-linearities in the wind field is not considered by ICAR but may be accounted for with other methods (e.g. Jarosch et al., 2012;Horak et al., 2019). For such cases Schlünzen (1997) advises that a model has to be assessed with respect to its limit of application. Therefore, a direct comparison to a full physics-based model is generally not sufficient for an evaluation of ICAR. Note that ICAR is not intended to provide a full representation of atmospheric physics. Furthermore, whether the results obtained from ICAR simulations are correct for the right reasons cannot be inferred from comparisons to 30 measurements alone (Schlünzen, 1997).
However, in the literature the evaluation efforts for ICAR so far focused mainly on comparisons to measurements or WRF output. Gutmann et al. (2016) compared monthly precipitation fields for Colorado, USA, obtained from ICAR to WRF output and an observation-based gridded data set. While Gutmann et al. (2016) additionally performed idealized hill experiments, these focused on the qualitative comparison of the vertical wind field and the distribution of precipitation between ICAR and WRF. Bernhardt et al. (2018) applied ICAR to study changes in precipitation patterns in the European Alps in dependence of the chosen microphysics scheme. Horak et al. (2019) evaluated ICAR for the South Island of New Zealand based on multi-year 5 precipitation time series from weather station data and diagnosed the model performance with respect to season, atmospheric background state, synoptic weather patterns and the location of the model top. By comparing to measurements, Horak et al.
(2019) observed a strong dependence of the performance of ICAR on the location of the model top, finding an optimal setting of 4.0 km above topography that minimized the mean squared errors calculated at all weather stations. However, the analysis of cross sections revealed numerical artifacts in the topmost vertical levels, suggesting these to be responsible for the high model 10 skill, thus rendering the model right for the wrong reason.
This study aims to improve the understanding of the ICAR model and develop recommendations that maximize the probability that the results of ICAR simulations, such as the distribution of precipitation, are correct for the right reasons. For a given initial state, a correct representation of the fields of wind, temperature and moisture as well as of the microphysical processes 15 are a necessity to obtain the correct distribution of precipitation for the right reasons. Therefore, simulations of an idealized mountain ridge are employed to investigate and verify the respective fields and processes in ICAR. This study first analyses quantitatively and qualitatively how closely the ICAR wind and potential temperature fields match the analytical solution for the ideal ridge and contrasts them to a WRF simulation to infer the aspects not captured by linear theory (Sect. 4.1). In a second step the influence of the height of the model top and the upper boundary conditions on the microphysical cloud formation 20 processes are quantified with a sensitivity study . Thirdly, the differences in the hydrometeor and precipitation distribution due to non-linearities and other processes not represented by linear theory are investigated in a comparison of ICAR to WRF (Sect. 4.5). Finally, the impact of recommendations derived from the preceding steps on a real case are demonstrated (Sect. 4.6). The case study is conducted for the South Island of New Zealand and contrasted to the results of Horak et al. (2019). All findings are discussed in Sect. 5 and the conclusions, including the recommendations, are summarized in Sect. 6. 25

Overview
ICAR is an atmospheric model based on linear mountain wave theory (Gutmann et al., 2016). The input datasets required by ICAR are a digital elevation model supplying the high-resolution topography and forcing data, i.e., a set of 3-D atmospheric variables as supplied by atmospheric reanalysis such as ERA5 or coupled atmosphere-ocean general circulation models. The 30 forcing data set represents the background state of the atmosphere and must comprise the horizontal wind components, pressure, temperature and water vapor mixing ratio. ICAR stores all dependent variables on a 3-D staggered Arakawa C-grid (Arakawa and Lamb, 1977, pp.180-181) and employs a terrain-following coordinate system with constant grid cell height.
In contrast to dynamical downscaling models, ICAR avoids solving the Navier-Stokes equations of motion explicitly. Instead, ICAR calculates the perturbations to the horizontal background winds analytically for a given time step by employing linearized Boussinesq-approximated governing equations that are solved in frequency space with the Fourier transformation (Barstad and Grønås, 2006). Besides the horizontal background winds, these equations depend on the topography and the 5 Brunt-Väisälä frequency N for which, depending on whether a grid cell is saturated or not, either the moist, N m , or dry Brunt-Väisälä frequency N d is used. The vertical wind speed perturbation is eventually calculated from the density-weighted horizontal winds. The atmospheric quantities (e.g. temperature and moisture), supplied at the domain boundaries by the forcing data set, are advected with the calculated wind field. 10 In linear mountain wave theory, the wind field is entirely determined by the topography and the background state of the atmosphere (Sawyer, 1962;Smith, 1979) and, for a horizontally and vertically homogeneous background state, given by a set of analytical equations (e.g. Barstad and Grønås, 2006). This formal simplicity is achieved by a number of simplifications such as, for instance, neglecting the interaction of waves with waves, waves with turbulence or non-linear effects such as gravity wave breaking, time-varying wave amplitudes or low-level blocking and flow splitting. Discussions of the limitations of linear 15 theory resulting from this reduction of complexity can be found in the literature (e.g. Dörnbrack and Nappo, 1997;Nappo, 2012).
ICAR is based on the equations derived in Barstad and Grønås (2006). Therefore, ICAR currently neglects the reflection of waves at the interface of atmospheric layers with different Brunt-Väisälä frequencies and neglects the vertical increase of 20 the amplitude of the wind field perturbations with drecreasing density.
ICAR allows for the selection of different microphysics (MP) schemes. In this study an updated version of the Thompson MP scheme is employed (Thompson et al., 2008). It predicts mixing ratios for water vapor q v , cloud water q c , cloud ice q i , rain q r , snow q s and graupel q g , from here on referred to as microphysics species, as well as the number concentrations for cloud 25 ice and rain. The Thompson MP scheme is a double moment scheme in cloud ice and rain and a single moment scheme for the remaining quantities.
The forcing data set in ICAR represents the atmospheric background state, ideally without the effect of the topography, yielding a sequence of steady-state wind fields for each forcing time step, between which ICAR interpolates linearly. Statically unstable 30 atmospheric conditions (i.e., N 2 < 0) in the forcing data are avoided by enforcing a minimum Brunt-Väisälä frequency of N min = 3.2 × 10 −4 s −1 throughout the domain. A full description of ICAR is given by Gutmann et al. (2016).

Modifications to ICAR
The investigations described in this study were conducted with a modified version of ICAR 1.0.1. All modifications are publicly available as download .

Calculation of the Brunt-Väisäla frequency
The 3-D fields of potential temperature and microphysics species are initialized in ICAR by linearly interpolating the corre-5 sponding fields of the forcing dataset to the high-resolution ICAR domain. From this initial state of the fields at t f0 ICAR calculates the (moist or dry) Brunt-Väisälä frequency N for all model times t m smaller or equal to the first forcing time t f1 .
During each model time step the potential temperature and microphysics species fields in the ICAR domain are modified by advection and microphysical processes. For model times t m between forcing time t fn and t fn+1 , N is based on the perturbed state of the potential temperature and q v + q c + q i at t fn . However, in linear mountain wave theory N is a property of the 10 unperturbed background state (e.g. Durran, 2015), an assumption that is not satisfied by the calculation method employed by the standard version of ICAR. This study therefore employs a modified version of ICAR that, in accordance with linear mountain wave theory, calculates N from the state of the atmosphere given by the forcing data set if the corresponding option is activated. In the following, this modification of ICAR is referred to as ICAR-N, while the unmodified version is referred to as the original version (ICAR-O). If properties applying to both versions are discussed, the term ICAR is chosen. 15

Treatment of the upper boundary in the advection numerics
ICAR imposes a zero gradient boundary condition (ZG BC) at the upper boundary on all quantities subject to numerical advection. This section details how, particularly for the microphysics species, a ZG BC has the potential to cause problems by e.g., triggering influx of additional water vapor into the domain. Due to its conceptual simplicity, the issue is illustrated for the upwind advection scheme, which is the standard advection scheme employed by ICAR. 20 In the following the mass levels are indexed from 1 to N z and the half levels bounding the k-th mass level, i.e. the levels where the vertical wind components is defined as k − 1/2 and k + 1/2.
The advection equation for a quantity ψ employed by ICAR (Gutmann et al., 2016) is: To arrive at the discrete equations of the upwind advection, the flux divergences ∂(uψ)/∂x, ∂(vψ)/∂y and ∂(wψ)/∂z on the right hand side of equation 1 are discretized as, e.g., in Patankar (1980). The vertical flux gradient φ z across mass level k at time step t due to downdrafts (w t k+1/2 < 0 and w t k−1/2 < 0) is then approximated by 30 with ∆z as the vertical grid spacing. The resulting value of ψ at mass level k at time step t + 1 is calculated with an explicit first-order Euler forward scheme as where ∆t denotes the length of the time step. At the upper boundary, where k = N z with N z being the number of vertical levels, by default ICAR applies a zero gradient boundary condition to ψ by setting ψ Nz+1 = ψ Nz . In case of downdrafts (see 5 above) and vertical convergence in the wind field across the topmost vertical mass level (w Nz+1/2 < w Nz−1/2 ), this results in a negative vertical flux-gradient and an associated increase in ψ (see equation 3). If w Nz+1/2 < w Nz−1/2 persists for more than one time step, the concentration of the quantity in the topmost vertical level will continue to increase until it is redistributed within the domain via advection or conversion into other microphysics species. As observed by Horak et al. (2019), this influx of additional water therefore may cause numerical artifacts such as the formation of spurious clouds.

10
In contrast to ICAR, full physics models such as the Integrated Forecasting System (IFS) of the European Center for Medium-Range Weather Forecasts (ECMWF, 2018), the COSMO model (Doms and Baldauf, 2018)  The ideal case consists of an infinite ridge extending along the south-north direction in the domain and westerly flow. The models. The code of the Thompson MP implementation in ICAR and WRF was reviewed and tested to ensure that both implementations produce the same results for the same input. All input files and model configurations are available for download (Horak, 2020).  high and 20 km wide ridge. An overview of the parameter space covered by the simulations is given in Table 1. A particular 5 combination of topography and sounding is referred to as scenario.   (Miles and Huppert, 1969). Note that while a saturated atmosphere has been shown to increase the values of and c (Jiang, 2003), wave breaking does not occur due to < c .

Topographies and initial soundings
Nonetheless other non-linear effects, such as wave amplification, cannot be completely neglected. The combination of this sounding and topography is therefore suitable as an indicator of how well the ICAR solution approximates scenarios in which non-linearities occur, a situation ICAR is very likely to encounter in real-world applications. To this end an ICAR-N simulation 15 is compared to a WRF simulation employing the same topography and sounding.

Analytical solution
ICAR calculates the perturbations to the horizontal background wind with analytical equations based on linear theory while the vertical wind speed is calculated to balance the density-weighted horizontal winds (see Eq. 9; Gutmann et al., 2016).  tions to the potential temperature and microphysics species fields, on the other hand, result from advection and microphysical processes calculated with numerical methods. In ICAR-O this introduces a time dependency for N and, in turn, for the wind field perturbations that depend on N as input variable. Furthermore, ICAR assembles the wind field with an algorithm that allows for a spatially variable background state (Gutmann et al., 2016). It is therefore necessary to ascertain how well the exact analytical perturbations are reproduced by ICAR. This cannot be inferred from a direct comparison to WRF since the wind field of the latter is influenced by non-linear processes not modeled by ICAR. For the topography given in Sect. 3.2 linear-theory-based analytical expressions for the resulting perturbations to a horizontally and vertically uniform background state have been derived as (e.g. Smith, 1979): with u as the perturbation to the horizontal background wind U , w the perturbation to the vertical wind speed, θ the perturbation to the background potential temperature Θ, g = 9.81 m s −2 as the gravitational acceleration, l the Scorer parameter 10 defined as l = N/U and A(z) as the elevation dependent amplitude of the perturbations. A(z) is given by where ρ is the height-dependent air density of the background state. However, since the underlying equations employed by ICAR neglect the effect of wave amplification due to decreasing density with height, the term ρ(0)/ρ(z) in equation (7) is set to unity in the following.  BCs to water vapor, potential temperature and the hydrometeors (cloud water, ice, rain, snow and graupel) respectively, herein after referred to as set of boundary conditions. To indicate which BCs were applied to what group in a specific model run, the runs are labeled with a three digit code, see Table 3. The first digit indicates the BC imposed on θ, the second digit the BC imposed on q v and the third digit the BC imposed on the hydrometeors q hyd , which encompass all remaining MP species (q c , q i , q r , q s and q g ). The number ID associated with each BC is listed in Table 2. In this notation, for instance, 014 denotes a simu-25 lation imposing a zero gradient BC to θ, a constant gradient BC to q v , and a constant flux gradient BC to the hydrometeors q hyd .
The ten combinations of BCs tested in the sensitivity study are listed in Table 3. While a much larger set of combinations of BCs exists, physically not meaningful BC combinations, such as a zero value BC imposed on potential temperature, were ruled out beforehand. Additionally, to reduce the parameter space further, a preliminary study was conducted to exclude sets 30 of BCs that yielded results with distinctly higher errors than the standard zero gradient BC. ID abbreviation boundary condition ψ Nz+1 Table 3. Combinations of BCs tested in the sensitivity study with idealized simulations. Each column represents a combination of three BCs used in a specific simulation. Each digit of the three digit code refers to the ID number of a specific BC listed in Table 2 that was applied to one of the three quantities listed in the rows below. For all combinations of BCs, simulations for all of the topographic settings and background conditions listed in Table 1

Evaluation
All evaluations conducted in this study focus on cross-sections along the west-east axis of the domain, oriented parallel to the background flow. Since ICAR does currently not support periodic boundary conditions, the ICAR domain is extended along the south-north axis to minimize influences from the boundaries (see Sect. 3.1). Additionally, for ICAR the four centermost west-east cross sections from the south-north axis in the domain are averaged and the average is found as representative of the 5 domain center in preliminary tests (not shown). In WRF the central west-east cross section from the south-north axis is used.
The effect of the Brunt-Väisälä frequency calculation method is investigated with a comparison of the u and w fields obtained from ICAR-N and ICAR-O simulations to the fields given by the analytical expressions in equations (4) and (5). Non-linear effects on the wind field are investigated by a comparison of ICAR to WRF. Differences between the models' and the analytical 10 solution are quantified with the bias B and the mean absolute error MAE (MAE, Wilks, 2011b, chap. 8). Since WRF uses a different model grid than ICAR, WRF fields are linearly interpolated to the ICAR grid for this comparison.
For the evaluation in this study the mixing ratios of the microphysics species are assigned to three groups. Water vapor q v , suspended hydrometeors q sus = q c + q i and precipitating hydrometeors q prc = q r + q s + q g . The total mass of water vapor Q v , suspended hydrometeors Q sus and precipitating hydrometeors Q prc is calculated as where N x and N z are the horizontal and vertical number of grid cells respectively, V the grid cell volume, q ij (t) the mixing ratio of the respective hydrometeor group and ρ ij (t) the density of dry air within the grid cell. Note that in contrast to WRF the grid cell volume in ICAR is constant and all vertical levels have the same height ∆z.

10
The sensitivity of the physical processes simulated by ICAR-N to the elevation of the upper boundary and the imposed boundary conditions (BCs) is inferred from the total mass of the MP species in the cross-section and the spatial distribution of potential temperature, the MP species and the 12-h accumulated precipitation P 12h . Except for P 12h all quantities are averaged over the 12 hour period after a spinup of 18 h when an approximately steady state is reached. P 12h is the precipitation accumulated over the same period. 15 Differences in the spatial distribution of time-averaged quantitiesψ, P 12h and time-averaged total mass of the MP species Q with respect to the reference simulation are quantified with the sum of squared errors (SSE). The SSE is calculated between ICAR simulations with different values of z top and the reference simulation employing the default zero gradient BCs at the upper boundary where z top is z max = 20.4 km. This model top is high enough so that cloud processes within the troposphere 20 are not affected by the model top. The SSE is calculated over all vertical levels defined in both simulations as Hereψ ij (z top , BCs) is the time averaged value of a quantity ψ in an ICAR simulation at grid point (i, j) with the model top at z top and the set of upper BCs, andψ ij (z max ) is the value of a quantity at the same location in the reference simulation with z top = z max . For 12-hour accumulated precipitation a one-dimensional version of equation (9) with the sum-25 mation only along the x-axis is employed while for total mass no summation is necessary and only the squared difference To quantify the improvement of one simulation (with a set of boundary conditions BCs and model top z top ) over another by choosing a different set of boundary conditions, BCs , at the upper boundary or another model top elevation z top , the reduction of error (RE) measure is employed (Wilks, 2011a, chap. 8). It is given by This way, RE can be interpreted as a percentage improvement due to the alternative choice of z top or BCs over the original settings z top and BCs, with RE = 0 corresponding to no improvement and RE = 1 corresponding to a complete removal of The quantity z min (ψ, BCs) is introduced which defines the model top elevation for a given set of boundary conditions BCs 15 and parameter ψ for which RE exceeds 95 % for the first time and remains above that threshold for z top ≥ z min . In preliminary studies the 95 % threshold value was found as a suitable indicator for reaching a saturation in error reduction (not shown). The lowest possible model top elevation Z min is then calculated as the maximum of z min (ψ, BCs) for all quantities ψ and a particular combination of boundary conditions BCs. However, θ is excluded since this study focuses mainly on hydrometeors. Nonetheless any relevant error in θ influences the MP fields and the distribution of precipitation, thereby directly affecting Z min . In this 20 context Z min can then be interpreted as the lowest possible model top elevation such that the cloud and precipitation processes in the domain are sufficiently independent from influences of the model top.

Case study
To investigate the effects of the suggested modifications to ICAR on the distribution of precipitation for a real world application, a case study is conducted for the Southern Alps on the South Island of New Zealand located in the southwestern Pacific 25 Ocean. Furthermore, the procedure to identify the lowest possible model top elevation Z min , as described in Sect. 3.5, is applied to this real case scenario and the result compared to the optimal model top elevation of 4 km found by Horak et al. (2019) for this region. In their study the model top elevation was chosen as the elevation that led to the lowest mean squared errors between simulated and measured 24-h accumulated precipitation for eleven sites in the Southern Alps. This section additionally investigates whether this seemingly optimal result, as suggested by the lowest mean squared errors, was achieved for the The wind and potential temperature fields simulated by ICAR-O (Fig. 1c, g) exhibit clear differences to the analytical solution, especially above an elevation of about 6 km. The deterioration increases with elevation and is clearly visible from approximately z = 8 km upward, particularly for w ( Fig. 1g) but still well pronounced for u and the isentropes (Fig. 1c). This  (Fig. 2a), more dependent on z top for q v (Fig. 2b) and smallest for the remaining quantities (Fig. 2c-e) with similar results for all tested topographies and the respective time averaged total masses Q v , Q sus and Q prc (not shown). Most tested 5 BC combinations reduce the error in at least one of the investigated quantities, but generally not for all, with the exception of the combinations 141 and 142. However, in case of q sus , q prc and P 12h no improvements for any BC combination are observed once z top > 4.4 km (Fig. 2c-e). Potential temperature fields are improved the most when a CG BC is imposed on θ (Fig. 2a).
The water vapor field shows improvements for all BCs except for a CF BC, with the largest REs found for a CG BC imposed on q v . For the hydrometeors and P 12h the improvement at the lowest model top setting of 4.4 km is only found if a CFG BC is 10 applied to water vapor and either a CG, ZV or CFG to q hyd , otherwise the RE is approximately zero.
The choice of an alternative BC over the standard ZG BC has the largest potential for a reduction of error when the grid  Figure 3a where the vertical flux gradient of the potential temperature divided by the local potential temperature, given byφ z (θ) = φ z (θ)/θ, is shown. Consequently θ exhibits the largest reductions of error across all values of z top with only a small dependence on z top (see Fig. 2a). For water vapor, as shown in Fig. 2b, RE as a function of z top exhibits two peaks, the first at z top = 4.4 km, and a second peak at z top = 11.4 km with a minimum in between. Here the exponential decay of q v with 20 height results in comparatively small values for φ z (q v ) above an elevation of 4 km (not shown). However,φ z (qv) still exhibits minima and maxima at higher elevations due to the periodicity of the vertical velocity field (see Fig. 3b). At the locations of these minima und maxima ofφ z (q v ) the relative error introduced by a boundary condition can therefore be large as well. In case of q v , as shown in Fig 3b, the model top of a simulation with z top = 11.4 km would coincide with a downdraft region of strong vertical convergence and negativeφ z (q v ) close to the domain center, implying strong water vapor flux convergence. 25 The same situation occurs for z top = 4.4 km albeit in a region with a lower value ofφ z (q v ) and weaker vertical convergence.
Therefore, the local change in q v due to a mass influx caused by the boundary condition is comparatively small, resulting in a lower relative error. Note that for simulations with 4.4 km < z top < 11.4 km the vertical convergence in downdraft regions at the model top is weaker andφ z (q v ) is lower. Therefore, as shown in Fig. 2b, the RE achieved for q v exhibits two peaks where the RE is high for the lowest model top setting at 4.4 km, exhibits a maximum at z top = 11.4 km and is low otherwise.   Table 3 for the key to the BC combination code) for (a) potential temperature θ, (b) water vapor q v , (c) suspended hydrometeors q sus , (d) precipitating hydrometeors q prc and (e) the 12-h precipitation sum P12h. Note that overbars denote the temporal average of the respective quantity over 12 hours following 18 hours of model spinup. REs were calculated between an ICAR-N simulation with an alternative set of boundary conditions imposed at the upper boundary and an ICAR-N simulation employing the standard zero gradient boundary condition (BC code 000), both run with the same model top elevation ztop (indicated by line color). All simulations are conducted for the default scenario. For the investigated scenarios, altering the boundary condition applied to θ has only a negligible effect on the microphysics species fields and P 12h . This is observed, for instance, for simulations 011 and 111 where the BC applied to θ was changed from a ZG to CG while the BCs imposed on the MP species remained the same: Both BC settings lead to very similar RE values for the MP species (Fig. 2b-d) and P 12h (Fig. 2e) despite the RE drop observed for θ (Fig. 2a). This is due to the location of the errors that are introduced with the standard ZG BC on θ. As shown in Fig. 4, for simulations with higher model tops these are      Figure 5. The panels show the minimum model top elevation zmin(ψ, BCs) necessary to reduce the error by 95 % for (a) water vapor q v , suspended hydrometeors q sus , precipitating hydrometeors q prc , potential temperature θ and the 12-hour precipitation sum P12h and (b) the total mass of water vapor Q v , suspended hydrometeors Q sus and precipitating hydrometeors Q prc , respectively in dependence of the set of upper boundary conditions. The ICAR-N simulations are run for the default scenario.

Sensitivity to the model top elevation
As shown in Fig. 6a-h, for most investigated quantities the reduction of error (RE) increases monotonously with the model top elevation z top for all tested topographies. Once the threshold of 95 % is exceeded, further increases in z top correspond to distinctly lower increases in RE. However, non-monotonic exceptions exist as, for instance, the total mass of water vapor Q v 5 shown in Fig. 6e. Here Q v exhibits a local maximum at z top = 5.4 km, before dropping to lower values that eventually converge towards RE = 1. This is a direct consequence of the influence of the model top on the cloud processes within the domain, which for the investigated scenarios is particularly pronounced for suspended hydrometeors q sus . For ICAR-N simulations conducted for the default scenario (BC code 111) with increasing values of z top , Fig. 7a shows the cloud boundary of suspended hydrometeors. Here it is defined as the contour line where q sus = 10 mg kg −1 . While the upwind cloud adjacent to the ridge occupies a 10 large region in the simulations with the lowest model tops, it initially shrinks with increasing z top until a minimum extension is reached at z top = 7.4 km. After this minimum the cloud increases in size with higher z top . The extension of a smaller secondary cloud upwind of the ridge decreases in size similarly before it vanishes completely for z top ≥ 8.4 km. Conversely, downwind of the ridge at an elevation of approximately 6 km to 9 km a larger cloud forms only for z top ≥ 6.4. Altogether, the total mass of suspended hydrometeors, shown in Fig. 7b, initially decreases with increasing z top until a local minimum at 6.4 km is reached. 15 In the simulation with this model top elevation, less water vapor is converted into suspended hydrometeors q sus , leading to a local maximum of Q v at z top = 6.4 km (Fig. 7b). This particular behavior is found independently of the imposed boundary conditions and results in the same cloud boundaries as shown in Fig. 7a. If a different Witch of Agnesi ridge configuration is employed, the same shrinking of the q sus cloud occurs with increasing z top , however, in these simulations the cloud boundaries differ from those in Fig. 7a (not shown). 20 The results show that the total masses of the microphysics species alone are not sufficient to determine whether the processes within the domain are influenced by the model top. In other words, the distribution of these quantities needs to be taken into account as well. Conversely, even though the error in the distribution of q sus is reduced by at least 95 % once a model top elevation of 7.4 km is employed, the same occurs for the total mass Q sus only at z top = 10.4 km (cf Fig. 6b, f). Therefore, both 25 measures, the distribution of a quantity and its total mass, are necessary to reliably determine whether the cloud formation processes within the domain is independent from influences of the model top. Overall the results show that for the default scenario a lowest possible model top elevation of Z min = 10 km is required for ICAR-N to represent cloud processes undisturbed from the influence of the upper boundary of the domain. Furthermore, the value of Z min is found to depend strongly on the particular scenario simulated, with values ranging from 8 km-14 km.

The lowest possible model top elevation
This section investigates how the lowest possible model top elevation Z min depends on ridge height h m and width a, as well as the background state employed in the ICAR-N simulations. Note that Z min is defined as the maximum of z min (ψ, BCs) and thereby represents the model top elevation required for a 95 % reduction of error in all quantities (except θ) for a given set of boundary conditions (BC code 111 in the following). For a background state with U = 20 m s −1 and N = 0.01 m s −1 the 5 results indicate a weak dependence of Z min on the ridge height, with higher Z min for higher ridges (Fig. 8a). The dependency of Z min on the width of the ridge, on the other hand, exhibits no distinct pattern (Fig. 8b).
For a Witch of Agnesi ridge with h m = 1 km and a = 20 km, Z min exhibits a clear dependence on the background state as shown in Fig. 8c. In the following, the background state is characterized by the vertical wavelength of the resulting mountain 10 wave in dry conditions, given by λ z = 2πU/N d . Note that the characteristics of the results remained unchanged (not shown) even if instead of N d the mean moist Brunt-Väisälä frequency N m in the lowest kilometer of the atmosphere (e.g., Jiang, 2003) is employed to calculate λ z . In Fig. 8c λ z is varied either by keeping N d = 0.01 s −1 constant and varying U or by fixing U = 20 m s −1 and varying N d . Figure 8c shows that Z min decreases with increasing vertical wavelength. A potential reason for this behavior is that lower λ z correspond to a higher number of periods of up-and downdrafts within the troposphere. This

Comparison to WRF
This section compares the spatial distribution of water vapor q v , suspended hydrometeors q sus , precipitating hydrometeors q prc and 12-h sum of precipitation P 12h calculated by ICAR-N to the corresponding fields in WRF. ICAR-N imposes CG BCs (111) and employs a model top elevation of z top = 10.4 km. This is the lowest possible model top elevation Z min required for a 95 % reduction of error in all quantities for the chosen set of BCs determined for the default scenario. The distributions of q v , q sus 5 and q prc are investigated after 30 hours of simulation time, while P 12h is investigated between 19 and 30 hours of simulation time. The comparison aims to highlight the differences that may be expected between an ICAR-N and WRF simulation due to the tradeoff between physical fidelity and model performance. The scenario is chosen such that the wind field is expected to exhibit non-linearities.

Water vapor and hydrometeors
10 With respect to water vapor ICAR-N is drier upwind of the topographical ridge and wetter downwind in comparison to WRF (see Fig. 9a-c). The regions with this dry and wet bias extend up to an elevation of approximately 6 km in which, farther upwind of the ridge, WRF exhibits slightly stronger updrafts than ICAR-N ( Fig. 10c and d). Similarly, above the ridge the downdrafts calculated by WRF are of a higher magnitude than those predicted by ICAR-N, see Fig. 10c and d. Therefore, upwind of the ridge WRF transports more moist air from close to the surface to higher elevations. Above the ridge, on the other hand, WRF 15 advects drier air from higher elevations to lower levels. Hence, the two large regions in ICAR-N exhibiting a dry and wet bias in q v respectively are likely caused by the differences in the wind field. However, a wet bias close to the mountain slope on the windward side is presumably caused by microphysical conversion processes (Fig. 10c). Here the stronger orographic lifting in WRF leads to a higher microphysical conversion rate of q v to hydrometeors, thereby resulting in the observed wet bias of ICAR-N in terms of q v . Above the downwind slope of the ridge and up to approximately 100 km downwind, the downdrafts in WRF are still stronger than in ICAR-N. This potentially causes an increased conversion of hydrometeors to q v by evaporation, resulting in the dry bias of ICAR-N in this region. 5 Clear differences between the ICAR-N and WRF simulations are observed for suspended hydrometeors. While the approximate shape of the windward cap cloud (Fig. 9d and e) shows similarities, the mixing ratios calculated by ICAR-N are approximately one tenth of those in WRF (see Fig. 9f). Furthermore, the main constituent of the cap cloud in ICAR-N is ice q i , while it is liquid water q c in WRF (not shown).

10
The majority of precipitating hydrometeors in ICAR-N are observed windward of the topographical ridge, extending over most of the upwind slope (Fig. 9g). In WRF, on the other hand, the distribution of q prc is centered above the ridge and extends farther downwind than upwind (Fig. 9h). In both models the majority of q prc consists of snow q s (not shown). However, WRF additionally predicts non-negligible amounts of graupel q g up to 20 km upwind of the ridge (not shown). Altogether, for precipitating hydrometeors (Fig. 9i) ICAR-N is wetter on the windward slope but drier above the ridge and the downwind slope. This 15 is caused by a combination of two factors: (i) The higher vertical wind speeds above the windward slope of the topographical ridge predicted by WRF, lead to lower effective falls speeds of the hydrometeors (see Fig.10b). (ii) Higher horizontal wind speeds additionally contribute to a larger horizontal drift of q prc and precipitation spill-over in WRF (see Fig.10c and, for a basic estimation of the drift distances, Sect. 4.5.2).  Figure 11a illustrates that P 12h on the windward slope is substantially higher in ICAR-N than in WRF. Conversely, ICAR-N is drier along the leeward slope. Both observations correspond well to the distribution and shape of the precipitating hydrometeors close to the surface (see Fig. 9g and h) and the differences of q prc between ICAR-N and WRF (see Fig. 9i). The precipitation maximum predicted by ICAR-N is approximately 25 mm and lies 6 km upwind of the ridge peak in comparison to the 32 mm 5 maximum in WRF, which lies 4 km upwind of the ridge. The median of P 12h , however, is located upwind of the ridge peak in ICAR-N and downwind in WRF, separated by a distance of 20 km (see Fig. 11b). Integration along the cross section shows that 63 % of ICAR-N precipitation falls out upwind of the domain center while for WRF, on the other hand, it is only 43 %.

Precipitation
The distribution of precipitation in ICAR-N is asymmetric with a gradual increase until the maximum is reached and a steeper 10 decrease after that. While in WRF P 12h is asymmetric as well, the distribution exhibits a very steep increasing slope ending in a distinct peak that is followed by a decreasing slope comparable to the decrease of P 12h in ICAR-N. In WRF snow and graupel contribute to P 12h , while the precipitation in ICAR-N is solely composed of snow. The graupel shower predicted by WRF is localized within a 30 km region centered approximately 10 km upwind of the ridge and causes the distinct peak observed in the distribution of precipitation in WRF (Fig. 11a). 5 The maximum of accumulated snow in WRF is 48 mm and the median of the distribution is shifted downstream by 22 km in relation to the median of the precipitation distribution in ICAR-N, which is solely snow. The difference is mainly due to the different wind fields of ICAR-N and WRF. In the following a fall speed for snow in stagnant air of −1 m s −1 is assumed for the ICAR-N and WRF simulations alike. Starting 1 km above the orography, the effective fall speeds in ICAR-N and WRF are −0.75 m s −1 and −0.25 m s −1 respectively, based on an average w above the upwind slope of the ridge of 0.25 m s −1 10 in ICAR-N and 0.75 m s −1 in WRF (see Fig. 10c-d). In combination with an approximate average horizontal wind speed of 17.5 m s −1 in ICAR-N and 21 m s −1 in WRF ( Fig. 10a-b) this results in a difference in the resulting horizontal drift of 19 km, which fits the observed difference in the medians of the accumulated snow precipitation distribution well. Hence, the discrepancy in the precipitation distribution appears to be mainly caused by an underestimation of the perturbation velocities in ICAR. 15 The absence of graupel in ICAR-N compared to WRF can be traced to the MP scheme and is a result of the atmospheric conditions it encounters. The Thompson MP predicts graupel formation if riming growth exceeds the depositional growth of snow (Thompson et al., 2004). While the necessary atmospheric conditions are easily satisfied in WRF, the cloud water mixing ratio in ICAR-N is too low to initiate sufficient riming growth (see Fig. 9d). However, no clear indication for the underlying cause of the large difference in the cloud water mixing ratios between ICAR-N and WRF is found.

Case study
The previous sections have demonstrated that (i) the Brunt-Vaisälä frequency needs to be diagnosed from the background stratification in order to model a realistic perturbation flow field with ICAR, that (ii) it further requires a minimum model top elevation (which is dependent on the orography and the atmospheric background state) and that (iii) a combination of ZG/CG BCs (BC codes 011 and 111) are optimal to be used at the top of the ICAR model domain. The effects of these suggested 5 modifications to ICAR on a real world application are investigated with a case study conducted for the Southern Alps on the South Island of New Zealand located in the southwestern Pacific Ocean (Fig. 12a).
The Southern Alps are a mountain range approximately 800 km long and 60 km wide. They are oriented southwest-northeast and extend from approximately 41 o S to 46 o S, with approximately 97 % of the crest line lying above an elevation of 1500 m m.s.l. 10 (meters above mean sea level) and the highest peaks rising above 3000 m m.s.l.. The mean precipitation regime in the humid and maritime climate on the South Island of New Zealand is strongly influenced by the orography of the Southern Alps. The prevailing westerly and north-westerly winds advect moist air against the topographic barrier, leading to a precipitation maximum of approximately 14 m yr −1 along its western flanks in close proximity to the alpine ridge. While the western coast on average receives 5 m yr −1 , the plains east of the alpine ridge receive at most 1 m yr −1 due to the precipitation shadow of the Southern Alps (Griffiths and McSaveney, 1983;Henderson and Thompson, 1999).  Clear differences can be observed in the distributions of q sus ( Fig. 14a and b) -note, e.g., the distinct maximum of q sus above the initial topography peak in ICAR-O which is almost entirely absent in ICAR-N. These q sus maxima occur in the topmost Furthermore, this artificial cloud in ICAR-O near the model top generates precipitating hydrometeors that extend farther to the lee of the alpine crest compared to ICAR-N ( Fig. 14c and d). While ICAR-N produces more precipitation overall and is wetter than ICAR-O on the initial ramp of the western slope of the alpine range (up to approximately 169.8 o E in Fig. 14f), 10 ICAR-O is wetter downwind, yielding higher amounts of precipitation at the peak and the first leeward slope (Fig. 14e). Note that ICAR-O produces clouds in the topmost model levels even farther downstream as well (Fig. 14a), however, they do not generate precipitating hydrometeors during the investigated period. These results strongly indicate that the low model top setting of 4.4 km employed in Horak et al. (2019) is inadequate to allow for a correct representation of the cloud and precipitation processes within the domain despite the relatively high skill found for ICAR-O in their study. Therefore, the results additionally 15 demonstrate that when model skill is evaluated with statistical metrics based on surface observations alone (Horak et al., 2019), it does not necessarily reflect the skill of the model in correctly representing atmospheric processes such as gravity waves and associated cloud formation. Hence, it seems that the underestimation in precipitation near the crest and to its lee of an ICAR simulation with reasonably high model top compared to WRF (Fig. 9) is partly compensated in an ICAR simulation with a too low model top (ICAR-O in Fig. 14) by spurious effects introduced by the upper boundary conditions. It follows that the seeming improvement in the latter case is right but for the wrong reason. The results highlight that a more accurate representation of the wind fields is obtained only when the Brunt-Väisälä frequency, in accordance with linear mountain wave theory, is calculated from the unperturbed background state of the atmosphere (ICAR-N) rather than from the perturbed state (ICAR-O). The remaining differences of the wind fields in ICAR-N to the analytical solution may be attributable to two causes: Firstly, to solve the governing equations ICAR numerically calculates the Fourier 5 Transform of the topography h(x, y) in the domain. In cases where h(x, y) is not constant along the domain boundaries or where it exhibits discontinuities within the domain, this approach gives rise to numerical artifacts (see the Gibbs phenomenon, e.g., Arfken et al., 2013), introducing errors into the perturbed fields. Note that for a 2-D ridge as employed in this study h(x, y) = h(x). Therefore, while h(x w ) = h(x e ) = const, with x w and x e the x-coordinate of the western and eastern domain boundary, respectively, h(x) = const along the northern or southern domain boundary. This results in an average value of v of 10 0.02 m s −1 instead of the expected 0 m s −1 and therefore slightly altered values of u and w in comparison to the results from linear theory. These issues may be reduced by, for instance, filtering the topography accordingly or by adding a buffer around the domain (Florinsky, 2016). Additional research is necessary to determine which filtering methods or modifications to the topography are best suited to preprocess digital elevation models for ICAR. Secondly, ICAR does not solve for w directly but only analytically calculates u and v . The vertical perturbation is then determined by balancing the density-weighted horizon-15 tal winds from the continuity equation (Gutmann et al., 2016), starting at the lowest vertical level.
ICAR is intended as an computationally frugal alternative to full physics models, in principle allowing for very low model top elevations. While employing a low model top to take advantage of the associated computational cheapness is tempting, increased efficiency should not come at the cost of the physical fidelity of the model. The results in this study clearly show 20 that there is a lowest possible model top elevation Z min that ensures that the physical processes within the domain are not influenced by the model top. Boundary conditions imposed at the upper boundary are found not to influence the value of Z min for the investigated parameter space despite potentially mitigating errors in the potential temperature and water vapor fields. In particular, the cloud formation and precipitation processes within the domain are shown to almost exclusively depend on the model top elevation z top and not on the chosen set of boundary conditions, and only stabilize for z top ≥ Z min . It seems unlikely 25 that any boundary condition is able to accurately represent the effect of cloud and precipitation processes above the model domain and the resulting interaction with the corresponding processes in the model domain (e.g. the seeder-feeder mechanism). Therefore, in order to capture all relevant cloud and precipitation processes, the vertical extension of the domain should at the very least encompass the entire troposphere. Altogether these results highlight that model top elevations within the troposphere as employed by past studies are to be avoided (e.g., Gutmann et al., 2016;Horak et al., 2019;Alonso-Gónzalez et al., 2020).

30
This study strongly suggests that no general value for Z min is applicable to all possible scenarios with the results exhibiting large differences between the idealized simulations and the real case study. For the tested parameter space, including the real case, Z min mainly depends on the background state and the height of the topography. The dependence on the background state, characterized by the vertical wavelength λ z = 2πU/N d of the hydrostatic mountain wave, shows that overall larger λ z result in smaller Z min and, conversely, smaller λ z in larger Z min . The dependence of Z min on the background state is explicable with the horizontal wind speed U and the Brunt-Väisälä frequency N affecting the location, amount and magnitude of the up-and downdrafts in the domain. Similarly, Z min depends on ridge height due to the generally stronger up-and downdrafts triggered by higher topographies (Eq. (5)). However, note that the dependence on the ridge height is weak compared to the 5 dependence on the background state.
The determination of Z min considers all MP species with respect to their time averaged spatial distribution and the time averaged total mass within the cross-section as well as the 12-hour (P 12h , idealized simulations) or 24-hour (P 24h , real case simulations) precipitation sum along the cross-section. Note that potential temperature θ is indirectly included in determining 10 Z min since errors in the θ field influence the cloud formation and precipitation processes. However, this study shows that errors in the θ field introduced by the zero gradient boundary condition are mainly localized in the topmost vertical levels (Fig.   4), which correspond to approximately the uppermost 1 to 2 km of the domain, and result in only a negligible influence on cloud formation processes in the tested parameter space. While a constant gradient boundary condition reduces the errors in the potential temperature field, the default zero gradient boundary condition is a suitable alternative for θ provided z top is high 15 enough. This can be ensured by, for instance, employing the method to determine Z min described in this study.
A comparison between ICAR-N and WRF simulations conducted for the same topography and sounding reveals substantial differences in the spatial distributions of q v , q sus and q prc as well as the resulting P 12h . These differences are mainly attributable to additional effects included in the WRF but not the ICAR-N wind field, such as non-linearities and the amplification of the 20 perturbations due to the density decreasing with height. As a consequence both models predict distinctly different events to occur: A snow shower with the majority of snow falling upwind of the ridge in ICAR-N and a snow and graupel shower in WRF with the largest portion precipitating leeward of the ridge. While these results are obtained for one particular sounding they indicate that the linearisation of the wind field has the potential to significantly alter the distribution of precipitation in a study domain. This could have drastic consequences for the results of studies relying on ICAR to provide precipitation fields 25 for, i.e. applications in hydrology or glaciology.
For strongly stratified atmospheric conditions, a constant gradient BC was found to cause numerical stability issues in the idealized and real case simulations alike. Future studies could investigate further BC options that might allow a better approximation of the potential temperature profile: Such approaches might, for instance, (i) analytically diagnose θ for the vertical 30 level above the model top and then apply the corresponding values as a Dirichlet BC or (ii) prescribe the potential temperature from the corresponding height in the forcing data set as Dirichlet BC at the model top in ICAR.
The case study investigates the effect of the proposed modifications to ICAR on a real world application for the South Island of New Zealand. It reveals that these modifications shift the distribution of precipitation upwind, leading to dryer conditions in the alpine range but wetter coastal regions. The method for the determination of Z min presented in this study does not rely on tuning to measurements and may therefore be employed for every region in the world for which a suitable digital elevation model and atmospheric forcing data are available. Furthermore, the method ensures that for z top = Z min the cloud formation processes within the domain are independent from influences of the model top and that only the absolutely necessary amount of vertical levels is used in the simulations. This preserves as much of the computational efficiency of ICAR as possible with-5 out sacrificing additional physical fidelity. However, the extension of the method to determine Z min to longer study periods, compared to the 24 hours of the case study, and a larger variety of background states is not trivial and outside the scope of this study. If a substantial amount of simulations for different background states is required to determine Z min the associated computational cost may outweigh the gain of employing the lowest possible number of vertical levels for the entire study period. Therefore, future research could investigate variations of the Z min determination employed in this study. For instance, 10 a focus on the background states most frequent during each season, or on background states with shorter vertical wavelengths (resulting in higher values of Z min ) to find upper bounds for Z min may drastically reduce the required number of simulations.
With regards to the case study, the unmodified version of ICAR (ICAR-O) is found to produce enhanced precipitation in the alpine range due to artifacts (heightened mixing ratios of hydrometeors) in the topmost vertical levels in the horizontal  Since all ICAR-O simulations generally underestimate precipitation amounts at alpine weather stations on the South Island of New Zealand, and overshooting of measured values does mostly not occur, the higher amounts of P 24h for the simulation with z top = 4.4 km then lowered the calculated MSE. Even though the atmospheric processes in the ICAR-N simulation are more correctly represented in comparison to ICAR-O, the lower amount of P 24h at the alpine sites would result in a higher MSE. Therefore, even though the calculated MSEs were lowest for a model top setting at 4 km, the seemingly correct results were 25 produced for the wrong reasons. This additionally exemplifies why a comparisons to measurements alone cannot determine whether the model results are correct for the correct reason, only a detailed consideration of the underlying processes can be the basis fur such a conclusion.

Conclusions
The key findings and recommendations based on the extensive process-based evaluation of ICAR are summarized in the 30 follwing: model top elevation is lower, cloud formation and precipitation processes within the domain are affected by the model top.
-Results show that, in order to avoid spurious influences of the upper boundary to the microphysical processes within the domain, Z min should be at least as high as the tropopause but may be required even higher in other situations.

5
-Determining an exact value for Z min from comparisons to precipitation measurements may yield results in closer agreement to these measurements but potentially for the wrong reasons (i.e., model artifacts).
-In a proof of concept, the method described in this study to determine Z min is applied to idealized simulations and a real case alike.
-While most of the tested boundary conditions (in comparison to the default zero gradient boundary condition) are suit-10 able to reduce the errors in the water vapor and potential temperature fields, no tested combination of these boundary conditions can achieve a lower value for Z min .
-Model skill, when inferred only from comparisons to surface observations, does not necessarily reflect the model skill in representing atmospheric processes.
-The representation of the wind field in ICAR is improved by ensuring that the Brunt-Väisälä frequency is calculated from 15 the background state of the atmosphere provided by the forcing data. Note that the current version of ICAR employs the perturbed state of the domain.
This study highlights the importance of a process-based in-depth evaluation not only with respect to ICAR but for models in general. Particularly for regional climate models (RCMs) and numerical weather prediction (NWP) models, the results of the case study demonstrate a potential pitfall when model parameters are inferred solely from comparisons to measurements, 20 potentially leading to situations for which model results are more prone to be right but for the wrong reasons. With the increasing complexity of RCMs and NWPs, ICAR could provide a computationally frugal framework to study and better understand singular model components. This would allow for a process-based evaluation of, e.g., MP schemes or advection schemes, contributing to the development and improvement of RCMs and NWPs.
Code and data availability. The modified version ICAR v1.0.1 employed for the simulations  as well as the results obtained (Horak, 2020) are available as download from the respective zenodo repositories.
Author contributions. The investigation and its design, the simulations and their analysis as well as the visualization of the results and writing (original draft and editing) were carried out by JH. The conceptualization of the paper was a joint effort from all authors, as were the discussion and refinement of the methods presented. Additionally, funding acquisition and project administration were carried out by MH.