Multi-layer Cloud Conditions in Trade Wind Shallow Cumulus – Confronting Models with Airborne Observations

Airborne remote sensing observations over the tropical Atlantic Ocean upstream of Barbados are used to characterize trade wind shallow cumulus clouds and to benchmark two cloud-resolving ICON (ICOsahedral Nonhydrostatic) model simulations at kilo- and hectometer scales. The clouds were observed by an airborne nadir-pointing backscatter lidar, a cloud radar, and a microwave radiometer in the tropical dry winter season during daytime. For the model benchmark, forward operators convert the model data into the observational space for considering instrument-specific cloud detection thresholds. The forward simulations reveal the different detection limits of the lidar and radar observations, i.e., most clouds with cloud liquid water content greater than 10^-7 kg kg^-1 are detectable by the lidar, whereas the radar is primarily sensitive to the "rain"-category hydrometeors in the models and can detect even low amounts of rain. The observations reveal two prominent modes of cumulus cloud top heights separating the clouds into two layers. The lower mode relates to boundary layer convection with tops closely above the lifted condensation level, which is at about 700 m above sea level. The upper mode is driven by shallow moist convection, also contains shallow outflow anvils, and is closely related to the trade inversion at about 2.3 km above sea level. The two cumulus modes are reflected differently by the lidar and the radar observations and under different liquid water path (LWP) conditions. The storm-resolving model (SRM) at kilometer scale reproduces the cloud modes barely and shows the most cloud tops slightly above the observed lower mode. The large-eddy model (LEM) at hectometer scale reproduces better the observed cloudiness distribution with a clear bimodal separation. We hypothesize that slight differences in the autoconversion parametrizations could have caused the different cloud development in the models. Neither model seems to account for in-cloud drizzle particles that do not precipitate down to the surface but generate a stronger radar signal even in scenes with low LWP. Our findings suggest that even if the SRM is a step forward for better cloud representation in climate research, the LEM can better reproduce the observed shallow cumulus convection and should therefore in principle represent cloud radiative effects and water cycle better.
https://doi.org/10.5194/gmd-2020-14 Preprint. Discussion started: 6 March 2020. © Author(s) 2020. CC BY 4.0 License.

This is a nice study that fits well into the scope of GMD. The use of the forward simulation gives interesting new insights about the deficiencies of cloud-resolving simulations in representing shallow cumulus clouds. My main comments regard a more thorough comparison of the representativeness of the selected LEM and SRM profiles, and an analysis of the uncertainty of the forward-simulation and the sensitivity of the results to the microphysical model assumptions.
My general comments are detailed in the following, as well as more specific comments and typographical suggestions.
Figure 5 that the 1.25 km SRM has a larger cloud cover than the 300 m LEM, especially due to larger contributions from clouds with cloud tops > 1.3 km. So I'm surprised that your results here are so different. This might be due to the different microphysical assumptions, but could also be due to the different domains and days used for the SRM vs. the LEM.
For the LEM, it seems that you are using data from only 10 grid points on 4 days, all sampled at the same latitude. Due to the high temporal resolution of the meteogram output this may give you a lot of profiles, but they will all be highly (auto)correlated. The LEM thus samples much less variable conditions than the SRM. To allow for a more robust and fair comparison of the LEM and the SRM, a comparison of the cloud fractions and/or cloud top height distributions of the LEM and SRM for the same domain and the same days should be made. This would establish how representative the meteogram data is.
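To illustrate the autocorrelation concern, the following is a minimal Python sketch of how the number of effectively independent profiles in a meteogram time series might be estimated. It assumes an AR(1)-like series and uses the common approximation n_eff = n (1 - r1) / (1 + r1), where r1 is the lag-1 autocorrelation. The function names and the idea of applying this to a per-profile quantity such as cloud-top height are my own assumptions, not something taken from the manuscript.

```python
def lag1_autocorrelation(series):
    """Lag-1 autocorrelation of a 1-D sequence (0.0 for a constant series)."""
    n = len(series)
    mean = sum(series) / n
    variance_sum = sum((v - mean) ** 2 for v in series)
    if variance_sum == 0.0:
        return 0.0
    covariance_sum = sum(
        (series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1)
    )
    return covariance_sum / variance_sum


def effective_sample_size(series):
    """Approximate number of independent samples, assuming an AR(1) process.

    Negative autocorrelation is clipped to zero so that the estimate
    never exceeds the actual number of samples.
    """
    r1 = max(lag1_autocorrelation(series), 0.0)
    return len(series) * (1.0 - r1) / (1.0 + r1)


# Hypothetical example: 100 high-frequency samples that only change
# every 10th time step behave like far fewer independent samples.
slowly_varying = [i // 10 for i in range(100)]
n_eff = effective_sample_size(slowly_varying)
```

Reporting such an estimate for both the LEM meteogram data and the SRM profiles would make it easier to judge whether the two samples are comparable.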
As the necessary input for the forward-simulator is available only for the meteogram points of the LEM, it might be difficult to use the model output of the full LEM domain to do the forward simulation. You could just use the mean and variability of the parameters from the meteogram points to constrain the forward-simulator, which can then be applied to the entire domain.
Additionally, to understand how much of the forward-simulated SRM-LEM differences come from the different microphysics, I find it important to first show a comparison of the cloud-top height distributions of the two models without using the forward-simulator. This should also be compared to a best-guess observational cloud-top height distribution, either from the lidar alone or from a combination of the lidar and radar-detected clouds. For the lidar, you mention that clouds with liquid water content exceeding 10^-7 kg/kg are detected. So you could apply this same threshold to the LEM and SRM simulations (for the SRM, also the sub-grid cloudiness will have to be taken into account).
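The threshold-based comparison suggested above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: each profile is a list of cloud liquid water contents (kg/kg) on a common set of level heights, the topmost level above the threshold is taken as the cloud top, and the sub-grid cloudiness of the SRM is not handled. All names and the 500 m bin width are hypothetical.

```python
LWC_THRESHOLD = 1e-7  # kg/kg, lidar detection limit quoted in the manuscript


def cloud_top_height(lwc_profile, heights, threshold=LWC_THRESHOLD):
    """Height of the highest level with LWC above the threshold, or None."""
    top = None
    for lwc, height in zip(lwc_profile, heights):
        if lwc > threshold and (top is None or height > top):
            top = height
    return top


def cloud_top_histogram(profiles, heights, bin_width=500.0):
    """Count cloud tops per height bin; keys are the lower bin edges in m."""
    counts = {}
    for profile in profiles:
        top = cloud_top_height(profile, heights)
        if top is None:
            continue  # clear-sky profile
        lower_edge = int(top // bin_width) * bin_width
        counts[lower_edge] = counts.get(lower_edge, 0) + 1
    return counts


# Hypothetical example with four profiles on five levels.
heights = [250.0, 750.0, 1250.0, 1750.0, 2250.0]
profiles = [
    [0.0, 2e-5, 0.0, 0.0, 0.0],    # shallow cloud, top at 750 m
    [0.0, 1e-5, 3e-5, 1e-5, 5e-6],  # deeper cloud, top at 2250 m
    [0.0, 0.0, 0.0, 0.0, 0.0],      # clear sky
    [0.0, 5e-8, 0.0, 0.0, 0.0],     # below the detection threshold
]
histogram = cloud_top_histogram(profiles, heights)
```

Applying the same function to both models and normalizing the counts would give distributions that can be compared directly with the lidar-derived cloud-top heights.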
Apart from showing the frequency distribution as in Figure
Other questions regarding the simulations are:
- Are there any spin-up issues at the beginning of the simulations and is it feasible to use the LEM simulations already from 12 UTC on, i.e. just 3 h after initialization?
- In the appendix you mention that the SRM uses a diagnostic cloud scheme in addition to the prognostic cloud scheme. This is an important detail that should already be mentioned in section 3. Furthermore, could you describe how the 'prognostic' cloud scheme works? Is it just a simple saturation adjustment?

- Difference in vertical resolution: In L420 you mention this as a potential reason for the underrepresentation of inversion cloud in the SRM. I guess the 1.25 km SRM version that was used to drive the LEMs should have 150 levels, so in case this model output was saved, you could use this SRM version to verify whether the vertical resolution is indeed the reason for the reduced anvils. Otherwise you could try to better understand the influence of the horizontal resolution on the anvil cloud amount by comparing the 300 m LEM to the next coarser LEM nest.
2. Uncertainty and sensitivity of forward-simulations to model assumptions
I'm not very familiar with forward simulators, but I feel that it would be important to analyze and discuss the uncertainty of the forward simulations, and how this might influence the results. You mention that the forward simulator has to be configured such that the PSD used in the forward-simulator matches the PSD of the model as well as possible. I assume that there is some uncertainty involved in this process, and it would be good to show or discuss this more explicitly.
I would also appreciate it if you could show somewhere how variable the input fields for the scale parameters are in the LEM, i.e. how variable the number concentrations are. This information can also help constrain a forward-simulation using the entire model domain of the LEM.
Also, I think that you could learn more about the potential deficiencies in the model microphysics by playing around with the forward simulator and feeding it with slightly adjusted input microphysics parameters. What would have to be different in the microphysics to render the simulations more comparable to the observations, given the simulated mass mixing ratios? You could try to understand how a slight change of the fixed parameters of the SRM one-moment scheme would influence the radar-detectable cloud fraction. Given that the droplet radius is so important for the radar detectability, Figure 3c might look very different if you fed the forward-simulator with slightly different number concentration parameters. For the LEM, you could also prescribe the mean number concentrations of the LEM as a fixed parameter to mimic what a one-moment microphysics scheme would do.
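As a rough illustration of why the number concentration matters so much, the Python sketch below computes the Rayleigh reflectivity of a monodisperse droplet population at fixed liquid water content. This is a strongly simplified assumption of mine and not the forward operator used in the manuscript: all droplets share one diameter D, so Z = N D^6 and LWC = N (pi/6) rho_w D^3, which gives Z proportional to LWC^2 / N. The function name and the example values are hypothetical, and the liquid water content is given per volume (kg m^-3) rather than as a mass mixing ratio.

```python
import math

RHO_WATER = 1000.0  # density of liquid water in kg m^-3


def reflectivity_dbz(lwc, number_conc):
    """Rayleigh reflectivity (dBZ) of a monodisperse droplet population.

    lwc:         liquid water content in kg m^-3
    number_conc: droplet number concentration in m^-3
    """
    # Diameter (m) that distributes the liquid water over all droplets.
    diameter_m = (6.0 * lwc / (math.pi * RHO_WATER * number_conc)) ** (1.0 / 3.0)
    # Reflectivity factor in the conventional unit mm^6 m^-3.
    z_linear = number_conc * (diameter_m * 1.0e3) ** 6
    return 10.0 * math.log10(z_linear)


# Hypothetical sensitivity test: halve the number concentration
# while keeping the liquid water content fixed at 0.1 g m^-3.
lwc = 1.0e-4
dbz_reference = reflectivity_dbz(lwc, 1.0e8)  # 100 cm^-3
dbz_halved = reflectivity_dbz(lwc, 5.0e7)     # 50 cm^-3
dbz_difference = dbz_halved - dbz_reference   # 10*log10(2), about 3 dB
```

Under these assumptions, halving the number concentration raises the reflectivity by about 3 dB for the same mass, which could already move clouds across a radar detection threshold. A comparable test with the actual forward simulator and the PSD assumptions of the two models would quantify this effect properly.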
A more thorough analysis of the uncertainty and sensitivity of the forward-simulations would render the manuscript scientifically more interesting, and should allow you to make your discussion in Section 6 more robust and less speculative.

SPECIFIC COMMENTS:
- Definition of cloud modes / types (e.g. L263-275): Please better define what you mean by the 'thermal driven' mode and the 'shallow convection' mode. You could also use well-established classifications or definitions such as the 'forced, active and passive' categories of Stull 1985, or the definitions from the cloud atlas of the World Meteorological Organization that were used in Vial et al. 2019 JAMES.
- Cloud-top height detection: I think it is never explicitly written whether you only consider the first-detected highest cloud-top height, or whether you also consider second or potentially third cloud-top heights in case of multilayered cloud scenes. Please mention this explicitly.
- The referencing for the first two sentences in the introduction should be improved.
- The LCL computation from the dropsondes: Can you say how many dropsondes are used to interpolate the LCL? And by how much they are separated in space and time on average?
- Differences between the western and eastern part of the domain related to cloud deepening: Not only is there a difference in the height of the upper mode, but also in the normalized frequencies of the upper mode, with the deeper western half having a reduced frequency compared to the shallower eastern half. This, and also the insensitivity of the lower mode, was also shown in LES of Vogel et al. 2020 QJRMS.
- Section 5.3: The discussion of the results in this section should be better structured and more focused on the most important features. It is not always clear what is compared to what, and there is a lot of switching around between LWP categories, the observations and the different models. I also spotted a lot of typographical errors that should be corrected (e.g. L347 partial coverage; L363 that such a cloud doesn't need any contribution...)
- Figure 1: This figure could be improved. Please zoom more into the area of flight operations (only showing e.g. 7°N to 20°N), make sure that all flight paths are visible and not overlapping, and add markers/crosses for the dropsonde locations.
- Figure 5: What exactly does the cloud fraction in the legend refer to? Is it just the maximum cloud fraction? It would be nice to give the total projected cloud cover instead of a cloud fraction, as this would give a sense of the total cloudiness.
- Figure A1: Similar to the above, in the caption you mix cloud cover and cloud fraction, but I guess you mean the same thing.
Interactive comment on Geosci. Model Dev. Discuss., https://doi.org/10.5194/gmd-2020-14, 2020.