Assessment of valley cold pools and clouds in a very high-resolution numerical weather prediction model

The formation of cold air pools in valleys under stable conditions represents an important challenge for numerical weather prediction (NWP). The challenge is increased when the valleys that dominate cold pool formation are on scales unresolved by NWP models, which can lead to substantial local errors in temperature forecasts. In this study a 2-month simulation is presented using a nested model configuration with a finest horizontal grid spacing of 100 m. The simulation is compared with observations from the recent COLd air Pooling Experiment (COLPEX) project, and the model’s ability to represent cold pool formation and the surface energy balance is assessed. The results reveal a bias in the model long-wave radiation that results from the assumptions made about the sub-grid variability in humidity in the cloud parametrization scheme. The cloud scheme assumes relative humidity thresholds below 100 % to diagnose partial cloudiness, an approach common to schemes used in many other models. The biases in radiation, and the resulting biases in screen temperature and cold pool properties, are shown to be sensitive to the choice of critical relative humidity, suggesting that this is a key area that should be improved for very high-resolution modeling.


Introduction
The stable boundary layer presents a difficult challenge for numerical simulation (Beare et al., 2006), as it is associated with a rich variety of complex dynamical processes. The dominant eddy scales are smaller than those typical of convective boundary layers, turbulence is weak and intermittent (Van De Wiel et al., 2003) and the presence of internal gravity waves can have an important influence on drag and mixing. In complex terrain the flows are further complicated by drainage flows and the formation of valley cold pools, which can result in minimum temperatures that are significantly lower than those above the surrounding higher terrain. Cold pools can lead to localized road icing and fog formation, presenting hazards to road users and frost damage to agriculture.
Previous studies of drainage flow and cold pools have tended to focus on large-scale mountain valleys (e.g. Barr and Orgill, 1989; Lareau et al., 2013) or isolated bowls and sinkholes (e.g. Whiteman et al., 2008; Zängl, 2005). However, small-scale valleys are also of practical importance and these are often unresolved even in modern numerical weather prediction (NWP) regional models that have grid spacing of a few kilometers.
These small-scale valleys can nonetheless lead to significant spatial temperature variations (Smith et al., 2010). The COLd air Pooling Experiment (COLPEX) was an investigation of the formation of cold air pools in such small-scale United Kingdom orography (Price et al., 2011). It combined high-resolution modeling and observations for a field campaign in the Clun Valley, Shropshire, UK, from January 2009 to April 2010. The Clun Valley is 1-2 km wide with a depth of 100-200 m and, as such, is not resolved in the Met Office regional UK forecast model, either at the 4 km horizontal resolution used operationally at the time or at the current 1.5 km operational resolution. Cold air pools in this region can lead to local temperature differences of 5-10 K between hilltop and valley bottom. Details of the observational campaign are described in Price et al. (2011). Sheridan et al. (2013) analyzed the observed cold pool structure and related the frequency and strength of cold pools to previous idealized studies of cold pool formation (Vosper and Brown, 2008). Modeling work has used a nested version of the Met Office Unified Model (MetUM), with a horizontal grid spacing of 100 m. Over the lowest 1 km, the vertical resolution is on average 27 m. The basic model configuration and initial comparison with observations are given by Vosper et al. (2013a). The results show that high horizontal and vertical resolution is required in order to sufficiently resolve the orography and obtain a good level of agreement between the simulated and observed temperature variations across the cold pools in clear-sky cases. Subsequently, Vosper et al. (2013b) focused in detail on a single intensive observation period (IOP) that took place on the night of 4 March 2010 (IOP 16). The latter study demonstrated the skill of the MetUM in reproducing the observed temperatures across the valley and provided a detailed analysis of the heat budget within the cold pool.
Published by Copernicus Publications on behalf of the European Geosciences Union.
Previous modeling studies in COLPEX, and in other field campaigns, have used a case study approach, simulating short (2-3 day) events where cold air pools are known to form. The current study takes a longer simulation period approach with a 2-month simulation at 100 m resolution in order to provide a longer term assessment of cold pool formation over the COLPEX region. This allows for a more systematic and objective validation of the model representation of cold pools over a range of conditions. It also allows for a quantification of the frequency, strength and drivers of cold air pools. Given the density of observations obtained in COLPEX this provides a good opportunity to validate the behavior of the MetUM at high resolution over a range of conditions. The simulation also allows for an investigation of the importance of spinning up the high-resolution model and can also be used to provide a more complete analysis of the synoptic-scale influences on cold pool formation.
Two months is not sufficiently long to generate a true climatology; however, the computational resources required to undertake a multi-year simulation at this resolution are beyond what was available for this project. The current study provides a useful intermediate step towards this.
In the following sections the model configuration, the simulations and forcing data are described. Following this a discussion of the importance of model spin-up is presented. From the long-term simulation an analysis of the temperature bias relative to observations is then given. Based on this analysis, an additional simulation is then presented and analyzed. A comparison of simulated and observed cloud cover is then made with reference to unresolved humidity variability. The paper concludes with a more general discussion of the results.

Model setup
The COLPEX simulations were conducted using the MetUM with a double-nested setup, with simulations running with horizontal grid spacings of 4 km, 1.5 km and 100 m (hereafter referred to as the 4 km, 1.5 km and 100 m models, respectively). Vosper et al. (2013a) described the modeling setup in detail, so only a brief summary is given here. The innermost 100 m domain uses a non-uniform horizontal grid with the resolution decreasing from 100 m to 1.5 km at the boundaries (Tang et al., 2012). The domain covers a region of 80 km by 80 km, centered on the Clun Valley, Shropshire (UK), with a 30 km by 30 km inner domain of constant 100 m resolution. Figure 1 shows the innermost 10 km by 10 km of the domain including the Clun Valley as represented in the 100 m simulations. Analysis data from the Met Office three-dimensional (3-D) variational assimilation scheme (3D-VAR) are used to initialize the 4 km model and update the 4 km state every 3 h. The 4 km simulation then produces the lateral boundary conditions for the 1.5 km simulation, which in turn produces lateral boundary conditions for the 100 m simulation. The nesting is one-way, with no feedback to lower resolutions. The 1.5 km and 100 m simulations are "free running", i.e. are not re-initialized during the simulation. This allows for aspects such as soil moisture to develop spatial patterns consistent with the high-resolution terrain data during the simulation rather than being constrained by relatively coarse analysis fields. The simulations presented here are all initialized at 12:00 UTC, allowing the model to spin up prior to the onset of stable conditions in the evening.
Other details of the model setup are broadly similar between resolutions; however, there are a few key differences. First, in the 100 m simulations the number of vertical levels is increased from 70 to 140. Second, different turbulence parametrizations are used at different resolutions. The 4 km model uses a 1-D boundary-layer scheme, while the 1.5 km model uses a 2-D Smagorinsky turbulence parametrization for horizontal mixing and a 1-D boundary-layer scheme for vertical mixing, and the 100 m model uses a 3-D Smagorinsky scheme (Lock et al., 2000). Finally, the vertical profile of critical relative humidity (RH crit) necessary for cloud formation is increased, to take into account the reduction in grid box size (see Vosper et al., 2013a). The critical relative humidity threshold approach allows non-zero cloud cover to be diagnosed when grid box relative humidity is below saturation. This assumes that there is unresolved sub-grid-scale variability in temperature and moisture that results in regions of cloud cover. This approach to sub-grid-scale variability was first proposed by Sommeria and Deardorff (1977) and Mellor (1977). It was then developed into the form used here by Smith (1990). The critical relative humidity is a function of the standard deviation of sub-grid-scale variability, such that when temperature and moisture variability are fully resolved the standard deviation of sub-grid-scale variability is zero and RH crit is 100 %. For the 1.5 km model the vertical profile of RH crit is set to 91 % at the surface, reducing to 80 % at 845 m and constant at 80 % above this level. At 100 m this profile is set to 99 % in the lowest 500 m, decreasing linearly to 91 % at 3.4 km and constant (90 %) above this level. These profiles are shown in Fig. 2.
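The threshold behavior described above can be illustrated with a minimal Python sketch. It is not the MetUM code: the function names are hypothetical, the 1.5 km profile is assumed to decrease linearly between the two quoted values, and the cloud fraction uses a symmetric triangular sub-grid distribution of the kind commonly associated with Smith (1990), which is an assumption about the functional form rather than something stated in the text.

```python
def rhcrit_1p5km(height_m):
    """RH crit (as a fraction) for the 1.5 km profile quoted in the text.

    Assumes a linear decrease from 0.91 at the surface to 0.80 at 845 m.
    """
    surface, aloft, top_m = 0.91, 0.80, 845.0
    if height_m >= top_m:
        return aloft
    return surface + (aloft - surface) * max(height_m, 0.0) / top_m


def cloud_fraction(rh, rhcrit):
    """Cloud fraction from a symmetric triangular sub-grid PDF (assumed form)."""
    if rhcrit >= 1.0:
        # Variability treated as fully resolved: all-or-nothing cloud.
        return 1.0 if rh >= 1.0 else 0.0
    # Normalized departure from saturation: -1 at rh == rhcrit, 0 at saturation.
    qn = (rh - 1.0) / (1.0 - rhcrit)
    if qn <= -1.0:
        return 0.0
    if qn <= 0.0:
        return 0.5 * (1.0 + qn) ** 2
    if qn < 1.0:
        return 1.0 - 0.5 * (1.0 - qn) ** 2
    return 1.0
```

Under these assumptions a grid box at 95 % relative humidity has no cloud with a threshold of 99 % but a small non-zero cloud fraction with a threshold of 91 %, which is the kind of sensitivity examined later in the paper.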

The simulations
The COLPEX observational field campaign ran between January 2009 and April 2010 (Price et al., 2011). A number of short simulations have been completed, corresponding to IOPs, some of which have been the focus of previous analysis (Sheridan et al., 2013; Vosper et al., 2013b). In addition to these, an extended simulation has been completed, for the period 17 August to 15 October 2009 (the long-term simulation). This period was chosen because it includes multiple IOP cases, two of which (9-10 September, IOPs 4 and 5) have already been investigated in detail (Price et al., 2011; Vosper et al., 2013a). The IOPs during this period include cases where the model reproduces the observations well (e.g. IOPs 4 and 5), and those where the model performs less well (16-17 September, IOPs 6 and 7). In terms of investigating the mechanisms involved in cold pool formation it is appropriate to focus on IOPs where the model performs well; however, when the more general performance of the model is of interest it is important to consider a wider range of conditions over the whole 2-month period.

Continuity of 4 km analysis forcing data
The COLPEX simulations are reliant on archived operational 4 km UK analysis data from the Met Office 3D-VAR assimilation scheme. Unfortunately, during 2009-2010 there are occasional instances of missing data from the archive. During the long-term simulation there were missing data during 5-7 September (18:00-14:00 UTC), 26-28 September (00:00-14:00 UTC) and on 4 October (09:00-14:00 UTC). It was beyond the resources of the current project to recreate the missing data. It was therefore necessary to understand and minimize the impact of the data gaps.
In order to retain as much of the system memory as possible, when the model is restarted after a period of missing forcing data the soil properties (soil moisture content and temperature) are initialized from the model state at the last equivalent time of day prior to the missing data. For example, when restarting the simulation at 12:00 UTC on 29 September the soil properties are initialized with those at 12:00 UTC on 25 September. Other, faster components of the simulation are simply downscaled; i.e., air temperature in the 100 m simulation at 12:00 UTC, 29 September, is re-initialized with the 4 km air temperature. In tests this approach was shown to minimize the impact of the data gaps on the model evolution. To test this a period without data gaps was chosen. Temperature and moisture time series at the main observation sites of Spring Hill (hilltop) and Duffryn (bottom of the Clun Valley) from the long 100 m simulation were compared with equivalent time series from simulations where the model was initialized using either the 100 m model state from 48 h previously, or by interpolating the 4 km solution to the 100 m grid. This analysis (not included here for brevity) led to the above approach.
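The rule for choosing the soil state at a restart can be sketched in Python as follows. This is only an illustration of the "last equivalent time of day prior to the missing data" rule: the function name is hypothetical, and it assumes that a valid model state exists at every time before the start of the gap.

```python
from datetime import datetime, timedelta


def soil_restart_source(restart_time, gap_start):
    """Return the latest time before the gap with the restart's time of day."""
    source = restart_time
    # Step back one day at a time until the candidate precedes the data gap.
    while source >= gap_start:
        source -= timedelta(days=1)
    return source


# Example from the text: restart at 12:00 UTC on 29 September 2009 after the
# gap that begins at 00:00 UTC on 26 September.
restart = datetime(2009, 9, 29, 12, 0)
gap_start = datetime(2009, 9, 26, 0, 0)
print(soil_restart_source(restart, gap_start))  # 2009-09-25 12:00:00
```

The example reproduces the 25 September source time quoted above for the 29 September restart.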

Impact of spinning up the model state
One of the motivations for completing the long-term simulation is to quantify the importance of spinning up slower components of the model state (e.g. soil moisture and temperature) within the 100 m simulation. In the context of these simulations, spin-up is the process of integrating the model forward in time to reduce the impact of uncertainty in the initial conditions and to allow different components of the simulation to reach a consistent state. Completing multi-month simulations with this setup is currently very computationally expensive and the benefits of avoiding this are substantial. It is important to note, however, that soil properties in the shorter simulations are already spun up at the 4 km resolution. In downscaling to the 100 m simulation the soil temperatures are interpolated onto the 100 m grid and then adjusted relative to the height difference between orographic heights in the different resolutions using an assumed lapse rate of 6 K km−1. This lapse rate was chosen from an analysis of soil temperature and orographic height variability in the 4 km output. Soil moisture is simply interpolated onto the 100 m grid.
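The height adjustment applied to the interpolated soil temperatures can be written as a short Python sketch. The function and variable names are hypothetical, and the sign convention (temperature decreasing with increasing height) is an assumption, since the text gives only the magnitude of the lapse rate.

```python
LAPSE_RATE_K_PER_M = 6.0 / 1000.0  # 6 K km-1, the value quoted in the text


def adjust_soil_temperature(t_interp_k, z_fine_m, z_coarse_m):
    """Shift an interpolated soil temperature to the fine-grid orography height.

    t_interp_k: 4 km soil temperature interpolated to the 100 m grid point (K)
    z_fine_m:   orographic height of the 100 m grid point (m)
    z_coarse_m: interpolated 4 km orographic height at the same point (m)
    Assumes temperature decreases with height.
    """
    return t_interp_k - LAPSE_RATE_K_PER_M * (z_fine_m - z_coarse_m)
```

With this convention a 100 m grid point lying 100 m above the smoothed 4 km orography is cooled by 0.6 K, and a valley floor 100 m below it is warmed by the same amount.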
Figure 3 shows the 100 m domain-averaged soil temperatures throughout the long-term run. The top soil cools throughout August despite warm synoptic conditions (mean temperature at Clun was 292.5 K; Jemmett-Smith, 2014, p. 91-92). At the start of September there is a warming trend (just prior to the first missing data period), before the cooling trend resumes. September was dominated by anticyclonic conditions with cyclonic conditions developing in October (Jemmett-Smith, 2014, p. 92). The lowest soil level shows an underlying trend that results from the extremely slow response time of this level. Figure 4 shows the domain-average soil moisture content throughout the long-term simulation on each soil level. Figures 3 and 4 show that long-term variability exists in the soil properties.
During the long-term simulation there are three pairs of IOPs: 9 and 10 September (IOP 4 and IOP 5), 16 and 17 September (IOP 6 and IOP 7), and 13 and 14 October (IOP 8 and IOP 9). Markers on Figs. 3 and 4 show the corresponding values from the IOP simulations; the soil temperatures in the IOP simulations are close to those in the long-term simulation. Figure 4 shows some difference in soil moisture between the IOP and long-term simulations, which is most pronounced after the precipitation event at the start of September. This suggests that there are differences in the hydrological cycle between the two resolutions since in the IOP simulations the soil moisture content remains similar to the downscaled 4 km content throughout the simulation.
Figure 5 shows a comparison between the screen-level air temperature and specific humidity at the Duffryn site in the IOP and long-term simulations for the three IOP pairs. Figure 5 shows that in general, the long-term simulation is not required in order to simulate the IOP cases, i.e. that the atmosphere adjusts relatively rapidly in the 4 km and 100 m simulations. This supports the validity of earlier COLPEX work (e.g. Vosper et al., 2013b) and also suggests that the missing data periods in the long-term simulation do not undermine the analysis presented here.

Model temperature bias
Figure 6 shows the frequency distribution of biases in the daily minimum and maximum model screen-level (1.5 m) air temperature from the long-term simulation at the main observation sites, Duffryn (valley site) and Spring Hill (hilltop site; see Fig. 1). All data are hourly averaged and differences are defined as model minus observed. Results from 100 m and 1.5 km resolutions are shown. Figure 6a shows that for 100 m the daily minimum temperature at Duffryn shows a stronger cold bias than at Spring Hill, i.e. at Duffryn there is a higher number of days where the minimum temperature is more than 3 K lower than observed. The mean bias at Duffryn is −1.6 ± 0.3 K and at Spring Hill it is −0.7 ± 0.2 K, where all errors stated are calculated as the standard error of the mean. For 1.5 km there is an apparent warm bias of 1.4 ± 0.3 K at Duffryn and 0.4 ± 0.2 K at Spring Hill. This warm bias at Duffryn is consistent with the fact that the valley is poorly resolved in the 1.5 km simulation.
Figure 6b shows that the simulated daily maximum temperatures have a general cold bias. For 100 m this cold bias is −0.6 ± 0.3 K at Duffryn and −0.9 ± 0.3 K at Spring Hill. In the 1.5 km simulation the mean biases are −1.1 ± 0.1 K at Duffryn and −0.1 ± 0.1 K at Spring Hill. If the Spring Hill temperatures are representative of temperatures outside the valley this suggests that there is a widespread cold bias in the 100 m simulation that is not caused by bias in the lateral boundary conditions or present in the lower resolution simulations. The cold bias in 1.5 km at Duffryn is most likely due to unresolved sheltering effects, which would give rise to local daytime warming within the valley.
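The bias statistics quoted here (mean of model minus observed, with the standard error of the mean) can be computed as in the following Python sketch. The function name is hypothetical, and the use of the sample (n − 1) standard deviation with days treated as independent is an assumption; the text does not state which estimator was used.

```python
import math
import statistics


def mean_bias_and_sem(model, observed):
    """Return the mean of (model - observed) and its standard error."""
    diffs = [m - o for m, o in zip(model, observed)]
    # Standard error of the mean: sample standard deviation / sqrt(n).
    sem = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs), sem
```

Applied to paired daily minimum (or maximum) temperatures at one site, this would return values in the form quoted in the text, for example a mean bias and the uncertainty that follows the ± sign.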
The difference between hilltop (Spring Hill) and valley bottom (Duffryn) screen temperature gives a measure of the strength of cold air pooling in the Clun Valley (Vosper et al., 2013a). Using this difference, Fig. 6c shows the frequency distribution of daily maximum cold pool strength in the 100 m and 1.5 km simulations and in the observations. It should be noted that this analysis includes all data from the simulation period, including stable and unstable conditions. For 100 m the model simulates more strong cold pool events than observed. The simulated mean 100 m cold pool strength is 2.4 ± 0.3 K, compared to an observed mean strength of 1.6 ± 0.3 K. Figure 6a and b show this difference is due to the stronger cold bias at Duffryn. For 1.5 km the valley is not properly resolved and the apparent cold pool strength is typically close to zero; the mean is 0.4 ± 0.1 K.
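The cold pool strength measure described above can be sketched in Python as the daily maximum of the hilltop minus valley screen temperature difference. The function name and the input layout are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def daily_max_cold_pool_strength(hourly):
    """Map each day to the maximum hilltop-minus-valley temperature difference (K).

    `hourly` is an iterable of (day, hilltop_temp_k, valley_temp_k) tuples,
    one per hourly averaged sample.
    """
    best = defaultdict(lambda: float("-inf"))
    for day, t_hill, t_valley in hourly:
        # Positive values mean the valley bottom is colder than the hilltop.
        best[day] = max(best[day], t_hill - t_valley)
    return dict(best)
```

Because all hours are included, a day on which the valley is never colder than the hilltop yields a small or negative value, consistent with the note above that the analysis covers both stable and unstable conditions.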

Daily minimum temperature bias
Figure 7 shows the distribution of 100 m model daily minimum temperatures against observations at Duffryn. The strongest cold biases exist during relatively warm nights, whilst there is good agreement between model and observations during the coldest nights. The coldest nights are clear-sky cases, suggesting that errors in the simulated cloud cover may contribute to the cold bias seen at higher temperatures. Previous COLPEX studies have focused on clear-sky cases, since this is a prerequisite for strong cold pool formation (e.g. Vosper et al., 2013b); however, Fig. 7 shows that a more general evaluation of the modeling setup should include other cases.
Whilst there were no in situ direct measurements of cloud properties during this time at Duffryn or Spring Hill, downward long-wave radiative flux is available and directly quantifies the impact of cloud on the nocturnal energy balance. Figure 8 shows the agreement between simulated (100 m) and observed long-wave radiative flux at Spring Hill. The relationship to the temperature bias is illustrated by coloring the markers based on the observed hourly temperature bias. Each panel shows all observations from the 2-month simulation in the daily 3 h window, with each circle showing a single hourly comparison. At both the Spring Hill and Duffryn (not shown) sites there are a substantial number of days where downward long-wave radiation is underpredicted. This bias develops during the evening and persists through the morning. The size of the bias in long-wave radiation is similar at both sites, with a mean daily bias of −21.7 ± 2.7 W m−2 at Duffryn and −23.3 ± 3.2 W m−2 at Spring Hill. The bias is smallest in the period 13:00-16:00 UTC at both sites when there is a mean model bias of −6.0 ± 2.9 W m−2. There is a systematic increase in this bias during the night and morning, increasing from −24.4 ± 2.9 W m−2 (19:00-22:00 UTC) to −33.7 ± 2.7 W m−2 (07:00-10:00 UTC). There is a substantial cold bias in all time periods in Fig. 8; however, Fig. 8h in particular suggests that this cold bias is associated with the underpredicted downwelling long-wave radiation.
In the 1.5 km simulation, the bias in downward long-wave flux is smaller than seen in the 100 m model. At Duffryn the mean is −10.1 ± 2.4 W m−2, a reduction of 53 % compared to the 100 m model. At Spring Hill this value is −7.2 ± 2.4 W m−2 (a reduction of 69 %).
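The quoted reductions follow directly from the mean biases at the two resolutions, as the short Python check below shows. The helper name is hypothetical; the input values are the 100 m and 1.5 km mean long-wave biases given in the text.

```python
def percent_reduction(reference, value):
    """Percentage by which the magnitude of `value` is below that of `reference`."""
    return 100.0 * (1.0 - abs(value) / abs(reference))


# Duffryn: 100 m bias of -21.7 W m-2 versus 1.5 km bias of -10.1 W m-2.
print(round(percent_reduction(-21.7, -10.1)))  # 53
# Spring Hill: 100 m bias of -23.3 W m-2 versus 1.5 km bias of -7.2 W m-2.
print(round(percent_reduction(-23.3, -7.2)))   # 69
```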
At this point we conclude that the daily minimum temperature in the 100 m simulation has a systematic cold bias, most likely caused by too little cloud cover in the 100 m simulation itself (in contrast to biases introduced via transport through the lateral boundaries).The fact that Duffryn exhibits a greater cold bias than Spring Hill is most likely due to the effects of resolved cold pool formation, which exacerbate the bias.

Daily maximum temperature
During the daytime shortwave radiation dominates the surface energy budget. Figure 9 shows the agreement between the simulated and observed downward shortwave radiative flux at Spring Hill, combined with the screen temperature bias in the 100 m model. As shown previously, the simulated mean daily maximum temperature exhibits a cold bias. Figure 9 is consistent with this. Averaged between both stations, 89 % of data show a cold bias at the start of the day (07:00-10:00 UTC). This fraction decreases throughout the day, dropping to 71 % by 16:00-19:00 UTC. In contrast to this cold bias, downward shortwave radiation is overpredicted on average, consistent with the reduction in frequency of cold bias data points. In the early morning and late evening (07:00-10:00 and 16:00-19:00 UTC) the mean bias in shortwave radiation is 14.4 ± 5.0 W m−2. Between 10:00 and 16:00 UTC the mean bias is 36.6 W m−2. During the afternoon, strong warm bias points (defined as those for which the model temperature is 1 K or more higher than observed) are associated with an overprediction of downwelling shortwave radiation, whilst strong cold bias points (bias ≤ −1 K) are typically associated with underpredicted downwelling shortwave radiation, though in general cold bias points are evenly distributed between over- and underpredicted shortwave radiation. It seems likely, therefore, that the cold bias in the daily minimum temperatures is the cause of the cold bias in daily maximum temperatures. It was hypothesized earlier that this is due to underprediction of cloud cover, and the observed overprediction of shortwave radiation also indicates too little cloud cover in the model at 100 m.

Impact of the RH crit profile
The above analysis suggests that a cloud cover bias affects the general ability of the model to accurately simulate the observed temperatures.This is now verified by performing an additional simulation using a modified version of the 100 m model setup.
As described earlier, although there is only a minimal set of differences between the 100 m and 1.5 km configurations, one difference is the prescribed vertical profile of the relative humidity threshold, RH crit, above which clouds form. The values are higher in the 100 m configuration, based on the assumption that there will be less sub-grid-scale variability in humidity at higher resolution, since at sufficiently high resolution variability in temperature and humidity would be resolved. In order to test the sensitivity of the simulation to cloud cover the prescribed vertical RH crit profile in the 100 m simulation was replaced with a vertical profile equivalent to the 1.5 km profile, with values interpolated onto the 100 m vertical levels. The modified 100 m setup (hereafter referred to as 100 m_r) is then tested in an additional simulation.
Figure 8 showed that a bias in the 100 m downward long-wave radiation exists throughout the night. This bias is pronounced during 15-25 September and is absent at coarser resolution (1.5 km) over the same period, suggesting that it is an internal feature of the 100 m simulation. This period also includes two IOPs (16-17 September; IOPs 6 and 7), allowing for a detailed comparison with observations, and therefore was chosen for a test of the 100 m_r configuration. Figure 10 shows simulated mean vertical profiles of cloud fraction over the lowest 1 km for the 1.5 km, 100 m and 100 m_r simulations. As expected, comparing Fig. 10a and b shows a large decrease in cloud cover between the 1.5 km and original 100 m simulation. This decrease in cloud cover is largely due to the change in RH crit profile, since reverting the RH crit profile at 100 m to the 1.5 km profile results in cloud cover fractions similar to 1.5 km (Fig. 10c).
Figure 11a shows the vertical cloud profile averaged over the re-run period and averaged spatially over the inner domain of the 100 m model (i.e. the central domain of the simulation with constant 100 m grid spacing). Figure 11b and c show equivalent mean cloud profiles for day (07:00-19:00 UTC) and night (19:00-07:00 UTC) periods separately. In Fig. 11a the cloud cover over the lowest 100 m is similar in the 100 m and 100 m_r simulations but above this the 100 m_r cloud is close to that in the 1.5 km model. When the cloud profiles for day and night are separated, more substantial differences become apparent between 1.5 km and 100 m_r. During the day the 100 m_r cloud fraction is close to that of 1.5 km, whereas the original 100 m cloud cover is consistently smaller. During the night 100 m_r produces more cloud over the lowest 500 m than 1.5 km and 100 m. Above 500 m the 100 m_r cloud is more similar to that of 1.5 km than 100 m. The diurnal variability in near-surface air conditions is larger in 100 m than 1.5 km during 15-25 September. The domain and time averaged 100 m near-surface air has lower night temperatures, increased dew deposition and lower specific humidities than 1.5 km. The cooler temperatures dominate, leading to the observed increased cloud cover at 100 m irrespective of the RH crit profile used (Fig. 11c).
Figure 9. Bias in 100 m simulated downward shortwave radiation and its relationship to bias in the screen-level temperature, at Spring Hill. The temperature bias is represented as the color and size of the marker (simulated − observed).
Comparing the downward long-wave radiation in the 100 m and 100 m_r simulations, it is evident that the bias is reduced by reducing RH crit, but not eliminated. The 100 m downward long-wave radiation has a bias of −59.4 ± 17.3 W m−2 averaged over 15-25 September and at both sites. For 100 m_r, this bias is −43.1 ± 18.1 W m−2: a reduction of 27 %. Both Duffryn and Spring Hill show similar biases. For the same period, the 1.5 km downward long-wave radiation has a similar bias to the 100 m_r output, with a mean bias of −44.2 ± 18.1 W m−2, suggesting that there is a residual bias, perhaps connected with errors in the 4 km driving model over this period. Note that 15-25 September was chosen as a period during which there is substantial bias in the simulation. The model long-wave bias of 1.5 km averaged over the whole of the long-term simulation period is −8.7 W m−2 (see Sect. 3.1).
During the re-run period downward shortwave radiation in the 100 m simulation is overpredicted by 59.1 ± 17.4 W m−2 (considering all data between 07:00 and 18:59 h at both sites). In the 100 m_r simulation this bias is reduced to 11.2 ± 15.8 W m−2, again suggesting improved cloud cover.
In general, screen-level temperatures agree more closely with observed temperatures in the 100 m_r simulation than the 100 m simulation. Over the re-run period the mean temperature bias is reduced from −1.1 to −0.3 K at Duffryn and from −0.9 to −0.5 K at Spring Hill. Daily maximum temperatures are within 0.5 K of observed maxima during this period (for both RH crit profiles) and do not systematically improve with the updated profile (the bias at Duffryn changes from 0.3 to 0.1 K whilst at Spring Hill the bias changes from −0.02 to −0.3 K). Daily minimum temperatures do systematically improve, with the bias at Duffryn changing from −2.3 to −0.2 K and the bias at Spring Hill changing from −1.5 to −0.7 K.
The impact of these changes on simulated cold pool strength is shown in Fig. 12, which illustrates how changing the RH crit profile reduces the number of nights in which cold pools develop, more closely matching the observed cold pool variability over this period.
Since the re-run period was chosen to include two IOP cases (16-17 September; IOPs 6 and 7), it is also possible to compare simulated cloud profiles against radiosonde profile data. Figure 13 compares 1.5 km model cloud cover profiles with radiosonde profiles at Spring Hill. Radiosondes were released hourly from Spring Hill and Duffryn during the evenings of 16 and 17 September. Both nights show high relative humidity at heights between 500 m and 1 km, in particular during the night of 17 September when the air is near saturated between 700 m and 1 km throughout the night. This is in reasonable agreement with cloud cover in the 1.5 km simulation, and Figs. 10 and 13 demonstrate that the updated RH crit profile not only results in cloud cover at 100 m resolution that is similar to that in the 1.5 km simulation, but also produces cloud cover that is in better agreement with observations.
As discussed earlier, the model critical relative humidity threshold is designed to account for unresolved variability in the humidity and temperature fields (Smith, 1990). It is therefore reasonable to expect that as resolution increases, the fraction of resolved variability increases and the appropriate RH crit increases towards 100 % (variability fully resolved). In order to quantify this, variability was calculated as the spatial standard deviation of relative humidity at each model height at each hour during the 10-day re-run period. These profiles were then time averaged. Figure 14 shows vertical profiles of relative humidity standard deviation. Also included in Fig. 14 are profiles for 100 m model output, which prior to analysis have been horizontally averaged to 1.5 km resolution. Away from the land surface RH standard deviation is greatest in 100 m as expected, and similar in both 1.5 km and the reduced resolution 100 m data. However, Fig. 14 also shows that close to the land surface differences in the surface energy balance between the model resolutions dominate and result in higher RH standard deviation in 1.5 km than the higher resolution simulation. This is particularly pronounced during nights. As was noted earlier there are differences in the domain-averaged near-surface conditions between model versions and 100 m is cooler and drier at night than 1.5 km, resulting in lower RH variability in 100 m than 1.5 km.
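The variability calculation described above (spatial standard deviation of relative humidity at each level and hour, followed by time averaging, with optional horizontal averaging of the 100 m output) can be sketched in Python as below. The function names and the nested-list data layout are hypothetical, and the use of the population standard deviation and of non-overlapping block means for the coarse-graining are assumptions; the text does not specify either.

```python
import statistics


def block_average(field, factor):
    """Average a 2-D list of rows over non-overlapping factor x factor blocks."""
    ny, nx = len(field) // factor, len(field[0]) // factor
    coarse = []
    for j in range(ny):
        row = []
        for i in range(nx):
            block = [
                field[j * factor + dj][i * factor + di]
                for dj in range(factor)
                for di in range(factor)
            ]
            row.append(statistics.fmean(block))
        coarse.append(row)
    return coarse


def rh_std_profile(rh, factor=1):
    """Time-mean spatial standard deviation of RH at each model level.

    rh[t][k] is the horizontal RH field (list of rows) at hour t and level k.
    With factor > 1 each field is block-averaged first (e.g. a factor of 15
    would take 100 m data to 1.5 km).
    """
    profile = []
    for k in range(len(rh[0])):
        hourly_std = []
        for snapshot in rh:
            field = block_average(snapshot[k], factor) if factor > 1 else snapshot[k]
            values = [v for row in field for v in row]
            hourly_std.append(statistics.pstdev(values))
        # Average the hourly standard deviations over the whole period.
        profile.append(statistics.fmean(hourly_std))
    return profile
```

Comparing the profile from the raw fine-grid fields with the profile from block-averaged fields gives a rough indication of how much relative humidity variability is lost when the same data are viewed at coarser resolution.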

Conclusions
A high-resolution, 2-month simulation of the flow in Clun Valley has been produced, allowing an analysis of the base model climatology as well as producing a robust assessment of the ability of the MetUM to capture the variability and strength of cold pool formation.By comparing a 2-month simulation to a series of shorter, IOP, simulations it was demonstrated that short simulations provide accurate results without the requirement to spin up slowly evolving components of the model at the highest resolution though this may not hold true for other regions.Within the context of COLPEX this is an important result as it validates the approach taken in previous studies, which has only made use of shorter simulations.It is a useful result in general since it supports the scientific validity of future high-resolution simulations, without the requirement for computationally expensive extended simulations.
In addition to testing the importance of extended spin-up times, the longer simulation also allowed an assessment of the model climatology. Daily minimum temperatures were shown to have a cold bias in the highest resolution simulations relative to observations. This bias was then demonstrated to be due to choices made within the cloud scheme about sub-grid humidity variability. Compared to the 1.5 km resolution simulation (1.5 km), the original 100 m simulation (100 m) predicted less cloud cover; 1.5 km also compared better with radiosonde observations made during an IOP.
As a sensitivity test, a period in the long-term simulation during which the bias in downward long-wave radiation at 100 m was most pronounced was re-run. For this simulation (100 m_r), the vertical profile of RH crit at 100 m resolution was set to the 1.5 km model profile; i.e., no change in sub-grid-scale variability between resolutions was assumed. This reduced the differences in model cloud cover between the resolutions. The bias in the downward long-wave radiative flux in the 100 m_r simulation was also reduced relative to the observations, and this resulted in a reduction in the cold bias of daily minimum temperature and an improved simulation of cold pool variability. This demonstrates the importance of considerations of sub-grid-scale variability for accurate simulation at very high resolution. The conclusions presented here are based on simulations at a single location and require further investigation before robust, general conclusions can be drawn. The analysis has also focused on horizontal resolution; however, Vosper et al. (2013a) present results from 70, 140 and 178 vertical level configurations of the model and show that the COLPEX simulations do exhibit a sensitivity to vertical resolution. When the 100 m model was run with 70 vertical levels (without changing RH crit) as an additional sensitivity test, there was a 16 % reduction in cloud cover, suggesting that the current conclusions are also sensitive to the vertical resolution.
Two case IOPs, 9 September (IOP 4) and 4 March (IOP 16), have been analyzed in previous studies (Vosper et al., 2013a, b). It is important to note that when simulations of these IOPs were re-run with the 100 m_r setup there was negligible impact on the results, since both model and observations indicate clear-sky conditions. The original 100 m model output accurately predicts the observed temperature in these IOPs, consistent with the analysis presented here, which shows that the model is well formulated for clear-sky nights.
NWP models are increasingly being run at resolutions more traditionally associated with large eddy simulations. Moving into this regime presents a number of challenges, since at horizontal grid spacings of 100-1000 m models will only partially resolve eddies (Wyngaard, 2004; Hong and Dudhia, 2011), a regime referred to as the gray zone (Honnert et al., 2011). In the convective boundary layer, the dominant eddy length scales are typically around 1 km in size, approaching the depth of the boundary layer, and are therefore most likely resolved by the current modeling setup (Wyngaard, 1990). However, stable boundary-layer eddies have smaller dominant scales (Wyngaard, 1990) and will be poorly resolved at 100 m resolution. Shin and Hong (2013), for example, completed a range of simulations varying the model resolution, demonstrating this sensitivity to the atmospheric stability. Investigating the resolution sensitivity of large eddy simulations of the stable boundary layer, Beare and Macvean (2004) showed a convergence of model behavior only as horizontal resolution approached 2 m.
Interestingly, in contrast to the results of our study, Boutle et al. (2014) investigated stratocumulus simulations in the gray zone and found relatively little sensitivity to the choice of RH crit. This discrepancy suggests that a more sophisticated approach to representing unresolved variability in saturation should be applied, one in which the assumed sub-grid variability is not static but instead varies according to properties of the flow. Watanabe et al. (2009), for example, developed such a scheme with a dynamic representation of the unresolved probability density functions. Such an approach has the potential to circumvent the need to specify RH crit values and may be more appropriate.

Figure 2. Vertical profiles of RH crit used in the different nested models. The 100 m and 1.5 km profiles are shown in black and red, respectively.

Figure 3. Evolution of the domain-mean soil temperature in the 100 m simulation. The soil levels shown are 0.05 m (black), 0.225 m (green), 0.675 m (dark blue) and 2 m (light blue). Shaded circles show the values from corresponding IOP simulations. Vertical dashed lines mark 1 September and 1 October.

Figure 4. Evolution of the domain-mean layer soil moisture content at different levels in the 100 m simulation. The soil levels shown are 0.05 m (black), 0.225 m (green), 0.675 m (dark blue) and 2 m (light blue). Shaded circles show the values from corresponding IOP simulations. Panels a-d correspond to increasing soil depths of 0.05, 0.225, 0.675 and 2.0 m. Vertical dashed lines mark 1 September and 1 October.

Figure 6. Bias in the model screen-level (1.5 m) temperatures (model minus observed). (a) Frequency distributions of daily minimum temperature bias compared to observations at Spring Hill and Duffryn, (b) frequency distributions of daily maximum temperature bias, (c) frequency distributions of cold pool strength, defined as the maximum daily difference between Spring Hill and Duffryn. The data are binned into intervals of width 0.5 K. Daily minimum and maximum temperatures are calculated from hourly average temperatures.

Figure 7. Simulated daily minimum 100 m screen temperatures versus observations at Duffryn.

Figure 8. Bias in 100 m simulated downward long-wave radiation and its relationship to bias in the screen-level temperature, at Spring Hill. The temperature bias (simulated minus observed) is represented by the color and size of the marker. Each marker represents mean hourly values.

Figure 10. Hourly area-averaged profiles of model cloud fraction (F) for (a) 1.5 km, (b) 100 m and (c) 100 m_r. Vertical lines show 00:00 UTC, 16 September and 00:00 UTC, 19 September, and correspond to the period shown in Fig. 13. Area averages are calculated over the 10 km by 10 km domain illustrated in Fig. 1.

Figure 12. Impact of RH crit profile change on the simulated cold pool strength, defined as the difference in screen temperature between Spring Hill and Duffryn (Spring Hill minus Duffryn). Positive values indicate the presence of cold air pools. (a) 100 m cold pool strength, (b) 100 m_r cold pool strength, (c) observed cold pool strength.
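
The cold pool strength used in Figs. 6 and 12 (screen temperature at Spring Hill minus that at Duffryn, with the daily value taken as the maximum difference) can be illustrated with a short sketch. This is only an assumed reading of the caption definitions: the function name daily_cold_pool_strength, the use of 24 hourly means per day and the example temperatures are hypothetical and are not taken from the study.

```python
import numpy as np


def daily_cold_pool_strength(t_spring_hill, t_duffryn, hours_per_day=24):
    """Daily maximum of the hourly temperature difference (K).

    Difference is Spring Hill minus Duffryn, so positive values indicate
    a cold pool. Inputs are assumed to be hourly-mean screen temperatures
    starting at the beginning of a day; incomplete final days are dropped.
    """
    diff = np.asarray(t_spring_hill, dtype=float) - np.asarray(t_duffryn, dtype=float)
    n_days = diff.size // hours_per_day
    daily = diff[: n_days * hours_per_day].reshape(n_days, hours_per_day)
    return daily.max(axis=1)


# Two made-up days: one with a 4 K cold pool at 03:00, one with none.
hill = np.array([5.0] * 24 + [3.0] * 24)
valley = np.array([5.0] * 24 + [4.0] * 24)
valley[3] = 1.0
strength = daily_cold_pool_strength(hill, valley)
```

In this example the first day has a strength of 4 K and the second a strength of -1 K, i.e. no cold pool under the sign convention of Fig. 12.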