Surface representation impacts on turbulent heat fluxes in WRF (v.4.1.3)

The water and energy transfers at the interface between the Earth’s surface and the atmosphere should be correctly simulated in numerical weather and climate models. This implies the need for a realistic and accurate representation of land cover (LC), including appropriate parameters for each vegetation type. In some cases, the lack of information and crude representation of the surface leads to errors in the simulation of soil and atmospheric variables. This work investigates the ability of the Weather Research and Forecasting (WRF) model to simulate surface heat fluxes in a heterogeneous area of 5 southern France, using several possibilities for the surface representation. In the control experiments, we used the default LC database in WRF, which differed significantly from the actual LC. In addition, sub-grid variability was not taken into account since the model uses, by default, only the surface information from the dominant LC category in each pixel (dominant approach). To improve this surface simplification, we designed three new interconnected numerical experiments with four widely-used land-surface models (LSMs) in WRF. The first one consisted of using a more realistic and higher-resolution LC 10 dataset over the area of analysis. The second experiment aimed at investigating the effect of using a mosaic approach, where 30-m sub-grid surface information was used to calculate the final grid fluxes, based on weighted averages from values obtained for each LC category. Finally, in the third experiment, we increased the model stomatal conductance for conifer forests, due to the large fluxes errors associated with this vegetation type in some LSMs. The simulations were evaluated with gridded area-averaged fluxes calculated from five tower measurements obtained during the Boundary Layer Late Afternoon and Sunset 15 Turbulence (BLLAST) field campaign. The results from the experiments differed depending on the LSM and displayed a high dependency of the simulated fluxes on the specific LC definition within the grid cell, an effect that was enhanced with the dominant approach. The simulation of the fluxes improved using the more realistic LC dataset except for the LSMs that included extreme surface parameters for the coniferous forest. The mosaic approach produced fluxes more similar to reality and served to improve, especially, the latent heat flux simulation of each grid cell. Therefore, our findings stress the need 20 to include an accurate surface representation in the model, including soil and vegetation sub-grid information with updated 1 https://doi.org/10.5194/gmd-2020-371 Preprint. Discussion started: 2 February 2021 c © Author(s) 2021. CC BY 4.0 License.


25
The Earth's surface is constantly changing at different timescales (Sellers et al., 1995). Natural changes of the land surface (vegetation) occur due to climate variability and seasonality (Weltzin and McPherson, 1997;Crucifix et al., 2005). However, human beings have significantly contributed to non-natural and accelerated changes in land cover (LC), especially during recent decades (Pielke et al., 2011). These changes can be extremely important because they modify the natural cycles of energy (Seneviratne et al., 2010), trace gases (e.g., Muñoz-Rojas et al., 2015;Green et al., 2019) or nutrients (e.g. , Holmes 30 et al., 2005), among others, and because they alter ecosystems (Pielke et al., 1998). The consequences of these alterations are difficult to predict due to the non-linearity of the numerous connected processes (Pielke et al., 1999). On the one hand, the changes in the habitat alter food chains and impact vegetal and animal species (Auffret et al., 2018), but also smaller organisms living within them or in equilibrated soil, such as bacteria and viruses (Jeffery and Van der Putten, 2011;Blackburn et al., 2007;Rulli et al., 2020). On the other hand, radiative and texture properties of the surface are also modified: changes in albedo 35 (Loarie et al., 2011), emissivity, thermal properties of the soil (Luyssaert et al., 2014) or surface roughness (Bonan et al., 2018).
This modifies the heat and water exchange processes between the surface and the atmosphere by altering the net radiation at the surface. Indeed, the water transfers from the soil to the air (and vice versa) are significantly linked to the vegetation type and the soil properties: infiltration, runoff, soil moisture or evapotranspiration (ET) (Zhang and Schilling, 2006). All these changes in the surface energy balance have direct or indirect feedbacks on the planetary boundary layer (PBL) development (Combe et al.,40 2015), cloud formation (Vilà-Guerau De Arellano et al., 2012), atmospheric temperatures (Koster et al., 2006;Christidis et al., 2013), and rainfall (Koster et al., 2003). This may impact surface characteristics and vegetation activity again and, in the long term, it will restart the whole cycle by changing species (vegetation included), which need adapt to the modified environmental conditions (Pielke et al., 1998), with the direct or indirect associated impacts on the first triggers, the humans (Meyer et al., 1994;Rulli et al., 2020).
2 Evaluation data, WRF model and experiments design 2.1 Observational data for model evaluation The surface turbulent heat fluxes simulated by the WRF model were evaluated during a period of the BLLAST field campaign (Lothon et al., 2014). This campaign took place from 14 June to 8 July 2011 on the Plateau of Lannemezan (southern France). 110 Its main objective was to better understand the turbulence decay observed during the afternoon transition, and the extensive instrumentation deployment included several surface-energy-balance towers installed over different surfaces, which were representative of the vegetation within the explored area: prairies, forests, wheat, corn and moor. The analysed period comprised from 09:00 UTC to 15:00 UTC 19 June 2011, corresponding to part of IOP (intensive observation period) 2 of the campaign. This IOP was characterised by fair weather, no clouds, and a typical development of the boundary layer up to 800-1000 m 115 above ground level (agl). The IOP is representative of the general conditions of the rest of the IOPs of the campaign (the general meteorological conditions of the campaign can be found in Lothon et al. (2014)). The sensible and latent heat fluxes (SH and Le respectively) simulated by WRF were evaluated hourly and, in some cases, subsequently averaged. We focused on the model bias and root-mean-square error (RMSE), using as benchmark data the gridded AAF (area-averaged fluxes) calculated using the fluxes values observed over different vegetation types. of the instruments was set according to the vegetation height, which implied an homogeneous footprint for the five towers: 2 m for the moor, grass and wheat sites, 4 m for the corn site and 31 m for the forest site (approx. 6 m above the trees). These 125 measurements showed that the SH was more sensitive to LC type ( Fig. 1a and c) than Le ( Fig. 1b and d). This was probably linked to the different surface properties of the vegetation types (albedo, thermal inertia, emissivity), with more influence on the surface temperatures. The highest SH was measured over the forest site, followed by the wheat site, with peaks of around 400 and 270 Wm -2 respectively. The measurements over the grass, moor and corn sites showed lower values with maximum SH of around 150 Wm -2 during the central hours of the day. However, Le exhibited values that were similar for all the vegetation 130 types, with midday maxima around 300 Wm -2 . Nevertheless, the lowest Le was measured over the wheat field (this crop started to dry during the experiment), while the largest values were observed at the grass, forest and moor sites. Some differences were observed in the diurnal evolution of Le: the morning Le rise over the corn/wheat were delayed with respect to the measurements over other LCs, maybe linked to a delay in the transpiration processes associated with these crops (a full investigation of the reasons of it is out of the scope of this study). In any case, it should be highlighted that these fluxes values were representative 135 of those of the different IOPs of the BLLAST field campaign (Lothon et al., 2014;Couvreux et al., 2016), with no-water limitation due to the regular rain events observed in the area during the weeks before, which are typical conditions in these dates in the area.

Area-averaged fluxes (AAF) calculation
The EC measurements from towers were used for the calculation of AAF with resolution of 1 km in a region of 19x19 km 140 around the central site of the field campaign: 43 • 07 27.15 N, 0 • 21 45.33 E; 600 m above sea level (asl). This approach can be considered as a mosaic observational method for flux computation. It was based on the multiplication of the values of the fluxes measured over each vegetation type by their respective vegetation-cover fraction within each 1x1 km pixel (see Fig. 2a, b). Finally, the contribution from each vegetation type was summed to provide the final values in each grid cell (area-averaged fluxes). High-resolution and realistic LC data were needed to construct the AAF, obtained from the 30-m LC database prepared 145 by CESBIO, based on Landsat-5 data (Inglada et al., 2017) (Fig. 2b). Hence, the main assumption of the AAF calculation was to consider the same SH and Le for sites with the same LC type at different locations, implying also the following simplifications (further discussion is included after the list): 1. There were no flux differences depending on the soil type.
2. There was no difference in activity for the same vegetation type in the area. 3. There was no horizontal variability of the soil moisture (SM).
4. The radiation and the wind forcing was the same for all the pixels, without altitude influences.
6. Fluxes over urban and bare-soil surfaces were estimated using a simple model.
(1) Two soil types were the dominant ones in the evaluation area, based on the 1x1 km pixels from the soil database of 155 WRF (United States Department of Agriculture, USDA): clay loam (81% of the area) and loam (19% of the area). Since the dominant soil type in the pixels where the measurements were taken was clay loam, those pixels with loam were not included in the evaluation of the model. Therefore, the limitation based on the soil type was ultimately not a problem. This area is shown in Fig. 2d with a black rectangle.
(2) The area analysed is relatively small to expect significant differences between the natural vegetation belonging to the 160 same category. Although the limitation based on the possible different vegetation state can be more important in crop areas, seeding dates are normally coincident between fields in the area.
(3) Regarding the effect of possible SM differences within the area, two aspects should be noted. On the one hand, the potential SM horizontal variability due to possible inhomogeneous precedent rainfall over the area was not taken into account.
However, the SM input in the model is provided with a coarse resolution of 1 • and does not show any small-scale details.

165
This is a well-known limitation of mesoscale models which is sometimes addressed through the assimilation of SM data from satellites or with previous long simulations that serve to spin-up the surface in order to obtain the appropriate SM initial values (De Rosnay et al., 2013;Angevine et al., 2014;Santanello et al., 2016). In our case, this limitation of the mesoscale modelling is an advantage because it allows to perform a fairer model-observation comparison since this limitation also exist in the AAF.
On the other hand, part of the SM horizontal variability is caused by the different properties of the vegetation patches, 170 affecting runoff, infiltration, evapotranspiration, etc., and, finally, the SM content associated with each vegetation type. This LC effect on SM may be implicitly included in the fluxes that were measured over each individual vegetation type. In any case, we investigated the effect of including the SM horizontal heterogeneity with a higher resolution in the model initialization (not shown). This was done by using high-resolution SM satellite data from the Disaggregation based on Physical And Theoretical scale Change (DISPATCH) (Merlin et al., 2013;Molero et al., 2016) product at 1 km of resolution, but conserving the original 175 range of SM variability within the area evaluated. The effect of including this SM spatial heterogeneity was minimum in the fluxes simulations (in part due to the conservation of the range of SM values). Even if dedicated studies could be designed to investigate the effect of initialising the model varying the range of SM values in the area, these studies deserve a fully dedicated effort and are out of the scope of this paper.
(4) The limitation due to the radiation and wind forcing, which were assumed as equal in all the pixels, is expected to have 180 a small impact, due to the fair weather, with no clouds, and light wind conditions on the 19 of June (Lothon et al., 2014).
However, some differences could exist between some pixels situated at different altitude over the area (most of the pixels were in the range of altitude between 400 and 650 m asl, with minimum and maximum altitude of 282 m and 696 m asl in the whole evaluated area).
(5) The limitation of considering all forests as coniferous was due to the fact that the campaign only included measurements 185 over this type of trees, and not over broadleaf deciduous forests (the other forest category in the area). Therefore, the measurements taken over conifers were extrapolated to those areas covered by deciduous for the calculation of the AAF. This limitation was addressed by converting all the forest in the WRF model to conifer, which allowed a fairer model-observations comparison at the expenses of loosing the analysis over one LC category (deciduous forest).
(6) For the urban surfaces, SH and Le were calculated following a simple Penman-Monteith type model (De Bruin and 190 Holtslag, 1982) shown in Eq. (1) and Eq.
(2), respectively: where R n (Eq. 3) is the net radiation, defined as: 195 and G (Eq. 4) is the ground heat flux, considered a fixed fraction of R n : In these equations, SW ↓ and LW ↓ are the measured downward short-wave and long-wave radiative fluxes, respectively.
Despite the high-quality data and the efforts carried out to reduce uncertainties in the AAF, the evaluation of models with 205 surface observations is a necessary task which is not straightforward. The observational measurements are almost always linked to uncertainties that can add limitations to the evaluation process (Bou-Zeid et al., 2020). In our case, the measurements were taken at heights that implied homogeneous footprints, but even in this case, other uncertainties are always present (Mauder et al., 2020) the south of Site 1, were also used.  The inner domain covered an area of 120x120 km 2 , but only an area of 19x19 km 2 around the central point was evaluated, which had the same grid as the AAF used for the evaluation, i.e., the centre of the central pixel was located at the same 225 location as the centre of the AAF used for the evaluation. More details about the WRF technical configuration common for all experiments are included in Table 2. Surface-layer scheme MM5 similarity (Jiménez et al., 2012) Land-surface models Noah / Noah-MP / CLM4 / RUC Microphysics scheme WRF Single-Moment 3-class (Hong et al., 2004) Long-wave radiation scheme Rapid radiative transfer model (RRTM, Mlawer et al. (1997)) Short-wave radiation scheme Dudhia (Dudhia, 1989) Number of vertical levels 40 In order to add robustness to the study, four different LSMs available in WRF were analysed. The information below is mostly extracted from the literature: 1. Noah (Chen and Dudhia, 2001). Noah is a widely used LSM resulting from the collaboration among many different institutions. It is the default option in WRF and it is used in many other models, with an important application in operational models from the National Centers for Environmental Prediction (NCEP). The model considers four soil layers, where it computes temperature and soil moisture. It takes into account the type of vegetation (LC category), monthly vegetation fraction and soil type to calculate the runoff, ET and root zone. Since the WRF v.3.6, there is a 235 mosaic option available in the model (Li et al., 2013) to deal with the sub-grid heterogeneity (this option is not activated by default).
2. Noah-MP (Noah Multi-physics) (Niu et al., 2011). It is an extension of the Noah LSM that allows the use of multiple options for land-atmosphere processes (e.g., infiltration, run-off, etc.), resulting in a total of more than 4000 combinations (the default options are used in this work). This LSM contains a separate vegetation canopy with a two-stream radiation 240 transfer approach, shading effects and complex physics for the snow/ice processes within the soil. This LSM uses a different set of parameters for each vegetation category than Noah, with more vegetation-dependent parameters.
3. CLM4 (The Community Land Model v.4) (Oleson et al., 2010). It is a sophisticated LSM including state-of-the-art scientific knowledge about soil-vegetation-atmosphere processes. It first divides the simulation into five cover types (glaciers, lake, wetland, urban and vegetation). The vegetation is subsequently divided into four different plant functional 245 types (PFTs) that are defined based on the LC categories. It should be noted that this LSM was computationally the most expensive among the four analysed in this work.
4. RUC (The Rapid Update Cycle) (Smirnova et al., 2016). It uses 9 soil layers with higher density close to the surface. It has a complex treatment of snow processes. In the warm season, it corrects soil moisture in cropland areas to compensate for irrigation. This model also allows a mosaic approach for the sub-grid treatment of the cell heterogeneity, but it is 250 different than in Noah, with albedo values that correspond only to those parameters associated with the dominant LC category. The vegetation parameters used are obtained from the same look-up table than in Noah.

Experiments design
We performed a set of four different modelling experiments aimed at checking the sensitivity of the model to different changes in the representation of the surface. These experiments are summarised in Table 3 and explained below: 255 Table 3. Summary of the simulations and the names used along the article. Note how some experiments were not possible (-) for some LSMs. In this experiment we used the default configuration for the surface representation in WRF, i.e., the LC dataset from IGBP-MODIS (21 categories) and a dominant approach for the sub-grid heterogeneity, which means that the fluxes were calculated taking into account only the surface parameters of the dominant LC category, i.e., assuming that no sub-grid variability exists even when this information is available. The dominant LC category is the one with the highest percentage of coverage within 260 each pixel. This method has an intrinsic dependency on the number of LC categories of the dataset. For example, a pixel with 40% of water, 30% of conifer forest and 30% of deciduous forest will be treated as water, since it is the dominant category despite both types of forest cover 60% of the total surface of the grid. On the contrary, the dominant category would be forest if both types of forest were merged into a single category.
It is expected that these simulations will be limited by the fact that the representation of the LC in the area by IGBP-MODIS 265 is not totally correct in comparison with the reality (shown later). Besides, the dominant approach implies that the model does not take advantage of the higher resolution of the LC datasets.

NEW-LC experiment
In this experiment, we used a more realistic LC dataset than IGBP-MODIS, obtained from 30-m resolution data prepared by CESBIO (Inglada et al., 2017). In order to incorporate the more realistic LC from CESBIO in the WRF model, first, the different categories of the new LC dataset (CESBIO, 17 categories) were transformed into the most appropriate ones of the WRF default LC dataset (IGBP-275 considerations: -The two types of forest distinguished by CESBIO (conifer and deciduous) were transformed to conifer (evergreen needleleaf forest, ENF), since the EC measurements used in the AAF were taken over this type of forest due to the lack of 280 measurements over deciduous broadleaf forest (DBF). As commented before, the measurements taken over conifers were extrapolated to the areas with deciduous forests in the AAF calculation. By converting all the forest to conifer also in WRF, the observations-model comparison was fairer.
-The two possible types of crop in CESBIO (winter and summer crop type, i.e., wheat and corn respectively) were transformed to the single cropland category available in IGPB-MODIS. Hence, we did not take advantage of the differ-285 entiation of crop type by CESBIO, but it is important to note that wheat surfaces only covered 2.8% of the analysed area ( Fig. 2b).
-The four different urban categories of CESBIO were transformed to the single urban definition of IGBP-MODIS. Most of the urban surfaces of the area evaluated were defined as diffuse urban (8.2%) and industrial zones (0.8%).
This new LC information was incorporated in the modelling system following the technical details included in Appendix B.

290
Note how even using a more realistic LC dataset over the area, the sub-grid information was not used because the dominant approach was maintained in this experiment.

MOSAIC experiment
The sub-grid heterogeneity is important in the case of the BLLAST area due to the small scale of the LC patches (see Fig.   2b). This makes the area very appropriate for investigating the use of a mosaic approach in the model, as done in the labelled 295 MOSAIC experiment. The mosaic approach implies that the flux is computed for each LC category (tiles) within each grid cell based on the surface parameters tabulated for each one; and then averaged taking into account the percentage of coverage of each tile. That is, taking into account the sub-grid surface heterogeneity. In this context, Noah (see Li et al., 2013, for a complete description of the Noah mosaic approach) and RUC allow the possibility of using this approach in the WRF modelling system. The LC improvements of the NEW-LC experiment were also included here. More technical details about the model 300 implementation of this experiment are included in Appendix B.

FOREST experiment
The last experiment was motivated by the large surface fluxes errors found over those pixels covered by conifer (ENF) in previous simulations using Noah-MP and CLM4. Specifically, these LSMs overestimated the SH and underestimated the Le.
We hypothesised that this was caused by a parametrized resistance to transpiration that was to high. Hence, we designed the 305 FOREST experiment for Noah-MP. We modified three parameters for ENF, with the values used in Bonan et al. (2014): 1. The slope of the Ball Berry conductance (the inverse of the stomatal resistance to transpiration) equation, the so-called MP in Noah-MP and usually known as g 1 (the slope parameter). The Ball-Berry equation (Ball et al., 1987) linearly relates the stomatal conductance to the CO 2 assimilation rate: In this equation, the slope (g 1 ) represents the sensitivity of the stomatal conductance to assimilation, CO 2 concentration, humidity and temperature. It is the parameter that most affects the plant's transpiration (Cuntz et al., 2016), which increases with larger slope values. The g 0 parameter is the minimum conductance, h s and c s are the fractional relative humidity and the CO 2 concentration at the leaf surface, respectively, and A n is the net leaf CO 2 assimilation.
MP has a default tabulated value of 6 for ENF for Noah-MP and for CLM4 (Oleson et al., 2010), a value that is signif-

315
icantly different compared to the values assigned for other categories (9 for most of them, including, for example, the broadleaf evergreen forest (BDF)). The lower value for ENF limits the transpiration processes and could be the reason for part of the Le underestimation, leading to more energy being distributed to SH. This parameter was optimised in Bonan et al. (2014) and a value of 9 was also used for ENF in their study.
2. The minimum leaf conductance or the interception in the Ball-Berry equation (g 0 , indicated as BP in the Noah-MP 320 scheme). This parameter was changed from 0.002 to 0.01 mol H 2 O m -2 s -1 , as in Bonan et al. (2014) for ENF.
3. The maximum carboxylation rate at 25 ºC (V cmax25 ), a photosynthetic parameter that was changed from 50 to 62.5 µmol m -2 s -1 , as used in Bonan et al. (2014), which value was also selected according to previous literature. The range of observational values in their work for ENF ranged from 48 to 72 µmol m -2 s -1 and a value of 62.5 µmol m -2 s -1 was finally selected.

325
Hence, these three parameters were modified in Noah-MP according to Bonan et al. (2014), following the technical details included in Appendix B. Indeed, these changes allowed for more evapotranspiration in the ENF forest, which should improve the evaluation in our case study. It should be noted that the g 1 parameter (MP) was the one which most influenced the results, as indicated in Cuntz et al. (2016) and as observed in additional previous experiments performed in our case (not shown). In any case, the objective of this experiment was to demonstrate the high impact of the associated vegetation parameters on the 330 surface fluxes, not the optimization of these values for the specific tree species present in the area here analysed.
3 Results: Turbulent fluxes sensitivity to surface changes The model skill simulating the AAF fluxes is quantified in this section for the different experiments listed in Table 3. The results are shown and commented individually for each experiment. Since some of the results and the associated discussion differed significantly depending on the LSM used, the results were subdivided for each LSM. As a general reference, Table 4 335 shows a summary of the total RMSE obtained for SH and Le using the different LSMs and for all the experiments performed.  . 3a). For the Le, the pixels covered by SaW (Savanna Woody, which represent 33% of the area) are normally characterised by a remarkable Le underestimation, with biases values around -100 Wm -2 (Fig. 3d).
The definition of some of the LC categories existing in the DEFAULT experiment did not represent appropriately the real LC of the area (as seen in Fig. 1c), but they influenced notably the simulated model errors. This is the first indicator of the high 350 of the total area, respectively. The SH biases observed in the DEFAULT experiment were improved in the NEW-LC, especially for ENF and Gra, with slightly negative (close to zero) values. This led to a slightly negative SH bias in the whole area ( Fig.   3b, see horizontal red dashed line). For the Le bias (Fig. 3e), the opposite is observed, with a slight Le overestimation mainly caused by the Le overestimation over ENF (too much ET simulated by the model in this forest type). As it will be shown later, this result contrasts with the findings obtained with the rest of LSMs.

360
The bias is a good indicator of systematic errors for specific LC categories, but it is a poor indicator of the total model behaviour in the area (it can mix positive and negative values leading to neutral ones). Hence, the RMSE has been included in Fig. 3c and f, as an indicator of the total performance of the NEW-LC experiment. Note how in the case of the RMSE ( Fig. 3c and f), the results are shown by real LC categories, whose fractions of coverage are very similar to those used in the NEW-LC experiment (slightly different due to small differences in the categories, e.g., corn and wheat in the real LC were merged into 365 crop in the model).
The SH-RMSE (Fig. 3c) improved for the total area when using the improved LC dataset, from 56 Wm -2 (DEFAULT-Noah) to 44 Wm -2 (NEW-LC-Noah). The vertical bars of Fig. 3c indicate the RMSE for each real LC category (the LC categories used in the AAF). They provide information about the types of vegetation associated with larger errors, i.e., vegetation types whose processes or parameters are not well represented by the model. In this case, the SH improvements in the NEW-LC experiment 370 (red bars) are observed for all the pixels except for the only pixel where the dominant LC is wheat, associated with a larger SH flux than corn in the observations (see Fig. 2). In any case, wheat crops only covered 0.3% of the total area as dominant LC category (one pixel) and, therefore, the contribution to the total RMSE values was small. The results for Le-RMSE ( Fig. 3f) also show significant improvements when using the improved LC dataset, except for those pixels covered by urban land use, where the Le was notably underestimated (see Urb in Fig. 3e), but with a small impact in the total values due to their scarce 375 presence (2.4%) as dominant pixels. The Le improvement was more substantial in those pixels covered by grass than by forest, since Le is slightly overestimated in the forest pixels (see ENF in Fig. 3e). Notice how besides the better LC pixels distribution in the NEW-LC experiment, we also avoided using some LC categories associated with large errors in the DEFAULT-Noah experiment, such as the unrealistic SaW pixels related to large Le underestimation (Fig. 3d).

Noah-MP
Most of the LC categories showed a positive SH bias for DEFAULT-Noah-MP (Fig. 4a) and a slight negative Le bias (Fig. 4d), especially for those pixels which dominant LC was some of the different types of forest (ENF, WiF, DBF, MixF or SaW). This led to a large SH error and to the smallest error amongst the LSMs compared for Le (despite the slight underestimation), with RMSE values of 96 Wm -2 and 46 Wm -2 , respectively (see Fig. 4c and f and also Table 4 for the RMSE comparison among the four LSMs used).

385
Contrary to what happened with Noah, the results were, in general (all domain), worse for the NEW-LC experiment, even for the case of SH where the RMSE was already high. The SH bias remained positive, mainly influenced by the significantly high SH bias in ENF pixels (more than +150 Wm -2 ), while the bias over the grass and crop pixels were close to zero ( Figure   4b). For the Le, important negative biases were observed in those pixels mostly covered by ENF (around -120 Wm -2 ), but with lower errors for grass or crop pixels. The urban pixels were characterised by important positive SH biases and negative Le 390 biases, leading to RMSE of more than 150 Wm -2 (see red bar for urban in Fig. 4 c and f) for both flux. In any case, the small proportion of pixels with urban as dominant LC category led to a small contribution to the total error.
In contrast, since 45% of the total area was covered by ENF, the final RMSE values of NEW-LC-Noah-MP (horizontal red dashed line in Fig. 4c) are significantly higher than those of DEFAULT-Noah-MP (shown in blue), increasing the total SH-RMSE up to 116 Wm -2 and the Le-RMSE up to 90 W m -2 , with a significant worsening for the forest and urban pixels.

395
However, the bars in Fig. 4c and f also illustrate the improvement in the crop, moor and grass pixels for SH and grass for Le, but they were not enough to compensate for the large errors over ENF pixels. This was not observed in the Noah experiment

CLM4
The total area-averaged biases for SH and Le were close to 0 Wm -2 for the DEFAULT-CLM4 experiment ( Fig. 5a and d). The pixels covered by forest in the model tended to overestimate the SH, while those covered by grass, savanna and shrub slightly underestimated it (Fig. 5a). For the Le, there is a general tendency towards slightly underestimated ET processes in CLM4 in comparison with the real observations, except for the WiF (wild forest) and the Wet (wetlands) pixels (Fig. 5d) that were 405 present in the default IGBP-MODIS database. The total RMSE by DEFAULT-CLM4 was 69 Wm -2 and 55 Wm -2 for SH and Le, respectively (see horizontal dashed blue lines in Fig. 5 c and f, and Table 4).
The NEW-LC-CLM4 simulation ( Fig. 5b and e) produced larger biases than the default one, mainly caused by the SH overestimation and the Le underestimation in the pixels covered by ENF (evergreen needleleaf forest), around +100 Wm -2 and -100 Wm -2 , respectively. The biases observed over grass and crop pixels were close to 0 Wm -2 while some issues were observed 410 in those pixels mostly covered by urban, which showed, surprisingly, a substantial underestimation in SH and overestimation in Le of the order of 200-250 Wm -2 , but with a small effect in the total values due to the low number of pixels with urban as dominant LC.
The large SH overestimation and Le underestimation over those pixels covered by forest (ENF) affected the total performance of the model, worsening the SH results up to a SH-RMSE of 91 Wm -2 and 82 Wm -2 for the Le-RMSE. However, as observed 415 in the bars of Fig. 5c and f, most of the LC categories showed SH and Le improvements or almost no-changes in the NEW-LC experiment, except the forest and the urban pixels, whose tabulated parameters seem to be quite extreme, leading to significant errors in the simulation of the fluxes.
These results coincided with those obtained with Noah-MP for the case of ENF grid cells. Indeed, both LSMs employ the Ball-Berry equation to deal with the stomatal resistance of the plant, and both LSMs use a tabulated value of 6 for the slope of 420 this equation (MP or g 1 , see Section 2.3.4) for the ENF LC category, while it is 9 for almost all of the rest of the LC categories.
Hence, these two LSMs significantly overestimate SH and underestimate Le over the conifer trees in this case study.

RUC
The SH and Le biases obtained when using the DEFAULT-RUC simulation are the highest ones among the compared LSMs (see also RMSE in comparative Table 4), with SH-RMSE of 97 Wm -2 and Le-RMSE of 81 Wm -2 . There is a systematic 425 SH overestimation for all the LC categories (Fig. 6a) for DEFAULT-RUC, which is especially aggravated by those pixels unrealistically represented by WiF (Wild forest), with a bias of more than 100 Wm -2 which covered an important fraction of the total analysed area (27%). The Le is systematically underestimated (Fig. 6d) for all the LC categories, which shows a tendency towards too little ET in this LSM in this studied case, especially in those LC categories with shorter vegetation (SaW, Sav and Gra).

430
These biases were not corrected in the NEW-LC-RUC experiment with the more realistic LC (SH-RMSE of 90 Wm -2 and Le-RMSE of 85 Wm -2 ). The values obtained for each LC category ( Fig. 6b and e) were within the same order of magnitude than those obtained in the DEFAULT-RUC experiment, or even higher, as it is the case of the SH simulated in the grass surfaces, with a remarkable overestimation of around +75 Wm -2 . This led to very similar (and high) RMSE values for the DEFAULT and NEW-LC experiments ( Fig. 6c and f). Although the SH simulation improved for the grass and corn pixels, it worsened for 435 the forest pixels. For the Le, the contrary was observed: the biases of the forest and crop pixels improved but worsened for the grass ones, which were associated with too low ET. Note how the RMSE in the NEW-LC experiment over the grass surfaces was higher than 100 Wm -2 in RUC while they were around 40 Wm -2 in the rest of the analysed LSMs, where normally the grass surfaces were associated with improvements. Although it is not shown in this work, we detected a higher sensitivity of this LSM to the soil type in comparison with the other LSMs. Thus, the parameters associated with clay loam could be inappropriate for this LSM, leading to a wrong partitioning of the net radiation in the model (too low Le and too high SH). In contrast, the sensitivity of RUC to the LC categories was the lowest one among the compared LSMs. In any case, the analysis of the relative contribution of the soil type on the surface fluxes is out of the scope of this study.  the more the percentage of forest, the higher the SH, since forests were associated with the highest values of observed SH.
The opposite is observed for the grass pixels, with SH diminishing for increasing percentages since grass EC measurements 460 showed the lowest SH values. These dependencies are slightly observed for the Le due to the similar Le values measured over all the surface types with the EC towers (see Fig. 1b and d). Coming back to Fig. 7, one can deduce that Noah is the LSM that provides the most realistic values in comparison with the AAF from a visual comparison, especially for Le.
The other LC category associated with large biases was the urban one. In this case, the effect on the RMSE of the total area was minimum, since only the 2.4% of the pixels have urban as dominant LC category. It should be also noted that the fluxes 465 over the urban surfaces for the AAF computation (used as benchmark) were approximated with a simple model (see Section 2.1.2), due to a lack of measurements. Hence, a fair discussion about the absolute values of the biases found over this type of LC was not possible since relatively high uncertainty also exists in the values used to evaluate it. However, the fluxes simulated by the LSMs were also too extreme, with values close to 0 Wm -2 for Le in the case of Noah and Noah-MP (see grey circles in Fig. 7f and g), and with surprisingly low SH and high Le for the case of CLM4 ( Fig. 7c and h).

470
Furthermore, the dependence of the AAF with the percentage of coverage of the dominant LC in each pixel is well observed in Fig. 7e and j, which is inherent to the methodology used in the AAF calculation. However, this dependence is hardly seen from the model outputs (as expected using the dominant approach, note the small slope of the scatter plots for each coloured category). As commented before, it should be noted that Noah, Noah-MP and RUC use a dominant approach for each grid cell, meaning that the LSM uses some surface information (LAI, roughness length, or albedo) only from the dominant LC (Li 475 et al., 2013). This led to well-differentiated fluxes values for each category (Fig. 7). The case of CLM4 is slightly different since it uses a 5-class sub-grid approach that separates the model grid surface into glaciers, lakes, wetland, urban and vegetated Hence, from the NEW-LC experiment, we can conclude that there is a high dependence of the fluxes on the LC type in WRF, which can lead to important biases if the parameters and processes associated with specific categories (conifer forest in this case) are not appropriate. These results agree with those found by Couvreux et al. (2016), where twelve IOPs of the same field campaign were simulated with ARPEGE (Courtier and Geleyn, 1988), AROME (Seity et al., 2011) and ECMWF models (Simmons et al., 1989), with grid sizes of 10 km, 2.5 km and 16 km, respectively. In this work, the SH was systematically 485 overestimated over the area analysed by the three models. Indeed, a larger SH overestimation was found for the ARPEGE model in two of the three evaluated pixels close to the area, which surface was represented as forest. This also highlights the issues found over this type of vegetation cover, which was enhanced with a dominant approach (as in WRF in this work and in AROME in Couvreux et al. (2016)). This simplified approach does not take advantage of the available sub-grid surface information and motivated the MOSAIC experiment, where the sub-grid heterogeneity was taken into account in the Noah and  (c and h) and RUC (d and i). The x-axis shows the total fraction covered by the dominant LC category in each respective pixel. The results are provided for the different LC categories with colours. Panels e and j represent the same but for the gridded area-averaged fluxes (AAF) calculated from the EC measurements, which are used for the model evaluation.

MOSAIC experiment
The changes in RMSE obtained after using the mosaic approach in Noah and RUC LSMs are shown in Table 4, with significant improvements in Le and especially for the case of Noah, with a reduction of almost 20 Wm -2 in the Le-RMSE (from 63 Wm -2 to 45 Wm -2 ). For the case of Noah, Fig. 8  increasing the percentage of cover of this LC category. Thus, if a pixel with 100% urban were present, its Le value would tend to 0 Wm -2 using a mosaic approach, as observed when using the dominant method in Fig. 8d.
However, this merging effect was only slightly observed for the case of the MOSAIC experiment in RUC (Fig. 9). This was probably caused by the fact that the mosaic approach used in RUC is not applied for the albedo values, contrary to the MOSAIC-Noah simulation. In the case of RUC, only average emissivity, LAI and roughness length are used based on the 510 percentage of each surface. Average albedo, which is the parameter that has the highest impact on the net radiation of each grid cell, subsequently affecting SH and Le, is not used Figure 10 shows the albedo differences between NEW-LC and MOSAIC used by the Noah (upper figures) and RUC models (bottom figures). While the values for MOSAIC-Noah (Fig. 10b) consisted of a weighted average from the different LC of each grid cell, it was not the case for MOSAIC-RUC (Fig. 10d), which diminished the impact of the mosaic approach application in 515 the RUC model.
It should be noted that adding a mosaic approach caused a change in the percentage technically used for some LC categories.
This is well observed in the comparison of the percentages shown in Fig. 1b  although in a much more diffuse way (concentrated in more pixels). However, the extreme effect of those pixels fully considered as by urban are removed. This is also the case for the crop pixels, that cover 9.9% of the area (wheat and corn) shown in Fig. 1b (taking into account the subgrid variability), but only 3.9% as dominant category in 1x1 km pixels (Fig. 1d).
In any case, the mosaic approach might be always more realistic than the dominant one; in the latter, the sub-grid information is not used and a unique LC type is defined, even when the combination of secondary and similar LC categories have a higher

FOREST experiment
The previous experiments revealed the large biases in simulated surface fluxes associated with conifer forests (ENF) for some LSMs (Noah-MP and CLM4). These models simulated too low values of Le and too high SH in comparison with the observed 535 values (Fig. 7). This was also the case for RUC ( Fig. 7d and i), but in this case the biases were not only limited to the ENF pixels but to all the categories.
Since the biases were only observed in ENF for Noah-MP and CLM4, we hypothesised that this was caused by a too high resistance to the transpiration parameterised for these type of trees. We explained in Section 2.3.4 the motivation for this experiment and we justified the changes applied, based on the parameter values used in Bonan et al. (2014) for ENF 540 forest. Specifically, we changed the values of the g 1 , g 0 and V cmax25 parameters. These modifications should lead to increased transpiration, and therefore, also to decreased SH. Indeed, this is observed in the results, the RMSE decreases to 77 Wm -2 for SH and to 56 Wm -2 for Le, which are lower errors compared to the NEW-LC-Noah-MP experiment (see Table 4). The results are consistent with the fluxes shown in Fig. 11, where the SH and the Le observed at the forest site (black line) were compared to the simulated values obtained with the central pixel of the domain completely (100%) covered by ENF (conifer, blue), or 545 by ENF with these parameters changed (red line). These two additional simulations shown in Fig. 11 would correspond to the NEW-LC Noah-MP and the FOREST Noah-MP experiments, respectively. However, in these graphics the effect of the change is analysed in a clearer way for the whole diurnal cycle. Again, it is demonstrated how the tuning of these parameters towards less resistance to transpiration provided better results than the original parameters tabulated for ENF, both for SH and Le. Specially, the slope of the Ball-Berry equation (g 1 ) is the parameter that most influenced the results, which does 550 not seem appropriate in this area for this type of trees, since it limits transpiration too much. As asserted in Medlyn et al. (2011), g 1 is the key parameter for plant stomatal conductance, being quite variable among species in areas with different environmental conditions. The authors suggested that g 1 should increase with increasing temperatures, as might be the case in our study, compared to the values tabulated for this LC category in the model. The results of the present work indicate that the tabulated values, which were developed for other region/dates or even with a different type of conifer or density of 555 trees, should be revised. This is consistent with the scientific demand for systematic experimental observations representative of different scales (Vilà-Guerau de Arellano et al., 2020), from the leaf-level (as stomatal conductance) to the landscape and model grid-scales (for example reliable area-averaged fluxes).

Analysis of the simulated evaporative fraction (EF)
The previous analyses provided information about the biases and the RMSE of SH and Le independently for the different 560 experiments and LSMs. However, these analyses did not take into account the total energy available to be partitioned into the atmospheric fluxes, which may be different depending on the surface type. Hence, in Fig. 12  In Noah (Fig. 12a), the DEFAULT experiment (blue) provided a large variation in the EF of each real LC category. This 570 resulted from the unrealistic and varied LC representation in this experiment. The NEW-LC experiment (red) partially corrected the large EF variability, especially for the grass and forest pixels. Therefore, in general, not only were the SH and Le biases corrected, but also the fluxes partitioning, except for the moor and urban pixels, with scarce presence. Besides, the MOSAIC experiment (in orange) served to improve the simulated values of EF, which were very close to those obtained from the AAF.
The DEFAULT experiment in Noah-MP (Fig. 12b) systematically underestimated the EF (blue) in comparison to the obser-575 vations (dark grey). The NEW-LC experiment (red) served to correct this underestimation in all LC categories except in the forest and urban grid cells. In the urban pixels, the Le was very close to 0 Wm -2 , leading to very low EF. As discussed before, the Le was significantly underestimated in the forest (ENF) pixels, while the SH was overestimated, which led to underestimation of EF. This was partially corrected in the FOREST experiment (represented with green in Fig. 12b) by applying modified parameters that made easier the transpiration from these plants. However, even with these changes, the EF values obtained 580 from the model were far from those from the observations.
The DEFAULT experiment in CLM4 (Fig. 12c in blue) showed EF values more similar to those observed. The NEW-LC experiment produced EF values even more similar to the observations (Fig. 12c in red), except for those pixels covered by forest or urban. For the forest, the NEW-LC had the same issue as Noah-MP (i.e., too low EF associated with too low Le and too high SH), with an important influence on the total RMSE, worsening the values (Table 4). For the urban grid cells, EF 585 values of 0.8 were observed, meaning that an important amount of the available energy was used for Le, which was unexpected and far from the AAF calculations.
RUC (Fig. 12d) exhibited the smallest sensitivity to the modifications performed in each experiment in comparison to the other LSMs. As commented before, the LC changes in RUC had a lower impact on the results, which where more impacted by the soil type (not shown) than by LC. Hence, RUC showed a general tendency to underestimate EF with systematic Le 590 underestimation and SH overestimation for all the LC categories, as it was previously observed. The results from the previous subsections revealed substantial impacts of LC on simulated fluxes and their partitioning, especially by some LSMs that simulated extreme values for specific LC categories. The impacts on the surface fluxes will also affect the associated atmospheric variables close to the surface, with a potential influence on the full development of the PBL 595 (Sühring et al., 2014). Hence, we think that the next step of this investigation will be to analyse the relative contribution of the changes in the surface representation of the model on the atmospheric variables within the whole PBL.
In this context, the evaluation of the simulated PBL height (zi) can be considered as a first indicator of the model ability to simulate the PBL structure. For our case study, Fig. 13  observations are those obtained from the AAF in each respective pixel (Site 1 and Site 2) at the times of the zi measurements from each instrument (slightly different, but within the time range from 11:00 UTC to 13:00 UTC, see Table 1). In general, the observations indicated a zi between 823 and 1100 m agl, depending on the instrument used to estimate this value. These values are within the range of variation of those simulated by WRF, which indicates a good performance of the model for the 4 LSMs used, especially if the comparison is done with the modelled values in the pixels where the two sites were located (red and blue 610 circles in Fig. 13 for Site 1 and Site 2, respectively).
For the case of Noah LSM (Fig. 13a), the simulation of zi did not show a clear dependence of zi on EF or on the LC type, fitting well with the observed values for most of the surfaces, with zi values ranging within the same range of variability observed from different instruments. This was also the case for the values simulated by Noah-MP (Fig. 13b) and CLM4 ( Fig.   13c) over the grass pixels. However, these LSMs also showed a higher variability in the simulated zi depending on EF and 615 on the type of LC. Conifer surfaces, which were associated with EF underestimation due to SH overestimation, systematically simulated higher values of zi. The same was observed for the urban pixels in Noah-MP (with extremely low EF values), while CLM4 simulated the lowest zi values over the urban surfaces, due to the very high simulated EF (low SH). For the case of RUC ( Fig. 13d), the dependence of zi on EF or LC category was smaller, but this LSM tended to provide slightly higher values of zi, also associated with lower values of EF (SH was systematically overestimated and Le underestimated).

620
The results shown in Fig. 13 were those obtained from the NEW-LC experiment, which are summarised in Table 5    Hence, the effect of applying a MOSAIC approach would be expected to have a larger impact on these LSMs with a larger range of fluxes and zi variability. From a general analysis, the MOSAIC-Noah experiment was the one with zi values closer to the observations (zi/zi obs = 1.02), which coincides with the general evaluation done for the surface fluxes (Table 4). These results indicate how the important effects of the LC type on the surface fluxes in the case of some LSMs (and especially for some LC categories) are transferred to the top of the PBL, affecting zi even from an analysis of this variable at a resolution 635 of 1x1 km. Thus, we have illustrated the impacts that the surface representation have on the development of the PBL in the model, which encourages further future scientific analysis on this topic. Specifically, it would be interesting to analyse how the different scales of the heterogeneous surface patches impact the PBL development in the model (Sühring et al., 2014). That is, future work will try to answer the questions about how the scale, the type (LC categories) and distribution of the heterogeneous patches of the surface affect the PBL development, as well as how these changes differ depending on the land-surface and PBL 640 schemes of the model. However, this analysis deserves a fully dedicated strategy disentangling the surface effects from those from advection, subsiding meso or synoptical motions (e.g., Pietersen et al., 2015) and entertainment processes at the top of the PBL (e.g., Garcia-Carreras et al., 2015), as well as extensive zi measurements at different points of the area analysed will be needed for the evaluation. This future research will help to understand how the surface forcing affects the PBL and to what extent the processes reproduced in the model differ from those observed in the reality.
The changes in the LC of the Earth's surface trigger varied and sometimes unpredictable consequences at different spatiotemporal scales, affecting biophysical processes in the soil and in the atmosphere. Hence, it is crucial to know more about the impacts of the LC changes on all these processes. In this work, we investigated the sensitivity of turbulent heat fluxes simulated by the WRF model to the manner in which the surface is represented in it.

650
To this aim, different sensitivity experiments were performed for a case study over an heterogeneous area in the south of France. They were evaluated with gridded area-averaged fluxes (AAF), computed from tower measurements installed over five vegetation types (forest, corn, wheat, grass and moor) during the BLLAST field campaign. In order to add robustness to the study and to detect differences, the experiments were carried out using four LSMs available in WRF: Noah, Noah-MP, CLM4 and RUC.

655
First, a control experiment was performed with the default options in WRF: LC from the IGBP-MODIS database and a dominant sub-grid approach, i.e., the model used the tabulated surface parameters of the LC category with the highest percentage of coverage in each pixel. We hypothesised that these simulations were limited because of the large differences between the LC representation and the actual surface, and because of the loosing of LC information at the sub-grid scale.
Thus, a new experiment was designed (named NEW-LC), which improved the surface representation by adding the LC 660 information from the 30-m resolution CESBIO maps, which were much more accurate and realistic in the area than the default IGBP-MODIS dataset. The observed changes in the surface fluxes were dependent on the LSM used, due to their differences in the parameters associated with each vegetation type, and also to their different representation of the surface processes. The improvement was clear for Noah for all the LC categories. RUC was the LSM that showed the weakest response of the fluxes to the LC categories, without substantial changes in the scores. Noah-MP and CLM4 showed some improvements in those 665 pixels covered by crops or grass, but they also exhibited an important SH overestimation and Le underestimation in the surface fluxes simulated over those pixels mainly covered by conifer forest (ENF). The ENF biases contributed significantly to the total model error due to their relatively high percentage of coverage (45%) in the analysed area. The NEW-LC experiment revealed the need for a correct representation of LC in the analysed area, in part due to the high dependency of the fluxes on the LC categories. In addition, the appropriate characterization of surface parameters associated with some LC categories (e.g.,

670
conifer) still needs to be improved, as it was also discussed in previous works (e.g., Li et al., 2013;Cuntz et al., 2016).
In the second experiment (named MOSAIC), the sub-grid heterogeneity (below 1 km) was taken into account with a mosaic approach in Noah and RUC, meaning that the fluxes in each grid were calculated as weighted averages from individual surface fluxes obtained from each tile or LC category. The mosaic approach caused more homogeneity among the surface fluxes simulated in the analysed pixels, which corresponded better to the AAF used as benchmark data. This improvement in the 675 surface representation led to improved scores for Noah (especially for Le) while smaller changes were observed in RUC. This was because in RUC the mosaic approach did not include the use of pixel-averaged albedo values based on the percentages of each LC category, as done for the surface roughness, LAI and emissivity. In Noah, the albedo was also averaged, significantly contributing to the improvements, since the albedo is the parameter that seems to have a larger impact on the net radiation available to be partitioned into SH and Le.

680
Finally, a last experiment (named FOREST) was motivated by the issues found in the conifer pixels for some LSMs (especially Noah-MP and CLM4). The modifications in the FOREST experiment were conducted to reduce the resistance of conifer trees to transpiration, using updated parameters as used in Bonan et al. (2014). The effect of these changes was to facilitate the transpiration processes, coinciding better with the observations and improving the scores.
This work demonstrates again the importance of a correct representation of LC in the area which is evaluated, as it was also 685 shown in previous works (Cheng et al., 2013;Schicker et al., 2016;Jiménez-Esteve et al., 2018). This can considerably affect the simulation of the fluxes that will drive the associated boundary-layer processes, as discussed in the last section, where the study of the impacts on the PBL is encouraged as future research. Besides, it is worth using a mosaic approach to benefit from the sub-grid surface information that is normally available.
In any case, the particular conditions of the region analysed make that some specific conclusions might not be applicable to 690 other regions. For example, it is also possible that in our case we observed more ET over the conifer trees than in the model due to the possible particularities of the area. Hence, the parameters adjusted in the model for the conifers could be due to the differences among species belonging to the same LC category (Granier et al., 1989), tree density, tree age (Sellin, 2001) The possible uncertainties in the EC measurements used to evaluate the model should be also taken into account, especially over those vegetation types where it is, somehow, more difficult to have accurate or representative high-frequency measurements, as in the case of the forest. Furthermore, the calculation of the AAF consists on important assumptions based on the spa-700 tial extrapolation of EC data that can add uncertainty to the data used for the evaluation. In our case, they were also constructed using a simple estimation of the fluxes for urban surfaces, due to the lack of measurements. This could be also associated with errors (although with a small percentage of coverage in the analysed area). All these necessary simplifications that were done highlight again the importance of having extensive measurements over a wide variety of surface types (Cuxart and Boone, 2020) and including atmospheric, soil measurements, and those related to the plant physiology and status, even with field 705 strategies more ambitious than the BLLAST campaign, which already included a quite important deployment of instruments.
This will help to improve the evaluation process of models, a needed step to continue advancing in their development. Table A1 shows the LC transformation performed in the NEW-LC experiment.

NEW-LC experiment
Two variables were modified in the geo_em_d04.nc file (the fourth-domain output from the geogrid.exe program of the WRF preprocessing system (WPS)). On the one hand, the LANDUSEF variable was modified, including the new percentages for each LC category in all the 1-km grid cells of the fourth domain. The same was done for the rest of the domains, but only in the area covered by the inner domain. On the other hand, the dominant LC category in each pixel was calculated based on the 715 new information, which served to modify the LU_INDEX variable of the same files than before.
These files with the modified LANDUSEF and LU_INDEX variables were re-incorporated to the WPS system and the rest of the preprocessing programs were executed, i.e., ungrib.exe and metgrid.exe, obtaining the final met_em files used to run the model. Note how in order to use these modified files in the model simulations, the surface_input_source parameter in the WRF namelist.input file was set to 3.

MOSAIC experiment
To activate the mosaic approach, some options should be included in the namelist.input file of WRF. In the case of Noah, the sf_surface_mosaic option should be set to 1. Besides, the mosaic_cat option indicates the maximum number of tiles to be used, which was set to 19, the maximum possible in our area. In the case of RUC, the mosaic_lu and mosaic_soil should be included and set to 1.

FOREST experiment
The three parameters modified in the FOREST experiment were changed in the MPTABLE.TBL file of WRF. This file contains the vegetation parameters tabulated of the different LC categories for the two LC datasets available in WRF. Since our simulations used the IGBP-MODIS LC dataset, we changed these parameters in its corresponding section within the file.
Specifically, we modified MP from its original value (6) to 9, BP from 0.002 to 0.01 mol H 2 O m -2 s -1 , and V cmax25 from 50 to 730 62.5 µmol m -2 s -1 for the LC type 1 (ENF). This was done before running the WRF simulation with the Noah-MP LSM. Model Global Tropospheric Analyses data, continuing from July 1999 (NCEP, 2000), which were used to initialise the WRF model. Finally,