Land surface model influence on the simulated climatologies of temperature and precipitation extremes in the WRF v3.9 model over North America

The representation and projection of extreme temperature and precipitation events in regional and global climate models are of major importance for the study of climate change impacts. However, state-of-the-art global and regional climate model simulations yield a broad inter-model range of intensity, duration and frequency of these extremes. Here, we present a modeling experiment using the Weather Research and Forecasting (WRF) model to determine the influence of the land surface model (LSM) component on uncertainties associated with extreme events. First, we analyze land–atmosphere interactions within four simulations performed by the WRF model from 1980 to 2012 over North America, using three different LSMs. Results show LSMdependent differences at regional scales in the frequency of occurrence of events when surface conditions are altered by atmospheric forcing or land processes. The inter-model range of extreme statistics across the WRF simulations is large, particularly for indices related to the intensity and duration of temperature and precipitation extremes. Our results show that the WRF simulation of the climatology of heat extremes can be 5 C warmer and 6 d longer depending on the employed LSM component, and similarly for cold extremes and heavy precipitation events. Areas showing large uncertainty in WRF-simulated extreme events are also identified in a model ensemble from three different regional climate model (RCM) simulations participating in the Coordinated Regional Climate Downscaling Experiment (CORDEX) project, revealing the implications of these results for other model ensembles. Thus, studies based on multi-model ensembles and reanalyses should include a variety of LSM configurations to account for the uncertainty arising from this model component or to test the performance of the selected LSM component before running the whole simulation. This study illustrates the importance of the LSM choice in climate simulations, supporting the development of new modeling studies using different LSM components to understand inter-model differences in simulating extreme temperature and precipitation events, which in turn will help to reduce uncertainties in climate model projections.

Abstract. The representation and projection of extreme temperature and precipitation events in regional and global climate models are of major importance for the study of climate change impacts. However, state-of-the-art global and regional climate model simulations yield a broad inter-model range of intensity, duration and frequency of these extremes. Here, we present a modeling experiment using the Weather Research and Forecasting (WRF) model to determine the influence of the land surface model (LSM) component on uncertainties associated with extreme events. First, we analyze land-atmosphere interactions within four simulations performed by the WRF model from 1980 to 2012 over North America, using three different LSMs. Results show LSMdependent differences at regional scales in the frequency of occurrence of events when surface conditions are altered by atmospheric forcing or land processes. The inter-model range of extreme statistics across the WRF simulations is large, particularly for indices related to the intensity and duration of temperature and precipitation extremes. Our results show that the WRF simulation of the climatology of heat extremes can be 5 • C warmer and 6 d longer depending on the employed LSM component, and similarly for cold extremes and heavy precipitation events. Areas showing large uncertainty in WRF-simulated extreme events are also identified in a model ensemble from three different regional climate model (RCM) simulations participating in the Coordinated Regional Climate Downscaling Experiment (CORDEX) project, revealing the implications of these results for other model ensembles. Thus, studies based on multi-model ensembles and reanalyses should include a variety of LSM configurations to account for the uncertainty arising from this model component or to test the performance of the selected LSM component before running the whole simulation. This study illustrates the importance of the LSM choice in climate simulations, supporting the development of new modeling studies using different LSM components to understand inter-model differences in simulating extreme temperature and precipitation events, which in turn will help to reduce uncertainties in climate model projections.

Introduction
General circulation models (GCMs) and regional climate models (RCMs) are currently the most useful tools for the study of processes affecting the frequency, duration and intensity of extreme temperature and precipitation events, as well as projecting their evolution under different emission scenarios at global, regional and local scales. Both observational data and climate model simulations confirm that all of these statistics respond to climate change (Seneviratne et al., Published by Copernicus Publications on behalf of the European Geosciences Union.  Orlowsky and Seneviratne, 2012;Jeong et al., 2016). However, state-of-the-art global and regional climate models differ substantially in their representation of the climatology and response to warming of various indices of temperature and precipitation extremes (Sillmann et al., 2013a,b). Climate information provided by models is currently employed by public and private institutions dedicated to the evaluation and management of risks from extreme events and associated disasters (IPCC, 2013(IPCC, , 2019. It is, therefore, essential that climate models represent extreme events and their evolution as realistically as possible to aid in the design of appropriate policies to mitigate climate change and build resilience. In this study, we analyze the representation of a set of extreme indices, previously included in international reports such as the IPCC (2013) and Seneviratne et al. (2012), as simulated by the Weather Research and Forecasting (WRF) model with different land surface model (LSM) components. We focused on the climatology of these extreme indices, that is, the mean of each index from 1980 to 2013.
Soil conditions are coupled to near-surface atmospheric phenomena through energy and water exchanges at the ground surface. The representation of the interactions between the land surface and the near-surface atmosphere has been identified as a key factor in the simulation of extreme events (e.g., Lorenz et al., 2016;Vogel et al., 2017). For example, changes in soil moisture and soil properties may lead to variations in energy fluxes at the land surface affecting temperature and precipitation evolution. Changes in latent heat flux affect surface temperatures in the following way: a decrease in latent heat flux likely means an increase in the energy available for sensible heat flux, which is directly related to the air-ground temperature gradient. The increase in sensible heat flux yields an increase in this temperature gradient, which may lead to changes in air temperatures (Seneviratne et al., 2010). Meanwhile, changes in latent heat flux also yield changes in the atmospheric water content, possibly affecting the formation of clouds and precipitation (Seneviratne et al., 2010). Previous observational studies have shown the impact of soil moisture deficits on hot extreme temperatures through changes in evapotranspiration over southeastern and western Europe and Russia (Hirschi et al., 2011;Miralles et al., 2012;Hauser et al., 2016). Additionally, soil moisture regimes have been found to alter the energy and water exchanges at the surface, influencing interannual summer temperature variability in central parts of North America  and precipitation events in western North America (Diro et al., 2014). Land-atmosphere interactions, and consequently near-surface conditions, are influenced by vegetation and snow cover (Stieglitz and Smerdon, 2007;Diro et al., 2018). For example, Diro et al. (2018) showed that interactions between snow cover and atmospheric processes influence extreme events, increasing the frequency of cold events over western North America and affecting the variability in warm events over northeast Canada and the Rocky Mountains.
Metrics built on the representation of land-atmosphere interactions have been employed as a basis for evaluating extreme temperature and precipitation events in climate model simulations (Knist et al., 2016;Davin et al., 2016;Lorenz et al., 2016;Sippel et al., 2017;Gevaert et al., 2018;García-García et al., 2019). For example, Lorenz et al. (2016) evaluated outputs from six GCMs participating in the Global Land-Atmosphere Coupling Experiment of the Coupled Model Intercomparison Project phase 5 (GLACE-CMIP5) and concluded that ranges of intensity, frequency and duration of extreme events among climate projections are strongly related to inter-model differences in the representation land-atmosphere interactions. Gevaert et al. (2018) evaluated the representation of land-atmosphere interactions within a set of offline LSM simulations, finding similar spatial patterns of soil moisture-temperature coupling among LSM simulations but large variability in the degree and local patterns of land-atmosphere coupling. García-García et al.
(2019) employed a simple metric derived from soil and air temperatures to evaluate outputs from the CMIP5 models against observations over North America, suggesting a strong dependence of the simulated land-atmosphere interactions on the LSM component employed. The model differences in the representation of land-atmosphere interactions shown in these studies may be affecting the simulation of extreme events and thus contributing to the uncertainty in multimodel ensembles such as those formed by the CMIP5 and the Coordinated Regional Climate Downscaling Experiment (CORDEX) simulations.
The choice and complexity of the LSM component may have implications for the representation of land-atmosphere interactions in reanalysis products, since reanalysis products have shown discrepancies in the representation of landatmosphere coupling with observations (Ferguson et al., 2012;García-García et al., 2019). However, in contrast to the variety of LSM components employed in the new generation of GCMs, reanalyses use simplified versions of LSM components, typically included as part of the atmospheric model component. For example, all reanalysis products produced by the European Centre for Medium-range Weather Forecasts (ECMWF) models (CERA-20C, ERA-15, ERA20C, ERA-Interim, ERA-40 and ERA5 products) employed different versions of the same LSM component included in the code of the ECMWF atmospheric model (Hersbach et al., 2018). The two Modern-Era Retrospective analysis for Research and Applications (MERRA and MERRA2) global products employed similar versions of the GEOS-5 catchment land surface model (Reichle et al., 2011;Molod et al., 2015). The Japanese Reanalysis (JRA) products employed a modified version of the Simple Biosphere (SiB) LSM (Onogi et al., 2007), while most of National Centers for Environmental Prediction (NCEP) and National Center for Atmospheric Research (NCAR) products employed the Noah LSM (Tewari et al., 2004). The complexity and variety of these LSM components are limited in order to reduce computa-tional costs, affecting the quality of the represented land surface processes. This has already been noted by the scientific community, and some have attempted to address the issue by incorporating updated versions of LSMs in new land reanalysis products through offline LSM simulations forced by observational data products (LDAS, MERRA-land, ERA-Iterim/Land, ERA5-land; Rodell et al., 2004;Reichle et al., 2011;Balsamo et al., 2015;Hersbach et al., 2018). Although these new products can be useful for LSM development and provide data about the soil states and fluxes (Balsamo et al., 2015), the offline character of the new land products inhibits the representation of land-atmosphere coupling and feedbacks.
Here, we perform a set of modeling experiments to examine for the first time the influence of the LSM component on the simulation of key extreme indices and land-atmosphere interactions within land-atmosphere coupled climate simulations at continental scales. For this purpose, four regional simulations are performed over North America  using the WRF model including three different LSM components widely employed in model simulations and reanalysis products, as described in Sect. 2. To explore the influence of the LSM component on the simulation of extreme events in multi-model ensembles, we compare the uncertainty in the representation of extreme indices within our four WRF simulations with the uncertainty in three simulations participating in the North American component of the CORDEX project (NA-CORDEX). The methodology for the analysis of landatmosphere interactions and the representation of extreme events is described in Sect. 3. Section 4 presents the examination of land-atmosphere interactions, the analysis of LSM differences in the representation of temperature and precipitation extremes, and the comparison between the WRF simulations and three CORDEX simulations. A discussion about previous results and the main conclusions and implications of this study are presented in Sects. 5 and 6, respectively.

Description of the modeling experiment
We performed four regional simulations over North America (NA) using the version 3.9 of the Advanced Research WRF (WRF-ARW) model (Michalakes et al., 2001) including three different land surface models: the Noah LSM (Noah; Tewari et al., 2004), the Noah LSM with multiparameterization options (Noah-MP; Niu et al., 2011) and the Community Land Model version 4 LSM (CLM4; Oleson et al., 2010). Vegetation cover was prescribed in these three simulations (Noah, Noah-MP and CLM4); an additional simulation was conducted with dynamic vegetation cover in the Noah-MP LSM (Noah-MP-DV), allowing for the evaluation of the influence of dynamic vegetation on extremes. The use of different LSM configurations in a RCM permits the study of the influence of surface and soil processes on the simu-lated climate system in contrast to LSM offline simulations (Laguë et al., 2019).
The LSM components employed have been previously included in climate model studies or in reanalysis products. The CLM4 LSM component has been coupled to several GCMs participating in the CMIP5 project (Collins et al., 2006;Vertenstein et al., 2012). The Noah LSM has been extensively used for reanalysis products, as well as for RCM simulations such as those participating in the CORDEX project (Mesinger et al., 2006;Katragkou et al., 2015). The Noah-MP LSM has been selected for current studies using WRF (e.g., Liu et al., 2017). The Noah LSM is a rather basic LSM developed by NCAR and NCEP based on the Oregon State University (OSU) LSM (Mitchell, 2005). This LSM component describes soils using four layers with thicknesses of 10, 30, 60 and 100 cm, using a zero-flux bottom boundary condition at a depth of 2 m. The Noah LSM estimates soil moisture and temperature at the node of each soil layer, taking into account snow cover, canopy moisture and soil ice. The Noah-MP LSM is based on the Noah LSM, introducing relevant improvements, such as a dynamic vegetation option; a new separated vegetation canopy cover that improves the computation of energy, water and carbon fluxes at the surface; a separate scheme for computing energy fluxes over vegetated surfaces and bare soils; a new three-layer snow model; a more permeable frozen soil; and an improved description of runoff and soil moisture. Although the Noah-MP LSM is the updated version of the Noah LSM and has been shown to improve the simulation of surface processes in comparison to the Noah LSM (e.g., Niu et al., 2011;Yang et al., 2011), the Noah-MP LSM has not yet been implemented in any reanalysis product. The CLM4 represents one of the most advanced LSM components, incorporating a detailed description of biogeophysics, hydrology and biogeochemistry. The CLM4 classifies vegetation cover using up to 16 different plant functional types, considering the physiology and structure of different plants. The soil vertical structure is divided into a layer for the vegetation canopy, five layers for snow cover and 10 soil layers, placing the zero-flux bottom boundary condition at approximately 4.32 m. The main characteristics of the employed LSM components are summarized in Table 1.
Beyond the structural differences among LSM components, the remaining options and parameters are identical for the four WRF simulations. Boundary conditions for the WRF experiments are provided by the North American Regional Reanalysis (NARR) product, which is formed by the NCEP Eta atmospheric model, the Noah LSM and the Regional Data Assimilation System (RDAS) (Mesinger et al., 2006). NARR data are provided with a 32 km grid and 3hourly temporal resolution, available at the National Centers for Environmental Prediction (NCEP) archive. The domain set for the WRF simulations has 50 km horizontal resolution and 27 atmospheric levels, covering North America in a Lambert projection. The land use categories employed for  Up to 10 vegetation types in one grid cell Prescribed 10 4.32 m Up to five layers Oleson et al. (2010) the four simulations ( Fig. S1 in the Supplement) are derived from the Moderate Resolution Imaging Spectroradiometer (MODIS; Barlage et al., 2005). Sea surface temperatures were prescribed using the NARR product. The four WRF simulations start on 1 January 1979, which is the first year of the NARR product, and end on 31 December 2012, using a time step of 300 s for the model integrations. We use the first year of each simulation as spin-up and the other 33 years for the analysis. The selection of the first year as spin-up was done considering the initialization period previously used in WRF climate experiments, such as those in Wang and Kotamarthi (2015), Katragkou et al. (2015) and Barlage et al. (2005). The comparison of the latent heat flux and surface air temperature outputs from the WRF-CLM4 simulation starting on 1 January 1979 and a similar simulation starting on 1 June 1979 indicates that this period is enough to initialize the simulation (Figs. S22 and S23 in the Supplement). The employed physics parameterizations include the WRF single-moment (WSM) six-class graupel scheme for the microphysics (Hong and Lim, 2006), the Grell-Freitas ensemble scheme for cumulus description (Grell and Freitas, 2014), the Yonsei University scheme as planetary boundary layer scheme (YSU; , the revised MM5 Monin-Obukhov scheme for the surface layer (Jiménez et al., 2012) and the Community Atmosphere Model (CAM) scheme for the integration of radiation physics each 20 min intervals (Collins et al., 2004). The gap in resolution from the employed boundary conditions (32 km) to the final simulations (50 km) can be counterintuitive for a RCM experiment. The computational resources saved with this coarse resolution allow us to perform simulations long enough for the study of land-atmosphere interactions and extreme events at climatological scales and yet similar horizontal resolution and domain to those employed in the North American component of the CORDEX project (Giorgi and Gutowski, 2015) can be attained. Thus, this decrease in resolution allows us to generate a set of four WRF sensitivity experiments using different LSM configurations. Additionally, we do not apply any nudging technique, ensuring that the RCM evolves freely according to each LSM component and its representation of land-atmosphere interactions.

Methodology
Different metrics have been employed in the literature for the evaluation of land-atmosphere interactions within climate model simulations and observations. Among these metrics, we selected the Vegetation-Atmosphere Coupling (VAC) index (Zscheischler et al., 2015) as our evaluation metric for the representation of land-atmosphere interactions at monthly scales. This index has been previously employed in the literature to identify regions with episodes of strong landatmosphere coupling within climate model simulations and observational data (Zscheischler et al., 2015;Gevaert et al., 2018;Sippel et al., 2017;Philip et al., 2018). The VAC index is segregated in four categories based on the simultaneous occurrence of some given extreme percentile rages of surface air temperature (SAT) and latent heat flux (LH; Philip et al., 2018): VAC a if SAT < 30th pctl. and LH < 30th pctl.
→ land control coupling 0 otherwise. (1) Extremes of SAT and LH are defined as values exceeding (below) the 70th (30th) percentile, relative to a 20-year period (1980-2000) (Eq. 1). We use the VAC metric at monthly scales as in Sippel et al. (2017), since this work proved the usefulness of the VAC metric at monthly timescales for the analysis of the climatology of extreme indices. The VAC index classifies areas depending on the soil moisture regime into energy-limited areas, where atmospheric conditions controls land-atmosphere interactions (VAC a and VAC b ), and into water-limited areas, where soil moisture deficits control the water and energy exchanges at the air-ground interface (VAC c and VAC d ). As explained in Zscheischler et al. (2015), the VAC a category is associated with energy limitations (low SAT) caused by the presence of clouds and precipitation, which leads to a decrease in the vegetation photosynthetic activity and therefore an increase in soil moisture. The VAC b category is frequent in wet areas with high SAT, usually related to a clear sky and high radiation, which is associated with an increase in the vegetation photosynthetic activity inducing the depletion of soil moisture. During VAC c episodes, the combination of high SAT and soil moisture deficits leads to diminished vegetation photosynthetic activity, followed by low precipitation and consequently low soil moisture and high SAT, promoting heat waves and droughts. The VAC d category is associated with high precipitation over dry soils which stimulates vegetation photosynthetic activity, increasing soil moisture and decreasing SAT. A no-coupling option also occurs when SAT and LH extremes do not coincide in time.
We calculate the frequency of occurrence for each VAC category using deseasonalized and detrended monthly SAT and LH time series following the typical methodology (Sippel et al., 2017) at each grid cell from 1980 to 2012 (hereafter referred to as the analysis period). The frequency of occurrence for each VAC category is calculated by counting the VAC events for the analysis period seasonally: in boreal winter (December, January and February; DJF), in spring (March, April and May; MAM), in summer (June, July and August; JJA) and in fall (September, October and November; SON). The probability of each VAC category (Figs. S2-S5 in the Supplement) and the probability of the no-coupling case sum 100 % over the analysis period at each grid cell. The VAC probabilities of occurrence for each category are considered significant when they are higher than the 95th percentile of the population obtained by 100 randomly sorted 34-year time series of SAT and LH. For the study of landatmosphere coupling within each simulation, we represent the averaged frequency of events under atmospheric control (VAC a and VAC b ) and under land control (VAC c and VAC d ) at grid cells with significant frequency of occurrence for at least one of the two VAC categories.
After the analysis of land-atmosphere interactions in our set of simulations, we assess the representation of extreme events across the WRF simulations coupled to different LSM components. There are several definitions of indices related to temperature and precipitation extremes, mainly using thresholds based on absolute values or statistical percentiles (e.g., Sillmann et al., 2013a). Studies based on statistical percentiles improve the comparison among models but hamper the interpretation of results by losing the physical meaning of the variable (temperature or precipitation). Although the use of extreme indices defined with absolute values facilitates the understanding of results by a general public, these indices could include model-specific biases. These biases can be corrected by bias removal techniques; however, the advantage of applying bias removal techniques is not clear for the study of future climate trends and climate variability, since these techniques have been proven to modify the spatiotemporal consistency of climate models as well as internal feedback mechanisms and conservation terms (Ehret et al., 2012;Cannon et al., 2015). Additionally, the simulation of absolute temperatures is of central importance for temperature dependent processes that may have important consequences for society and ecosystems, such as soil carbon processes (Hicks Pries et al., 2017). Since extreme indices based on both absolute values and statistical thresholds present advantages and disadvantages, we selected a set of indices including both categories from the list of 27 indices recommended by the Expert Team on Climate Change Detection and Indices (ETCCDI; Karl et al., 1999, Table 2). The employed intensity indices of temperature events are based on temperature values on the hottest day and coldest night in summer and winter for warm and cold events. The frequency indices of the same events indicate the percentage of hot and cold days and nights in the year. The duration of the temperature events is represented with the number of consecutive hot days and cold nights. The intensity of heavy precipitation events is characterized by the total annual precipitation in wet days, while the frequency of precipitation events is studied using the number of very wet days per year. The duration of wet and dry events is represented with the annual number of consecutive wet and dry days. For more specific definitions of the indices employed in this study, please refer to Table 2. Since we are interested in the climatology of extreme events, temporal averages of each annual index are computed for the analysis period at each grid cell for each WRF experiment. Then, we compute the inter-model range of each index across the WRF simulations (i.e., the difference between the maximum and minimum values at each grid cell considering the four WRF simulations), using it as metric for the uncertainty in the WRF simulation of extreme events arising from the LSM component.
The effect of the LSM configuration on the simulation of extreme events can also be relevant for multi-model ensembles, such as those participating in the CORDEX project. Here, we compare the LSM effect on the WRF simulation of extreme temperature and precipitation events with the representation of extreme events by three different RCMs participating in the NA-CORDEX program (Mearns et al., 2017). For this purpose, we use the daily outputs from three NA-CORDEX simulations forced by reanalysis data (evaluation experiments; Table 3). These CORDEX simulations were performed by the WRF model (Skamarock et al., 2008), the Rossby Centre atmospheric model version 4 (RCA4) (Samuelsson et al., 2011) and the Canadian Regional Climate Modelling Network -Université du Québec à Montréal (CRCM-UQAM) model (Martynov et al., 2013) using boundary conditions from the ERA-Interim reanalysis (Dee et al., 2011). The remaining NA-CORDEX evaluation simulations available in the Climate Data Gateway at NCAR were not used because those simulations cover a significantly shorter period of time than our simulations. The spatial domain and resolution of the NA-CORDEX simulations are similar to that of the WRF simulations, as indicated in Sect. 2. Refer to Table S1 for information about the availability of the data employed in this work.    Martynov et al. (2013) 4 Results

Examination of land-atmosphere interactions in WRF simulations
All WRF simulations with different LSM components display similar spatial patterns for VAC categories, agreeing in seasonality and broadly in the regional classification of energy and water limited areas (e.g., areas with high probability of episodes where atmospheric forcing or soil conditions control land-atmosphere interactions) ( Figs. 1 and 2). Atmospheric forcing controls surface processes at middle and high latitudes in MAM, JJA and SON, moving southward in DJF (Fig. 1). Areas frequently driven by soil conditions are displayed over the western Mexican coast in DJF, spreading across low and middle latitudes in MAM, JJA and SON (Fig. 2). These spatial similarities in the VAC coupling metric indicate that factors common in our four simulations, such as land cover, topography, latitudinal differences or atmospheric parameterizations produce these spatial patterns.
Despite the broad agreement between LSM simulations in the spatial distribution of the VAC categories, there are regional differences in their representation of land-atmosphere coupling. These regional differences allow us to identify the Noah LSM as the one simulating the weakest annual land control on processes at the surface, mainly due to a relatively weak land control during MAM and JJA (Fig. 2). The areas where LSM simulations differ in the probability of episodes under atmospheric control (VAC a and VAC b ) vary with the season; for example, the Noah-MP LSM simulates a large area under atmospheric control over the southeastern US in DJF, while the CLM4 and Noah LSMs identify atmospheric control areas below the Great Lakes following a northwestern direction (Fig. 1). These differences in atmospheric control areas are caused by the different probability of extreme latent heat flux simulated by each LSM in DJF (Figs. S6 and S7 in the Supplement). In MAM, the Noah-MP LSM represents higher probability of atmospheric control episodes over the northern US in comparison with the CLM4 and Noah simulations (Fig. 1). The Noah simulation shows the strongest atmospheric control in JJA as compared with the remaining simulations, particularly over eastern and western regions of Hudson Bay, the southeastern US and small areas in Mexico (Fig. 1). This strong JJA atmospheric control in the Noah simulation is driven by the VAC a category (Fig. S2 in the Supplement) and likely related to the high probability of cold temperatures over these areas in this simulation (Fig. S9 in the Supplement). During SON, the Noah-MP LSM reaches the highest probability of episodes under atmospheric control at middle and high latitudes, caused by the high probability of extreme latent heat flux in comparison with the rest of the LSMs (Figs. S6 and S7 in the Supplement). The contribution of the VAC a and VAC b categories to these episodes is broadly similar across LSMs, with slightly higher VAC a in all seasons; mod-est LSM-specific differences include a tendency for the Noah simulation to show slightly higher VAC a probabilities across all seasons (but especially DJF) (Figs. S2 and S3).
Although the Noah simulation displays the weakest land control for all seasons, it shows regions under land control over northwestern North America in DJF also indicated by the CLM4 simulation but absent in the Noah-MP and Noah-MP-DV simulations (Fig. 2). The probability of land control episodes over the western Mexican coast is higher in the CLM4 and Noah-MP simulations than in the Noah and Noah-MP-DV simulations in DJF. These LSM differences are associated with the high probability of low latent heat flux over those regions in winter for the CLM4 and the Noah-MP simulations in comparison with the remaining simulations (Figs. S7 in the Supplement). In JJA, however, the Noah-MP-DV simulation presents a stronger land control at low and middle latitudes than the Noah-MP simulation (Fig. 2), mainly caused by the VAC d category and the high probability of cold temperatures (Figs. S5 and S9 in the Supplement). There are also regional differences between LSM simulations in SON, particularly over the southeastern US coast where the CLM4 shows the strongest land control, followed by the Noah-MP simulation (Fig. 2). The Noah-MP-DV simulation does not show this strong land control at low latitudes in SON due to the low probability of high latent heat flux represented by the Noah-MP LSM with dynamic vegetation (Fig. S6 in the Supplement). The weaker land control in the Noah simulation, however, is not explained by the probability of extreme temperature or latent heat flux, since these probabilities are similar to those in the CLM4 simulation (Figs. S6-S9 in the Supplement). Thus, it is associated with the absent of coincidences of extreme temperature and latent heat flux simulated by the Noah LSM. Exploring the contribution of VAC c and VAC d separately, it is shown they present small differences; for example, the VAC c probability in DJF is slightly higher than the VAC d probability for all simulations, showing the opposite behavior in JJA for the Noah-MP and the Noah-MP-DV simulations (Figs. S4 and S5 in the Supplement).

Climatologies of temperature and precipitation extremes in the WRF simulations
We continue this analysis comparing the representation of extreme events within the four WRF simulations by calculating the range among these four simulations. But first, we analyze the spatial features of the climatology of extreme temperature and precipitation indices as simulated by the mean of the four WRF simulations with different LSM configurations (hereafter WRF ensemble mean) and by each LSM simulation separately. The climatologies of temperature and precipitation extreme indices, as described in Table 2 and represented by the mean of each index for the analysis period, show similar spatial patterns across all WRF simulations with differ- ent LSM configurations (Figs. S10, S11 and S12 in the Supplement). The similarities in the spatial pattern of extreme events among our simulations indicate that factors other than the LSM configuration, such as land cover, topography, latitudinal differences and atmospheric parameterizations, are driving these spatial features. Figure 3 represents the simulated climatologies of all extreme indices for the ensemble mean, formed by the four WRF simulations. The WRF ensemble mean shows the most intense cold events at high latitudes and high elevations, with cold events being more frequent and longer over northwestern North America and over Mexico (Fig. 3a). The simulation of warm events is more in-tense in coastal areas of the US and Mexico and over the central US, being more frequent and longer over southern North America with a high percentage of hot nights over northeastern NA (Fig. 3b). Precipitation events are heavier and more frequent at higher elevations and over southeastern NA (Fig. 3c). The longest dry periods are simulated over the western Mexican and US coasts, reaching more than 80 consecutive dry days, while the longest wet periods are represented over the Rockies and the northwestern Mexican coast (Fig. 3c). Figure 4 summarizes the averaged climatology of each extreme index for each simulation. Averages are computed over six regions adapted from Giorgi and Francisco (2000): Central America, CAM; western North America, WNA; central North America, CNA; eastern North America, ENA; Alaska, ALA; and Greenland, GRL. Although there are differences between our regions and those defined in Giorgi and Francisco (2000), we kept the same nomenclature for an easy comparison. That is, we label this region as GRL, although our northeastern Canadian region does not include Greenland. Colors in the figure correspond to the hottest (red) and coldest (blue) index values among the WRF simulations for the representation of cold and warm temperature extremes, and to the driest (brown) and wettest (green) index values for the representation of precipitation extremes over each region. This approach helps us to identify the CLM4 simulation as that with the weakest and shortest cold extreme events, although simulating more frequent cold events than the rest of the LSM components (Fig. 4a). Meanwhile, the Noah-MP-DV simulation shows more intense cold extremes during shorter periods over most of the regions (CAM, CNA, ENA and ALA) in comparison with the Noah-MP simulation which uses prescribed vegetation (Fig. 4a). The CLM4 simulation also corresponds to the most intense representation of warm extremes for the index based on maximum temperatures, while the intensity index based on minimum tem- Figure 3. Climatology of extreme indices associated with cold temperature events (a), warm temperature events (b) and precipitation events (c) for the ensemble mean, formed by the four WRF simulations (Table 2: TXx/TNn, maximum/minimum value of the maximum/minimum daily temperatures; TN10p/TX10p, percentage of cold nights/days; TN90p/TX90p, percentage of hot nights/days; CSDI/WSDI, cold/warm spell duration index; R95p, total annual precipitation in wet days; R10 mm, number of wet days in a year; CDD/CWD, consecutive dry/wet days). The climatology of each index is estimated as the mean of each extreme index at each grid cell for the analysis period (1980-2012). peratures shows higher values in the Noah-MP simulation, except for the CAM region (Fig. 4b). The Noah simulation is associated with the weakest and shortest warm extremes over most areas, and the Noah-MP and Noah-MP-DV simulations with the most frequent and longest events. The effect of dynamic vegetation seems to weaken hot extremes at nights over all regions, making them longer at middle and high latitudes (CNA, ENA, ALA and GRL), except in the western US (Fig. 4b). That is, the Noah-MP-DV simulation yields warm events longer but not as hot as using prescribed vegetation in most regions at middle and high latitudes. For extreme precipitation events, the CLM4 simulation shows the most intense and frequent precipitation events over most ar-eas, while the Noah simulation shows the weakest and the least frequent precipitation events (Fig. 4c). The Noah-MP simulation produces the longest dry periods over all regions except at high latitudes, where the Noah-MP-DV simulation yields a higher number of consecutive dry days (Fig. 4c). The simulation with dynamic vegetation yields wetter results than the simulation with prescribed vegetation at middle and low latitudes, while at high latitudes the Noah-MP-DV simulation is generally drier than the Noah-MP simulation (Fig. 4c).
In summary, from the results presented here and in the previous sections, we see that the spatial patterns of landatmosphere coupling and the climatology of extreme indices are similar in our WRF simulations (Figs. 1, 2 and S10-  Table 2 among the WRF simulations averaging over six land North American regions adapted from Giorgi and Francisco (2000) (Central America, CAM; western North America, WNA; central North America, CNA; eastern North America, ENA; Alaska, ALA; and Greenland, GRL). Colors correspond to the hottest (red) and coldest (blue) index values among the WRF simulations for the representation of cold (a) and warm (b) temperature extremes, and to the driest (brown) and wettest (green) index values for the representation of precipitation extremes (c) over each region. S12 in the Supplement), indicating that the LSM configuration is not influencing these spatial structures. Therefore, other factors common in our four WRF simulations, such as land cover, topography, the latitudinal gradient or atmospheric parameterizations, generate the spatial distribution of the coupling metrics and the extreme indices. Nonetheless, each LSM configuration yields different degree of land-atmosphere coupling and different values of extreme temperature and precipitation events at local scales. Thus, the CLM4 LSM is identified as the component yielding the strongest land control on surface conditions and the highest temperatures during cold and warm events over most of North America as well as the heaviest and most frequent precipitation extremes over most locations (Figs. 1, 2 and 4). That is, the simulation with more coincidences of extreme high (low) LH and extreme low (high) SAT is also representing the most intense temperature and precipitation extremes. This suggests that the simulation of very low latent heat flux may be influencing the simulation of heat extremes by inducing an increase in the energy available for sensible heat flux and likely increasing air temperatures. Meanwhile, the simulation of high latent heat flux may increase the representation of atmospheric water content, inducing changes in the formation of clouds and precipitation. Thus, the strong land control on the CLM4 simulation seems to enhance the intensity of warm and heavy precipitation events comparing with the rest of simulations, particularly in comparison with the Noah simulation. The Noah LSM produces the weakest land control on surface conditions and one of the lowest intensities for all temperature indices as well as the lowest intensity and frequency of heavy precipitation events over all regions. The comparison of the Noah-MP simulations using prescribed and dynamic vegetation shows that the use of dynamic vegetation yields stronger land control at low and middle latitudes in summer and more intense, frequent and longer heavy precipitation events over the same regions (Figs. 1, 2 and 4). Thus, this comparison also supports that the simulation of strong land control leads to heavier precipitation events.

LSM uncertainty in the simulation of temperature and precipitation extremes
Although all WRF simulations show similar spatial patterns for temperature and precipitation extreme indices (Figs. S10, S11 and S12), there are large uncertainties in the climatology of each extreme index associated with the use of different LSM configurations. For the simulation of the intensity of cold events, the multi-model range across the WRF simulations for the hottest day in DJF (TXx DJF) shows large values over the boreal forest and the Rockies, where the index climatology is close to 0 • C (Figs. 3 and 5a). The representation of the coldest night in DJF (TNn DJF) shows large LSM dependence, yielding ranges up to 12 • C over the US and a spatial average of 4 • C, displaying large uncertainties over areas where the index climatology approaches 0 • C (Figs. 3 and 5a). The simulated intensity of warm temperature events, measured by the temporal average of the hottest day in summer (TXx JJA), differs by up to 10 • C among simulations over eastern North America, with a spatial average of 3.5 • C (Fig. 5a). The simulation of the mean coldest night in summer (TNn JJA) varies across simulations from 2 to 3 • C over the whole domain, except in the Arctic where the range across simulations reaches approximately 15 • C and the index value yields negative temperatures for some simulations (Figs. 3 and 5a). The frequency of warm extreme temperature events varies among simulations; the range for the number of hot days (TX90p, based on maximum temperatures) is up to 4.2 % over the US with a spatial average of 0.97 % over the whole domain, and the range for the num-ber of hot nights (TN90p, based on minimum temperatures) reaches values up to 3.8 % at low latitudes with a spatial average of approximately 0.7 % (Fig. 5b). Large values of the multi-model range for the number of hot days (TX90p) approximately coincide with the largest index values (Figs. 3 and 5b). Note that ranges of more than 2 % in the number of hot days and nights correspond to differences of more than 7 d per year in the index climatology simulated by different LSMs. Ranges of indices related to the frequency of cold events show smaller values than those for warm temperature events, displaying no clear spatial pattern with averages of ∼ 0.5 % (i.e., 1.8 d per year) for the number of cold days and nights (TX10p and TN10p; Fig. 5b). The duration of warm spells is greatly affected by the choice of the LSM component, while its effect is weaker on the simulated duration of cold events (Fig. 5c). The range of the duration of warm spells across simulations yields values of more than 10 d over Mexico and over broad areas of the central and southern US, with a spatial average of 2.8 d (Fig. 5c). Otherwise, the LSM effect on the simulated duration of cold spells is weaker, reaching differences of about 6 d among simulations in central Canada with a spatial average of 1.3 d (Fig. 5c). For both indices, the LSM differences are larger where the duration indices display larger values (Figs. 3 and 5c). The simulated climatology of the intensity of extreme precipitation events is also strongly affected by the configuration of LSM, with the total annual precipitation in wet days (R95p) reaching LSM differences larger than 100 mm at low latitudes and over the eastern US with a spatial average of 39 mm (Fig. 6a). The frequency of heavy precipitation events varies among simulations by about 35 d per year at some locations in Mexico and the US, with a spatially averaged range of 3.5 d per year (Fig. 6b). The areas with the largest intermodel range of the precipitation frequency index across simulations are located in Mexico, the Rockies and at some grid cells over the eastern US coast (Fig. 6b). The simulation of the number of consecutive dry and wet days also depends on the choice of the LSM component, presenting larger differences among simulations in the climatology of the consecutive dry day index than in the climatology of the consecutive wet day index (Fig. 6c). The inter-model range across LSM simulations reaches 37 d for the number of consecutive dry days over central and southwestern North America, with a spatial average of 4 d per year (Fig. 6c). Meanwhile, the simulated number of consecutive wet days also shows LSM differences of more than 20 d at a few grid cells but lower values over most of the domain, yielding a spatial average of ∼ 1.2 d (Fig. 6c). Large inter-model ranges of precipitation indices across WRF simulations coincide with areas where each index reaches the maximum values (Figs. 3 and 6).
Results for the VAC metric present some similarities with the spatial pattern of uncertainties in the WRF simulation of extreme temperature and precipitation events, which suggest a relationship between these results. The areas showing large uncertainty in the simulation of the intensity indices of cold Figure 5. Multi-model ranges across the WRF simulations (i.e., difference between the highest value and the lowest value of the four WRF simulations at each grid cell) of extreme indices associated with the intensity (a), frequency (b) and duration (c) of cold (left) and warm (right) extreme temperature events (TXx/TNn, maximum/minimum value of the maximum/minimum daily temperatures; TN10p/TX10p, percentage of cold nights/days; TN90p/TX90p, percentage of hot nights/days; CSDI/WSDI, cold/warm spell duration index). The range among simulations is computed using the mean of each index from 1980 to 2012 for each simulation. extremes coincide with areas where LSM simulations differ in the representation of DJF atmospheric control VAC categories (VAC a and VAC b ; Figs. 1 and 5). Particularly, the uncertainty in the hottest day in winter is larger over areas with evergreen needleleaf forest (Figs. 5 and S1 in the Supplement). Thus, although all simulations include the same land use categories, the differences in the representation of vegetation by each LSM (Figs. S13 in the Supplement) from the plant functional types used by the CLM4 LSM to the canopy cover simulated by the Noah LSM are likely related to the differences in the simulation of land-atmosphere coupling and extreme indices. For the simulation of warm extremes, large LSM differences in the intensity indices correspond to LSM differences in the JJA VAC categories associated with the energy-limited areas (Figs. 1 and 5). The areas with large uncertainty in the hottest day in summer also correspond with areas showing a mix of vegetation from croplands to forests (Fig. S1 in the Supplement). Thus, these results also suggest that LSM differences in the representation of vegetation cover play a role in the different representation of land-atmosphere interactions in energy-limited areas and consequently different climatologies of the hottest day among our simulations. The uncertainty in the simulation of the coldest night in summer is larger in areas over the mixed tundra category, where LSM configurations differ in the simulation of snow cover in summer (Figs. S1 and S13 in the Supplement). Thus, LSM differences in the representation of snow cover from the single snow layer simulated by the Noah LSM to the five layers simulated by the CLM4 LSM may also contribute to the uncertainty in the intensity index of warm events. The uncertainty in the number of hot days and the duration of warm spells is larger over regions under land control, particularly over open shrublands, suggesting the possible influence of LSM differences in the simulation of soil moisture (Figs. 2 and S1 in the Supplement). The range of the intensity index of precipitation extremes displays a large JJA component over areas under land control at low latitudes and under atmospheric control at middle and high latitudes (Figs. 1, 2 and S14a). For the intensity index of heavy precipitation events, our simulations show large uncertainties in areas with mixed vegetation (Figs. 6 and S1 in the Supple-ment), suggesting the influence of LSM differences in the representation of vegetation cover on the simulation of latent heat flux, thus leading to changes in the simulation of atmospheric water content and precipitation. The uncertainty in the intensity, frequency and duration of heavy precipitation events is high over the western Mexican coast, where the model is representing the tropical forest and the Noah simulation showed strong atmospheric control in disagreement with the rest of our simulations (Figs. 1, 6 and S1 in the Supplement). These results suggest that LSM differences in the description of vegetation and snow cover (e.g., the number of snow layers and the description of the canopy) are also contributing to uncertainties in the simulations of precipitation extremes. The differences in the VAC metric and in the extreme indices are larger between different LSM components than those between simulations with prescribed and dynamic vegetation (Figs. 1, 2 and 4). The different representation of land cover by each LSM configuration may yield different soil properties, such as albedo, evaporative resistance and surface roughness. These soil properties play a key role in the computation of the energy and water fluxes at the land surface and therefore in the simulation of near-surface conditions (Laguë et al., 2019).
In order to address the LSM influence on the simulation of extreme events, we compute the ranges among our four WRF simulations using the 95th percentile of the analysis period for each extreme index. The uncertainty in the WRF simulations due to the LSM component when using the 95th percentile for each extreme index leads to similar conclusions (Figs. S15 and S16 in the Supplement). The LSM differences using the 95th percentile of the analysis period are larger for all extreme temperature and precipitation indices than using the period mean as expected, but the marked areas are analogous (Figs. 5, 6, S15 and S16). The agreement in the representation of areas with large uncertainty in extreme indices between results using mean and extreme climatologies suggests the LSM influence on extreme events at climatological and shorter timescales.

Comparison between WRF simulations and three CORDEX evaluation simulations
The climatologies of extreme temperature and precipitation statistics as simulated by the three RCMs participating in the NA-CORDEX project (Table 3) show similar spatial patterns to our four WRF simulations (Figs. S10-S12 and S17-S19). These similarities in the spatial pattern of extreme indices represented by WRF and the CORDEX RCMs further support the hypothesis that the spatial features of these maps are controlled by topography, land cover and the latitudinal gradient, since the CORDEX RCMs employed atmospheric models and boundary conditions different to our WRF simulations. Although the spatial patterns are similar in both ensembles, the WRF simulations yield colder minimum temperatures in DJF (TNn DJF) and less frequent cold nights (TX10p) than the CORDEX simulations (Figs. S10 and S17). The percentage of hot days, however, is higher and warm spells are longer in the WRF simulations than in the CORDEX simulations, particularly over southwestern NA (Figs. S11 and S18). The intensity of heavy precipitation extremes is generally higher within the WRF ensemble than in the CORDEX ensemble, while dry periods are longer in the CORDEX simulations (Figs. S12 and S19). The uncertainties in the simulation of extreme statistics within the CORDEX ensemble show some similarities with the WRF uncertainties which arise from the LSM configuration. For example, the simulated climatology of DJF coldest night (TNn DJF) shows large uncertainties over the US for both ensembles, particularly over the eastern US (Figs. 5a and 7a). The climatologies of DJF hottest day (TXx DJF) display a large inter-model range within the WRF ensemble over areas where temperatures approximate to 0 • C, expanding southward for the CORDEX ensemble. The CORDEX inter-model ranges of the frequency indices for cold extremes do not show a clear spatial pattern in agreement with the WRF ensemble. There is, however, a region over the central US with slightly larger ranges among the CORDEX simulations than among the WRF simulations (Figs. 5b and 7b and S20 in the Supplement). The duration of cold spells presents large uncertainties in the CORDEX ensemble over the eastern US-Mexican border and over western Canada, coinciding with a small region with large inter-model range among the WRF simulations (Figs. 5c and 7c). For the simulation of warm temperature extremes, the uncertainties in the intensity indices among the CORDEX simulations show large ranges over the eastern US for the JJA hottest day (TXx JJA), in agreement with the WRF simulations, and at high latitudes for the coldest night (TNn JJA), including the eastern region of Hudson Bay also marked by the WRF ensemble (Figs. 5a and 7a). The frequency indices of warm events show a large inter-model range across the CORDEX simulations over the central US, also shown in the WRF simulations for the TX90p index (Figs. 5b and 7b). The uncertainty in the duration of warm spells among the CORDEX simulations does not show large spatial differences, although the ranges are slightly larger at low latitudes coinciding with regions marked by the WRF ensemble and at very high latitudes (Figs. 5c and 7c). The simulation of extreme precipitation statistics is generally more uncertain across the CORDEX simulations than across the WRF simulations (Figs. 6,8,and S21 in the Supplement). Interestingly, all regions with large uncertainties in the simulation of precipitation extremes among the WRF simulations are also identified as areas with large uncertainty across the CORDEX ensemble. There are, however, additional areas with large uncertainty in the CORDEX ensemble, particularly for the consecutive dry day index and the frequency index at middle and high latitudes (Figs. 6 and 8). The larger spread of the precipitation indices within the CORDEX ensemble in comparison with the spread in our WRF simulations (Figs. S21 in the Supplement) was expected due to the use of different atmospheric models in the CORDEX ensemble. Nonetheless, the agreement between the WRF and CORDEX simulations in the placement of areas with large uncertainties suggests that results from this study may be applicable to other modeling experiments, particularly for the simulation of warm temperature and precipitation extremes.

Comparison of inter-model ranges across the WRF and CORDEX ensembles
In order to provide context for the applicability of these results to other sets of simulations, we compared the range across our WRF simulations with the inter-model range across three CORDEX simulations in representing extreme events . Since CORDEX simulations were performed by three structurally different RCMs (the WRF, RCA4 and CRCM-UQAM models), we expected a broader inter-model range of the simulated extreme indices across CORDEX simulations. Differences in the representation of extreme events among the CORDEX simulations arise from several factors, such as different atmospheric parameterizations, land surface model components, the representation of land cover, treatment of boundary conditions, including sea surface temperatures, and the application of nudging techniques. In addition to all these factors, the sensitivity to initial conditions in models may be another important factor for the inter-model range of the simulated extreme events. The WRF sensitivity to initial conditions may also affect the interpretation of the differences among our four simulations with different LSM configurations. However, previous analyses using the WRF model (Liu et al., 2019;Gallus and Bresch, 2006) as well as other climate models (Kharin et al., 2007;de Elía et al., 2008;Sillmann et al., 2013a) have shown that the spread of extreme events among ensemble members of an individual model is generally small compared to inter-model spreads or the differences arising from different physics configurations.
Although the CORDEX simulations were performed using boundary conditions from the ERA reanalysis product, the comparison with the WRF simulations is possible because we compute the ranges across simulations as a measure of the uncertainty in each simulation ensemble. Thus, we compare the uncertainty in each set of simulations finding common areas with large ranges for the representation of cold and warm temperature extremes and precipitation extremes, despite the fact that they used different products as boundary conditions. The agreement in the placement of areas with large uncertainties in the representation of extreme events within the CORDEX ensemble and those within our WRF simulations suggests that the uncertainties in these areas may arise from similar causes. Our WRF simulations only differ in the con- Figure 7. Inter-model range across three CORDEX simulations (i.e., difference between the highest value and the lowest value of the three CORDEX simulations at each grid cell) of extreme indices associated with intensity (a), frequency (b) and duration (c) of cold and warm extreme temperature events ( Table 2). The range across simulations is computed using the mean of each index from 1980 to 2012 for each simulation.
figuration of the LSM component. Therefore, the differences between LSM components can also be an important source of uncertainty in the simulation of extreme events within the CORDEX simulations, through a different representation of land-atmosphere interactions.
One of the simulations included in the CORDEX ensemble was performed by the WRF model using the Noah LSM component. The comparison of the extreme indices between our WRF-Noah simulation and the one included in the CORDEX ensemble shows similar spatial patterns and regional differences in the value of each extreme index (second column in Figs. S11-S13 and third column in Figs. S17-S19 in the Supplement). However, this comparison is not very different if we use another CORDEX simulation performed by a different RCM. This suggests that the spatial pattern of the extreme indices is driven by factors common in all simulations, such as land cover, topography and the latitudinal gradient. The regional differences in the value of extreme indices between our WRF-Noah simulation and the WRF-Noah CORDEX simulation are likely caused by the use of nudging techniques to match the ERA-Interim product in the CORDEX simulation.
Although there are more sources of uncertainty in the CORDEX simulations than across the WRF simulations, the comparison between the uncertainty within each set of simulations (i.e., the difference between the range among the WRF simulations and the range among the CORDEX simulations) displays larger ranges across the WRF simulations than across the CORDEX ensemble over certain areas and for certain extreme indices (Figs. S20 and S21 in the Supplement). This suggests the possible existence of bias compensation inside the CORDEX simulations. Moreover, each RCM may have a different sensitivity to the employed LSM component as well as to other components and parameterizations. Additional sensitivity studies using the WRF model or another climate model with different settings and parameterizations may help to discern other important sources of uncertainties in the simulation of extreme events, such as horizontal resolution.

Climatology of extreme events as represented by
the WRF simulations and by the CMIP5 simulations Sillmann et al. (2013a) presented an evaluation of the CMIP5 models in simulating some of the extreme indices defined by ETCCDI; this information was used in the Intergovern-mental Panel on Climate Change (IPCC) chapter on models' evaluation (Flato et al., 2013). The analysis period employed by Sillmann et al. (2013aSillmann et al. ( ) (1981Sillmann et al. ( -2000 differs from the one used in this analysis, but a rough comparison can be made between our results and theirs for some extreme indices. For example, the spatial patterns of the DJF coldest night and JJA hottest day are similar for the WRF and CMIP5 ensemble means ( Fig. 3 and Fig. 2 in Sillmann et al., 2013a). The similarities in the spatial pattern of extreme indices between our WRF simulations, the CMIP5 and the CORDEX ensembles suggest that the topography, land cover and latitudinal gradient are driving these spatial features. Sillmann et al. (2013a) also provides regional averages over six NA regions, adapted from Giorgi and Francisco (2000). These spatial averages allow identification of some regional differences between the WRF and the CMIP5 ensembles, for example, over the eastern US coast (ENA region) where the WRF simulations yield warmer JJA maximum temperatures than the CMIP5 ensemble ( Fig. 4 and Fig. 3 in Sillmann et al., 2013a). The spatial patterns of the WRF and CMIP5 ensembles for CSDI and WSDI indices are also similar, although the WRF ensemble reaches longer cold and warm events ( Fig. 3 and Figs. S6-S7 in Sillmann et al., 2013a). The representation of the intensity index for heavy precipitation events (R95p) also shows similar spatial patterns between both ensemble means, although the WRF ensemble is generally more intense over most regions (Figs. 3 and 4, and Figs. 6 and 7 in Sillmann et al., 2013a). Similar results are found for the simulation of consecutive dry days, showing similar spatial patterns with some regional differences especially at low latitudes (CAM region, Figs. 3 and 4, and Figs. 6 and 7 in Sillmann et al., 2013a). The variability across the CMIP5 ensemble for the simulation of precipitation indices seems to be particularly large at low latitudes (CAM region) similar to WRF uncertainty in the representation of precipitation extremes associated with the LSM component (Fig. 6, and Fig. 7 in Sillmann et al., 2013a). Although this is a rough comparison between results presented in this article and in Sillmann et al. (2013a), this comparison suggests that our conclusions could be also applicable to the CMIP5 ensemble as it was the case for the CORDEX ensemble.

Implications of these results
Increases in heat-related events have been directly and robustly associated with increases in mortality, for example, in Europe during the heat wave of 2003 (Fischer et al., 2007) or in India during the heat wave of 2015 (Pattanaik et al., 2017). Heavy precipitation events often lead to floods, which also are directly associated with economic loss and death toll (Hu et al., 2018). All climate change projections point to a future increase in extreme temperature and precipitation events (Sillmann et al., 2013b); thus, developing mitigation strategies will become necessary to preserve human health. Climate model simulations are our best source of informa- The indices employed here to study the climatology of extreme temperature events were based on minimum and maximum temperature outputs. However, many studies have proven that the study of compound events using indices based on multiple variables, such as temperature and moisture outputs, are more representative of thermal stress in humans and ecosystems than standard indices (Zscheischler et al., 2018). The large LSM influence on the climatology of extreme temperature and precipitation events suggests that the uncertainty arising from the LSM component could be higher on extreme indices based on multiple variables. However, the analysis of the LSM influence on compound events is beyond the scope of this work and constitutes an interesting line for future research.

Conclusions
WRF simulations over North America coupled to different LSM components showed similar spatial patterns of landatmosphere interactions as measured by the VAC index. The use of this metric allows the classification of our results into energy-limited areas, where atmospheric conditions control land-atmosphere interactions (VAC a and VAC b ), and waterlimited areas, where soil moisture deficits control the energy and water exchanges between the land surface and the lower atmosphere (VAC c and VAC d categories). Our results indicate atmospheric control over land-atmosphere interactions at middle and high latitudes and land surface control over lower latitudes, particularly in JJA. However, the simulation of land-atmosphere coupling differs at regional scales depending on the LSM choice in two directions: by altering land control on surface processes (VAC c and VAC d categories) and by altering atmospheric conditions and its influence on land-atmosphere interactions (VAC a and VAC b categories). Thus, the Noah LSM is associated with the weakest representation of land control on surface conditions, while the CLM4 LSM simulates one of the strongest land effects on surface conditions. The use of different LSM components leads to large ranges of represented extreme temperature and precipitation events, affecting their simulation in intensity, frequency and duration. The CLM4 LSM yields the weakest cold events, the warmest hot days and the heaviest precipitation events, while the Noah simulation yields the weakest warm temperature events and the weakest heavy precipitation events. Meanwhile, the Noah-MP LSM produces the driest simulation, yielding slightly wetter conditions when using dynamic vegetation at middle and low latitudes. Although the LSM differences in our results are more marked than differences between the simulations with prescribed and dynamic vegetation, the use of dynamic vegetation yields stronger land control at low and middle latitudes in summer and more intense, frequent and longer heavy precipitation events and reduces the duration of droughts over the same regions. Thus, our results suggest a relationship between the degree of land control on surface conditions reached by each LSM configuration and the intensity of extreme events, in agreement with the case study during the 2010 Russian heat wave (Zscheischler et al., 2015).
Previous studies using GCM simulations suggested a dependence of the simulated land-atmosphere interactions on the employed LSM component with possible consequences for the simulation of extreme events (García-García et al., 2019). Results from four WRF simulations differing only in the LSM configuration support that hypothesis, identifying LSM differences in the description of land cover as an important factor for the simulation of near-surface conditions. Additionally, areas with large uncertainties in the simulation of temperature and precipitation extremes across the WRF simulations due to different LSM components appear in the NA-CORDEX model ensemble, which indicates the possible LSM influence on the simulation of extreme events within other model ensembles. This work reinforces the important role of the LSM component in climate simulations, supporting the urgency of ongoing research focused on improving this model component and their implementation in regional and global climate models as well as in reanalysis products. The strong LSM dependence of climate model simulation of extremes is also of special importance for international reports focused on land, such as the IPCC Special Report on climate change, desertification, land degradation, sustainable land management, food security and greenhouse gas fluxes in terrestrial ecosystems (IPCC, 2019). Future sensitivity analyses to the LSM component using different regional and global climate models would be useful to understand models' differences in simulating temperature and precipitation extremes, helping to narrow the inter-model range across reanalyses and climate model projections in simulating extreme events.
Author contributions. AGG designed the modeling experiment, performed the simulations and analyzed model outputs. All authors contributed to the interpretation and discussion of results. AGG wrote the manuscript with continuous feedback from all authors.