Sensitivity studies with the regional climate model COSMO-CLM 5.0 over the CORDEX Central Asia Domain

. Due to its extension, geography and the presence of several underdeveloped or developing economies, the Central Asia domain of the Coordinated Regional Climate Downscaling Experiment (CORDEX) is one of the most vulnerable regions on Earth to the effects of climate changes. Reliable information on potential future changes with high spatial resolution acquire signiﬁcant importance for the development of effective adaptation and mitigation strategies for the region. In this context, regional climate models (RCMs) play a fundamental role. In this paper, the results of a set of sensitivity experiments with the regional climate model COSMO-CLM version 5.0, for the Central Asia CORDEX domain, are presented. Starting from a reference model setup, general model performance is evaluated for the present day, testing the effects of singular changes in the model physical conﬁguration and their mutual interaction with the simulation of monthly and seasonal values of three variables that are important for impact studies: near-surface temperature, precipitation and diurnal temperature range. The ﬁnal goal of this study is two-fold: having a general overview of model performance and its uncertainties for the considered region and determining at the same time an optimal model conﬁguration. Results show that the model presents remarkable deﬁcien-cies over different areas of the domain. The combined change of the albedo, taking into consideration the ratio of forest fractions, and the soil conductivity, taking into account the ratio of liquid water and ice in the soil, allows one to achieve the best improvements in model performance in terms of cli-matological

thus highlighting specific model deficiencies (Kotlarski et al., 2014). These may be related to the model formulation and to choices in model configuration (Awan et al., 2011;Bellprat et al., 2012a, b;de Elía et al., 2008;Evans et al., 2012). In the second case, it should be possible to improve model performances by testing different model configuration setups and choosing the one that better agrees with observations. This approach might be conceived as an optimization step. Nevertheless, it is important to emphasize the fact that a specific model configuration could produce better results, by simply compensating 5 for some deficiencies in the model formulation (Hourdin et al., 2017).
In climate models, the complexity and small spatial scales of the physical processes involved, requires the so-called parameterization of many of these processes: this basically consists in summarizing physical phenomena and their interaction across different spatial and temporal scales (Fernández et al., 2007;Rummukainen, 2010;McFarlane, 2011;Hourdin et al., 2017), which is associated with substantial uncertainties. The same processes may be described through different parameterizations, In this section the research methods are described, including details on the model and the different simulation setups, the observational datasets used for the evaluation of model results and the employed metrics.

Model and Experiments Description
The Consortium for Small-Scale Modeling in Climate Mode (COSMO-CLM) is a non-hydrostatic regional climate model 5 developed by the CLM-community, an open international network of scientists. The model version employed in this study is the COSMO-CLM 5.0_clm9. Many studies have been conducted in the recent years over different CORDEX regions, using the COSMO-CLM (Panitz et al., 2014;Dobler and Ahrens, 2010;Bucchignani et al., 2016;Smiatek et al., 2016;Jacob et al., 2014;Kotlarski et al., 2014;Zhou et al., 2016).
The simulations presented in this study are performed with a spatial resolution of 0.22°, as specified in the new CORDEX-10 CORE directives (Gutowski Jr et al., 2016), on a rotated geographical grid. The initial simulation domain extends from ∼3°to ∼145°over longitudes and from ∼16°to ∼73°over latitudes. The domain includes a model relaxation zone of ∼250 km on each domain side, used to "relax" the model variables towards the driving data (Køltzow, 2012;Davies, 1976). Results of the simulation for this area are excluded from the analysis, with the final domain extent shown in Fig. 1. If not differently specified, all the simulations are run over a fifteen year-long period from 1991 to 2005, with the first five years excluded from the analysis 15 and considered as spinup time.
In a set of sensitivity experiments labeled from a to q in the first section of Tab. 1, the effects on model performances of different changes in the model configuration are tested, first individually and then combining them with each others. The setup of experiment a is the reference from which the other experiments are configured, by implementing the modifications specified in the table. The model configuration used for the reference simulation is the same used for the CORDEX East Asia domain 20 for the COSMO-CLM model version 5.0. This was considered as a good reference for the purposes of this study, since the two regions share a large part of their domains. A general description of the setup of the reference simulation is provided in Tab. 2.
All the performed simulations are driven by the NCEP version 2 reanalysis data (Kanamitsu et al., 2002), provided as boundary and initial conditions. The boundaries have a temporal resolution of 6 hours and a spectral resolution of T62 (∼ 1.89°). NCEP2 data have been selected as boundary data, instead of commonly employed ERAInterim reanalyses (Dee et al.,  (Bentsen et al., 2013;Iversen et al., 2013), with a spatial resolution of, respectively, ∼ 210 km, ∼ 210 × 140 km and ∼ 270 × 210 km. Thus, using NCEP2 as drivers allows to reproduce a resolution jump more similar to the one present when using the considered GCMs.

30
Acknowledging the fact that ERAInterim reanalysis data, which have a spectral resolution of T255 (∼ 0.7°), are normally employed for the evaluation of RCMs, two additional simulations are performed, driven by ERAInterim (second section of Tab. 1). This allows to estimate the effects of the two different driving data on the simulations results and to support possible conclusions on an optimal setup, verifying how significantly the results differ in the two cases.
In order to better discriminate different sources of uncertainties in the model simulations, a run covering the period 1991-2005 is also performed (third section of Tab. 1), using a timestep of 120 s, instead of the one of the reference simulation of 150 s.

5
Two 25-year long simulations, covering the period 1991-2015, are performed for testing different configurations that could help in reducing model biases over areas characterized by the presence of permafrost in winter. The two simulations, labeled SOIL and SNOW in the fourth section of Tab. 1, are performed increasing in both the number of soil layers from 10 to 13, together with their total depth from approximately 15 m to more than 130 m, and, only for SNOW, additionally using the multilayer snow model of COSMO-CLM (Machulskaya, 2015). These simulations cover a longer period than the others, since a 10 longer spinup time is necessary in order to account for more and deeper soil layers. Their results, excluded from the direct comparison with the other simulations, are discussed in the results and concluding sections of this paper.
Finally, a set of four simulations are additionally performed for the investigation of the model internal variability (last section of Tab. 1). These simulations have the same setup as the reference simulation a, but are initialized at four different dates, shifted by +/-1 and 3 months with respect to the reference one.

15
All the proposed simulations are designed with the goal of better understanding main model limitations for the area and to which degree they can be reduced by properly configuring the model, isolating the effects of different sources of uncertainties.

Observations
Gridded observational datasets are used to compare model results against observational data on a similar scale. These gridded datasets are obtained through statistical extrapolations of surface observations. In addition to uncertainties related to the original 20 measurements, these datasets also contain important uncertainties due to the statistical extrapolation procedure (Flaounas et al., 2012;Gómez-Navarro et al., 2012). For climate model evaluation studies, these uncertainties are usually taken into account by using a range of different datasets (Collins et al., 2013;Gómez-Navarro et al., 2012;Bellprat et al., 2012a, b;Flaounas et al., 2012;Lange et al., 2015;Zhou et al., 2016;Solman et al., 2013).
In this study, the issue of observational uncertainties is addressed by considering three different datasets for each of the 25 investigated variables. The datasets include both observations and reanalysis data. For all the three considered variables, information is retrieved from the CRU TS4.1 observational dataset (Harris and Jones, 2017). Information on T2M and PRE is also retrieved from the University of Delaware (UDel) gridded dataset (Willmott, 2000), provided by the NOAA/OAR/ESRL PSD, Boulder, Colorado, USA. For T2M and DTR, in addition, the Modern-Era Retrospective analysis for Research and Applications, version 2 (MERRA2) (Gelaro et al., 2017) is employed. For precipitation, the third considered dataset is the Global 30 Precipitation Climatology Centre dataset (GPCC) (Becker et al., 2011), while the ERAInterim reanalysis dataset (Dee et al., 2011) is used in addition to MERRA2 and to CRU for the evaluation of DTR.
All the datasets are retrieved on a grid with the same spatial resolution of 0.5. The ERAInterim data, that originally have a horizontal resolution of approximately 80km, are interpolated to the same grid resolution. The output of the simulations is upscaled to the same 0.5 • grid of the observations. For temperature and diurnal temperature range, a bilinear remapping method is used for the upscaling, while for precipitation a conservative remapping method is employed. Fig. 2 shows the spread of the different observational datasets for each variable, for yearly, winter and summer climatological means over the period 1996-2005. As evident, large differences emerge among the different datasets, in particular for regions characterized by complex topography and lower observational stations density, such as the Tibetan Plateau and Siberia. The 5 given spread could make it hard to quantify model biases over certain regions. In the case of T2M and DTR, the spread is certainly influenced by the fact that some of the datasets are reanalyses. Nevertheless, for T2M, differences exceeding 8°are present, in particular in winter, even between the CRU and the UDEL, over regions where the interpolation is highly affected by the low number of stations (Matsuura and Wilmott, 2012;Bucchignani et al., 2014). For PRE, the spread in the different observations (expressed in percentage with respect to the GPCC values) is remarkable in winter over the Tibetan Plateau and in 10 summer over particularly dry areas. Despite the differences might likely be influenced by the employed interpolation methods in each case, the spatial coverage of observation is still considered their main driver (Dong and Sun, 2018;Matsuura and Wilmott, 2012;Sun et al., 2018;Naumann et al., 2014).

Analysis Details and Evaluation Metrics
In order to rank different model configurations according to their skills in simulating the three considered variables over the 15 region, their performances are evaluated with respect to the ones of the reference simulation (a, Tab. 2).
Since in the context of CORDEX simulations the main interest is often on the comparison of the mean climate between two different periods in time, the primary focus of the proposed analyses is on climatological monthly values of the considered variables: monthly values of daily means of T2M and DTR are considered, while for PRE integrated values are used. In addition, the results are supported by the investigation of the simulated monthly variability. 20 In the latter case, since the model is not expected to exactly match the observed temporal evolution of the investigated variables point by point (Gleckler et al., 2008;Wilks, 2006), regional mean anomalies are considered. For each grid point in the domain, monthly anomalies are first calculated by subtracting the climatological mean from each monthly value. The variability is then analyzed based on these anomalies averaged over sub-regions characterized by similar climate conditions. The decomposition of the domain into a set of sub-regions is obtained by means of a k-means clustering (Steinhaus, 1956;25 Ball and Hall Dj, 1965;MacQueen et al., 1967;Lloyd, 1982;Jain, 2010) of quantile-normalized (q-normalized) monthly climatologies of the investigated variables. K-means is a clustering technique using the concept of Euclidean distance from the centroids of a pre-determined group of clusters, for separating similar data into groups. For the purposes of this paper, following several tests and the results of other studies (Mannig et al., 2013), a total number of 11 clusters is selected. The k-means clustering algorithm is reiterated over 3000 times in order to achieve the presented results, using q-normalized values 30 of monthly climatologies of T2M and DTR derived from the CRU dataset and PRE values derived from the GPCC as input. For both the analyses of mean climate and variability, metrics adapted from Gleckler et al. (2008) are used. In the following subsections, we give more details on the employed metrics.

Climatology
For the evaluation of the climatological means, we employ a Skill Score (SS) metrics expressed as: where the Mean Absolute Error (MAE) is given by: where sim and obs are the monthly climatological means of, respectively, the considered simulation and observational dataset. The indices i, j and m vary, respectively, over longitudes, latitudes and months of a year. W is the sum of the weights

Variability
The analysis of the model performances in simulating the mean climate is complemented by the investigation of simulated variability.
There is no reason to expect models and observations to agree on the phasing of internal (unforced) variations. Hence metrics such as MAE are not appropriate for characterizing the model performances for interannual variability (Gleckler et al., 2008). 20 Here, for an overall evaluation of the simulated variance in the different cases, the ratio of simulated to observed variance is considered: It is important to mention that correctly matching the observed variance does not guarantee a correct representation of the 25 modes of variability associated with this variance.
For taking into account observational uncertainties, all the proposed analyses are conducted separately for each observational dataset. Changes in model performances for a given configuration are considered relevant only when consistent among the different observations.

Results
In this section the results of the conducted analyses are presented, starting from the consideration of climatological means and 5 followed by the analyses of simulated-to-observed variability.

Mean Climate
In order to characterize the general performances of the model over the region, for the three considered variables, maps of the yearly, winter and summer mean biases of the reference model simulation a with respect to the different observational datasets, are first presented. Concerning PRE (Fig. 5), remarkable biases are present in the winter and summer as well as in the annual mean for all 25 the observational datasets. The biases in this case are expressed as percentage with respect to the values of the corresponding observational estimates. In summer (Fig. 5, right panels), a particularly pronounced negative bias, with values down to -100%, is visible over arid regions and the monsoon area. This is of the same order of the spread of the observational datasets for the area (Fig. 2). Over the Tibetan Plateau the bias in summer is positive, with values larger than 100%. In winter (Fig. 5, central panels), this positive bias becomes even larger (but again in the order of the spread of observations), and extends further over 30 adjacent regions. Over the central part of the domain, a different behavior is evident between winter and summer: while in winter the model simulates wetter conditions (+20% to +100%), summers are drier (∼ -50 %) than in observations. In the annual mean (Fig. 5, left panels), the simulated climate is wetter over a large part of the eastern domain (with values exceeding +100%) and drier over desert zones (with rare values smaller than -80%). In this case, over the central part of the domain, winter and summer biases compensate each other.
In all the cases, the simulated DTRs are smaller than the observed ones over almost the entire domain (Fig. 6). A positive bias in DTR, rarely exceeding +5°C, is evident only over isolated parts of the southern domain, in particular over the southern 5 borders of the Tibetan Plateau. The differences arising from the comparison against CRU observations are more pronounced than the ones against reanalysis data, with biases lower than -10°C in some cases. The pattern of the bias is quite similar for all the three considered datasets, with some larger differences in summer. Over the northernmost part of the domain, characterized by particularly cold conditions (minimum temperature under -30 o C in winter, see Tab. 3), a strong negative bias is evident only with respect to the CRU in all seasons. The smaller bias over this area arising from the comparison against reanalysis data is 10 most likely due to the nature of these datasets, which combine model predictions and observations.
The additional simulations performed with the same reference setup but with a different timestep (TIMESTEP, Tab.1) and driven by ERAInterim instead of NCEP2 (a_ERAInterim, Tab.1), lead to very similar biases, for all variables (see supplements). This suggests that evinced biases are likely inherent to the model formulation itself.

15
In this section, the results of the Skill Score (SS) derived from the MAE calculated over the mean seasonal cycle and all the points of the domain are presented. As for PRE, also for DTR (Fig. 7, bottom row) only one experiment seems to sensibly improve over the results of the 30 reference simulation: experiment i (SS ranging between +4% and +5%). In this experiment, the soil heat conductivity takes into account the ratio of soil moisture to soil ice. For DTR two experiments, d and j, lead to particularly negative skills (SS between -4% and -5%), which also affect the combined experiments including their configuration changes. The unique exception is the combined experiment q: in this case, the negative effects on the simulation of DTR of experiment d, seem to be compensated by the positive ones of experiment i, resulting in positive values of SS, varying between +1% and +2%.
In summary, the presented analyses show that the combined representation of surface albedo (taking into account forest fraction) and soil heat conductivity (accounting for the ratio between ice and moisture in the soil), as configured in experiment q, has the best positive effects on the representation of the mean seasonal cycle of all the three considered variables, among all 5 the tested configurations.
Although some differences in the results of the SS calculated based on the different observational datasets are evident, experiment q shows the same positive sign of improvement in all the cases. This is also true when comparing the results obtained driving the same simulations q and a with NCEP2 and ERA-Interim reanalysis data (Tab. 4). This confirms the potential of experiment q in improving model performances for the area.   for almost the entire domain. Therefore, even though important improvements are obtained in different cases, it is crucial to highlight the fact that these might be the product of some compensation effect over different variables and domain sub-regions.

SS -Single Seasons
The same SS analyses are additionally conducted for the monthly climatologies of each season, over the entire domain. Seasonal analyses could help in discriminating simulations presenting good and coherent performances over more periods of the year. The results, reported in the supplementary section of this paper, show that the largest changes in the seasonal values of SS are obtained for summer, for processes related to the representation of surface and soil properties. On the other hand, in 5 winter the changes in model performances among the different experiments are substantially smaller than in the other seasons.
Overall, for single seasons, the most important and consistent improvements in the simulated climatological mean of the considered variables with respect to the reference simulation are obtained for experiment q, confirming the results obtained for the seasonal cycle.

Effects of Soil Depth and Snow Model on mean winter temperatures 10
The two simulations (SOIL and SNOW) specifically designed for testing the effects of changes in soil depth and the use of a multi-layer snow model on the COSMO-CLM simulation of T2M over areas characterized by the presence of permafrost and snow in winter, do not present significantly different results than the reference simulation ( Fig. 9). In particular, they do not allow to reduce the warm bias in T2M present for the reference simulation in winter over the Western Siberia part of the domain, with even warmer conditions simulated by experiment SNOW (Fig. 9, left).

Variability
In this section, the results of the analysis of simulated variance is presented, with the goal of complementing the analyses of the mean climate of Sec. 3.1. First, a general overview of the model skill in simulating observational variability is described, followed by a discussion of the different uncertainties affecting this metrics. Nevertheless, the most pronounced changes are still limited to a few clusters.
For T2M, the best results in terms of simulated variance are obtained: the model is able to reproduce the interannual variability of the observations particularly well. In particular, a good agreement between simulated data and observations is evident for subregions WSC, IMO and ARC. The largest underestimation of the observed variance of T2M is obtained for cluster 30 CSA. Therefore, the model is not only unable to simulate the mean temperatures of particularly cold areas, as demonstrated in Sec. 3.1, but it also shows a very low variability for the same variable, over these regions, when compared to observations.
A particulalry small value of the variance ratio is also evident for T2M for the sub-domains DSS, SAR and STE throughout almost all the experiments. These regions are all characterized by a large range between minimum and maximum monthly temperatures (see Tab. 3).

Uncertainties in the Investigation of Simulated Variability
In this section, the influence of uncertainties associated with the observational datasets, boundary data and internal variability on the evaluation of simulated variability are quantified. To investigate the effect of the model internal variability, four additional 15 simulations have been conducted using the setup of the reference simulation, but shifting the initial date by +/-1 and +/-3 months.
Left columns of each panel of Fig. 11 show the absolute differences in the variance ratio of experiment a calculated, for each variable, with respect to different observational datasets. In addition, the right columns of the same figure show the absolute differences in the variance ratio between experiment a and the other experiments. The range of changes in the two cases is 20 comparable for almost all clusters and variables. In many cases, the changes resulting from the use of different observations are larger than the differences between the experiments. In these cases, the observational uncertainty is thus too large to allow for a classification of the different experiments in terms of their skill in reproducing the observed variance. The influence of the observational datasets on the variance ratios is larger for PRE and DTR than for T2M.
Despite variations in the boundary data and in the simulated internal variability (as quantified in the additional experiments 25 with shifted initial dates) do not have the same strong effect on the simulated variance ratio as the observational uncertainties, for some regions their values are still comparable to the differences between the simulations (not shown).
In conclusion, the fact that different uncertainties are in the same order of magnitude as the differences between the simulations does not allow for a classification of the different experiments with respect to their skill in representing the observed variability.

30
The main goal of this work is to evaluate a set of different configuration setups of the regional climate model COSMO-CLM over the CORDEX Central Asia domain, and to isolate different sources of uncertainties, in order to quantify general model performances and to provide a basis for possible improvements of the model simulations for this region. The results of this study are of fundamental importance in the light of the next phase of the CORDEX initiative, in particular considering the 5 vulnerability of the region to the possible effects of climate change.
Concerning the simulation of the mean climate, the model shows remarkable deficiencies in simulating the three considered variables (near surface temperature, precipitation and diurnal temperature range) over different areas of Central Asia and different seasons. Even though over specific areas of the domain these biases are hard to be quantitatively assessed, due to high uncertainties in the considered observational datasets, their spatial pattern is similar in all the cases.

10
For temperature, large positive model biases are present in winter over Siberia, with remarkable values exceeding +10°C in some cases. There are two likely reasons for these biases: an unsatisfactory representation of snow cover and soil permafrost.
In fact, both these factors have a significant impact on heat transport within the soil and heat flux between soil and atmosphere, with important effects on near surface temperatures (Frauenfeld et al., 2004;Lachenbruch and Marshall, 1986;Saito et al., 2007;Klehmet, 2014). Siberian permafrost often exceeds a depth of 100 meters, reaching values of up to 1km (Yershov, 2004).

25
Other regional climate models suffer from a similar bias (Guo et al., 2018;Meng et al., 2018). Acknowledging the fact that for this area the observational uncertainty is particularly high, the evinced biases could partly be related to a bad representation of the albedo for highly complex topographies. In fact, a study by Meng et al. (2018) showed that changes in the albedo over the region have led to an important improvement of the results of an RCM. Another possible explanation for this cold bias might be the parametrization of surface fluxes (Zhuo et al., 2016). Consequently, further analyses should focus on improving 30 the mode representation of these processes.
For precipitation, particularly wet conditions are simulated by the COSMO-CLM over the Tibetan Plateau. This bias seems to be common to several RCMs for areas characterized by complex topography (Guo et al., 2018;Gao et al., 2015;Feng and Fu, 2006) and is likely related to an overestimation of orographic precipitation enhancement in the models (Gerber et al., 2018) and/or to an incorrect simulation of the planetary boundary layer (Xu et al., 2016). Additionally, in the COSMO-CLM simulations a significant dry bias occurs over arid and desertic regions, especially in summer. A similar COSMO-CLM bias has already been seen for other semi-arid and dry regions of the World, such as the Mediterranean region. In this case, it was connected with an incorrect simulation of soil-atmosphere interactions by the model (Fischer et al., 2007;Seneviratne et al., 2010;Russo and Cubasch, 2016), which is likely the case also for Central Asia. For both the Tibetan Plateau and arid summer 5 areas, it is important to note that the biases are in the same order of the spread of the observations.
The model underestimates the climatological mean of the diurnal cycle of temperatures, for all the seasons and sub-regions of the domain. This bias is relatively homogeneous over the entire domain of study. Several studies have shown that RCMs typically underestimate diurnal temperature ranges over different parts of the World (Kyselỳ and Plavcová, 2012;Mearns et al., 1995;Laprise et al., 2003). The main factors responsible for these deficiencies seem to be errors in the simulation of the 10 atmospheric circulation, cloud cover and heat and moisture fluxes between surface and atmosphere.
The evinced model limitations for the mean climate do not seem to differ significantly when considering ERAInterim as driving data and a different timestep.
In order to test whether it is possible to reduce the determined model biases, and to which degree, sensitivity experiments have been performed to study the effect of different changes in the configuration of COSMO-CLM and their mutual interaction.

15
After considering different sources of uncertainties, the combined change of the albedo taking into consideration the ratio of forest fractions and the soil conductivity taking into account the ratio of liquid water and ice in the soil, leads to the best results in simulated climatological means of the three considered variables (experiment q). Importantly, the model seems to be particularly sensitive to those parameterizations that deal with soil and surface features, and that could positively affect the repartition of incoming radiation.  , 30, 113-132, 2008. Jacob, D., Bärring, L., Christensen, O., Christensen, J., De Castro, M., Deque, M., Giorgi, F., Hagemann, S., Hirschi, M., Jones, R., et al.: An inter-comparison of regional climate models for Europe: model performance in present-day climate, Climatic change, 81, 31-52, 2007. Jacob, D., Elizalde, A., Haensler, A., Hagemann, S., Kumar, P., Podzun, R., Rechid, D., Remedio, A., Saeed, F., Sieck, K., et al.: Assessing the transferability of the regional climate model REMO to different coordinated regional climate downscaling experiment (CORDEX) regions, Atmosphere, 3, 181-199, 2012.
Russo, E. and Cubasch, U.: Mid-to-late Holocene temperature evolution and atmospheric dynamics over Europe in regional model simula-