Improving permafrost physics in the coupled Canadian Land Surface Scheme (v. 3.6.2) and Canadian Terrestrial Ecosystem Model (v. 2.1) (CLASS-CTEM)

The Canadian Land Surface Scheme and Canadian Terrestrial Ecosystem Model (CLASS-CTEM) together form the land surface component of the Canadian Earth System model (CanESM). Here we investigate the impact of changes to CLASS-CTEM that are designed to improve the simulation of permafrost physics. Eighteen tests were performed including changing the model configuration (number and depth of ground layers, different soil permeable depth datasets, adding a surface moss layer), and investigating alternative parameterizations of soil hydrology, soil thermal conductivity and snow properties. 5 To evaluate these changes, CLASS-CTEM outputs were compared to 1570 active layer thickness (ALT) measurements from 97 observation sites that are part of the Global Terrestrial Network for Permafrost (GTN-P), 105 106 monthly ground temperature observations from 132 GTN-P borehole sites, a blend of 5 observation-based snow water equivalent (SWE) datasets (Blended5), remotely-sensed albedo, and seasonal discharge for major rivers draining permafrost regions. From the tests performed, the final revised model configuration has more ground layers (increased from 3 to 20) extending to greater depth (from 4.1m 10 to 61.4 m) and uses a new soil permeable depths dataset with a surface layer of moss added. The most beneficial change to the model parameterizations was incorporation of unfrozen water in frozen soils. These changes to CLASS-CTEM cause a small improvement in simulated SWE with little change in surface albedo but greatly improve the model performance at the GTN-P ALT and borehole sites. Compared to the GTN-P observations, the revised CLASS-CTEM ALTs have a weighted mean absolute error (wMAE) of 0.41 0.47 m (depending on configuration), improved from > 2.5 m for the original model, 15 while the borehole sites see a consistent improvement in wMAE for most seasons and depths considered, with seasonal wMAE values for the shallow surface layers of the revised model simulation at most 3.7 ◦C, which is 1.2 ◦C more than the wMAE of the screen-level air temperature used to drive the model as compared to site-level observations (2.5 ◦C). Sub-grid heterogeneity estimates were derived from the standard deviation of ALT on the 1 km measurement grids at the GTN-P ALT sites, the spread in wMAE in grid cells with multiple GTN-P ALT sites, as well as from 35 boreholes measured within a 1200 km region as 20 part of the Slave Province Surficial Materials and Permafrost Study. Given the size of the model grid cells (ca. 2.8◦), sub-grid heterogeneity makes it likely difficult to appreciably reduce the wMAE of ALT or borehole temperatures much further.

CRCM5 to investigate cold region hydrological performance. They reported improvements by incorporating supercooled soil water, fractional permeable area, and a changed hydraulic conductivity formulation for frozen soil. MacDonald (2015) coupled CLASS v. 3.6 to the Prairie Blowing Snow Model (PBSM) to simulate the influence of chinooks (Föhn winds) over the South Saskatchewan River Basin. He investigated 15 alternative parameterizations relating to the model physics and concluded by recommending that four of those be considered for adoption in CLASS to improve the simulated snow water equivalent (SWE) 5 and soil water. Three of the suggested parameterizations dealt with snow properties and the fourth related to soil thermal conductivity (MacDonald, 2015).
Our study evaluates the individual and combined effects of suggested enhancements to the Canadian Land Surface Scheme coupled to the Canadian Terrestrial Ecosystem Model (CLASS-CTEM) for simulating processes relevant to soils with permafrost or pronounced seasonal freezing. The model enhancements suggested above have previously been recommended in 10 research studies but not been previously implemented into the CLASS-CTEM framework (unless otherwise noted). Here we investigate the impact of these previously proposed model enhancements as well as several model configuration changes suggested in the literature. Based on this evaluation, a revised version of CLASS-CTEM, containing several enhancements is described and also evaluated. To evaluate model behaviour we draw upon measurements of the thickness of annual thaw in perennially frozen soils (active layer thickness) and borehole temperature sites from the Global Terrestrial Network for Per-15 mafrost (GTN-P, 2016) along with other observation-based datasets for snow, surface albedo and runoff.
Numerous studies have investigated the permafrost physics performance of models (e.g. see review in Riseborough et al., 2008) including other large scale models used in ESM applications, such as JULES (Chadburn et al., 2015a, b), JS-BACH (Ekici et al., 2014), and the Community Land Model (CLM, e.g., Alexeev et al., 2007;Lawrence et al., 2008;Lee et al., 2014) allowing us to design our proposed experiments based on their conclusions. The performance of CLASS-CTEM permafrost 20 physics will be evaluated through offline simulations where the model is forced with reanalysis meteorology to avoid biases found in the simulated climate of the coupled model as well as biases in the associated feedbacks. This study is focused on model performance at the large spatial scale of the CanESM as our principle aim is to improve the simulated permafrost physics so that the carbon cycle processes in these regions is well bounded. It is therefore not aimed at shedding light on physical processes in permafrost zones or investigating model performance at individual point locations as the model performance at a 25 single site does not directly translate to model performance over large regions.
In the remainder of the paper, Section 2 describes the CLASS-CTEM model, the study design as well as parameterizations tested, and the GTN-P sites used in model evaluation. Section 3 evaluates the model performance as well as discussing the influence of sub-grid heterogeneity while Section 4 gives overall conclusions and discusses limitations of our study and future directions for CLASS-CTEM development. The model uses leaf area index (LAI), rooting depth, canopy mass, and vegetation height to evaluate the energy and water 5 balance terms of the vegetation canopy and its interactions with the atmosphere. The number of soil layers can vary depending on the application but the standard model setup uses three soil layers of 0.1, 0.25, and 3.75 m thickness. The soil texture (sand, clay, organic matter) dataset used by CLASS-CTEM is the Global Soil Dataset for use in Earth System Models (GSDE; Shangguan et al., 2014). The soil permeable depth is from Zobler (1986) (hereafter Zobler86).CLASS v.3.6.2 adopts the soil albedo approach of Lawrence and Chase (2007) with the incorporation of a soil colour index geophysical field. 10 CLASS prognostically determines the water content (liquid and frozen) and temperature of all soil layers at each timestep.
Also calculated at each timestep, depending on ambient conditions, are the temperature, mass, albedo, and density of a single layer snow pack, interception of rain and snow on the vegetation canopy, and amount of ponded water on the soil surface.
Mineral soils are parameterized using the pedotransfer functions of Cosby et al. (1984) and Clapp and Hornberger (1978).
Organic soils (organic matter >30% by weight) are modelled as peat following Letts et al. (2000). In the standard CLASS-15 CTEM framework, lateral transfers of heat or moisture between grid cells are neglected; the treatment of processes such as streamflow and blowing snow require the inclusion of separate, specialized routines (e.g., Soulis et al., 2000;Arora et al., 2001;MacDonald, 2015). All simulations presented here have no geothermal heat flux at the bottom of the soil column.
CTEM calculates the carbon and vegetation dynamics on a daily timestep receiving from CLASS daily mean soil moisture, soil temperature, and net radiation. Photosynthesis and canopy conductance occur on the CLASS timestep. CTEM simulates 20 the respiratory costs and carbon uptake for nine plant functional types (PFTs) which are subsets of the four CLASS PFTs.
The CLASS PFTs (with corresponding CTEM PFTs in parentheses) are needleleaf tree (needleleaf deciduous and needleleaf evergreen), broadleaf tree (broadleaf cold deciduous, broadleaf drought/dry deciduous, and broadleaf evergreen), crop (photosynthetic pathway C 3 and C 4 ), and grass (C 3 and C 4 ). CTEM carries five carbon pools representing plant leaves, roots, and stems along with two detrital pools for litter and soil C. 25 For global simulations, CLASS-CTEM is typically run at the CanESM atmosphere resolution which is approximately 2.8 • by 2.8 • corresponding to a grid cell size of approximately 49 000 km 2 at 45 • latitude and about 33 500 km 2 at 70 • . Various studies have used observation-based datasets to evaluate CLASS-CTEM at scales from site-level to global (e.g., Peng et al., 2014;Melton andArora, 2014, 2016). While CLASS-CTEM is capable of running in a mosaic (multiple tiles per grid cell) configuration (e.g. Melton and Arora, 2014;Melton et al., 2017), the simulations presented here are run with a single tile per 30 grid cell.

Study design
Eighteen experiments were run to assess the impact of model geophysical fields (soil texture, soil permeable depth, and meteorological forcing), model setup (number of soil layers, addition of a moss layer), and model parameterizations (Table 1). The physical quantities used for model evaluation are presented in the next section. The initial model version (Exp. Base model) uses 3 ground layers of thicknesses 0.1, 0.25, and 3.75 m for a total depth of 4.1 m. The first seven experiments address model 5 configuration and input geophysical fields. To test the sensitivity of simulated permafrost to meteorological forcing, CLASS-CTEM was forced with two different meteorological datasets, the Climate Research Unit -National Centres for Environmental Prediction (CRUNCEP v. 8;Viovy, 2016) and the Climate Research Unit -Japanese 55 year Reanalysis (CRUJRA55 v. 1.0.5; Harris et al., 2014;Kobayashi et al., 2015). CRUNCEP was used as the base forcing dataset with additional runs performed for some experiments with CRUJRA55 (see Table 1). While both of these meteorological datasets use the CRU TS dataset 10 (Harris et al., 2014) as the underlying monthly climatology, they differ in their meterological models (NCEP or JRA55). Additionally the spatial resolution of JRA55 is 0.5 • while that of NCEP is 2.5 • . Thus, the two datasets differ in their spatial and high frequency (sub-monthly) temporal variability. However these differences will be somewhat lessened by their regridding to the CLASS-CTEM model resolution. The meteorological inputs (surface air temperature, surface pressure, specific humidity, wind speed, precipitation, and longwave and shortwave radiation) are disaggregated from 6 hourly to half-hourly time steps 15 while the simulation runs following the methodology in Melton and Arora (2016). Both datasets are available over the extended periods necessary for permafrost simulation (CRUNCEP v. 8: 1901, CRUJRA55 v. 1.0.5: 1901-2017. Exp. 20 ground layers changes the number of ground layers from 3 to 20. The 20 layers have higher resolution near the surface with thicker layers at depth (see Table A1). If the permeable soil depth is shallower than the modelled ground column, layers below the soil permeable depth are treated like hydrologically inactive bedrock and are assigned thermal conductivity 20 (2.5 W m −1 K −1 ) and heat capacity (2.13 × 10 6 J m −3 K −1 ) values characteristic of sand particles (Verseghy, 2017). If the transition from permeable soil to impermeable bedrock occurs within a soil layer, CLASS calculates the water fluxes only in the depth of permeable soil but simulates one soil temperature for the layer.
The influence of the soil permeable depth dataset is examined by replacing the soil permeable depths of Zobler86 with either the SoilGrids dataset (Exp. SoilGrids depth, Shangguan et al., 2017) or that of Pelletier et al. (2016) (hereafter referred to 25 as Pel16; Exp. Pel16 depth). The influence of a moss layer is examined in Exp. SoilGrids+Moss and Pel16+Moss. In these experiments the top soil layer is replaced with photosynthetically-inactive moss with a higher porosity, hydraulic conductivity and heat capacity than mineral soil following Wu et al. (2016) Whereas the first series of experiments just described investigated aspects of the model setup, the second series of experiments investigates alternative parameterizations and uses the SoilGrids+Moss experiment as a starting point (the same 30 geophysical fields and model configuration). The alternative parameterizations are described in detail in Appendix sections A2 to A7. Briefly, these experiments fall into three main areas related to: 1) heat transfer, 2) snow, and 3) hydrology. The heat transfer experiments replace CLASS-CTEM's default soil thermal conductivity parameterization (Côté and Konrad, 2005) with that of de Vries (1963) following the recommendations of MacDonald (2015)(Exp. deVries thermal cond. results are for the influence of frozen water as described in Ganji et al. (2015) SoilGrids+Moss discussed in the Supplement). As de Vries (1963) does not account for frozen water in soil, whereas Côté and Konrad (2005) does, a further experiment uses a recently published parameterization that simplifies and extends de Vries (1963) to include both frozen and unfrozen water (Exp. Tian16 thermal cond.; See Section A2; Tian et al., 2016). Four experiments were devoted to aspects of how snow is simulated in CLASS-CTEM. Experiments Snow cover:Yang97 and Snow cover: Brown03 replace CLASS-CTEM's default function to relate snow depth to grid cell fractional snow cover from a linear relationship 5 (Verseghy, 2017) to a hyperbolic tangent (following Yang et al., 1997) or an exponential function (following Brown et al., 2003), respectively (Supplement Figure 2). Another experiment (Fresh snow density) changed the calculation for the density of freshly-fallen snow from one based solely on air temperature (Verseghy, 2017) to also considering wind speed following the CROCUS model (Essery et al., 1999). The final experiment concerned with aspects of the snow parameterization is Snow albedo decay. CLASS-CTEM uses an empirical exponential decay function to simulate the decrease in snow albedo as snow ages. In Snow albedo decay, the default parameterization is replaced by an efficient spectral method (Wiscombe and Warren, 1980;Dickinson, 1983). The last series of experiments looked at hydrology. Water in soils can be, partially or completely, unfrozen at temperatures below 0 • C due to the effects of interfacial curvature, adsorption forces and solutes (Watanabe and Mizoguchi, 2002;Dall'Amico et al., 2011). Experiment Super-cooled water incorporated the unfrozen water in frozen soils 5 parameterizaton of Niu and Yang (2006) and Exp. Modif. hydrology modifies the soil matric potential and saturated hydraulic conductivity to account for the influence of frozen water following Ganji et al. (2015). Active layer thickness in CLASS-CTEM is determined by the temperature and water content of the ground layers. If a layer's temperature is 0 • C, the frozen water fraction is used to estimate the thickness of freezing within the layer, i.e., if half of the 15 water content in the layer is frozen, the ALT is assumed to be halfway through the layer. Permafrost area in the model domain was calculated by selecting grid cells with active layer thicknesses less than the model total ground column and multiplying by the grid cell area. analysis to regions northward of 45 • N. We compared our simulated seasonal runoff to measured discharge rates for seven major river basins that drain permafrost regions for the period from 1965 to 1984 (Ob, Volga, Lena, Yenisei, Yukon, Mackenzie, and Amur rivers; UNESCO Press, 1993). This comparison is limited to seasonal discharges since the CLASS-CTEM runoff is not routed, thus the timing of transport of the water from each grid cell to the river mouth is neglected. On a seasonal timescale this should not cause serious errors but the results must be interpreted with caution. 25

Permafrost distributions from the literature
Because permafrost cannot easily be observed spatially and reliable data are sparse, global or continental-scale simulation results are often compared to estimates of permafrost distributions. Most prominently, this is the "Circum-Arctic map of permafrost and ground-ice conditions" (Brown et al., 1997) that distinguishes zones of permafrost extent at a scale of 1:10 000 000.
These zones are based on expert assessment and manual delineation, often following isotherms of mean annual air temperature.
Here, we use permafrost extent to refer to the fraction (0 -1) of the surface that is underlain by permafrost within a pixel or a polygon, permafrost area to the actual area (km 2 ) underlain by permafrost and permafrost region is used to denote the area  Table 1), 132 GTN-P borehole observation sites (red; Supplement Table   2) and the Slave Province Surficial Materials and Permafrost Study (green; SPSMPS; Lac de Gras, Northwest Territories, Canada) used for model evaluation. Each site is classified according to its permafrost zone listed in the GTN-P. The site markers are semi-transparent hence regions with many closely located GTN-P sites will cause overlap, and darkening, of the markers.
(km 2 ) where some proportion of the ground can be expected to contain permafrost. The permafrost region is commonly taken to include areas with a permafrost extent exceeding some threshold (Zhang et al., 2000;Gruber, 2012). These definitions are relevant because CLASS-CTEM produces a binary result, i.e., permafrost is present or absent in a cell, and the classes (Zhang et al., 2000;Brown et al., 1997) and the continuous index (Gruber, 2012) of permafrost extent that are used for comparison need to be interpreted appropriately. Neglecting aggregation effects (Giorgi and Avissar, 1997), which arise when the average 5 fine-scale behaviour of a simulated environmental variable is not equal to the simulated coarse-scale behaviour, a threshold of permafrost extent at 50% provides a first estimate of the region that should be compared with a model producing a binary result. For example, environmental conditions that give rise to a permafrost extent of 60% would likely be considered to have permafrost in the binary model and their area would be counted as having permafrost entirely (rather than only 60% of it).
Similarly, conditions that produce a permafrost extent of 40% would likely result in not having permafrost in a binary model.

10
As a consequence, we use the total area of all polygons or pixels with an expected permafrost extent larger than 50% as the appropriate area to compare with the results from CLASS-CTEM, termed 'region_50'. This includes continuous and extensive discontinuous permafrost in the Brown et al. map totalling to 15 Mkm 2 (Zhang et al. 1999) and a similar number can be interpreted from a plot of permafrost zonation index and permafrost region (Gruber, 2012).

Comparison against GTN-P ALT sites: sites with no simulated permafrost
A first simple test of permafrost performance for CLASS-CTEM is to check whether the GTN-P ALT sites are in fact simulated as containing permafrost. Given that CLASS-CTEM is being run on the CanESM grid (ca. 2.8 • ), it is possible that site conditions such as meteorology, orography, or vegetation at the GTN-P ALT measurement sites could be quite dissimilar to those of 5 the nearest grid cell, which covers many thousands of km 2 . In such cases, CLASS-CTEM could simulate no permafrost where some permafrost indeed exists. Per experiment, the number of sites with no permafrost simulated are listed in Table 2. These ALT sites were removed from further analysis as the ALT in sites without permafrost is not defined. Most experiments had between six and eight observation sites (corresponding to 4 to 6 grid cells) incorrectly simulated as permafrost-free (ISPF). The Base model experiment has significantly more sites ISPF at 15, corresponding to 2 or 3 additional grid cells. In general, for the 10 same experiment, the CRUJRA55 meteorological forcing results in fewer grid cells ISPF than CRUNCEP. Small differences in the simulated presence of permafrost (or the number of sites ISPF) are to be expected given the possibility of errors in the meteorological forcing and local variations in site-level characteristics, but large differences can indicate problems with the model setup and parameterizations.

Initial model performance 15
The Base model experiment simulates a permafrost area (PA) of 8.6 Mkm 2 (north of 60 • S; Table 2) with permafrost confined to northern Siberia, Alaska and the northern edge of Canada ( Figure 2). This low PA is in line with that simulated by CLASS-CTEM when coupled within the CanESM, although the spatial distribution is different due to the different atmospheric forcing (Koven et al., 2013). Also plotted in Figure 2 is the PE estimate of Brown et al. (1997). The Brown et al. (1997) dataset gives permafrost spatial distribution in four classifications which are not directly comparable to ALTs but may be used to give a 20 general indication of PA from an independent estimate. Owing to the coarseness of the model grid CLASS-CTEM is not able to simulate isolated or sporadic permafrost. For regions of discontinuous and continuous permafrost, comparing the estimated distribution of Brown et al. (1997) to the modelled ALT indicates poor agreement.
With such a small permafrost area many of the GTN-P ALT sites were ISPF as mentioned above.

Increasing the number of ground layers
Increasing the number of ground layers from 3 to 20 decreases the number of GTN-P ALT sites ISPF from 15 to 7 (Table 2). Figure 3 shows the difference between the simulated and observed ALT at each grid cell with GTN-P ALT sites for selected experiments. The average MAE computed against the GTN-P ALT observations for Exp 20 ground layers is over 2.5 m with simulated ALTs strongly overestimated (Figure 3). When the number and depth of ground layers is increased, but the soil bedrock. The absence of water and therefore of heat consumption by melting ice in these lower ground layers causes the model soil column to be generally too warm. However, the total global PA increases from 8.6 Mkm 2 simulated by the Base model to 16.8 Mkm 2 (Table 2)  To look in closer detail at the model performance for the GTN-P borehole sites, Figure 5 shows the Gaussian kernel density estimate (KDE) derived from differences between the simulated and observed borehole temperatures. For shallow soils, as 15 the seasons progress from winter to fall, the proportion of instances with a strong cold bias decreases with a warm soil bias taking over in summer, especially in the shallowest depth band. This would indicate the modelled soil heat fluxes are somewhat exaggerated. The fall period generally has the least bias, potentially due to the loss of the warm summer bias but prior to the establishment of the cold winter bias.
3.4 Increasing the soil permeable depths 20 Changing the soil permeable depth dataset to SoilGrids (Exp. SoilGrids depth) from Zobler86 gives a general improvement over the 20 ground layers simulations with a drop in average MAE to 1.162 m at the GTN-P ALT sites ( Figure 3). There is also a shift to shallower ALTs ( Figure 2) with a slight decrease in PA to 15.7 Mkm 2 , which is within the range literature estimates (Table 2 and discussed further in Section 2.3.4). The greater permeable depths associated with SoilGrids lead to deeper penetration of water into the soil, resulting in more water being allocated to runoff than made available for plant  Numerous studies have pointed to the importance of increasing the simulated ground column depth and number of ground layers to better capture the decay with depth of the influence of multi-decadal variability (e.g. Smerdon and Stieglitz, 2006;Alexeev et al., 2007;Nicolsky et al., 2007;Paquin and Sushama, 2014). Of particular relevance to our study, Paquin and Sushama (2014) used CLASS in CRCM5 and found shallow soil configurations (permeable depth < 1 m throughout much of the model domain) to lead to overly strong seasonal cycles with resulting overly deep ALTs, similar to the work of Smerdon and Stieglitz (2006), and in line with our Base model simulation with its small estimated PA.
The availability of comprehensive global soil permeable depth datasets is relatively recent. Previous studies would often assume a constant permeable soil depth, either shallow (Dankers et al., 2011) or deep (Lawrence et al., 2008) with the deeper layers hydrologically inactive. Comparing the three permeable depth datasets (Zobler86, SoilGrids, and Pel16; Supplement  (Gornall et al., 2007;Van Der Wal and Brooker, 2004) and modelling studies have incorporated organic layers (e.g. Lawrence et al., 2008;Paquin and Sushama, 2014) or bryophytes (Porada et al., 2016) to improve permafrost dynamics. Exps. SoilGrids+Moss and Pel16+Moss both incorporate a non-photosynthetic moss layer in place of the first layer of soil (see Section 2.2) and both simulate generally shallower ALTs than their parent simulations (SoilGrids depth and Pel16 depth, respectively; Figure 2).

25
The effect of moss introduction for the SoilGrids+Moss experiment is to reduce average MAE from 1.162 to 0.472 m for the GTN-P ALT sites (Figure 3) The general cooling influence is evident by comparing to the GTN-P ALT sites ( Figure 3) and also through the increase in simulated PA from 15.7 to 17.9 Mkm 2 . A similar improvement is seen for Exp Pel16+Moss. The high porosity of the moss layer causes less water to be available at the surface for evaporation, reducing the latent heat flux and making more water available for runoff, and its insulating effect keeps the soil surface cooler, which reduces plant growth and 30 also the sensible heat flux (Supplement Figure 4). The reduction in plant growth due to cooler soils also reduces water uptake for transpiration further increasing runoff.
Comparing simulated ground temperatures to observations at the GTN-P borehole sites shows a slight increase in wMAE at all depth ranges and seasons compared to the SoilGrids simulation (Figure 4). Comparing the KDE plots of the bias distri-  bution between modelled and observed borehole temperatures for the SoilGrids+Moss and the SoilGrids simulations shows an increased cool bias in the shallow soil which is especially evident in summer ( Figure 5). This bias extends deeper into the soil column, albeit weakening with depth. The cooling of soils due to the incorporation of a moss layer was also found by Porada et al. (2016), however their simulations included a dynamic extent for moss cover. The creation of a cold bias due to the introduction of a moss layer is reasonable considering that the moss layer was applied to all areas uniformly. While this 5 experiment was intended to understand the impact of moss on simulated ground temperatures, future work should attempt to place moss with a more realistic distribution, similar to Porada et al. (2016).
Comparing the model experiment outputs to the GTN-P sites in Figure 3 it is evident that increasing the number of ground layers and the soil permeable depth and incorporating a top layer of moss/organic matter improves the simulated ALTs. These Siberian permafrost (Supplement Figure 5). The average MAE at the GTN-P ALT sites is reduced to 0.314 m (Figure 3) however, at the GTN-P borehole sites, the simulated ground temperatures are biased cold, primarily in summer and fall, and worsening with depth ( Figure 4 and 5).

Changing the relationship between snow depth and snow cover
Two experiments investigated different relationships between snow depth and the grid cell snow cover in CLASS-CTEM (Snow 25 cover: Yang97 and Snow cover: Brown03). These modifications increased global PA (~1.2 Mkm 2 ) with a slightly higher PA estimated for Snow cover: Yang97 ( year, which tends to be more pronounced during fall and winter (Supplement Figure 6) although there is little difference between the two snow cover experiments.   Table 2) for three depths: 0.05 -0.5 m, 0.5 m -1.5 m, and 1.5 -3.0 m, for each season and for selected experiments. The bandwidth was chosen using Scott's rule of thumb (Scott, 1992).
Changes in snow cover can lead to large changes in albedo due to the significant brightness difference between snow and vegetation/bare ground. To investigate the impact of these experiments on albedo we evaluated seasonal averages of simulated albedo against MODIS observations over latitudes northward of 45 • N for the period 2000 to 2013. We find the spring (AMJ) albedo from the various simulations is about the same (Supplement Fig. 7).

Considering wind speed in the calculation of fresh snow density 5
In CLASS-CTEM, the density of freshly fallen snow depends on the ambient air temperature (Eqn. A19). Exp. Fresh snow density tested a parameterization from the CROCUS model that also includes wind speed in this calculation (Eqn. A20) which yielded an increase in PA to 18.9 Mkm 2 . Compared to the GTN-P ALT sites, the Fresh snow density results are similar to those of the snow cover experiments with no improvement in average MAE (0.581 m; Figure 3) and no discernible impact upon modelled DJF SWE compared to Blended-5 or upon spring (AMJ) albedo compared to MODIS (Supplement Figure 6 and 7).
The typical wind speed in the CRUNCEP meteorological forcing dataset when snow is falling is in the range of 1 -5 m s −1 ( Figure 6). With Eqn A20, the density of freshly fallen snow tends to be lower at very low wind speeds then higher as wind speed increases for the same air temperature. The generally higher density of fresh snow with the CROCUS parameterization  Table 2). The efficient spectral method for albedo decay generally 15 produces lower albedos than CLASS-CTEM's original exponential parameterization. The impact upon spring albedo and SWE leads to a notable decline in model performance compared to observation-based datasets (Supplement Figures 6 and 7).The CRUJRA55-forced experiments, on the other hand, give slightly better spring albedo for all experiments forced with that meteorological dataset. This could be due to the sub-monthly variability difference of CRUJRA55 compared to CRUNCEP as Beer et al. (2018) found one of the largest impacts of changing climate variability in model forcing to be snow depth. The lower albedo in the Snow albedo decay experiment leads to a smaller snowpack which melts earlier resulting in reduced spring runoff, a longer growing season, and a higher LAI. The warmer land surface results in larger ALTs. At the GTN-P borehole sites, the Snow albedo decay experiment's warmer ground layers gives a noticeable increase in wMAE values across all seasons and most depth bands. 5

Allowing unfrozen water in frozen soils
The inclusion of unfrozen water in frozen soils (Exp. Super cooled water) increased PA to 20.1 Mkm 2 with a minor improvement at the GTN-P ALT sites (Figure 3). The GTN-P borehole sites showed little change in the wMAE values (Figure 4).
The larger PA for this experiment could be reflecting the thermal conductivity differences between completely frozen soil and frozen soil with some residual liquid water. The differences in bulk thermal conductivity would slow heat transfer into the 10 deeper ground layers for the Super cooled water simulation during periods where the soil layer temperature is below 0 • C. As a result spring warming would be slower to reach deeper layers. Ganji et al. (2015) investigated streamflow for 21 watersheds in eastern Canada using CLASS and the WATROUTE routing scheme. They report their modifications (super-cooled soil water, fractional permeable area and modified hydrology due to ice; discussed in Section A7) improved streamflows particularly during the spring melt. The changes were attributed to reduced 15 hydraulic conductivity of frozen soils causing more snow melt runoff and less infiltration. We did a rudimentary comparison of our simulated seasonal runoff for seven major river basins that drain permafrost regions (Figure 7; Section 2.3). As the CLASS-CTEM simulations did not include excess ground ice (e.g. slab ice such as ice wedges or lenses commonly found in regions affected by thermokarst processes), groundwater or interflow, all of which could increase runoff (baseflow) in the summer and fall seasons, we limit our discussion to the spring and winter seasons. The Super cooled water experiment has 20 lower spring runoff than both Base model and SoilGrids+Moss but higher winter runoff, making it more in line with observed river discharges (Figure 7).
Given the Super cooled water and Tian16 thermal cond. simulations had the lowest average MAE at the GTN-P ALT sites ( Figure 3) a simulation was run with both of these parameterizations included (Exp. Super-cooled+Tian16). This experiment further reduced the average ALT MAE but considerably worsened simulated ground temperatures at the GTN-P borehole sites 25 ( Figure 4). This incongruity between model performance at the ALT and borehole sites could be reflecting biases due to the spatial distribution of the sites (see Figure 1), the differing number of observations of ALT vs. borehole temperatures, or to biases in the observations themselves, which is discussed in Section 3.13.

Modifying hydrology due to ice
The Modif. hydrology experiment modified soil matric potential and saturated hydraulic conductivity to account for the impact 30 of frozen water following the work of Farouki (1981) and Koren et al. (1999). These changes yielded a simulated PA of 19.5 Mkm 2 (Table 2) with generally slightly deeper ALTs in much of the high latitude PD compared to SoilGrids+Moss  sites is similar to Exp. SoilGrids+Moss (Figure 4). Since the modifications to soil matric potential and saturated hydraulic conductivity (Equations A35 and A36) generally decrease water mobility in soils with ice present, the Modif. hydrology soils are generally wetter, allowing higher annual latent heat flux and supporting higher LAI. The Modif. hydrology experiment has similar runoff to the Base model experiment with higher spring runoff than observed river discharges while the winter runoff is reduced compared to SoilGrids+Moss and is also smaller than the observed river discharges (Figure 7). To investigate 5 synergistic effects between the two modifications (Modif. hydrology and Super cooled water), a simulation was run with both modifications applied (similar to Ganji et al.'s Exp. 3). This simulation gave slightly higher spring runoff but similar winter runoff compared to Modif. hydrology (not shown). Thus it appears, with respect to runoff, the modifications to hydrology have a stronger influence than super cooled soil water, in line with the conclusion of Ganji et al. (2015) that the primary effect is to reduce hydraulic conductivity which decreases infiltration and increases snow melt runoff.

Influence of sub-grid heterogeneity
The CLASS-CTEM model grid used in our study is the same as that used in the CanESM.  Table 1). However, one square kilometre is still small compared to model grids ranging in size from hundreds to thousands of km 2 . One measure of the influence of sub-grid heterogeneity can be obtained by considering the MAE per site in the grid cells where we have more than one GTN-P ALT site (Figure 8). For these grid cells, the spread in MAE at each site ranges from 0.01 m (grid cell with 2 sites) to 0.59 m (12 sites). While it is not reasonable to directly compare the sub-grid 10 range of MAE to the model average MAE shown in Figure 3, Figure 8 demonstrates that sub-grid heterogeneity is a significant source of variability in ALT within model grid cells and that variability will impose constraints on the lower limit of MAE that is attainable by the model. The SPSMPS collected air and ground temperature measurements for 15 m x 15 m plots with hourly borehole temperatures at thirty-five boreholes all located within a ca. 1200 km 2 area. The observed screen-level temperatures are generally reasonably close to those of CRUNCEP but CRUNCEP has slightly cooler summer temperatures (Supplement Figure 8). What is most striking about the borehole temperatures at Lac de Gras is the large spread in ground temperatures at all depths and in most seasons ( Figure 9). The temperature range is smallest in fall and spring when the soils are thawing or freezing and largest in 5 winter with differences varying from 12 to over 20 • C depending on the soil depth. This remarkable spread in temperature is due to variations in slope, aspect, soil moisture, soil texture, soil organic matter content, and vegetation type and distribution. The simulated ground temperatures from two experiments are plotted alongside the boreholes (SoilGrids and SoilGrids+Moss). As the model is driven by CRUNCEP and we have no precipitation information for the SPSMPS sites, it is difficult to determine the cause of any biases. Also, although the SPSMPS sampling area is considerably larger than the GTN-P sites, the same 10 arguments apply concerning the mismatch of scales between the observational area and the model grid, and the variability introduced by sub-grid heterogeneity.
An additional measure of how reasonable the model wMAE is at the borehole sites can be obtained by comparing the CRUNCEP screen-level temperature, which is used to force the model, and the observed screen-level temperature at each GTN-P site. The MAE for screen-level temperature is between 2.17 and 2.53 • C across all seasons. Therefore the model's 15 wMAE range for shallow soil of ca. 3 to 3.7 • C varies from ca. 0.8 to 1.2 • C above that of the MAE for CRUNCEP's screenlevel temperature (for the SoilGrids+Moss simulation). Given the large spread in borehole temperatures in a relatively small area at the SPSMPS sites, and the MAE of the model's forcing air temperature, it appears the model's wMAE can be considered reasonable.
3.13 Influence of bias due to ALT or borehole sampling locations 20 Temperature in individual boreholes and ALT at individual sites often differ from the grid cell they are compared with because of sub-grid variability as discussed above. The underlying spatial variation of ground temperature, even at distances smaller than 1km is well documented (Smith, 1975;Morse et al., 2012;Gubler et al., 2011). If the locations of GTN-P sites were randomly sampled, sub-grid effects would be expected to cancel out and, consequentially, a mean bias (cf. Figure 3)  showing values of seasonal thaw depth that underestimate the amount of ground ice that was melted due to frost-table probing without recording surface subsidence (Shiklomanov et al., 2013). In summary, it is likely that a slightly positive model bias, i.e. higher temperatures and greater ALT simulated than observed, would correspond to a model that best represents reality.
Quantifying that effect however is beyond the present study.

Conclusions
The performance of CLASS-CTEM in cold regions has been investigated in the past by numerous researchers who have suggested several modifications to improve the model's performance in these regions. Drawing from these recommendations 5 and other studies, 18 experiments were carried out to investigate the influence of: 1) the number of ground layers, 2) soil permeable depth datasets, 3) the addition of a moss layer, 4) changing the soil thermal conductivity formulation, 5) altering the derivation of snow cover based on snow depth, 6) adding the effect of wind speed to the calculation of fresh snow density, 7) changing the model's snow albedo decay calculation to an efficient spectral parameterization, and 8) modifications to frozen soil hydrology including allowing unfrozen water in frozen soils and an alteration to hydraulic conductivity and soil matric 10 potential for the presence of ice. Two soil permeable depth datasets were tested (Pelletier et al. (2016) et al., 2015), and seasonal runoff to river discharges for major rivers draining the Arctic (UNESCO Press, 1993) as well as literature estimates of permafrost area ( Table 2).
The original model version had an overly small simulated permafrost area of 8.6 Mkm 2 which was almost doubled to 16.7 Mkm 2 by increasing the number and depth of ground layers. Of the two soil permeable depth datasets, Pelletier et al.

5
(2016) gave consistently lower average mean absolute errors (MAE) at the GTN-P ALT sites compared to SoilGrids. However, SoilsGrids was chosen for further simulations as this dataset appears to be better validated (Shangguan et al., 2017). For the two meteorological datasets used, the permafrost specific results depended on the model configuration and parameterizations tested.
More consistently, spring albedo appeared to be better simulated using CRUJRA55 while winter SWE was slightly better with CRUNCEP. Changes to the model configuration by increasing soil permeable depths using the SoilGrids dataset, and adding a impacts on wMAE at the GTN-P borehole sites, and a possible improvement in seasonal runoff. Further assessment of the improvements in runoff using a river routing scheme are needed before this parameterization will be fully adopted. Based on the tests performed here, the optimal model configuration will include more ground layers to a greater depth, soil permeable depths from the SoilGrids dataset, and moss in locations where it is appropriate. These changes give a simulated permafrost area of between 15.7 to 17.9 Mkm 2 (Table 2)  There are six main limitations of our study. First, thermokarst processes due to melt of excess ground ice (ice wedges or lenses) are not simulated. As maps of ground ice extent improve (e.g. O'Neill et al., 2018) and become more suitable for use as a model geophysical field, parameterizations such as Lee et al. (2014) could be incorporated. Second, our treatment of mosses and their impact is simplistic. A more comprehensive approach such as the LiBry model (Porada et al., 2016) would allow 25 for dynamic moss extents and more bryophyte subtypes including lichens. Third, the plant functional types used here are not specific to the Arctic and do not include shrubs. Shrubs, in particular, are presently expanding and have complex impacts upon Arctic regions (e.g. Figure 3 in Myers-Smith et al., 2011). Fourth, orographic influences on permafrost such as slope and aspect were not resolved. Fifth, inland water bodies and their impact upon ground thermal regimes were not considered. Finally, the influence of sub-grid heterogeneity was ignored as permafrost in the model grids is binary thus excluding the simulation of 30 discontinuous permafrost. With regard to the influence of sub-grid heterogeneity, the standard deviation of ALT on the 1 km 2 -1 ha measurement grids at the GTN-P ALT sites, the spread in MAE in grid cells with multiple GTN-P ALT sites, and the SPSMPS collection of 35 boreholes over a 1200 km 2 study area indicate that it is likely difficult to reduce the wMAE of ALT or borehole temperature much further, given the size of the model grid cells (ca. 2.8 • ). Based on the model physics performance presented here, it appears that with the modifications described above, the land surface scheme in CLASS-CTEM is well-suited to provide the physical conditions for simulating carbon fluxes in the permafrost domain.
Code availability. CLASS-CTEM is available as a tarball from Melton (2019). The following code tags correspond to experiments in this manuscript (see Table 1 The simple moss parameterization used here follows Wu et al. (2016) with the exception that our moss layer is non-photosynthesizing. The

A2 Soil thermal conductivity
CLASS-CTEM calculates the thermal conductivities of organic and mineral soils following Côté and Konrad (2005). The soil thermal conductivity, λ (W m −1 K −1 ), is modelled via a relative thermal conductivity, λ r , which varies between a value of 1 20 at saturation and 0 for dry soils: Using the following generalized relationship, the relative thermal conductivity is obtained from the degree of saturation (the water content divided by the pore volume), S r (unitless): The dry thermal conductivity, λ dry , is calculated via an empirical relationship using the pore volume, θ p (m 3 m −3 ), with different coefficients for organic and mineral soils: While the saturated thermal conductivity, λ sat , is calculated by Côté and Konrad (2005) as a geometric mean of the conductivities of the soil components, other studies (e.g. Zhang et al., 2008) have found the linear averaging used by de Vries (1963) to be generally more accurate and this approach has been adopted by CLASS-CTEM, where λ ice is the thermal conductivity of ice, λ liq is that of liquid water and λ s is that of the soil solid particles.
Exp. deVries thermal cond. replaces the CLASS-CTEM default soil thermal conductivity parameterization with that of de Vries (1963): where the a subscript denotes the air component, θ is the volumetric fraction, and f is the 'weighting' factor (unitless) which 20 is given by: (A8) f a = 1 3 2 1 + g a ( λa λ liq − 1) where g a represents a unit-less empirical air pore-shape factor, for wet soil whereas the thermal conductivity of completely dry soils is calculated by, λ = 1.25 f a λ a θ a + f s λ s θ s + f organic λ organic θ organic f a θ a + f s θ s + f organic θ organic (A12)

10
The Tian et al. formulation also modifies the pore-shape factor (equation A10) to be, for air and for ice. Tian et al. (2016) introduce a shape factor for ellipsoidal soil particles, g m as, 15 g m = g sand θ sand + g silt θ silt + g clay θ clay (A15) where g sand is 0.182, g silt is 0.00775, and g clay is 0.0534. The shape factor for organic soils, g organic , is set to 0.5. The same 'weighting' factor is used for ice, air, organic and mineral soil components and left unchanged from equation A9.

A3 Snow cover fraction
CLASS-CTEM relates snow depth, (d snow ; m), to snow cover, (f snow ; fraction), via a linear function (Supplement Figure 2) (Verseghy, 2017), where d 0 is a limiting snow depth assigned a value of 0.1 m. Exp. 'Snow cover:Yang97' changes the CLASS-CTEM linear 5 function to a hyperbolic tangent function (Yang et al., 1997), Another alternative parameterization for snow cover from snow depth was proposed by Brown et al. (2003), which was not evaluated in MacDonald (2015). This relation was developed based on analysis of a global gridded snow water equivalent product designed to evaluate GCMs. Exp 'Snow cover:Brown03' tests the impact of that parameterization by changing the 10 snow cover function to the proposed exponential form (Brown et al., 2003),

A4 Fresh snow density
The density of freshly fallen snow is related to its ice-crystal structure and the volume of the ice crystal that is occupied by air.
Generally, snow density is the result of 1) processes occurring in the cloud that affect the size and shape of the growing ice 15 crystals, 2) processes that modify the crystal as it falls, and 3) compaction on the ground due to prevailing weather conditions and metamorphism in the snowpack (Roebber et al., 2003).
Fresh snow density ( ; kg m −3 ) in CLASS-CTEM is calculated based on air temperature (T a ; K). For air temperatures below freezing, (T f ), a relation from Hedstrom and Pomeroy (1998) is used, while for temperatures at or above freezing CLASS-CTEM uses an equation from Pomeroy and Gray (1995), In Exp. Fresh snow density, the effect of wind speed (u, m s −1 ) is included following the approach used in the CROCUS model as detailed in Essery et al. (1999) with a minimum density of 50 kg m −3 following MacDonald (2015): = max[50, 109 + 6(T a − T f ) + 26u 1/2 ] Wind speed may be considered important in determining fresh snow density as wind speeds greater than approximately 9 m s −1 can move ice crystals on the surface leading to crystal fractionation during saltation and surface compaction increasing the snow density (e.g., Gray and Male, 1981, p. 345-350).

A5 Snow albedo decay
Snow albedo (α s ; unitless) decreases as snow ages due to snow grain growth and deposition of soot and dirt (Wiscombe and 5 Warren, 1980). In CLASS-CTEM this process is treated via empirical exponential decay functions (Verseghy, 2017 Exp. Snow albedo decay replaces the CLASS-CTEM exponential decay function with a spectral method based on Wiscombe and Warren (1980) and adapted for efficiency by Dickinson (1983). This efficient spectral method first calculates the diffuse 20 radiation albedo based on the albedo of fresh snow and the transformed snow age factor (F age ) α dif,visible = (1 − 0.2F age )α f s,visible (A26) α dif,nir = (1 − 0.5F age )α f s,nir (A27) where τ s is a non-dimensional snow age at each timestep found via τ s (t + ∆t) = τ s (t) + (r 1 + r 2 + r 3 )∆t where r 1 represents the effects of grain growth due to vapor diffusion as and r 2 = r 10 1 , representing the additional effects at or near the freezing of meltwater on grain growth. r 3 represents the effects of soot and dirt and is set to 0.3. T g,1 is the temperature of the top soil layer (K), τ 0 is 10 6 s, S f is the snowfall rate for that timestep (kg m −2 s −1 ), and ∆P is the snow fall amount threshold (10 kg m −2 ). If, within a timestep, the fresh snowfall amount exceeds ∆P , the snow age is set to that of new snow (τ s = F age = 0).

A6 Super-cooled soil water
In experiment Super-cooled water, unfrozen soil water in frozen soils is introduced into CLASS-CTEM following Niu and Yang (2006). Unfrozen water can exist in frozen soils through the capillary and absorptive forces exerted by soil particles on 20 water in close proximity. The upper limit on the residual amount of water that can remain liquid under given soil temperature and texture conditions is parameterized by Niu and Yang (2006) as, where g is gravitational acceleration (m s −2 ), L f is the latent heat of fusion (J kg −1 ), and T soil,i is the soil layer temperature (K). According to Romanovsky and Osterkamp (2000) unfrozen water content in moss is negligible so θ liq,max is set to zero 5 for moss layers.

A7 Modified hydrology
In Ganji et al. (2015) several changes were implemented in CLASS to address how the model deals with frozen soil water.
First, super-cooled soil water was added following Niu and Yang (2006) as described above. Secondly, fractional impermeable area was introduced, also following Niu and Yang (2006), but this has little impact upon our model simulations (discussed in 10 Appendix B). Their final modification was to account for the impact of frozen water on the soil matric potential (ψ; m) after Farouki (1981) and Koren et al. (1999) by adding a new term [(1 + C k θ ice ) 2 ] to the existing CLASS functional relationship, where C k is a constant, set to 8, that accounts for the effect of an increase in specific surface area of soil minerals and liquid water as water freezes and ice forms (Kulik, 1978). ψ sat is the soil matric potential at saturation (m) and b is the Clapp 15 and Hornberger empirical 'b' parameter (unitless) (Clapp and Hornberger, 1978). The calculation of hydraulic conductivity k; ms −1 is also modified by multiplication with a similar term [(1 + C k θ ice ) −4 ], k = k sat θ liq θ p

2b+3
(1 + C k θ ice ) −4 (A36) where k sat is saturated hydraulic conductivity. The effect of these changes is to generally increase soil matric potential and decrease hydraulic conductivity when ice is present in the soil. These modifications are tested in Exp. Modif. hydrology.

20
Appendix B: Fractional permeable areas in frozen soils CLASS-CTEM accounts for the impact of frozen soil water through an empirical correction factor (f ice ; unitless), according to Zhao and Gray (1997).
This factor is used to correct the calculated soil hydraulic conductivity, k (m s −1 ) which is found via the Clapp and Hornberger (1978) equation: where k sat is the hydraulic conductivity at saturation and b is an empirical parameter. Soil moisture is related to soil matric potential (ψ; m) in CLASS-CTEM following Clapp and Hornberger (1978), where ψ sat is the saturated soil matric potential (m). Niu and Yang (2006) parameterize fractional permeable areas in frozen soils. Following their formulation, within a grid cell the permeable (perm) and impermeable (imp) patches affect the flux of water (q; m s −1 ) as where the impermeable grid cell fraction, F imp can be estimated as and α is set to 3 following Niu and Yang (2006). Assuming q imp is set to zero, Niu and Yang parameterize the influence of the permeable areas on hydraulic conductivity can be parameterized as while the soil matric potential is calculated as, This formulation results in a soil matric potential that is insensitive to ice content within the soil (Supplement Figure 9) which seems unreasonable (see for example Wen et al., 2012). This fact is indeed noted by Ganji et al. (2015) who state that the soil matric potential as defined by Niu and Yang (2006) is not appropriate for the case of frozen soil. The inclusion of θ ice 20 in the numerator could be a typographical error. If it is removed the hydraulic conductivity and soil matric potential behave quite similarly to the original CLASS relations which make use of the factor f ice in place of 1 − F imp (Supplement Figure 10).
Testing shows the model is relatively insensitive to the small changes visible in the plots (not shown).