Novel coupled permafrost-forest model revealing the interplay between permafrost, vegetation, and climate across eastern Siberia

Boreal forests of Siberia play a relevant role in the global carbon cycle. However, global warming threatens the existence of summergreen larch-dominated ecosystems likely enabling a transition to evergreen tree taxa with deeper active layers. Complex permafrost-vegetation interactions make it uncertain whether these ecosystems could develop into a carbon source rather than continuing atmospheric carbon sequestration under global warming. Consequently, shedding light on the 15 role of current and future active-layer dynamics and the feedbacks with the apparent tree species is crucial to predict boreal forest transition dynamics, and thus for aboveground forest biomass and carbon stock developments. Hence, we established a coupled model version amalgamating a one-dimensional permafrost-multilayer forest land-surface model (CryoGrid), with LAVESI, an individual-based and spatially explicit forest model for larch species (Larix Mill.), extended for this study by including other relevant Siberian forest species and explicit terrain. 20 Following parametrization, we ran simulations with the coupled version to the near future to 2030 with a mild climate-warming scenario. We focus on three regions, covering a gradient of summergreen forests in the east at Spasskaya Pad to mixed summergreen-evergreen forests close to Nyurba, and the warmest area at Lake Khamra in the south-east of Yakutia, Russia. Coupled simulations were run with the newly implemented boreal forest species and compared to runs allowing only one species at a time, as well as to simulations using just LAVESI. Results reveal that the coupled version corrects for 25 overestimation of active-layer thickness (ALT) and soil moisture and large differences in established forests are simulated. We conclude that the coupled version can simulate the complex environment of central Siberia reproducing vegetation patterns making it an excellent tool to disentangle processes driving boreal forest dynamics.

warming is leading to the relaxation of warmth-deficit limits at the northern margins and hence invasion of the tundra at an as yet unclear rate (Berner, 2013, Reese et al., 2020. At the same time, large parts of boreal forests, especially in central Siberia, are prone to droughts and increasing disturbances (such as fires) potentially driving a forest transition from deciduous species to evergreen taxa (Bonan, 2008, Herzschuh 2020. Accordingly, the albedo and further ecosystem components/feedbacks will change and due to the large size of such boreal forests, this is likely to have a positive feedback effect on climate warming 5 (Bonan, 2008). However, the involved forest dynamics and interactions with the atmosphere and soil need to be considered in sufficient detail to forecast projections that are more realistic and to understand better the consequences for the ecosystems of Siberia (Kirpotin et al., 2021).
Forest modelling is typically done globally including the carbon cycle/permafrost etc. but individuals and all life-history stages need to be considered for a precise simulation. Modern global models such as LPJ-GUESS (Zhang et al., 2013) include 10 individual models. They are used to show that forests will change and advance north. However, migration lags are typically not represented and only climate envelopes serve for the distribution of plant functional types (PFTs). Dispersal processes and complexities have recently been recognized (Snell 2014, Snell & Cowling 2015, Lehsten et al., 2019 but are not yet used as standard for simulations. Further, most modelling schemes still start with established trees, which makes them more general and computationally effective for a global application but at the cost of losing important detail for ecosystem responses (see 15 discussion in Kruse et al., 2016). Also, the use of representative grid cells on a large grid without considering landscape will cause deviations to an extent that is unclear as to whether the impact on results is large and significant or not. Individual based models (IBMs) could help here as they have sufficient detail of represented species/ecosystems but applications are therefore only possible on landscapes not continents (Grimm & Railsback, 2005, DeAngelis & Mooij, 2005. Nevertheless, they are the best tools to understand a system and develop general responses that can then inform or guide global model development . 20 Further, neither a radiative transfer scheme through a multilayer canopy nor detailed representation of permafrost are included in typical simulation approaches.
Here, we aim at creating a model system that can accurately assess detailed thermal and hydrological fluxes between permafrost and forest cover as recently developed by Stuenzi et al. (2021aStuenzi et al. ( , 2021b. It will include a dynamic vegetation model, which has 25 a full life-cycle to allow intraspecies and interspecies interactions at all stages (seed-seedling-mature tree) leading to nonlinear behaviour of population dynamics as well as resolving a 3-dimensional landscape that is available for Siberian treeline areas developed by Kruse et al. (2016Kruse et al. ( , 2019aKruse et al. ( , 2019b. https://doi.org /10.5194/gmd-2021-304 Preprint. Discussion started: 6 October 2021 c Author(s) 2021. CC BY 4.0 License.

Description and updates
The Larix vegetation simulator (LAVESI) is an individual-based spatially explicit model that simulates larch stand dynamics (Kruse et al, 2016;Kruse et al, 2018). Monthly temperatures of the coldest (January) and warmest (July) months and 5 precipitation series can force this model. In addition, 6-hourly data on wind speed and direction are needed to simulate seed distribution and tree reproduction, growth, and death (Kruse et al., 2019b;Kruse et al., 2018;Kruse et al., 2016).

Addition of landscape sensing
Data from the digital elevation model (DEM) TanDEM-X 90 m was downloaded from the web service provided by the German Aerospace Center (DLR https://download.geoservice.dlr.de/TDM90/; Krieger et al., 2007). Subsequently, the tiles were 10 reprojected to the corresponding UTM zone of the focus areas (Khamra N49, Nyurba N50, Spasskaya N52). All tiles were merged for each subzone and resampled by linear interpolation from 90 m to 30 m resolution using functions from the "raster" package in R (Hijmans, 2020). The results were imported in SAGA GIS version 2.3.2 (Conrad et al., 2015) and subjected to a basic terrain analysis tool using the standard parameters. The resulting rasters were water masked using the cloud-based geospatial data analysis platform Google Earth Engine (GEE, Gorelick et al., 2017) to assess Sentinel-2 imagery between 1 st 15 May 2018 and 15 th October 2018 with a cloud cover of less than 20% and thresholds manually set for spectral band B12 (2190 nm) until all water was masked out by comparing them to an RGB composite image. The DEM along with slope angle and terrain water index (TWI, moisture content) were cropped to 510x510 m (260,100 m²) areas for this study and exported as plain text files for import into LAVESI.
LAVESI reads this data provided in 30 m resolution and interpolates linearly from the closest four grid cells for each 20 cm 20 grid tile of the environment grid. Based on empirical relationships of forest presence for combinations of slope angle and TWI established in a study by Shevtsova et al. (in prep, 2021), an environment growth impact factor (Envirgrowth, 0-1) is calculated for each tile and tree diameter growth at this position is reduced accordingly (Eq. 1, Appendix A).
where is the interpolated terrain water index of the 20x20 cm² environmental grid cell i, and is the slope angle of 25 the same grid cell i.
Seed dispersal has been improved. Seeds can now only be dispersed to places which are at the same or lower elevation than the release height in the terrain. https://doi.org /10.5194/gmd-2021-304 Preprint. Discussion started: 6 October 2021 c Author(s) 2021. CC BY 4.0 License.

Addition of species and estimating leaf area index (LAI)
Further species were added to the existing model presented in Kruse et al. (2016). To add a fast forward implementation of species in LAVESI, we modified the code so that the program can be started with either one or all species in a mix simultaneously. The species are numbered (integer values), which are used internally to assess species-related variables ( Table   1, further variables in Appendix B, Table B1) when called for in the functions as necessary. Therefore, the code is independent 5 from the species and allows adding species or functional types simply by adding a new line in the new specieslist.csv in the main folder of LAVESI.
For this study, we analysed field data from the Chukotka and central Yakutia 2018 expedition in the same way as we did for Chukotka (Biskaborn et al., 2019, Kruse et al., 2019a. In the area of central Yakutia, species belonging to the Pinaceae family 10 form the forests. From these, two deciduous boreal forest tree species were sampled, Larix cajanderi Mayr. (LACA) and Larix gmelinii (Rupr.) Rupr., (LAGM), and three evergreen species, Picea obovata Ledeb. (PIOB), Pinus sibirica Du Tour (PISI), and Pinus sylvestris L. (PISY) (Kruse et al., 2019a). While the two larch species are best adjusted to the harsh environment of Northeast Siberia, and are able to grow on shallow active layers above permafrost, they differ mainly in their frost hardiness and the species LACA can even endure colder temperatures in winter (Table 1). PIOB is a competitor for L. gmelinii growing 15 at similar environmental conditions, however preferring deeper thawed active layers of minimum of 200 cm. On well-drained sites, PISY grows well and outcompetes the other species. In milder environments, LASI and PISI grow on similar sites as LAGM and PIOB.
Tree-ring width data were established from tree discs and cores collected from sites close to Lake Khamra and from the region 20 Nyurba. The discs and tree cores were prepared by standard dendroecological processing steps: (1) sanding with progressively finer paper until tree rings are clearly visible, (2) making high-resolution images for a track with a binocular and attached camera, (3) detecting rings with CooRecorder (Cybis Elektronik & Data AB) and cross-dating, and (4) exporting individual tree-ring chronologies (more details in Kruse et al., 2020). Tree-ring width data per species were then imported to R using the dplR package (Bunn et al., 2020) and regression models were set up with the gnls-function from the nlme package (Pinheiro 25 et al., 2019). For each species, we extracted the median of the loess-smoothed (span=1.5) yearly growth increase of individual trees and set up a generalized least squares regression using a nonlinear model. This was successful for LACA, LAGM, and PIOB, but not for PISI and PISY due to small sample sizes, where current values of PIOB are used as a first estimate (Table   1). For each tree in the simulation, the maximum actual growth can be estimated with the following equation.
where TRW is the tree-ring width for species i at one year depending on the fitted parameters gdbasalconst, gdbasalfac, and gdbasalfacq. Biomass data were prepared following the protocol of Shevtsova et al. (2020) and allometric relationships were established to empirically estimate the leaf area (LA) from total leaf biomass for each tree (Eq. 3), followed by a log-log linear regression forced to pass through the origin employing the basal diameter as explanatory variable (Eq. 4). To estimate the LA for each tree, we used specific leaf area (SLA) parameters to translate from the dry weight of needles to leaf area (Eq. 3). For each 5 species, the SLA was extracted from literature values: SLALAGM = 120 cm² g -1 (Xian-kui et al., 2015), which was also used for the closely related sister species LACA, SLAPIOB = 50 cm² g -1 (Konôpková et al., 2020), SLAPISY = 50 cm² g -1 (extracted from the most recent source Błasiak et al., 2021, although other values are reported, 34 cm² g -1 in Reich et al., 1998, 40 cm² g -1 in Mencuccini & Bonosi, 2001. For PISI no source for SLA values was found and we assume it is similar to PISY.
where BM is the biomass of tree i in g, and SLA is the specific leaf area for species j.
where a is the slope of the linear model fit and DB is the basal diameter of the tree i. for species j 15 During simulations runs with LAVESI, the LA for each individual tree is estimated based on the fitted linear regression model using the following equation.
The leaf area index (LAI) of each CryoGrid 10x10 m grid cell in LAVESI is then the sum of leaf area values of present trees. 20 When a tree crown area covers more than one cell, the value is distance-weighted on the closest grid cells. For each species, the crown radius is estimated from field data with a log-log linear regression and the slope and y-intercept are used in LAVESI, parameters crownradiusestslope and crownradiusestinterc, respectively (Table 1).

Addition of a dynamic litter layer and estimating active-layer thickness (ALT)
A dynamic, growing litter layer with constant growth of 0.5 cm yr -1 and stochastic disturbance effects was introduced in the 25 Environmentupdate-function of LAVESI. When the parameter "litterlayer" is switched on, each of the 20 cm grid cells have a chance that the litterlayerheight can be reduced. This is stochastically implemented and for each year there is a 10% chance the litter layer is reduced by 10%, a 9% chance of a 25% reduction, a 0.9% chance of 50%, a 0.09% chance of 90%, and a 0.01% chance of a 99% reduction. This leads to a litter layer of ~15 cm in the areas of interest in simulation runs, as is observed in the region of interest (Kruse et al., 2019a). With this functionality, locally acting insulation effects are included in the 30 estimation of the actual ALT in one environment grid cell. The estimation of the maximum active-layer thickness was already https://doi.org/10.5194/gmd-2021-304 Preprint. Discussion started: 6 October 2021 c Author(s) 2021. CC BY 4.0 License.
introduced in the original setup of LAVESI (Kruse et al., 2016) and still serves as a first estimate thus making it possible to run a stand-alone simulation of LAVESI without coupling to the CryoGrid (see below).

Parameterization and validation
We compared results from simulations until year 2015 with field inventories from the 2018 expedition, focusing on the following key regions: Lake Khamra (westernmost, warmest), Nyurba (intermediate, climate station), and Spasskaya Pad 5 (easternmost for boreal forests of Yakutia) for which we used literature values for comparison. Values were in the range of expected results.

Description and updates
The model used to simulate the thermo-hydrological interactions between permafrost ground and the forest canopy is based 10 on CryoGrid (originally described in Westermann et al. 2016). CryoGrid is a one-dimensional, numerical land surface model that simulates the thermo-hydrological regime of permafrost ground by numerically solving the heat-conduction equation. The This entire model setup has previously been extensively validated for different study sites throughout our study region, 20 including Nyurba (63.08°N, 117.99°E), Spasskaya (62.14°N, 129.37°E), and Ilirney (67.40°N, 168.37°E) (Stuenzi et al. 2021a and 2021b). Validation exercises were carried out based on measured and modelled ground surface temperature (GST), activelayer thickness (ALT), soil moisture, bowen ratio, and short-and longwave radiation below and above the canopy. Parameters defining the canopy, snow, and soil properties were set according to literature values, model documentation, and own measurements (see Stuenzi et al. 2021b for constants and multilayer canopy parameter choices). Table 2 summarizes the 25 parameter choices for the three different sites. Table C1 summarizes the commonly used CryoGrid parameters.

Coupling the models
The coupled model set-up benefits from the detailed process implementation gained while developing the individual models and brings the 1D to a landscape simulation. Therefore, we can reproduce the energy transfer and thermal regime in permafrost ground as well as the radiation budget, nitrogen and photosynthetic profiles, canopy turbulence, and leaf fluxes, while at the 30 https://doi.org/10.5194/gmd-2021-304 Preprint. Discussion started: 6 October 2021 c Author(s) 2021. CC BY 4.0 License. same time predicting the expected establishment, die-off, and treeline movements of larch forests (Fig. 2). In our analyses, we focus on vegetation and permafrost dynamics and reveal the magnitudes of different feedback processes between permafrost, vegetation, and current and future climate in Siberia.
LAVESI serves as host model and can now be set to call individual CryoGrid instances in a given year. For this, the data in 5 LAVESI are aggregated on a 10x10 m grid superimposed on the 20x20 cm grid. Key state variables are leaf area index (LAI), plant area index (PAI), fraction of deciduous species, litter layer height, organic layer height, albedo, and the soil humidity in percent (= plant available water, PAW), which are provided to CryoGrid. These values can either be sorted by LAI and exported for 5 quartiles (implemented but not used here) or from the three areas that are equal slices from left to right (used here, see Appendix A1-A3). When the output file is created, LAVESI can be set to either start CryoGrid directly via a system call or 10 scheduling the instance with a bash file for the workload manager slurm (Yoo et al., 2003). Based on the key state variables provided by LAVESI for each of the areas, CryoGrid starts three (or five) parallel simulations. Once the output has been written, LAVESI reads the file and produces for the three levels anomalies for available soil water and active-layer thickness.
With these anomalies, the 10x10 m CryoGrid-grid in LAVESI is filled and from this, the anomalies used to calculate the new values for each 20x20 cm environment grid cell. When the quartile-mode is set, the state values are assigned to this grid 15 calculated by linear interpolation of the LAI-sorted state values and anomalies are calculated as in the other mode. The multilayer canopy model in CryoGrid requires a minimum LAI of 0.7 m 2 m -2 and a minimum height of 1 m to successfully build the radiative transfer scheme from the atmosphere to the ground, therefore forest covers below these values are ignored.

Forcing data and landscape of focus areas
The meteorological forcing data required by the multilayer canopy-permafrost model (air temperature, air pressure, wind 20 speed, relative humidity, solid and liquid precipitation, incoming long-and shortwave radiation, and cloud cover) are obtained from ERA-5 (ECMWF Reanalysis, Hersbach et al., 2018) 1).
To provide a millennia-long time series for model spin-up of LAVESI these series were matched to historical climate data for 25 the forcing retrieved from the 0.5°x0.5° Climate Research Unit gridded Time Series (CRU TS version 3.23) monthly data   (Harris et al., 2020). By repeating the 20 th century data in a loop, a 2100-year long monthly climate series was established from 1 to 2100 CE for each focus region using the RCP 2.6 prediction scenario.

Simulation experiments
We forced LAVESI simulations with the RCP 2.6 climate scenario calling CryoGrid first in 2015 and yearly in the following 30 years, letting the simulation run until the year 2030. Simulation runs were started with the updated LAVESI version on an empty landscape with true topography starting at 1 CE to allow for spin up and ending in 2100 CE. Into the empty landscape, https://doi.org/10.5194/gmd-2021-304 Preprint. Discussion started: 6 October 2021 c Author(s) 2021. CC BY 4.0 License. seeds (5000 ha -1 yr -1 ) for initiating population establishment were introduced for the first 50 years. Subsequently, only 100 ha -1 yr -1 seeds were introduced to allow for re-establishment after a complete die-out of trees on the whole simulation area. Each simulation was rerun without calling CryoGrid to compare the differences when the improved active-layer thickness and available soil water is used.
In addition to the simulation that uses equal proportions of seeds of each species introduced into the simulation area, we started 5 individual simulations for each single species.

Statistical analyses
All statistical analyses in this study were performed in R 3.6.1 (R Core Team, 2019), mostly using included standard functions, with the addition of functions from the package "lattice" (Sarkar, 2008) for plotting the data.

Comparing simulations with LAVESI and the coupled version
The values are very similar for the runs with all and individual species. In nearly all years, LAVESI overestimates ALT by up to 20 cm (mean over all is 109.6±11.4 cm versus 96.1±10.2 cm, which is ~14.1%) at all focus regions (Fig. 3). The soil humidity anomaly fluctuates around 0% at Lake Khamra, is overestimated for Nyurba by ~10%, and Spasskaya Pad by ~20% (Fig. 4). Both are corrected in the coupled version of LAVESI-CryoGrid. 15 A gradient of LAI forms, which follows the TWI gradient on all sites (I: left, driest to III: right, wettest, Fig

Discussion
The simulations using site information for the three focus sites yield dense tree stands in LAVESI simulations but not in the coupled version. The coupled model results in smaller key soil parameters, active-layer thickness (ALT), and plant available water (PAW). PAW has a strong impact as trees grow poorly in conditions exceeding drought and waterlogging thresholds (e.g. Liang et al., 2014, Mamet et al., 2019, Lawrence et al., 2015, Barber et al., 2000. Drought leads to a higher mortality of 5 trees and in consequence, population simulations are driven close to extinction within the simulation duration of 15 years.
However, when drought-adapted Pinus sylvestris occupies the niche, there is nearly no change and it could, in the end, colonize the simulation areas. This implies that the model is reproducing the natural dynamics well.
Species preference matches observations and expectations (Kuznetsova et al., 2010). Larches have a wide ecological niche 10 and are widespread (Mamet et al., 2019). They are generalists and best adapted to the harsh Siberian environments that were predominantly wet but are now become drier with global warming. Picea obovata grows best in the westernmost, warm areas and reaches larger LAI/biomass than when growing in mixed stands competing with other species. This could be attributed to (intraspecific) competition, which, as Wieczorek et al. (2017) shows, seems to be a strong factor dampening the response of tree stands when climatic conditions improve. 15 As fires become more intense and frequent under global warming, spruce or other species may become dominant rather than shade-intolerant larch species. In the currently naturally deciduous, larch covered areas, evergreen taxa may invade and change the heat fluxes and energy balance with the threat of entering a positive feedback loop such as a deepening of the ALT (Bonan, 2008, Stocker et al., 2013, Stuenzi et al., 2021b. 20 In general, technical issues arise when coupling models and implementing I/O indirectly via output files. The two different time-steps (years in LAVESI vs. 5 minutes in CryoGrid) and computational speeds lead to long computation times and high requirements of computational resources of the coupled version. To avoid any delay, a parallelization of CryoGrid simulations, as implemented here, is highly recommended, especially for dry study sites where a simulated year in CryoGrid can take up 25 to 4 hours. We find that simulations of homogeneous areas perform best and especially that the exchange is set up using three to five instances sorted by LAI. The constraint lies here and more instances would improve the representation of variants of deciduous/evergreen covered plots but these improved LAVESI simulations come at the cost of computation time when not using the parallelized version as developed for this study.

Conclusions 30
The as-is application of LAVESI overestimates ALT values by around 14% therefore we advise using the implemented correction from CryoGrid for forecasting forest dynamics in the proposed coupled version. The 3D simulations provide a way https://doi.org /10.5194/gmd-2021-304 Preprint. Discussion started: 6 October 2021 c Author(s) 2021. CC BY 4.0 License.
to understand permafrost distribution and interactions with vegetation. Further implementations, tracked online in the github repository of LAVESI, include spatially explicit fire and trait variation and adaptation. This public sharing of the source code plus advances in both models allows the easy exchange, development, and adaptation to further regions. This and the simple set-up make the coupled model version easy to implement and thus offers a wide applicability. However, fieldwork or literature values from remote areas are necessary to adjust parameters and adapt the model and species to local site conditions, which is 5 an issue for the vast remote areas in Siberia. With increasing data from these remote areas, such as better satellite imagery coverage and resolution, the collection of more detailed field data (loggers recording soil temperatures and moisture in the active layer), and monitoring of permafrost dynamics and tree growth, the drivers of forest dynamics may be disentangled and thus improve the model.

Acknowledgements 10
We are grateful for the support of Sven N Willner in parallelizing LAVESI. The data and experience gained on the successful Expedition Chukotka and Central Yakutia in 2018 was the basis for the developments and we thank especially Luidmila A Pestryakova for her support and leadership of the expedition, but all participants involved in the expedition. We thank Luca Farcas and Christopher Guth for their support in preparing samples and establishing tree-ring chronologies, and Ingo Heinrich from the GFZ dendro lab for their support of the dendroecological analyses. We thank Iuliia Shevtsova for supporting the 15 organization and analyses of the biomass data. We thank especially the HPC support at AWI mainly Natalya Rakowski and Malte Thoma for their assistant in the simulation setup.
This study has been supported by the ERC consolidator grant Glacial Legacy to Ulrike Herzschuh (no. 772852). Further, the work was supported by the Federal Ministry of Education and Research (BMBF) of Germany through a grant to Moritz Langer (no. 01LN1709A). Funding was additionally provided by the Helmholtz Association in the framework of MOSES (Modular 20 Observation Solutions for Earth Systems).

Data availability
The CryoGrid code is available on Zenodo (https://doi.org/10.5281/zenodo.5119987). LAVESI is publicly available on GitHub at https://github.com/StefanKruse/LAVESI the branch used for this study is https://github.com/StefanKruse/LAVESI/tree/CryoGrid_multispecies and the commit used for the simulations for this study is 25 93a9767 and the final commit will be permanently stored on Zenodo upon acceptance of the manuscript.
Biomass data and tree-ring growth information for the species present at the ROI is available upon request and will be stored publicly upon acceptance of the manuscript to PANGAEA https://www.pangaea.de/ https://doi.org/10.5194/gmd-2021-304 Preprint. Discussion started: 6 October 2021 c Author(s) 2021. CC BY 4.0 License.

Author contributions
Stefan Kruse SK and Ulrike Herzschuh UH developed the idea of coupling LAVESI with CryoGrid. SK and Simone Stuenzi SMS set up the coupled version and maintained the simulation runs together. Josias Gloy JG developed the upgrade to implement multiple species efficiently in LAVESI, SK realized the implementation of the LAVESI related programming. SK analysed the data and drafted the first version of the manuscript, which was jointly edited by all authors. 5 The authors declare no conflict of interest.