TIMBER v0.1: a conceptual framework for emulating temperature responses to tree cover change

. Land cover changes have been proposed to play a significant role, alongside emission reductions, towards achieving the temperature goals agreed upon under the Paris Agreement. Such changes carry both global implications, pertaining to the biogeochemical effects of land cover change and thus the global carbon budget, and regional/local implications, pertaining to the biogeophysical effects arising within the immediate area of land cover change. Biogeophysical effects of land cover change are of high relevance to national policy-and decision-makers and accounting for them is essential towards effective 5 deployment of land cover practices that optimises between global and regional impacts. To this end, Earth System Model (ESM) outputs that isolate the biogeophysical responses of climate to land cover changes are key in informing impact assessments and supporting scenario development exercises. However, generating multiple such ESM outputs in a manner that allows comprehensive exploration of all plausible land cover scenarios is computationally untenable. This study proposes a framework to explore in an agile manner the local biogeophysical responses of climate under customised tree cover change 10 scenarios by means of a computationally inexpensive emulator, TIMBER v0.1. The emulator is novel in that it solely represents the biogeophysical responses of climate to tree cover changes, and can be used as either a standalone device or supplementary to existing climate model emulators that represent the climate responses from greenhouse gas (GHG) or Global Mean Temperature (GMT) forcings. We start off by modelling local minimum, mean and maximum surface temperature responses to tree cover changes by means of a month-and Earth System Model (ESM)- specific Generalised Additive Model (GAM) 15 trained over the whole globe. 2-m air temperature responses are then diagnosed from the modelled minimum and maximum surface temperature responses using observationally derived relationships. Such a two-step procedure accounts for the different physical representations of surface temperature responses to tree cover changes under different ESMs, whilst respecting a definition of 2-m air temperature that is more consistent across ESMs and with observational datasets. In exploring new tree cover

Abstract. Land cover changes have been proposed to play a significant role, alongside emission reductions, in achieving the temperature goals agreed upon under the Paris Agreement. Such changes carry both global implications, pertaining to the biogeochemical effects of land cover change and thus the global carbon budget, and regional or local implications, pertaining to the biogeophysical effects arising within the immediate area of land cover change. Biogeophysical effects of land cover change are of high relevance to national policy and decision makers, and accounting for them is essential for effective deployment of land cover practices that optimise between global and regional impacts. To this end, Earth system model (ESM) outputs that isolate the biogeophysical responses of climate to land cover changes are key in informing impact assessments and supporting scenario development exercises. However, generating multiple such ESM outputs in a manner that allows comprehensive exploration of all plausible land cover scenarios is computationally untenable. This study proposes a framework to explore in an agile manner the local biogeophysical responses of climate under customised tree cover change scenarios by means of a computationally inexpensive emulator, the Tree cover change clIMate Biophysical responses EmulatoR (TIMBER) v0.1. The emulator is novel in that it solely represents the biogeophysical responses of climate to tree cover changes, and it can be used as either a standalone device or as a supplement to existing climate model emulators that represent the climate responses from greenhouse gas (GHG) or global mean temperature (GMT) forcings. We start off by modelling local minimum, mean, and maximum surface temperature responses to tree cover changes by means of a monthand Earth system model (ESM)-specific generalised additive model (GAM) trained over the whole globe; 2 m air temperature responses are then diagnosed from the modelled minimum and maximum surface temperature responses using observationally derived relationships. Such a two-step procedure accounts for the different physical representations of surface temperature responses to tree cover changes under different ESMs whilst respecting a definition of 2 m air temperature that is more consistent across ESMs and with observational datasets. In exploring new tree cover change scenarios, we employ a parametric bootstrap sampling method to generate multiple possible temperature responses, such that 1 Introduction Following the Paris Agreement in 2015, 42 % of nationally determined contributions (NDCs) submitted by countries included afforestation-or reforestation-based actions and targets (Seddon et al., 2020). The recent COP26 in Glasgow furthermore saw a pledge to halt and reverse deforestation by 2030 (COP, 2021). Considering this, society is set to experience notable land cover changes in hopes of achieving global warming levels well below + 2 • C and to pursue efforts in limiting them to +1.5 • C above pre-industrial levels. In anticipation of this, the Earth system model (ESM) community has put great effort into understanding and quantifying the biogeochemical and biogeophysical effects of land cover changes Lawrence et al., 2016;Davin et al., 2020;Boysen et al., 2020).
Biogeochemical effects of land cover changes largely affect the global carbon budget, while biogeophysical effects are essential in understanding regional climate impacts as well as extremes Pitman et al., 2012;Lejeune et al., 2018). Recent studies by Windisch et al. (2021) and Lawrence et al. (2022) highlighted the need to consider the biogeophysical effects of land cover changes in order to effectively identify and prioritise areas for reforestation or afforestation and conservation. Such underscores the regional importance of the biogeophysical effects of land cover changes under future climate scenarios Hirsch et al., 2018) and evidences the need to consider them within impact assessments  and scenario development exercises (Van Vuuren et al., 2012;Calvin and Bond-Lamberty, 2018). Exploring the biogeophysical effects of land cover changes under all possible future land cover scenarios solely through ESMs, however, quickly becomes untenable due to computational costs, and it is worth pursuing computationally inexpensive alternatives such as climate model emulators.
Climate model emulators are computationally inexpensive tools, trained on available climate model runs to then render probability distributions of key climate variables for runs that have not been generated yet. By statistically representing select climate variables, emulators are able to reduce the dimensionality of climate model outputs, allowing for agile ex-ploration of the uncertainty phase space surrounding climate projections. Climate model emulators designed to reproduce regional or grid-point-level, annual to monthly temperature projections usually operate as ESM specific and start by deterministically representing the regional or grid-point-level mean response of temperatures to a certain forcing, after which the residual variability -treated as the uncertainty due to natural climate variability -is sampled or stochastically generated (Alexeeff et al., 2018;McKinnon and Deser, 2018;Link et al., 2019;Castruccio et al., 2019;Beusch et al., 2020;Nath et al., 2022b). Outputs of such emulators act as approximations of multi-model initial-condition ensembles, providing distributions of temperature responses to the forcing of choice for impact assessments. To date, however, such climate model emulators mainly represent the greenhouse gas (GHG) or global mean temperature (GMT) forcing within their mean response, neglecting the biogeophysical effects of land cover changes.
In this study, we set up a conceptual framework for emulating the biogeophysical responses of climate variables to land cover changes, hereafter simply referred to as responses. As a first step, we focus on emulating the surface and 2 m air temperature responses to land cover changes between forest and cropland, simply denoted as tree cover changes. The resulting emulator constitutes a prototype version of the Tree cover change clIMate Biophysical responses EmulatoR, i.e. TIMBER v0.1. Since representation of natural climate variability is well-explored in other emulators, TIMBER v0.1 purely focusses on representing the mean response of temperatures to tree cover change. In doing so, we recognise that the ESM data available for training (described under Sect. 2) are under-representative of the full range of possible tree cover changes across the globe. Consequently, we pursue a more probabilistic representation, such that parametric uncertainties given the training data population are accounted for. TIMBER v0.1 can thus be used as a standalone device or as a supplement to other emulators. The structure of this paper is as follows: Sect. 3 introduces the emulator framework and its calibration and evaluation procedure, Sect. 4 presents the calibration and evaluation results and illustrates some emulator outputs, Sect. 4.4 demonstrates the application of the emulator to different Shared Socioeconomic Pathway (SSP) scenarios, and Sect. 5 wraps up with the conclusion and outlook.

ESM experiments
Idealised Earth system model (ESM) experiments that isolate the effects of tree cover change on the climate were run as part of the LAnd MAnagement for CLImate Mitigation and Adaptation (LAMACLIMA) project; a detailed description of these simulations can be found in (De Hertog et al., 2022). The experimental setup was designed to capture the maximal potential climate response due to afforestation, reforestation, or deforestation as compared to present-day land cover conditions. Accordingly, extreme afforestation (AFF) and deforestation (DEF) scenarios were run alongside a reference scenario (REF). The REF scenario spans 150 years, with land cover conditions and other forcings (GHG emissions, etc.) kept constant at 2015 levels. The AFF (DEF) scenario then consists of a full expansion of forest (crop) cover relative to that of 2015 levels, with all other forcings again kept constant at 2015 levels, and again spans 160 years, with a 10-year spin-up period, which is excluded. The AFF (DEF) was implemented by removing the non-required vegetation types (i.e. crops, grassland, and shrubs for AFF and forest, grasslands, and shrubs for DEF) and upscaling the remaining vegetation to fill up the grid cells. Bare land was conserved throughout this process in order to respect the biophysical limits of where vegetation can grow. The difference between the AFF (DEF) run and the REF run outputs averaged over the 150 years provides the climate response to idealised reforestation or afforestation (deforestation).
Temperature responses derived from the ESM simulations are distinguished into local and non-local responses following the checkerboard approach developed by Winckler et al. (2017c). Local responses represent the expected climate responses to land cover change within the immediate area of change and can be applied in any global tree cover change scenario, whereas non-local responses represent remote effects of land cover change and depend on the global extent and patterns of land cover change. Given that local responses are independent of the global extent and patterns of land cover change, we focus only on them for the rest of this study, and the term response exclusively refers to the local response hereon. Participating ESMs running simulations within the LAMACLIMA project are the Community Earth System Model version 2.1.3 (CESM2), the Max Planck Institute Earth System Model version 1.2 (MPI-ESM), and the European Community Earth System Model version 3-Veg (EC-EARTH).

Observational dataset
To demonstrate the applicability of the emulating approach outlined in this study on observational data, we use the Duveiller et al. (2018c) dataset, hereafter referred to as D18. This dataset was derived using a space-for-time substitution approach applied to surface temperatures from satellite data in order to map potential local responses of daytime, mean, and nighttime surface temperatures to land cover transitions. It considers transitions from forest to several other land cover types (e.g. shrubland or grassland). To ensure comparability with the ESM runs, we choose to only focus on forest transitions to cropland, which are hereafter by analogy also referred to as DEF. It should be noted that we do not emulate the temperature response to afforestation in this case since the D18 dataset assumes a symmetrical temperature response for transitions from cropland to forests. Additionally, the dataset contains some information gaps in space and is thus spatially sparse as compared to the spatially complete ESM output fields.

Tree cover change scenarios in selected SSPs
The emulator framework developed in this study enables us to predict the expected local temperature changes that would be given by the dataset it is trained on (being derived from models or observations) in response to any scenario of spatially explicit tree cover changes. We apply it to scenarios of tree cover changes according to the Shared Socioeconomic Pathways SSP1-2.6 and SSP3-7.0 . SSP1-2.6 follows the narrative of a global trend towards sustainable development from SSP1  and entails changes in global emissions and further climate forcings that eventually lead to a radiative forcing of 2.6 W m −2 in 2100. Strong land use regulations mean that tropical deforestation is reduced, while economic development enables increases in crop yields, and the focus on sustainability entails less food waste and a reduction in the consumption of animal products . Overall, this leads to an increase in forest cover in many parts of the world. In contrast, SSP3-7.0 follows the SSP3 narrative and leads to a radiative forcing of 7.0 W m −2 in 2100. SSP3 features a world in which there is a resurgence of nationalism and regional conflicts that translate into a stronger focus on domestic and regional issues and low international cooperation, particularly on environmental issues. Land use is thus not well regulated, low economic development and reduced technology transfer mean that crop yields stagnate or decline, while diets with high shares of animal products and high rates of food waste prevail. As a result, deforestation continues, especially in the tropics.
In this study, we use the trajectories of tree cover changes according to SSP 1-2.6 and SSP 3-7.0 as modelled by the integrated assessment models IMAGE and MESSAGE-GLOBIOM, respectively Fujimori et al., 2017). Tree fraction maps are obtained as the CMIP6 variable "treeFrac" from the CMIP6 new-generation library hosted by ETH Zurich (Brunner et al., 2020).

Overview of the emulation approach
The emulation framework presented in this study aims at predicting local temperature responses to tree cover changes and is split into three parts. The first part seeks to statistically represent the expected responses of minimum ( T S min m,s ), mean ( T S mean m,s ), and maximum ( T S max m,s ) surface temperature for a given month m and location s, generically referred to as T S m,s , to tree cover change (Sect. 3.2). This is carried 4286 S. Nath et al.: TIMBER v0.1 out using a generalised additive model (GAM) that is calibrated via a blocked-cross-validation procedure in order to account for the specificity of the training data. The predictive ability of the GAM is also evaluated using a blocked-crossvalidation procedure.
The second part then seeks to diagnose 2 m air temperature responses ( T 2 m m,s ) from the statistically represented surface temperature responses using observationally derived relationships (Sect. 3.3). T 2 m m,s is an important variable for impact assessments; however, it is diagnosed differently across ESMs (leading to inter-ESM discrepancies in their modelled response to tree cover change) and is also defined differently between ESMs and observations. The split approach suggested in this study therefore allows us to maintain a response of surface temperature to tree cover change that is specific to the ESM or observational data trained on, from which T 2 m m,s is then diagnosed using observationally derived relationships independent of training data. In such, we account for the different physical representations of temperature responses to tree cover change for each ESM whilst also ensuring a consistent definition of T 2 m m,s across ESMs and with observational datasets, ergo the possibility to compare them.
The third part aims to quantify the uncertainty in the final T 2 m m,s predictions that arise from the parametric uncertainties within the GAM (Sect. 3.4). The GAM's parametric uncertainty is assessed using a parametric bootstrap procedure (Hastie and Tibshirani, 1986;Wood, 2017) so as to evaluate the imperfections within its fitted parameters conditional on the given training sample population. Given the limited amount of training data available, this is an important step towards quantifying the confidence in the temperature response predictions from TIMBER.

Model description
We model the expected T S m,s conditional on tree cover change and geographical attributes using a month-specific generalised additive model (GAM) trained over the whole globe. The GAM, hereon referred to as min/mean/max m -depending on whether it is applied to daily minimum, maximum, or mean surface temperature -or more generically as m , is provided by the Python pyGAM package. m can easily ingest multidimensional data and has the advantage that it does not prescribe any functional form, allowing flexibility in representing linear to more complex response types. The input predictor matrix (X) given to m is composed of tree cover changes relative to the 2015 ( 2015 treeFrac) and the geographical attributes of longitude (long), latitude (lat) and orography (orog). Maps of treeFrac 2015 implemented under the AFF and DEF scenarios and the orog (defined as metres above sea level) are available for reference in Figs. A1 and A2, respectively, in Appendix A. The conditional distribution of T S m,s is assumed to be normal, where te m represents a tensor spline term built across the three-dimensional 2015 treeFrac, long, lat space with coefficient terms stratified according to orog using the by operator so as to create a varying coefficient model (Hastie and Tibshirani, 1993). For further details on tensor splines and the by operator, see Wood (2017). m can be calibrated for its lambda parameter (λ), which controls the complexity of the shape of te m (where a smaller λ value allows for a more complex shape) and its number of basis functions, also noted as nbfs (where more basis functions mean more degrees of freedom).

Blocked cross-validation for model calibration
A first blocked cross-validation (Roberts et al., 2017) is conducted to find the model configuration (i.e. the set of model parameters λ and nbfs) that performs best over geographical and climate regions (steps 1-4 of Fig. 1). Block samples are constructed by identifying regions sharing climate and geographical characteristics. K-means clustering is used to cluster grid points according to background climate (based on climatological values of temperature and relative humidity) in the REF simulation of each ESM and in historical climatological data from WorldClim v2 (https://www.worldclim.org/ data/worldclim21.html, last access: 6 June 2022) when further calibrating on the D18 observational data. To select the optimal number of clusters, we calculate the improvement in the performance of the K-means-clustering algorithm (measured as the average distance of all points from the centre of their respective cluster groups, with a smaller distance indicating better performance) with increasing number of clusters, then select the number of clusters after which no further improvement in performance is observed. Grid points are subsequently split according to continuous geographical regions: Africa, North America, South America, Australia, Eurasia, Tibetan Plateau, and the Southeast Asian Islands. The composite cluster blocks obtained through this procedure are illustrated in the upper-right corner of Fig. 1 (for example, ESM, CESM2) and in Fig. A1 in the Appendix. Cross-validation is then performed using the composite blocks identified in both the climate and geographical space. Successively and for each block, m is fitted to data for the whole land area except over that block. At each iteration, λ values between 0.001 and 1, as well as a number of basis functions between 5 and 9, are tried out, representing a possible model configuration. For each block, the performance of each model configuration is evaluated by calculating the RM-SEs, its predictions, and the actual ESM or observational data over that block. By doing so, we hope to nudge the λ parameter and the number of basis functions to values that most flexibly apply across all possible geographical and climate conditions whilst ensuring independence between training and test sets by accounting for spatial correlations. Eventually, crossvalidation is carried out across all train-test splits such that each block is used for testing once, and the set of model parameters yielding the best performance for m as measured by the RMSE across all test sets is selected. The parameters of these model configurations and their performance are shown in Sect. 4.1.1.

Blocked cross-validations for model evaluation
Having selected the optimal λ value and nbfs configuration for m , a final training on the whole set of training data is conducted to obtain the fully calibrated m . Blocked crossvalidation is further employed to evaluate the calibrated m 's performance into no-analogue conditions where the model has the least information (Roberts et al., 2017), thus providing a representative idea of the model's ability to predict new tree cover change scenarios unseen during calibration. It is mainly required that the model is able to predict well across different background climates, as well as for different amounts of tree cover change; therefore, its performance is evaluated separately in no-analogue conditions representative of each of these aspects.
First, since m was originally calibrated by creating blocks that considered both climate and geographical space, the performance into no-analogue background climates is assessed by re-using those same blocks. Successively and for each block, the best-performing configuration of m identified during calibration is trained on data for the whole land area except that block. The RMSEs between the values predicted by m and the actual values in the ESM or observational data over that block are then calculated. The results of this procedure are described in Sect. 4.1.2.
Then, another set of blocks is constructed by splitting the same seven continuous geographical regions as in the previous section but by dividing the grid cells constituting those according to the amount of tree cover change 2015  A similar procedure to that applied for the no-analogue background climate conditions is then conducted but using these newly constructed blocks: successively and for each block, m is trained on data for the whole land area except over that block using the sets of parameters identified in Sect. 3.2.2. For each block, the RMSEs between the values predicted by m and the actual ESM or observational data are then calculated. They constitute an estimate of the predictive ability of m for tree cover change amounts unseen during training and are presented in Sect. 4.1.3. values so as to obtain grid-point-level coefficients specific to geographical and/or background climate conditions. By taking a stacked generalisation of the GWR and CSWR outputs, Hooker et al. (2018) were able to reconstruct global T 2 m m,s maps over the period 2003 to 2016 in a geographically and climatically consistent manner.

Diagnosing the 2 m air temperature response from changes in surface temperatures
In this study, we use a model adapted from Hooker et al. (2018) to diagnose T 2 m m,s from surface temperatures. Ideally, the Hooker et al. (2018) model would be refitted to derive ESM-specific coefficients between ESM surface temperatures and observed T 2 m m,s data. Given that this study primarily focusses on setting up a conceptual framework, however, we choose to directly apply the original coefficients derived by Hooker et al. (2018) as an initial proof of concept. Before applying the Hooker et al. (2018) model, we first make some modifications to it so as to enable a smooth translation between observed and ESM spaces. In the following subsections, we introduce the modifications made to the Hooker et al. (2018) model and furthermore outline some tests performed to check that the modified version of it applied to assuming that the effects of land cover type are minimal on β GWR 0,m,s , we then get where β GWR 0,m,s , β GWR 1,m,s , and β GWR 0,m,s are coefficient terms obtained from GWR. We choose not to use the CSWR coefficient terms as background climates between observations and ESMs are not consistent and there is the additional uncertainty surrounding the evolution of CSWR coefficient terms under changing background climates. Additionally, we use T S pairs will act as pseudo-samples that represent the underlying uncertainty due to natural variability within observational data.  Given that this is less than half of the grid points, we proceed with applying the multivariate Gaussian process.

Emulating 2 m air temperature responses to tree cover changes within the SSP scenarios
By predicting the expected surface temperature responses using the calibrated m (described in Sect. 3.2.1) and subsequently diagnosing the corresponding 2 m air temperature response using Eq.
(3), we can emulate the expected 2 m air temperature response to tree cover changes over the whole land area for any land cover change scenario. In this study, we do so for two Shared Socioeconomic Pathways, SSP2 1-2.6 and SSP3-7.0, for which the underlying narratives and resulting changes in tree cover over the 21st century are presented in Sect. 2.3. We only present the results for changes in tree cover between 2015 and the end of the century (mean changes between 2015 and 2100).
In arriving at the final 2 m air temperature response emulations, we are mindful of the limited training data available for constructing m . To account for this, we assess the underlying signal-to-noise ratio in the emulations by considering noise as the parametric uncertainties within m conditional on the training sample population. The noise in emulations arising from the parametric uncertainties within m is evaluated using a parametric bootstrap procedure (Hastie and Tibshirani, 1986;Wood, 2017). In the following sections, we outline the parametric bootstrap procedure used, followed by how its results allow for evaluation of the signal-to-noise ratio in the final 2 m air temperature response emulations.

Estimating parametric uncertainty in the predicted temperature responses
We quantify the impact of parametric uncertainties within m on the T S m,s predictions following a parametric bootstrap method as outlined in Fig. 2 (Wood, 2017;Efron and Tibshirani, 1993). Parametric bootstrapping constitutes first approximating the joint distribution of the coefficients (β) and λ parameter used within m , conditional on the training data available, i.e. f (β, λ|X) (step 1, Fig. 2), from which β values are then sampled to estimate surface temperature responses (step 2, Fig. 2). To avoid high computational costs, the joint distribution is approximated by first bootstrap sampling the distribution of λ conditional on the training material, i.e., f λ (λ) (steps 1a-1b, Fig. 2), from which the distribution of β, conditional on both λ and the training material, is constructed over the whole f λ (λ) space (step 1c, Fig. 2), such that f (β, λ|X) ≈ f (β|λ, X) · f λ (λ). Surface temperature response values are then sampled by drawing β distributions from random parts of the f λ (λ) space (step 2a, Fig. 2) and sampling coefficient values from them (step 2b, Fig. 2), which are then used to estimate T S m,s values (step 2c, Fig. 2). Figure 2. Sampling routine of the generalised additive model. First, an approximation of the coefficients' (β) and λ parameter's joint distribution given the available training data is constructed (step 1), from which coefficient terms are sampled to calculate T S mean m,s values with step 2. Steps 1a-1b construct the sampling distribution of the λ parameter (f λ (λ)) given the known variability in the training data, and step 1c then constructs the distribution of β conditional on the training data and λ parameter at each point of the f λ (λ) space. As such, the T S mean m,s values calculated in step 2 account for the uncertainty in the shape of T S mean m,s responses, as modulated by β and λ values.

Evaluating signal-to-noise ratio in the predicted temperature responses
In representing temperature responses under new tree cover change scenarios, we consider the signal-to-noise ratio in the final T 2 m m,s emulations. Noise constitutes the underlying parametric uncertainty within m arising from the training sample population. We start by sampling T S min/max m,s values from min/max m globally for each relevant pixel using the parametric bootstrap procedure outlined in Sect. 3.4.1 and then diagnose T 2 m m,s for each sample. The β and λ parameter uncertainty spaces are constructed using 10 bootstraps, from which 200 samples are then drawn. We take the mean across all samples as the expected T 2 m m,s value and the standard deviation across all samples as the underlying parametric uncertainty within the GAM. The signal-to-noise ratio is then obtained as the ratio between the mean and standard deviation values. We consider emulations with a signal-tonoise ratio lower than 0.5 to be insignificant as the underlying parametric uncertainty is double the actual magnitude of the expected response. Given the computational expenses of running ESMs, such gives m the benefit of mainly requiring extreme tree cover change scenarios as training material, from which it can further explore all possible outcomes of in-between scenarios itself. It should be noted, however, that this does not remove the benefit of having more training material on top of the extreme scenarios but simply minimises the training data requirements of m .

Blocked-cross-validation results
In this section, we show the calibration and evaluation results of mean m , obtained by performing different sets of blocked cross-validation as described in Sect. 3.2.2 and 3.2.3. The calibration and evaluation results for example months of January and July, which are representative of the hottest and coldest months for the Northern Hemisphere and vice versa for the Southern Hemisphere, are shown. First, we show results from the blocked cross-validation used to calibrate m for its optimal λ parameter and number of basis functions (Sect. 4.1.1). Second, we show the results of the blocked cross-validations employed to evaluate the calibrated m 's performance into no-analogue conditions. No-analogue conditions of background climate (Sect. 4.1.2) and those of tree cover change amounts (Sect. 4.1.3) are considered specifically with a separate blocked cross-validation performed for each. The following subsections show the blocked-crossvalidation results for mean m only as this gives a representative idea of the validity of this study's framework. Blocked-crossvalidation results for min/max m are provided in Appendix C. The mean m trained on observational data has a λ parameter value of 0.001 for both January and July, which is significantly lower than that of 1 otherwise chosen for all ESMs. This could be because the observationally trained mean m only receives training data for the DEF scenario, which implements large magnitudes of tree cover change localised to specific regions (see Fig. A1 in the Appendix). Thus, lower λ parameter values are favoured to allow for complex representation with higher spatial variability. The observationally trained mean m moreover shows poor performance, with only 34 % of grid points having RMSE values of less than 0.5. This possibly arises from less training being data available for the observationally calibrated mean m (i.e. less grid points as well as only one tree cover change scenario), such that mean m cannot gain as much information to predict with. All ESMs show higher RMSEs, with a lower proportion of grid points having RMSE values < 0.5, for the DEF scenario than the AFF scenario. This could be related to the difficulty in representing the complex response types with high spatial variabilities within the DEF scenario. Such highlights a design consequence of mean m , where te m is fitted smoothly over long, lat, and 2015 treeFrac, thus falling short in representing high spatial variabilities as brought about by large magnitudes of localised tree cover change. While CESM2 and EC-EARTH show a varying number of grid points with RMSE values below 0.5 between January and July for the AFF and DEF scenarios, MPI-ESM shows similar performance across both months for the AFF and DEF scenarios. Additionally, MPI-ESM's mean m favours the simplest representation across all ESMs with the lowest number of basis functions chosen for both January and July. Such indicates a smoother response type outputted by MPI-ESM, with deforestation in the tropics not necessarily leading to significant temperature jumps within space.

Results of model calibration
Overall, mean m mostly displays RMSEs less than or equal to 0.5 for all ESMs. Higher RMSEs (> 0.5) are usually localised to regions of extreme magnitudes of deforestation for CESM2 and EC-EARTH. In the case of MPI-ESM, higher RMSEs are localised to different regions depending on the month. For example, in both the AFF and DEF scenarios, MPI-ESM shows higher RMSEs over South America and Australia in January and over southern North America and the Mediterranean region for July. In such, mean m proves itself to be a reasonably flexible framework to represent expected temperature responses to more realistic magnitudes of tree cover change. As noted in the observationally calibrated mean m , a substantial hindrance to mean m 's performance is the availability of training data, where it is recommended to have both directions (i.e. positive and negative) of tree cover changes available for training.  For observations and ESMs, the spread in RMSEs displays a month dependency across all latitudinal bands, evidencing the seasonality in T S mean m,s responses to tree cover change and also the need for prior background climate information being more important for certain months than others. Despite the spread in RMSEs being large, median values are mostly below 0.5 for ESMs and below 1.5 for observations, which is in line with those seen in Fig. 3, indicating overall good prediction skill for mean m in relation to unseen background climate conditions. Observation RMSEs for DEF at −45 to −15 • N, however, show significantly higher median values than those in Fig. 3, although this is more likely to be due to data sparsity within the training data for this region, leading to little information being learned by the observationally calibrated mean m for this region. Across ESMs, DEF in the tropics (−15 to 10 • N) shows the largest spreads in RMSEs, with slightly higher median values than those of Fig. 3. Given that mean m may underperform within these areas due to the localised, large magnitudes of deforestation (as seen for CESM2 and EC-EARTH in Fig. 3), exploration of its performance into no-analogue tree cover changes is first required before concluding lower prediction skill for unseen background climate conditions within these areas.

Evaluation of mean
m under no-analogue tree cover change amounts Figure 5 shows the median RMSEs (with error bars indicating 50 % confidence intervals) obtained for mean m 's sampled predictions in relation to no-analogue tree cover changes aggregated to latitudinal bands for the example months of January and July. For observations and ESMs, magnitudes and patterns of RMSEs are similar between January and July across all latitudinal bands, contrary to what has been found for the predictive ability in no-analogue background climate conditions Sect. 4.1.2. This is expected as the way that local temperature response to tree cover changes depends on the season varies across background climates (and mainly across the latitudes; see, for example, Li et al., 2015) and is thus intuitively more important for representing seasonality in T S mean m,s values. Median RMSEs for 2015 treeFrac ≤ −0.5 in the tropics are higher than those seen for DEF in Fig. 4, indicating that the prediction skill for mean m in the tropics is more dependent on the availability of training information for similar tree cover changes than for similar background climate. MPI-ESM is an exception to this, displaying much larger RMSEs for DEF in Fig. 4. Such could result from MPI-ESM outputting a weaker response to tree cover change in the tropics, as previously suggested in Sect. 4.1.1, making the availability of prior background climate information the main factor influencing mean m 's prediction skill. Observations, CESM2, and EC-EARTH show an increase in RMSEs across all latitudinal bands as 2015 treeFrac values move towards the more extreme ends (−1 for observations and ±1 for CESM2 and EC-EARTH), sometimes even reaching RMSEs higher than those seen in Fig. 3. This indicates lower prediction skill for mean m in relation to unseen, extreme tree cover change conditions for observations, CESM2, and EC-EARTH. Nevertheless, the resolved skill seen in Fig. 3 verifies the need to have a training dataset representative of the extreme ends of tree cover change, as T S mean m,s responses may systematically become more nonlinear with increasing magnitudes of tree cover change.

Illustration of mean m outputs
In this section, we showcase the results of mean m when predicting T S mean m,s for any amount of tree cover change compared to 2015 levels and across the world. A select tree cover change value is applied to all grid points, and mean m is then used to predict the temperature responses for that tree cover change. Figure 6 illustrates the mean T S mean m,s predictions as well as their 95 % interval calculated across all grid points within a given latitudinal band. We choose the same latitudinal bands used in Figs. 4 and 5 T S mean m,s . As a preliminary check, the predictions can be roughly compared to the ESM outputs for the idealised AFF and DEF simulations as analysed by De . Only a rough comparison is possible, however, as we generate predictions for tree cover change maps of constant values across grid points, whereas the tree cover change maps applied within the AFF and DEF scenarios vary in values across grid points since they represent the full expansion of forest and/or Over all latitudinal bands and months shown, the largest 95 % intervals occur towards the extreme ends of tree cover change for both observations and ESM-based predictions. This is especially the case for deforestation, where the 95 % intervals are generally larger than those of afforestation. Higher 95 % intervals at extreme ends of tree cover change result from fewer grid points which undergo more extreme tree cover changes, ergo less training material. This highlights once more the higher uncertainty in the predictions by mean m for extreme amounts of tree cover change (in both directions).
Mean observation-based predictions remain close to 0 across all latitudinal bands for both January and July owing to the high data sparsity which makes it difficult to extract significant T S mean m,s responses during training. Nonetheless, observation-based 95 % intervals are in general agreement with those of ESMs across all latitudinal bands and months shown.

Surface to 2 m air temperature diagnosis
In this section, we apply the modified (Hooker et al., 2018)

2 m air temperature diagnoses
Since fewer than half of the grid points have positive values and since such values are isolated to certain months and geographical areas, we proceed with diagnosing T 2 m m,s from T S min/max m,s values outputted by min/max . The calibration and evaluation results for min/max are available in Appendix C and show similar results as those seen in Sect. 4.1, namely minimal additional RMSEs when predicting in relation to no-analogue conditions sampled out of the training dataset as compared to when predicting after having seen the full training dataset (i.e. comparing RMSE values from  values as derived in another study using the same ESM (Meier et al., 2018). An exception here is northern-hemispheric July T S min/max m,s values, for which a cooling was observed in Meier et al. (2018) in contrast to the warming seen in the training ma-terial used within this study (see Appendix C, Fig. C1). Such a discrepancy could arise from too-large albedo responses shown by CESM2 and highlights the caveats of diagnosing T 2 m m,s from T S min/max m,s , where physical inconsistencies in the surface temperature responses as represented within ESMs can be transferred to T 2 m m,s during diagnosis. Nevertheless, the task of m is to mimic ESM outputs irrespective of their realism, and to this end, the statistically derived relationships for T S min/max m,s in relation to tree cover changes match those of the ESM outputs trained on.

Exploration of tree cover change effects within SSP scenarios
In this section, we showcase the results of applying TIM-BER v0.1 calibrated on simulations conducted with CESM2 to the scenarios of future tree cover changes in SSP1-2.6 and SSP3-7.0. We employ the sampling method as described in Sect. 3.4 such that parametric uncertainties within the GAM are also represented. This provides a first step towards sta- Figure 6. mean m 's depiction of T S mean m,s , shown for observations and ESMs (colours) for the months of January (first row) and July (second row) across the whole range of 2015 treeFrac and aggregated to latitudinal bands (columns). λ parameters and number of basis functions (nbfs) chosen through blocked cross-validation are given in the first column of their respective month and are colour coded according to their respective training data (observations or ESMs). Solid lines represent the mean T S mean m,s predictions, and the surrounding band represents the 95 % interval calculated over T S mean m,s predictions for all grid points within the respective latitudinal band.
tistically emulating T 2 m m,s responses to tree cover change in a manner that not only provides the expected response but also gives an idea of the signal-to-noise ratio within predictions. Figure 9 shows maps of end-of-century tree cover changes (shown in the first column) under SSP 1-2.6 and SSP 3-7.0 and their associated mean T 2 m m,s responses (second column), obtained by sampling T S min/max m,s values from min/max m , applying Eq. (3) to get T 2 m m,s and taking its sample average. The signal-to-noise ratio is furthermore given by taking the ratio between the absolute mean T 2 m m,s value and its standard deviation calculated across sample results for T 2 m m,s (third column). We consider areas with a signal-to-noise ratio lower than 0.5 to have an insignificant temperature response as their surrounding parametric uncertainty is double that of the magnitude of response.
SSP 1-2.6 shows substantial cooling from afforestation in southern Africa and Brazil for both January and July. A substantial July warming due to deforestation can also be seen in the Tibetan Plateau due to deforestation. SSP 3-7.0 shows a significant January and July warming due to deforestation in Central Africa, the Tibetan Plateau, and South America. Western North America shows a significant cooling from deforestation, especially in July, while parts of East Asia show significant cooling from afforestation for both January and July.
In general, areas with a tree cover change lower than 0.1 in magnitude tend to have a signal-to-noise ratio lower than 0.5 and thus an insignificant temperature response. Such systematically lower signal-to-noise ratios indicate that m is not only aware of the lack of information it has for smaller changes in tree cover but can also infer that temperature responses to such tree cover changes are likely to be trivial.

Conclusion and outlook
This study presents TIMBER v0.1, a conceptual framework for representing monthly temperature responses to changes in tree cover. TIMBER v0.1 starts by modelling minimum, mean, and maximum surface temperature responses to tree cover change with a month-specific GAM, which is trained over the whole globe; 2 m air temperature responses are then diagnosed from the modelled minimum and maximum surface temperatures using observational relationships derived by Hooker et al. (2018). Such an approach maintains the ESM-specific temperature response to tree cover change whilst ensuring a constant diagnosis and observationally consistent definition of 2 m air temperature. The GAM is evaluated for its ability to predict in relation to unseen, i.e. no-analogue, background climate and tree cover change conditions. This is done using a blocked-crossvalidation procedure in order to account for the spatial struc-ture of the data when splitting into subsamples used for training and testing. Overall, the GAM shows good skill in predicting in relation to no-analogue conditions, with minimal RMSEs in addition to those that occur when predicting after having seen the full training dataset and thus all available background climate and tree cover change information. Such provides confidence in the GAM's ability to derive meaningful relationships from the training data provided by the ESMs. Nevertheless, poorer representation for extreme, localised tree cover changes was identified, such as for deforestation in the tropics, most likely due to the difficulty in adequately representing high spatial variability.
When predicting in relation to new tree cover change scenarios, we are especially mindful of the training data only including grid points which experience extreme tree cover change in the training simulations. To this extent, surface temperature responses are sampled from the GAM in a manner that explores all possible shapes of responses in between the two extreme ends of tree cover change as provided by the training data; 2 m air temperature responses are then diagnosed from the sampled surface temperature responses, and relevant responses are identified as those that have a high signal-to-noise ratio (> 0.5).
The final outputs of TIMBER v0.1 are demonstrated for SSP 1-2.6 and SSP 3-7.0. Generally, areas with less than ±10 % tree cover change render a low signal-to-noise ratio, which is intuitive as responses to such low changes in tree cover are likely to be minimal. Employing TIMBER v0.1 thus provides an avenue to explore the impacts of tree cover change and their underlying uncertainty due to the availability of training data and model calibration. It should be stressed that, given the lack of comparable ESM simulations that employ the checkerboard approach to isolate local signals of land cover changes, TIMBER's outputs cannot be thoroughly validated and must therefore be cautioned with the limitations of its current set-up, specifically that they are produced with limited amounts of training data and that the 2 m air temperature is diagnosed using observational relationships -as provided by Hooker et al. (2018) -directly applied to the ESM space. In the following subsections, we further highlight areas of potential improvement, elaborate upon the suitable modes of application for TIMBER v0.1, and detail possible further developments.

Areas of potential improvement
One area of potential improvement pertains to the model calibration procedure. When inspecting the calibrated λ parameter values and the number of basis functions, the limits of values cross-validated for (0.001 and 1 for the λ parameter and 5 and 9 for the number of basis functions) seem to be favoured. Reasons behind this could be (1) the blocked cross-validation sometimes removes too-large chunks of data, leading to an overestimation of RMSEs chosen; and/or (2) the range of λ parameters and/or number of basis function values calibrated for is too narrow. The first reason could be tackled by further splitting the blocks such that each block has a predefined Figure 9. T 2 m m,s values resulting from end-of-century changes (i.e. 2100) relative to 2015 in tree cover for SSP 1-2.6 (upper panel) and SSP 3-7.0 (lower panel) scenarios for the months of January (top rows) and July (bottom rows). Mean T 2 m m,s values (second column) and their signal-to-noise ratios (third column) calculated over the sampling distributions are shown. 2015 treeFrac maps are given in the first column, and grid points with | 2015 treeFrac| < 0.01 are not considered. number of samples. Alternatively, the GAM could be fitted over specific climate regions, and blocked cross-validation could be conducted with uniformly sized blocks composed along latitude and longitude dimensions, although, here, it is likely that the complete spectrum of tree cover change information will be lost for some regions. The second reason is easily solved by cross-validating over a larger range of values.
Another area of improvement could be to derive ESMspecific coefficients for the Hooker et al. (2018) model. Such would entail fitting for the relationships between ESM minimum and maximum surface temperatures and the observational 2 m air temperatures as used by Hooker et al. (2018).
Since the additional biases introduced by using the original Hooker et al. (2018) model) and minimum and maximum surface temperatures may also be quite large (e.g. minimum winter temperatures in the Northern Hemisphere are likely to occur after the time of overpass when nighttime temperatures are measured), such would be an essential step in being able to accurately diagnose 2 m air temperatures.

Modes of application
TIMBER v0.1 provides a framework to explore local-level temperature implications of tree cover changes in an agile manner under different tree cover change scenarios. TIM-BER v0.1 can be used as both a standalone device and as a supplement to other emulators. It should be noted that, to provide complete representation of the biophysical effects of tree cover change, albedo and thermal fluxes would have to be considered as well. To this extent, the temperature responses provided by TIMBER arise from a combination of the effects of albedo and thermal flux responses to tree cover changes on the atmospheric energy balance. Here, we summarise some key takeaways pertaining to the use of TIMBER v0.1 for generating new tree cover change scenarios.
Upon inspection of the T S mean m,s response patterns across all tree cover changes (Fig. 6), inter-ESM differences become quite apparent. Such differences are continuously studied and mainly arise from differences in model physical representation Lawrence et al., 2016;Lejeune et al., 2018;Davin et al., 2020;Boysen et al., 2020;De Hertog et al., 2022). Being able to train the GAM across all ESMs presents the opportunity to capture these uncertainties due to model physical representation, which may sometimes be higher than the parametric uncertainty within the GAM given the training data. When exploring new tree cover change scenarios, the need to have as many ESMs represented should therefore be emphasised. Moreover, the outputs of TIMBER v0.1 should always be interpreted as representative of the ESM-simulated world, which does not necessarily translate to observed reality.
In applying TIMBER v0.1 to different tree cover change and climate scenarios, it should furthermore be acknowledged that the effects of initial starting conditions and those of background global warming levels have not been accounted for (e.g. see Winckler et al., 2017b, for the possible effects of initial starting conditions on temperature responses to land cover changes). In order to represent such effects, TIMBER would require more training data. Furthermore, if the Hooker et al. (2018) coefficients are recalibrated for the ESM space, the impacts of changing climate on 2 m air temperatures could well be represented through the CSWR coefficients. Nonetheless, outputs of TIMBER v0.1 should more so be treated as hypothetical sensitivities and not definite responses.
Finally, as a conceptual framework, TIMBER v0.1 comes with its limitations that need to be accounted for and improved in future versions. A noteworthy limitation is the diagnosis of 2 m air temperature that relies on the modified Hooker et al. (2018) model. Such a set-up was implemented so as to enable constant diagnosis and definition of 2 m air temperatures across ESMs and observations. However, since the original Hooker et al. (2018) model takes night and day surface temperatures as predictors, whereas the modified model used in this study takes minimum and maximum surface temperatures, current 2 m air temperature predictions should be treated with caution. As seen in Fig. 7, such differences are expected to introduce minimal biases since TIM-BER looks at relative changes and not absolute values. However, for select months and regions (e.g. winter in Europe and North America), there are still added biases as night and day surface temperatures do not necessarily correspond to minimum and maximum surface temperatures. An additional limitation to TIMBER is the lack of available ESM data to evaluate it against. Such a problem was circumvented in this study by synthesising the closest representation to noanalogue condition predictions for TIMBER. Nevertheless, when applying TIMBER to different scenarios, predictions should always be treated as approximations. To this extent, the signal-to-noise ratio calculation from TIMBER is an essential feature as it represents the model confidence in predictions based on the available training material.

Future developments
It would be possible to extend TIMBER v0.1 to represent other impact-relevant climate variables. A variable to start with could be relative humidity, from which metrics such as wet-bulb globe temperature (WBGT) and labour productivity could be derived. In doing so, variable cross-correlations between temperature and relative humidity should be conserved, such that compound events -which largely affect WBGTs -are sufficiently captured. To this extent, a vectorised generalised additive model (VGAM) (Yee and Stephenson, 2007) could be employed, which retains variable cross-correlations by constructing a multivariate conditional probability distribution e.g. by using a bi-normal distribution as opposed to the normal distribution used within this study.
In its current set-up, TIMBER does not differentiate between plant functional types (PFTs). Temperature responses to tree cover changes, however, may differ between different PFTs. For example, needleleaf trees in temperate regions are associated with a stronger winter warming as compared to broadleaf trees, which otherwise lose their foliage during winter (Duveiller et al., 2018b). Representing the temperature responses to different PFTs instead of treating tree cover fraction as a single element would thus further enrich the outputs of TIMBER. A starting point to this could be differentiating between needle-and broadleaf trees. Each of these tree types could be treated as separate tensor spline terms within the GAM, and the final temperature results would be obtained by adding both terms. When doing so, the potential model accuracy gained should be assessed in relation to the added model complexity (i.e. increase in the number of tensor spline terms). Given that needle-and broadleaf trees are unevenly spread geographically (where broadleaf trees occur more in the tropics and needleleaf trees occur more in the temperate regions), it may also be worth training a separate GAM per geographical region so as to get an even representation of needle-vs. broadleaf trees, as well as to prevent model overfitting.
Looking into other land management practices such as irrigation and wood harvest could also be of interest, particularly as their effects on surface temperatures are expected to be similar in magnitude compared to those due to land cover changes (Luyssaert et al., 2014). In doing so, customisation of TIMBER v0.1's framework in relation to the land management practice of choice could be necessary. For example, when looking at irrigation, the implementation of irrigation can be extremely localised and seasonal (Thiery et al., 2017, and it would be preferable to train the GAM as region specific and across all months instead of month specific and across all grid points. To this extent, the GAM has the advantage of not prescribing any functional form, giving it flexibility in deriving climate responses to different types of LCLM forcings regardless of the format of the training data. In order to jointly explore future tree cover and GHG scenarios, coupling TIMBER v0.1 with other temperature emulators such as MESMER-M or -X (Beusch et al., 2020;Nath et al., 2022b;Quilcaille et al., 2022) also proves to be worthwhile. In doing so, care would have to be taken to not double-count the tree cover change signal as MESMER-M and -X are trained on SSP runs, which contain both GHG and tree cover change signals. Accordingly, it is advisable to first model the expected tree cover change signals within the SSP runs using TIMBER v0.1, following which MESMER-M or -X can be trained on the SSP runs with the modelled tree cover change signals removed.  Figure A2. Orography features, defined as metres above sea level, used in input predictor matrix for m for observations and ESMs (columns).
Author contributions. QL, CFS, and SIS identified the need for an emulator framework applied to representing land-cover-forced climate responses. SN developed the emulator set-up and the calibration and evaluation procedure with help from QL, LG ,and JS. QL and GD furthermore assisted SN in integrating the observationally based 2 m air temperature diagnosis scheme within the emulator. SJDH, QL, FH, FL, IM, JP, and WT produced the ESM output. All the authors contributed to interpreting the results and streamlining the text.