Predicting Global Terrestrial Biomes with the LeNet Convolutional-Neural-Network

. A biome is a major regional ecological community characterized by distinctive life forms and principal plants. Many empirical schemes such as the Holdridge Life Zone (HLZ) system have been proposed and implemented to predict the global distribution of terrestrial biomes. Knowledge of physiological climatic limits has been employed to predict biomes, 10 resulting in more precise simulation, however, this requires different sets of physiological limits for different vegetation classification schemes. Here, we demonstrate an accurate and practical method to construct empirical models for biome mapping: A convolutional neural network (CNN) was trained by an observation-based biome map, as well as images depicting air temperature and precipitation. Unlike previous approaches, which require assumption(s) of environmental constrain for each biome, this method automatically extracts non-linear seasonal patterns of climatic variables that are 15 relevant in biome classification. The trained model accurately simulated a global map of current terrestrial biome distribution. Then, the trained model was applied to climate scenarios toward the end of the 21st century, predicting a significant shift in global biome distribution with rapid warming trends. Our results demonstrate that the proposed CNN approach can provide an efficient and objective method to generate preliminary estimations of the impact of climate change on biome distribution. Moreover, we anticipate that our approach could provide a basis for more general implementations to build empirical models 20 of other climate-driven categorical phenomena.

improves the reconstruction accuracy compared to the classical HLZ scheme.
We follow Ise and Oba (2019) and Ise and Oba (2020) for training CNN with input variables. This method represents climatic conditions using graphical images and employs them as training data for CNN models. To account for seasonal variability, previous correlative climate-vegetation models needed to pre-define representative variables. For example, Levavasseur et al. (2013) divided each climatic variable into four "seasonal" predictors by averaging data corresponding 3-70 month periods (i.e., DJF for winter, MAM for spring, JJA for summer, and SON for fall). By contrast, the method we employed can automatically extract non-linear seasonal patterns for climatic variables that are relevant in biome classification. In other words, it enables CNNs to learn the seasonal pattern of multiple climatic variables without any indexical expression, which would reduce the amount of information and add a source of arbitrary.

Data
For training the CNN model, we employed potential land cover types and the monthly climate information from the ISLSCP2 Potential Natural Vegetation Cover (Ramankutty and Foley, 2010) and CRU TS4.00 (Harris and Jones, 2017) datasets, respectively. Both datasets have a 0.5° global surface grid resolution. The ISLSCP2 dataset is an observation-based biome map, which classifies the global land surface into 15 vegetation types (Fig. 1a). The ISLSCP2 dataset represents the 80 world's vegetation cover that would most likely exist now in equilibrium with present-day climate and natural disturbance in the absence of human activities. The CRU TS4.00 is based on an archive of climatic conditions observed in more than 4,000 weather stations distributed worldwide. Climatic conditions between 1971 and 1980 were selected for CNN training since this time period is just before the beginning of a clear global warming trend (Rood, 2015), and the number of meteorological stations that contributed to the dataset remained relatively stable (Harris et al., 2014). 85 In machine learning experiments, a fraction of the training data is typically divided randomly into two subsets, of which one is used for model training, and the other is then used to validate the trained model. This study used the CRU TS4.00 climate data as training data, which was generated by interpolating data from weather stations, meaning that values in each grid are not independent of those in nearby grids. Under these circumstances, validation using the typical procedures described above would risk overfitting (i.e., training the model too closely or exactly to a particular set of data, thereby creating a model that 90 may fail to fit additional data or reliably predict future observations) (Leinweber, 2007). Therefore, other climate datasets were used for validating the trained model: NCEP/NCAR reanalysis (Kalnay et al., 1996), and the HadGEM2-ES (Collins et al., 2011) and MIROC-ESM datasets (Watanabe et al., 2011). Notably, the nature of these three datasets is different from that of the CRU TS4.00; the NCEP/NCAR consists of reanalysis data that incorporates observed and weather model output data, while the other two datasets were derived only from climate models. Details of these climate datasets are available in 95 Table S1. To be consistent with the training data, the spatial resolutions of the validation data were linearly interpolated to a 0.5° grid mesh, and climatic conditions from 1971 to 1980 were employed.
In this study, the accuracy when the model was applied to the training climate dataset (i.e., the CRU dataset) is referred to as the "training accuracy", which shows how well the model was trained to extract common features of each category from images. The accuracy for the validation climate dataset (i.e., the NCEP/NCAR reanalysis, Had2GEM-ES, and MIROC-ESM 100 datasets) is referred to as the "test accuracy", which shows how the model is robust against independent input data.

Visualization of climate data for machine learning
We graphically represented the standardized air temperature and precipitation data on a grid using R statistical computing software version 3.3.3 (R-Core-Team, 2018). These images will be referred to hereafter as visualized climatic environments (VCEs). For efficient machine learning, climate data were standardized prior to visualization. The -20-30°C monthly mean 105 air temperature range and 8-400 mm/month precipitation range were log-transformed to 0.01-1.00. Values below and above these ranges were respectively treated as 0.00 and 1.00. To evaluate how seasonality of climate regulates the biome, we also conducted CNN training with annual mean air temperature and annual precipitation. For this analysis, an annual mean biotemperature range of 0-30°C and an annual precipitation range of 80-4000 mm/year were used. Here, bio-temperature was defined as the mean of above-freezing monthly air temperatures. Using the annual mean bio-temperature and annual 110 precipitation, we first evaluated how different representations of the VCEs influenced the training and found no major differences (Table S2), and hence the most compact VCE with the smallest computation time requirement, the RGB colour tile, was used for this entire study.
In the VCE of the RGB colour tile, up to three climate variables can be represented by RGB channels. To find the optimal combination of climatic variables, we systematically evaluated the model performance of 14 combinations of climatic 115 variable experiments for both annual and monthly means (Tables S3 and S4, respectively). Downward shortwave radiation and humidity were added for this evaluation, as all of the climate datasets contain these. Generally, training accuracy increases with the number of climatic variables; however, the test accuracy does not increase further after two climatic variables. This suggests that models with three climatic variables are at risk of overfitting. Amongst the models of annual and monthly means of climatic variables, the model with monthly mean air temperature and monthly precipitation had the 120 highest test accuracy. Therefore, models that combined air temperature (bio-temperature for the model of annual mean climate) and precipitation were employed for the entire study.
We also evaluated the influences of different transformations of climatic variables (Table S5) and assignment patterns of air temperature and precipitation to RGB colour channels of the VCE (Table S6) on the resulting accuracy. Based on these evaluations, we settled on models with a combination of air temperature (bio-temperature for the model of annual mean 125 climate) and precipitation, both of which are log transformed, and assigned to the blue and red channels, respectively, of the colour tile VCE representation. Examples of VCEs of annual mean climate and monthly mean climate are shown in figures S1 and S2, respectively.

Training of the CNN model
The LeNet (Lecun et al., 1998), which is the world's first CNN, was employed for this study. The computer employed to 130 execute the learning had Ubuntu 16.04 LTS installed as the operating system and was equipped with an Intel Core i7-8700 CPU, 16 GB of RAM, and an NVIDIA GeForce GTX1080Ti graphics card, which accelerates the learning procedure. On the computer, the NVIDIA DIGITS 6.0.0 software (Caffe version: 0.15.14) served as the basis for CNN execution, and LeNet was employed to train the CNN via the TensorFlow library. To see how DIGITS actually implements the CNN, its internal code can be viewed using the DIGITS menu (on the "New image model" screen, click the "Custom Network tab" and select 135 "TensorFlow"). A description of the CNN model and its parameter settings are available in Supplementary Information 1. To train the CNN model, ten VCEs corresponding to years 1971-1980 were generated for each grid using the CRU data, resulting in 572,640 VCEs (i.e., 10 × 57,264 grids). These VCEs were assigned to 15 categories according to the observation-based biome of the grid, and the CNN model was trained to determine biomes from the VCEs. The numbers of training VCEs for each biome ranged from 4,490 (comprising temperate broadleaf evergreen forest/woodland areas) to 140 91,740 (comprising evergreen/deciduous mixed forest area). The training was conducted for each of the annual and monthly sets of VCEs, and their computation times for training completion were 109 and 132 minutes, respectively. The annual and monthly climate training procedures are identical except for its VCEs.

Validation of the trained model
To validate the trained CNN model, a VCE of the average climate conditions from 1971 to 1980 was obtained for each grid 145 and each validation climate dataset. These VCEs were applied to the trained CNN model and were classified by their most plausible biome. It took roughly 8 minutes to complete the VCE classification (i.e., 57,264 in total) for each climate dataset.
Then, the computed biome distributions were validated by quantitative comparison with the observation-based biome map of ISLSCP2.
For comparing the differences and similarities between two biome maps, cross-tabulation matrices were obtained for each 150 comparison. Tables S7 and S8 show cross-tabulation matrices of training accuracies as examples. Using these matrices, the differences between the two biome maps were separated into two components: quantity disagreement and allocation disagreement (Pontius and Millones, 2011). Here, a quantity disagreement indicates a discrepancy between the proportions of the categories (i.e., the biome), while an allocation disagreement indicates a discrepancy in the spatial allocation of the categories under a given set of category proportions in the reference and comparison maps. 155 The use of one particular climatic dataset for training and three different climatic datasets for validation introduces a source of arbitrary error. To examine the dependency of climatic datasets for training and reconstructing performances, an experiment was performed wherein training and reconstruction of the same biome map was conducted using all combinations of the four historical climatic datasets, and then the reconstructive accuracies were compared. which performance was compared among models trained on monthly climate data averaged over 10-year (1971-1980; control), 20-year (1961-1980), and 30-year (1951-1980) periods. Validation datasets for each model were averaged over the same periods as the training data.
We used different climate datasets for training and validating the models to avoid overfitting that may be caused by dependencies in values among nearby grids in the training data (CRU TS4.0). To assess the effects of overfitting, we 165 compared performance among four models that differed with respect to the grain size of training data. Nearby grid cells (0.5°) of the CRU dataset were aggregated by one of four grain sizes: 1 × 1 (0.5°), 2 × 2 (1.0°), 4 × 4 (2.0°), and 8 × 8 (4.0°).
For each grain size group, 70% of grains were randomly selected for model training, and the remaining were assigned to validation. Validation with coarser grains should be less impacted by overfitting. In addition to the extent of overfitting, grain size may also influence training efficiency, because a coarser grain may skew the allocation ratio of minor biomes 170 between training and validation sub-groups, especially when these biomes have clumped distributions. To assess this possibility, validation was also conducted using other climate datasets.
Finally, we conducted an additional experiment for comparing the accuracy of PNV map reconstruction between the HLZ scheme and our method using common training data set. We developed a look-up table of the most common PNV for each combination of annual mean bio-temperature class and annual precipitation class, consistent with the HLZ scheme. The bin 175 sizes of the HLZ scheme are six for the annual mean bio-temperature class and eight for the annual precipitation class. As these coarse-grained bin-sizes would potentially depress the accuracy of the PNV simulation, we also developed look-up tables of 12 × 16, 24 × 32, and 48 × 64 bin-sizes to ensure that the comparison between our model and the HLZ scheme is as fair as possible. Note that the HLZ scheme employs a hexagon table, but we employed a cross-tabulation table for simplicity. CRU annual climate and ISLSCP2 PNV map were used for generating the table. Then the table was applied to all 180 climatic datasets we employed in this study, drawing reconstructed PNV maps for comparison.

Application of the CNN model to future climate scenarios
Following validation, the CNN model trained with monthly mean climate data was used to predict future biome distribution maps by applying climate scenarios for the 21st century. These predictions were conducted in combinations of two GCMs (i.e., MIROC-ESM and HadGEM2-ES) and two representative concentration pathways (RCPs; i.e., RCP2.6 and RCP8.5). 185 These RCPs represent the atmospheric greenhouse gas (GHG) concentration forecasts adopted by the IPCC for its 5th Assessment Report (AR5) in 2014. RCP2.6 assumes that global annual GHG emissions will peak between 2010 and 2020 and decline substantially afterwards. By contrast, RCP8.5 assumes that emissions will continue to rise throughout the 21st century. The scenarios RCP2.6 and RCP8.5, respectively project that atmospheric CO2 could reach 421 ppm and 936 ppm by the end of the 21st century (IPCC, 2013).  1 and 2). Besides the most plausible biome, the CNN outputs its certainty, which is the probability (in %) of the classification judged by the CNN. Geographical distribution 195 of the certainty clearly showed considering seasonality improves the certainty except Northern parts of South America and African continents where no apparent seasonality exists (Fig. S3). These results are consistent with Prentice et al. (1992), demonstrating that global biome distribution is under substantial controls of climatic tolerance and the occurrence and extent of drought seasons. In fact, seasonality significantly improved the average training accuracies from 3.5% to 61.9% for tropical deciduous forests, 0.4% to 54.8% for temperate broadleaf evergreen forests, and 24.5% to 79.0% for boreal 200 deciduous forests (Tables S7 and S8). The same pattern can be observed in test accuracy comparisons (Figs. 2, 3, and S3), although temperate broadleaf evergreen and boreal deciduous forests were largely absent from Had2GEM-ES and MIROC-ESM, respectively (Fig. 2). These absences would be due to differences in the reconstructed current climate among datasets (Fig. S4). Overall, for all climatic datasets examined, better training and test accuracies were consistently obtained in CNN models trained with monthly mean climate data than in those trained with annual mean climate data (Fig. 4). Thus, the CNN 205 model trained with monthly mean climate data was used for analysis with the climate scenarios in the 21st century.
For all combinations of CNN models and climatic data, the allocation disagreement was much larger than the quantity disagreement: while the allocation disagreement ranged from 0.227 to 0.392, the quantity disagreement varied from 0.037 to 0.200 (Fig. 4). The larger allocation disagreement can be explained by the tendency of observation-based biome distributions to be fragmented over areas with similar climatic conditions (Fig. 1a), while model reconstructed biome distributions had 210 more continuous structures (Figs. 1b, 1c, and 3) (For example, Australian continent). The probability of the most plausible biome tended to be lower for these fragmented regions (Fig. S3), suggesting these regions have climatic conditions suitable for multiple potential biomes. The lower quantity disagreement demonstrated that the CNN model reconstructed the fraction of the global biome composition under the current climatic conditions well. As the main purpose of this research is to develop an empirical model of climatic controls on biome distribution, this would indicate that the reconstructions of biome 215 maps with the CNN models are actually much more accurate for their particular purpose than implied by the accuracies found from the simple map comparison. Table 1 compares the dependence of reconstruction accuracy on combinations of climate datasets for training and test climate datasets. Accuracies were higher and less variable when the climate dataset for training and testing were identical (0.701-0.734), compared to when these datasets were different (0.394-0.559). These results suggest that uncertainty in 220 historical climate reconstruction and over-fitting are more significant sources of failure in reconstructing biome distribution than the dependency of training on a particular climate dataset.
No major trends were observed in test accuracies in the sensitivity test, which compared performance among models trained using monthly climate averaged over 10-, 20-, and 30-year periods (Table S9). This indicates that climate data averaged over a 10-year period are sufficient for model training. However, long-term climatic conditions are important in controlling biome 225 distribution via extreme climates, which may cause complete reorganization of systems and communities and may provide important opportunities for, and constraints to, plant recruitment. For example, in response to anomalous drought during 2002-2003, regional-scale dieoff of overstory woody plants was observed across southwestern North American woodlands (Breshears, et al., 2005). Considering the effects of extreme climates in the model would be an interesting topic for future study. 230 Grain size of the training and validation data did not result in noticeable differences in training and test accuracies, with the exception of the CRU dataset (Table S10), demonstrating that the influence of grain size on training efficiency is negligible.
In contrast, test accuracies of the CRU dataset were lower at coarser grain sizes, at 80.4%, 78.2%, 76.1%, and 72.2% for the 1 × 1, 2 × 2, 4 × 4, and 8 × 8 grain sizes, respectively. These results suggest that dependencies in values among nearby grids in the CRU dataset resulted in overfitting. However, the effect of overfitting appears to have been much smaller than that of 235 systematic differences among climate datasets (Fig. S4); irrespective of grain size, test efficiencies of the CRU dataset were least 19.5% higher than those of other datasets. Therefore, our validation method, which suffers from the systematic differences among climate datasets, should underestimates the actual performance of the models, and performance would be much better than we demonstrated in this manuscript.
Accuracies of PNV reconstructions using the HLZ look-up tables for each climate data-set increase with the resolution of bin 240 sizes for climate classifications (Table 2). It reaches quasi-equilibrium at 24 bio-temperature classes × 32 precipitation classes, which delivers nearly identical results with the CNN model. This result demonstrates that our VCE method extracts the best possible distribution of the most plausible PNV in a two-dimensional space of climatic variables.

Prediction of biome distribution with the CNN model
The applications of the CNN model to the climate scenarios predicted a significant shift in global biome distributions (Fig. 5) 245 and area coverages (Fig. S5) under rapid warming trends (Figs. S6 and S7). For both GCM outputs, more intense biome shifts were predicted for RCP8.5 than for RCP2.6, but the shift trends remained consistent. The most visible change was the expansion of temperate forests over boreal forests in both North America and Eurasia. Boreal and cold vegetation shrank and its composition changed; tundra areas gave way to boreal forests, while boreal evergreen forests became confined to a narrow strip at higher latitudes. Tropical vegetation remained relatively unchanged, but nearly all tropical deciduous forests 250 in the southern hemisphere were substituted by savanna, which coincided with a reduction in annual precipitation (Figs. S6 and S7).
Given the uncertainty of the climatic predictions derived from the ESMs and RCP scenarios, our analysis of the climate change effect only indicates the potential for considerable changes in biome distribution at the end of the 21st century.
Besides, changes in expected biome, which is an equilibrium state of vegetation coverage, are not always accompanied by 9 immediate changes in actual vegetation. In fact, these time lags can be very long (i.e., decades to millennia) because the adjustment of vegetation to new climate conditions entails a series of plant population dynamics processes, such as seed dispersal, establishment, competition against other existing plants, and reproduction (Sato and Ise, 2012). Even present-day plant species distributions are considered not in equilibrium with present-day climates (e.g., Woodward, 1990). Our study cannot infer such transient changes in vegetation, however, current process base approaches are also not a reliable option for 260 reconstructing plant population dynamic processes at the global scale; biome map predictions under common changing climate scenarios differ significantly from state-of-the-art dynamic global vegetation models (DGVMs) (Pugh et al., 2020).
Hence, empirical and top down approaches, like our simulation, should still have an important role to play in approximate mapping of biomes under changing climatic conditions.

Limitations and future directions of our approach 265
There are two types of approach to mapping biomes: the correlative climate-vegetation approach and process-based approach (Notaro et al., 2012;Yates et al., 2009). We employed the former, which has advantages and disadvantages compared to the latter. An advantage of the correlative approach is that it is relatively straightforward and may be rapidly applied to different climate change scenarios. Indeed, models using the correlative approach are a common tool for predicting the impacts of climate change on biodiversity for conservation planning, because they can be easily used to simultaneously assess large 270 numbers of species (e.g., Thomas et al., 2004).
An important disadvantage of the correlative method is that extrapolating current correlations between climate and biome distributions into the future may lead to seriously biased predictions; strong performance in the present climate does not guarantee similar performance under a new set of climatic conditions that may occur in the future. However, neither Had2GEM-ES ( Fig. S3f and Fig. S8a, b) nor MIROC-ESM (Fig. S3h, and Fig. S8c, d) showed apparent expansions of biome 275 uncertainty in projected climatic conditions at the end of the 21st century. This may suggest outside the environmental space of the training data is not conspicuous at the global scale. For quantifying methodological uncertainty might also result from comparing performances between correlative and process-based models in 'unsuitable' outside the environmental space of the training data (Yates et al., 2009).
A second disadvantage of the correlative approach is that it cannot infer impacts of elevated atmospheric CO2 on biome 280 distribution. An increase in CO2 may favour forests over grasslands due to the advantage that C3 plants may gain over C4 plants under such conditions (Bond et al., 2003). Notably, palaeoecological studies have demonstrated that C4 ecosystems were more extensive during the last glacial maximum and decreased in abundance following deglaciation in response to increased atmospheric CO2 concentrations (Ehleringer et al., 1997). Besides, projections of atmospheric CO2 have significant divergence among socioeconomic scenarios from 421 ppm (RCP2.6) to 936 ppm (RCP8.5) at the end of the 21st 285 century.
DGVMs, which use process-based approaches, may facilitate the identification of areas where elevated CO2 may affect biome distribution under projected climates. Indeed, the third phase of the Inter-sectoral Impact Model Inter-comparison Project, now in progress (Warszawski et al., 2014), includes a sensitivity test for CO2 in which biome distribution is compared between scenarios of both climate and CO2 change, and scenarios of climate change only. We should note, 290 however, that even for current state-of-the-art process-based models, incorporating effects of elevated CO2 is not straightforward due to their complexity; effects appear to be taxon specific, to interact strongly with soil type and climate, and to be highly dependent on nitrogen availability (Korner, 2003;Spinnler et al., 2002).
We must also keep in mind that the correlative climate-vegetation approach ignores feedbacks between vegetation and climate, which are known to influence vegetation distribution at equilibrium (Pitman, 2003). Both Had2GEM-ES and 295 MIROC-ESM explicitly consider climate-vegetation interactions, including dynamic adjustment of biome distribution, and hence its projected climates are the outcomes of such interactions. However, due to the difference in projected distributions of biomes among models, some regions should have mismatched reconstructions of the interactions. Implementing the CNN model with earth system models to dynamically adjust biome distribution to simulated climate distribution would address this issue. 300 The CNN model was trained with an observation-based biome map, which is composed of natural vegetation only. However, the impact of human activity on ecosystems is now so prevalent, and hence predicting ecosystem changes without explicit consideration of socio-economic systems would be challenging (Ellis, 2015). Therefore, future research might address how current patterns of human activity interact with projected biome changes to reveal regions where these interactive agents align and amplify one another. 305 This study only considers biome distribution at the 0.5-degree scale. At this scale, climate can be regarded as the dominant factor that determines vegetation composition, and hence correlative climate-vegetation approach fits well in identifying vegetation distribution. However, at more local scales, topography, soil type, and fine-scale biotic and abiotic interactions (e.g., habitat structure, fire, storms) become increasingly important (Willis and Whittaker, 2002). One possible extension of our study is integrating these factors, acting at different spatial scales, into a hierarchical modelling framework (Pearson and 310 Dawson, 2003). Another possible extension is simply adding one more variable that tightly controls PNV at sub-grid scales (such as altitude, slopeness, or slope aspect) into the VCE because one of the three RGB channels is empty in our model. Our study adopted the LeNet architecture implementation, which has six hidden layers, to create CNN models. Botella et al. 315 (2018) found that a deep network (six hidden layers) outperformed a shallow network (one hidden layer) for building species distribution models; however, Benkendorf and Hawkins (2020) found that using more than two hidden layers was of no benefit, and argued that the usefulness of deeper networks depends on the size of the training dataset. Therefore, carefully selecting the approximate complexity of architecture implementation may improve model accuracy. We compared performances of models trained by four different types of VCE representation of annual precipitation and average annual 320 bio-temperature, and all models have an almost equal performance (Table S2). This result might indicate that LeNet perfectly extracts at least two variables irrespective of how visualized. Lastly, the default parameters in NVIDIA DIGITS 6.0 remained largely unchanged. Our approach was kept relatively simple to demonstrate the robustness of our concept; however, further improvements to the scheme could be explored by selecting other implementation architectures and systematically testing the effect of parameter modulation. 325

Conclusion
Regardless of the limitations discussed above, this study provides an efficient and practical method for generating preliminary estimations of the potentially dramatic impact of climate change on biome distributions. Since this method is simply an application of image classification AI, it demands much less technical skill and computer resources.
Reconstruction of global biome distribution substantially improved when climate seasonality was taken into consideration, 330 demonstrating that the method successfully extracted seasonal patterns of climatic variables that are relevant in biome classification. This method could also be applied to building empirical models of other climate-driven phenomena such as cropping systems and the spread of vector-borne diseases, and hence has potential to be a de facto standard for building empirical models across a range of research and application fields.

Data availability 335
All data required to reproduce the analyses described herein are publicly available at the following URL/DOI: