Development of a Reduced Complexity Plant Canopy Physics Surrogate Model for use in Chemical Transport Models: A Case Study with GEOS-Chem v12.3.0

Biosphere-atmosphere interactions strongly influence the chemical composition of the atmosphere. Simulating these interactions at a detailed processbased level has traditionally been computationally intensive and resource prohibitive, commonly due to complexities in calculating radiation and light at the leaf level within plant canopies. Here we describe a surrogate canopy physics model 15 based on the MEGAN3 detailed canopy model parameterized using a statistical learning technique. This surrogate canopy model is designed specifically to rapidly calculate leaf-level temperature and photosynthetically active radiative (PAR) for use in large-scale chemical transport models (CTMs). Our surrogate model can reproduce the dominant spatiotemporal variability of the more detailed MEGAN3 20 canopy model to within 10% across the globe. Implementation of this surrogate model into the GEOS-Chem CTM leads to small local changes in ozone dry deposition velocities of less than 5%, and larger local changes in isoprene emissions of up to ~40%, though annual global isoprene emissions remain largely consistent (within 5%). These changes to surface-atmosphere exchange lead to small changes in 25

surface ozone concentrations of ±1 ppbv, modestly reducing the northern hemispheric ozone bias, in which is common to many CTMs, here from 8 to 7 ppbv.

Introduction 35
The biosphere plays an important role in modulating the abundance and variability of trace gases and aerosol in the atmosphere. Direct emissions of gas phase species are drivers of the majority of the natural reactivity in the atmosphere, and are important precursor sources to pollutants and climate relevant species like ozone and particulate matter IPCC, 2013;Safieddine et al., 2017). 40 On the other hand, vegetation serves as a direct sink for these same species through a process known as dry deposition (Lelieveld and Dentener, 2000;Silva and Heald, 2018). The physical structure of the vegetation can also influence the production and loss of atmospheric constituents through changes to atmospheric turbulent transport and reductions in the actinic flux below the canopy (e.g. Makar et al., 45 2017). Additionally, chemical reactions occurring within the plant canopy act as a source and sink for reactive species in the above-canopy atmosphere (Goldstein et al., 2004;Makar et al., 1999). Ultimately, the balance between the role vegetation plays as a chemical source and sink is a controlling factor for the abundance and variability of trace gases and aerosol across many regions of the globe (e.g. Geddes 50 et al., 2016;Silva et al., 2016;Unger, 2014). It is thus important to properly account for these processes when simulating the composition and chemistry of the atmosphere.
Explicitly simulating biosphere-atmosphere interactions necessitates a detailed representation of physical, chemical, and biological processes that occur at the scale 55 of an individual plant. This is typically achieved by integrating a set of energy and radiative balance equations vertically throughout a canopy (e.g. Ashworth et al., 2015Ashworth et al., , 2016Goudriaan and Laar, 1994). This sort of physical model of the canopy calculates the environmental parameters that drive the biological and chemical processes which ultimately impact the atmospheric fluxes of trace gases and aerosol 60 Lamb et al., 1996). These canopy models tend to be computationally quite expensive, and are based on measurements taken at very fine resolution (e.g. meter or less), while most atmospheric chemical transport models operate on the 10-200 km scale. Reconciling these differences in scale and addressing the steep computational requirements inherent in both canopy models 65 and atmospheric chemical transport models are critical challenges in simulating chemically relevant interactions between the biosphere and the atmosphere.
Given the computational costs, atmospheric chemical transport models approximate canopy physics and the resulting effects on biosphere-atmosphere interactions through various parameterizations. (e.g. Guenther et al., 2006;Wesely, 1989;Zhang 70 et al., 2003). Most of these parameterizations are based on observed relationships, and intended to reduce the computational load around the calculation of the temperature of leaves and the amount of light (specifically photosynthetically active radiation, PAR) reaching leaves throughout the canopy. These model parameterizations commonly assume that the temperature of leaves is equal to the 75 air temperature just above the plant canopy (e.g. Guenther et al., 2006;Millet et al., 2010) or are based on parameterizations that ignore leaf temperature entirely (e.g. Wesely, 1989). The parameterizations for leaf level PAR vary widely; from assuming that the PAR reaching leaves in the canopy is equal to the flux of PAR incident on the top of the canopy, to having some sort of reduced complexity multiplicative factor 80 that represents the bulk canopy effects (e.g. shading of leaves, Guenther et al., 2006;Wang et al., 1998). To our knowledge, the overall impact of these parameterized assumptions on the fidelity of modern chemical transport models has not been comprehensively characterized. However for biogenic isoprene emissions, these canopy approximations can lead to regional differences of greater than 20% relative 85 to a fully detailed canopy model (Guenther et al., 2006). Direct representation of these processes is a necessary step to improve model reliability and validity, particularly in a rapidly changing Earth System (Committee on the Future of Atmospheric Chemistry Research et al., 2016). Currently, many processes related to canopy energy and radiative balance are not represented in 90 models of atmospheric chemistry due to computational constraints. In this work, we present a reduced complexity canopy model to calculate leaf temperature and PAR for use in large-scale chemical transport models. This reduced complexity model removes the need for approximating bulk effects of plant canopies on leaf-level PAR and leaf temperature, and it allows for a more explicit process-based representation 95 of these effects on biosphere-atmosphere interactions at the leaf level. Our reduced model reproduces the output of the more detailed vegetation model well, without the large computational overhead.

MEGAN3 Canopy Model
We develop and implement a computationally efficient surrogate of the MEGAN3.0 100 canopy model (https://bai.ess.uci.edu/megan, last accessed 04/09/2019), an update from previous versions of MEGAN (Guenther et al., 2006. This canopy model calculates leaf temperature and leaf-level PAR for a 5-layer plant canopy for both sunlit and shaded leaves, where each canopy layer represents a fraction of the total plant canopy. The model is originally based largely on Goudriaan and Laar 105 (1994) and a brief description follows; for more information see Guenther et al. (1999Guenther et al. ( , 2006Guenther et al. ( , 2012. In the MEGAN3 canopy model the fraction of sunlit leaves in the canopy decreases exponentially as a function of the local solar elevation angle, canopy leaf area index (LAI), a clustering coefficient that accounts for leaf geometries, and a canopy 110 transparency coefficient representing the fraction of the canopy that does not intercept incident radiation. The leaf temperature is calculated from a system of energy balance equations based on Goudriaan and Laar, (1994) and Leuning et al. (1995), with a maximum absolute difference between air temperature and leaf temperature of 10˚C. Leaf-level PAR is computed as a function of incoming radiation 115 incident to the canopy top, the sunlit fraction of leaves, LAI, and a suite of geometric and radiative look up table characteristics, predominantly based on Goudriaan and Laar (1994), Leuning et al. (1995), and Spitters (1986 (Keenan et al., 2011). The MEGAN3 canopy model has been specifically developed for use in simulating biogenic emissions, and has been extensively applied in related studies (e.g. Chen W. H. et al., 2018;Geron et al., 2016;Guenther et al., 2006). Additionally, the MEGAN 130 framework has been widely adopted across a variety of regional and global models (e.g. GEOS-Chem, WRF-CHEM, and CESM). Thus the MEGAN3 canopy model is a good candidate for surrogate model development because it enables a direct implementation of improved process-based canopy physics into a variety of 3D models without the need for substantial model architecture development. 135

Surrogate Model Development
To begin the surrogate model development, we first use a variable selection approach to evaluate and rank which of the suite of model input variables are most important for the simulation of both leaf level PAR and temperature. To do this, we use a machine learning regression method for model simplification and 140 parameterization, specifically LASSO (Least Absolute Shrinkage and Selection Operator, Hastie et al., 2001). As applied here, LASSO is a regression method that calculates linear coefficients through a modified least squares cost function, with the addition of a penalized L1 norm (the sum of the absolute value of the coefficients).
While LASSO was originally developed as a complete regression method, we follow 145 the recommendations of Hastie et al. (2001) and use LASSO only for variable importance ranking and dimensionality reduction of the input variable space to the model.
We apply the linear LASSO method for rankings across a full year of 3-hourly simulated canopy physics from the MEGAN3 canopy model at the global scale for the 150 year 2012. Input meteorology is from MERRA-2 assimilated meteorological fields at 2˚x2.5˚ horizontal resolution (Gelaro et al., 2017), and the vegetation distribution from the Olson 2001 land maps (Olson et al., 2001). LAI is derived from the MODIS TERRA MOD15A2 Product (Myneni et al., 2002(Myneni et al., , 2007 re-gridded to 2˚x2.5˚ horizontal resolution and a monthly timescale. This input data is identical to that 155 used in the GEOS-Chem chemical transport model (CTM) described below for direct comparison with prior work and ease of implementation into that CTM. The spatial distribution of vegetation and LAI are summarized in Figures 1 and 2, respectively.
In general, forested land classes have the highest LAI values, and are spread throughout the tropics and the northern latitudes. Crops, grasses, and shrubs are 160 located predominantly in transitionary landscapes and near regions of larger population (e.g. India, Central North America, etc.), and tend to have lower LAI values.  The LASSO importance rankings are remarkably consistent for both sunlit and 170 shaded leaves and for all vertical levels of the canopy. For each quantity, the two highest ranked variables are consistent at each layer throughout the canopy, and have substantially larger importance to the final result than any additional variable.
For brevity we discuss only those first two variables here. The two most important variables for the calculation of leaf temperature are air temperature and wind 175 speed. Air temperature dominates in importance for the calculation of leaf temperature, with a larger coefficient emerging at a higher L1 norm penalty weighting. Other variables that are physically important in nature (e.g. solar radiation) do not appear important in the LASSO rankings due in part to how they covary with air temperature, and how the rankings are derived separately for sunlit 180 and shaded leaves. For the calculation of leaf PAR we find that the two most important variables are PAR out of the lowermost atmospheric gridbox (incident on the canopy), and the local vegetation LAI. We use these selected variables to develop a simplified parameterization for leaf temperature and PAR.
We model leaf temperature for a given canopy level, i (Ti,leaf, K) as linear with 2-185 meter air temperature (Tair, K): (1) Ti,leaf = Ai + Bi *Tair Where Ai and Bi are fitted parameters per canopy level (i). This ignores the addition of the second most important variable, wind speed. However, the addition of wind speed to the regression only improves the performance of the model by less than 190 1% in total bias and R 2 ; thus for simplicity, we neglect this variable.
For the calculation of leaf-level PAR at a given canopy level (PARi,leaf, μmol m -2 s -1 ), we use an exponential Beers-Law analogue, including the influence of Leaf Area Index (LAI) and the PAR incident to the top of the canopy (PARtoc, W/m 2 ): (2) PARi,leaf = PARtoc * exp(Ci + Di*LAI). 195 Where Ci and Di are fitted parameters per canopy level (i).This exponential functional form is chosen due to the observed and simulated relationships between LAI and canopy light interception (Engel et al., 1987;Goudriaan and Monteith, 1990) following a similar functional form.
We fit equations 1 and 2 for all layers of the canopy and for sunlit and shaded leaves, 200 resulting in 20 total free parameters necessary to model the entire plant canopy across the globe. In this regression method, we ignore the role of differing vegetation classes and apply the regression agnostic to vegetation type. This is done to keep the total necessary number of free parameters low (20 versus 120), and because this more parsimonious model performs quite well (see Section 3.1) 205 without the need for additional vegetation type specific coefficients. The resulting surrogate model coefficients are summarized in Table 1.

210
The final quantity necessary for the canopy model is the fraction of sunlit and shaded leaves. Here, that fraction in each layer of the plant canopy is calculated directly following the MEGAN3 code, (see Guenther et al., 2006Guenther et al., , 2012, without any model simplification. The sunlit fraction is calculated as follows: Where is the solar angle above the horizon, Kb is the extinction coefficient for black leaves, C1 is the canopy clustering coefficient, C2 is the canopy transparency, and f is the fraction of biomass in the canopy light travels through to reach a given leaf (the vector [0.05, 0.23, 0.5, 0.77, 0.95]). Consistent with the MEGAN3 parent 220 canopy model, we assume a Gaussian distribution of biomass in the canopy, centered in the middle canopy layer, and a canopy transparency of 0.2, and a leafclustering coefficient of 0.9.
From this relatively simple three-function parameterization (Leaf Temperature, Leaf PAR, and Sunlit Leaf fraction), we are able to implement more physically 225 realistic parameterizations for biosphere-atmosphere interactions in atmospheric chemical transport models.

Surrogate Model Performance
Here, we evaluate the surrogate model performance for all vegetation globally.

Temperature 230
The surrogate model simulated annual canopy average leaf temperature distribution and performance for 2012 is summarized in Figures 3 and 4. In Figure   3, the average for each canopy layer is calculated as a sum of the sunlit and shaded leaves, weighted by the sunlit fraction of that layer. In turn, the canopy average is calculated as the weighted sum of the layer averages, weighted by the fraction of the 235 canopy biomass in each layer. The annual average temperature is shown in Figure   3a, where it largely follows a latitudinal gradient. The warmest temperatures are ~310 K in the tropics, and the coldest average leaf temperatures are ~280 K in the northern high latitude boreal regions. The surrogate model is linear with the 2meter "near-surface" air temperature, and therefore follows that spatial distribution 240 directly.
The surrogate model for leaf temperature performs well, with the annual average spatial R 2 and mean bias relative to the full model shown in Figures 3b and 3c, respectively. Across all regions, the R 2 is very high, indicating that a linear relationship between 2-meter air temperature and canopy average temperature is a 245 good approximation for capturing the variability of the full MEGAN3 canopy model.
The temperature R 2 drops below 0.90 only in coastal regions/gridboxes that contain very little vegetation, representing less than 5% of all vegetated areas. The temperature surrogate bias is also generally quite low, as shown in Figure 3c. The majority of regions have an absolute mean bias of less than 1 K, and more than 90% 250 of the annual average mean biases are less than 2 K. The surrogate model generally computes temperatures that are biased cool over highly vegetated tropical and subtropical regions, and slightly overestimates temperature over northern boreal forests (by ~0.1 K). The most substantial overestimations occur in or near the hot and arid regions of the globe, and always in regions where there is little vegetative 255 cover at all. On a relative scale these biases are quite small; all are less than 1% of the total magnitude. The model development process described in Section 3 removed the vegetation type discrimination in the parent MEGAN3 canopy model. This leads to some spatial coherence in the global bias patterns, where regions dominated by coniferous forests, grasses, and shrubs (e.g. boreal northern 260 hemisphere and the western United States) tend to be slightly biased warm, and the other vergetation types are biased slightly cool. The vertical profile of annual average leaf temperature is shown in Figure 4a. The broad shape of the canopy profile is consistent across vegation types and the globle.
The upper canopy layers are cooler than the lower canopy layers, as an insulating 270 effect from air temperature occurs within the canopy. The higher order variability (e.g. small differences between adjacent layers at the top and bottom of the canopy) stems from the more detailed representation of canopy energy balance in the full MEGAN3 model, which includes the influence of terms like PAR, relative humidity, LAI, and wind speed. However this higher order variability is quite consistent, 275 allowing for it to be reproduced in the simplified surrogate model. Similar to the spatial performance, the overall surrogate model performs well throughout the canopy. The surrogate model temperature R 2 is shown in Figure 4b.
The values are all near 1.0, with the lowest value of 0.97 in the middle of the canopy, where the transition from cooler to warmer leaves is slightly more difficult for the 280 surrogate model to capture. As demonstrated in Figure 4c, the bias throughout the canopy is low as well. On a global annual average, the surrogate model is biased cool, but only slightly (on both a relative and absolute scale). The highest magnitude bias is at the top canopy layer, at -0.04 K.

Photosynthetically Active Radiation 290
The annual canopy average leaf-level PAR for 2012 is shown spatially in Figure 5a.
In Figure 5, canopy temperature averages are calculated using the same method as for canopy temperature. Annual average PAR varies from ~200 μmol m -2 s -1 to greater than 600 μmol m -2 s -1 . This spatial variability is a function of both PAR incident on the top of the canopy (largely related to cloud cover and solar angle) and 295 the canopy LAI. Leaf level PAR in the surrogate model varies linearly with incident PAR, and decreases exponentially with LAI. Additionally, the reduction under high LAI is exacerbated due to a higher fraction of shaded leaves in high LAI canopies,  The average vertical distribution of leaf-level PAR throughout the canopy and the associated surrogate model performance are shown in Figure 6. To explore the additional dependence on LAI, the quantities shown are separated into three LAI ranges. These are as follows: a low range with LAI less than 0.5, a midrange with LAI 335 between 0.5 and 5, and a high LAI range containing canopies with total LAI greater than 5. The low LAI range represents ~40% of all vegation throughout the year, the middle range represents nearly 60%, and the high LAI range contains only a small fraction of all vegetation (~1%).
The average distribution of PAR across canopy levels is shown in Figure 6a. As LAI 340 increases, there is a substantial reduction in leaf level PAR deeper into the vegetated canopy. This is particularly obvious with the high LAI range, consistent with substantial shading and light interception above the bottom of densely vegetated canopies. On the other hand, the variability throughout the low LAI canopies is quite small. This LAI dependence explains in part the relatively low canopy average leaf-345 level PAR throughout the tropical forests in Figure 5a. The variability in the PAR at the top canopy layer (Canopy Level 1 in Figure 6a) stems from two major sources.
The first is simply the spatial distribution of these LAI ranges in relationship to the annual average incident PAR to the canopy top. Very high LAI values occur primarily over the tropics, where sunlight is consistently high throughout the year, and the 350 seasonal effects of changing solar angles is small. The opposite is true for many of the regions with smaller LAI values, which are distributed more evenly across the globe. A second order effect in the MEGAN3 canopy model is that of in-layer attenuation of light and shading throughout the canopy, where leaves in a given layer may intercept light and shade leaves lower within that same layer. This has the 355 effect of reducing the layer average leaf-level PAR as a function of leaf geometries and LAI, and explains why the highest canopy layer average leaf-level PAR is not the same as the average PAR incident on the top of the canopy. Figures 6b and 6c summarize the statistical performance of the surrogate model vertically through the canopy in terms of the R 2 and the mean bias, respectively. 360 Overall, the surrogate model reproduces the PAR variability as compared to the full parent model well. For both the low and middle LAI ranges (LAI less than 5), all R 2 values are greater than ~0.9. The only substantially lower R 2 are from the lower canopy in high-LAI regions, where PAR is generally quite small (see Figure 6a). The surrogate model struggles somewhat to capture this lower canopy variability due in 365 large part to the increased complexity of resolving canopy shading and radiative physics in high LAI canopies. However, the ultimate influence on the total canopyscale bias is generally low.
The vertical distribution of that bias is shown in Figure 6c. Broadly, the absolute PAR bias is low (less than 5% on a relative scale) and decreases throughout the 370 canopy. All biases are positive except for the top canopy layer for high LAI range canopies, and this poor fit is likely related to the limited representation of high LAI regions in the full dataset (only ~1% of all vegetated area), and isn't present if a lower cutoff for high LAI ranges is used (e.g. LAI > 3). The decreasing magnitude throughout the canopy is largely related to the decreasing overall leaf-level PAR (see 375 Figure 6a). It important to note that the bias terms are all sensitive to the choice of LAI bin ranges, and the variability in bias at each level can be quite large (e.g. above 50 μmol m -2 s -1 in the top canopy layer). For both the high and middle LAI ranges, the absolute magnitude of the PAR bias decreases throughout the canopy, and the bias remains relatively constant for low LAI range vegetation. On a relative scale, 380 these biases are all quite small with a normalized mean bias usually less than 5%.
The exception to this is the lowest layer of the high LAI range canopies. In this low canopy layer the magnitude of the bias is quite low, as is the total magnitude of leaflevel PAR, the resulting difference between small numbers leads to a relatively large normalized mean bias of ~0.3. 385 An essential function of canopy models used in CTMs is to calculate the amount of light that falls on already light-saturated leaf surfaces. This regulates the effect of a change in PAR incident on the canopy on various biological and physical processes (e.g. biogenic emissions). We estimate the fraction of leaves that are light-saturated 395 using the gPAR formulation from the MEGAN algorithm (Guenther et al 2006.
This variable aims to capture the amount of light saturation on a given leaf, and ranges from 0 to 1, with higher values corresponding to more saturated leaves. To explore light saturation, we examine cases where the gPAR value is greater than 0.9. A scatterplot of the annual average fraction of leaves that are light-saturated (gPAR ≥  Ultimately, this assessment demonstrates that the surrogate model reproduces the parent MEGAN3 canopy model well for both leaf temperature and leaf-level PAR. 410 The exponential relationship between leaf-level PAR and canopy incident PAR and the linear relationship between leaf temperature and near-surface air temperature captures the majority of the information inherent in the parent model. Some higher order variability in absolute magnitude of the variables is missing from this surrogate model, however the biases are generally all within ~10%. 415

Chemical Transport Model Description
We evaluate the impact of the canopy model parameterization on atmospheric composition using the GEOS-Chem v12.3.0 chemical transport model (www.geos-chem.org). GEOS-Chem is a computational model for simulating atmospheric chemistry, including a detailed HOx-NOx-BrOx tropospheric chemical mechanism 420 (Bey et al., 2001;Mao et al., 2013;Travis et al., 2016). We drive GEOS-Chem with MERRA-2 Meteorology at 2˚x2.5˚ spatial resolution, with 47 vertical layers (Gelaro et al., 2017). The timesteps for convection and chemistry are 10 and 20 minutes, respectively. Identically to the canopy model input data, we use LAI values from the MODIS-Terra MOD15A2 product (Myneni et al., 2002(Myneni et al., , 2007 We modify the MEGAN implementation in GEOS-Chem to allow for the representation of canopy physics decribed in Section 3. In order to properly scale all emission factors to the plant canopy using a canopy model, a normalization factor must be applied at a set of standard environmental and ecological conditions (Guenther et al., 2006. This normalization factor varies depending on the 465 choice of those standard conditions and the canopy model used. In MEGAN2.1 these standard conditions are: LAI of 5, current air temperature of 303K, current incident PAR at the canopy top of 1500 μmol m -2 s -1 , and a 10%/80%/10% split of growing, mature, and senescent leaves Kaiser et al., 2018). We calculate other necessary standard conditions, specifically the 24-hour average air 470 temperature and PAR, from the meteorological fields conditional on locations that meet the previously described standard conditions. In situations where all of the previous instantaenous standard conditions (e.g. Current Temperature = 303K, Current PAR = 1500 μmol m -2 s -1 , and LAI = 5) are jointly met to within ±10%, we calculate the 24-hour average prior meteorological conditions from the global 475 reanalysis fields, and use the mean of those calculations as the standard 24-hour average conditions. The resulting standard conditions for 24-hour average temperature and 24-hour average PAR are 298.5 K and 740 μmol m -2 s -1 , respectively. These standard conditions result in a normalization factor of 0.21 using the surrogate canopy model surrogate developed in this work. The value of 480 0.21 is lower than those used in implementations of previous MEGAN model versions in other models such as CLM (0.3) and WRF-Chem (0.57) . It is important to note that the normalization factor approximately scales with the square of the current temperature conditions and linearly with the current PAR conditions. For the 24-hour average conditions, the scaling is reduced to 485 approximately linear for temperature and as a square root for PAR. Given this, small deviations from these standard conditions (e.g. those that could arise from different 24-hour averaging methodology) can lead to substantial changes in the normalization factor. Additionally, consistent with previous work these standard conditional calculations are likely variable across model meteorological 490 configurations and should be recalculated on a model-specific basis . Since this normalization factor is applied consistently to all emissions globally at all times, it linearly modulates all biogenic emissions. As such, the total emissions calculated by the MEGAN2.1 emissions framework are highly sensitive to the parameter choices made in this normalization processes. 495 In GEOS-Chem v12.3.0 we update the activity factors associated with PAR, LAI, and temperature as well as the normalization to take advantage of our new canopy  (Guenther et al., 2006, the effect of LAI is calculated through direct multiplication of the emission factor by LAI as opposed to an activity factor formulation, along with a canopy normalization factor (CCE): 510 -/ These activity factors for PAR, LAI, and temperature are the same as those in Guenther et al. (2012), as averages throughout the canopy weighted by the biomass fraction within a given canopy layer (wl). There is an additional canopy depth 515 emission activity response applied to the light dependent activity factors which is intended to model the variability of emissions throughout the canopy (e.g. . This canopy depth activity factor is a multiplicative factor that varies linearly as a function of LAI and canopy depth, with a value between 0 and 1.3. For clarity, we refer to the MEGAN emissions implementation in GEOS-Chem using the 520 -"..) activity factors as "MEGANPCEEA" and those using the "01234 activity factors as "MEGANCanopy". While γT in the MEGANPCEEA approach follows a similar functional form to that in MEGANCanopy, the lack of vertical canopy structure in the MEGANPCEEA configuration leads to a very different treatment of the joint effects of Temperature, PAR, and LAI on emissions. Specifically, the MEGANPCEEA approach aims to 525 approximate the joint effects of shading and temperature change within a canopy, whereas MEGANCanopy aims to directly simulate those processes. We use the canopy physics surrogate model described in Section 3 to calculate the leaf temperature and PAR in the MEGANCanopy implementation.
Though stress factors in the MEGAN framework allow for the additional capability 530 to evaluate the impact of vegetative stress processes on emissions (e.g. Geron et al., 2016), we do not enable those processes in this study. The other activity factors (leaf age, soil moisture, and CO2 inhibition) are the same in both the MEGANCanopy and MEGANPCEEA.

Dry Deposition 535
Dry Deposition in GEOS-Chem v12.3.0 is calculated through a resistor-in-series approach based on the Wesely (1989) parameterization, originally described and implemented in Wang et al. (1998). In this approach, the dry depositional flux of gas phase species is calculated as the surface concentration of that gas multiplied by a transfer velocity known as the "dry deposition velocity". A recent assessment of the 540 dry deposition velocity parameterization in GEOS-Chem found that biases in simulated dry deposition velocities are in general quite low, though there is evidence that missing key processes may be responsible for missing variability in the simulation (Silva and Heald, 2018).
Prior to this work, canopy effects were not directly considered in GEOS-Chem dry 545 deposition, and only approximated in bulk using a polynomial decomposition scheme (Wang et al., 1998) that calculated a single constant jointly representing both a multiplicative factor (1 + b/PARleaf, b = 50 μmol m -2 s -1 ) to the stomatal resistance from Baldocchi et al. (1987) based on leaf-level PAR and a normalization of the stomatal resistance by LAI. Here, we replace the polynomial decomposition 550 scheme and use the leaf-level PAR calculations from the canopy surrogate to directly calculate the multiplicative factor, and then explicity normalize by LAI. The LAI normalization in the original polynomial decomposition calculates values that are a factor of ~1.7 higher than those calculated through direct normalization when using the surrogate model. To maintain the same magnitude of the simulated dry 555 deposition velocities as in the standard model, which are generally unbiased (Silva and Heald, 2018), we scale the stomatal resistance by a factor of 0.6.

Surrogate Model Integration into GEOS-Chem
Implementing the updated canopy surrogate in a global model directly impacts the surface-atmosphere exchanges processes of biogenic emissions and dry deposition, 560 which together influence the chemical composition of the atmosphere. In this section we outline the changes to both surface processes, focusing on isoprene emissions and ozone dry deposition, followed by the changes to surface level ozone concentrations in the GEOS-Chem model.
The impact of the canopy model on isoprene emissions in 2012 is summarized in 565 Figure 9. The annual average isoprene emissions using the MEGANCanopy emissions implementation are shown in Figure 9a, with the highest emissions in the tropics and subtropics, as well as the Southeast United States. Though not distinct in Figure   9, the boreal forests are a substantial emitter of biogenic species during the summer months. The relatively small emissions from this region during the winter months 570 reduce the prominence of these emissions on the annual average. The global annual total of isoprene emitted in 2012 from the MEGANCanopy configuration is ~350 Tg C yr -1 .
The annual average differences in the simulated isoprene emissions following implementation of the surrogate canopy model (MEGANCanopy -MEGANPCEEA) are 575 shown in Figure 9b. In general, emissions decrease over forested regions and increase over non-forested (grasses, crops, and shrubland) areas. The highest absolute changes are the decrease in the Equatorial Amazon and the increase in Northern Australia. On a relative scale, the forested and non-forested differences are more apparent. This relative change (MEGANCanopy/MEGANPCEEA) is shown in Figure  580 9c. While there are relatively modest decreases in tropical and boreal forests, the emissions increase in heavily cropped Indian Subcontinent and sub-Saharan Africa show the largest relative change. Though the spatial variability in relative difference is substantial, the annual global isoprene emissions from the canopy model are within 5% of the original model version (~340 Tg C yr -1 ). These results are 585 consistent with those from Guenther et al. (2006) who found that the global total biases in isoprene emissions were low, but spatial variability was large, when using a parameterized approach (MEGANPCEEA) over a direct canopy model implementation in the MEGAN framework (as in the surrogate model application, MEGANCanopy). On aggregate, these changes are all well within the stated uncertainty 590 of the MEGAN isoprene emissions of approximately a factor of 2 .

difference between the surrogate model and base version of simulated emissions. Panel C shows the relative difference between the surrogate model and base version of simulated emissions (surrogate/base model).
It is not possible to directly parse the individual process contributions to the total emissions changes due to the fundamentally different coupled treatments of the 600 influence of temperature, PAR, and canopy structure on biogenic emissions through the activity factors in both the MEGANCanopy and the MEGANPCEEA configurations.
However, a comparison of the isoprene differences between the two simulations against LAI, leaf-level PAR, and leaf temperature ( Figure 10) indicates that the changes are most strongly driven by the leaf-level PAR and LAI effects. The isoprene 605 emissions changes are directly proportional to leaf-level PAR, inversely proportional to LAI, and show no substantial relationship to leaf temperature. The forested and non-forested differences in Figure 9 can be explained further from the correlations shown in Figure 10. The forested areas with the largest decreases in isoprene emissions tend to have high LAI values and lower canopy average leaf PAR, 610 whereas the opposite is true for the non-forest locations. The relationships in Figure   10 support the interpretation that the leaf-level PAR and LAI effects are the largest drivers of change in biogenic isoprene emissions between the two model versions.
Overall, these results indicate that the representation of canopy radiative physics is more important than thermodynamically resolving the difference between air and 615 leaf temperature for simulating biogenic emissions in the MEGAN framework. There are few spatial constraints on isoprene emissions that can act as independent validation data for the new model framework. However, recent work over the southeast United States (Kaiser et al., 2018;Travis et al., 2016;Yu et al., 2016) indicates that the base version of GEOS-Chem used here ( Simulated monoterpene emissions differ from isoprene emissions in that monoterpene emissions are more sensitive to temperature, with an additional influence of a light independent emission factor that varies with leaf temperature . There is a fairly constant 20-30% decrease across regions with lower LAI values, including the African savannahs and the Indian subcontinent. 645 The highest absolute changes are in transitional areas near high LAI forests, with warmer temperatures (the tropics and subtropics). The high LAI areas of the tropical and northern forests show smaller decreases of ~5%. These changes, while substantial, are well within the stated uncertainty ranges in monoterpene emissions of the MEGAN model (300-400%, Guenther et al., 2012). 650 Changes in simulated ozone dry deposition velocities in 2012 are summarized in Figure 12. Figure 12a shows the annual average spatial distribution of ozone dry deposition velocities. The values vary from less than 0.1 cm s -1 over the global 660 oceans, to above 0.5 cm s -1 in densely vegetated regions like the tropical rainforests.
The impact of the updated canopy model on ozone dry deposition velocities is in general quite small, with an average change of near zero (~0.004 cm s -1 ). The annual average relative change is shown in Figure 12b, and the absolute difference in Figure 12c, both in relation to the base version of GEOS-Chem v12.3.0. These 665 changes are nearly all within ±5%, or ±0.01 cm s -1, with a maximum change of 15%, (0.04 cm s -1 ). Relative changes track most strongly with broadleaf and coniferous forested areas. This is consistent with those regions being most sensitive to stomatal deposition (Silva and Heald, 2018), as the canopy scheme implemented here changes dry deposition only through the calculation of the stomatal resistance term. 670 The small overall changes to surface-atmosphere exchange processes associated with the updated canopy scheme produce only a modest impact on simulated atmospheric composition. We describe the changes to surface ozone here, as an illustrative example. 680 The annual average spatial difference in surface ozone between a simulation using the canopy physics described here and the base version of GEOS-Chem is shown in In total, the changes in surface ozone concentrations slightly ameliorate known biases. There is a persistent high bias (~10 ppbv) across chemical transport models in simulating surface ozone concentrations over the continental mid-latitudes (Travis et al., 2016). The addition of the new canopy physics parameterization very 700 modestly reduces this bias in GEOS-Chem (approximately 8 ppbv) by about 1 ppbv, driving simulated ozone closer to observations.

Implementation of MEGAN3 Emission Factors
In addition to improved process representation, the canopy surrogate model presented here allows for the direct application of new emission factors generated using the MEGAN3 Emission Factor Processor (https://bai.ess.uci.edu/megan), which allows users to generate emission factors from various input datasets. While 710 the focus of this work is on the impact of representing canopy physics, we include here a description of this full implementation of MEGAN3 in the GEOS-Chem model for completeness. We calculate landscape-average emission factor distributions using the global growth form and ecotype distributions, the emission type speciation, and the leaf level emission factor database available from the MEGAN3 715 Emission Factor Preprocessor. All MEGAN3 Emissions Factor Preprocessor options are kept at their default values (i.e. confidence rating J = 0, and 20 total species classes). The landcover and emissions data are the same as that used for MEGAN2.1 except that the landcover updates described by Yu et al. 2017  We implement the MEGAN3 emission factors in GEOS-Chem v12.3.0 using the canopy surrogate model activity factor formulation. For clarity, we refer to "MEGAN3Full" as the emissions implementation in GEOS-Chem v12.3.0 using the MEGAN3 leaf-level emission factors and MEGAN3 activity factors with canopy 745 physics calculated following the canopy surrogate model described in Section 3. The annual isoprene emissions simulated using MEGAN3Full are higher than using the MEGAN2.1 canopy-scale factors in GEOS-Chem (as in both MEGANCanopy and MEGANPCEEA), but more in line with previous work ).
Specifically, annual total isoprene emissions for 2012 are ~570 Tg C yr -1 in 750 MEGAN3Full which is a factor of 1.6 larger than those configurations discussed earlier in this manuscript. The largest contribution to these differences is not the differences in emission factor maps, but is instead the removal of the normalization factor of 0.21, which additionally removes the need for the somewhat arbitrary choice of "standard conditions" for emissions (see Section 4.1). This 570 Tg yr -1 755 emissions total is much more similar to the magnitude of global emissions from versions of MEGAN2.1 implemented outside of the GEOS-Chem model (535-578 Tg yr -1 ) given by Guenther et al. (2012), and within the stated uncertainty range for MEGAN isoprene emissions . These annual average isoprene emissions using MEGAN3Full are shown in Figure 15 below. In general the spatial 760 pattern in the emissions in Figure 15 matches those from the MEGANCanopy configuration ( Figure 9), with an R 2 of ~0.8.

765
Since the isoprene emissions calculated using the MEGAN3Full algorithm are so much larger than those used in previous versions of GEOS-Chem, they alter the composition of the atmosphere significantly. For example, annual average surface ozone concentrations in the Southeast U.S. increase by nearly 5 ppbv relative to the base version of GEOS-Chem v12.3.0 (which uses MEGANPCEEA), exacerbating the 770 existing model bias further (Travis et al., 2016). However, MEGAN3Full represents a more up-to-date and physical characterization of biogenic emissions. Future work reconciling the differences between these bottom up isoprene emissions estimates and top down constraints from measurements of composition (e.g. Kaiser et al., 2018) is needed. 775

Conclusions
We describe a novel method for simulating canopy physics relevant to atmospheric chemistry at very low computational cost. Our surrogate canopy model is based on the detailed canopy model in the MEGAN3 codebase, and simplified using a statistical learning technique for the determination of variable importance. This 780 updated scheme allows for improved physical process representation of biosphereatmosphere interactions, including a full implementation of the MEGAN3 emissions scheme activity factors and a more direct implementation of the light and LAI dependence of dry deposition.
When implemented into a chemical transport model, this canopy scheme impacts 785 the spatial distribution of isoprene emissions, but maintains the global total to within 5%. Consistent with prior work (Kaiser et al., 2018), isoprene emissions are reduced over the Southeast United States, with local absolute changes that can exceed 30%. This difference in surface-atmosphere exchange ultimately has a modest impact on surface ozone, with absolute annual average changes generally 790 less than 1ppbv, though it does drive ozone concentrations closer to observed values. The surrogate model additionally allows for integrating new leaf level emission factor maps into GEOS-Chem, which we show leads to substantial changes in biogenic emissions.
In a rapidly changing earth system, it is critical to represent chemical, biological, and 795 physical processes with as high fidelity as possible. Surrogate models that allow for rapid implementation of computationally expensive processes can play a key role in representing these processes. The work presented in this manuscript represents a step toward further explicit descriptions of biosphere-atmosphere interactions in models of atmospheric chemistry. Future work should include more detailed 800 observational constraints and characterization of in-canopy chemical reactions, turbulent exchange, and biological processes, and their resulting impacts on abundances of trace gases in the atmosphere.

Author Contributions
CLH and SJS designed the study. SJS developed and implemented the surrogate 805 model and performed simulations and analysis. ABG developed MEGAN3 model code. All authors contributed to the manuscript preparation.