Towards an advanced atmospheric chemistry-enabled ESM with dynamic land surface processes: Part I-Linking LPJ-GUESS (v4.0) with EMAC modelling system (v2.53)

Earth System Models (ESMs) are invaluable tools that have emerged from decades of research modelling the earth system. Central to this development has been the coupling of previously separate model types, such as ocean, atmospheric and vegetation models, to provide interactive feedbacks between these earth system components. Here we present the initial steps of coupling LPJ-GUESS, a dynamic global vegetation model, to EMAC, an atmospheric chemistry enabled atmosphereocean general circulation model. The LPJ-GUESS framework includes a comparatively detailed tree-individual based model 5 of vegetation dynamics, a crop and managed-land scheme, a nitrogen cycle and a choice of fire models; and hence represents many important terrestrial biosphere processes and provides a wide range of prognostic trace gas emissions from vegetation, soil and fire. When development is complete, these trace gas emissions will form key inputs to the state-of-art atmospheric chemistry representations in EMAC allowing for bi-directional chemical interactions of the surface with the atmosphere. At this point, the full model will be a complete ESM with a fully prognostic land surface and detailed atmospheric chemistry, 10 and will become a powerful tool for investigating land-atmosphere interactions such as: the methane cycle and lifetime and the atmospheric chemistry of reduced carbon; fire effects and feedbacks; future nitrogen deposition rates and fertilisation scenarios; ozone damage to plants; and the contribution of biogenic volatile organic compounds to aerosol load and, via cloud condensation nuclei activation, to cloud formation (e.g., bioprecipitation cycles). Initial results show that the one-way, on-line coupling from EMAC to LPJ-GUESS gives a good description of the global vegetation patterns and reasonable agreement with 15 a suite of remote sensing datasets.

Manuscript under review for journal Geosci. Model Dev. Discussion started: 19 June 2018 c Author(s) 2018. CC BY 4.0 License.
The model simulation duration was dictated by the need to spin-up the LPJ-GUESS vegetation into equilibrium. This follows the standard LPJ-GUESS procedure of running for 100 years without N limitation to build of the N pools, killing the vegetation and then running for a further 400 years to allow the vegetation to reach equilibrium. The last 50 years of this 500 year simulation were averaged to produce the plots shown here. At that time, no significant trends in PFT extension and PFT height were obvious, but the vegetation shows interannual variability as expected. 5 In the discussion of simulation results that follows, reference is made to standard LPJ-GUESS simulations forced by the CRUNCEP bias-corrected, re-analysis data set. These are simply referred to as "offline" simulations.

Model evaluation
As LPJ-GUESS has been evaluated in detail in previous work, it is beyond the scope of this work to perform a full model eval- 10 uation and propose improvements for the dynamic vegetation model. Instead we performed basic evaluation using an expertderived Potential Natural Vegetation (PNV) map and using remotely-sensed data sets of tree cover (Dimiceli et al., 2015), canopy height (Simard et al., 2011) and biomass (Avitabile et al., 2016;Thurner et al., 2014) to consider how LPJ-GUESS responds when EMAC climate is used as the forcing data. To provide an overall summary metric of data-model agreement across the relevant spatial domain, the Normalised Mean Error (NME) is presented following the prescription and recommendations 15 in Kelley et al. (2013).
At this stage of model development we do not seek to precisely simulate the vegetation state of a particular year or period.
Our atmospheric simulations are not nudged by meteorological data, but rather an unconstrained simulation based on a single year of SSTs, so they do not correspond to a particular calendar period. Furthermore we prescribe a fixed atmospheric CO 2 20 concentration. Instead, our goal with this evaluation is to perform simulations which correspond loosely to the last couple of decades in order to check if the coupled model is working as expected, and to gain some insight into biases that may be present when LPJ-GUESS is forced using EMAC climate. Furthermore, it should be noted that the tree cover and biomass datasets reflect the biosphere as observed in the previous decade or so, and therefore inherently contains the considerable effect of human land use. This results in a conceptual mismatch between the PNV as simulated by LPJ-GUESS and the observed 25 biosphere state which is relevant when considering these comparisons. To quantify this effect, NME scores including a land use correction (see Appendix C for details) for these datasets are also included in Section 4.5.

Biomes
The simulated Leaf Area Index (LAI) was used to classify the vegetation cover into eight "megabiome" types following 30 Forrest et al. (2015). The broad vegetation categories give an overview of the vegetation structure and functioning at a level of detail relevant for studying interactions between the land surface and the atmosphere. These simulations results were com-6 Geosci. Model Dev. Discuss., https://doi.org /10.5194/gmd-2018-135 Manuscript under review for journal Geosci. Model Dev. Discussion started: 19 June 2018 c Author(s) 2018. CC BY 4.0 License. pared to an expert-derived PNV map (Haxeltine and Prentice, 1996) classified into equivalent categories Forrest et al., 2015) which was regridded using a largest area fraction algorithm to the spatial resolution of the simulations.
It should be borne in mind that there are various sources of uncertainty affecting the classification of biomes in both the data and the model output, such as the somewhat subjective LAI threshold applied to the model data and the inherently subjective nature of expert classification. However, these uncertainties are to some extent minimised by the choice of broad megabiomes 5 (see Forrest et al. (2015) for further discussion) and so, despite this lack of quantitative rigour, such classifications still provide a useful visual method for comparing vegetation cover.
The simulations reproduce the global patterns of vegetation cover well (Fig. 1), although some regional discrepancies are visible. The extent of temperate forest of the east coasts of North America and Asia were not well simulated, and the extent of the tropical savannas and dry/deciduous tropical woodlands were also underestimated. As offline simulations with LPJ-10 GUESS forced by observed climate data do not show these tendencies , it seems likely that this is a result of a precipitation bias in the EMAC climate. Another regional discrepancy in the T42 simulation is a general underestimation of tundra and boreal forests at high latitudes. This is somewhat improved in the higher resolution T63 simulation, as these showed a greater tundra and boreal forest extent, including simulation of boreal deciduous needle-leaved forests. Therefore, this mismatch can be attributed to a high-latitude cold bias in the EMAC climate at low resolution, which is somewhat mitigated 15 at higher resolution due to a better representation of the synoptic scale systems in T63 (Roeckner et al., 2006). Given that the T42 and T63 results are broadly similar, for brevity of the presentation the subsequent spatial benchmarks will be shown only for the T63 run, although the T42 summary metric results will be tabulated and discussed.

Tree cover
The collection 6 of the MOD44B MODIS tree cover (Dimiceli et al., 2015) was averaged between 2000 and 2015 and bilinearly 20 interpolated to the simulation resolutions using conservation remapping and then compared to the simulated tree cover. The combined model produced reasonable global tree cover patterns as would be expected by a state-of-the-art DGVM (Fig. 2). However, regional differences are clearly visible which can be attributed to three main sources. The most prominent of these are due to the fact that the simulation is of PNV (ie no human land use processes are included in the simulation) where the observations are of the current state of the planet and therefore include the impact human land use. This conceptual 25 mismatch in the comparison can reasonably account for the large over-estimations of tree cover in Europe, China and temperate North America. The second source of disagreement is the climate biases in the EMAC derived climate, most obviously the underestimation of tree cover on the north-east coast of South America, as already indicated in the biome plots (Fig. 1). A final source of disagreement is due to the inevitable imperfect process representations in LPJ-GUESS which may be responsible, at least in part, for the over-estimation of tree cover north of the central African tropical forest (c.f. Figure

Biomass
Whilst the amount of standing biomass is not itself directly relevant in land-atmosphere exchanges, it is however a useful quantity for evaluating DGVM performance (it has a strong dependence on gross primary productivity) and for interpreting results from other variables which are directly relevant to land-atmosphere interactions (such as canopy height). We combined two biomass datasets, one tropical (Avitabile et al., 2016) and one northern temperate and boreal (Thurner et al., 2014), The coupled model simulates well the global patterns of biomass (Fig 3), but it does not capture the very high biomass observed in the tropical forests. This underestimation may be due to the use of pre-industrial nitrogen deposition data. It also underestimates biomass as in the north-east South America and south-east Asia, as would be expected from the biome and tree cover plots (Figs 1 and 2). There is also an over-estimation (small in magnitude but large as a relative fraction) of biomass in Europe and China, most likely due to human land use.

Canopy height
In the case of a bidirectional coupling, the simulated canopy height will have a direct effect on atmospheric circulation through roughness length. To evaluate simulated canopy height, a 1 km tree canopy height map (Simard et al., 2011) was first aggregated to 10 km resolution by simple averaging (excluding no data values) and then interpolated to the simulation resolution using conservative remapping. Comparison of these data with simulated canopy height (calculated from individual tree height, 5 see Appendix B) revealed that the coupled model simulates global distributions of canopy height reasonably well (Fig 4), but systematically underestimates tree height in highly forested areas. Comparison of the canopy height data to offline LPJ-GUESS (data not shown) revealed a similar, but weaker, tendency to underestimate canopy height in some regions. This indicates that this may be systemic behaviour in the current LPJ-GUESS parameterisation. In light of the biomass results in Section 4.3 where biomass is generally underestimated, it may be that the disturbance interval is too frequent, which does not allow sufficient 10 time for biomass and canopy height to build up to realistic levels. Adjusting the disturbance frequency may therefore offer a solution. Another possible cause is the current maximum crown area of trees LPJ-GUESS is 50 m 2 , which is rather low for tropical trees, and may result in a an under-weighting of the contribution of mature individuals to canopy height (see Appendix B). In contrast, the simulations tended to overestimate canopy height in arid areas (also observed in offline simulations). This may be attributed to the lack of shrub PFTs and/or a low competitiveness of grass PFTs vs tree PFTs, possibly due to an 15 under-estimation of fire frequency. In summary, the pattern of global canopy height is acceptable, but it may be appropriate to adjust the parameterisation in LPJ-GUESS to better reproduce global canopy height in the EMAC framework.

Summary metrics
The NME scores for both the T63 and T42 simulations are presented in Table 1 (lower is better). In the case of tree cover and 20 biomass, the results are also presented with a land use correction (LUC) factor (see Appendix C). Applying the LUC has a marked improvement on the NME scores for tree cover, implying that some of the discrepancies seen between simulation and observation apparent in Fig 2 and 3 are indeed due to human land use. This also suggests that applying a land use scheme or correction will be important when enabling feedbacks from the land surface to the atmosphere in the future. For biomass the results are not so clear cut, as including the land use corrections worsens agreement, particularly for the T42 simulation.

25
This can be understood in the context of Fig 3 which shows that the combined model underestimates biomass, and so it can be expected that further reducing the biomass (through the land use correction) will worsen agreement. Fortunately this is not a major concern in terms of the coupled model system as biomass does not directly effect the atmosphere. However it does indicate where further model calibration may yield improvements.
Increasing spatial resolution also improves the agreement between simulations and observation. This indicates that increased  larger degree of spatial aggregation (of both the evaluation data and the model input data) at coarser resolutions leading to more homogenised values and therefore more agreement. The fact that benchmarking scores improve when moving to a higher spatial resolution implies that the higher spatial resolution leads to a tangible increase in model performance. The canopy height data was produced in such a way that no land use correction is necessary.

Conclusions
Here we have reported the first steps towards to producing a new atmospheric-chemistry enabled ESM by combining an atmospheric-chemistry enabled AOGCM with a DGVM. The technical coupling work is now complete and has been achieved in a manner which respects both the integrity and philosophy of the two modelling frameworks, and will therefore allow relatively straightforward updates to both components.

5
Results from one-way coupled simulations (in which climate information generated by EMAC is used to force LPJ-GUESS but no land-surface information is relayed back to EMAC) showed that the vegetation patterns produced from EMAC climate are reasonable on a global scale. However some regional deviations from the observed vegetation are apparent. Some of these are due to the simple fact that this configuration LPJ-GUESS produces PNV (with no human impacts) while the observed 10 vegetation implicitly includes human impact. This effect was confirmed by performing a correction to account for human land use which improved agreement between simulation and observation. Human land use can be included in future model versions by utilising a the recently developed crop and managed land module in LPJ-GUESS (Lindeskog et al., 2013), the use of which should mitigate these issues to a large extent.

15
A second class of deviations is due to biases in the simulated climate, particularly precipitation biases. This is a more difficult problem to solve; improving climate simulations is the subject of much on-going research. However, it is clear that using higher spatial resolution mitigates climate biases which results in tangible improvements in the simulated vegetation. Furthermore, it may be that using dynamically simulated land surface boundary conditions (in this case from LPJ-GUESS) may reduce climate biases. This will be the subject of future studies.

20
Finally, there are some discrepancies arising as an inevitable consequence of the approximations, missing processes and parameter uncertainties inherent in a process-based model such as LPJ-GUESS. These may be reduced by on-going improvements occurring as LPJ-GUESS is further developed and refined. Given the rather rigorous requirements placed on a biosphere model when bi-bidirectionally coupled to an atmospheric model, it may also be necessary to perform some focused model Whilst further work remains before the full ESM is completed, we have demonstrated that coupling LPJ-GUESS into the 5 EMAC/MESSy modelling framework has been accomplished, and that LPJ-GUESS provides a suitable basis for an improved and dynamic representation of the land surface in EMAC. A companion publication will present a two-way model coupling and investigate the effects of the atmosphere. Once the full coupling has been enabled and calibrated, the resulting model will be a unique tool for investigating atmosphere-biosphere interactions. LPJ-GUESS is used and developed world-wide, but development is managed and the code maintained at Department of Physical Geography and Ecosystem Science, Lund University, Sweden. Model code can be made available to collaborators on entering into a collaboration agreement with the acceptance of certain conditions. The MESSy-coupled version of LPJ-GUESS will be maintained as a derivative of LPJ-GUESS. Because access to LPJ-GUESS is also restricted, no DOI can be assigned to LPJ-GUESS versions. The specific code version 20 used here to enable the MESSy coupling the LPJ-GUESS code in EMAC, code is archived on the LPJ-GUESS subversion server with tag "_publications/MESSY_1.0_20180108" in the catalogue "MESSy". For more details and contact information please see the LPJ-GUESS website (http://web.nateko.lu.se/lpj-guess) or contact the corresponding author.
For review purposes, the code used here is available to the editor and reviewers via a password protected link on condition that the code is for review purposes only, it cannot be used for any other purposes and must be deleted afterwards.

25
Appendix A: Details of coupling implementation The following modifications were made to the LPJ-GUESS code: -Creation of a three new functions to be called externally by the MESSy framework to: initialise an LPJ-GUESS simulation (or restart from a saved state if appropriate); perform one day of LPJ-GUESS simulation given one day of EMAC climate data and return the relevant data; and save the LPJ-GUESS state to disk. These key functions encapsulate the 30 interactions between MESSy and LPJ-GUESS.
initialisation in the MESSy framework, and the inclusion of one extra member function of the InputModule class (to read the gridlist file) which was implemented as a dummy function in the other LPJ-GUESS input modules.
-Creation of one additional internal function to calculate the daily values to be handed back to EMAC (such as vegetation cover for a particular PFT).

5
-Inclusion of an additional output module to save model output useful for benchmarking.
-Minor modifications to the standard output module such that the MPI rank number of each process is added to the file output names allowing the output from each process to be stored in the same directory.
-Minor modifications to the standard LPJ-GUESS restart code to allow the MESSy restart cycle number to be added to the names of the state files to be saved or read by LPJ-GUESS.

10
-Removal of some of code for the LPJ-GUESS real-time visualisations which is incompatible with the MESSy framework.
No changes to the scientific modules were made, and the directory structure and compilation machinery were untouched.
Wherever new code conflicted with the standard offline version, a preprocessor directive was used to ensure that the model still compiled in the standard way outside the MESSy framework. Thus the integrity of LPJ-GUESS was maintained so that updates from the LPJ-GUESS trunk version can be applied relatively easily and the code can still be compiled and run offline.

15
On the MESSy side, the Makefile has been modified to compile the complete LPJ-GUESS code into a single library file using CMake, which is LPJ-GUESS's native compilation machinery. This was necessary because LPJ-GUESS is written in C++ whereas EMAC is written in FORTRAN. The LPJ-GUESS library is linked to the rest of the EMAC code with the standard linker of EMAC (also including a link to C++ standard library). LPJ-GUESS provides functionality to new EMAC 20 submodel (VEG) with its individual submodel interface layer (see Jöckel et al., 2005), which is controlled by a namelist and invokes the above mentioned C++ functions to communicate with LPJ-GUESS.
In the initialisation phase, the grid from EMAC is transferred into LPJ-GUESS. Note, that currently there is only a geographic decomposition induced by EMAC, which could lead to some processors not having a single land box and cause idle 25 time for that specific CPU. In future an additional, individual decomposition of the land gridcells to optimise CPU balance is desired, which could make use of the UniTrans library developed within the ScalES project 3 , which shall also be used for load balanced distribution of chemical gaseous reactions. However, currently the LPJ-GUESS code with its daily timestep consumes very little computing time compared to the climate calculations of EMAC.

30
In its interface layer, the VEG submodel accumulates the required input fields (daily temperature, precipitation, incoming solar radiation and atmospheric CO 2 concentration) for the vegetation and, depending on the time step length of the LPJ-GUESS code, triggers the call of the LPJ-GUESS routines using the TIMER-MESSy interface structure routines (see Jöckel et al., 2005).

5
The combined model uses the pre-existing restart facilities of the LPJ-GUESS code, such that when EMAC triggers a restart, a restart is triggered for LPJ-GUESS. When a simulation is continuing from a restart, a flag is sent to the LPJ-GUESS code and the restart files of LPJ-GUESS state are read in allowing a seamless, continuous simulation. This feature may also be used to start a simulation with already well established vegetation from LPJ-GUESS restart (state) files, potentially significant saving significant amounts of CPU time that would otherwise be required to spin up the vegetation (typically the order of 500 10 simulation years).

Appendix B: Canopy height calculation
Canopy height of a patch was calculated from individual tree cohort heights by a simple algorithm that attempts to reconstruct top of canopy height as it would be viewed from above, for example by a satellite. It utilises the modelled quantity Foliar

15
Projective Cover (FPC), which is the ground area covered by the crowns of trees of a cohort expressed as a fraction of the patch area. LPJ-GUESS allows limited overlapping of trees and hence the sum of tree cohort FPC can be greater than unity. In this case cohorts are selected in descending order of height until the sum of their FPC reaches 1, i.e. smaller cohorts are assumed to be under the taller cohorts and so do not contribute to top of canopy height. Cohorts smaller than 5 m don't contribute to canopy height as the remotely-sensed dataset does not include canopies lower than 5 m. Having selected the contributing tree 20 cohorts, the canopy height is simply the FPC-weighted sum of the contributing cohort heights.

Appendix C: Land use correction
In order to correct the model output for 'missing' tree cover and biomass to human land cover modification, a simple correction was derived from the Globcover2009 land cover product (Arino et al., 2012). For each simulated gridcell the fraction of naturally vegetated land pixels from the Globcover2009 product was calculated. This fraction was then used to scale the model 25 outputs of tree cover and biomass to give a simple, first order reduction based on remotely-sensed data. For these purposes, natural vegetated land cover was defined as classes: -