Modeling symbiotic biological nitrogen fixation in grain legumes globally with LPJ-GUESS (v4.0, r10285)

Biological nitrogen fixation (BNF) from grain legumes is of significant importance in global agricultural ecosystems. Crops with BNF capability are expected to support the need to increase food production while reducing nitrogen (N) fertilizer input for agricultural sustainability, but quantification of N fixing rates and BNF crop yields remains inadequate on a global scale. Here we incorporate two legume crops (soybean and faba bean) with BNF into a dynamic vegetation model LPJ-GUESS (Lund–Potsdam– Jena General Ecosystem Simulator). The performance of this new implementation is evaluated against observations from a range of water and N management trials. LPJ-GUESS generally captures the observed response to these management practices for legume biomass production, soil N uptake, and N fixation, despite some deviations from observations in some cases. Globally, simulated BNF is dominated by soil moisture and temperature, as well as N fertilizer addition. Annual inputs through BNF are modeled to be 11.6± 2.2 Tg N for soybean and 5.6± 1.0 Tg N for all pulses, with a total fixation of 17.2± 2.9 Tg N yr−1 for all grain legumes during the period 1981–2016 on a global scale. Our estimates show good agreement with some previous statistical estimates but are relatively high compared to some estimates for pulses. This study highlights the importance of accounting for legume N fixation process when modeling C– N interactions in agricultural ecosystems, particularly when it comes to accounting for the combined effects of climate and land-use change on the global terrestrial N cycle.

Abstract. Biological nitrogen fixation (BNF) from grain legumes is of significant importance in global agricultural ecosystems. Crops with BNF capability are expected to support the need to increase food production while reducing nitrogen (N) fertilizer input for agricultural sustainability, but quantification of N fixing rates and BNF crop yields remains inadequate on a global scale. Here we incorporate two legume crops (soybean and faba bean) with BNF into a dynamic vegetation model LPJ-GUESS (Lund-Potsdam-Jena General Ecosystem Simulator). The performance of this new implementation is evaluated against observations from a range of water and N management trials. LPJ-GUESS generally captures the observed response to these management practices for legume biomass production, soil N uptake, and N fixation, despite some deviations from observations in some cases. Globally, simulated BNF is dominated by soil moisture and temperature, as well as N fertilizer addition. Annual inputs through BNF are modeled to be 11.6 ± 2.2 Tg N for soybean and 5.6 ± 1.0 Tg N for all pulses, with a total fixation of 17.2 ± 2.9 Tg N yr −1 for all grain legumes during the period 1981-2016 on a global scale. Our estimates show good agreement with some previous statistical estimates but are relatively high compared to some estimates for pulses. This study highlights the importance of accounting for legume N fixation process when modeling C-N interactions in agricultural ecosystems, particularly when it comes to accounting for the combined effects of climate and land-use change on the global terrestrial N cycle.

Introduction
The agricultural sector is the main contributor to anthropogenic nitrous oxide (N 2 O) emissions (Reay et al., 2012;Tian et al., 2020) and a key nitrate pollution source to freshwater systems (Moss, 2008), mostly due to the intensive use of synthetic nitrogen (N) fertilizer and animal manure . This trend has been amplified by the expansion of agricultural land to provide food for a growing population and changing dietary patterns (FAO, 2018). The use of crops with biological N fixation (BNF) capability in agriculture has been discussed as one option to address the conflict between the need to increase food production and the associated environmental problems of N loss (Becker, et al., 1995;Fageria, 2007;Northup and Rao, 2016). Nfixing crops, like grain and forage legumes, not only provide protein-rich food for the human population and farmed animals (Voisin et al., 2014;Stagnari et al., 2017), but they are also directly useable as "green manure", reducing the amount of chemical N fertilizer required in agricultural systems (Liu et al., 2011;Meena et al., 2018).
Soybean (Glycine max L.), with its countless and varied uses, is now one of the most widely grown crops in the world because of attractive cash return from its grain yield (FAO-STAT, 2021). There are concerns about the sustainability of soybean production, in particular because of its links to deforestation and loss of native vegetation in the Amazon and other areas of South America (Fehlenberg et al., 2017;Heilmayr et al., 2020). Other grain legumes, such as faba bean (Vicia faba L.), chickpea (Cicer arietinum L.), and cowpea (Vigna unguiculata L.), play an important role in improving soil quality as green manure when they are rotated or used as intercrops between cereals depending on the region (Williams et al., 2014;Denton et al., 2017). In comparison to non-legume plants, using legumes as green manure is more effective to build up or maintain soil fertility, as they not only increase soil organic matter when adding their biomass to soils, but also add extra N into the soil as a result of their symbiotic association with bacteria (Peoples et al., 2009;Ciampitti and Salvagiotti, 2018). The enriched soil N and soil organic carbon contents jointly support growth and productivity in subsequent crops (Jensen et al., 2012;Hajduk et al., 2015). Much experimental evidence has indicated that grain legume biomass increases linearly with an increasing BNF rate (Salvagiotti et al., 2008;Unkovich et al., 2010;Córdova et al., 2019) and that the N benefit to soil fertility from green manure is closely correlated with N fixation capacity, assuming that the entire legume plant is tilled into the soil (Fageria, 2007;Meena et al., 2018). Estimating the rate of BNF is thus important not only for an accurate prediction of grain legume production but also for a better understanding of where and to what degree N loss (i.e., N leaching and gaseous N emission) in cropland systems can be reduced by partially or fully replacing chemical N fertilizer with legume green manure.
Although grain legumes' BNF rates can be measured at field sites and in controlled environments, ecological models are needed for understanding and quantifying the rate of BNF on larger spatial scales and longer temporal perspectives. In many process-based crop models, a common method of representing BNF is to use a pre-defined potential or maximum N fixation rate that is adjusted by limiting environmental factors (Liu et al., 2011). The potential N fixation rate is then estimated either from plant nodule, root, and aboveground biomass (e.g., Boote et al., 2008;Corre-Hellou et al., 2009;Wu et al., 2020) or from plant N demand status (e.g., Cabelguenne et al., 1999;Robertson et al., 2002), varying with plant life cycle. Environmental constraining factors, such as soil temperature, water availability, soil mineral N concentration, and plant growth stage, are mostly taken into account (Liu et al., 2011;Chen et al., 2016). The big challenge in modeling legume BNF is that the process of symbiotic N fixation is always accompanied by the cost of fixed total photosynthetic carbon (C) to maintain legume symbiotic growth, activity, and reserves, which may be around 4 %-16 % of C (Kaschuk et al., 2009). Such a photosynthetic consumption strength would result in productivity loss if the photosynthesis rate did not increase to compensate for the cost (Kaschuk et al., 2010). In most models C cost mechanisms have not been implemented into N fixation, consistent with the assumption that the plant N uptake from soils does not cost carbon (e.g., Cabelguenne et al., 1999;Robertson et al., 2002;Corre-Hellou et al., 2009;Drewniak et al., 2013;Von Bloh et al., 2018;Wu et al., 2020), despite many field experi-ments demonstrating that the energy consumption required for BNF is far larger than soil mineral N uptake (Ryle et al., 1979;Harris et al., 1985;Macduff et al., 1996). In several other models, root substrate C concentration was adopted as an alternative to represent the C demand of N fixation (e.g., Thornley and Cannell, 2000;Yu and Zhuang, 2020). Only a few models assume that such a consumption can be assessed directly against C acquired in photosynthesis, in which the C cost per unit of fixed N is defined as either a constant of 6 kg C kg N −1 (Boote et al., 2008;Meyerholt et al., 2016) or a dynamic function of soil temperature ranging between 7.5 and 12.5 kg C kg N −1 (Houlton et al., 2008;Fisher et al., 2010).
The global production and consumption of grain legumes have greatly increased over recent decades (FAOSTAT, 2021). Accurately representing and quantifying the dynamic process of biological N fixation in models is important for better understanding grain legumes' contribution to food security and agriculture sustainability, particularly in the context of global environmental change. However, because of inadequate information on the environment and crop management, as well as the missing or incomplete BNF mechanism in models (e.g., C cost as mentioned above), current simulation of grain legume N fixation and its yield is still very weak, especially when it comes to global-scale modeling.
Thus, in this study, by accounting for the importance of soybean in overall agriculture and trade, as well as the higher N fixation capacity of faba bean compared to other pulses (Peoples et al., 2009;Unkovich et al., 2010;Liu et al., 2019), we implement these two grain legumes with BNF into a process-based vegetation model (LPJ-GUESS;Smith et al., 2014;Olin et al., 2015). Processes are added to LPJ-GUESS to estimate the symbiotic relationship between legumes and bacteria, also taking into account the plant C cost of BNF. Model results are extensively evaluated with worldwide site-level observed data and compared against country-level yield statistics, as well as continentlevel BNF rates. The model-based and large-scale quantification of the N fixation capacity in legumes provides a scientific foundation for predicting the present and future N cycle in agro-ecosystems, allowing recommendations for fertilizer N application under different climatic conditions in legumebased farming systems.

Model description
LPJ-GUESS is a process-based dynamic vegetation model that simulates carbon and nitrogen (C-N) dynamics at scales typically ranging from regional to global . The model represents vegetation and soil dynamic processes as well as their interactions in response to changes in the environment and management, such as climate, CO 2 concentra-tion, soil physical properties, N deposition, and N fertilization. Three land-use types are included in the model: natural vegetation, pasture, and cropland. Vegetation on natural land is represented as the establishment, growth, and mortality of 12 plant function types (PFTs). Pastures are simulated by competing C 3 and C 4 grasses, in which 50 % of aboveground biomass is annually harvested to account for the effects of grazing (Lindeskog et al., 2013). Crops in LPJ-GUESS are described by crop functional types (CFTs), which differ in their C allocation scheme, morphological traits, and heat sum requirement for growth. At present, four CFTs are represented in the C-N version of LPJ-GUESS: two temperate C 3 crops with sowing carried out in spring and autumn, a tropical C 3 crop (representing rice), and a C 4 crop (representing maize). Sowing dates on a large scale are determined dynamically in the model based on local climatology in each grid cell with five seasonality types represented (a combination of temperature-and precipitation-limited behaviors; Waha et al., 2012), and crops are harvested once each year when prescribed heat sum requirements are fulfilled (Lindeskog et al., 2013). Multi-cropping systems within a year are not yet implemented in the model. The recent representation of crops includes the incorporation of soil N transformation (Tian et al., 2020) together with a C-N allocation for crops operating on a daily time step (Lindeskog et al., 2013;Olin et al., 2015). Cropland management options for global-scale application include irrigation, tillage, N application, cover crop grass between the main growing seasons, and residue retention (Pugh et al., 2015;Olin et al., 2015). In this study, soybean is simulated as one additional crop because of its large importance as a food, fodder, and oil crop, and the parametrization of faba bean is representative for the group of pulses in general. The model schematic and other calculations including the C cycle and the N cycle follow an earlier version of LPJ-GUESS Wårlind et al., 2014;Olin et al., 2015).

Updated daily carbon allocation parameters
Similar to most ecosystem and crop models, LPJ-GUESS adopts crop-specific accumulated heat requirements to model plant growth development, and crops are allowed to adapt to the local climate by dynamically adjusting the heat requirements to different climatic zones (Lindeskog et al., 2013). To better represent C and N allocation in various phenological phases, Olin et al. (2015) defined crop development stage by considering the effects of temperature, vernalization days, and photo-period following Wang and Engel (1998). In this study, we assume that the grain legume development stage is linearly correlated with its accumulated heat units according to the field-based soybean experiments described in Irmak et al. (2013). It is estimated as where DS is crop development stage ranging from 0 to 2 (DS = 0, sowing; DS = 1, flowering; DS = 2, harvest); fphu is the fraction of today's accumulated heat units to the total heat requirement; fphu anthesis is the threshold of fphu when anthesis starts, below (above) which crop growth belongs to the vegetative (reproductive) stage; and a and b are the linear regression coefficients, varying between the vegetative and reproductive phases. The values of a and b, as well as the crop-specific base temperature ( • C) to estimate the accumulated heat units, are both given in Table S1 in the Supplement. The daily fraction of assimilate allocation to leaves, stems, and roots is an important process before storage organs are formed. The assimilate invested in roots can help crops overcome water or nutrient limitation when they suffer from stress in the vegetative stage, whereas new assimilate invested in leaves generally gives a highly efficient return from the photosynthesis product (Penning de Vries et al., 1989). Unlike cereal crops, nodulated plants, particularly soybeans, are more likely to achieve a higher photosynthesis rate and delay leaf senescence due to the continued N supply from biological N fixation (Abu-shakra et al., 1978;Kaschuk et al., 2010). A precise representation of assimilate partitioning to the plant organs when modeling BNF in grain legumes is especially important considering the high C cost from fixing N from the atmosphere. Productivity loss would be simulated if the leaf photosynthesis rate did not increase to compensate for the costs (Macduff et al., 1996;Kaschuk et al., 2009).
Following Olin et al. (2015), relationships between assimilate allocation to legume organs were established based on the data from Penning de Vries et al. (1989) and Boote et al. (2002). We fitted the allocation functions using a Richards logistic growth curve (Eq. 2, Richards, 1959) to model the allocation to each organ dynamically and separately. For each allocation function f i (see Eqs. 3-5 below), where DS is crop development stage, and a i , b i , c i , and d i are fitting coefficients for the three functions (specific values given in Table S1). Maintaining BNF in the reproductive stage (i.e., after anthesis; DS > 1) would reduce the flow of carbon assimilation to storage organs. We adjusted the allocation functions from Olin et al. (2015) so that the model allowed a dynamic adaptation of the allocation to grain over the seed-filling period in response to BNF cost (see Eqs. 3-5 for details).

Yield vs. the whole plant
After anthesis (DS > 1), most assimilates are allocated and re-translocated from the vegetative organs to the grains. During the late seed-filling period (DS ≥ d 1 , see Eq. 3), we assumed that the fraction of carbon allocated to yield would increase to partly compensate for the productivity loss caused by spending on N fixation at the cost of reducing the flow 818 J. Ma et al.: Modeling biological N fixation with LPJ-GUESS of carbon to leaves and stem (see Eq. 4). We established the ratio of the allocation to yield relative to the whole plant as where P yield and P veg are the fraction of carbon allocated to yield and vegetative organs, respectively, ranging from 0 to 1; P BNFcost is the proportion of net primary production (NPP) used for BNF to today's total NPP; and d 1 is the fitting coefficient representing the DS of the maximum growth rate of grain (d = 1.41 for soybean and 1.46 for faba bean, see Table S1).

Leaf vs. shoot vegetative organs
Similarly, the ratio of leaf vs. shoot vegetative allocation is specified as where P leaf and P root are the fraction of carbon allocated to leaf and root, respectively. The fitting function of leaf vs. shoot vegetative organs in soybean is given in Fig. 1a.

Root vs. vegetative organs
When a plant experiences water or nutrient stress, it invests more assimilate to roots relative to shoot vegetative organs (Penning de Vries et al., 1989). We implemented dynamic increases in the allocation to roots during the late seed-filling period to help legumes cope with the C loss from BNF cost and established the relationship between the allocation to root and that to vegetative organs as In addition, carbon partitioning to vegetative organs (P veg ) can be calculated by subtracting the reproductive allocation (i.e., P yield ) from the whole plant as Finally, we can achieve dynamic carbon allocation to the plant organs over the growing season by combining Eqs. (3)- Partitioning functions are plotted for soybean in Fig. 1b and for faba bean in Fig. S1 in the Supplement. Significant differences in allocation patterns can exist between cultivars. Compared to cereals , we found that grain legumes are more likely to allocate more assimilate to leaves not only in partitioning proportion but also in the length of allocation time, probably corresponding to their higher leaf activities in response to N fixation (Kaschuk et al., 2010).

Representation of BNF
Fixing N from the atmosphere and N uptake from soils represents two N sources for grain legumes to meet their total plant N demand. The latter has a higher priority for plants because the process is less energy-consuming than N fixation (Ryle et al., 1979;Macduff et al., 1996). Following on this idea, in LPJ-GUESS, N fixation will only be triggered when the following two assumptions are valid at the same time ( Fig. 2): (1) if today's plant growth still suffers from N limitation after N uptake from soils (i.e., the N deficit, plant N demand minus soil N uptake, is greater than zero). The plant will then be allowed to fix N from the atmosphere to fill the N deficit. (2) Since N fixation is strongly related to photosynthetic assimilate due to its high energy consumption, BNF in the model is assumed to take place only when today's NPP is positive so that adequate C supply can be provided to meet the BNF cost.
Modeling the BNF rate is adapted from previously published methods (e.g., CROPGRO, EPIC, APSIM; see Liu et al., 2011) in that it considers (1) the potential N fixation rate, (2) the limitation of temperature, (3) soil water status, and (4) the crop growth stage as where N fix is the N fixation rate; N fixpot is the potential N fixation rate; and f T , f W , and f DS are limitations (ranging 0 to 1) on BNF by soil temperature, soil water availability, and crop development stage function, respectively. The definition of the potential N fixation rate in some studies is based on the strong relationship between N fixation and either nodule size, biomass (Weisz et al., 1985;Voisin et al., 2003), or root dry matter (Soussana et al., 2002;Voisin et al., 2007). Due to the difficulties in measuring both nodules and roots in the field directly, some studies also adopt shoot biomass to replace nodule or root biomass based on the empirical relationship between these two variables (Yu et al., 2002;Corre-Hellou et al., 2009;Wu et al., 2020). In our implementation, since the nodulation process of legumes has Figure 1. The organ's relative allocation (a) and assimilate partitioning (b) to roots, leaves, stem, and yields for soybean. Solid lines represent the fitted Richards functions in this study, and dashed lines are the allocation scheme from Penning de Vries et al. (1989). f 2 in (a) denotes leaf relative allocation to shoot vegetative organs (Eq. 4), whereas f 3 is root relative allocation to vegetative organs (Eq. 5).
not yet been implemented in LPJ-GUESS, N fixpot is assumed to be proportional to root dry matter: where N maxfixpot is the maximum nitrogen fixation rate of roots (g N g −1 root DM), and DM root is root dry matter (g root DM m −2 ). Since the experimental parameter N maxfixpot is strongly related to the effectiveness of rhizobial strains and varies widely between species and sites, it is not easy to obtain the parameter for each legume crop.
In this study, we assume that legumes are either inoculated or there are high enough populations of strains in the soil that N maxfixpot is not constrained by the effectiveness of rhizobia. Here N maxfixpot is assumed to be a constant as 0.03 g N g −1 root DM for both grain legumes as a moderate value taken from the literature (Soussana et al., 2002;Eckersten et al., 2006;Boote et al., 2008). Soil temperature is a controlling factor for both microbial activities and plant growth. For soybean, 20-35 • C has been found to be optimal for nitrogenase activity and for faba bean the optimal soil temperature can range from 16-25 • C (Boote et al., 2008). The influence of soil temperature on legume BNF is represented in the model as a four-thresholdtemperature function: where T is soil temperature ( • C) at a depth of 25 cm representing the mean temperature of the topsoil layer in the model (0-50 cm), T min (T max ) is the minimum (maximum) temperature below (above) which N fixation stops, and T optL and T optH are the lower and higher optimal temperatures within which N fixation is not limited by temperature. The values of these four temperature thresholds vary among legume crops and are given in Table 1.
In addition to temperature, soil water content is a major factor controlling the rate of N fixation (Srivastava and Ambasht, 1994). Too little water strongly inhibits BNF due to impacts of drought stress on nodule nitrogenase activity (Serraj et al., 1999;Marino et al., 2007). Although oxygen is needed to support the respiration of legume roots and bacteria in the nodules, nitrogenase is more active in anoxic, waterlogged environments (Jiang et al., 2021). A linear water limitation function is thus incorporated into LPJ-GUESS (Wu and McGechan, 1999) and is represented as where W f is relative soil water content in the topsoil layer (0-50 cm) ranging from 0 to 1, ϕ 1 and ϕ 2 are empirical coefficients, W a is the threshold of W f below which N fixation is fully restricted by soil water deficit, and W b is the value above which N fixation is not inhibited by soil water content. The values of the parameters are shown in Table 1. The influence of plant growth stage on legume BNF rate is taken into account in very few models; the process is generally stopped forcibly after the crop reaches a certain development stage. For example, in the CROPGRO model (Boote et al., 2008), N fixation in soybean starts in the second trifoliolate stage and continues until the end of physiological maturity, whereas it ceases at the middle of the seed-filling period in the EPIC model (Cabelguenne et al., 1999). Much experimental evidence has indicated that the N fixed by legumes varies widely among crop growth stages, with the largest BNF rate observed between the late vegetative phase and the early seed-filling period (Santachiara et al., 2017;Córdova et al., 2020;Ciampitti et al., 2021). In this study, a specific function, similar to the temperature response function, is thus implemented in the BNF scheme to represent the variation of N fixation with the course of the legume life cycle: where NDS is normalized crop development stage ranging from 0 to 1 (0, sowing; 0.5, flowering; 1, harvest), NDS min is the time before which there is no N fixation due to inadequate nodulation, NDS max is the time after which N fixation suspends due to nodule senescence, and NDS optL and NDS optH define the period within which the legume BNF rate is not inhibited by development stage. The values of the parameters for two grain legumes are derived from the literature and listed in Table 1.
In addition to the environmental limitation factors, the amount of daily NPP also affects N fixation in the model. The NPP requirement for BNF cost is computed based on the estimated N fixation rate (N fix , Eq. 8) by multiplying the C cost per unit fixed N, which is assumed to be a fixed value of 6 g C g −1 N as a moderate value taken from previous studies (Ryle et al., 1979;Patterson and Larue, 1983;Boote et al., 2008;Kaschuk et al., 2009). The NPP cost to maintain BNF is released as CO 2 to the atmosphere and modeled as part of the autotrophic respiration of the soil (Fig. 2). Since the fixed N is partly transported to plant leaves and continues to support photosynthesis, the plant may get additional C profits from the investment of BNF by enhancing the leaf N content that optimizes the carboxylation capacity (V max ) (Kull, 2002). Following on this idea, another assumption adopted in this study is that at most 50 % of today's NPP can be used for N fixation before the crops reach the development stage of grain maximum growth rate (DS < d 1 , see Eq. 13). After this the maximum proportion of today's NPP used for BNF cost is dynamically reduced and assumed to be the fraction of carbon allocation to leaves and stem: where MAXNPP BNFcost is the maximum proportion of today's NPP used for N fixation varying from 0-0.5, and P leaf and P stem are the fraction of carbon (i.e., NPP) allocated to leaf and stem, respectively (see Eq. 7 for details). A flowchart of the BNF scheme in LPJ-GUESS is shown in Fig. 2.

Experimental setup
Field-based data from the literature, together with global yield statistics from legume-producing countries and regionlevel N fixation data from published sources, were compared to model runs to examine performance in simulating yields and BNF rate from the site scale to a larger region. In order to build up cropland soil C and N pools, all simulations were initialized with a 500-year spin-up using atmospheric CO 2 from 1901 combined with repeating detrended 1901-1930 climate from GSWP3-W5E5 (Dirmeyer et al., 2006;Lange, 2019;Cucchi et al., 2020). The cropland fraction linearly increased from zero to the first historic value (1901) during the last 30 years of spin-up. Monthly atmospheric N deposition (NH x , NO y ) was used as simulated by CCMI (NCAR Chemistry-Climate Model Initiative). The value was interpolated to 0.5 • × 0.5 • from the original resolution (1.9 • × 2.5 • ) to match the resolution of the climate data (Tian et al., 2018). Below, the setup of the different experiments is explained in detail.

Model evaluation at site scale
To evaluate the model's ability to simulate BNF rate and yields, field-based N fixation trials with detailed measurements of soil N uptake, biomass, and N mass allocation were collected from the published literature. This dataset comprised 17 soybean and 7 faba bean sites located between ∼ 33 • S and ∼ 53 • N (Fig. 3). In these trials, BNF response to various management practices (such as N fertilizer addition and irrigation) were investigated. Details about these sites -their geographic coordinates, BNF trials, and the years of available data, as well as corresponding site-specific plant traits (e.g., specific leaf area and grain C : N ratio) -are provided in Table S2.
In some field experiments, BNF rate and/or soil N uptake are not directly reported in the literature, so we estimated these values as where %Ndfa is the proportion of plant N derived from the atmosphere (ranging 0-100), representing the contribution of N fixation to the plant total N uptake, and N plant is the amount of N accumulated in the plant (kg N ha −1 ), defined as either the shoot or the whole plant N mass, depending on the measurement method adopted in the experiment.
In general, grain yields, plant tissue dry mass, and N mass, together with %Ndfa, soil N uptake, and N fixation, are widely measured variables in field-based BNF trials (see Table S2). These data were chosen as our target variables used for model evaluation. In addition, to convert plant C mass to dry matter, a conversion factor of 2.0 was used . Dry weight was converted to wet weight by assuming a water fraction of 0.13 in the grain legumes (Córdova et al., 2019).
Since specific leaf area (SLA) and target grain C : N ratio play a very important role in determining N uptake and N retranslocation to grain during seed-filling in the model , we implemented two simulations to explicitly explore model performance across all sites. For "site-specific" simulations, the reported SLA and grain C : N ratio listed in Table S2 were adopted for the simulation (for sites for which these were available). For "global uniform" parameter simulations, SLA was set to 40 and 45 m 2 kg −1 C (Penning de Vries et al., 1989), and the target grain C : N ratio was represented as a constant of 8 for soybean and 10 for faba bean (Kattge et al., 2020). These values were also used for globalscale simulations.
Due to the unavailable information on weather data at the majority of the sites evaluated, gridded daily climate data for air temperatures (maximum, minimum, and mean), precipitation, and solar radiation were used from GSWP3-W5E5 (Dirmeyer et al., 2006;Lange, 2019;Cucchi et al., 2020), chosen for the 0.5 • ×0.5 • grid cell representative for each experimental site. We compared model-required input variables from GSWP3-W5E5 with observations at three sites, finding that the gridded climate data had fairly good agreement with weather records in the field, despite some solar radiation deviations between two datasets for individual days over the experimental period (Fig. S2). There was no information on land-use and management practices in years preceding the experiments at most sites. Therefore, to maintain soil N and C pools in equilibrium after model spin-up, we decided to implement a common cropping system of maize-legume rotation annually from 1901 to the year before the trial start, with no N fertilizer applied to legumes. Over the trial period, the management practices were implemented according to information provided in the literature (Table S2). In addition, site-specific soil physical properties, such as frac- NDS max the maximum development stage for the stop of N fixation 0.9 0.8 - Bouniols et al. (1991) tions of sand, silt, and clay, were also used as forcing to further compute corresponding soil water characteristics in the model .

Global yields and BNF rate
To evaluate the model's ability to simulate legume yields and BNF on a large scale, national crop yield statistics from FAO-STAT (http://www.fao.org/faostat/en/#data/QC, last access: 9 May 2021) were collected and compared with modeled output. Furthermore, Peoples et al. (2009) divided N fixation data for widely grown legume crops collated from a range of published sources into different geographical regions. In order to compare our simulated BNF with the literature-based records, each simulated 0.5 • × 0.5 • grid cell was classified to be in one of the 10 regions given in Table 1 in Peoples et al. (2009) (Fig. S3).
For regional comparison, the modeled gridded yield and BNF rate were aggregated to national and continental scales, respectively, using information on crop-specific cover area in the spatial pattern (described below): where Var is yield or BNF rate; Var region is the aggregated result in a given region; i is the grid cell number in that region, ranging from 1 to n; Var rain and Var irri represent the modeled yield or BNF rate under rain-fed and irrigated conditions, respectively; and Area rain and Area irri are the crop-specific rain-fed and irrigated areas used in simulations, respectively.
As land-use and land cover input, data from LUH2 (Land-Use Harmonization 2, Hurtt et al., 2020) with fractions of cropland, pasture, and natural vegetation at each grid cell were adopted, spanning from 1901 to 2014 in 0.5 • resolution. The fractional cover of different crop species was derived from MIRCA (Monthly Irrigated and Rain-fed Crop Areas; Portmann et al., 2010). Since no detailed information was available on the growth distribution of the faba bean, the "pulse" fraction in MIRCA was used as input instead, and pulse country-level yield statistics provided by FAO-STAT (2021) were collected to compare with faba bean outputs by LPJ-GUESS. As information on cropland soil characteristics, data in the top layer (30 cm) were derived from the GGCMI (Global Gridded Crop Model Intercomparison) phase 3 soil input dataset (Volkholz and Müller, 2020). In general, although the total cropland cover in a grid cell could change annually over the course of the simulation, the relative fractions of each crop species within that cover fraction were held constant.
In terms of timing of N fertilizer application, a recent meta-analysis conducted by Mourtzinis et al. (2018) indicated that splitting N application between planting and the early reproductive stage resulted in significantly greater soybean yields than a single application. Mineral N fertilizer for legumes in the model was thus split into two equal applications at the time of sowing (DS = 0) and flowering (DS = 1.0). Manure was added to soils at the time of sowing as a single application to reflect real-world practices that account for the time required for manure N to be made available to plants. Data sources for mineral N fertilizer and manure over the period 1901-2014 were derived from Ag-GRID (AgMIP GRIDded Crop Modeling Initiative; Elliott et al., 2015 andZhang et al., 2017, respectively) (Fig. S4).

Statistical methods
In order to quantify the agreement between modeled and observed variables, the coefficient of determination (adjusted R 2 ), relative bias (RB, Eq. 16), absolute bias (AB, Eq. 17), and the root mean square error (RMSE, Eq. 18) were computed: where M i and O i indicate modeled and observed values, and n is the number of observations. To evaluate the fit of the interannual variability of modeled and reported yields on the country level, the standard deviation (SD) and Pearson correlation coefficient (r, Eq. 19) were calculated: where M and O represent modeled and observed mean, and n is the number of reported years.

Model performance across all sites
In order to examine model performance in simulating BNFrelated variables across all grain legume sites described in Table S2, we compiled six widely measured variables related to N fixation at harvest, as shown in Fig. 4. Modeled yields generally agreed well with observations, especially in the site-specific simulation setup. These had a higher regression slope (0.83) and lower absolute bias (28 %) compared with the global uniform simulation setup (Fig. 4a). N content in grains and shoots showed lower agreement, with simulated values underestimating the observations for most sites (Fig. 4b-c), likely arising from two important N sources to grain legumes not being captured well by the model (i.e., soil N uptake and BNF, shown in Fig. 4d-e). The global uniform run did not capture observed N fixation well, with a regression slope of 0.22 and absolute bias of 39 %. The simulated BNF compared to observations was notably improved when using site-specific parameters, with the regression slope increasing to 0.41 and the absolute bias being reduced to 31 % (Fig. 4e). The field-based measurements showed that the N derived from the atmosphere (%Ndfa) was the main contributor to the legumes' total N uptake, ranging from 15 % to uniform runs, respectively, despite several extreme disagreements at several faba bean sites (Fig. 4f).
A linear relationship between legume yields and the rate of BNF was found across a range of field sites in this study (Fig. S5a). Simulations from LPJ-GUESS mostly captured the close correlation between these variables, with R 2 ranging 0.46-0.63 (p < 0.001) in both runs, which is not far from the measured value of 0.67 (Fig. S5a). Linear regression parameters (i.e., slope and intercept) in both runs were close to the observations, indicating that the model reproduces the N fixation effect on yield well for individual sites.
A negative exponential relationship was observed between N fertilizer application rate and N fixation across the field trials (Fig. S5b). LPJ-GUESS reasonably reproduced the decreased trend of BNF to N fertilizer increase, with similar fitting functions to observations, although higher N fixation rates were modeled in the highest-fertilized trial (600 kg N ha −1 ) compared with measurements (Fig. S5b).

Response to irrigation
The ability of the model to simulate the observed response of soybean tissue biomass and N mass to irrigation management was examined using data from an experiment with rain-fed and irrigated treatments in Florida, USA (82.4 • W, 29.6 • N; see Table S2). Since the timing and quantity of irrigation were not reported in the literature (DeVries et al., 1989a, b), we assumed that soybean was irrigated automatically when it experienced water stress in the model, with the amount of plant water deficit as supplemental irrigation.
The mean observed grain yields at harvest were 2.0 and 2.9 t ha −1 under rain-fed and irrigated conditions, respectively, whereas the modeled yields were 1.9 and 2.5 t ha −1 for the site-specific parameter run and 1.6 and 2.1 t ha −1 for the global uniform parameter run, suggesting good model performance for rain-fed crops but an underestimation of the effect of irrigation on yields (Fig. 5a). Grain dry matter over the cropping season was simulated to increase by 32 % and 45 % on average in response to irrigation in the site-specific and global uniform runs, respectively. The observations show a similar response but with a higher increase of 75 %. The modeled increase in grain N content caused by irrigation also showed good agreement, with an increase of 35 %-58 % in both runs, in line with the observed response of 42 % (Fig. 5b).
The model generally reproduced observed leaf biomass and N mass better than the total aboveground production under rain-fed and irrigated treatments, with higher accuracy in the site-specific run. Over the growing season there was an obvious underestimation of the total aboveground production of biomass for both runs (Fig. 5a). This may be partially due to the fact that LPJ-GUESS at this point does not model soybean hulls, which account for ∼ 15 %-20 % of the total aboveground dry matter at harvest in the US soybean rain-fed cropping system (Córdova et al., 2020). The ob-served increase in shoot and leaf biomass due to water supply was 19 % and 21 %, respectively. In comparison, the sitespecific parameterized model resulted in increases of 13 % and 14 %, respectively (15 % and 14 % for the global uniform parameter run, see Fig. 5b). Overall, the observed soybean tissue biomass and N content under rain-fed and irrigated conditions, as well as their response to irrigation management, were captured reasonably well by the model at the US Florida site, despite some deviations from observations in some cases.

Response to nodulating soybean
In Zapata et al. (1987), two field trials with non-nodulating and nodulating soybean were conducted in Seibersdorf, Austria (16.5 • E, 48.0 • N; see Table S2), resulting in different plant C and N production at various growth stages. As described in Sect. 2.3, the nodulation process of legumes has not yet been implemented in LPJ-GUESS; we thus switched off (on) the BNF function in the model to simply represent the non-nodulating (nodulating) soybean experiment.
During the growing season, yield and grain N mass in the field trials increased rapidly after the vegetative stage, peaking around harvest. Simulations from LPJ-GUESS mostly captured those seasonal dynamics and the response to nodulating soybean (Fig. 6a-b): the modeled increase in yield and grain N mass due to nodulation was 34 % and 51 % in the site-specific run (34 % and 45 % in the global uniform run), respectively, in line with the observed response of 20 % and 41 % at harvest (Table 2), which suggests appropriate sensitivity of yield and N content in grain to N addition from N fixation. Similarly, the model generally reproduced the observed seasonal pattern of shoot N mass well, but with some underestimations in the nodulation trial (Fig. 6c).
Accumulated soil N uptake was captured reasonably well over the entire growing season, with higher accuracy at harvest in the global uniform simulation (Fig. 6d). Measured mineral N uptake from soils declined on average by 25 % in response to nodulation. In comparison, the simulated reduction in uptake was 50 % and 46 % for the site-specific and global uniform runs ( Table 2). The BNF rates were low at the early growth stages when nodules were still establishing and increased rapidly between floral initiation and the early seed-filling, after which nodule senescence occurred and the increase in N fixation rate declined until physiological maturity (Fig. 6e). Simulations from LPJ-GUESS reproduced the seasonal pattern of N fixation with some overestimations in the accumulated BNF at the end of the growth period; the site-specific and global uniform runs simulated 113 and 140 kg N ha −1 , respectively, compared to the measured value of 103 kg N ha −1 (Table 2). , and %Ndfa (the proportion of plant N derived from the atmosphere) (f) at harvest across all soybean and faba bean sites. Filled red and grey circles depict the "site-specific" and "global uniform" runs, respectively. The dashed line is a fitted linear regression with red for site-specific and grey for global uniform; * * * and * * denote regressions statistically significant at p = 0.001 and 0.01, respectively; AB is absolute bias (Eq. 17), represented in percent (%); the unit of RMSE is the same as the associated variable; AVG in (f) is the averaged value of %Ndfa across all field trials. Table 2. Comparison of modeled and observed yield (t ha −1 ), grain N mass (kg N ha −1 ), shoot N mass (kg N ha −1 ), soil N uptake (kg N ha −1 ), and N fixation rate (kg N ha −1 ) from a soybean nodulation and non-nodulation experiment at harvest. The observed data were compiled using Tables 2-4 in Zapata et al. (1987).

Response to N fertilizer in faba bean
In the N fertilizer experiment from Mínguez et al. (1993), four field trials were compared with N applications between 0 and 300 kg N ha −1 at three crop growth stages and two faba bean varieties grown in a Mediterranean climate (Spain, 4.8 • W, 37.9 • N; see Table S2). Over the entire growing season, leaf biomass and N content in the field trials increased until around May, after which leaf senescence started and biomass and N content declined ( Fig. 7a-b). The model broadly reproduced these seasonal patterns and the response to different N application rates. The largest difference between modeled and measured leaf biomass was found at the end of the growing season as a result of the simulated leaf senescence rate being much lower than derived from measurements (Fig. 7a). In addition, the simulations showed modeled leaf N mass to decline rapidly during the late reproductive phase. This can be attributed to the transfer of N from vegetative parts to grain because of the high N demand in seeds during the grain-filling period. compared with those grown at rain-fed conditions. Red and grey circles depict "site-specific" and "global uniform" run, respectively; the dashed line is fitted linear regression; * * * denotes the regression statistically significant at p = 0.001. Box plots in (b) denote the 5th and 95th percentiles with whiskers, median and interquartile range with box lines, and mean with a white dot (all data distributed next to the box).
The seasonal data at each phenological stage for tissue biomass are available from 1978-1979 and 1984-1985 with rain-fed and irrigated treatments; those for N mass are available for 1979 and 1984, while the seasonal shoot N mass is only available for 1984.
As seen in Fig. 7c, modeled soil N uptake was stimulated by soil mineral N availability, with an increase of 120 %-160 % compared to the unfertilized treatment. In contrast, fixing N from the atmosphere was constrained in the presence of elevated levels of soil mineral N, with a reduction of 15 %-20 %. The total N uptake for the cropping season 1987-1988 was observed to only increase by 3 % in response to N application as a consequence of the inoculation implemented in the unfertilized treatment (Mínguez et al., 1993). By contrast, LPJ-GUESS produced relatively large increases of 14 %-16 % in both runs, resulting in the observed increase in plant biomass and N mass accumulation caused by N addition being largely overestimated in the model (Fig. 7c).

Attained yields
Using the global uniform parameters described in Sect. 2.4.1, combined with the time-dependent gridded N fertilizer dataset introduced in Sect. 2.4.2, we simulated soybean and all pulses (applying the faba bean parameterization, see Sect. 2.4.2) at a global scale. We computed data for the period 1996-2005, since crop-specific fractional cover from the MIRCA dataset was available for the year 2000 (Portmann et al., 2010).
Modeled yields in the top 10 soybean-producing countries showed good agreement, with a higher R 2 of 0.52 (p < 0.001) and lower RMSE value of 0.8 t ha −1 yr −1 when low-productivity countries (defined as all countries not belonging to the top 10 producer countries) were excluded. With all producer countries included, R 2 of 0.17 (p < 0.001) Figure 6. Observed (circles) and modeled (lines) yield (a), grain N mass (b), shoot N mass (c), soil N uptake (d), and BNF (e) for a field site in Austria (Zapata et al., 1987) for the cropping season 1984 with nodulating and non-nodulating soybean. The observed values of soil N uptake and BNF across all growth stages were calculated based on Fig. 1 given in Zapata et al. (1987), and the vertical bars represent the standard error of a four-replicate mean in the original literature. Veg. and Rep. indicate vegetative and reproductive growth phase, respectively. and RMSE of 1.4 t ha −1 yr −1 were found (Fig. 8a). LPJ-GUESS generally tended to overestimate the reported yield for most countries where soybean production is low (e.g., most African countries, see Fig. 9a), with a mean relative bias in such countries of 81 % (Fig. 8a). Modeled low yields were found in some arid and semi-arid countries (e.g., Egypt, Iran, and Turkey), with the underestimation spanning from 10 %-70 % (Fig. 9a). Overestimated yields were also found when comparing simulated yields using the faba bean parameterization against FAO-reported values for pulses in general, with an overestimation also visible for some of the top producing countries (Fig. 8b). Likely, the higher yields simulated by LPJ-GUESS arise from the fairly high N fixation capacity simulated with the faba bean parameterization (see Sect. 3.2.2) and the wide distribution of pulses worldwide, which grow under a broad range of climate and soil conditions.
A good fit of the interannual variability of modeled and reported yields is a further indicator of model performance. Despite the deviation between the model and observations for individual years, simulated variation in soybean yield over the period 1981-2016 matched reported yields well among the top 10 producer countries -especially in Argentina, India, and China -with a high Pearson correlation coefficient (r) around 0.60 (p < 0.001) and similar standard deviations (Fig. 9). The degree of yield variability between years was larger than seen in the FAO records, especially in the US, Canada, and Italy (Fig. 9), indicating high sensitivity of modeled soybean yield to changing environmental factors on spatial scales, such as weather, N fertilizer application rates, and climate-related N fixation.

N fixation and %Ndfa
The modeled spatial pattern of soybean N fixation showed large spatial variation (Fig. 10a). Modeled BNF rates as high as 250 kg N ha −1 yr −1 were found in western South America and most of Africa, where neither water nor temperature was a critical limitation for N fixation. Moreover, the relatively low fertilizer application in Africa (0-20 kg N ha −1 yr −1 , Fig. S4b) leaves a nitrogen deficit that causes enhanced soybean N fixation. In contrast, in arid and semi-arid regions, soil water constrains BNF, while temperature limitation is seen in high latitudes and alpine areas (e.g., Andes in Peru). BNF rates in most regions (South Asia, West Asia, sub-Saharan Africa, and northwestern China) were as low as 50 kg N ha −1 yr −1 , particularly in Pakistan and northern India, where simulated BNF is severely constrained by the extreme high temperature over the cropping season. The eastern United States, Europe, southern China, and central-western Brazil showed intermediate fixation rates, which were greater than 150 kg N ha −1 yr −1 . Overall, the spatial variation of the modeled legume BNF rate reflects to a large degree the spatial climate patterns, in addition to N fertilizer application. The low modeled %Ndfa of 45±3 % in East Asia may reflect high N uptake from soils in response to substantial fertilizer investment in China (80-180 kg N ha −1 yr −1 , Fig. S4b) over the past 40 years. In contrast, the modeled %Ndfa in Africawith lower N application rates -was as high as 70 ± 3 %, although still lower than the reported mean value of 77 % (Table 3). The spatial response of N fixation rate to climate constraining factors (i.e., soil temperature and water) is shown for pulses in Fig. S6. At a regional scale, the modeled outputs compare well with N fixation rates from the literature (Fig. 10b-f, Table 3). For example, in South America and North America, both major soybean-producing regions, simulated BNF rates were 156 ± 14 and 127 ± 44 kg N ha −1 yr −1 over the period 1981-2016, respectively, compared with literature-derived values of 136 and 144 kg N ha −1 yr −1 (Peoples et al., 2009).   1996-2005. In total, 887 and 1506 country-year yield data points were used for comparison in soybean and pulses, respectively. The top 10 producer countries (shown in color) were chosen based on their total production over the same period, and marker size from large to small indicates their total relative production in descending order. Rep. and Mod. respectively denote reported and modeled yield (t ha −1 yr −1 ) averaged from 1996-2005. RB is relative bias (Eq. 16), represented in percent (%). The unit of RMSE is the same as yield (t ha −1 yr −1 ). Globally, the modeled soybean N fixation rate of 132 ± 21 kg N ha −1 yr −1 was reasonably consistent with the metaanalysis result of 111-125 kg N ha −1 yr −1 in Salvagiotti et al. (2008) and the FAO-based estimate of 176 kg N ha −1 yr −1 from Herridge et al. (2008). The contribution of N fixation to total N uptake in soybean was somewhat underestimated in several regions. A similar trend to underestimate reported %Ndfa was also found for pulses (Table 3).
Having large soybean-planting areas and high yields, South America and North America contributed 80 % of simulated global soybean N fixation, followed by East Asia, South Asia, and Europe (Table 3). Globally, simulated annual N fixed over the period 1981-2016 was 11.6 ± 2.2 Tg in soybean, which showed good agreement with the estimate of 16.4 Tg N reported by Herridge et al. (2008) and the extrapolated result of 10.4 Tg N estimated by Gelfand and Robertson (2015) based on US field trials. However, we modeled pulses to fix 5.6±1.0 Tg N annually, which is almost 2 times higher than the 2.95 Tg N estimated by Herridge et al. (2008). The difference in the case of pulses is most likely due to the low N fixation rate used by Herridge et al. (2008) ranging from 23-107 kg N ha −1 yr −1 , which is lower than the mean value of 119 ± 15 kg N ha −1 yr −1 modeled by LPJ-GUESS (Table 3).

Model performance at site scale
The overall model agreement with measured legume yield and grain N mass was good across a range of field sites (Fig. 4). Values at harvest were on average about 20-30 % lower than values reported in the measurements (Fig. 4ab). A similar small underestimation was found in the shoot N mass (Fig. 4c), indicating that the productivity is generally somewhat too low in the model. One factor contributing to the underestimation is that LPJ-GUESS applies a conversion factor of 2.0 from plant C mass to dry matter , which is ∼ 10 % lower than a published measurement of 2.24 reported in Osaki (1993). In addition, we found that the model underestimated aboveground biomass while  Table 1 in Peoples et al. (2009). simultaneously overestimating belowground productivity at the three sites where measured root biomass was available. This could be addressed by adjusting the root : shoot allocation (i.e., modifying the daily assimilate partitioning function in grain legumes; Eq. 5), but this is currently prevented by the lack of sufficient observed root biomass information.
Modeled soil N uptake was sensitive to soil mineral N concentration and hence driven by fertilizer application rates (Figs. 7c, S5c). Generally, LPJ-GUESS tended to overestimate soil N uptake in regions where legumes were not fertilized or only lightly fertilized (Fig. S5c). This might be partially due to the selected legume cultivars at the experimental plots, which have been reported to have low mineral N uptake potential (Gan et al., 2002(Gan et al., , 2003Santachiara et al., 2017Santachiara et al., , 2018. Moreover, the saturation effect of mineral N concentration on N uptake implemented in the model might result in the discontinuation of N uptake when soilavailable N is abundant (Zaehle and Friend, 2010;Wårlind et al., 2014). Under high fertilization rates (up to 260-600 kg N ha −1 , Fig. S5c), a strong underestimation in soil N uptake was expected because of the modeled saturation response to high soil mineral N, resulting in little change in the level of soil N uptake no matter how much N fertilizer was applied.
Adding mineral N to the soil in LPJ-GUESS can increase soil N uptake, reducing the plant's N deficit and therefore also reducing the upper limit of the daily N fixation rate (Fig. 2). Although the modeled negative relationship between fertilizer application rates and N fixation showed generally good agreement with the observed response across a range of field sites, the simulated BNF rates in the highly fertilized trials (i.e., 260-600 kg N ha −1 , Fig. S5b) were about 50 %-80 % higher than the measured values (Figs. 4e,S5b). This might be partially explained by the underestimation in soil N uptake under higher N concentration, resulting in plant N demand remaining very high and substantial N still being fixed. The large discrepancies between modeled and observed N uptake in the highly fertilized treatments suggest that the N uptake representation in LPJ-GUESS should be further improved. A step forward could be to incorporate the inhibitory effects of soil mineral N content on N fixation into the model (Chen et al., 2016;Wu et al., 2020), since experimental evidence indicates that high soil mineral N not only affects plant N uptake in roots, but also depresses legume nodule initiation, nodule size, and specific nodule activity, therefore reducing the amount of N fixation from the atmosphere (Herridge et al., 1984;Purcell and Sinclair, 1990;Thornley and Cannell, 2000).
The percentage of plant N derived from the atmosphere (i.e., %Ndfa) is a key parameter required for quantifying N fixation in the field and varies widely, caused by differences in climate, soil type, and degree of N fertilization (Herridge et al., 2008). LPJ-GUESS captured the range and mean value of %Ndfa well across different field trials, with some disagreements, especially for faba bean (Fig. 4f). An underestimated %Ndfa is likely caused by the combined effects of underestimated N fixation (Fig. 4e) and overestimated soil N uptake (Fig. 4d). Nevertheless, we found modeled %Ndfa to Table 3.  Peoples et al. (2009). b Faba bean data in Peoples et al. (2009). c Salvagiotti et al. (2008). d Herridge et al. (2008). e The values do not represent zero grain-legume-planting area in that region. decline with increasing N fertilizer application, which is also the observed response in the field trials. A negative correlation between %Ndfa and fertilizer application rates was also reported by Salvagiotti et al. (2008). These results all suggest that LPJ-GUESS is able to effectively capture the observed overall patterns of soil mineral N uptake and N fixation in grain legumes and their responses.
Since the SLA and C : N ratio of plant organs play a vital role in determining N uptake when modeling vegetation C-N dynamics , it is to be expected that applying measured values for site-scale modeling resulted in much better agreement when comparing simulation results to measurements (Figs. 4-7). Remaining discrepancies between modeled and observed N-cycle variables may reflect missing processes in the model, such as inoculation effectiveness, phosphorus limitation, and soil acidity, especially in terms of inoculant application. Field experiments have shown that proper inoculation of rhizobia promotes nodulation and results in an efficient increase in N fixation, although there are large variations between strains of rhizobia (Mínguez et al., 1993;Sanginga et al., 1997;Tewari et al., 2004;Denton et al., 2017). Using a fixed parameter (N maxfixpot , Eq. 9) to represent all inoculation situations such as in a global uniform calibration cannot reflect this variability. In addition, due to the difficulties in measuring both nodules and roots in the field directly, in many studies the observed BNF rates were determined from plant aboveground biomass. Excluding the root contribution to the whole plant BNF rates most likely results in an underestimation of N fixation (Córdova et al., 2019(Córdova et al., , 2020: N associated with nodules and roots in soybean and faba bean may account for 20 %-40 % of the total N accumulation at the mid-flowering phase (Unkovich and Pate, 2000;Khan et al., 2003).
Compared to non-BNF (i.e., non-nodulation treatment, see Sect. 3.1.3), BNF in LPJ-GUESS greatly improves simulated soybean yield and aboveground N mass, with an overall increase in both variables of 30 %-50 % (Table 2). Córdova et al. (2019) found a yield increase of 150 % in response to nodulation in an unfertilized treatment, but that increase was reduced to 55 % -similar to our modeled yield increasewhen a high N input was applied (i.e., 135 kg N ha −1 ). N fixation can help grain legumes to dramatically enhance their total N accumulation and to achieve higher N concentration in seeds. However, these benefits are accompanied by an increase in respiration cost amounting to 4 %-16 % of fixed total photosynthetic carbon (Kaschuk et al., 2009(Kaschuk et al., , 2010. Such a respiratory photosynthate consumption would reduce productivity if the photosynthesis rate was not increased to compensate for the cost. In LPJ-GUESS, as described in Sect. 2.3, we assumed that up to 50 % of daily NPP can be consumed to fix N. This approach has the advantage that legumes are able to maximize photosynthetic gain due to reduced N limitation in carboxylation capacity (V max ), but it entails the risk of lower productivity if too much NPP is invested in fixation. Nevertheless, in most cases our modeled NPP cost over the soybean growing season ranged from 1 %-40 % at the site scale (Fig. S7) and 5 %-25 % on a large region (Fig. S8). Such NPP consumption was not only lower than our assumed upper limit of 50 %, but also appropriately consistent with the reported range of 14 %-32 % described by Kaschuk et al. (2009), demonstrating that the C cost scheme implemented for N fixation in our model is reasonable. Taken together, the modeled C profits due to N fixation can be attributed to the positive feedback between BNF and photosynthesis in LPJ-GUESS: C-cost-based N fixation results in a higher rate of photosynthesis because of the enhanced leaf N concentration; in turn, the increased rate compensates for the C cost, allocates more assimilate to roots, and thus enhances N fixation.

Global yields, N fixation, and %Ndfa
Agreement between FAO-reported and simulated yields at the country level was reasonable for the major soybeanproducing countries. However, in some arid and semi-arid countries, the modeled yields were up to 70 % lower than FAO-reported values, probably because of the simulated low N fixation rate caused by severe water constraints (Fig. S5). By contrast, LPJ-GUESS produced an overestimation of 100 %-300 % in yield production among some African countries, with BNF rates of 300-350 kg N ha −1 yr −1 being modeled in these regions (Fig. 10a). More recent studies that report data from African farms have indicated that the soybean N fixation rate can be as low as 0-50 kg N ha −1 yr −1 in most farmers' fields, largely because of the inconsistent effectiveness of inoculation in the acid soils (Ulzen et al., 2016;Muleta et al., 2017;Vanlauwe et al., 2019). The BNF implementation and soil representation in LPJ-GUESS do not account for inoculation effectiveness in response to soil pH.
In our simulations, the annual amount of N fixed by global grain legumes (i.e., soybean and all pulses) of 17.2 ± 2.9 Tg averaged over the period 1981-2016 agreed well with the estimate of 19.4 Tg provided by Herridge et al. (2008), who used crop production statistics from FAOSTAT and legumespecific %Ndfa from farmers' fields for estimating global N fixation. In an earlier study, a total of 10 Tg N (range of 8-12 Tg N) was estimated from legume crop BNF annually (Smil, 1999), which is far lower than our findings. The discrepancy between the estimates in Smil (1999) and Herridge et al. (2008) likely reflects the lower values of %Ndfa for soybean and pulses used for calculations in Smil (1999). Also, Smil (1999) excluded belowground fixed N associated with roots and nodules, which contributes to the low estimate. Our modeled N fixation from grain legumes amounts to ∼ 12 % of the annual mean of ca. 140 Tg N that was estimated to be fixed in all global terrestrial ecosystems (Cleveland et al., 1999Galloway et al., 2004;Wang and Houlton, 2009;Vitousek et al., 2013;Meyerholt et al., 2016;Xu-Ri and Prentice, 2017;Yu and Zhuang, 2020;Davies-Barnard and Friedlingstein, 2020), indicating the importance of BNF input in agricultural systems for the global terrestrial N cycle, although a large proportion of the fixed N is removed in grains from the ecosystems each year.
Currently, three environmental factors, soil temperature, moisture, and soil mineral N concentration, affect modeled N fixation. As discussed in Sect. 4.1, increased soil N availability would depress N fixation as plant total N can be met more "cheaply" via soil mineral N uptake. This effect can also be seen from the spatial pattern of %Ndfa in the northern temperate region, such as the United States, western Europe, and China. Here, anthropogenic N deposition, together with the intensive application of fertilizers, results in soils being Nrich, inhibiting simulated BNF. This could explain why our modeled soybean N fixation rate was not high in East Asia and only contributed 45 ± 3 % of plant total N uptake (Table 3). In comparison, the high rate of N fixation found in tropical regions is primarily due to their high nitrogenase activity under warm and moist soil conditions (Fig. S6), resulting in %Ndfa of ∼ 70 % being modeled for all grain legumes in the tropics (i.e., Africa and Southeast Asia; Table 3). A similar spatial variation between temperate and tropical regions in N fixation was also reported by other modeling studies in global terrestrial ecosystems (e.g., Wang and Houlton, 2009;Meyerholt et al., 2016;Xu-Ri and Prentice, 2017;Yu and Zhuang, 2020). Taken together, these results reveal that LPJ-GUESS broadly captures how N management practices and climate variation affect soil N uptake and biological N fixation in grain legumes at large spatial scales.

Modeling challenges and future work
Similar to most ecosystem and crop models, specific leaf area (SLA) in LPJ-GUESS is used to compute leaf area index (LAI) and indirectly affects the amount of photosynthesis. SLA also further impacts plant total N uptake since the N demand in plant organs is always associated with the photosynthetic assimilate in the model. The disagreements between modeled and observed C-N variables in the seasonal pattern (Figs. 6-7) can therefore be partially attributed to the static value of SLA implemented in LPJ-GUESS. Some studies have shown that SLA varies with crop growth development (Boote et al., 2002;Ainsworth et al., 2007) and environmental conditions (Poorter et al., 2009). In addition, low temperature, excess radiation, water deficit, or rising CO 2 concentration would also result in reduced SLA through affecting leaf area expansion and internode elongation (Ainsworth and Long, 2005;Yin and Struik, 2010). Applying SLA as a constant in the model (see Sect. 2.4.1) cannot reflect these responses. Incorporation of dynamic SLA over the crop growing season and its response to the environment remains to be taken into account in future model development.
Despite many experimental studies on the limitation of soil water deficit in biological N fixation, the nature of the relationship between legume BNF and soil water content is not well characterized in models. A linear water-limitation function incorporated in LPJ-GUESS (Eq. 11) implies, for instance, that the model has little potential to represent the situation when plants experience stress from excessive water (flooding). The impact of excess soil water on legume N fixation is either omitted or oversimplified in most crop models. For instance, a simple assumption adopted in Sinclair's model is that the N fixation process is stopped forcibly when flooding takes place (Sinclair et al., 1987). In STICS, the N fixation inhibition by water excess is represented as stress from hypoxia in the roots (Brisson et al., 2003). The process of legume BNF restraint by flooding is implemented into CROPGRO (Boote et al., 2008) by calculating the proportion of water-filled pore space. N fixation is assumed to only be restricted when all pore space is filled with water; however, this rule has not been well evaluated so far.
Although high soil mineral N concentration suppresses legume root nodulation and further impacts N fixation (Xia et al., 2017;Mourtzinis et al., 2018;Brar and Lawley, 2020), a moderate level of soil N in the vegetative growth stage is conducive to root growth and nodule formation, stimulating N fixation (Waterer and Vessey, 1993;Salvagiotti et al., 2008). In the field trials a specific threshold of soil N concentration above (below) which N fixation is inhibited (stimulated) is hard to measure. In addition, the timing of N application remains a challenge (Córdova et al., 2020). Some studies reported that applying N fertilizer at planting as starter N can increase yield gains because of sufficient soil-available N to stimulate early season soybean growth (Pikul et al., 2001;Osborne and Riedell, 2011;Gai et al., 2017). However, other studies argued that the best time to apply additional N would be at early reproductive growth stages, during which legumes have the greatest N demand for seed development; also, soil N reserves are depleting and N fixation rate starts slowing down (Mourtzinis et al., 2018;Córdova et al., 2019;Zhou et al., 2019). Unfortunately, as mentioned earlier, there are no consistent results on these measured factors, resulting in the difficulties in incorporating the mechanistic processes or setups into LPJ-GUESS at this point.
Taken together, the challenge of modeling legume N fixation is primarily due to its large variance between species, sites, and managements. Symbiotic nitrogen fixation by rhizobia is an extremely complex natural process, which is associated not only with host plant and soil N status in the macro-environment (see Fig. 2), but also with the process of Rhizobium or Bradyrhizobium bacteria in root nodules in the micro-environment (Rice et al., 2000). It is difficult to incorporate these two different but highly related processes into one model (Liu et al., 2011;Chen et al., 2016). Furthermore, there is an inadequate amount of information available to establish a reliable relationship between BNF and other factors such as soil pH (Rice et al., 2000;Vanlauwe et al., 2019), inoculation effectiveness (Tewari et al., 2004;Denton et al., 2017;Liu et al., 2019), salinity (Zahran, 1999;Bruning and Rozema, 2013), oxygen (Jiang et al., 2021), and other nutrition availability (Le Roux et al., 2009;Singh et al., 2012), which are currently missing from LPJ-GUESS and other crop models despite many field experiments demonstrating their importance.

Conclusions
In this study we implemented a mechanistic process of symbiotic biological N fixation in grain legumes into the crop module of LPJ-GUESS. The modeled C-N variables of soybean and faba bean were extensively evaluated with observed data from the site scale to a larger region. Our results showed that the BNF scheme adopted in LPJ-GUESS realistically responded to water and N managements, as well as to climate variation, and produced N fixation and yields which generally agreed with measurements.
Our model estimated that global biological N fixation in grain legumes (i.e., soybean and all pulses) was 17.2 ± 2.9 Tg N yr −1 during the period 1981-2016 and that the highest fixation rate occurred in tropical and temperate regions with a warm and moist climate. Soil water and temperature were dominant controls on N fixation, in addition to N fertilizer application rate. Processes missing from the model, such as inoculation effectiveness and soil acidity, might have biased our estimates of N fixation and yields at a global scale.
The dynamic process of N fixation with a C-N allocation scheme for crops in LPJ-GUESS provides an opportunity to estimate the changes in global grain legume production and global terrestrial C and N pools under future landuse or climate change scenarios. It can also help to predict and detect the potential contribution of N-fixing plants as "green manure" to reducing or removing the use of N fertilizer in global agricultural systems, considering different climate conditions, management practices, and land-use change scenarios.
Code and data availability. Global daily climate data from GSWP3-W5E5 are available at https://doi.org/10.48364/ISIMIP.342217 (Lange et al., 2021). National soybean and pulse yield statistics from FAO-STAT presented in this study can be retrieved from http://www.fao.org/faostat/en/#data/QC (last access: 9 May 2021, FAOSTAT, 2021). The rest of the model input data and measurement results used in this study can be accessed at https://doi.org/10.5281/zenodo.5148255 (Ma et al., 2021).
LPJ-GUESS is tested, refined, and developed by a global research community, but the model code is managed and maintained by the Department of Physical Geography and Ecosystem Science, Lund University, Sweden. The source code can be made available with a collaboration agreement under the acceptance of certain conditions. The code used in this paper is available to the editor and reviewers via a restricted link on the condition that the code is only for review purposes. Additional details and information can be found at the LPJ-GUESS website (https://web.nateko.lu.se/lpj-guess/, last access: 14 July 2021) or by contacting the corresponding author.
Author contributions. AA, SO, and JM conceived this study. SO and JM developed the model code. JM, SSR, ADB, PA, and AA designed the experimental protocol runs. JM carried out the analysis and produced the figures. SO, PA, and SSR assisted with data collection and parameter tuning. SSN provided soybean data in Kenya for model evaluation. JM wrote the original draft, with further editing from AA, SSR, ADB, PA, SO, and SSN.
Competing interests. The contact author has declared that neither they nor their co-authors have any competing interests.
Disclaimer. Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.