Potential yield simulated by Global Gridded Crop Models: a process-based emulator to explain their differences

How Global Gridded Crop Models (GGCMs) differ in their simulation of potential yield and reasons for those differences have never been assessed. The GGCM Inter-comparison (GGCMI) offers a good framework for this assessment. Here, we built an emulator (called SMM for Simple Mechanistic Model) of GGCMs based on generic and simplified formalism. The SMM equations describe crop phenology by a sum of growing degree days, canopy radiation absorption by the Beer-Lambert law, and its conversion into aboveground biomass by a radiation use efficiency (RUE). We fitted the parameters of this emulator against gridded aboveground maize biomass at the end of the growing season simulated by eight different GGCMs in a given year (2000). Our assumption is that the simple set of equations of SMM, after calibration, could reproduce the response of most GGCMs, so that differences between GGCMs can be attributed to the parameters related to processes captured by the emulator. Despite huge differences between GGCMs, we show that if we fit both a parameter describing the thermal requirement for leaf emergence by adjusting its value to each grid-point in space, as done by GGCM modellers following the GGCMI protocol, and a GGCM-dependent globally uniform RUE, then the simple set of equations of the SMM emulator is sufficient to reproduce the spatial distribution of the original aboveground biomass simulated by most GGCMs. The grain filling is simulated in SMM by considering a fixed in time fraction of net primary productivity allocated to the grain (frac) once a threshold in leaves number (nthresh) is reached. Once calibrated, these two parameters allow to capture the relationship between potential yield and final aboveground biomass of each GGCM. It is particularly important as the divergence among GGCMs is larger for yield than for aboveground biomass. Thus, we showed that the divergence between GGCMs can be summarized by the differences in few parameters. Our simple but mechanistic model could also be an interesting tool to test new developments in order to improve the simulation of potential yield at the global scale.


Introduction
Potential yield corresponds to the yield achieved when an adapted crop cultivar grows in nonlimiting environmental conditions (i.e. without water and nutrient stresses and in the absence of damages from weeds, pests and diseases) under a given crop management (e.g. plant density).
Fundamentally, it is determined by a reduced number of environmental variables: prevailing radiation, temperature and atmospheric CO2 concentration. Biotic variables such as cultivar characteristics (e.g. maturity group, leaf area index, root depth, harvest index), plant density and sowing date modulate how the environmental conditions are converted into yield. At local scale (field, farm or small region), potential yields can be estimated from field experiments, yield census, or by crop growth models (Lobell et al., 2009). Crop simulation models provide a robust approach because they account for the interactive effects of genotype, weather, and management (van Ittersum et al., 2013). These models are mathematical representations of our current understanding of biophysical crop processes (phenology, carbon assimilation, assimilate allocation) and of crop responses to environmental factors. Such models have been designed to separate genotype * environment * management interactions, for example by factorial simulations where one driver is varied at a time. Models require site-specific inputs, such as daily weather data, crop management practices (sowing date, cultivar maturity group, plant density, fertilization and irrigation amounts and dates), and soil characteristics; with some of them being not useful in the purpose to simulate potential yield. At local scale, crop models can be calibrated to account for local specificities, in particular for specificities related to the cultivar used at these sites (Grassini et al., 2011) Potential yield is also a variable of interest at large (country, global) spatial scales, in particular as it is required for yield gap analyses (van Ittersum et al., 2013;Lobell et al., 2009). Such analyses are necessary to get a large-scale picture of yield limitations and to investigate questions related to production improvement strategies, food security and management of resources with a global perspective. However, while crop models used at local scale can be calibrated to account for local specificities, it is much more complicated to model the spatial variations of yields at the global scale. Local crop models have been applied at the global scale either directly or through the implementation of some of their equations into global vegetation models (Elliott et al., 2015).
Global Gridded Crop Models (GGCMs in the following) provide spatially explicit outputs, typically at half degree resolution in latitude and longitude. Their simulations are prone to uncertainty. In particular, it is quite difficult to get reliable information about the diversity of cultivars   Increasing our confidence in potential yield simulated by GGCMs is required to improve our ability in replying to societal questions mentioned above. To do so, we need first to understand how and why GGCMs potentially diverge in potential yield simulation. The GGCM Inter-comparison (GGCMI phase I) provides a framework relevant to investigate the differences between GGCMs, as all GGCM modellers followed the same protocol (Elliott et al., 2015). Model outputs are available on the GGCMI data archive (Müller et al., 2019a). In the GGCMI framework, a simulation performed with harmonized growing period, absence of nutrient stress and irrigated conditions (see below) is particularly adapted to simulate potential yield. Figure 1 displays, for maize, the average and coefficient of variation (CV) of such simulated aboveground biomass (biom) and yield (grain) among 8 GGCMs participating to GGCMI that have been used in our following analysis. While the GGCM divergence under potential conditions is lower than the GGCM divergence when limiting factors are represented (Fig.S2), it remains relatively high. Figure 1 shows that the CV in potential conditions is higher for grain than for biom and the CV for grain can reach locally more than 50%.
To understand what could explain these differences, we built a mechanistic emulator of GGCMs based on generic processes controlling the accumulation of biomass (phenology from the sum of growing degree days, light absorption, radiation use efficiency) and the transformation of biomass into grain yield (trigger of yield formation, allocation of net primary production (NPP) to yield). We then calibrated the parameters of the emulator independently for each GGCM against GGCM simulated biom and GGCM simulated relationship between biom and grain. Our assumption is that a simple set of equations, with calibrated parameters, can reproduce the outputs of most GGCMs and could be used to explore the sources of differences between them. We choose to use a processbased (even if very simple) model as we expect that this model could propose interesting perspectives as explained in the Discussion. In particular, if able to reproduce the results of an ensemble of GGCMs, it could be an alternative to the model ensemble mean or median usually used in inter-comparison exercises (Martre et al., 2015). Running much faster than GGCMs, it would also be an interesting tool to test new developments, such as the implementation of cultivar diversity, to improve the simulation of potential yield at the global scale.

GGCM emulator
For any given day d of the growing season (defined here as the period between the planting day, tp, and the crop maturity, tm), we used the equations 1-7 which rely on concepts commonly used in modelling of ecosystem productivity to compute the potential aboveground biomass (biom, in t DM ha -1 ). Variables and parameters are summarized in Table 1 and a simplified flow chart is given in For any d in [tp, tm], the thermal time (TT) is computed from the daily mean temperature (tas, in °C) by using a reference temperature (T0): GDD is the sum of growing degree days, defined as follows: The number of leaves per plant (nleaf) is computed from GDD and one parameter representing the thermal requirement for the emergence of any leaf (GDD1leaf): where maxnleaf is the maximum number of leaves. In our model, as soon as one leaf emerges, we assumed that it reaches its fully expanded leaf area, which is the same for all leaves (individual leaf area, called Sleaf hereafter). The incoming photosynthetic active radiation (PARinc) is derived from the short-wave downwelling radiation (rsds in MJ m -2 day -1 ) and its active fraction, f: The absorbed PAR by the canopy (APAR) is determined by assuming an exponential function according to the Beer-Lambert law: where C is a constant (see below The parameter C of Eq.5 can be decomposed in different parameters: with k is the light extinction coefficient, Sleaf is the individual leaf area and dplant is the plant density. The product of Sleaf, dplant and the number of leaves of a given day d (i.e. nleaf(d)), corresponds to the Leaf Area Index (LAI) of the same day, i.e.: in such a way that Eq.5 can be re-written as: We preferred Eq.5 instead of Eq.5bis as we cannot calibrate separately the different parameters composing C and because we do not have any information about the LAI from GGCMs (see discussion).
To compute the grain biomass at maturity, we first define the day k such as : The variable grain (in t DM ha -1 ) could be considered either as reproductive structures + grain, or grain only. The parameter nthresh is a threshold in the number of leaves from which either the formation of reproductive structures starts, or the grains form or the grain filling starts. Above equations aim to be generic and to reproduce the diversity of approaches in GGCMs. That is why we do not distinguish here the production of reproductive structures and the accumulation of assimilates in grains after anthesis.
Equations 1-7 and 10-13 are called SMM (for Simple Mechanistic Model) in the following. We The variable grainSMM is used to approach the potential yield. Our analysis focuses on maize because of the importance of cereals in human food and because of the widespread distribution of this crop across latitudes.

Set-up
We focused first on the computation of biomSMM, then on the relationship grainSMM vs biomSMM. The sensitivity of SMM to each parameter involved in the computation of biomSMM was first studied.
Then, we calibrated SMM against each GGCM to make the spatial distribution of biomSMM mimic the spatial distribution of aboveground biomass at maturity simulated by each GGCM (called biomGGCM hereafter). This calibration happened in two steps. The 1 st step concerned C and RUE which have one value at the global scale. The 2 nd step concerned GDD1leaf that we made varying in space to mimic procedure used by GGCM modellers in GGCMI (see below). The choice of focusing on C, RUE and GDD1leaf is justified below. In a last step (step 3), we calibrated nthresh and frac to make SMM mimic the relationship grain vs biom of each GGCM.
In GCCMI, all GGCMs followed a common protocol and were forced by the same weather datasets.
We focused here on the simulations forced by one of them, the AgMERRA dataset . of the growing season which were derived from a combination of two global datasets (MIRCA (Portmann et al., 2010) and SAGE (Sacks et al., 2010), (Elliott et al., 2015)). In harmnon simulations, in addition to forced timing and duration of the growing season, all GGCMs experienced no nutrient limitation, through prescribed fertilizer inputs. Besides this harmonization level, two water regimes have been considered: irrigated and non-irrigated. For our analysis focusing on the simulation of potential yield, we decided to select the configuration (harmnon and irrigated). This is true for all GGCMs considered, but CGMS-WOFOST. In fact, the harmnon simulation was not provided for CGMS-WOFOST but, because i) this model does not consider nutrient limitation, and ii) the growing season was prescribed in the default simulation, we assumed that the potential yield could be approached by the (default and irrigated) simulation.
For EPIC family models (here, GEPIC and EPIC-IIASA), we used a corrected biomGGCM computed as below as it has been noticed that some issues related to the variable biom appeared in the outputs available on the GGCMI data archive likely related to the output time-step of specific variables (Folberth, personnal communication, 2019): where HImax is the maximum harvest index (no unit), varying in space as function of cultivars. In EPIC, the actual HI at harvest only differs to HImax if a drought stress occurs in the reproductive phase. Because this stress was virtually eliminated by sufficient irrigation in the harmnon x irrigated simulations, the Eq.16 provides the most accurate estimate of aboveground biomass at harvest. Map of cultivar distribution, used as input to the EPIC models ( Figure 1 and Table D in ), have been considered here to compute the corrected biomGGCM.

Input variables for SMM
We focused our analysis on the growing season starting in calendar year 2000 (and potentially finishing in calendar year 2001). SMM was forced by the short-wave downwelling radiation (rsds in MJ m -2 day -1 ) and the daily mean temperature (tas, in °C) from the AgMERRA weather dataset . SMM also needs the begin and end of the maize growing season and we used respectively the planting day (tp) and the timing of maturity (tm), both being provided in the output of each GGCM. Despite the fact that all GGCMs are forced by the same growing season in harmnon, some GGCMs allow flexibility in regards to tp and tm prescribed as input (Müller et al., 2019b), as suggested by the GGCMI protocol: "crop variety parameters (e.g., required growing degree days to reach maturity, vernalization requirements, photoperiodic sensitivity) should be adjusted as much as possible to roughly match reported maturity dates". Thus, we cannot use tp and tm from GGCM input files (Text S1).
for each GGCM growing season. For a given GGCM, SMM simulations were performed only for grid-cells considered in the given GGCM. In addition, grid-cells for which information about the growing season from MIRCA and SAGE was not available are masked to prevent to consider gridcells where internal GGCM computation was performed.
The maps of cultivar distribution used by EPIC models  were also used as inputs to SMM in the simulation aiming to mimic the biom vs grain relationship of EPIC models (see Sect.2.2.4.2).

Sensitivity of global biomSMM to SMM parameters
Except the active fraction of short wave downward radiation (f in Eq.4) the value of which is physically well-known, other parameters involved in the computation of biomSMM (T0, maxnleaf, GDD1leaf , C, RUE) are relatively uncertain. The sensitivity of the global averaged biomSMM to these parameters was assessed by performing 3125 (i.e. 5 5 ) SMM simulations allowing to combine 5 different values for each parameter. In each of these SMM simulations, a given parameter was constant in space. The initial estimate of each parameter was provided in Table S1. While the initial estimate of each parameter was based on literature, we chose quite arbitrarily the same range of variation of [50-150%] (in % of the initial estimate) for all parameters, with the 5 values tested uniformly distributed within the range of variation (i.e. 50, 75, 100, 125, 150% of initial guess).
Following our current knowledge based on observations, it would be partly possible to choose a different uncertainty range for each parameter: for instance, literature tends to show that RUE is relatively well constrained for maize (Sinclair and Muchow, 1999) while the C parameter, which depends on plant density, is expected to vary a lot as a function of the farming practices (Sangoi et al., 2002;Testa et al., 2016) (Table S1) GGCMs. In the following, we used MJ of absorbed PAR, to be consistent with our Eq.6.
Observations also showed that RUE decreases during grain filling following the mobilization of leaf nitrogen to the grain (Sinclair and Muchow, 1999). Thus, RUE is larger during vegetative growth (3.8-4.0 g DM (MJ of absorbed PAR) -1 (Kiniry et al., 1989)) than averaged over the whole season (3.1-4.0 g DM MJ -1 (Sinclair and Muchow, 1999)). It is likely than some GGCMs used RUE values which are not representative to the whole growing season. Thus, we used the same range of uncertainty for all parameters in our calibration procedure. Exploring a wider range of values also allows for a more complete assessment of GGCM performance.
Potential confusion in units mentioned above also lead us to chose an initial estimate of RUE (2 g DM MJ -1 ) lower than values commonly reported in the literature (3.1-4.0 g DM MJ -1 ) (Sinclair and Muchow, 1999) but note that the highest values of RUE tested during our calibration (3.0 g DM MJ -1 ) reach the literature-based range.
The global mismatch between each GGCM and SMM was quantified thanks to the Root Mean Square Error (RMSE) computed as follows: where u is a combination of parameters and g is a grid-cell among the N grid-cells considered for each GGCM. All grid-cells are assumed independent and have the same weight in the RMSE computation. RMSE has the same unit as biom (t DM ha -1 ).

SMM calibration against each GGCM
SMM was calibrated following 3 steps. The first two steps aimed to mimic biomGGCM distribution while the last step aimed to make SMM reproduce the relationship grain vs biom of each GGCM.
The procedure of calibration was summarized in Table 2. "Emulated GGCM" is used from now to define SMM output after SMM calibration aiming to mimic a given GGCM.

Parameters involved in the computation of biom SMM
Regarding the simulation of biom, we restricted the calibration to RUE, C and GDD1leaf  GDD1leaf (see below). These parameters (f, maxnleaf, T0) were prescribed equal to their initial estimate and were the same for all SMM simulations.
We choose to make C and RUE globally constant and GGCM-dependent. The decision to not make C and RUE vary in space is consistent with the rule of parsimony, that we aimed with SMM. It also follows the procedure commonly used in GGCMs that involved similar approach. For instance, GEPIC is based on a biomass-energy conversion coefficient that does not vary in space (Folberth et al., 2016). Plant density (hidden in C) is constant in space in LPJmL (Schaphoff et al., 2018b). We calibrated C and RUE at the same time to assess potential compensation between these parameters in SMM. The three pairs (C, RUE) that minimized the most the global RSME computed following Eq.17 among the pairs tested were chosen. A fourth pair corresponding to (C, RUE), where C is equal to its initial estimate, has been used. The use of four different pairs aimed to assess the sensitivity of our conclusions to the parameter values. For each (C, RUE) pair, we finally calibrated GDD1leaf. We made GDD1leaf vary in space as it it is allowed in the GGCMI exercise. In the GGCMI protocol, accumulated thermal requirements were adjusted to catch the growing season (duration and timing) prescribed as input in the harmnon GGCM simulation. In SMM, the procedure slightly differs as we calibrated thermal requirements to match biomGGCM: for each grid-cell, GDD1leaf is chosen among its 5 possible values to minimize the absolute difference between biomGGCM and biomSMM. Grid-cells were considered independently.
The ability of SMM to match the spatial distribution of biomGGCM for each GGCM after SMM calibration was measured through: the bias, RMSE and Nash-Sutcliffe model efficiency coefficient where g refers to any grid-cell and biomGGCM is the average of biomGGCM over grid-cells. NS=1 means that SMM perfectly matches the spatial distribution of biomGGCM.
To assess the mismatch between biomGGCM and biomSMM after SMM calibration for a given GGCM, we compared the RMSE of two sub-groups of grid-cells that differ according to a third variable related to climate or soil type. are involved in the computation of grain for any day from biom, namely nthresh and frac. The calibration of these parameters aims to make SMM able to mimic the relationship between grain and biom at the end of the growing season from each GGCM. One global and GGCM-dependent pair (nthresh, frac) was chosen by using the following criteria: where u corresponds to a given pair (nthresh , frac), AX is the area defined by the grid-cell clouds in GGCMs introduced a cultivar diversity in parameters related to grain filling and in that case, the calibration of (nthresh, frac), instead of being done at the global scale, was made for each cluster of grid-cells corresponding to a given cultivar. The distribution of cultivars from EPIC was used as input of SMM in that case.

Contribution of different processes to yield in SMM
We computed ratios between some SMM internal variables to assess the global contribution of different processes represented in SMM to the achievement of grainSMM. The following ratios have been computed: nleaf/tas, APAR/rsds, APAR/nleaf, biom/APAR, grain/biom. The ratio nleaf/tas represents the phenology sensitivity to temperature; APAR/rsds reflects how radiation is absorbed by the canopy and APAR/nleaf represents the absorption sensitivity to phenology, biom/APAR reflects the conversion from absorbed radiation to biomass and grain/biom represents the harvest index.
We also investigated how the contribution of the different processes to the achievement of grainSMM varies between emulated GGCMs. Variations in these ratios reflects the difference in global averaged key parameters between emulated GGCMs. For instance, variations in grain/biom between emulated GGCMs reflects differences in calibrated nthresh and frac (Fig.S3).
To compute the different ratios, averages over the growing season were used for tas, rsds, APAR and nleaf, while the end of the growing season were used for biom and grain (so called biomSMM and grainSMM, Eq. 14-15). A given ratio was computed for each grid-cell and its grid-cell distribution was plotted in the following as barplot (Sect.3.4 and Fig.8).

Results
3.1 Global averaged biomSMM: sensitivity to each parameter and calibration of (C, RUE) As expected when looking at Eq.6-7, the global averaged biomSMM sensitivity to RUE was large ( Fig.2). Varying RUE was the only way possible to capture the global averaged biomGGCM for LPJ-GUESS (Fig.2). The global averaged biomSMM was only slightly sensitive to other parameters as In Fig.3, we plotted how RMSE changes according to both C and RUE varying at the same time, all other parameters being equal to their initial estimate. Figure 3 shows that C and RUE can compensate in SMM. Calibrating (C, RUE) (with one global value for each parameter) allows to reach RMSE around 4 t DM ha -1 for all GGCMs but LPJ-GUESS and LPJmL (around 2 and 3 t DM ha -1 respectively) (Fig.3). We chose 3 pairs (C, RUE) among the 25 tested couples that minimized the RMSE to assess the sensitivity of our conclusions to the pair chosen. Using a fourth pair with the same C for all GGCMs equal to its initial estimate decreased only slightly the ability of SMM to match the GGCMs (magenta dots in Fig.3) and did not change drastically the RUE compared to the ones when both C and RUE were calibrated.

Calibration of GDD1leaf
Once (C, RUE) was globally chosen, a spatially varying GDD1leaf was calibrated. After calibration, SMM was able to catch the spatial variability of biomGGCM for most GGCMs (Fig.4 and Fig.5a).
Difference in percent can be large, especially for regions with small biom but the global distribution was relatively well captured (Fig.4).
Global RMSE reaches between ~1 (LPJ-GUESS) and 3.3 t DM ha -1 (EPIC-IIASA) (Fig.5a). The 0.52 (with heat stress) for pAPSIM (Fig.S5). The limited increase can be explained by the fact that optimized GDD1leaf in the SMM simulation without heat stress encompasses a part of the heat stress for these grid-cells.
EPIC family GGCMs simulate some other stresses, such as stresses related to salinity and aeration, that could have an effect on the potential yield even in the (harmnon and irrigated) simulations during the calibration give similar results (3 rd line of Fig.S7). These results are obvious as allowing more variation in SMM allows a best fit to GGCM in more grid-cells. This underlines the difficulty to make SMM functioning as a mechanistic model (see Discussion).

Calibration of parameters involved in grain computation (nthresh, frac)
Varying (nthresh, frac) allows the dots (corresponding to the different grid-cells) to define different shapes in the space yieldSMM vs biomSMM ( Fig.S8 for pDSSAT as example). nthresh=0 leads to linear relationship between yieldSMM and biomSMM with a slope equal to frac (left panels of Fig.S8). Nonnull nthresh make some grid-cells deviate to this linear relationship and the number of such grid-cells increases with nthresh (Fig.S8). For all GGCMs, we found a (nthresh, frac) combination that allows the relationship yieldSMM vs biomSMM to fit the relationship yieldGGCM vs biomGGCM (Fig.7). For CLM-crop, we are not able to reproduce the cloud of dots corresponding to grid-cells where the potential yield is below the line grain=80%*biom. For EPIC family GGCMs, a calibration per cluster of grid-cells sharing the same cultivar is required.

Contribution of different processes to the achievement of grainSMM
The ratio nleaf/tas is relatively constant among the emulated GGCMs and this is true whatever the (C, RUE) pair chosen (Fig.8a). The ratio nleaf/tas reflects GDD1leaf. The calibrated GDD1leaf, even if its spatial distribution varies from one GGCM to the other (see previous section), remains relatively constant at the global scale between GGCMs.
The ratios APAR/rsds and APAR/nleaf (Fig.8b-c) vary a lot between GGCMs but this variation is of the same order of magnitude as the one between (C, RUE) pairs. These ratios reflect C, which is highly variable between GGCMs and between pairs.
The ratio biom/APAR (Fig.8d) reflects global RUE. Calibrated RUE varies a lot between GGCMs and only slightly between pairs for a given GGCM.
The ratio grain/biom (Fig.8e) varies a lot between GGCMs and is only slightly sensitive to the (C, RUE) pair. This ratio reflects a combination of nthresh and frac. GGCMs with nthresh equal to 0 (LPJmL, EPIC-IIASA) have no grid-cell variability in grain/biom (Fig.8e). Overall, and whatever the GGCM variability at grid-cell scale, we can distinguish i) emulated GGCMs that convert a large fraction of biom to grain, as CLM-crop, ii) emulated GGCMs that convert around 50% of biom to grain/biom between GGCMs is consistent with the fact that difference in grain among GGCMs is larger than the one in biom (Fig.1).

Discussion & Conclusions
We showed that a simple set of equations with one GGCM-dependent global RUE and spatially variable thermal requirement (GDD1leaf) is able to mimic spatial distribution of aboveground biomass of most GGCMs. Calibrating one additional global parameter at the same time as RUE (namely C) improves only slightly the fit between SMM and GGCM and modified in a small extent the calibrated value of RUE. RUE represents canopy photosynthesis and GDD1leaf determines crop duration, i.e. the two main drivers of crop productivity (Sinclair and Muchow, 1999). The relationship between potential yield and aboveground biomass of GGCM is captured by the calibration of two additional global parameters: one that triggers the start of grain filling and one corresponding to a time-invariant fraction of NPP allocated to the grain. These two parameters allow to catch the relationship between biom and grain from all GGCMS. This feature of SMM is particularly important as we showed that the divergence between GGCMs is larger for grain than for biom (Fig.1). Cultivar diversity regarding these latter parameters is nevertheless required to catch the behavior of some GGCMs. Despite apparent complexity in GGCMs, we showed that differences between them in regards to potential yield can be explained by differences in few key parameters.
Our approach has few caveats. First, SMM could be able to fit individual GGCMs for the wrong reasons, i.e. following a compensation between SMM internal processes which is not representative of the considered GGCM. We think that this issue is nevertheless minimized in our approach. First, we investigated how parameters can compensate, e.g. by calibrating at the same time RUE and C.
We showed that calibration of C is of second order importance and getting calibrated RUE less varying among emulated GGCMs would require very extreme values for C, well behind the range of values allowed in our calibration. The parameter C encompasses different parameters (see Eq.8) and a better alternative would be to separate them as well as to explicitly simulate the Leaf Area Index (LAI) variable. SMM-simulated LAI would be compared to GGCM output and this comparison would reduce further the risk of compensation between processes in SMM. However, LAI was not available neither from GGCMI data archive nor upon request to GGCM modellers. We stress the need to incorporate this output variable in next inter-comparison exercise. It is also important to note that the average over the growing season of LAI or LAI at set fractions of the growing season (including anthesis) would be more interesting than LAI at harvest as, under potential conditions, LAI at harvest is very likely close to maximum LAI allowed by the different GGCMs. This could be explained by differences between GGCM and SMM in the choice of processes represented (e.g. net productivity in SMM instead of balance between gross productivity and plant respiration in some GGCMs) or for a given process, in the choice of equations used to represent it (e.g. Farquhar (CLM-crop) vs RUE (SMM) for assimilation). The representation of stomatal conductance and CO2 assimilation rate within Farquhar equations introduces a sensitivity to variables not considered in the RUE-based approach (e.g. water vapour pressure deficit) in line with observations that show that RUE is sensitive to many variables (Sinclair and Muchow, 1999). This would lead to differences in the spatial variability of simulated aboveground biomass as compared to one simulated with spatially constant RUE. The succession of phenological stages with different parametrizations in some GGCMs (e.g. in pAPSIM (Wang et al., 2002)) can also partly contribute to differences with SMM as plant development is continuously simulated in SMM, as it is in other GGCMs (e.g. LPJmL, (Schaphoff et al., 2018a)). Other possibility of mismatch is related to some limiting factors (nutrients, water, etc.) that could exist in the harmnon and irrigated simulation for  (Sinclair and Muchow, 1999) concern aboveground biomass only. Second, LAImax in GEPIC is lower than values used in SMM: in GEPIC, LAImax which vary with plant density and is equal to 3.5 at plant density of 5 used in these simulations while LAImax in SMM (derived from Eq.9 when nleaf reaches its maximum) is equal to 5. Lower LAImax in GEPIC than SMM can compensate higher The diversity in apparent RUE found between GGCMs raises the question about the physical meaning of the parameters used in each GGCM. GGCMs are tools first dedicated to simulate actual yield and could have been tuned in that purpose against local observations. During that tuning, processes not explicitly represented in a given GGCM could be implicitly incorporated in parametrizations of other processes. For instance, it could be the case for GGCMS that do not incorporate explicitly nutrient limitations. Potential yield is a variable that has been computed in a second step and that could suffer from these implicit incorporations. At the end, the divergence in potential yield between GGCMs raise the questions about their ability to reproduce real process at the basis of actual yield as this latter depends on the combination between potential yield and many limiting factors.
Our study has some implications for GGCMs modellers in regards to the simulation of potential yield. We showed that differences between GGCMs can be explained by differences in few key parameters, namely the RUE and parameters driving the grain filling (nthres and frac). For RUE, we recommend to GGCMs modellers to investigate why each individual GGCM has a so small (explicit or implicit) apparent RUE. We showed that nthresh and frac vary a lot between GGCMs. For GGCMs with nthresh equal to 0, a parametrization based on a better distinction between emergence of all leaves and the period from flowering to maturity could be interesting. We also showed that nthresh and frac determines harvest index (HI) and we showed that HI vary a lot between GGCMs. Thus, we advice that more effort needs to be directed in assembling a global dataset of HI for either parametrization or evaluation of GGCMs. Maximum HI that plant can reach is a cultivar characteristic and one possibility to build such dataset could partly rely on information from seed companies. Finally, we suggested that next inter-comparison exercise encompass LAI and the beginend of the different growing season periods (vegetative period, flowering, etc.) for GGCMs that distinguish such stages.
Model ensemble mean or median from inter-comparison exercise is commonly preferred to individual GGCM as it has better skill in reproducing the observations (Asseng et al., 2014;Martre et al., 2015). Our mechanistic-model tuned against GGCMs could be a viable alternative to this ensemble mean/median as it allows to keep tracks of processes leading to the final variable, namely here the potential yield. Once tuned, SMM could be forced by an ensemble of parameters to reproduce the ensemble of GGCMs. Our emulator could also offer a potential for GGCMs conditions would nevertheless require the implementation of some missing mechanisms (e.g. heat stress and effect of CO2).
In purpose to simulate potential yield at the global scale, our emulator forced by daily temperature and radiation, growing season and with few adjustable parameters could be considered as an interesting alternative to GGCMs as they are easier to manipulate and allow much faster simulations. For instance, our model could be used to investigate the implementation of cultivar diversity at the global scale. The introduction of cultivar diversity is a keystone in development of crop models at the global scale (Boote et al., 2013). Cultivar diversity considered at the global scale was mainly related to phenological development through sensitivity to photoperiod, sensitivity to temperature (and vernalization for winter cultivars) (van Bussel et al., 2015b). The effect of cultivar diversity on allometry (e.g. through variability in harvest index) was considered in a less extensive extent at the global scale and restricted to some EPIC models  or pDSSAT in a specific study (Gbegbelegbe et al., 2017). Through protocol of GGCMI in which thermal requirements are tuned to match the growing season, a cultivar diversity was implicitly accounted for. The same applies for SMM. The parameters of the emulator, here fitted to reproduce GGCM output could also be fitted to global dataset based on census/observations in a manner similar to that done with PEGASUS (Deryng et al., 2011) but here applied on potential yield (against real yield for PEGASUS). For instance, SMM could be calibrated against global dataset of potential yield based on statistical approach (Mueller et al., 2012)  Variable in space and GGCM-dependent variability Spatial variability of biom *: nthres and frac are variable in space as function of the cultivar when SMM aims to mimic EPIC family models as these latter consider some cultivar diversity in harvest index.

Supplementary Information
Text S1: Growing season as input/output of GGCMs Text S2: Implementation of heat stress Table S1: Parameter values during the calibration procedure Five different values are used for each SMM parameter and thus, 5 5 SMM simulations have been performed. The different lines correspond to the different GGCMs. For a given line, and for a given column (i.e. for a given SMM parameter), each dot represents all SMM simulations sharing the same value for this parameter, which defines its color. Error-bars corresponds to the standard deviation among these SMM simulations. Each dot is defined through the RMSE(biomGGCM,biomSMM) (x-axis, in [t ha -1 ]) and the global averaged biomSMM (y-axis, in [t ha -1 ]). The black horizontal line reports the global averaged biomGGCM on y-axis. The difference of biomSMM between rows for a given column results from differences in growth periods between GGCMs (and in a lesser extent from difference in the spatial coverage of maize between GGCMs). Global values (average, RMSE) are computed by giving the same weight to each grid-cell.  comparison between GGCM and SMM after its calibration. The figure displays the grain vs. biom relationship for GGCMs (left column) and for SMM after independent calibration against each GGCM (right column). For SMM, the relationship is given for the (nthresh, frac) combination that maximizes the match between GGCM and SMM (see Methods). Calibrated combination is given in top of each SMM panel. When multiple cultivars are considered, one calibrated (nthresh, frac) combination is given per cultivar. In right panels, pink shading defines AGGCM (see Methods).