The GGCMI phase II emulators: global gridded crop model responses to changes in CO2, temperature, water, and nitrogen (version 1.0)

1. Department of the Geophysical Sciences, University of Chicago, Chicago, IL, USA. 2. Center for Robust Decision-making on Climate and Energy Policy (RDCEP), University of Chicago, Chicago, IL, USA. 3. Potsdam Institute for Climate Impact Research, Member of the Leibniz Association, Potsdam, Germany. 4. Department of Computer Science, University of Chicago. 5. NASA Goddard Institute for Space Studies, New York, NY, United States. 6. Joint Global Change Research Institute, Pacific Northwest National Laboratory, College Park, MD, USA. 7. Unité de Modélisation du Climat et des Cycles Biogéochimiques, UR SPHERES, Institut d’Astrophysique et de Géophysique, University of Liège, Belgium. 8. Met Office Hadley Centre, Exeter, United Kingdom. 9. Ecosystem Services and Management Program, IIASA, Laxenburg, Austria. 10. Department of Geography, Ludwig-Maximilians-Universität, Munich, Germany. 11. Department of Geographical Sciences, University of Maryland, College Park, MD, USA. 12. Texas Agrilife Research and Extension, Texas A&M University, Temple, TX, USA. 13. Department of Statistics, University of Chicago, Chicago, IL, USA. 14. EAWAG, Swiss Federal Institute of Aquatic Science and Technology, Dübendorf, Switzerland. 15. Laboratoire des Sciences du Climat et de l’Environnement, LSCE/IPSL, CEA-CNRS-UVSQ, Université Paris-Saclay, F-91191 Gif-sur-Yvette, France. 16. Department of Physical Geography and Ecosystem Science, Lund University, Lund, Sweden. 17. Earth Institute Center for Climate Systems Research, Columbia University, New York, NY, USA. 18. School of Geography, Earth and Environmental Sciences, University of Birmingham, Birmingham, UK. 19. Birmingham Institute of Forest Research, University of Birmingham, Birmingham, UK.


Sampling in variable space and cultivated area
Simulation sampling across the defined variable space is not uniform in the GGCMI Phase II experiment: only some models provide all cases in the protocol. Figure S1 compares the sampling density of the models used in the emulator analysis.

Figure S1: Heatmap illustrating the number of models providing simulations for each of the scenarios in CTWN variable space. Black boxes mark the "baseline" cases for rainfed and irrigated simulations. The maximum number is 9, the number of models included in the emulator analysis. (That is, we exclude here the three GGCMI Phase II models not included in the emulator analysis.) For cases with N levels lower than 200 kg/ha, the maximum number of models is 6, since three models (CARAIB, JULES, and PROMET) do not represent varying N levels. One model (GEPIC) provided additional simulations at T+5 not specified by the protocol; these are not used in emulation. Normalized error calculations are run only over scenarios in which all 9 models contribute simulations (pink boxes).

Figure S2: Currently cultivated area for rainfed (left) and irrigated (right) crops, from the MIRCA2000 dataset (Portmann, Siebert, and Doell, 2010). Data are taken directly from the MIRCA2000 dataset for maize, rice, and soy. Winter and spring wheat areas are adapted from MIRCA2000 and sorted by growing season.

Variability changes in future climate projections
Because the GGCMI Phase II simulation dataset does not sample across changes in climate variability, large yield impacts driven by future changes in variability would reduce the practical utility of the emulator for impacts assessments. We therefore assess the scale of potential future changes in temperature variability in RCP8.5 simulations from the five climate models used in ISIMIP (the Inter-Sectoral Impact Model Intercomparison Project; Warszawski et al., 2014; Frieler et al., 2017). In manuscript Section 4.3 we use one of these climate simulations, that from HadGEM2-ES, to assess the ability of GGCMI emulators to reproduce yield changes simulated under more realistic climate projections. We choose the HadGEM2-ES model because it shows the largest variability changes and therefore provides a stricter test of the utility of a GGCMI emulator. Table S1 summarizes daily Tmax variability changes for each crop and model, weighted by production. Figures S3 and S4 below show changes in variability in minimum and maximum temperatures in the HadGEM2-ES simulation for each crop growing season and area, and Figure S5 shows changes in daily Tmax variability for maize across the four other ISIMIP climate simulations. (Compare to Figure S3, upper left panel.) Most crop models included in GGCMI Phase II take daily minimum and maximum temperature as inputs, though PROMET and JULES take sub-daily temperature inputs.

Table S1: Global production-weighted fractional change in growing-season daily maximum temperature variability under RCP8.5 for the five climate models included in the ISIMIP project (a subset of the CMIP5 archive). The value for each crop and model is the mean within-growing-season temperature standard deviation across the 30 growing seasons of 2070-2099 relative to that for 1981-2010, with grid-cell values weighted by LPJmL simulated yields and current cultivation area (MIRCA). Values in parentheses are the change in variability by the same metric for daily minimum temperature within the growing season. The HadGEM2-ES model is highlighted in bold because it is used for our emulator evaluation in manuscript Section 4.3; it was chosen because it shows the largest changes in variability.

Model        Maize %      Soybean %     Rice %        S. Wheat %   W. Wheat %
HadGEM2-ES   9.7 (2.1)    10.4 (-0.6)   10.1 (-3.3)   6.4 (4.7)    3.6 (1.7)
GFDL-ESM2M   3.6 (0.9)    3.4 (0.6)     2.7 (-0.

To determine the change we compute the mean standard deviation of daily Tmin in each historical growing season and take the mean across all 30 years; this metric therefore includes changes both in seasonality and in short-term variations but excludes interannual variability and longer-term trends. For winter wheat, growing-season variability reductions reflect the dampening of the seasonal cycle (stronger warming in winter). Strong percentage increases in the tropics reflect very low variability in the baseline. Production-weighted mean changes across crops range from -3% for rice to +5% for spring wheat (Table S1).
Note that changes may differ if calculated using an ensemble of simulations rather than a single projection, as is done here.

Figure S4: As in Figure S3, except now for daily maximum temperature. Changes in daily maximum temperature variability are generally higher than those for daily minimum temperature.

Figure S5: As in Figure S4 (change in daily maximum temperature variability), except now for maize only and for the remaining 4 ISIMIP climate simulations. Values are lower on average than for HadGEM2-ES, but patterns can differ.
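The variability metric described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for Table S1: the function names are hypothetical, the population standard deviation is an arbitrary choice (the text does not specify the estimator), and the production weight is taken as simulated yield times cultivated area applied to per-cell fractional changes.

```python
from statistics import mean, pstdev


def season_variability(seasons):
    """Mean over growing seasons of the within-season standard deviation.

    `seasons` is a list of growing seasons, each a list of daily temperatures.
    Using the population standard deviation (pstdev) is an assumption.
    """
    return mean(pstdev(days) for days in seasons)


def fractional_change(future_seasons, historical_seasons):
    """Variability in the future period relative to the historical period."""
    historical = season_variability(historical_seasons)
    future = season_variability(future_seasons)
    return (future - historical) / historical


def production_weighted_mean(cell_changes, yields, areas):
    """Average grid-cell changes, weighting each cell by yield times area."""
    weights = [y * a for y, a in zip(yields, areas)]
    return sum(c * w for c, w in zip(cell_changes, weights)) / sum(weights)


# Toy data: two seasons of three days each (real seasons are far longer,
# and the analysis above uses 30 seasons per period).
historical = [[20.0, 22.0, 24.0], [21.0, 23.0, 25.0]]
future = [[20.0, 23.0, 26.0], [21.0, 24.0, 27.0]]
change = fractional_change(future, historical)
```

Because the standard deviation is taken within each season before averaging across years, the metric responds to seasonality and short-term variations but not to interannual variability, as noted above.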

Yield response for A1 (growing season adaptation) simulations
This section illustrates the emulator's ability to capture yield changes in A1 simulations; compare to main text Figures 5 and 6, which show A0 simulations. Responses to the C, W, and N factors are similar in both, but responses to T are substantially weaker in A1 simulations, in which growing season length does not contract in warmer future conditions.

Figure S6: Illustration of spatial variations in yield response, which are successfully captured by the emulator for the A1 simulations. Panels show simulations (points) and emulations (lines) of rainfed maize in the pDSSAT model in six example locations selected to represent high-cultivation areas around the globe. Legend includes hectares cultivated in each selected grid cell. Each panel shows variation along a single variable, with others held at baseline values.

Figure S7: Illustration of variations in yield response across models for A1 simulations, again successfully captured by the emulator. Panels show simulations and emulations from six representative GGCMI models for rainfed maize in the same Iowa grid cell shown above, with the same plot conventions. Three models (PROMET, JULES, and CARAIB) that do not simulate the nitrogen dimension are omitted for clarity.

Normalized error for other cases
In manuscript Figure 7 we show normalized error for the A0 emulators over all rainfed crops, models, and T and W values for baseline CO2 and nitrogen levels (360 ppm and 200 kg ha−1). Here we show normalized error in some alternate cases for comparison: A0 emulators of rainfed crops at higher CO2 (Figure S8), A1 emulators of rainfed crops at baseline values (Figure S9), and A0 emulators of irrigated crops at baseline values (Figure S10). Results are generally similar, with a few exceptions. Normalized errors at higher CO2 are generally lower because model disagreement is larger, which increases the denominator. Some model emulators for irrigation water demand underperform: LPJ-GUESS and CARAIB for some crops. A1 errors are larger than A0 errors for several crops and models: LPJmL rice, pDSSAT spring wheat, and PROMET winter wheat.

Figure S8: Fraction of currently cultivated hectares with normalized emulation error less than 1 for the CO2 = 810 ppm and 200 kg N ha−1 yr−1 case, for the temperature and precipitation perturbation scenarios provided by all 9 models included in the emulator analysis.
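As a rough sketch of the quantity summarized in these figures, the following Python computes a normalized error and the fraction of cultivated area where it falls below 1. The exact definition is given in the main manuscript and is not reproduced here; this sketch assumes the error is the absolute emulation error divided by the standard deviation of the simulated values across models, which matches the remark above that model disagreement sets the denominator. All function names are hypothetical.

```python
from statistics import pstdev


def normalized_error(emulated, simulated, multi_model_values):
    """Absolute emulation error divided by the spread across models.

    Assumption: the spread is the population standard deviation of the
    simulated values from all models for the same grid cell and scenario.
    """
    spread = pstdev(multi_model_values)
    if spread == 0:
        # No model disagreement: the ratio is undefined, so treat as infinite.
        return float("inf")
    return abs(emulated - simulated) / spread


def fraction_area_below(errors, areas, threshold=1.0):
    """Share of total area in grid cells whose error is below `threshold`."""
    good = sum(a for e, a in zip(errors, areas) if e < threshold)
    return good / sum(areas)


# Toy example: three grid cells with cultivated areas in hectares.
cell_errors = [0.2, 1.5, 0.9]
cell_areas = [100.0, 50.0, 50.0]
share = fraction_area_below(cell_errors, cell_areas)
```

With this definition, a larger spread across models lowers the normalized error for the same absolute emulation error, which is the behavior reported for the higher-CO2 case.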

Emulation of yields in a realistic climate simulation at high latitude
In manuscript Section 4.3 we test the emulator against crop model simulations driven by a more realistic future climate projection, to evaluate the impact of future variability changes that are not captured by the emulator. Figure S11 below isolates the mid- and high latitudes; compare to manuscript Figure 9, which shows all currently cultivated land globally. Results are generally unchanged by the restriction in latitude except for rice, which is typically grown in the tropics and subtropics: only 20% of global rice production is grown north of 30°N and 1% north of 45°N, with even less in the Southern Hemisphere (only 0.8% south of 30°S and none south of 45°S).

Figure S11: Illustration of the ability of the emulator to capture a more realistic future climate simulation, as in main text Figure 9 but here restricted to latitudes north of 30°N.

Emulator products
This section expands on manuscript Section 5 with additional figures analogous to manuscript Figures 10 and 11.

Figure S15: Illustration of the factors affecting yields in more realistic climate scenarios for rainfed and irrigated (current mix) soy. Conventions as in main text Figure 11. The split in the PROMET soybean temperature response (panel a, note distinct groups of points) results from the model's sensitivity to differences in spatial patterns of temperature change across climate models.

Reduced specification (23-term) emulator examples
In this section we present figures analogous to those in the main text for the reduced-form (23-term) emulator. Issues with the reduced-form model are most prominent in PROMET for rice and soy, and in JULES for soy and spring wheat. We identify several factors that may contribute to these models showing qualitatively different responses that require additional terms for emulation.
• PROMET and JULES do not allow nitrogen variation. (However, CARAIB also cannot vary N and is readily emulatable with the 23-term specification.)
• Both JULES and PROMET are land system process models, originally developed with a broad focus, which have been adapted for managed vegetation (agriculture) only recently (2015). (CARAIB, by contrast, was originally developed as a vegetation model in the early 1990s and has a longer history of agricultural focus.)
• Both PROMET and JULES have anomalously strong responses to individual factors in those crops that are problematic to emulate. PROMET is the most sensitive of all the models for rice in C, T, and W, and JULES for soybeans in C, T, and W. For spring wheat, JULES is a high outlier in C, is the most sensitive model in W and T, and shows an extra inflection point in the global temperature response not seen in any of the other models.
• PROMET is the quantitatively lowest-performing model for soybeans when compared to the historical FAO data for the top 10 producing countries.

Figure S16: As in manuscript Figure 4, simulated (a.) and emulated (b.) yield under historical conditions for rainfed LPJmL maize, but here for the reduced (23-term) emulator specification. Emulator performance is worse primarily where crops are not currently grown.

Figure S17: As in manuscript Figure 5, emulator performance in selected high-yield regions for rainfed pDSSAT maize (and one region for PROMET), but now with the reduced (23-term) emulator specification. Emulator performance is similar.

Figure S18: As in manuscript Figure 6, emulator performance across models for rainfed maize in one grid cell in Iowa, but now with the 23-term emulator specification. Note that JULES and PROMET are not shown.

Figure S19: As in manuscript Figure 7, normalized error of all 9 models emulated on currently cultivated land, over all crops and all sampled T and W inputs, with CO2 and nitrogen held fixed at baseline values, now with the reduced (23-term) emulator specification. Degradation of performance is most evident in JULES soy and spring wheat and PROMET rice and soy.

Figure S20: As in manuscript Figure 8, normalized error for rainfed crops in CARAIB for the T+4 scenario, but here with the reduced (23-term) emulator specification. Degradation of performance is most evident in marginal lands where crops are not currently grown.

Figure S21: As in manuscript Figure 11, rainfed maize on currently cultivated land, but here with the reduced (23-term) emulator specification. Note that the strong C response for PROMET differs here from that with the full-form emulator, because higher-order C terms (C^3, C^2*T, ...) are needed for accurate emulation.
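To make the notion of higher-order C terms concrete, the sketch below enumerates the monomials of a polynomial in the four CTWN variables and evaluates a polynomial given as a set of coefficients. It is illustrative only: it assumes a cubic polynomial form, the function names are hypothetical, and the specific terms retained in the full-form and 23-term specifications are defined in the main manuscript and are not reproduced here.

```python
from itertools import product

VARIABLES = ("C", "T", "W", "N")


def monomial_exponents(n_vars, max_degree):
    """All exponent tuples whose total degree is at most `max_degree`."""
    return [
        exps
        for exps in product(range(max_degree + 1), repeat=n_vars)
        if sum(exps) <= max_degree
    ]


def term_name(exponents):
    """Readable label for one monomial, e.g. (2, 1, 0, 0) -> 'C^2*T'."""
    parts = []
    for var, power in zip(VARIABLES, exponents):
        if power == 1:
            parts.append(var)
        elif power > 1:
            parts.append(f"{var}^{power}")
    return "*".join(parts) or "1"


def evaluate(coefficients, point):
    """Evaluate a polynomial given as {exponent tuple: coefficient}."""
    total = 0.0
    for exps, coef in coefficients.items():
        term = coef
        for x, power in zip(point, exps):
            term *= x ** power
        total += term
    return total


# A complete cubic in four variables, intercept included.
all_terms = monomial_exponents(len(VARIABLES), 3)
cubic_c_terms = [term_name(e) for e in all_terms if e[0] >= 2 and sum(e) == 3]
```

Dropping entries from the coefficient dictionary gives a reduced specification; terms such as C^3 and C^2*T are among the third-order terms that a reduced form may omit.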

Yield responses for other crops and models
Spatial patterns of yields are well captured for all crops and models. Manuscript Figure 4 illustrates this using LPJmL maize; for reference, we show here yield response spatial patterns for other crops and models.

Figure S24: Spatial yield response and emulator error for LPJmL for all 5 GGCMI Phase II crops. Conventions as in manuscript Figure 4.

Figure S25: Spatial yield response and emulator error for pDSSAT for maize. Conventions as in manuscript Figure 4. pDSSAT absolute yields are significantly higher than those in LPJmL, but spatial patterns are similar.

Cross validation error for all models
In this section we present maps of cross validation error (values found in main text Table 3 are aggregated up from the grid cell level). Errors are generally low as a percentage of yield change in each grid cell. Errors above 10% of yield change in the out-of-sample test occur very rarely; the only significant instance is spring wheat in southern China in the PROMET model.
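One way to express an out-of-sample error as a percentage of yield change is sketched below. This is an assumption-laden illustration rather than the procedure behind main text Table 3: it assumes the error in a held-out scenario is the absolute difference between emulated and simulated yield, and that the yield change is measured relative to a baseline yield in the same grid cell. The function names are hypothetical.

```python
def cv_error_percent(emulated, simulated, baseline):
    """Held-out emulation error as a percentage of the simulated yield change.

    Returns None where the simulated yield does not change from the baseline,
    since the percentage is undefined there.
    """
    change = simulated - baseline
    if change == 0:
        return None
    return 100.0 * abs(emulated - simulated) / abs(change)


def cells_above(errors_percent, threshold=10.0):
    """Indices of grid cells whose defined error exceeds `threshold` percent."""
    return [
        i for i, e in enumerate(errors_percent)
        if e is not None and e > threshold
    ]


# Toy example: (emulated, simulated, baseline) yields for three grid cells.
cells = [(7.9, 8.0, 10.0), (5.0, 6.0, 8.0), (4.0, 4.0, 4.0)]
errors = [cv_error_percent(*c) for c in cells]
flagged = cells_above(errors)
```

Expressing the error relative to the yield change, rather than to absolute yield, keeps cells with small responses from appearing artificially well emulated.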