the Creative Commons Attribution 4.0 License.
The statistical emulators of GGCMI phase 2: responses of year-to-year variation of crop yield to CO2, temperature, water and nitrogen perturbations
Weihang Liu
Christoph Müller
Jonas Jägermeyr
James A. Franke
Haynes Stephens
Shuo Chen
Abstract. Understanding the impact of climate change on year-to-year variation of crop yield is critical to global food stability and security. While crop model emulators are believed to be lightweight tools to replace the models per se, few emulators have been developed to capture such interannual variation of crop yield in response to climate variability. In this study, we developed a statistical emulator with a machine learning algorithm to reproduce the response of year-to-year variation of the yields of four crops to CO2 (C), temperature (T), water (W) and nitrogen (N) perturbations defined in the Global Gridded Crop Model Intercomparison Project (GGCMI) phase 2 experiment. The emulators were able to explain more than 92 % of the variance of simulated yield and performed well in capturing the year-to-year variation of global average and gridded crop yield over current croplands in the baseline. With the changes in CTWN perturbations, the emulators could reproduce the year-to-year variation of crop yield well over most current cropland. The variation of R and the mean absolute error was small under the single CTWN perturbations and dual-factor perturbations. These emulators thus provide statistical response surfaces of yield, including both its mean and interannual variability, to climate factors. They could facilitate spatiotemporal downscaling of crop model simulations, project future changes in crop yield variability, and serve as a lightweight tool for multi-model ensemble simulation. The emulators enhanced the flexibility of crop yield estimates and expanded the application of large-ensemble simulation of crop yield under climate change.
Weihang Liu et al.
Status: final response (author comments only)
RC1: 'Comment on gmd-2023-74', Anonymous Referee #1, 07 Jul 2023
Liu et al. present a well motivated analysis that could provide the climate and crop community with a lightweight tool to apply to a variety of climate impacts studies. Their manuscript is well written and clearly presented, but their cross validation analysis, which provides the basis of the manuscript, is flawed. To their credit, they admit this flaw, but recognizing it is not sufficient.
The 10-fold cross validation should be withholding the entire domain for selected years, not just random samples for the exact reason that they state (e.g. spatial autocorrelation). The Köppen–Geiger approach within a 10-fold cross validation could also work (withhold entire Köppen–Geiger class for 10% of years). But the 10-fold cross validation that is presented is not valid and should not be the default metric presented throughout the paper. Provided the strong spatial autocorrelation of climate and yields in this model setup in many locations at the grid-cell level, the 10-fold cross validation cannot be considered an out-of-sample analysis and should not be presented as such. Few of the graphs specify whether the 10-fold cross validation is the basis of the results, but I assume that is the default cross validation chosen to compute the model evaluations based on Figure 2. I encourage the authors to correct their cross-validation and revise their results accordingly.
L75-76 grammar typos
L169: CDD is consecutive dry days not drought days
L194-196: I’m not sure what a “spatial difference term” means or what the clarification of a “temporal constant growing season length” means. Is this a location fixed effect applied at the grid-cell level? Could you clearly write out an equation for the covariates going into your XGBoost model? This would help to clarify which variables are location invariant, which are time invariant, and which vary both with location and with time. Table 2 is good, but it doesn’t clarify which covariates change in space vs time. Alternatively you could add this information into Table 2.
Line 270: remind readers what A0 and A1 simulations mean
Table 3: consider adding a key for A0 and A1 as you have for W and Winf
Minor point: While the manuscript is generally clear and well written, grammar should be checked throughout the manuscript. There are typos throughout.
Citation: https://doi.org/10.5194/gmd-2023-74-RC1
RC2: 'Comment on gmd-2023-74', Anonymous Referee #2, 01 Aug 2023
The content of this study meets the requirements of this journal, and the paper is highly complete with exquisite images. However, there are certain deficiencies in using process-based models to simulate the impacts of current climate change. Therefore, exploring the use of artificial intelligence algorithms to improve the limitations of the current process-based models is a worthwhile direction, in my opinion, and has the potential for publication.
Line 34: To be frank, I didn't know what "per se" meant, so I looked it up and found that it is a commonly used term in written language. However, I am unsure if it is appropriate to use it here. I have been using English for over twenty years, and this is the first time I have come across this usage.
Line 59-60: I am somewhat confused about the author's statement here. The author mentions that "the relationship between climate factors and crop yield is constrained by the current climate conditions." What does this mean? Aren't both of these models also constrained by the current climate conditions?
Line 75: The readers are already aware that the scenario-based future crop yield projection is not a systematic perturbation of climate factors. What other limitations are there in the scenario-based future crop yield projection that should be mentioned here?
Line 114: I think you can avoid mentioning soybean here. It is confusing why soybean is not included in your subsequent research.
Regarding Figure 3 and the following 1:1 graphs, while they are visually appealing, they provide limited information. We can see that the points align along the 1:1 line, but no statistical indicators are presented.
Citation: https://doi.org/10.5194/gmd-2023-74-RC2
AC1: 'Comment on gmd-2023-74', Tao Ye, 21 Aug 2023
Response to Reviewers
Reviewer #1
Liu et al. present a well motivated analysis that could provide the climate and crop community with a lightweight tool to apply to a variety of climate impacts studies. Their manuscript is well written and clearly presented, but their cross validation analysis, which provides the basis of the manuscript, is flawed. To their credit, they admit this flaw, but recognizing it is not sufficient.
RE: Thanks for your suggestions and comments. We will revise the text according to your suggestions and respond to your comments point-by-point as follows.
The 10-fold cross validation should be withholding the entire domain for selected years, not just random samples for the exact reason that they state (e.g. spatial autocorrelation). The Köppen–Geiger approach within a 10-fold cross validation could also work (withhold entire Köppen–Geiger class for 10% of years). But the 10-fold cross validation that is presented is not valid and should not be the default metric presented throughout the paper. Provided the strong spatial autocorrelation of climate and yields in this model setup in many locations at the grid-cell level, the 10-fold cross validation cannot be considered an out-of-sample analysis and should not be presented as such. Few of the graphs specify whether the 10-fold cross validation is the basis of the results, but I assume that is the default cross validation chosen to compute the model evaluations based on Figure 2. I encourage the authors to correct their cross-validation and revise their results accordingly.
RE: Thanks for your suggestions. We are now using a new cross-validation strategy and will update the corresponding results.
Considering the spatiotemporal autocorrelation of the crop yields simulated by the GGCMs, we now use a “held out years and regions” strategy for leave-one-year-out cross-validation (Roberts et al., 2017; Sweet et al., 2023). Specifically, all grid-year samples are split into N folds. N is determined by the number of Köppen–Geiger (KG) classes that have more than 100 grid cells with harvested areas. If a KG class has too few grid cells with harvested areas, it is not included in the cross-validation process. As N differs by crop and emulator, we will report it in the revised manuscript. For each fold of emulator training and validation, we withhold 10% of years (the last 3 years) and one entire KG class for validation, and the other grid-year samples are used for training the emulator. We think that selecting consecutive years for validation can avoid temporal autocorrelation. If we randomly selected 10% of years, the correlation between adjacent years would still exist. In fact, any three consecutive years would solve this problem, so we simply use the last three years, following the choice of Sweet et al. (2023).
We expect that results will change with this new cross-validation approach (see e.g. Sweet et al. 2023), and the presentation and discussion of the results will be adjusted accordingly. We still expect no qualitative change, i.e. that the revised emulators will facilitate lightweight estimates of yields and their year-to-year variability, albeit with lower accuracy than originally reported.
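The “held out years and regions” split described in this response can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name `kg_year_folds`, the `(cell_id, kg_class, year)` sample layout, and the parameter names are hypothetical. The response does not say whether validation uses the intersection or the union of the held-out KG class and the held-out years; the sketch assumes the intersection for validation and trains only on samples that belong to neither, discarding the rest.

```python
def kg_year_folds(samples, min_cells=100, n_holdout_years=3):
    """Build one fold per eligible Köppen-Geiger (KG) class.

    samples: list of (cell_id, kg_class, year) tuples, assumed to contain
             only grid cells with harvested area (hypothetical layout).
    Returns a list of (kg_class, train_indices, valid_indices).
    """
    # Count distinct grid cells per KG class.
    cells_per_class = {}
    for cell_id, kg_class, _year in samples:
        cells_per_class.setdefault(kg_class, set()).add(cell_id)

    # Keep only classes with more than `min_cells` grid cells.
    eligible = sorted(
        kg for kg, cells in cells_per_class.items() if len(cells) > min_cells
    )

    # Hold out the last consecutive years (3 years in the response).
    years = sorted({year for _cell, _kg, year in samples})
    held_years = set(years[-n_holdout_years:])

    folds = []
    for held_kg in eligible:
        # Assumption: validate on the held-out class within the held-out years.
        valid = [
            i for i, (_c, kg, year) in enumerate(samples)
            if kg == held_kg and year in held_years
        ]
        # Assumption: train on samples outside both the class and the years.
        train = [
            i for i, (_c, kg, year) in enumerate(samples)
            if kg != held_kg and year not in held_years
        ]
        folds.append((held_kg, train, valid))
    return folds
```

With this layout, the number of folds N equals the number of eligible KG classes, so it would differ by crop and emulator as noted above. Samples from the held-out class in training years, and from other classes in the held-out years, are used in neither set, which is one way to keep the validation samples separated from training in both space and time.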
L75-76 grammar typos
RE: We will revise as suggested.
L169: CDD is consecutive dry days not drought days
RE: We will revise as suggested.
L194-196: I’m not sure what a “spatial difference term” means or what the clarification of a “temporal constant growing season length” means. Is this a location fixed effect applied at the grid-cell level? Could you clearly write out an equation for the covariates going into your XGBoost model? This would help to clarify which variables are location invariant, which are time invariant, and which vary both with location and with time. Table 2 is good, but it doesn’t clarify which covariates change in space vs time. Alternatively you could add this information into Table 2.
RE: Thanks for your suggestions. By “spatial difference term” we meant “time-invariant” variables, and by “temporal constant growing season length” we meant “the number of days from planting date to maturity date given by the GGCMI phase 2 crop calendar input”. Both phrases will be corrected in the revision. We will also add a column to Table 2 to indicate whether each covariate varies in time, in space, or in both.
Line 270: remind readers what A0 and A1 simulations mean
RE: We will revise as suggested: “A0 denotes no adaptation and A1 denotes adaptation of the growing season to regain the original growing season length under warming scenarios that otherwise lead to accelerated phenology and thus shorter growing seasons.”
Table 3: consider adding a key for A0 and A1 as you have for W and Winf
RE: We will add a key to Table 3: “A0 denotes no adaptation and A1 denotes cultivar adaptation to regain the original growing season length under warming scenarios.”
Minor point: While the manuscript is generally clear and well written, grammar should be checked throughout the manuscript. There are typos throughout.
RE: Thanks for pointing this out. We will check for typos throughout the manuscript. After the revision, we will seek the help of a proofreading service to check the grammar throughout the manuscript.
Reviewer #2
The content of this study meets the requirements of this journal, and the paper is highly complete with exquisite images. However, there are certain deficiencies in using process-based models to simulate the impacts of current climate change. Therefore, exploring the use of artificial intelligence algorithms to improve the limitations of the current process-based models is a worthwhile direction, in my opinion, and has the potential for publication.
RE: Thanks for your comments. We will revise the manuscript and respond to your comments point-by-point as follows.
Line 34: To be frank, I didn't know what "per se" meant, so I looked it up and found that it is a commonly used term in written language. However, I am unsure if it is appropriate to use it here. I have been using English for over twenty years, and this is the first time I have come across this usage.
RE: Thanks for pointing this out. We will replace the phrase “models per se” with “raw models”.
Line 59-60: I am somewhat confused about the author's statement here. The author mentions that "the relationship between climate factors and crop yield is constrained by the current climate conditions." What does this mean? Aren't both of these models also constrained by the current climate conditions?
RE: Thanks for pointing it out. We intended to say that the statistical relationship can only reproduce the historical climate-yield relationship, so using it to project future crop yields is not convincing. We will revise the sentence to “the relationship between climate factors and crop yield is based on the historical climate conditions and their effects on crop yields, which can hardly be used for future projection with new, unprecedented climate conditions”.
Line 75: The readers are already aware that the scenario-based future crop yield projection is not a systematic perturbation of climate factors. What other limitations are there in the scenario-based future crop yield projection that should be mentioned here?
RE: Thanks for your suggestions. We will add some limitations of scenario-based future crop yield projection here.
Line 114: I think you can avoid mentioning soybean here. It is confusing why soybean is not included in your subsequent research.
RE: Thanks for your suggestions. We will remove the mention of soybean and highlight that the emulators are developed for major cereal crops.
Regarding Figure 3 and the following 1:1 graphs, while they are visually appealing, they provide limited information. We can see that the points align along the 1:1 line, but no statistical indicators are presented.
RE: Thanks for your suggestions. We will add correlation coefficients to Figure 3 and Figure 6.
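As a rough illustration of the statistical indicators discussed here, the correlation coefficient R and the mean absolute error between simulated and emulated yields could be computed as in the sketch below. The function names and the toy input values are hypothetical and are not taken from the manuscript; the standard Pearson definition of R is assumed.

```python
import math


def pearson_r(simulated, emulated):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(simulated)
    if n != len(emulated) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_s = sum(simulated) / n
    mean_e = sum(emulated) / n
    # Sums of cross-products and squared deviations from the means.
    s_se = sum((s - mean_s) * (e - mean_e) for s, e in zip(simulated, emulated))
    s_ss = sum((s - mean_s) ** 2 for s in simulated)
    s_ee = sum((e - mean_e) ** 2 for e in emulated)
    if s_ss == 0 or s_ee == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return s_se / math.sqrt(s_ss * s_ee)


def mean_absolute_error(simulated, emulated):
    """Mean absolute difference between two equal-length sequences."""
    if len(simulated) != len(emulated) or not simulated:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(s - e) for s, e in zip(simulated, emulated)) / len(simulated)


# Toy yields (hypothetical values), e.g. one grid cell over four years.
simulated_yield = [3.1, 2.8, 3.6, 3.0]
emulated_yield = [3.0, 2.9, 3.4, 3.1]
r_value = pearson_r(simulated_yield, emulated_yield)
mae_value = mean_absolute_error(simulated_yield, emulated_yield)
```

Reporting both values on a 1:1 plot is useful because R alone is insensitive to a constant bias or scaling, whereas the mean absolute error reflects how far the points sit from the 1:1 line.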
References
Roberts, D.R., Bahn, V., Ciuti, S., Boyce, M.S., Elith, J., Guillera-Arroita, G., Hauenstein, S., Lahoz-Monfort, J.J., Schröder, B., Thuiller, W., Warton, D.I., Wintle, B.A., Hartig, F., Dormann, C.F., 2017. Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography (Cop.). 40, 913–929. https://doi.org/10.1111/ecog.02881
Sweet, L., Müller, C., Anand, M., Zscheischler, J., 2023. Cross-validation strategy impacts the performance and interpretation of machine learning models. Artif. Intell. Earth Syst. 1–35. https://doi.org/10.1175/AIES-D-23-0026.1
Citation: https://doi.org/10.5194/gmd-2023-74-AC1