the Creative Commons Attribution 4.0 License.
The multiple linear regression modelling algorithm ABSOLUT v1.0 for weather-based crop yield prediction and its application to Germany at district level
Abstract. ABSOLUT v1.0 is an adaptive algorithm that uses correlations between time-aggregated weather data and crop yields for yield prediction. At its core, locally (i.e. district-) specific multiple linear regressions are used to predict the annual crop yield based on four weather aggregates and a linear trend in time. In contrast to other statistical yield prediction methods, the input weather features are neither predefined nor based on a limited number of observed correlations; instead, they are exhaustively tested for maximum explanatory power across all of their possible combinations in all districts of the modelling domain. Principal weather variables (such as temperature, precipitation, or sunshine duration) are aggregated over two to six consecutive months from the 12 months preceding the harvest. This gives 45 potential input features per original weather variable. In a first step, this zoo of possible input features is reduced to those very probably holding explanatory power for observed yields. The second, computationally demanding step makes out-of-sample predictions for all districts with all possible combinations of the remaining features. Step three selects the seven combinations of four different weather features that have the highest explanatory power averaged over the districts. Finally, the district-specific best-performing regression among these seven is used for district predictions, and the results can be spatially aggregated. To evaluate the new approach, ABSOLUT v1.0 is applied to predict the yields of ten major crops at the district level in Germany based on two decades of yield and weather data from about 300 districts. When aggregated to the national level, the predictions explain 70–90 % of the observed variance between years, depending on crop type and time frame considered. District-level performance maps for winter wheat and silage maize show areas with > 40 % variance explanation covering about two thirds of the country.
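The figure of 45 features per weather variable follows from window arithmetic: a window of k consecutive months fits into 12 months at 13 - k positions, so k = 2, ..., 6 gives 11 + 10 + 9 + 8 + 7 = 45. The Python sketch below enumerates these windows. The month indexing (1 = earliest of the 12 months before harvest) and averaging as the aggregation rule are illustrative assumptions, not taken from the ABSOLUT code; a monthly sum would be an equally plausible rule for precipitation.

```python
N_MONTHS = 12             # look-back period: the 12 months preceding harvest
MIN_SPAN, MAX_SPAN = 2, 6  # aggregation window lengths in months


def aggregation_windows(n_months=N_MONTHS, min_span=MIN_SPAN, max_span=MAX_SPAN):
    """Return (first_month, last_month) pairs of all consecutive windows (1-based, inclusive)."""
    windows = []
    for span in range(min_span, max_span + 1):
        # A window of `span` months can start at n_months - span + 1 positions.
        for start in range(1, n_months - span + 2):
            windows.append((start, start + span - 1))
    return windows


def aggregate(monthly_values, window):
    """Average one weather variable over a window (assumed aggregation rule)."""
    first, last = window
    chunk = monthly_values[first - 1:last]
    return sum(chunk) / len(chunk)


windows = aggregation_windows()
print(len(windows))  # 45
```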
Status: closed
RC1: 'Comment on gmd-2021-21', Anonymous Referee #1, 16 Jun 2021
Review of paper « The multiple linear regression modelling algorithm ABSOLUT v1.0 for weather-based crop yield prediction and its application to Germany at district level » by Tobias Conradt
Major comments:
- I don’t know if GMD journal is different from the journals I usually work with, but to me, this paper is too much oriented towards the computer implementation of your codes. The organisation of your codes is not of special interest for the reader, description of the readers, etc. One line should be enough to tell the language (R) and the machine on which it was done, and this is enough. Because of this orientation of the paper towards a specific implementation of the code, the paper describes steps like describing a code. But in a paper, we want a synthesis, much shorter, with the methods that are used, some formulas if needed, and explain the general ideas and comments. In the current state, the paper is too much oriented towards a report, I am afraid. But if GMD is OK with this, then this is the decision of the editor.
- The main point of this paper is that a complete test of all the combinations of predictors is accomplished, whereas a traditional stepwise regression only tests some of them. I don’t think that this justifies a name for a particular method; you just need to say that you have an exhaustive search of the combinations of the predictors. This is enough. So the paper can largely be reduced.
- It is claimed that it is important to test all the combinations of predictors in order to find the best model. I actually think that the approach that is used has a major problem. We actually ran into similar difficulties and recently found the solution. Basically, since you are limited in your database (only 20 years, for instance), you use a leave-one-out (LOO) procedure to train on the database minus one sample and test on the left-out sample. This could be considered legitimate, but… by doing so, you actually choose your model (with a particular combination of predictors) based on the testing score of the LOO. So you use the testing base both to select your model and to estimate its generalisation ability. We have shown that this is not correct, because you are overtraining, and your generalisation score is not reliable. When using such a procedure, your results push you to use more inputs and more complex models. It is a good thing that you are using only a linear model, but still, your assessment of the generalisation is not reliable to my understanding. This is a very subtle thing, and many people make such a mistake. I would like to have your opinion on it, and maybe a solution.
- I actually think that modelling crop yield with a statistical model from a very small database of samples is a true challenge. Crop experts actually think that many variables are important for the development of the plant, but samples are just not enough to calibrate a complex statistical model (in terms of complexity or number of predictors). A true assessment of the generalisation errors should show you that very simple models (linear with 2 or 3 inputs) are actually the best we can do. The search for complexity is flawed, because no large historical record of crop yield is available.
Citation: https://doi.org/10.5194/gmd-2021-21-RC1
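The selection bias described in the third comment can be illustrated with a small synthetic experiment (hypothetical data, not from the paper): yields and 45 candidate aggregates are all pure noise, each candidate gets a leave-one-out R² from a one-predictor regression, and the best candidate is picked by that same score. Because the score that chooses the candidate is also the score reported for it, the reported value is the maximum of 45 noisy numbers and is therefore biased upward relative to a typical candidate, even though no real signal exists. The helper names are illustrative only.

```python
import random
from statistics import fmean


def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def loo_r2(x, y):
    """Leave-one-out R^2 of a one-predictor regression: 1 - SSE_loo / SST."""
    sse = 0.0
    for i in range(len(y)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        sse += (y[i] - (a + b * x[i])) ** 2
    my = fmean(y)
    return 1.0 - sse / sum((yi - my) ** 2 for yi in y)


# Hypothetical setting: 20 years of yields and 45 candidate aggregates, all pure noise.
rng = random.Random(42)
n_years, n_candidates = 20, 45
yields = [rng.gauss(0.0, 1.0) for _ in range(n_years)]
candidates = [[rng.gauss(0.0, 1.0) for _ in range(n_years)] for _ in range(n_candidates)]

scores = [loo_r2(x, yields) for x in candidates]
best = max(range(n_candidates), key=scores.__getitem__)
# Reporting scores[best] as the model's skill is exactly the reviewer's concern:
# the same LOO errors were used both to pick the candidate and to grade it.
print(f"candidate {best} selected, LOO R^2 = {scores[best]:.2f}")
```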
AC1: 'Reply on RC1', Tobias Conradt, 17 Jun 2021
Many thanks to the anonymous reviewer for pointing out important weaknesses of the paper in its current state. Here are my replies to the four points of concern:
I don’t know if GMD journal is different from the journals I usually work with, but to me, this paper is too much oriented towards the computer implementation of your codes. […] But in a paper, we want a synthesis, much shorter, with the methods that are used, some formulas if needed, and explain the general ideas and comments. […]
A colleague of mine also wondered whether so many technical details are of interest to the reader. I just thought a “model description paper” should include a comprehensive description of the code, but I have no problem adapting it.
Suggested solution: Subsections 2.1.1, 2.1.2, 2.2.3, and the methods section 3 will be stripped of technical details of the implementation such as package versions, file names, etc., and they will possibly be merged or restructured. Figure 1 (the flowchart) should be kept, though, in a simplified form. To retain the technical information, a “directions for use” leaflet containing these details will be added to the code and data repositories.
The main point of this paper is that a complete test of all the combinations of predictors is accomplished, whereas a traditional stepwise regression only tests some of them. I don’t think that this justifies a name for a particular method; you just need to say that you have an exhaustive search of the combinations of the predictors. This is enough. So the paper can largely be reduced.
I don't agree here. As explained in the introduction, there are always so many more decisions or steps made in statistical crop yield modelling that most applications deserve a name. Method naming is probably not so common with statistical modelling compared to other disciplines, but following your argument even dynamical climate models could be downplayed to just a lot of coupled differential equations.
I really would like to call my approach ABSOLUT which not only means exhaustive search for predictors in multiple linear regressions but also refers to all the other specifics like using two-to-six-month aggregates of weather data from the twelve months ahead of harvesting, the two-step approach of the predictor selection using binomial probabilities, or the decision to apply the locally best combinations of predictors out of a small set of globally selected combinations.
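Steps three and four of the selection, as summarized in the abstract (a small set of globally best combinations, then the locally best of these per district), can be sketched as follows. The function name `select_combinations`, the layout of the skill table and all labels are hypothetical; the skill scores are assumed to come from the out-of-sample predictions of step two, and the binomial screening of step one is not reproduced. In the Germany setup described in the preprint, `n_global` would be 7.

```python
from statistics import fmean


def select_combinations(skill, n_global=7):
    """Global top-n combinations by mean skill, then the local best per district.

    skill: {district: {combination: out-of-sample skill}}; higher is better.
    """
    # Only combinations evaluated in every district are comparable.
    combos = set.intersection(*(set(per_district) for per_district in skill.values()))
    mean_skill = {c: fmean(skill[d][c] for d in skill) for c in combos}
    # Descending mean skill; ties broken by the combination label for determinism.
    global_top = sorted(combos, key=lambda c: (-mean_skill[c], c))[:n_global]
    # Each district uses the best of the globally selected combinations.
    local_choice = {d: max(global_top, key=lambda c: skill[d][c]) for d in skill}
    return global_top, local_choice


# Hypothetical skill table for two districts and three combinations
skill = {
    "district_A": {"c1": 0.6, "c2": 0.4, "c3": 0.1},
    "district_B": {"c1": 0.2, "c2": 0.5, "c3": 0.3},
}
global_top, local_choice = select_combinations(skill, n_global=2)
print(global_top)    # ['c2', 'c1']
print(local_choice)  # {'district_A': 'c1', 'district_B': 'c2'}
```

Restricting each district to a short global list, rather than letting it pick freely from all combinations, is one way to limit how strongly district-level noise can drive the choice.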
Suggested solution: No change regarding the name, but I will try to give a more concise explanation of what it comprises besides the regressions, instead of rhetorically asking: “But what is so special about a couple of linear regressions […] that justifies […] naming it ABSOLUT?”
It is claimed that it is important to test all the combinations of predictors in order to find the best model. I actually think that the approach that is used has a major problem. We actually ran into similar difficulties and recently found the solution. Basically, since you are limited in your database (only 20 years, for instance), you use a leave-one-out (LOO) procedure to train on the database minus one sample and test on the left-out sample. This could be considered legitimate, but… by doing so, you actually choose your model (with a particular combination of predictors) based on the testing score of the LOO. So you use the testing base both to select your model and to estimate its generalisation ability. We have shown that this is not correct, because you are overtraining, and your generalisation score is not reliable. When using such a procedure, your results push you to use more inputs and more complex models. It is a good thing that you are using only a linear model, but still, your assessment of the generalisation is not reliable to my understanding. This is a very subtle thing, and many people make such a mistake. I would like to have your opinion on it, and maybe a solution.
Hit and sunk. You may have noticed that I was in principle aware of the problematic mixing of training and testing data, which only applied to the selection of the predictive features, while the resulting regressions were fitted to the training data only (leave-one-out validation). Therefore I wrote “quasi-out-of-sample”, e.g. in line 373 on page 16, but I did not even do so consistently, cf. “out-of-sample errors” in line 490 on page 24.
My approach to dealing with the resulting errors was sloppy, too: “[… T]he interval should be ±1.96 RMSE, but here a factor of 2.00 was used to account for the fact that all data have been used to determine the weather aggregate combinations in the regressions so that not a pure out-of-sample approach had been applied […]” (page 16, line 365ff.) I am still convinced that the possible overconfidence resulting from the incomplete separation between training and testing data is minimal, but you are right: it should not be done this way.
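The quoted rule can be put into numbers under the assumption of normally distributed errors: the two-sided 95 % quantile of the standard normal is about 1.96, and raising the factor to 2.00 widens the interval by only about 2 % (nominal coverage about 95.4 % instead of 95 %). The adjustment can therefore only absorb a correspondingly small underestimation of the true out-of-sample RMSE. The function names and the example forecast values below are hypothetical.

```python
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # two-sided 95 % quantile of N(0, 1), about 1.96


def prediction_interval(y_hat, rmse, factor=Z95):
    """Symmetric interval y_hat +/- factor * RMSE, assuming normal errors."""
    return y_hat - factor * rmse, y_hat + factor * rmse


def coverage(factor):
    """Nominal coverage of +/- factor * sigma under a normal error model."""
    return 2.0 * NormalDist().cdf(factor) - 1.0


print(f"z = {Z95:.4f}, coverage(2.00) = {coverage(2.0):.4f}")
print(f"widening from z to 2.00: {2.0 / Z95 - 1.0:.1%}")
# Hypothetical forecast of 7.0 t/ha with an RMSE of 0.5 t/ha:
print(prediction_interval(7.0, 0.5, factor=2.0))  # (6.0, 8.0)
```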
Of course I would like to learn about the solution you found recently, but I see the problem regarding your anonymity. Here is my own:
Suggested solution: I will change the algorithm to also determine the predictor combinations separately for each single forecast/hindcast, using only data from the remaining years (training set); this will become version 1.1 of ABSOLUT. I will then repeat all the testing with the German data and adjust the paper accordingly (provided the nice validation performance does not completely vanish).
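The proposed fix amounts to nested leave-one-out validation: for every target year, candidate combinations are ranked by an inner leave-one-out loop over the remaining years only, the winner is refitted on those years, and only then is the target year predicted. The minimal pure-Python sketch below assumes this reading; all helper names are hypothetical. For brevity it searches combinations of k single aggregates (k = 1 in the usage) plus intercept and linear time trend, rather than ABSOLUT's four-aggregate combinations, and it omits the global top-seven step.

```python
import random
from itertools import combinations
from statistics import fmean


def ols_fit(rows, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan)."""
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(a[r][col]))  # partial pivoting
        a[col], a[piv] = a[piv], a[col]
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


def design_row(features, combo, t):
    """Intercept, linear time trend, then the chosen weather aggregates."""
    return [1.0, float(t)] + [features[name][t] for name in combo]


def predict(coef, row):
    return sum(c * v for c, v in zip(coef, row))


def fit(features, y, combo, years):
    return ols_fit([design_row(features, combo, t) for t in years], [y[t] for t in years])


def loo_sse(features, y, combo, years):
    """Sum of squared leave-one-out errors of `combo`, using `years` only."""
    sse = 0.0
    for held in years:
        coef = fit(features, y, combo, [t for t in years if t != held])
        sse += (y[held] - predict(coef, design_row(features, combo, held))) ** 2
    return sse


def nested_loo_predictions(features, y, k=1):
    """Hindcast every year; the feature selection never sees the target year."""
    years = list(range(len(y)))
    preds = []
    for target in years:
        train = [t for t in years if t != target]
        # Inner loop: rank candidate combinations on the training years only.
        best = min(combinations(sorted(features), k),
                   key=lambda c: loo_sse(features, y, c, train))
        coef = fit(features, y, best, train)
        preds.append(predict(coef, design_row(features, best, target)))
    return preds


def r2(obs, pred):
    """Share of variance explained: 1 - SSE / SST."""
    m = fmean(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    return 1.0 - sse / sum((o - m) ** 2 for o in obs)


# Hypothetical data: one informative aggregate and three noise aggregates
rng = random.Random(1)
n_years = 12
features = {f"noise{j}": [rng.uniform(-1, 1) for _ in range(n_years)] for j in range(3)}
features["x_true"] = [rng.uniform(-1, 1) for _ in range(n_years)]
y = [1.0 + 0.5 * t + 2.0 * features["x_true"][t] for t in range(n_years)]
preds = nested_loo_predictions(features, y, k=1)
print(f"nested LOO R^2 = {r2(y, preds):.3f}")
```

The price of the clean separation is computational: every candidate combination is refitted inside every outer fold.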
I actually think that modelling crop yield with a statistical model from a very small database of samples is a true challenge. Crop experts actually think that many variables are important for the development of the plant, but samples are just not enough to calibrate a complex statistical model (in terms of complexity or number of predictors). A true assessment of the generalisation errors should show you that very simple models (linear with 2 or 3 inputs) are actually the best we can do. The search for complexity is flawed, because no large historical record of crop yield is available.
Yes, the historical records are always too short or incomplete. ABSOLUT does, however, attempt to compensate for just that by considering many parallel realisations of the process in spatially distributed units (districts). It is not unique in this respect; simple panel models, for instance, are based on the same idea of combining spatial and temporal dimensions to broaden the data basis.
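To illustrate how pooling districts broadens the data basis, here is a generic fixed-effects (within) panel estimator: each district keeps its own intercept, while a single weather slope is estimated from all districts and years together. This is a textbook panel technique shown for comparison only, not ABSOLUT itself (which fits separate regressions per district and pools districts only when ranking combinations); the function name and data are hypothetical.

```python
from statistics import fmean


def within_slope(panel):
    """Common slope b of y = alpha_d + b * x + e, estimated by demeaning each district.

    panel: {district: [(x, y), ...]} with one (weather, yield) pair per year.
    """
    num = den = 0.0
    for obs in panel.values():
        # Removing district means eliminates the district-specific intercepts alpha_d.
        mx = fmean(x for x, _ in obs)
        my = fmean(y for _, y in obs)
        num += sum((x - mx) * (y - my) for x, y in obs)
        den += sum((x - mx) ** 2 for x, _ in obs)
    return num / den


# Hypothetical panel: districts differ in level but share the weather effect b = 2
panel = {
    "district_A": [(0.0, 5.0), (1.0, 7.0), (2.0, 9.0)],
    "district_B": [(1.0, 1.0), (3.0, 5.0), (4.0, 7.0)],
}
print(within_slope(panel))  # approximately 2.0
```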
Finally, ABSOLUT delivers regressions with just five inputs, but I agree that using only three or four may be better in some cases. See Table 4: especially the winter wheat predictions show practically no performance gains from more than three weather inputs.
Suggested solution: The fixation on time plus exactly four weather aggregates as input variables shall be relaxed in ABSOLUT v1.1. As the validation performance indicators R² and RMSE will no longer suffer from overconfidence biases, the most favourable, potentially smaller number of predictors can then be determined straightforwardly.
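Counting candidate predictor sets shows both why the first screening step is needed and what relaxing the fixed count of four costs. Since C(m, 4) dominates the smaller terms for m well above 4, allowing one to four aggregates instead of exactly four enlarges the search only by about 10 % for m = 45 (164,220 versus 148,995 sets). The number of weather variables (1, 2 or 3) in the loop is a hypothetical choice. Under nested leave-one-out with n years, each set is additionally refitted roughly n × (n - 1) times per district.

```python
from math import comb


def n_models(m, k_max, k_min=1):
    """Number of predictor sets with k_min..k_max features drawn from m candidates."""
    return sum(comb(m, k) for k in range(k_min, k_max + 1))


# 45 aggregates per weather variable; 1, 2 or 3 variables are hypothetical choices.
for n_vars in (1, 2, 3):
    m = 45 * n_vars
    print(m, n_models(m, 4, k_min=4), n_models(m, 4))
# first line printed: 45 148995 164220
```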
Citation: https://doi.org/10.5194/gmd-2021-21-AC1
RC2: 'Comment on gmd-2021-21', Anonymous Referee #2, 17 Sep 2021
This is a model-description paper. Weather-based crop yield prediction is critical for yield estimation and for climate change impact assessment and adaptation. Making the most of limited input features and their explanatory power is a key question to be addressed when developing a yield prediction. The major contribution of the study is, according to the author, program code by which the linear regression with the best explanatory power can be found based on exhaustive combinations of predictors. Such a practice has been widely used in developing empirical models (258 out of 362 studies, lines 78-79), except that few studies have developed a formal tool. In this respect, the novelty of this study is relatively weak compared with the standard of GMD.
Major comments:
In the “Introduction” section, the author first identifies the research gap that “the entirety of crop and landscape specific weather effects is hardly captured by the existing models”, but “crop and landscape specific weather effects” is not clear, because the weather effect consists of extreme adverse weather effects, weather trend and fluctuation effects, and the current linear regression model can capture the effect of annual and seasonal mean weather (Peng et al 2018). More specific information about the “weather effect” would help to emphasize the question to be addressed.
If the key question is to optimize the current linear regression model, the innovation of the study is debatable. As numerous studies have found that linear regression performs worse than machine learning algorithms in reproducing the spatio-temporal pattern of crop yield (Leng and Hall 2020, Cao et al 2021, Cai et al 2019, Zhang et al 2020), I think an optimized linear regression is not able to improve the yield prediction accuracy significantly. Thus, I suggest revising the research gap.
After introducing the research gap, nine questions related to yield prediction are provided, but the key question addressed in this paper is still not clear to me. Also, I think the literature review is not strongly associated with the question this paper intends to address. This study focuses on weather-based yield prediction, but there are papers uniting remote sensing data and weather data to predict yield, which may be related to question 4; however, considering too many non-meteorological factors would depart from the goal of weather-based yield prediction.
In the “Materials” and “Method” sections, it is necessary to provide some information about the weather-related covariates and their correlation with crop yield, which can help us understand the primary response of crop yield to climate change established in the model.
Reproducing the impact of climate extremes is becoming increasingly important in yield prediction models. As the weather data are monthly, I wonder whether the ABSOLUT model can predict the effect of climate extremes, which are a major source of yield loss in Europe (Trnka et al 2014). From the results of Figure 5, the model underestimates the impact of drought in Northern Germany, and the district-level performance is not good.
The comparison with other prediction approaches should not be limited to yield forecasts in Germany but should also include yield predictions across the major breadbaskets of the world (Cai et al 2019, Li et al 2021), so that readers can judge whether ABSOLUT is an advanced yield prediction method globally. Also, more information about the uncertainty of the model should be provided.
Specific comments:
Line 22: the full name should be added when the ABSOLUT first mentioned.
Line 26: the equation of the ABSOLUT should not be moved to “Method” section.
Figure 2 can demonstrate the model performance at country level, but why not provide the results for maize simultaneously?
Line 160: why use some ellipsis?
References
Cai Y, Guan K, Lobell D, Potgieter A B, Wang S, Peng J, Xu T, Asseng S, Zhang Y, You L and Peng B 2019 Integrating satellite and climate data to predict wheat yield in Australia using machine learning approaches Agric. For. Meteorol. 274 144–59
Cao J, Zhang Z, Tao F, Zhang L, Luo Y, Zhang J, Han J and Xie J 2021 Integrating Multi-Source Data for Rice Yield Prediction across China using Machine Learning and Deep Learning Approaches Agric. For. Meteorol. 297
Leng G and Hall J W 2020 Predicting spatial and temporal variability in crop yields: An inter-comparison of machine learning, regression and process-based models Environ. Res. Lett. 15 44027
Li L, Wang B, Feng P, Wang H, He Q, Wang Y, Liu D L, Li Y, He J, Feng H, Yang G and Yu Q 2021 Crop yield forecasting and associated optimum lead time analysis based on multi-source environmental data across China Agric. For. Meteorol. 308–309 108558
Peng B, Guan K, Pan M and Li Y 2018 Benefits of Seasonal Climate Prediction and Satellite Data for Forecasting U.S. Maize Yield Geophys. Res. Lett. 45 9662–71 Online: https://doi.org/10.1029/2018GL079291
Trnka M, Rötter R P, Ruiz-Ramos M, Kersebaum K C, Olesen J E, Žalud Z and Semenov M A 2014 Adverse weather conditions for European wheat production will become more frequent with climate change Nat. Clim. Chang. 4 637–43
Zhang L, Zhang Z, Luo Y, Cao J and Tao F 2020 Combining optical, fluorescence, thermal satellite, and environmental data to predict county-level maize yield in China using machine learning approaches Remote Sens. 12
Citation: https://doi.org/10.5194/gmd-2021-21-RC2
AC2: 'Reply on RC2', Tobias Conradt, 24 Sep 2021
Many thanks also to the second anonymous referee for valuable input! Here are my replies, after the referee's comments highlighted in bold:
In the “Introduction” section, the author first identifies the research gap that “the entirety of crop and landscape specific weather effects is hardly captured by the existing models”, but “crop and landscape specific weather effects” is not clear, because the weather effect consists of extreme adverse weather effects, weather trend and fluctuation effects, and the current linear regression model can capture the effect of annual and seasonal mean weather (Peng et al 2018). More specific information about the “weather effect” would help to emphasize the question to be addressed.
Agreed. Especially yield losses caused by weather extremes are hardly mirrored in modelling yet, and this gap is indeed not targeted by the present study. I propose to change the sentence in question – lines 21f, page 1 – into: “The full spectrum of potentially yield-relevant meteorological averages in varying seasonal time windows is however rarely scrutinized by the existing models; the same holds for landscape-specific weather response patterns of different crops.”
If the key question is to optimize the current linear regression model, the innovation of the study is debatable. As numerous studies have found that linear regression performs worse than machine learning algorithms in reproducing the spatio-temporal pattern of crop yield (Leng and Hall 2020, Cao et al 2021, Cai et al 2019, Zhang et al 2020), I think an optimized linear regression is not able to improve the yield prediction accuracy significantly. Thus, I suggest revising the research gap.
In the comparison of the model performance with a linear model using predefined input weather aggregates, the performance gain seemed to be massive; see Section 4.6.2 with Figs 7–9. As I will show in my final comment, this was, as Reviewer no. 1 had already suspected, largely owing to using the out-of-sample data for the automated selection of regression variables. I wonder whether less constrained machine learning is prone to falling into a similar trap, as the highly specific, nonlinear patterns it identifies are probably even harder to reproduce uniformly from different sets of input years. In any case, my hypothesis for this paper is that the full potential of linear regression models is regularly not exhausted because of sub-optimal input factor selection. I will clarify that in the introduction and also put my exhaustive testing approach into perspective relative to the recent machine learning studies; thanks for the references!
After introducing the research gap, nine questions related to yield prediction are provided, but the key question addressed in this paper is still not clear to me. Also, I think the literature review is not strongly associated with the question this paper intends to address. This study focuses on weather-based yield prediction, but there are papers uniting remote sensing data and weather data to predict yield, which may be related to question 4; however, considering too many non-meteorological factors would depart from the goal of weather-based yield prediction.
The nine questions were not meant to define the research question; they should just illustrate how many decisions are left to the modeller applying “multiple linear regression”, making any resulting model setup highly individual. Their formatting as a numbered list, however, makes them stand out in an unfortunate way. I will blend them into the normal text paragraphs and emphasize the research question(s) instead.
In the “Materials” and “Method” sections, it is necessary to provide some information about the weather-related covariates and their correlation with crop yield, which can help us understand the primary response of crop yield to climate change established in the model.
This could only be done in a very general way, because there are dozens of candidate variables (time-averages of meteorological variables over different seasonal periods), and their predictive power varies between crops and regions. The setup described in the Methods section is deliberately designed to start without any prior information about variable–yield correlations.
Some observations from the Germany example can, however, be discussed in the following sections: The overhauled version of the model shows a clear difference in predictive power between Western and Eastern Germany. It only works satisfactorily in the east, where crop performance is much more limited by water availability. However, not only precipitation but also temperature and sunshine are correlated with water availability through evapotranspiration. High temperatures in early summer negatively affect crop fertility; this is relevant in the south-west but not so in Northern Germany.
Reproducing the impact of climate extremes is becoming increasingly important in yield prediction models. As the weather data are monthly, I wonder whether the ABSOLUT model can predict the effect of climate extremes, which are a major source of yield loss in Europe (Trnka et al 2014). From the results of Figure 5, the model underestimates the impact of drought in Northern Germany, and the district-level performance is not good.
Drought, unlike extreme precipitation, is very well represented in monthly weather data. It is true that the errors in individual districts are high, which is related to the high variability in soils and landscape characteristics. The regional focus of the drought-related yield losses was correctly determined, though, and this has been reflected in the text, cf. lines 431–434 on page 19 and 452–454 on page 20. The general underestimation of the drought impact is more likely caused by the absence of a comparable drought situation in the other years on whose observations the 2018 prediction is based, cf. lines 458f on page 21.
The comparison with other prediction approaches should not be limited to yield forecasts in Germany but should also include yield predictions across the major breadbaskets of the world (Cai et al 2019, Li et al 2021), so that readers can judge whether ABSOLUT is an advanced yield prediction method globally. Also, more information about the uncertainty of the model should be provided.
I do agree, but such an extensive evaluation would no longer fit into this model description paper, which already seems a bit long. As GMD states in its guidelines for model description papers (https://www.geoscientific-model-development.net/about/manuscript_types.html#item1): “Where evaluation is very extensive, a separate paper focussed solely on this aspect may be submitted.”
Specific comments:
Line 22: the full name should be added when the ABSOLUT first mentioned.
OK, no problem.
Line 26: the equation of the ABSOLUT should not be moved to “Method” section.
Should be or should not be? I guess there was originally something like “should not appear in the Introduction” in the editing process, so I will move the equation to the Methods where it indeed seems better placed.
Figure 2 can demonstrate the model performance at country level, but why not provide the results for maize simultaneously?
Good idea, which I will follow. I will also juxtapose the ABSOLUT results with those from the Gornott and Wechsung approach for direct comparison in Figure 7.
Line 160: why use some ellipsis?
The ellipsis is meant to supplement the original German title in italics with the English translation. The same format is used in line 158.
* * *
Citation: https://doi.org/10.5194/gmd-2021-21-AC2
AC3: 'Final Author Comment on gmd-2021-21', Tobias Conradt, 24 Sep 2021
A first remarkable output of the overhauled ABSOLUT 1.1 version
The big difference from the 1.0 version, on which the preprint under discussion was based, is the selection of input features (weather aggregates) in an out-of-sample manner. In the former 1.0 version, individual input feature combinations were selected and fixed for each district only once (based on all available yield data), and only the coefficient and yield estimates were based on censored data not including the yields of the actual year to predict (out-of-sample). The overhauled 1.1 version also determines the variable combinations for the district regressions separately for each target year, using only observations from the remaining years.
Performance losses had to be expected, but they are quite dramatic. The triple-map graphic [full-resolution PDF attached] is an extension of Figure 8 in the preprint, visualizing shares of winter wheat yield variances explained out-of-sample at district level. The left and middle panels repeat the original figure: these are the performance maps for the Gornott and Wechsung and the ABSOLUT 1.0 approaches, the latter being only quasi-out-of-sample. The right panel shows what remains for the true out-of-sample estimations of ABSOLUT 1.1, which is at least still a little better than the Gornott and Wechsung performance.
I think this is highly instructive, and I will still show and discuss the bogus performance of ABSOLUT 1.0 in the revised manuscript: The choice of explanatory variables is as relevant for the observed model performance as the actual information content of the available regressors. It is possible that many weather-based yield prediction approaches are in fact less accurate than reported in the literature, because the available training data are rarely censored already in the model development phase. This could also explain the obvious performance losses of the Gornott and Wechsung approach compared with the results presented in their 2016 paper.
As this point is so central – thanks again to referee no. 1 for breaking the window – I suggest the altered title: “Choosing multiple linear regressions for weather-based crop yield prediction with ABSOLUT v1.1 – Initial tests for the districts of Germany and an over-confidence trap in statistical modelling.” Completing the extensive revision of the manuscript will take some extra time, but I hope it can still be finally published in GMD.
Status: closed
-
RC1: 'Comment on gmd-2021-21', Anonymous Referee #1, 16 Jun 2021
Review of paper « The multiple linear regression modelling algorithm ABSOLUT v1.0 for weather-based crop yield prediction and its application to Germany at district level » by Tobias ConradtMajor comments:- I don’t know if GMD journal is different from the journals I usually work with, but to me, this paper is too much oriented towards the computer implementation of your codes. The organisation of you codes is not of special interest for the reader, description of the readers, etc. One line should be enough to tell the language (R), machine on what it was done, and this is enough. Because of this orientation of the paper towards a specific implementation of the code, the paper describes steps like describing a code. But in a paper, we want a synthesis, much shorter, with the methods that are used, some formulas if needed, and explain the general ideas and comments. In the current state, the paper is too much oriented towards a report I am afraid. But if the GMD is OK with this, then this is the decision of the editor.- The main point of this paper is that a complete test of all the combinations of predictors is accomplished, where a traditional step-wise regression only test some of them. I don’t think that this justifies a name for a particular method, you just need to say that you have an exhaustive search of the combinations of the predictors. This is enough. So the papier can largely be reduced.- It is claimed that it is important to test all the combinations of predictors in order to find the best model. I actually think that the approach that is used has a major problem. We actually run into similar difficulties and found recently the solution. Basically, since you are limited in your database (only 20 years for instance) then you use leave-one-out procedure to train in the database minus one sample, and test it on the left sample. 
This could be considered legitimate, but… by doing so, you actually chose your model (with a particular combination of predictors) based on the testing score of the LOO. So you use the testing base to select and estimate the generalisation ability of your model. We have shown that this is not correct, because you are overtraining, and your generalisation score is not reliable. When using such a procedure, your results push you to use more inputs, and more complex models. It is a good thing that you are using only a linear model, but still, your assessment of the generalisation is not reliable to my understanding. This is a very subtile thing, and many people do such a mistake. I would like to have your opinion on it, and maybe a solution.- I actually think that modeling crop yield with a statistical model from a very small database of samples is a true challenge. Crop expert actually think that many variables are important for the development of the plant, but actually, samples are just not enough to calibrate a complex statistical model (in terms of complexity or number of predictors). A true assessment of the generalisation of the errors should show you that very simple models (linear with 2 or 3 inputs) are actually what we can do the best. The search for complexity is flawed, because no large historical record of crop yield is available.Citation: https://doi.org/
10.5194/gmd-2021-21-RC1 -
AC1: 'Reply on RC1', Tobias Conradt, 17 Jun 2021
Many thanks to the anonymous reviewer for pointing out important weaknesses of the paper in its current state. Here are my replies to the four points of concern:
I don't know if GMD is different from the journals I usually work with, but to me, this paper is too much oriented towards the computer implementation of your codes. […] But in a paper, we want a synthesis, much shorter, with the methods that are used, some formulas if needed, and an explanation of the general ideas and comments. […]
A colleague of mine also wondered whether so many technical details are of interest to the reader. I just thought a “model description paper” should include a comprehensive description of the code, but I have no problem adapting it.
Suggested solution: Subsections 2.1.1, 2.1.2, 2.2.3, and the methods section 3 will be stripped of technical details of the implementation such as package versions, file names, etc., and they will eventually be merged or restructured. Figure 1 (the flowchart) should be kept, though, but in a simplified form. To retain the technical information, a “directions for use” leaflet containing these details will be added to the code and data repositories.
The main point of this paper is that a complete test of all the combinations of predictors is accomplished, whereas a traditional step-wise regression only tests some of them. I don't think that this justifies a name for a particular method; you just need to say that you performed an exhaustive search over the combinations of the predictors. This is enough. So the paper can be largely reduced.
I don't agree here. As explained in the introduction, statistical crop yield modelling always involves so many further decisions or steps that most applications deserve a name. Method naming is probably less common in statistical modelling than in other disciplines, but following your argument, even dynamical climate models could be downplayed to just a lot of coupled differential equations.
I would really like to call my approach ABSOLUT, which not only means an exhaustive search for predictors in multiple linear regressions but also refers to all the other specifics. These include using two-to-six-month aggregates of weather data from the twelve months ahead of harvesting, the two-step approach to predictor selection using binomial probabilities, and the decision to apply the locally best combinations of predictors out of a small set of globally selected combinations.
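The two-to-six-month aggregation windows can be enumerated as sketched below. This is a minimal illustration that rests on assumptions not stated here. Months are indexed 0 to 11 within the twelve months preceding harvest. Each window is aggregated by a simple mean, although a sum would serve equally well for precipitation. The function names are hypothetical. The sketch reproduces the count of 45 candidate features per principal weather variable given in the abstract.

```python
from math import comb


def aggregation_windows(n_months=12, min_len=2, max_len=6):
    """Return (start, end) index pairs (0-based, inclusive) of every window of
    min_len..max_len consecutive months inside an n_months period."""
    windows = []
    for length in range(min_len, max_len + 1):
        for start in range(n_months - length + 1):
            windows.append((start, start + length - 1))
    return windows


def aggregate(monthly_values, window):
    """Mean of the monthly values falling into one (start, end) window."""
    start, end = window
    selected = monthly_values[start:end + 1]
    return sum(selected) / len(selected)


windows = aggregation_windows()
print(len(windows))  # 11 + 10 + 9 + 8 + 7 = 45 windows

# Hypothetical case of three principal variables (e.g. temperature,
# precipitation, sunshine duration) before any screening step:
n_features = 3 * len(windows)
print(comb(n_features, 4))  # 13232835 four-feature combinations
```

The second number suggests why a screening step comes first: three variables alone would already give over 13 million four-feature combinations to test in every district.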
Suggested solution: No change regarding the name, but I will try to give a more concise explanation of what it comprises besides the regressions, instead of rhetorically asking: “But what is so special about a couple of linear regressions […] that justifies […] naming it ABSOLUT?”
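The binomial-probability screening mentioned in the reply on naming could, for example, work as sketched below. This is an illustrative reading, not the ABSOLUT code. It assumes that, for each candidate feature, one counts the districts in which the feature's correlation with yield passes a significance threshold alpha. A feature is then kept only if that count would be very unlikely if each district were a chance hit with probability alpha. The thresholds, feature names and function names are hypothetical. Treating districts as independent also ignores spatial correlation.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(K >= k) for K ~ Binomial(n, p), summed exactly term by term.
    (Fine for a few hundred districts; very large n would overflow floats.)"""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def screen_features(hit_counts, n_districts, alpha=0.05, p_max=0.001):
    """Keep features whose number of 'significant' districts would be
    improbable (tail probability < p_max) under a chance-only null in
    which every district is a hit with probability alpha."""
    return [
        name
        for name, hits in hit_counts.items()
        if binomial_tail(hits, n_districts, alpha) < p_max
    ]


# About 300 districts: 15 chance hits are expected at alpha = 0.05.
counts = {"temp_May_Jun": 40, "prec_Jan_Feb": 16}  # hypothetical counts
print(screen_features(counts, n_districts=300))  # ['temp_May_Jun']
```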
It is claimed that it is important to test all the combinations of predictors in order to find the best model. I actually think that the approach that is used has a major problem. We ran into similar difficulties and recently found the solution. Basically, since your database is limited (only 20 years, for instance), you use a leave-one-out (LOO) procedure to train on the database minus one sample and test on the left-out sample. This could be considered legitimate, but… by doing so, you actually choose your model (with a particular combination of predictors) based on the LOO testing score. So you use the testing base both to select your model and to estimate its generalisation ability. We have shown that this is not correct, because you are overtraining, and your generalisation score is not reliable. When using such a procedure, your results push you to use more inputs and more complex models. It is a good thing that you are using only a linear model, but still, to my understanding, your assessment of the generalisation is not reliable. This is a very subtle point, and many people make this mistake. I would like to have your opinion on it, and maybe a solution.
Hit and sunk. You may have noticed that I was in principle aware of the improper mixing of training and testing data. It only affected the selection of the predictive features, while the resulting regressions were fitted to the training data only (leave-one-out validation). Therefore I wrote “quasi-out-of-sample”, e.g. in line 373 on page 16, but I did not even do so consistently, cf. “out-of-sample errors” in line 490 on page 24.
My approach to deal with the resulting errors was sloppy, too: “[… T]he interval should be ±1.96 RMSE, but here a factor of 2.00 was used to account for the fact that all data have been used to determine the weather aggregate combinations in the regressions so that not a pure out-of-sample approach had been applied […]” (page 16, line 365ff.) I am still convinced that the possible overconfidence resulting from the incomplete separation between training and testing data is minimal, but you are right, it should not be done this way.
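The interval rule quoted above can be written out as follows. This is a minimal sketch that assumes approximately normal, unbiased prediction errors, so that ±1.96 RMSE covers about 95 % of outcomes. The function names and sample values are hypothetical. The `factor` argument shows the difference between the nominal 1.96 and the inflated 2.00 used in v1.0.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error of paired observations and predictions."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def prediction_interval(y_hat, rmse_value, factor=1.96):
    """Symmetric interval y_hat +/- factor * RMSE (about 95 % for factor 1.96
    if errors are roughly normal with standard deviation close to the RMSE)."""
    half_width = factor * rmse_value
    return y_hat - half_width, y_hat + half_width


err = rmse([7.1, 6.4, 8.0], [6.9, 6.8, 7.5])
print(prediction_interval(7.2, err))               # nominal factor 1.96
print(prediction_interval(7.2, err, factor=2.00))  # v1.0 safety margin
```

The factor only rescales the interval. If the RMSE itself is underestimated because features were selected on the test years, no fixed factor reliably compensates for that.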
Of course I would like to learn about the solution you found recently, but I see the problem regarding your anonymity. Here comes my
Suggested solution: I will change the algorithm so that the predictor combinations are also determined separately for each single forecast/hindcast, using only data from the remaining years (training set); this shall become version 1.1 of ABSOLUT. I will then repeat all the testing with the German data and adjust the paper accordingly (provided the nice validation performance does not completely vanish).
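The difference between the v1.0 validation and the planned v1.1 validation can be illustrated with pure-noise data. This is a hedged sketch, not the ABSOLUT code. It assumes a single district, ordinary least squares with an intercept (the linear time trend is omitted), and predictor combinations of a fixed size k. All names are hypothetical. `naive_score` selects the combination by its leave-one-out R² on all years, as criticised by the reviewer. `nested_score` repeats the selection inside every fold, using only the remaining years.

```python
import itertools

import numpy as np


def _design(X):
    """Prepend an intercept column to a feature matrix."""
    return np.column_stack([np.ones(len(X)), X])


def loo_predictions(X, y):
    """Leave-one-out predictions of an OLS fit with intercept."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        coef, *_ = np.linalg.lstsq(_design(X[keep]), y[keep], rcond=None)
        preds[i] = (_design(X[i:i + 1]) @ coef)[0]
    return preds


def r2(y, preds):
    """Share of variance explained; negative if worse than the mean."""
    return 1.0 - np.sum((y - preds) ** 2) / np.sum((y - y.mean()) ** 2)


def best_combo(X, y, k):
    """Column combination of size k with the highest LOO R² on (X, y)."""
    combos = list(itertools.combinations(range(X.shape[1]), k))
    scores = [r2(y, loo_predictions(X[:, list(c)], y)) for c in combos]
    j = int(np.argmax(scores))
    return combos[j], scores[j]


def naive_score(X, y, k):
    """v1.0-like: select on all years, then report that combination's LOO R²."""
    return best_combo(X, y, k)[1]


def nested_score(X, y, k):
    """v1.1-like: for each target year, select the combination using the
    remaining years only, then predict the target year."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        combo, _ = best_combo(X[keep], y[keep], k)
        cols = list(combo)
        coef, *_ = np.linalg.lstsq(_design(X[keep][:, cols]), y[keep], rcond=None)
        preds[i] = (_design(X[i:i + 1, cols]) @ coef)[0]
    return r2(y, preds)


rng = np.random.default_rng(1)
X = rng.normal(size=(20, 8))  # 20 years, 8 pure-noise "weather aggregates"
y = rng.normal(size=20)       # yields unrelated to X
print(naive_score(X, y, 2), nested_score(X, y, 2))
```

With such noise data, the naive score is typically clearly higher than the nested one, which tends to stay around or below zero even though the features carry no information.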
I actually think that modeling crop yield with a statistical model from a very small database of samples is a true challenge. Crop experts think that many variables are important for the development of the plant, but the samples are just not enough to calibrate a complex statistical model (in terms of complexity or number of predictors). A true assessment of the generalisation errors should show you that very simple models (linear with 2 or 3 inputs) are actually the best we can do. The search for complexity is flawed, because no large historical record of crop yield is available.
Yes, the historical records are always too short or incomplete. ABSOLUT does, however, attempt to compensate for exactly that by considering many parallel realisations of the process in spatially distributed units (districts). It is not unique in this respect; simple panel models, for instance, are based on the same idea of combining the spatial and temporal dimensions to broaden the data basis.
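One place where this pooling over districts enters is the global-then-local choice of predictor combinations described in the reply on naming. A small set of combinations is chosen by mean validation score across all districts, and each district then uses its own best member of that set. Below is a minimal sketch with hypothetical names and made-up scores; the default of seven global combinations is taken from the abstract. Under the v1.1 logic, the scores fed in would themselves have to be computed without the target year.

```python
def select_combinations(scores, n_global=7):
    """scores maps combination -> {district: validation score}.

    Returns the n_global combinations with the highest mean score over all
    districts and, for each district, its best combination among them."""

    def mean_score(combo):
        values = list(scores[combo].values())
        return sum(values) / len(values)

    global_set = sorted(scores, key=mean_score, reverse=True)[:n_global]
    districts = next(iter(scores.values())).keys()
    local_best = {
        d: max(global_set, key=lambda combo: scores[combo][d]) for d in districts
    }
    return global_set, local_best


# Made-up R² values for three combinations in two districts.
scores = {
    "A": {"d1": 0.5, "d2": 0.1},
    "B": {"d1": 0.2, "d2": 0.6},
    "C": {"d1": 0.1, "d2": 0.1},
}
print(select_combinations(scores, n_global=2))
# (['B', 'A'], {'d1': 'A', 'd2': 'B'})
```

In practice, the combination keys would be tuples of weather aggregates rather than single letters.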
Finally, ABSOLUT delivers regressions with just five inputs, but I agree that using only three or four may be better in some cases. See Table 4: the winter wheat predictions in particular show practically no performance gains from more than three weather inputs.
Suggested solution: The restriction to time plus exactly four weather aggregates as input variables will be relaxed in ABSOLUT v1.1. As the validation performance indicators R² and RMSE will no longer suffer from overconfidence biases, the most favourable, potentially smaller number of predictors can then be determined straightforwardly.
Citation: https://doi.org/10.5194/gmd-2021-21-AC1
RC2: 'Comment on gmd-2021-21', Anonymous Referee #2, 17 Sep 2021
This is a model-description paper. Weather-based crop yield prediction is critical for yield estimation and for climate change impact assessment and adaptation. Leveraging the limited input features and their explanatory power is a key question to be addressed when developing a yield prediction. The major contribution of the study is, according to the author, program code by which the linear regression with the best explanatory power can be found based on exhaustive combinations of predictors. Such a practice has been widely used in developing empirical models (258 out of 362 studies, lines 78-79), except that few studies have developed a formal tool. In this respect, the novelty of this study is relatively weak compared with the standard of GMD.
Major comments:
In the “Introduction” section, the author first identifies the research gap that “the entirety of crop and landscape specific weather effects is hardly captured by the existing models”. However, the notion of “crop and landscape specific weather effects” is not clear, because weather effects comprise extreme adverse weather effects, weather trends and fluctuation effects, and the current linear regression model can capture the effect of annual and seasonal mean weather (Peng et al 2018). More specific information about “weather effect” would help to emphasize the question to be addressed.
If the key question is to optimize the current linear regression model, the innovation of the study is controversial. Numerous studies have found that linear regression does not perform as well as machine learning algorithms in reproducing the spatio-temporal pattern of crop yield (Leng and Hall 2020, Cao et al 2021, Cai et al 2019, Zhang et al 2020). I therefore think that an optimized linear regression will not improve the yield prediction accuracy significantly. Thus, I suggest revising the research gap.
After the research gap is introduced, nine questions related to yield prediction are listed, but the key question to be addressed in this paper is still not clear to me. Also, I think the literature review is not strongly associated with the question this paper intends to address. This study focuses on weather-based yield prediction. There are papers combining remote sensing data and weather data to predict yield, which may be related to question 4, but considering too many non-meteorological factors would depart from the goal of weather-based yield prediction.
In the “Materials” and “Method” sections, it is necessary to provide some information about the relevant weather covariates and their correlation with crop yield, which can help us understand the primary response of crop yield to climate change established in the model.
Reproducing the impact of climate extremes is becoming increasingly important in yield prediction models. As the weather data are monthly, I wonder whether the ABSOLUT model can predict the effect of climate extremes, which are a major source of yield loss in Europe (Trnka et al 2014). According to the results in Figure 5, the model underestimates the impact of drought in Northern Germany, and the district-level performance is not good.
The comparison with other prediction approaches should not be limited to yield forecasts in Germany but should also include yield predictions across the major breadbaskets of the world (Cai et al 2019, Li et al 2021). This would let readers judge whether ABSOLUT is an advanced yield prediction method across the globe. Also, the uncertainty of the model should be addressed in more detail.
Specific comments:
Line 22: the full name should be added when the ABSOLUT first mentioned.
Line 26: the equation of the ABSOLUT should not be moved to “Method” section.
Figure 2 demonstrates the model performance at country level, but why not show the results for maize at the same time?
Line 160: why use some ellipsis?
References
Cai Y, Guan K, Lobell D, Potgieter A B, Wang S, Peng J, Xu T, Asseng S, Zhang Y, You L and Peng B 2019 Integrating satellite and climate data to predict wheat yield in Australia using machine learning approaches Agric. For. Meteorol. 274 144–59
Cao J, Zhang Z, Tao F, Zhang L, Luo Y, Zhang J, Han J and Xie J 2021 Integrating Multi-Source Data for Rice Yield Prediction across China using Machine Learning and Deep Learning Approaches Agric. For. Meteorol. 297
Leng G and Hall J W 2020 Predicting spatial and temporal variability in crop yields: An inter-comparison of machine learning, regression and process-based models Environ. Res. Lett. 15 044027
Li L, Wang B, Feng P, Wang H, He Q, Wang Y, Liu D L, Li Y, He J, Feng H, Yang G and Yu Q 2021 Crop yield forecasting and associated optimum lead time analysis based on multi-source environmental data across China Agric. For. Meteorol. 308–309 108558
Peng B, Guan K, Pan M and Li Y 2018 Benefits of Seasonal Climate Prediction and Satellite Data for Forecasting U.S. Maize Yield Geophys. Res. Lett. 45 9662–71 Online: https://doi.org/10.1029/2018GL079291
Trnka M, Rötter R P, Ruiz-Ramos M, Kersebaum K C, Olesen J E, Žalud Z and Semenov M A 2014 Adverse weather conditions for European wheat production will become more frequent with climate change Nat. Clim. Chang. 4 637–43
Zhang L, Zhang Z, Luo Y, Cao J and Tao F 2020 Combining optical, fluorescence, thermal satellite, and environmental data to predict county-level maize yield in China using machine learning approaches Remote Sens. 12
Citation: https://doi.org/10.5194/gmd-2021-21-RC2
AC2: 'Reply on RC2', Tobias Conradt, 24 Sep 2021
Many thanks also to the second anonymous referee for valuable input! Here are my replies, after the referee's comments highlighted in bold:
In the “Introduction” section, the author first identifies the research gap that “the entirety of crop and landscape specific weather effects is hardly captured by the existing models”. However, the notion of “crop and landscape specific weather effects” is not clear, because weather effects comprise extreme adverse weather effects, weather trends and fluctuation effects, and the current linear regression model can capture the effect of annual and seasonal mean weather (Peng et al 2018). More specific information about “weather effect” would help to emphasize the question to be addressed.
Agreed. Yield losses caused by weather extremes, in particular, are still hardly reflected in models, and this gap is indeed not targeted by the present study. I propose to change the sentence in question (lines 21f, page 1) into: “The full spectrum of potentially yield-relevant meteorological averages in varying seasonal time windows is however rarely scrutinized by the existing models; the same holds for landscape-specific weather response patterns of different crops.”
If the key question is to optimize the current linear regression model, the innovation of the study is controversial. Numerous studies have found that linear regression does not perform as well as machine learning algorithms in reproducing the spatio-temporal pattern of crop yield (Leng and Hall 2020, Cao et al 2021, Cai et al 2019, Zhang et al 2020). I therefore think that an optimized linear regression will not improve the yield prediction accuracy significantly. Thus, I suggest revising the research gap.
In the comparison of the model performance with a linear model using pre-defined input weather aggregates, the performance gain seemed to be massive; see Section 4.6.2 with Figs 7–9. As I will show in my final comment, this was, as Reviewer no. 1 had already suspected, largely owing to the use of the test years, which should have remained out-of-sample, in the automated selection of the regression variables. I wonder whether less constrained machine learning is prone to fall into a similar trap. The highly specific, nonlinear patterns it identifies are probably even harder to reproduce consistently from different sets of input years. Anyhow, my hypothesis for this paper is that the full potential of linear regression models is often not exploited because of sub-optimal selections of input factors. I will clarify that in the introduction and also put my exhaustive testing approach into perspective relative to the recent machine learning studies. Thanks for the references!
After the research gap is introduced, nine questions related to yield prediction are listed, but the key question to be addressed in this paper is still not clear to me. Also, I think the literature review is not strongly associated with the question this paper intends to address. This study focuses on weather-based yield prediction. There are papers combining remote sensing data and weather data to predict yield, which may be related to question 4, but considering too many non-meteorological factors would depart from the goal of weather-based yield prediction.
The nine questions were not meant to define the research question. They should just illustrate how many decisions are left to the modeller applying “multiple linear regression”, which makes any resulting model setup highly individual. Formatted as a numbered list, however, the questions stand out in an unfortunate way. I will blend them into the normal text paragraphs and emphasize the research question(s) instead.
In the “Materials” and “Method” sections, it is necessary to provide some information about the relevant weather covariates and their correlation with crop yield, which can help us understand the primary response of crop yield to climate change established in the model.
This could only be done in a very general way, because there are dozens of candidate variables (time-averages of meteorological variables over different seasonal periods), and their predictive power varies between crops and regions. The setup described in the Methods section is deliberately designed to start without any prior information about variable–yield correlations.
Some observations from the Germany example can, however, be discussed in the following sections. The overhauled version of the model shows a clear difference in predictive power between Western and Eastern Germany. It only works satisfactorily in the east, where crop performance is much more limited by water availability. However, not only precipitation but also temperature and sunshine are correlated with water availability through evapotranspiration. High temperatures in early summer negatively affect crop fertility; this is relevant in the south-west but not in Northern Germany.
Reproducing the impact of climate extremes is becoming increasingly important in yield prediction models. As the weather data are monthly, I wonder whether the ABSOLUT model can predict the effect of climate extremes, which are a major source of yield loss in Europe (Trnka et al 2014). According to the results in Figure 5, the model underestimates the impact of drought in Northern Germany, and the district-level performance is not good.
Drought, unlike extreme precipitation, is very well represented in monthly weather data. It is true that the errors in individual districts are high, which is related to the high variability in soils and landscape characteristics. The regional focus of the drought-related yield losses was correctly determined, though, and this has been reflected in the text, cf. lines 431–434 on page 19 and 452–454 on page 20. The general underestimation of the drought impact is more likely caused by the absence of a comparable drought situation in the other years on whose observations the 2018 prediction is based, cf. lines 458f on page 21.
The comparison with other prediction approaches should not be limited to yield forecasts in Germany but should also include yield predictions across the major breadbaskets of the world (Cai et al 2019, Li et al 2021). This would let readers judge whether ABSOLUT is an advanced yield prediction method across the globe. Also, the uncertainty of the model should be addressed in more detail.
I do agree, but such an extensive evaluation would not fit any more into this model description paper which already seems a bit long. As GMD state in their guidelines for model description papers (https://www.geoscientific-model-development.net/about/manuscript_types.html#item1): “Where evaluation is very extensive, a separate paper focussed solely on this aspect may be submitted.”
Specific comments:
Line 22: the full name should be added when the ABSOLUT first mentioned.
OK, no problem.
Line 26: the equation of the ABSOLUT should not be moved to “Method” section.
Should be or should not be? I guess there was originally something like “should not appear in the Introduction” in the editing process, so I will move the equation to the Methods where it indeed seems better placed.
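For orientation, a generic form of such a district regression is given below. It is consistent with the abstract's description of a linear trend in time plus four weather aggregates. The symbols are assumed for illustration and need not match the manuscript's notation. $Y_{d,t}$ is the yield in district $d$ and year $t$, $X_{k,d,t}$ is the $k$-th selected weather aggregate, and $\varepsilon_{d,t}$ is the residual.

```latex
Y_{d,t} = \beta_{0,d} + \beta_{1,d}\, t + \sum_{k=1}^{4} \gamma_{k,d}\, X_{k,d,t} + \varepsilon_{d,t}
```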
Figure 2 demonstrates the model performance at country level, but why not show the results for maize at the same time?
Good idea, which I will follow. I will also juxtapose the ABSOLUT results with those from the Gornott and Wechsung approach for direct comparison in Figure 7.
Line 160: why use some ellipsis?
The ellipsis is meant to supplement the original German title in italics with the English translation. The same format is used in line 158.
* * *
Citation: https://doi.org/10.5194/gmd-2021-21-AC2
AC3: 'Final Author Comment on gmd-2021-21', Tobias Conradt, 24 Sep 2021
A first remarkable output of the overhauled ABSOLUT 1.1 version
The big difference from the 1.0 version, on which the preprint under discussion was based, is the selection of input features (weather aggregates) in an out-of-sample manner. In the former 1.0 version, individual input feature combinations were selected and fixed for each district only once, based on all available yield data. Only the coefficient and yield estimates were based on censored data not including the yields of the actual year to predict (out-of-sample). The overhauled 1.1 version also determines the variable combinations for the district regressions separately for each target year, using only observations from the remaining years.
Performance losses were to be expected, but they are quite dramatic. The triple-map graphic…
[full-resolution PDF attached]
…is an extension of Figure 8 in the preprint, visualizing the shares of winter wheat yield variance explained out-of-sample at district level. The left and middle panels repeat the original figure. They are the performance maps for the Gornott and Wechsung approach and the ABSOLUT 1.0 approach, the latter being only quasi-out-of-sample. The right panel shows what remains for the true out-of-sample estimations of ABSOLUT 1.1, which are at least still a little better than the Gornott and Wechsung performance.
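The share of variance explained out-of-sample shown in such maps is commonly defined as below. This is an assumed standard definition, not quoted from the manuscript. Here $\hat{y}_t^{(-t)}$ is the prediction for year $t$ from a model built without year $t$, and $\bar{y}$ is the mean observed yield over the $T$ years. In the ABSOLUT 1.1 setting, building the model without year $t$ also includes selecting its weather aggregates. Unlike the in-sample R², this quantity becomes negative when the predictions are worse than simply using the mean.

```latex
R^2_{\mathrm{oos}} = 1 - \frac{\sum_{t=1}^{T} \left( y_t - \hat{y}_t^{(-t)} \right)^2}{\sum_{t=1}^{T} \left( y_t - \bar{y} \right)^2}
```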
I think this is highly instructive, and I will still show and discuss the bogus performance of ABSOLUT 1.0 in the revised manuscript: The choice of explaining variables is as relevant for the observed model performance as is the actual information content of the available regressors. It is possible that many weather-based yield prediction approaches are in fact less accurate than reported in the literature because the available training data are rarely censored already in the model development phase. This could also explain the obvious performance losses of the Gornott and Wechsung approach compared to the results presented in their 2016 paper.
As this point is so central – thanks again to referee no. 1 for breaking the window – I suggest the altered title: “Choosing multiple linear regressions for weather-based crop yield prediction with ABSOLUT v1.1 – Initial tests for the districts of Germany and an over-confidence trap in statistical modelling.” Completing the extensive revision of the manuscript will take some extra time, but I hope it can still be finally published in GMD.
Data sets
ABSOLUT v.1.0 Input data for an example application on the districts of Germany Tobias Conradt https://doi.org/10.5281/zenodo.4468691
Model code and software
ABSOLUT v.1.0 R programs Tobias Conradt https://doi.org/10.5281/zenodo.4468609
Cited
2 citations as recorded by crossref.
- Modeling crop yields amidst climate change in the Nile basin (2040–2079) S. Ahmed 10.1007/s40808-021-01199-0
- Climate variability impacts on crop yields and agriculture contributions to gross domestic products in the Nile basin (1961–2016): What did deep machine learning algorithms tell us? S. Ahmed et al. 10.1007/s00704-024-04858-1