The multiple linear regression modelling algorithm ABSOLUT v1.0 for weather-based crop yield prediction and its application to Germany at district level
Abstract. ABSOLUT v1.0 is an adaptive algorithm that exploits correlations between time-aggregated weather data and crop yields to predict yields. At its core, district-specific multiple linear regressions predict the annual crop yield from four weather aggregates and a linear trend in time. In contrast to other statistical yield prediction methods, the input weather features are not predefined or based on a limited number of observed correlations. Instead, all of their possible combinations are exhaustively tested for maximum explanatory power in all districts of the modelling domain. Principal weather variables (such as temperature, precipitation, or sunshine duration) are aggregated over two to six consecutive months within the 12 months preceding the harvest. This gives 45 potential input features per original weather variable (11, 10, 9, 8, and 7 windows of two, three, four, five, and six months, respectively). In a first step, this large pool of candidate features is reduced to those very likely to hold explanatory power for observed yields. The second, computationally demanding step makes out-of-sample predictions for all districts with all possible combinations of the remaining features. Step three selects the seven combinations of four different weather features that have the highest explanatory power averaged over the districts. Finally, the best-performing regression among these seven is chosen separately for each district and used for that district's predictions, and the results can be spatially aggregated. To evaluate the new approach, ABSOLUT v1.0 is applied to predict the yields of ten major crops at the district level in Germany, based on two decades of yield and weather data from about 300 districts. When aggregated to the national level, the predictions explain 70–90 % of the observed interannual variance, depending on crop type and the time frame considered. District-level performance maps for winter wheat and silage maize show that districts with more than 40 % explained variance cover about two thirds of the country.
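The feature construction and selection steps described above can be illustrated with a small, self-contained sketch. It is not the authors' R implementation and rests on several assumptions that the abstract does not specify. Each window aggregate is taken as the mean of monthly values. "Out-of-sample" is implemented as leave-one-year-out refitting. Explanatory power is scored as 1 − SSE/SST of those held-out predictions. Step one (pre-filtering) is not implemented; the list `candidates` simply stands in for its hypothetical survivors. All function names (`aggregation_windows`, `loo_score`, `select`) and the synthetic weather and yield data are illustrative.

```python
import random
from itertools import combinations

N_MONTHS = 12  # months preceding harvest; index 0 = earliest month (assumed ordering)


def aggregation_windows(min_len=2, max_len=6, n_months=N_MONTHS):
    """Every run of 2-6 consecutive months as (start, length): 11+10+9+8+7 = 45 windows."""
    return [(start, length)
            for length in range(min_len, max_len + 1)
            for start in range(n_months - length + 1)]


def aggregate(monthly, window):
    """Mean of one monthly weather series over a window (mean is an assumption)."""
    start, length = window
    return sum(monthly[start:start + length]) / length


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_fit(rows, y):
    """Least-squares coefficients from the normal equations X^T X b = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def loo_score(features, y):
    """Leave-one-year-out skill of intercept + linear trend + features, as 1 - SSE/SST."""
    n = len(y)
    rows = [[1.0, float(t)] + [f[t] for f in features] for t in range(n)]
    preds = []
    for t in range(n):  # refit without year t, then predict year t
        coef = ols_fit(rows[:t] + rows[t + 1:], y[:t] + y[t + 1:])
        preds.append(sum(c * v for c, v in zip(coef, rows[t])))
    mean_y = sum(y) / n
    sse = sum((yi - p) ** 2 for yi, p in zip(y, preds))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst


def select(districts, candidates, n_keep=7, n_features=4):
    """Rank all 4-feature combinations by mean score over districts, keep the top n_keep,
    then choose the best of those separately for each district."""
    combos = list(combinations(candidates, n_features))
    scores = {c: {d: loo_score([feats[w] for w in c], y)
                  for d, (feats, y) in districts.items()}
              for c in combos}
    mean_score = {c: sum(s.values()) / len(s) for c, s in scores.items()}
    top = sorted(combos, key=mean_score.get, reverse=True)[:n_keep]
    best = {d: max(top, key=lambda c: scores[c][d]) for d in districts}
    return top, best


# Synthetic demo: 3 hypothetical districts, 20 years, one weather variable.
rng = random.Random(0)
windows = aggregation_windows()
candidates = [(0, 2), (3, 3), (6, 2), (8, 4), (2, 5), (7, 5)]  # hypothetical step-one survivors
true_combo = tuple(candidates[:4])  # windows that actually drive the synthetic yields
districts = {}
for d in range(3):
    weather = [[rng.gauss(0, 1) for _ in range(N_MONTHS)] for _ in range(20)]
    feats = {w: [aggregate(year, w) for year in weather] for w in windows}
    y = [50 + 0.3 * t + sum((i + 1) * feats[w][t] for i, w in enumerate(true_combo))
         + rng.gauss(0, 0.05) for t in range(20)]
    districts[f"district_{d}"] = (feats, y)

top_seven, best_by_district = select(districts, candidates)
print(len(windows), top_seven[0], best_by_district["district_0"])
```

The sketch uses only the standard library, so the least-squares fit is written out by hand. The pre-filtering step matters for cost. A single weather variable already yields C(45, 4) = 148,995 four-feature combinations, and each one needs a full set of out-of-sample refits in every district. In the demo, shrinking the pool to six candidates leaves only 15 combinations.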
ABSOLUT v.1.0 Input data for an example application on the districts of Germany https://doi.org/10.5281/zenodo.4468691
Model code and software
ABSOLUT v.1.0 R programs https://doi.org/10.5281/zenodo.4468609