The multiple linear regression modelling algorithm ABSOLUT v1.0 for weather-based crop yield prediction and its application to Germany at district level

ABSOLUT v1.0 is an adaptive algorithm that uses correlations between time-aggregated weather data and crop yields for yield prediction. At its core, locally (i.e. district-) specific multiple linear regressions are used to predict the annual crop yield based on four weather aggregates and a linear trend in time. In contrast to other statistical yield prediction methods, the input weather features are not predefined or based on a limited number of observed correlations but they are exhaustively tested for maximum explanatory power across all of their possible combinations in all districts of the modelling domain. 5 Principal weather variables (such as temperature, precipitation, or sunshine duration) are aggregated over two to six consecutive months from the 12 months preceding the harvest. This gives 45 potential input features per original weather variable. In a first step, this zoo of possible input features is subset to those very probably holding explanatory power for observed yields. The second, computationally demanding step is making out-of-sample predictions for all districts with all possible combinations of the remaining features. Step three selects the seven combinations of four different weather features that have the highest 10 explanatory power averaged over the districts. Finally, the district-specific best performing regression among these seven is used for district predictions, and the results can be spatially aggregated. To evaluate the new approach, ABSOLUT v1.0 is applied to predict the yields of ten major crops at the district level in Germany based on two decades of yield and weather data from about 300 districts. When aggregated to the national level, the predictions explain 70–90 % of the observed variance between years depending on crop type and time frame considered. District-level performance maps for winter wheat and silage 15 maize show areas with >40 % variance explanation covering about two thirds of the country.

2 Materials 2.1 General requirements 2.1.1 Hard-and software 85 As ABSOLUT builds on exhaustive searches for optimal feature combinations in linear regressions, parallel execution of the code on a cluster computer using several dozen cores is advisable. A state-of-the-art single PC or notebook would probably need 2-3 days for the Germany example given all CPU cores (usually four) are hooked up.
The R software (https://www.r-project.org/) is required in version 3.5.1 or newer, and the following R packages must be in- Finally, the ncdf4 package (v.1.16.1) can be useful for preprocessing gridded weather data in NetCDF-format within R, it however requires a NetCDF system library.

Input data
Any ABSOLUT application needs a domain divided into many spatial subunits, henceforth called districts, for which there are individual crop yield time series and monthly weather data available. Longer yield time series are also required, and a 100 minimum requirement can be set to automatically exclude counties lacking enough data points from the calculation. Large numbers of districts (ideally more than 100) and years with yield data (preferably more than 20) are required for the selection of valid regression feature combinations (cf. sections 3.1.2 and 3.3).
Monthly weather data are needed for each district and should spatially correspond to the agricultural areas within the them.
The weather data should start at least one year before the yield records start and end not earlier than with the growing season 105 of the final year covered by the yield data, otherwise not all yield information can be considered. In detail, the modeller fixes the last calendar month whose weather data are to be considered for the growing seasons, and the model will evaluate weather-yield correlations in the twelve-month periods ending with this month. For example, if there are yield data for the years 1996-2015 and this "cut-off month" is set to May, the weather data must cover the period from June 1995 to May 2015 to utilize the complete yield data. 110 As it is not necessary to include the very end of the growing season (the actual harvest dates shift between years anyway), a timely cut-off and weather data reaching far enough into the current year can be used for pre-harvest yield forecasts (cf. Schauberger et al., 2017b, regarding the effects of shortened weather input). Eventual additional input of seasonal weather forecasts or climate scenario data may serve as basis for extended forward guidance or climate change impact studies. coastal and higher elevated regions in the west where 1000 mm year −1 and more are not an exception. Extreme rainfall events and hailstorms can affect agriculture in all parts of the country but have been observed more frequently in the south.
The most important factor for different cropping conditions is however a highly differentiated soil landscape: region-specific orogenetic and erosive processes including a number of glacial formations in Northern Germany and near the Alps provided the full spectrum of substrates from clays to sands and peat soils (cf. European Soil Bureau Network, 2005;Richter et al., 150 2007;Hennings, 2013). The challenging variety of regional cropping conditions and a decent availability of weather and yield observations qualify Germany as test environment for any kind of weather-related crop yield modelling.
Crop yield data were provided by the Statistical Offices of Germany through regional statistics table 41241-01-03-4 Erträge ausgewählter landwirtschaftlicher Feldfrüchte (yields of selected crops) accessible via https://www.regionalstatistik.de. A download made in May 2020 contained national, state, and district level data for ten crop species in the harvest years 1999-2019.
Additionally, table 41141-02-02-4 Anbau auf dem Ackerland . . . nach Fruchtarten (crop growing areas . . . by crop type) was 160 obtained to use the areas reported for 2016 as weighting factors in aggregating the estimated district yields to state and national averages.
Monthly weather data were provided by the German Weather Service (DWD) through their CDC OpenData portal. Point of departure were the 1-km grid products available via http://opendata.dwd.de/climate_environment/CDC/grids_germany/ monthly/. Last access was in September 2020 to complete continuous time series from January 1991 to August 2020. 165 The spatial distribution of agricultural areas within the districts has to be taken into account for determining the locally relevant weather conditions, especially for districts including both an agricultural lowland and a mountainous part (Conradt et al., 2016). The 2012 CORINE Land Cover data were used for this purpose, namely the 100-m raster product at version v20 (release date 1 May 2019) obtained from the Copernicus Land Monitoring Service via https://land.copernicus.eu/pan-european/ corine-land-cover/clc-2012?tab=download.

Preprocessing and actual input data
A special preprocessing action was required due to a history of district reorganisations (major ones took place in Eastern Germany in 2007Germany in , 2008Germany in , and 2011. For some federal states the original yield table contained data still based on historical district geometries while pre-reorganisation data from other states had already been resampled to the current geometries. A spatial homogenisation adjusted everything to the 2018 administrative geometries. Special care was taken where old districts 175 merged into new ones differed in their agricultural area shares or the delineation of new districts was only partly based on former boundaries. Spatial resampling of weather data to the agricultural areas within the districts is a preprocessing effort usually required for any application of ABSOLUT. One possibility would have been interpolating weather station data to area centroids (cf. Conradt et al., 2016), but here weighted averages of the DWD grid cells overlapping the croplands were used. First, 401 binary 180 raster maps (one per district) were made with cell values of one indicating cropland (CLC code 211 = non-irrigated arable land) located within the district and zero elsewhere. These maps were then resampled by averaging the cell values to the coarser 1-km grid of the DWD weather data in another map projection. For that operation the gdalwarp utility of GDAL was used with the -r average switch. The output grids were interpreted by an R script as weight matrices for resampling the gridded weather data. This worked for practically all districts, even cities with small shares of agriculture. Only five districts in the vicinity of 185 the Alps, the city of Suhl, and the forest-dominated mountainous domain of Siegen-Wittgenstein contained less than 1 km 2 assigned to CLC land use code 211. Class 231 (pastures) was used instead in these cases assuming a uniform interspersal with patches of cropland in the real landscape.
As these preprocessing steps are not part of the algoritm as such a directory with district weather data, the two CSV files, and a preconfigured absolutcontrol.dat are readily provided along the program code (Conradt, 2021b). It should be 190 noted that it is not necessary to reproduce the very weather variables present in the example input to run the model; the example application resorts to mean air temperature (tav), precipitation (pr), and sunshine duration (sund) as selected in absolutcontrol.dat while any alternative combination of monthly variables would be technically possible.

Methods
The R code of ABSOLUT consists of five programs that have to be run in their naming order: from 100_absolut.R to 195 500_absolut.R in steps of hundreds (Conradt, 2021a). The sectional workflow, delineated in Fig. 1, is partly owing to different stages of code development and was kept to allow for checks into the intermediate output/input files. The following subsections describe the purposes and main features of these programs; for implementation details the source code should be examined as such.

200
Program 100 is principally blunt exhaustive input variable testing for obtaining the "best" multiple linear regressions. Although the results are far from optimal and cannot be used for predictive purposes directly (see section 4.1.1 with Fig. 3) it contributes to narrowing the search window for regressions of higher predictive capability.

Initialization, preparation of weather input features
It starts with loading the libraries and reading a control file named absolutcontrol.dat whose contents are transferred 205 into a couple of control variables. This information includes paths to a directory of input weather data (one file per district) and an empty one for time-aggregated weather variables to be derived and used in the regressions. The names of the original weather input files are registered, and the yield data in yield-indat.csv is read into memory. The official yield  Figure 1. Flowchart of the ABSOLUT programs (yellow boxes) with input, intermediate, and output data. Coloured in green are the inputs which have to be provided to run the programs, any other data will be calculated. Principal outputs are tinted in red.
hold data for different crop species but only one of these selected via the control file is to be considered; modelling several species requires several runs of the full program sequence.

210
The program loops over all districts for which both climate and yield data are present. Districts are identified through fivedigit IDs, a case-specific setting that might need readjustment for other established key formats. The number of considered districts will usually be further reduced by the minimum requirement of yield data for the currently selected crop. Setting this minimum too low will include unreliable regressions while a high setting will unneccessarily throw out a lot of otherwise useful districts; it usually requires balancing between these downsides based on the actual data availability: Longer time series 215 allow for higher thresholds.
The "last month of weather data to be considered" to be set in the control file should be a calender month towards the end of the average growing season of the crop to be considered. Within each 12-months period ending in this month time aggregates of the chosen weather variables are calculated for periods of two to six months, these are called weather features. Using all possible start months this means 11 different two-month features per variable, ten three-month features, nine four-month aggregations over more than six months was a somehow arbitrary (and possibly sub-optimal) design decision taken for limiting the number of features.
The weather features are named by combining the meteorological variable acronym with two-digit numbers of the start and end months, thus pr1203 accordingly identifies precipitation in the four months from December to March. For each district 225 considered a file with a wide table of all weather features is stored in the designated directory.

Naïve exhaustive search and feature selection
For each district all possible combinations of three different weather features chosen from the pool generated in the previous step are used as input to multiple linear regressions targeting the available yield data. The three best-fitting feature combinations (determined by Pearson's r) are found by exhaustive search. In addition, out-of-sample predictions are evaluated: Yield data of 230 single years are subsequently removed from the input and the search for the best regressions is repeated based on the remaining weather-yield time series. Again, the best-performing feature combinations are recorded inclusive their yield predictions for the years with censored observations. The output from this excersize is collected in two files: absolut-100-output.dat containing virtually everything including the weather aggregates and the coefficients of determination for both the full data regressions (R 2 ) and the out-of-235 sample predictions (R 2 val ), and absolut-100-r2map containing part of the results conveniently formatted for GIS import and mapping.
As already mentioned, the best-fitting feature combinations found here cannot be directly used for reliable yield predictions, because the per-district information is not sufficient to both select a valid combination of regressors and at the same time estimate their coefficients; the textbook calculation of degrees of freedom and standard error only applies if there is no extra 240 freedom in choosing the regressors. This is why the exhaustive search for optimal regressions could be called naïve in the first place.
The better purpose of program 100 is however to subset the pool of all possible regression features to those very probably containing predictive power, and this is based on counting the number of occurrences of weather features among the "winning" combinations. So why exactly three features? Why always the three best-fitting regressions on the complete time series?

245
Once more, these are somewhat subjective choices. It is assumed that the preferred features will be similarly preferred for any multiple regression with 3-5 input features while regressions with only one or two features would suppress potential combination effects and unnecessarily limit the number of samples.
Accordingly all weather features used in the best-fitting regressions are sorted along their frequencies of use, and a relevance cutoff is determined based on binomial probabilities -features need to have been selected more often then they would have 250 occurred by pure chance with 99.9 % probability; the number of features above this threshold is henceforth denoted q. Details of the calculation are explained for the case study example (see section 4.1.1). Another small output file listing the q features, absolut-100-preselection.dat, is produced. Finally the district files containing the wide tables of weather features are rewritten, now constrained to these features as only those will be used in the remaining working steps.  3.3 Program 300 -"the gold pan" The logically following task addressed by this program is determining the optimal regression model for each spatial subunit.
Simply choosing the feature combinations producing the highest correlations per district ("local heroes") would however be misleading because there is not enough validating information in a single yield time series; any feature combination needs to be cross-validated by above-average performances in many districts.

280
The solution currently implemented is a compromise between resorting to a single "generalist" combination with the best average performance and a zoo of isolated regressions: The seven on-average best-performing combinations are determined globally, and in each district the locally best-performing regression out of these is implemented.
There are in fact a number of alternative selection methods implemented in the code, however deactivated in the distributed version. Among the tested methods whose results are reported for the example application (Tables 2 and 3) are:

285
Significantly elevated r: Inclusion of any combination with a significant shift towards higher-than-average r. This is determined by a rank-based approach; combinations which are significantly overrepresented in a the upper halves, thirds, quarters, and so on (down to five per cent) of the district rankings are globally selected. This may yield several thousand "allowed" feature combinations.
Optimal rank sum of seven: Out of the combinations validated in the previous approach seven are chosen such that the rank 290 sum of their best per-district correlations is minimized.
Global and local heroes: The 42 globally best performing combinations with the highest average r values are merged with those working exceptionally well in smaller subsets of the districts (down to 5 % of the districts). The idea behind is to account for special conditions in certain landscapes.
Best global: Finally the set of combinations is simply confined to a certain number of global top performers, this is what is 295 currently implemented for a number of seven.
Whatever confinement is used, the general principle is always subsetting the A possible combinations to a subset of C "allowed" combinations. As for each district the locally best-performing feature combination belonging to this subset is used, only c combinaions will be finally used, and it holds A > C ≥ c. And C = 7 of the example appication is chosen somewhat arbitrarily -why not use the 3, 21, or 42 best combinations? Or an alltogether different selection approach?  Among the outputs are time series plots of observations and predictions for the spatial aggregations, Fig. 2 shows an example.

Application to Germany: setup, results and discussion
The district-level administrative regions of Germany are identified by a five-digit key (Regionalschlüssel, "regional key", abbreviated RS) which was adopted for the application, e. g. in the file names for weather variables and features. Three principal weather variables were selected: average temperature, precipitation, and sunshine duration. The minimum of yield data per Yield in dt/ha  district had been set to 17 because the example dataset was limited to 21 years (1999-2019) and a higher requirement would have excluded many districts with incomplete observations.
The principal test crop was winter wheat, the last month for weather input before each year's harvest (typically in July or August) was set to June. For the secondary test crop silage maize the weather input season was set to end by August; maize harvest may occur late in the year but growth stagnates in autumn.  The average temperature towards the end of the growing season (temperature aggregates for May and June and March to June) stick out clearly; this is in full agreement to the often described temperature sensitivity of wheat during and after anthesis 340 (Akter and Islam, 2017;Farooq et al., 2011;Schauberger et al., 2017a). Consequently more than half of the aggregates shown in

Running programs 200 and 300
In the winter wheat example the big output table of program 200 has A = 37 4 = 66 045 numbered lines (below a header) for all possible input variable combinations, and 325 columns representing the districts for which enough yield data were available.

345
Negative correlation coefficients could be found in no less than 28 % of the table cells. The negative extreme in the example is a near perfect anticorrelation of r min = −0.964.
Given the number of approximately 20 out-of-sample regression estimates behind every correlation coefficient, the computational demand is significant. The winter wheat example took about 2 hours using 112 CPU cores in parallel.
The repeated estimation of parameters for the q = 37 weather features (in varying combinations) per district is still rather 350 noisy given only approximately 20 data points from each observed yield time series. Consequently 304 different combinations make up the best-performing regressions in the 325 districts of the winter wheat case, and despite an average correlation of r = 0.788 they perform better than the regressions from the initial "naïve" search ( Fig. 3b) but still very poorly when used for predictions outside the training period.
The input feature combinations leading to the highest average correlations over all spatial subunits as determined by pro-355 gram 300 are shown in Table 1. The individual district rankings among these finally decide which one is applied where (and if all of them are used at all -like in this example with C = c = 7. For winter wheat 13 different weather features are finally included, and the average correlation of the so determined district models isr = 0.597.

Running programs 400 and 500
Program 400 -utilizing the district-specific regressions determined in the previous step for the quasi-out-of-sample yield 360 estimations and target-year predictions -runs within seconds through all districts. Their spatial time-series aggregation towards Germany's national winter wheat yields, computed by program 500, is shown in Fig. 2. Figure 2 includes the R 2 val and root mean square error (RMSE) for the predictions: 0.795 and 2.39 dt ha −1 , respectively. The RMSE is used as basis for the 95 % confidence interval indicated by the green error bars. Assuming a normal distribution of the residuals, the interval should be ±1.96 RMSE, but here a factor of 2.00 was used to account for the fact that all data have 365 been used to determine the weather aggregate combinations in the regressions so that not a pure out-of-sample approach had been applied despite the regression parameters being estimted without the observed values in the target years. The rough and rather intutitive confidence correction serves the needs of this example framework but should eventually be refined.  (Table 3), and RMSE p shows also minima for the least restrictive "significantly elevated r" method, but it is clear that the lower RMSE g values connected to flexible regressor choices systematically overestimate their predictive performance.
The seven best globally performing combinations have been defined as standard approach despite the suboptimal results of this method in the 2018 silage maize case. This can be justified by comparably low errors in the true out-of-sample cases and 385 a relative parsimony of different feature combinations. Parsimony of different weather aggregates included adds to the latter, because the seven combinations of the best global approach require only about 13 different weather features while the seven combinations of the optimal rank sum approach regularly contain 20 or more.
However, this currently adopted standard might be revised in the future, cf. the minimal errors achieved for the target year forecasts of winter wheat yields using more than 200 district-specific combinations in the "significantly elevated r" approach 390 ( Table 2). The solution chosen here might not be optimal but is a viable selector for demonstrating the power of the exhaustive search approach.

The number of climate aggregate features
To fix the number of climate aggregate input features to d = 4 had been a fundamental design decision. Table 4 shows the goodness-of-fit metrics for two to five climate input variables for winter wheat, silage maize, and winter rape; now the complete 395 available input data (with the observed yields up to and including the year 2019) are used.
For winter wheat, three weather aggregates would have been sufficient, three show higher performance than two in all three indicators -the average correlation of predicted district yields with local observationsr, the coefficient of determination for the time series of national yield averages R 2 g , and the national forecast error level RMSE g . With four or five input weather aggregates the results deteriorate slightly.

400
The silage maize results support four weather inputs. Although there are monotonous performance improvements for each additional weather aggreagate regressor, the advance with going from four to five is too small to justify the additional computational burden. Winter rape yield estimations show a somehow opposite behaviour: While there is practically no performance change from two to four regressors, introducing a fifth one causes an upward jump. Here it probably plays a role that, as only 22 weather aggregates are preselected, even five variables mean just 26 334 different combinations, much fewer than there are 405 for wheat and maize with four variables.
As using four weather input variables hardly spoils the predictive power in cases where two or three weather inputs would suffice but clearly enhances the performance for silage maize while five variables often afford extreme computational demand -for winter wheat it took 112 CPU cores running 14 hours -four weather aggregates were set as standard. Making the number of weather inputs flexible remains an option worth considering for future model versions, though.

Selection of principal weather variables
As already mentioned above, mean air temperature, precipitation, and sunshine duration had been chosen as meteorological input. Other variables have not been systematically investigated for their predictive potential due to a number of reasons.
Temperature and precipitation are fundamental climatological variables definitely governing plant growth, they should always be considered. Aggregates of daily maximum or minimum temperatures are considerable alternatives for mean temperature, 415 but combinations of different temperature "flavours" should be avoided because of strong multicollinearity. Experimenting with temperature differences (diurnal temperature variations) would also be an option, but these are correlated with radiation.
Sunshine duration, chosen as proxy for photosynthetic active radiation, could also have been replaced by direct radiation measurements or cloud coverage, another proxy. Tests using global radiation instead of sunshine duration showed however gradual performance losses which may be related to differences in data acquisition: There are only few stations recording 420 radiation directly compared to those recording sunshine duration. Sunshine duration was preferred for another reason, toothe respective monthly DWD grid product becomes regularly available shortly after the turn of the month while the radiation product takes several days longer to be released. Wind was also tested occasionally but crop yields seem to be rather insensitive to it, the devastating effects of regional gales and local tornadoes hit too erratic to be captured by multi-month wind averages.
Data availability will generally determine the decision which variables to select. Climate scenario data may e. g. lack sun-425 shine duration and provide downwelling short radiation instead. Among the DWD data, evapotranspiration and especially soil wetness, estimated by the AMBAV model (Löpmeier, 2014;Friesland and Löpmeier, 2007), seem very promising for yield forecasting. They were deliberately cold-shouldered here as the general applicability of ABSOLUT with purely meteorological regressors should be demonstrated.

Regional performance 430
Using district regressions provides not only a basis for aggregate results but can also help identify spatial patterns despite the higher noise in single district results. In contrast to major agricultural zones of the world like the North China Plain or the US corn belt Germany is characterized by a high diversity of soil landscapes forming a distinct pattern of high and low yield regions (Hennings, 2013;Kruse, 2016). Hence it makes a difference whether the ability to predict spatial yield patterns is measured by absolute values or relative changes which is also demonstrated in the following.

435
The silage maize harvest forecast for the drought year 2018 was chosen due to extreme yield losses observed in Eastern Germany. For 2019 yield changes of winter wheat, the most frequently grown crop in Germany, are presented. Unless stated otherwise, quoted figures in this section refer to all of Germany.

Silage maize 2018 on district level
Silage maize became increasingly popular as fodder and energy crop over the last decades and is now grown on approximately Despite this high noise level, the maps of the predicted and observed relative yield changes in districts shown in Fig. 5 (a) and (b) show a general similarity of spatial patterns. There are regional misses along the coast to the Baltic Sea and in some western and south-eastern parts of the country, but the center of gravity of the strongest yield losses could correctly be located.
Panel (c) of Fig. 5 gives an impression of the regional distribution of prediction power through absolute RMSE values 455 calculated from the complete record of out-of-sample prediction errors. In general, the method works fine in Northern Germany and some parts of the south, but has some issues in western to central areas. The actual 2018 errors (d) were much larger in Winter wheat is Germany's most frequently grown crop covering approximately three million hectares of the agricultural areas.
The distributions of predicted and observed district yields for the year 2019, shown in Table 6, match reasonably well and there was also a certain general overprediction as with silage maize for 2018. The correlation between predicted and observed district yields is however a little higher, for the relative changes to the 2012-2017 yields Pearson's r reaches 0.518 (n = 253) 465 which means 26.8 % of the variance being explained.
The spatial distributions of predicted and reported changes shown in maps (a) and (b) of Fig. 6 are also rather similar, except that the magnitude of observed yield losses was not captured by the prediction. On the other hand, the situation in the coastal and western parts matches the predictions quite well contrasting the poor performance for the 2018 silage maize. In the south-east the results suffer from slight overestimations, but a local gradient towards the southern boundary is generally 470 matched.

Error compensation in spatial aggregates
By aggregating district results to regional averages some noise is filtered out by mutual error compensation and error lev-    In the case of the 2019 winter wheat predictions RMSE and MAE of district results (n = 280) are at 8.6 and 6.6 dt ha −1 , respectively; the state level RMSE and MAE are at 6.6 and 5.1 dt ha −1 , respectively; and the national aggregate prediction 480 missed the official national yield by 5.7 dt ha −1 . The 2018 and 2019 predictions were generally too optimistic, thus for other years without general biases better error reductions can be expected from spatial aggregations.
Predictions from linear regressions are usually associated with confidence intervals calculated from the standard errors and correlations of the parameter estimations. Hence it should theoretically be possible to spatially aggregate these estimates as well and accordingly give dynamic confidence intervals for state and national predictions of individual years. However, the 485 covariances between the district errors would have to be known for the uncertainty propagation calculation, and this is currently unachievable. The estimated error levels of district predictions hardly correspond to the errors actually observed in single years.
Each year exposes a general bias, and in addition to that the district prediction errors are locally correlated.
At the moment, the confidence intervals given for district and aggregate predictions on different levels (cf. the green bars in Fig. 2) are therefore based on the RMSE of the out-of-sample errors of past year predictions for the given prediction unit.

490
Under certain conditions (extreme weather) errors do however exceed the expected level by far, often concentrated in regional clusters as shown in Fig. 5 (d). The spatial correlation of errors should eventually be further investigated.   Gornott and Wechsung (2016) Experiments with district-based crop yield prediction in Germany through multiple regression using weather aggregates had already been presented by Gornott and Wechsung (2016). In contrast to the algorithm presented here only year-on-year (YoY) changes were considered saving the explicit estimation of an underlying linear trend. The study investigated different options 520 to couple the coefficients of the district models (paneling), an approach further pursued with cluster analysis (Conradt et al., 2016). The nearest equivalent to ABSOLUT are therefore the independent district regressions of Gornott and Wechsung (2016), called there "separate time series models" (STSMs), and the most fundamental difference is that all STSMs used the same set of input variables -predefined per crop -while ABSOLUT searches for some optimal combinations.
How powerful are the predefined input variables in terms of prediction accuracy compared to the combinations drawn 525 by ABSOLUT? To answer this question, the district regressions were charged with the weather variables originally used by Gornott and Wechsung (2016), the base trend estimation for absolute yield outputs was however left untouched to avoid side effects from the other methodological differences. The weather variables uniformly used in the STSMs were -For winter wheat: potential evapotranspiration from November to April and from May to July, temperature normalized radiation from May to July, and precipitation from November to April and from May to July;

530
-For silage maize: potential evapotranspiration from May to July and from August to October, temperature normalized radiation from May to July, and precipitation from May to july and from August to October.
The potential evapotranspiration had been calculated daily according to Haude (1955), cf. Schrödter (1985. The same holds for the temperature normalized radiation which is solar radiation in J cm −2 divided by the average air temperature in°C elevated by 20 to avoid negative denominators. All weather data were taken from meteorological stations with each district using the 535 data of the one closest to its centre, only in case of multiple stations located within the same district their measurements were averaged . To exclude these peculiar differences in weather data processing from the comparison all weather data are now consistently taken from the monthly DWD grids and preprocessed for the districts as described above in Sect. 2.2. Potential evapotranspiration is consequently calculated by AMBAV (Löpmeier, 2014;Friesland and Löpmeier, 2007), very probably closer to 540 reality than with Haude's formula. Only temperature normalized radiation is not originally computed on daily basis but using the monthly grids of global radiation and average temperature. The original Gornott and Wechsung (2016) approach used also fertilizer price and acreage of the target crop among their input variables which are simply omitted here.
The time series for national predictions of winter wheat in Germany calculated from the five predefined weather aggregates is shown in Fig. 7. Compared to the result of the ABSOLUT algorithm in Fig. 2   prediction of interannual changes. This is however clearly below the 90.8 % obtained with ABSOLUT.

Conclusions and outlook
Among many studies investigating the correlations between weather variables and agricultural yields and using these for yield prediction this one utilizes ordinary multiple linear regressions but massive parallelized computations for a systematic search of the most relevant input aggregates among tens of thousands of possible combinations. Test bed are about 300 district-level 560 administrative areas in Germany for which there are comprehensive high resolution weather data and individual yield statistics for ten different crops.  The first lesson learned was that the excessive testing and optimization of regressor combinations consumes degrees of freedom, thus predictive power, just like the estimation of many coefficients within the multiple regressions. As the time frame of example applications was relatively short (20-22 years from 1999 to 2018, 2019, or 2020) the solution was to not rely on 565 the top-performing combination of each district but to require significant above-average performances in many districts for a specific combination of weather aggregates to qualify as predictor. While this approach always requires many spatial subunits with respectively distributed training data the performance of the predictions clearly exceeded the results of precursor studies Conradt et al., 2016). The prediction performance achieved by the latter could be reconstructed with more recent data to some extent for silage maize but hardly at all for winter wheat -probably the formerly observed 570 correlations between weather and wheat yield detoriate under climate change and become replaced by others (to which the ABSOLUT algorithm will automatically adopt).
There are two critical spots of the present 1.0 version of the algorithm which are at the same time the biggest opportunities for improvements with future revisions: The first one is the assumption of a linear base trend of yields independently estimated The target year of this simulation was 2019.
for each spatial subunit. This allows straightforward prediction of absolute yields instead of relative changes, but as recent 575 observations do not support the assumption of a smooth long-term development any linear estimation basis will cause prediction biases especially if a long history of training data is applied (which is principally advantageous).
The other issue is the final selection of independent regressor variables, i.e. weather aggregates, for each spatial subunit.
The variants tested in Sect. 4.1.2 are probably just building blocks for a more intelligent way to select the most reliable combinations. The general challenge is about finding the optimum balance between the multiple, spatially distributed confirmations 580 of predictive power and the flexibility needed to adopt to smaller regions demanding alternative combinations for more exact predictions. The assumption of the correlation of weather aggregates to yields changing over time has been confirmed by other studies (Trnka et al., 2016;Ceglar et al., 2020) which calls for additional flexibility in handling long-term time series (a luxury problem).
A third, less critical opportunity for improvement has been identified in Sect. 4.5.3: An adaptive estimation of confidence 585 intervals for the spatial aggregates of yield predictions and for individual years. Some results suggest that standard errors of district predictions are hardly connected to actual prediciton errors for state and national yield aggregates. An obviously high spatial correlation of subunit errors calls for further investigation.
Nevertheless it could be demonstrated that the ABSOLUT algorithm, already in its present stage of development, is capable of explaining 70-90 % of the national yield variations of major crops in Germany solely based on weather variables, a much 590 higher share than usually assigned to weather signals alone: Global studies assessing the relative impact of weather factors on crop yield variations (Frieler et al., 2017;Schauberger et al., 2017b) give typical ranges of 50-60 % for wheat and maize yields of main producer countries, for wheat in the United States even as low as 30-40 %. A careful assessment separating the impacts of farm management and weather effects on wheat yield variations across Germany (Albers et al., 2017) found average shares of only 43 % of the variations caused by the weather and 49 % owing to management. These numbers are however averaged 595 from single farm samples and should not directly be compared to the ABSOLUT results; individual farm management effects are probably just cancelling out in the national averages of district yields. Considering also non-meteorological drivers is generally advisable in developing countries where factors like irrigation status, fertilizer price and general farming conditions are much more decisive (Assefa et al., 2020).
National aggregate yields of staple crops in Europe may however depend more on weather than previously assumed (cf. 600 Agnolucci and De Lipsis, 2020). Related questions about the impact of climate change on food security consequently underline the need for further research into this field.
Code and data availability. The model code, consisting of five R programs, is publicly available at https://doi.org/10.5281/zenodo.4468609 (Conradt, 2021a). The input data needed for the example application to Germany consist of crop yields and cropping areas in administrative regions, district-level monthly weather data, and a control file. They are publicly available at https://doi.org/10.5281/zenodo.4468691 (Conradt, 2021b).
Author contributions. The sole author Tobias Conradt conducted all parts of the work, including conceptualisation, data acquisition and preprocessing, software coding and testing, running the example computations, visualization, and manuscript writing.
Competing interests. The author declares that he has no conflict of interest.
Disclaimer. All parts of the ABSOLUT v1.0 program code are released under the GNU General Public License, Version 3 (https://www.