Preprints
https://doi.org/10.5194/gmd-2021-218

Submitted as: methods for assessment of models 06 Aug 2021

Submitted as: methods for assessment of models | 06 Aug 2021

Review status: this preprint is currently under review for the journal GMD.

Using the leave-two-out method to determine the optimal statistical crop model

Thi Lan Anh Dinh and Filipe Aires
  • Sorbonne Université, Observatoire de Paris, Université PSL, CNRS, LERMA, 75014 Paris, France

Abstract. The use of statistical models to study the impact of weather on crop yield has continued to increase. Unfortunately, this type of application is characterised by datasets with a very limited number of samples (typically one sample per year). In general, statistical inference uses three datasets: the training dataset to optimise the model parameters, the validation dataset to select the best model, and the testing dataset to evaluate the model's generalisation ability. Because of these small sample sizes, splitting the overall database into three datasets is generally impossible in crop yield modelling. The leave-one-out cross-validation method, or simply leave-one-out (LOO), has been introduced to facilitate statistical modelling when the database is limited. However, with LOO the model choice is made using the testing dataset, which can be misleading because it favours unnecessarily complex models. Nested cross-validation was introduced in machine learning to avoid this problem by truly using three datasets, even for problems with limited databases. In this study, we propose one particular implementation of nested cross-validation, called the leave-two-out method (LTO), to choose the best model with an optimal complexity (using the validation dataset) and to estimate the true model quality (using the testing dataset). Two applications are considered: Robusta coffee in Cu M'gar (Dak Lak, Vietnam) and grain maize over 96 French departments. In both cases, LOO is misleading because it chooses overly complex models; LTO indicates that simpler models actually perform better when a reliable generalisation test is considered. The simple models obtained with the LTO approach have reasonable skill in forecasting yield anomalies for both crops. The LTO approach can also be used in seasonal forecasting applications. We suggest that the LTO method should become a standard procedure for statistical crop modelling.
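The abstract contrasts LOO, where the same held-out years both select the model and score it, with LTO, where an outer held-out year is used only for testing and an inner held-out year only for validation. The Python sketch below illustrates that distinction under our own assumptions: the candidate models (least-squares polynomials of increasing degree in one synthetic predictor), the squared-error score, the synthetic data and all function names (`fit_poly`, `loo_errors`, `loo_select`, `lto`) are hypothetical, and refitting the selected model on all n-1 outer-training samples is our reading of the procedure, not a detail confirmed by the abstract.

```python
"""Sketch: plain LOO versus leave-two-out (nested LOO) for choosing model complexity.

Hypothetical setup (not from the paper): candidate models are least-squares
polynomials of increasing degree in one predictor; the score is squared error.
"""
from statistics import mean


def fit_poly(xs, ys, degree):
    """Least-squares polynomial fit by solving the normal equations X^T X c = X^T y."""
    p = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(p)] for i in range(p)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(p)]
    # Forward elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            f = a[r][col] / a[col][col]
            for c in range(col, p):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coef = [0.0] * p
    for r in reversed(range(p)):
        coef[r] = (b[r] - sum(a[r][c] * coef[c] for c in range(r + 1, p))) / a[r][r]
    return coef


def predict(coef, x):
    return sum(c * x ** i for i, c in enumerate(coef))


def loo_errors(xs, ys, degree):
    """Squared error on each left-out sample, training on the other n-1 samples."""
    errs = []
    for i in range(len(xs)):
        coef = fit_poly(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], degree)
        errs.append((predict(coef, xs[i]) - ys[i]) ** 2)
    return errs


def loo_select(xs, ys, degrees):
    """Plain LOO: the same held-out samples choose the model AND score it."""
    scores = {d: mean(loo_errors(xs, ys, d)) for d in degrees}
    best = min(scores, key=scores.get)
    return best, scores[best]  # minimum over candidates, hence optimistic


def lto(xs, ys, degrees):
    """Leave-two-out as nested LOO: outer sample = test, inner sample = validation."""
    test_errs, chosen = [], []
    for i in range(len(xs)):
        # Outer loop: sample i is the test set and is never used for model choice.
        tx, ty = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        # Inner loop: LOO on the remaining n-1 samples (each fit uses n-2)
        # selects the complexity from validation errors only.
        best = min(degrees, key=lambda d: mean(loo_errors(tx, ty, d)))
        chosen.append(best)
        coef = fit_poly(tx, ty, best)  # assumption: refit on all n-1 samples
        test_errs.append((predict(coef, xs[i]) - ys[i]) ** 2)
    return mean(test_errs), chosen


if __name__ == "__main__":
    import random

    random.seed(0)
    n = 20  # e.g. one sample per year, as in crop-yield datasets
    xs = [i / (n - 1) * 2 - 1 for i in range(n)]  # synthetic weather index in [-1, 1]
    ys = [0.5 * x + random.gauss(0, 0.3) for x in xs]  # synthetic yield anomaly
    degrees = range(0, 5)
    d_loo, mse_loo = loo_select(xs, ys, degrees)
    mse_lto, chosen = lto(xs, ys, degrees)
    print(f"LOO picks degree {d_loo}, LOO MSE {mse_loo:.3f}")
    print(f"LTO test MSE {mse_lto:.3f}, degree chosen per outer fold: {chosen}")
```

Because the LOO score of the selected degree is the minimum over candidates of the very errors used to pick it, it tends to be optimistic; in the LTO loop the test sample never influences the choice, so its error is a fairer estimate of generalisation. The per-fold `chosen` list also shows how stable the complexity choice is across years.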

Status: open (until 30 Oct 2021)

Viewed

Total article views: 237 (including HTML, PDF, and XML)
  • HTML: 194
  • PDF: 38
  • XML: 5
  • Total: 237
  • BibTeX: 4
  • EndNote: 2
Views and downloads, including cumulative totals, are calculated since 06 Aug 2021.

Viewed (geographical distribution)

Total article views: 205 (including HTML, PDF, and XML). Of these, 205 have a defined geographic origin and 0 have an unknown origin.
Latest update: 28 Sep 2021
Short summary
We proposed the leave-two-out method (i.e., one particular implementation of nested cross-validation) to determine the optimal statistical crop model (using the validation dataset) and estimate its true generalisation ability (using the testing dataset). This approach is applied to two examples (Robusta coffee in Cu M'gar and grain maize in France). The results suggest that simpler models are more suitable for crop modelling when only a limited number of samples is available.