Articles | Volume 15, issue 9
https://doi.org/10.5194/gmd-15-3519-2022
© Author(s) 2022. This work is distributed under
the Creative Commons Attribution 4.0 License.
Nested leave-two-out cross-validation for the optimal crop yield model selection
Sorbonne Université, Observatoire de Paris, Université PSL, CNRS, LERMA, 75014 Paris, France
Filipe Aires
Sorbonne Université, Observatoire de Paris, Université PSL, CNRS, LERMA, 75014 Paris, France
Short summary
We propose the leave-two-out method (i.e. one particular implementation of nested cross-validation) to determine the optimal statistical crop model (using the validation dataset) and to estimate its true generalization ability (using the testing dataset). The approach is applied to two examples (robusta coffee in Cu M'gar and grain maize in France). The results suggest that simple models are more suitable for crop modelling when only a limited number of samples is available.
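The idea of holding out one sample for validation and another for testing can be illustrated with a minimal sketch. This is not the authors' code: the function names (`fit_predict`, `nested_leave_two_out`), the use of polynomial degree as the set of candidate models, and the synthetic data are all assumptions made for illustration. For each test sample, every remaining sample serves once as the validation sample, the candidate with the lowest summed validation error is selected, and that choice is then scored on the untouched test sample.

```python
import numpy as np


def fit_predict(x_train, y_train, x_new, degree):
    """Fit a polynomial of the given degree and predict at x_new."""
    coeffs = np.polyfit(x_train, y_train, degree)
    return np.polyval(coeffs, x_new)


def nested_leave_two_out(x, y, degrees):
    """Hypothetical helper: return (chosen degree per test sample, test RMSE)."""
    idx = np.arange(len(x))
    chosen, test_sq_err = [], []
    for t in idx:  # outer loop: sample held out for testing
        val_sse = {d: 0.0 for d in degrees}
        for v in idx[idx != t]:  # inner loop: sample held out for validation
            train = idx[(idx != t) & (idx != v)]  # the remaining n - 2 samples
            for d in degrees:
                pred = fit_predict(x[train], y[train], x[v], d)
                val_sse[d] += (pred - y[v]) ** 2
        # Model selection uses validation errors only.
        best = min(degrees, key=lambda d: val_sse[d])
        chosen.append(best)
        # Assumption: refit the selected model on all non-test samples
        # before scoring it on the test sample.
        rest = idx[idx != t]
        pred_t = fit_predict(x[rest], y[rest], x[t], best)
        test_sq_err.append((pred_t - y[t]) ** 2)
    return chosen, float(np.sqrt(np.mean(test_sq_err)))


# Synthetic small-sample example (made-up data, not from the study).
rng = np.random.default_rng(0)
x_demo = np.linspace(0.0, 1.0, 12)
y_demo = 2.0 * x_demo + 1.0 + rng.normal(scale=0.2, size=x_demo.size)
chosen_demo, rmse_demo = nested_leave_two_out(x_demo, y_demo, [1, 2, 4])
print("degree chosen per test sample:", chosen_demo)
print("test RMSE:", rmse_demo)
```

Because the test sample never influences which candidate is selected, the test RMSE is not biased by the selection step, whereas the validation error of the winning model would be optimistic.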