Articles | Volume 10, issue 9
Geosci. Model Dev., 10, 3519–3545, 2017

Development and technical paper 25 Sep 2017

Reverse engineering model structures for soil and ecosystem respiration: the potential of gene expression programming

Iulia Ilie1, Peter Dittrich2,3, Nuno Carvalhais1,4, Martin Jung1, Andreas Heinemeyer5, Mirco Migliavacca1, James I. L. Morison8, Sebastian Sippel1, Jens-Arne Subke6, Matthew Wilkinson8, and Miguel D. Mahecha1,3,7
  • 1Max Planck Institute for Biogeochemistry, Department Biogeochemical Integration, Hans-Knoell-Str. 10, 07745 Jena, Germany
  • 2Bio Systems Analysis Group, Institute of Computer Science, Jena Centre for Bioinformatics and Friedrich Schiller University, 07745 Jena, Germany
  • 3Michael Stifel Center Jena for Data-Driven and Simulation Science, 07745 Jena, Germany
  • 4CENSE, Departamento de Ciências e Engenharia do Ambiente, Faculdade de Ciências e Tecnologia, Universidade NOVA de Lisboa, Caparica, Portugal
  • 5Department of Environment, Stockholm Environment Institute, University of York, York, YO105NG, UK
  • 6Biological and Environmental Sciences, School of Natural Sciences, University of Stirling, Stirling, UK
  • 7German Centre for Integrative Biodiversity Research (iDiv), Deutscher Platz 5e, 04103 Leipzig, Germany
  • 8Forest Research, Alice Holt Lodge, Farnham, Surrey, GU10 4LH, UK

Abstract. Accurate model representation of land–atmosphere carbon fluxes is essential for climate projections. However, the exact responses of carbon cycle processes to climatic drivers often remain uncertain. Presently, knowledge derived from experiments, complemented by a steadily evolving body of mechanistic theory, provides the main basis for developing such models. The rapidly increasing availability of measurements may facilitate new ways of identifying suitable model structures using machine learning. Here, we explore the potential of gene expression programming (GEP) to derive relevant model formulations based solely on the signals present in data by automatically applying various mathematical transformations to potential predictors and repeatedly evolving the resulting model structures. In contrast to most other machine learning regression techniques, the GEP approach generates readable models that allow for prediction and possibly for interpretation. Our study is based on two cases: artificially generated data and real observations. Simulations based on artificial data show that GEP is successful in identifying prescribed functions, with the prediction capacity of the models comparable to four state-of-the-art machine learning methods (random forests, support vector machines, artificial neural networks, and kernel ridge regressions). Based on real observations, we explore the responses of the different components of terrestrial respiration at an oak forest in south-eastern England. We find that the GEP-retrieved models often predict better than some established respiration models. Their structures reveal previously unconsidered exponential dependencies of respiration on seasonal ecosystem carbon assimilation and water dynamics. We also find that the GEP models are only partly portable across respiration components; equifinality issues possibly prevent the identification of a general terrestrial respiration model. Overall, GEP is a promising tool for uncovering new model structures for terrestrial ecology in the data-rich era, complementing more traditional modelling approaches.
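
To make the notion of a readable, evolvable model structure concrete, the following minimal sketch shows how gene expression programming in general encodes a candidate model as a linear string of symbols that is decoded breadth-first into an expression tree. It is not the implementation used in this study: the function set, the symbols a, b, and T, and the example gene are hypothetical and chosen only so that the decoded model has an exponential form.

```python
import math

# Illustrative function set: symbol -> (arity, implementation).
# These symbols are assumptions for this sketch, not the study's function set.
FUNCTIONS = {
    "+": (2, lambda x, y: x + y),
    "*": (2, lambda x, y: x * y),
    "E": (1, math.exp),
}


def decode(gene):
    """Decode a linear gene breadth-first into a nested [symbol, children] tree."""
    root = [gene[0], []]
    queue = [root]
    pos = 1
    while queue:
        node = queue.pop(0)
        arity = FUNCTIONS[node[0]][0] if node[0] in FUNCTIONS else 0
        for _ in range(arity):
            child = [gene[pos], []]
            pos += 1
            node[1].append(child)
            queue.append(child)
    return root  # symbols after position pos are non-coding and ignored


def evaluate(node, variables):
    """Evaluate a decoded tree for the given terminal values."""
    symbol, children = node
    if symbol in FUNCTIONS:
        args = [evaluate(child, variables) for child in children]
        return FUNCTIONS[symbol][1](*args)
    return variables[symbol]


# Hypothetical gene: decodes to a * exp(b * T); the last two symbols are unused.
gene = ["*", "a", "E", "*", "b", "T", "T", "T"]
tree = decode(gene)
value = evaluate(tree, {"a": 2.0, "b": 0.1, "T": 10.0})
print(value)
```

Because the gene is a flat sequence, changing a single symbol yields another valid expression, which is what allows candidate model structures to be mutated and recombined repeatedly while remaining readable.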

Short summary
Accurate representation of land–atmosphere carbon fluxes is essential for future climate projections, although some responses of CO2 fluxes to climate remain uncertain. The increase in available data allows for new approaches to modelling these fluxes. We automatically developed models for ecosystem and soil carbon respiration using a machine learning approach. Compared with established respiration models, the automatically developed models predict better and offer new insights.