https://doi.org/10.5194/gmd-2021-317

Submitted as: development and technical paper | 14 Oct 2021

Review status: this preprint is currently under review for the journal GMD.

KGML-ag: A Modeling Framework of Knowledge-Guided Machine Learning to Simulate Agroecosystems: A Case Study of Estimating N2O Emission using Data from Mesocosm Experiments

Licheng Liu1, Shaoming Xu2, Zhenong Jin1,3, Jinyun Tang4, Kaiyu Guan5,6,7, Timothy J. Griffis8, Matthew D. Erickson8, Alexander L. Frie8, Xiaowei Jia9, Taegon Kim1, Lee T. Miller8, Bin Peng5,6,7, Shaowei Wu10, Yufeng Yang1, Wang Zhou5,6, and Vipin Kumar2
  • 1Department of Bioproducts and Biosystems Engineering, University of Minnesota, Saint Paul, MN, 55108, USA
  • 2Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, 55455, USA
  • 3Institute on the Environment, University of Minnesota, Saint Paul, MN, 55108, USA
  • 4Climate and Ecosystem Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
  • 5Agroecosystem Sustainability Center, Institute for Sustainability, Energy, and Environment, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
  • 6Department of Natural Resources and Environmental Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
  • 7National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
  • 8Department of Soil, Water, and Climate, University of Minnesota, Saint Paul, MN 55108, USA
  • 9Department of Computer Science, University of Pittsburgh, Pittsburgh, PA, 15260, USA
  • 10School of Physics and Astronomy, University of Minnesota, Minneapolis, MN, 55455, USA

Abstract. Agricultural nitrous oxide (N2O) emissions account for a non-trivial fraction of the global greenhouse gas (GHG) budget. To date, estimating N2O fluxes from cropland remains a challenging task because the related microbial processes (e.g., nitrification and denitrification) are controlled by complex interactions among climate, soil, plants and human activities. Existing approaches such as process-based (PB) models have well-known limitations due to insufficient representation of the processes or insufficient constraints on model parameters. To leverage recent advances in machine learning (ML), a new method is needed that unlocks the “black box” and overcomes the limitations of ML: low interpretability, out-of-sample failure and massive data demand. In this study, we developed a first-of-its-kind knowledge-guided machine learning model for agroecosystems (KGML-ag) by incorporating biogeophysical/chemical domain knowledge from an advanced PB model, ecosys, and tested it by simulating daily N2O fluxes with observed data from mesocosm experiments. The Gated Recurrent Unit (GRU) was used as the basis of the model structure. To optimize model performance, we investigated a range of ideas, including: 1) using initial values of intermediate variables (IMVs) instead of time series as model input to reduce data demand; 2) building hierarchical structures to explicitly estimate IMVs for further N2O prediction; 3) using multitask learning to balance the simultaneous training on multiple variables; and 4) pretraining with millions of synthetic data generated from ecosys and fine-tuning with mesocosm observations. Six other pure ML models were developed using the same mesocosm data to serve as benchmarks for the KGML-ag model. Results show that KGML-ag reproduced the mesocosm N2O fluxes well (overall r² = 0.81 and RMSE = 3.6 mg N m−2 day−1 from cross-validation). Importantly, KGML-ag always outperforms the PB model and the ML models in predicting N2O fluxes, especially for complex temporal dynamics and emission peaks. In addition, KGML-ag goes beyond the pure ML models by providing more interpretable predictions and by pinpointing the new knowledge and data needed to further improve the current KGML-ag. We believe the KGML-ag development in this study will stimulate a new body of research on interpretable ML for biogeochemistry and other related geoscience processes.
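The multitask training idea and the two reported scores can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation. It assumes that the multitask loss is a weighted sum of per-variable mean squared errors, that days without an observation are skipped, and that r² is the squared Pearson correlation; the abstract does not confirm any of these. The variable name `soil_no3`, the weights and the numbers are hypothetical and chosen only for illustration.

```python
import math

def mse(pred, obs):
    """Mean squared error over days that have an observation (obs is not None)."""
    pairs = [(p, o) for p, o in zip(pred, obs) if o is not None]
    if not pairs:
        return 0.0  # no observations for this variable: contributes nothing
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)

def multitask_loss(preds, obs, weights):
    """Weighted sum of per-variable MSEs (assumed form of a multitask loss)."""
    return sum(w * mse(preds[name], obs[name]) for name, w in weights.items())

def rmse(pred, obs):
    """Root mean squared error, in the units of the predicted variable."""
    return math.sqrt(mse(pred, obs))

def r_squared(pred, obs):
    """Squared Pearson correlation between predictions and observations."""
    n = len(pred)
    mean_p, mean_o = sum(pred) / n, sum(obs) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov * cov / (var_p * var_o)

# Hypothetical daily values: the N2O flux plus one intermediate variable (IMV).
preds = {"n2o": [1.0, 2.0, 4.0], "soil_no3": [10.0, 12.0, 11.0]}
obs = {"n2o": [1.0, 3.0, 5.0], "soil_no3": [10.0, None, 13.0]}
weights = {"n2o": 1.0, "soil_no3": 0.5}  # illustrative task weights

loss = multitask_loss(preds, obs, weights)
score_rmse = rmse(preds["n2o"], obs["n2o"])
score_r2 = r_squared(preds["n2o"], obs["n2o"])
```

In a sketch like this the task weights control how strongly errors in an intermediate variable influence training relative to errors in the N2O flux itself; how the weights are actually balanced in KGML-ag is described in the paper, not here.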

Status: open (until 17 Dec 2021)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • CC1: 'Comment on gmd-2021-317', Ather Abbas, 05 Nov 2021
    • AC1: 'Reply on CC1', Zhenong Jin, 05 Nov 2021
      • CC2: 'Reply on AC1', Ather Abbas, 23 Nov 2021
  • RC1: 'Comment on gmd-2021-317', Anonymous Referee #1, 19 Nov 2021
  • RC2: 'Comment on gmd-2021-317', Anonymous Referee #2, 23 Nov 2021
  • RC3: 'Comment on gmd-2021-317', Anonymous Referee #3, 25 Nov 2021

Data sets

Code and data for "KGML-ag: A Modeling Framework of Knowledge-Guided Machine Learning to Simulate Agroecosystems: A Case Study of Estimating N2O Emission using Data from Mesocosm Experiments". Licheng Liu, Zhenong Jin. https://doi.org/10.5281/zenodo.5504533

Viewed

Total article views: 597 (including HTML, PDF, and XML)
  • HTML: 439
  • PDF: 147
  • XML: 11
  • Total: 597
  • Supplement: 35
  • BibTeX: 2
  • EndNote: 3
Views and downloads calculated since 14 Oct 2021.

Viewed (geographical distribution)

Total article views: 550 (including HTML, PDF, and XML), of which 550 have a defined geography and 0 are of unknown origin.
Latest update: 04 Dec 2021
Short summary
By incorporating domain knowledge into a machine learning model, KGML-ag overcomes the well-known limitations of process-based models (insufficient process representation and parameter constraints) and unlocks the “black box” of machine learning models. As a result, KGML-ag can outperform existing approaches in capturing the hot moments and complex dynamics of N2O flux. This study will be a critical reference for a new generation of modeling paradigms for biogeochemistry and other geoscience processes.