Submitted as: development and technical paper | 10 Aug 2021

Review status: this preprint is currently under review for the journal GMD.

Deep-Learning Spatial Principles from Deterministic Chemical Transport Model for Chemical Reanalysis: An Application in China for PM2.5

Baolei Lyu1, Ran Huang2, Xinlu Wang2, Weiguo Wang3, and Yongtao Hu4
  • 1Huayun Sounding Meteorological Technology Co. Ltd., Beijing 100081, P. R. China
  • 2Hangzhou AiMa Technologies, Hangzhou, Zhejiang 311121, P. R. China
  • 3I.M. System Group, Environment Modeling Center, NOAA/National Centers for Environmental Prediction, College Park, Maryland 20740, United States
  • 4School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States

Abstract. Air pollutant concentration fields estimated through data fusion are critically important for compensating for observations that are only sparsely available, especially over non-urban areas. Previous data fusion methods generally used statistical models to relate target observations to supporting data variables at known stations. In this study, we built a new data fusion paradigm by designing a dedicated deep learning framework that learns multi-variable spatial correlations from Chemical Transport Model (CTM) simulations and then uses them to estimate PM2.5 reanalysis fields from station observations. The model is composed of two modules: an explainable PointConv operation that pre-processes isolated observations, and a grid-to-grid regression network that reflects correlations among multiple variables. The model was evaluated in two respects: reproducing PM2.5 CTM simulations and generating reanalysis (fused) PM2.5 fields. First, the fusion model reproduced CTM simulations well from CTM data sampled at station locations, with an average R2 = 0.94. Second, the fusion model achieved good performance, with R2 = 0.77 and R2 = 0.83 in the stringent city-level and station-level evaluations, respectively. The generated reanalysis PM2.5 fields have complete spatial coverage within the modelling domain at a daily time scale. One significant benefit of our fusion framework is that model training does not rely on observations, so the trained model can be used to predict PM2.5 fields for newly set-up observation networks such as those using portable sensors. The fusion model has high computing efficiency (< 1 s/day) in predicting PM2.5 concentrations owing to GPU acceleration. As an alternative way to generate chemical/meteorological reanalysis fields, the method can be readily applied to other simulated variables for which measurements are available.
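The first evaluation idea in the abstract (sample a gridded simulation at station locations, rebuild a full field from those samples, and score it with R2) can be illustrated with a minimal sketch in Python with NumPy. This is not the authors' model: a fixed Gaussian distance-weighted average stands in for the learned PointConv and grid-to-grid network, the field is synthetic, and the names sample_stations, spread_to_grid, r_squared, and length_scale are hypothetical.

```python
import numpy as np

def sample_stations(field, rows, cols):
    """Read the gridded value at each (row, col) station location."""
    return field[rows, cols]

def spread_to_grid(values, rows, cols, shape, length_scale=3.0):
    """Gaussian distance-weighted average of station values at every grid cell.

    Assumption: a fixed kernel replaces the learned PointConv weights.
    """
    gy, gx = np.mgrid[0:shape[0], 0:shape[1]]
    # Squared distance from every grid cell to every station: (ny, nx, n_stations)
    d2 = (gy[..., None] - rows) ** 2 + (gx[..., None] - cols) ** 2
    w = np.exp(-d2 / (2.0 * length_scale ** 2))
    return (w * values).sum(axis=-1) / w.sum(axis=-1)

def r_squared(truth, estimate):
    """Coefficient of determination over all grid cells."""
    ss_res = np.sum((truth - estimate) ** 2)
    ss_tot = np.sum((truth - truth.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
ny, nx = 40, 60
yy, xx = np.mgrid[0:ny, 0:nx]
# Synthetic smooth "simulation" field, purely illustrative (not CTM output).
field = 50.0 + 30.0 * np.sin(xx / 10.0) * np.cos(yy / 8.0)

# Hypothetical station network: 200 random grid cells.
rows = rng.integers(0, ny, size=200)
cols = rng.integers(0, nx, size=200)

obs = sample_stations(field, rows, cols)
estimate = spread_to_grid(obs, rows, cols, field.shape)
score = r_squared(field, estimate)
print(f"R2 of reconstructed field vs. synthetic truth: {score:.3f}")
```

Because the truth here is the simulation itself, the score measures only how well sparse samples recover a known field, which mirrors the observation-free training setup the abstract describes; the stand-in kernel gives no indication of the accuracy the actual network achieves.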

Status: final response (author comments only)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on gmd-2021-253', Anonymous Referee #1, 21 Aug 2021
    • AC1: 'Comment on gmd-2021-253', Baolei Lyu, 02 Oct 2021
  • RC2: 'Comment on gmd-2021-253', Anonymous Referee #2, 11 Sep 2021
  • AC1: 'Comment on gmd-2021-253', Baolei Lyu, 02 Oct 2021
    • RC3: 'Reply on AC1', Anonymous Referee #1, 02 Oct 2021

Short summary
Data fusion estimates spatially complete and smooth reanalysis fields from multiple data sources, combining observations and model simulations. We developed a deep learning framework that embeds spatial correlation principles. The model predicts reanalysis data fields from isolated observation points with high accuracy. It is also feasible for operational applications because of its computational efficiency.