https://doi.org/10.5194/gmd-2022-301
Submitted as: development and technical paper
Status: this preprint is currently under review for the journal GMD.

EnKF-based fusion of site-available machine learning air quality predictions from RFSML v1.0 and gridded chemical transport model forecasts from GEOS-Chem v13.1.0

Li Fang, Jianbing Jin, Arjo Segers, Ke Li, Bufan Xu, Wei Han, Mijie Pang, Hai Xiang Lin, and Hong Liao

Abstract. Statistical methods, particularly machine learning models, have gained significant popularity in air quality prediction. These models are trained on historical measurement datasets collected independently at environmental monitoring stations, and their operational forecasts are driven by real-time ambient pollutant observations as inputs. Consequently, these high-quality machine learning models provide predictions only at the monitoring sites. In contrast, deterministic chemical transport models (CTMs), which simulate the full life cycles of air pollutants, provide forecasts that are continuous over a 3D field. However, owing to complex error sources in the emission, transport, and removal of pollutants, CTM forecasts are typically biased, particularly at fine scales. In this study, we proposed a high-accuracy gridded prediction obtained by fusing predictions from our recent regional-feature-selection machine learning prediction (RFSML v1.0) with a CTM forecast. The prediction fusion was conducted using the ensemble Kalman filter (EnKF), which is based on Bayesian theory. The background error covariance is an essential part of the assimilation process. Ensemble CTM predictions driven by perturbed emission inventories were first used to represent the spatial covariance statistics of the background error, which can capture the main part of the CTM error. In addition, a covariance inflation algorithm was designed to amplify the ensemble perturbations to account for model errors other than the uncertainty in emission inputs. Model evaluation tests were conducted against independent measurements. Our EnKF-based prediction fusion showed significant improvements over the pure CTM. Moreover, covariance inflation further enhanced the fused prediction, particularly in cases of severe underestimation.
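The fusion idea in the abstract can be illustrated with a generic stochastic EnKF analysis step (perturbed observations) plus simple multiplicative covariance inflation. This is a minimal sketch under stated assumptions, not the authors' RFSML/GEOS-Chem implementation. The function name `enkf_fuse`, the 1-D synthetic grid, the use of a log-normal scaling of a single CTM field as a stand-in for rerunning the CTM with perturbed emissions, the station layout, the error levels, and the inflation factor of 1.2 are all hypothetical. The paper's own inflation algorithm is not described here, so plain multiplicative inflation is used in its place.

```python
import numpy as np


def enkf_fuse(ens, obs, obs_idx, obs_err_std, inflation=1.0, rng=None):
    """Stochastic EnKF analysis step (perturbed observations).

    ens         : (n_grid, n_ens) ensemble of gridded CTM forecasts
    obs         : (n_obs,) site-level values, e.g. ML predictions at stations
    obs_idx     : (n_obs,) grid index of each station (H selects these cells)
    obs_err_std : observation-error standard deviation (scalar, assumed)
    inflation   : multiplicative factor on ensemble perturbations (1.0 = none);
                  a generic stand-in, not the paper's inflation algorithm
    """
    rng = np.random.default_rng() if rng is None else rng
    n_ens = ens.shape[1]
    n_obs = obs.size

    # Multiplicative covariance inflation: widen the spread around the mean,
    # leaving the ensemble mean itself unchanged.
    mean = ens.mean(axis=1, keepdims=True)
    ens = mean + inflation * (ens - mean)
    anom = ens - mean

    # Ensemble estimates of P H^T and H P H^T; H just picks the station cells.
    hx = ens[obs_idx, :]
    hx_anom = anom[obs_idx, :]
    pht = anom @ hx_anom.T / (n_ens - 1)
    hpht = hx_anom @ hx_anom.T / (n_ens - 1)
    r = obs_err_std**2 * np.eye(n_obs)

    # Kalman gain K = P H^T (H P H^T + R)^{-1}, computed with a linear solve.
    gain = np.linalg.solve(hpht + r, pht.T).T

    # Each member assimilates its own perturbed copy of the observations.
    obs_pert = obs[:, None] + obs_err_std * rng.standard_normal((n_obs, n_ens))
    return ens + gain @ (obs_pert - hx)


# --- Hypothetical 1-D demo (synthetic data, not from the paper) ---
rng = np.random.default_rng(0)
n_grid, n_ens = 50, 40
x = np.linspace(0.0, 1.0, n_grid)
truth = 60.0 + 30.0 * np.exp(-(((x - 0.5) / 0.15) ** 2))
ctm_base = 0.7 * truth  # a hypothetical CTM that underestimates everywhere

# Spatially correlated log-normal emission scaling factors, one field per member.
# Assumption: concentrations scale linearly with emissions, so scaling the base
# field stands in for rerunning the CTM with perturbed emission inventories.
dist = np.abs(x[:, None] - x[None, :])
chol = np.linalg.cholesky(np.exp(-dist / 0.1) + 1e-8 * np.eye(n_grid))
z = chol @ rng.standard_normal((n_grid, n_ens))
prior = ctm_base[:, None] * np.exp(0.3 * z)

# Hypothetical "ML predictions" at every 5th cell: truth plus small random error.
obs_idx = np.arange(2, n_grid, 5)
obs_err_std = 3.0
obs = truth[obs_idx] + obs_err_std * rng.standard_normal(obs_idx.size)

post = enkf_fuse(prior, obs, obs_idx, obs_err_std, inflation=1.2, rng=rng)


def rmse(a):
    return float(np.sqrt(np.mean((a - truth) ** 2)))


rmse_prior = rmse(prior.mean(axis=1))
rmse_fused = rmse(post.mean(axis=1))
print(f"prior-mean RMSE {rmse_prior:.2f}, fused-mean RMSE {rmse_fused:.2f}")
```

Because H only selects station cells, the gain spreads each site correction to neighbouring grid cells through the ensemble-estimated spatial covariance. This is why the ensemble spread (and any inflation of it) controls how far, and how strongly, the site-level predictions correct the gridded field.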

Li Fang et al.

Status: open (until 10 Apr 2023)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on gmd-2022-301', Anonymous Referee #1, 07 Mar 2023
  • RC2: 'Comment on gmd-2022-301', Anonymous Referee #2, 16 Mar 2023


Viewed

Total article views: 337 (including HTML, PDF, and XML)
  • HTML: 280
  • PDF: 48
  • XML: 9
  • Total: 337
  • Supplement: 24
  • BibTeX: 2
  • EndNote: 2
Views and downloads (including cumulative totals) are calculated since 13 Feb 2023.

Viewed (geographical distribution)

Total article views: 335 (including HTML, PDF, and XML), of which 335 have a defined geographic origin and 0 are of unknown origin.
Latest update: 20 Mar 2023
Short summary
Machine learning models have gained great popularity in air quality prediction. However, their predictions are only available at air quality monitoring stations. In contrast, chemical transport models (CTMs) provide forecasts that are continuous over a 3D field, but owing to complex error sources, these forecasts are typically biased. In this study, we proposed a high-accuracy gridded prediction by fusing predictions from our regional-feature-selection machine learning prediction (RFSML v1.0) with a CTM forecast.