the Creative Commons Attribution 4.0 License.
A Novel Method for Quantifying the Contribution of Regional Transport to PM2.5 in Beijing (2013–2020): Combining Machine Learning with Concentration-Weighted Trajectory Analysis
Abstract. Fine particulate matter (PM2.5) is closely linked to human health, with its sources generally divided into local emissions and regional transport. This study combined concentration-weighted trajectory (CWT) analysis with the HYSPLIT trajectory ensemble to obtain hourly-resolution pollutant source results. The Extreme Gradient Boosting (XGBoost) model was then employed to simulate local emissions and ambient PM2.5 in Beijing from 2013 to 2020. The results revealed that clean air masses influencing the Beijing area mainly originated from the north and east regions, exhibiting a strong winter and weak summer pattern. Following the implementation of the Air Pollution Prevention and Control Action Plan (Action Plan) by the Chinese government in 2017, pollution in Beijing decreased significantly, with the most substantial reduction in regional transport pollution events occurring in the west region during summer. Regional transport pollution events were most frequent in spring, up to 1.8 times higher than in winter. Pollutants mainly originated from the west and south regions, while polluted air masses from the east showed the least reduction, and the proportion of pollution sources from this region is gradually increasing. From 2013 to 2020, local emissions were the main contributors of pollution events in Beijing. The Action Plan has more effectively reduced pollution caused by regional transport, particularly during autumn and winter. This finding underscores the importance of Beijing prioritizing local emission reduction while also considering potential contributions from the east region to effectively mitigate pollution events.
Status: open (until 30 Dec 2024)
-
RC1: 'Comment on gmd-2024-157', Anonymous Referee #1, 06 Dec 2024
Comments attached.
-
CEC1: 'Comment on gmd-2024-157 - No compliance with the policy of the journal', Juan Antonio Añel, 08 Dec 2024
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
You have not published the following code and data necessary to replicate your work:
ECMWF Meteorological Data
GDAS Data
PySplit
HySplit Trajectory Ensemble
It is mandatory that you publish all these assets in one of the permanent repositories listed as acceptable in our policy. Your manuscript should not have been accepted in Discussions with such shortcomings, and therefore, the current situation is highly irregular.
Therefore, I request that you address this situation by publishing the requested information and replying to this comment with the corresponding links and permanent identifiers (e.g. DOI) for the new repositories containing it.
Also, you must include a modified 'Code and Data Availability' section with this new information in any potentially reviewed version of your manuscript.
Please address this issue promptly; otherwise, we will have to reject your manuscript for publication in our journal.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/gmd-2024-157-CEC1
-
CC1: 'Reply on CEC1', Kang Hu, 11 Dec 2024
Dear Editor,
Thank you for your suggestions. We have separately uploaded the ECMWF Meteorological Data, GDAS Data, HySplit Trajectory Ensemble results, and PySplit module on Zenodo, making them openly accessible to the public. The specific links are as follows:
ECMWF: https://zenodo.org/records/14353871 (doi: 10.5281/zenodo.14353871)
GDAS: https://zenodo.org/records/14347277 (doi: 10.5281/zenodo.14347277)
HySplit Trajectory Ensemble: https://zenodo.org/records/14375567 (doi: 10.5281/zenodo.14375567)
PySPLIT: https://zenodo.org/records/14354765 (doi: 10.5281/zenodo.14354765)
We hope these additions satisfy all the requirements. Furthermore, we will include a modified 'Code and Data Availability' section containing this new information in the revised version of our manuscript.
Sincerely,
Kang Hu
Citation: https://doi.org/10.5194/gmd-2024-157-CC1
-
CEC2: 'Reply on CC1', Juan Antonio Añel, 11 Dec 2024
Dear authors,
Many thanks for addressing these issues. We can now consider them solved. Please remember to include the information on the new repositories in any potentially reviewed version of your manuscript.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/gmd-2024-157-CEC2
-
RC2: 'Comment on gmd-2024-157', Anonymous Referee #2, 17 Dec 2024
This is a very interesting study estimating the contribution of regional transport to PM2.5 in Beijing. This analysis can support policy-makers in both validating and designing effective policies. In addition, the methodology can be potentially applied in other regions as well, unraveling the contribution of local emissions and regional pollution transport. Nevertheless, there are some points that need further information and clarifications. If these points are addressed, then I would be happy to suggest publication in GMD.
Main comments
- A better description of the goal of the XGBoost model is needed. Is there a separate XGBoost model for ambient and local? What exactly does the term "ambient" mean in this study? Please explain in more detail how the training data differ between these two models. What are the features and what are the targets in each case? The reader should clearly understand the transition from the HYSPLIT CWT analysis to the XGBoost models. Which information from the HYSPLIT and CWT analysis is used, in what way, to build the XGBoost models, and for which specific reasons (goals)?
- The year 2020 is used in the analysis and to validate the XGBoost model training. Yet, 2020 is a “special” year because COVID-19 and the associated restriction measures had a direct effect on emissions and air pollution levels. How do the COVID-19 restriction measures affect the analysis and the conclusions drawn about the Action Plan? This needs to be clarified.
- Do you apply any “feature selection” based on feature importance? What is the rationale for using both month of the year and day of the year? How is overfitting prevented in the XGBoost model training? Several studies (e.g. Akritidis et al., 2021; Zhang et al., 2020) applied an early stopping technique to prevent overfitting. Is there something similar applied here? Please explain and discuss in the manuscript accordingly.
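The early-stopping question in the last main comment can be made concrete with a small sketch. This is not the authors' training setup and does not call XGBoost; it only illustrates a patience rule comparable to what XGBoost's `early_stopping_rounds` parameter applies. The function name `best_round_with_patience` and the validation losses are hypothetical.

```python
from typing import Sequence


def best_round_with_patience(val_losses: Sequence[float], patience: int) -> int:
    """Return the 0-based round with the lowest validation loss seen before
    `patience` consecutive rounds pass without any improvement."""
    if patience < 1:
        raise ValueError("patience must be >= 1")
    if not val_losses:
        raise ValueError("val_losses must not be empty")

    best_round = 0
    best_loss = val_losses[0]
    for round_idx in range(1, len(val_losses)):
        loss = val_losses[round_idx]
        if loss < best_loss:
            # Validation loss improved: remember this round.
            best_loss, best_round = loss, round_idx
        elif round_idx - best_round >= patience:
            # No improvement for `patience` rounds: stop training here.
            break
    return best_round


# Hypothetical validation RMSE after each boosting round (illustrative only).
losses = [30.0, 24.0, 20.5, 19.0, 19.2, 19.4, 19.1, 18.0]
print(best_round_with_patience(losses, patience=3))
```

With a short patience the search stops at the first plateau; a longer patience keeps training and can find a later, lower minimum, so reporting the patience value alongside the other hyperparameters would help reproduction.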
Comments
Line 53: I believe dust deserves to be included compared to tsunamis and volcanic eruptions.
Line 59: The more recent studies by Smith et al. (2020) and Kalisoras et al. (2024) based on CMIP6 ESMs can be also included here (see details in References).
Lines 60-61: The study by Geng et al. (2021) can be also cited here.
Lines 68-69: For this statement a reference is required.
Line 72: Change to Wu et al. (2021) and apply accordingly where applicable in the manuscript.
Lines 116-117: “meteorological data, synoptic scale, planetary boundary layer height (PBLH),” What do you mean by synoptic scale? PBLH can be considered a meteorological parameter as well. Please rephrase.
I suggest listing the selected hyperparameters for each XGBoost model to facilitate reproduction if needed.
Lines 131-132: A url and/or reference for the PM2.5 observations is needed here.
Line 136: A reference is needed here for ERA5 data set.
Lines 148-160: Equation 2 is referred first. Please either refer to equation 1 first or change the order of equations.
Lines 154-156: I am a bit confused here. First you say “𝜏ijl is the residence time of trajectory l passing through the grid point” and then “In calculation, the number of trajectories falling on each grid point is used instead of the residence time”. The residence time is calculated for each trajectory, but then how can a residence time for a single trajectory be calculated from a number of trajectories? I may be missing something here; please clarify.
Lines 162-166: What is the rationale behind the definition of the regions? I think a short sentence is needed.
Lines 168-170: So, if the contribution from the north sector is 41% and the local contribution is 40%, is the event then classified as regional transport from the north sector?
Figure 5: Just for clarification, will summing the individual histograms over the years result in 100%? In some cases, the pie chart sum is not 100%. I assume this is related to the rounding of percentages.
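On the residence-time question raised above for lines 154-156, one common reading is that the residence time of a trajectory in a grid cell is approximated by the number of that trajectory's hourly endpoints falling in the cell. The sketch below computes a concentration-weighted trajectory field under that assumption. It is not taken from the manuscript: the function name `cwt_from_endpoint_counts`, the 0.5 degree cell size, and the two trajectories are hypothetical, and no weighting function for sparsely sampled cells is applied.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Cell = Tuple[int, int]
Trajectory = Tuple[float, List[Tuple[float, float]]]  # (receptor conc., endpoints)


def cwt_from_endpoint_counts(
    trajectories: Iterable[Trajectory], cell_size_deg: float = 0.5
) -> Dict[Cell, float]:
    """CWT per grid cell: sum_l(C_l * n_l) / sum_l(n_l), where n_l is the number
    of endpoints of trajectory l in the cell (a proxy for residence time)."""
    weighted_sum: Dict[Cell, float] = defaultdict(float)
    count_sum: Dict[Cell, int] = defaultdict(int)

    for conc, endpoints in trajectories:
        # Count this trajectory's endpoints in each cell it visits.
        per_cell: Dict[Cell, int] = defaultdict(int)
        for lat, lon in endpoints:
            cell = (math.floor(lat / cell_size_deg), math.floor(lon / cell_size_deg))
            per_cell[cell] += 1
        # Weight the receptor concentration by the endpoint count.
        for cell, n in per_cell.items():
            weighted_sum[cell] += conc * n
            count_sum[cell] += n

    return {cell: weighted_sum[cell] / count_sum[cell] for cell in count_sum}


# Two hypothetical back trajectories with receptor PM2.5 of 100 and 40 ug/m3.
example = [
    (100.0, [(40.1, 116.1), (40.2, 116.3), (41.1, 116.1)]),
    (40.0, [(40.3, 116.4), (39.6, 116.2)]),
]
cwt = cwt_from_endpoint_counts(example)
print(cwt)
```

In this reading, a cell crossed by two endpoints of the polluted trajectory and one endpoint of the clean trajectory gets a value closer to the polluted concentration, which is how an endpoint count can stand in for a per-trajectory residence time.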
References
Akritidis, Dimitris, et al. "Implications of COVID-19 restriction measures in urban air quality of Thessaloniki, Greece: A machine learning approach." Atmosphere 12.11 (2021): 1500.
Geng, Guannan, et al. "Drivers of PM2.5 air pollution deaths in China 2002–2017." Nature Geoscience 14.9 (2021): 645-650.
Kalisoras, Alkiviadis, et al. "Decomposing the effective radiative forcing of anthropogenic aerosols based on CMIP6 Earth system models." Atmospheric Chemistry and Physics 24.13 (2024): 7837-7872.
Smith, Christopher J., et al. "Effective radiative forcing and adjustments in CMIP6 models." Atmospheric Chemistry and Physics 20.16 (2020): 9591-9618.
Zhang, Qiang, et al. "A PM2.5 concentration prediction model based on multi-task deep learning for intensive air quality monitoring stations." Journal of Cleaner Production 275 (2020): 122722.
Citation: https://doi.org/10.5194/gmd-2024-157-RC2
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
---|---|---|---|---|---|---|
154 | 39 | 11 | 204 | 24 | 4 | 4 |