Simulation model of Reactive Nitrogen Species in an Urban Atmosphere using a Deep Neural Network: RNDv1.0

Gil, Junsu; Lee, Meehye; Kim, Jeonghwan; Lee, Gangwoong; Ahn, Joonyoung; Kim, Cheol-Hee

doi:https://doi.org/10.5194/gmd-16-5251-2023

Articles | Volume 16, issue 17

© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

Special issue:

Benchmark datasets and machine learning algorithms for Earth...

Model description paper

13 Sep 2023

Download

Final revised paper (published on 13 Sep 2023)
Supplement to the final revised paper
Preprint (discussion started on 10 Dec 2021)
Supplement to the preprint

Interactive discussion

Status: closed

RC1:
'Comment on gmd-2021-347', Anonymous Referee #1, 13 Feb 2022

Review of gmd-2021-347

Simulation Model of Reactive Nitrogen Species in an Urban Atmosphere using a Deep Neural Network: RND v1.0

by Junsu Gil et al.

General comments:

This manuscript describes a new application of a simple feed-forward neural network model to calculate HONO mixing ratios based on a set of other measured variables. While this is an interesting and worthwhile application, the paper lacks the necessary details in the description of the deep learning model and contains no ablation studies, which are needed to lend credibility to the results. I also question the validity of the cross-validation and test cases that are discussed, because I doubt that these test cases are truly independent data samples. There is no proof of the generalisation capability of the model, so it may well be that this model fails completely if it were applied to measurement data obtained under different conditions.

In summary, this manuscript falls between "major revisions" and "reject". In computer science conferences it would be ranked "weak reject", which means the paper could be saved if the authors invest substantial work in rerunning their model several times and improving the text.

Specific comments:

Abstract: Confusing sentence after "In this study,". After reading 3 times I understood that you are resolving the acronym RND here, but this is well hidden. Suggestion: "In this study, a new simulation approach to calculate HONO mixing ratios using a deep learning technique based on measured variables was developed. The 'Reactive Nitrogen species Deep neural network' (RND) has been implemented in Python. It was trained, ..."

Abstract: Why should RND be called a *supplementary* model? What does it supplement?

l.35: too vague: "observational constraints on individual species". Does this refer to NOy compounds or any species involved in the tropospheric ozone production cycle?

l.40: NOy has been the focus of attention already in the 1990s. See for example papers by Sandy Sillman et al. You may say "renewed attention".

l.43: to the uninitiated reader it might not be clear what heterogeneous reactions have to do with NOy and ozone chemistry. This would merit one or a few more general sentence(s) to describe NOy chemistry. If this text gets a little longer, please consider summarizing the HONO/NOy chemistry in a supplement and refer to it. Nevertheless, one or two sentences will be needed here.

l.44: you could add https://doi.org/10.5194/acp-18-3147-2018 to the list of references here.

l.52/53: it would be useful to know if there is general agreement among these different measurement methods or if they haven't reached a satisfactory level of consistency yet. In the following sentence, please provide some order of magnitude numbers of observed versus simulated HONO levels (or a value range).

l.57: the recent adoption of machine learning techniques in atmospheric sciences is more general than "multi layer artificial neural network". In this context, it suffices to say that "machine learning" has been adopted. Then, in a following sentence you can narrow this down to the employment of deep (artificial) neural networks, which have a capability to learn more complex non-linear relations in data, but also require larger amounts of data for training. The selection of references appears a bit arbitrary. For example, there is a whole special issue in Philosophical Transactions A () on machine learning for weather and climate. Indeed, you may want to first provide two or three general references for ML in atmospheric science (with cf.), then write a sentence which refers specifically to atmospheric chemistry/atmospheric composition and provide some more references there.

l.59-62: the description why deep learning might be useful for the analysis of atmospheric chemical measurements remains vague and superficial. You should state explicitly that neural networks learn relations in data (similar to function fitting) and you should state in what way NNs may improve on numerical simulations (I guess you refer to the fact that they are inherently bias-free?).

l.62/63: introduction of the model acronym: difficult to disentangle the sentence - see comment on abstract above.

l.67: as this is supposed to be a manuscript for the special issue on "machine learning methods and benchmark datasets", you should add a statement here that "the code and training data can be downloaded from ..." (you can of course also refer to the code and data availability section here). Re-usability of your model is a key aspect for this special issue (and for GMD in general).

l.70: the steps which are described don't guide the development of RND, but describe the typical machine learning workflow.

l.77: similar issue - this reads as if every user of RND will first have to perform measurements for her/himself. Please separate the dataset preparation from the model development. The model should be generalizable, i.e. be independent of the specific set of measurements which you describe in the paper.

l.105: "wind direction should be converted..." - please describe what you did, not what should be done.
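For readers unfamiliar with why wind direction needs converting: it is a circular variable, so 359° and 1° are numerically far apart but physically almost identical. One common encoding decomposes the angle into sine and cosine components. The sketch below shows that generic encoding only; it is an assumption for illustration and is not necessarily the conversion used in the manuscript. The function name `encode_wind_direction` is hypothetical.

```python
import math


def encode_wind_direction(degrees):
    """Map a wind direction in degrees to (sin, cos) components.

    Assumption: a generic circular encoding, not the authors' documented method.
    """
    radians = math.radians(degrees % 360.0)
    return math.sin(radians), math.cos(radians)


# 359 deg and 1 deg differ by 358 as raw numbers but are close after encoding.
s1, c1 = encode_wind_direction(359.0)
s2, c2 = encode_wind_direction(1.0)
distance = math.hypot(s1 - s2, c1 - c2)
```

With this encoding the network receives two bounded inputs per wind direction instead of one discontinuous input.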

l.106: "missing values" same as above. Did you filter or interpolate?

l.107: what is an "array of measurement data"? Also, what is missing is a description of the time resolution of the measurements and how many independent samples were prepared for the machine learning. How was the train-test-val split done? Have you checked the frequency distributions of the (normalized) variables? Have you considered log transform for non Gaussian variables? How many time steps are included in each sample?

Section 2.3: there is a lot of information missing from the network description: how many nodes per layer? What is the learning rate? How many epochs were trained? Did the learning rate change during training? Did you try out different numbers of layers and nodes per layer to determine the optimum model? Did you perform a hyperparameter search? Also, what exactly is the input data and what exactly is the target output? Loss function... Those things are standard in the machine learning literature and should be adhered to. I see some of this information appears in the figures and the following section (varying the number of nodes), but this belongs in the model description text.

l.136: if June 2018 has been used in the training already, then this month is not an independent test dataset any more.
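The independence concern can be illustrated with a split that holds out whole calendar months, so that no sample from a test month is seen during training. This is a minimal sketch under stated assumptions: the record layout (a `month` key such as "2018-06"), the toy values, and the function name `split_by_month` are all hypothetical and do not describe how the authors partitioned their data.

```python
def split_by_month(records, test_months):
    """Split records into train and test sets by calendar month."""
    test_months = set(test_months)
    train = [r for r in records if r["month"] not in test_months]
    test = [r for r in records if r["month"] in test_months]
    return train, test


# Toy records for illustration only; these are not measurements.
records = [
    {"month": "2018-05", "hono": 0.9},
    {"month": "2018-06", "hono": 1.2},
    {"month": "2018-06", "hono": 0.7},
    {"month": "2019-04", "hono": 1.5},
]
train, test = split_by_month(records, test_months=["2018-06"])
train_months = {r["month"] for r in train}
test_months_seen = {r["month"] for r in test}
```

A random shuffle of hourly samples, by contrast, would place neighbouring hours of the same day on both sides of the split, which is the situation the referee objects to.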

l.154: does this mean that you always used the same number of nodes in each layer? And you did not try to reduce the number of layers? 1600 samples appears rather small for a network with 5 layers.

l.160: I don't understand this. First you train the network for 2016 to 2019, then you run it again to obtain HONO results? You already have them from the training.(?)

l.167: I doubt that the inability of the model to capture minima and maxima is due to the limited amount of data. This is a general aspect of regression models and extensively discussed in Kleinert et al. (2021): https://doi.org/10.5194/gmd-14-1-2021

l.205 and following: this discussion of atmospheric chemistry doesn't belong in a section describing the application of the model. Is this supposed to be a general discussion section, comparing RND to other (CTM) models?

l.235: Finally, here is a list of the input variables. But it has not been discussed which variable has which influence on the results. I have a suspicion that the network really makes use of only 3 or 4 of the 9 variables it is given. See Kleinert et al. (2021) for a way to test this with bootstrapping.
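The kind of test suggested here can be sketched with a generic permutation check: shuffle one input column at a time and measure how much the prediction error grows. This is a related, simpler technique and may differ in detail from the bootstrapping procedure of Kleinert et al. (2021). The stand-in model `predict`, the synthetic data, and the function names are assumptions for illustration.

```python
import random


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, targets, n_features, seed=0):
    """Increase in MSE when each input column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mean_squared_error(targets, [predict(r) for r in rows])
    increases = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        # Rebuild every row with only column j replaced by its shuffled value.
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        error = mean_squared_error(targets, [predict(r) for r in shuffled])
        increases.append(error - baseline)
    return increases


# Synthetic stand-in: the target depends on input 0 only; input 1 is ignored.
data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
targets = [2.0 * r[0] for r in rows]


def predict(row):
    """Hypothetical trained model that uses only the first input."""
    return 2.0 * row[0]


importance = permutation_importance(predict, rows, targets, n_features=2)
```

An input whose shuffling leaves the error unchanged is one the model does not use, which is how the referee's suspicion about unused variables could be checked.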

l.250/251: the ML model doesn't gain any physical understanding of the HONO chemistry, so it cannot be used to test the existing knowledge. You could use such a tool to forecast HONO levels, for example to determine if it might be worthwhile conducting HONO measurements at a specific location or during a specific time period. You may also be able to use the tool in the context of quality controlling the measurements: any strong disagreement would raise a warning that measurements should be checked with extra care.

Also, you can of course use it to estimate HONO concentrations when these were not measured in order to then perform 0D model runs, as you show in Figure 8.

And in this light, I would agree with the statement that RND is a "supplementary tool".

l.262: please provide an explicit URL here (you can still add the reference)

Technical corrections:

l.55: related to the comment on l.43: you presume that the reader is familiar with the basics of HONO chemistry, but this cannot be taken for granted.

l.30 play instead of plays

l.34 and *it* determines...

Citation: https://doi.org/10.5194/gmd-2021-347-RC1
- AC1: 'Reply on RC1', Junsu Gil, 02 Apr 2022
  
  The comment was uploaded in the form of a supplement: https://gmd.copernicus.org/preprints/gmd-2021-347/gmd-2021-347-AC1-supplement.pdf
  
  Citation: https://doi.org/10.5194/gmd-2021-347-AC1
RC2:
'Comment on gmd-2021-347', Anonymous Referee #2, 14 Feb 2022

In this paper, a deep neural network based model is used to calculate nitrous acid (HONO) mixing ratios, based on an analysis of HONO measurement data from Seoul between 2016 and 2019. Since I am not an expert in atmospheric sciences, but in data and computer science, I will in my review focus on the computational method used and its validity based on the size and type of the data.

The paper is generally well written and takes action to document the use of the suggested model. The citation for code availability is missing a DOI (and one has to go over to Zenodo to locate the code).

The approach taken is motivated by the success of deep learning based methods in various areas. However, here (as often elsewhere) it is not taken into account that deep learning is most useful in situations in which there are massive amounts of training data, which is not the case here. There are nine input features and there are 1636 data items (1122 for training and 514 for validation). Hence, the data is not really massive, and because the number of interactions is limited (only nine input variables), it is quite likely that more traditional machine learning methods would work well (e.g., ordinary linear regression could be used to provide a baseline (and could even suffice), then one could see how e.g., support vector machine or random forest would work). In the paper, the use of deep neural networks is justified by their being more useful than traditional models, because they are able to handle large amounts of data. For the data used, there is no reason to assume that it could not also be handled using some of the traditional methods. In particular, when the data is small, more complicated models are quite prone to overfitting.

Suggestion for improvement 1: Test different ML models to be able to evaluate properly the usability of the suggested model.
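As an illustration of the simplest baseline mentioned above, ordinary linear regression can be fitted with the normal equations in a few lines. The sketch below uses synthetic data with two inputs rather than the nine measured variables; the function names and the data are assumptions, and nothing here reproduces the authors' model or results.

```python
import random


def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (with intercept)."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    n = len(X[0])
    # Augmented system [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)]
         + [sum(x[i] * t for x, t in zip(X, targets))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(n):
            if row != col:
                factor = A[row][col] / A[col][col]
                A[row] = [a - factor * b for a, b in zip(A[row], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict_linear(coefficients, row):
    return coefficients[0] + sum(c * v for c, v in zip(coefficients[1:], row))


# Synthetic example: y = 1 + 2*x1 - 3*x2, so the fit should recover these values.
rng = random.Random(0)
rows = [[rng.random(), rng.random()] for _ in range(50)]
targets = [1.0 + 2.0 * r[0] - 3.0 * r[1] for r in rows]
coefficients = fit_linear_regression(rows, targets)
baseline_predictions = [predict_linear(coefficients, r) for r in rows]
```

If a baseline of this kind already matches the deep network on held-out data, the added complexity of the network is hard to justify; if it does not, the gap quantifies what the network contributes.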

My second concern is the feature selection or the lack of it. The model blindly uses the nine input variables from the data. This kind of "taking an ML model off-the-shelf" very rarely produces the best possible results and can seriously affect the performance of the model. In addition to feature selection, it might be also possible to compute some surrogate features, e.g., provide information about dependencies in the modelling domain, reducing the need for the ML models to explicitly model these dependencies.

Suggestion for improvement 2: Use feature selection (for all the models) to search for a best possible set of input features.

Finally, the testing of the model using data from April 2019, shows some of the limitations of the developed model. It seems that there is an occurrence of concept drift (when the distribution of data changes, the model does not work well anymore). Also, the error might increase due to overfitting of the model. This aspect should be studied further, in particular it would be important to be able to provide the region in which the model’s accuracy is on an acceptable level. There is a rich body of literature in detecting concept drift (for a survey, e.g., see Zliobaite I., Pechenizkiy M., Gama J. (2016) An Overview of Concept Drift Applications. In: Japkowicz N., Stefanowski J. (eds) Big Data Analysis: New Algorithms for a New Society. Studies in Big Data, vol 16. Springer, Cham. https://doi.org/10.1007/978-3-319-26989-4_4).

Suggestion for improvement 3: Analyse the region in which the proposed model can be expected to work, or at least provide some discussion of the effects of overfitting and concept drift and how these affect the usability of the model.
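One simple way to flag a shift between the training distribution and new inputs is the two-sample Kolmogorov-Smirnov statistic, the largest gap between the two empirical cumulative distributions. The sketch below is only one of many drift checks covered in the literature cited above, and it is not taken from the manuscript; the toy samples and the function name `ks_statistic` are assumptions.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    largest_gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past every value <= x, then compare the CDFs.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        largest_gap = max(largest_gap, abs(i / len(a) - j / len(b)))
    return largest_gap


# Toy values for illustration only; these are not measurements.
training_values = [0.1 * k for k in range(10)]
similar_values = list(training_values)
shifted_values = [v + 5.0 for v in training_values]

no_drift = ks_statistic(training_values, similar_values)
strong_drift = ks_statistic(training_values, shifted_values)
```

The statistic ranges from 0 (identical samples) to 1 (no overlap). Which value should count as drift for a given input variable is a choice the model authors would have to justify.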

Based on these observations, I would reject the paper in its current form, with the encouragement to resubmit, taking the suggestions for improvement into account.

Citation: https://doi.org/10.5194/gmd-2021-347-RC2
- AC2: 'Reply on RC2', Junsu Gil, 02 Apr 2022
  
  The comment was uploaded in the form of a supplement: https://gmd.copernicus.org/preprints/gmd-2021-347/gmd-2021-347-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/gmd-2021-347-AC2

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

AR by Junsu Gil on behalf of the Authors (02 Apr 2022) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (21 Apr 2022) by Leena Järvi

RR by Anonymous Referee #2 (10 Jun 2022)

ED: Reconsider after major revisions (23 Jun 2022) by Leena Järvi

AR by Junsu Gil on behalf of the Authors (10 Aug 2022) Author's response Author's tracked changes Manuscript

ED: Reconsider after major revisions (02 Sep 2022) by Leena Järvi

Dear Authors,
I have gone through your response and it does not yet answer the concerns of the reviewers. Both reviewers have raised the issue of generalization: how well would the model perform in other conditions? Your validation and testing datasets are very small, covering quite a narrow range of meteorological conditions. I understand that the amount of data you have is limited, but the concerns raised by the reviewers should still be answered before the manuscript can be published in GMD. As it stands, the manuscript shows that you have fitted a deep-learning model to your data, but it does not show that you have developed a model which is applicable elsewhere.

It is good that you have tested the 1-layer ANN model. But in order to answer the reviewer's concern you also need to test some simpler ML model(s) and add those to the manuscript. In your response you show a comparison of your model and the ANN, but this is for the training data. In general, showing good correspondence with the training data (Figures 5-7 in the manuscript) does not tell how the model performs for "independent" datasets. Thus, after you have conducted additional simulations, you should improve the model performance analysis with non-training data, using proper scatter plots (similar to Fig A in the referee response) and statistical information (not just MAE and IOP but also RMSE, R). In addition, you need to answer the reviewer comment "Since the idea is to develop a model for others to use (of the shelf), it should be made very precise what are the capabilities and restrictions of using the developed model" by clearly stating the limitations and benefits of the model.

In addition to these, the manuscript should be checked by language services as there are several issues with the language. In addition, at the end of page 7, the sentence on line 204 is unfinished.
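Three of the statistics requested in this letter (MAE, RMSE and the Pearson correlation R) have standard definitions, sketched below for a pair of observed and predicted series. IOP is left out because the letter does not define it. The toy numbers and function names are assumptions for illustration and are not results from the manuscript.

```python
import math


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def pearson_r(obs, pred):
    """Pearson correlation coefficient."""
    mean_o = sum(obs) / len(obs)
    mean_p = sum(pred) / len(pred)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov / math.sqrt(var_o * var_p)


# Toy series: predictions carry a constant bias of +0.5.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
scores = {
    "MAE": mae(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "R": pearson_r(observed, predicted),
}
```

The toy case shows why several statistics are needed together: a constant bias leaves R at 1 while MAE and RMSE both report the 0.5 offset.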

AR by Junsu Gil on behalf of the Authors (22 Oct 2022) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (02 Nov 2022) by Leena Järvi

RR by Anonymous Referee #3 (16 Nov 2022)

RR by Anonymous Referee #4 (11 Dec 2022)

Suggestions for revision or reasons for rejection

Gil et al present a model based on Deep Neural Networks for estimation of HONO concentrations in urban environments using measurements of classical atmospheric pollutants and meteorological variables as input. Because HONO measurements are hardly available, the authors argue that using estimated HONO as input in photochemical models improves the calculation of the OH production rate and of O3 concentrations. This is an interesting and valuable piece of work, however, there are a few issues that should be resolved before this manuscript can be published in Geoscientific Model Development.

My main concern is the way the performance of the RND v1.0 model is evaluated and the conclusions and recommendations that are drawn from the model performance evaluation. For performance testing, both the training set and the testing set are used. This is not correct: the training and validation data should only be used for model building, and the performance assessment should only be done based on the test data. It is found that the model performance is much better for the period used for training and validation than for the test data. This is of course not surprising and indicates clear limitations of the model (e.g. over-fitting). The model performance assessment needs to be changed accordingly.

The model performance was particularly poor for the test data from April 2019. The authors explain this by the fact that the conditions during April 2019 were different from the conditions covered by the training data. This points to another important aspect that is entirely neglected in the current manuscript: Under what conditions can the RND v1.0 model be applied with the performance as determined? What happens when the model is applied to conditions that are not covered by the training data (model applied to meteorological conditions and/or atmospheric pollutant concentrations outside the range covered in the training data)? It is very likely that applications of the proposed DNN model at other locations and during other times of the year will face this situation. It is necessary that this issue is addressed.

The authors say in the abstract and in the introduction section that the RND v1.0 model is proposed for calculation of HONO mixing ratios in highly polluted urban environments. In the results section, the model is described as being fit for application in any urban area (page 6, line 172). The conditions (in terms of air pollutant concentrations) where RND v1.0 can be applied should be made clearer.
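A minimal way to state where a data-driven model can be applied is to record the range of each input variable in the training data and to flag new samples that fall outside it. The sketch below shows such a check; the variable names (`temperature_c`, `no2_ppbv`), the toy values, and the function names are hypothetical and do not come from the manuscript.

```python
def training_ranges(rows):
    """Per-variable (min, max) over the training rows."""
    names = rows[0].keys()
    return {n: (min(r[n] for r in rows), max(r[n] for r in rows)) for n in names}


def variables_outside_range(sample, ranges):
    """Names of variables whose value lies outside the training range."""
    return [n for n, (low, high) in ranges.items() if not low <= sample[n] <= high]


# Toy training rows for illustration only; these are not measurements.
training_rows = [
    {"temperature_c": 12.0, "no2_ppbv": 20.0},
    {"temperature_c": 25.0, "no2_ppbv": 45.0},
    {"temperature_c": 18.0, "no2_ppbv": 30.0},
]
ranges = training_ranges(training_rows)

new_sample = {"temperature_c": 31.0, "no2_ppbv": 25.0}
flagged = variables_outside_range(new_sample, ranges)
```

A per-variable range check is deliberately crude: it misses unusual combinations of values that are each individually in range, so it gives a lower bound on how often the model is being extrapolated.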

The paper is generally well written; however, there are rather many small linguistic errors such as missing articles (e.g. page 3, line 84; pg. 5, lines 139 and 140; page 6 line 159) and wrong grammar (e.g. it should consistently be "training and validation" instead of "train and validation", and also often "testing" instead of "test"). The manuscript should again be carefully checked and corrected.

Other comments:
Page 2, line 54-56: The authors write about "the" model and "this underestimation". It is unclear what model is meant, it seems that it is referred to photochemical models in general. Please make this clear and revise accordingly.
Page 3, line 70, should be "including data collection" instead of "including collecting data".
Page 4, line 95-97. The 10th and 90th percentile mixing ratios for the input variables are given. It is not mentioned what the time basis of these values is: are these hourly or daily values? The temporal resolution should be provided.
Page 4, line 102. Terminology "chemical and meteorological parameters" is not correct here. In the usual convention, the input variables are denoted as "variables" and not as "parameters". The parameters are their weights in a statistical model. Please change.
Page 4, lines 105-107. The authors write that wind direction "should" be converted and there "should" be no missing values. From the text it seems clear that the authors have converted the measured wind direction and they have removed observations with missing values. I think the authors should rephrase the text so that it is clear what data conversion and selection steps have been done.
Page 4, equation 1. I stumbled over the notation F1 and F2. It seems that these are simply the observed min and max of variable x. Why not denote F1 and F2 as x_min and x_max? That would probably be clearer.
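If, as the referee suspects, F1 and F2 are the observed minimum and maximum of a variable, then equation 1 would amount to a min-max scaling. The manuscript's equation is not reproduced on this page, so the form below, (x - x_min) / (x_max - x_min), is an assumption, as is the function name `min_max_scale`.

```python
def min_max_scale(values):
    """Scale values to [0, 1] using the observed minimum and maximum.

    Assumption: this is the generic min-max form, not necessarily equation 1.
    """
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # A constant variable carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - x_min) / (x_max - x_min) for v in values]


scaled = min_max_scale([2.0, 4.0, 6.0])
```

Note that x_min and x_max would have to be taken from the training data and reused unchanged for validation and test data, otherwise the scaling itself leaks information across the split.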

ED: Reconsider after major revisions (21 Dec 2022) by Leena Järvi

AR by Junsu Gil on behalf of the Authors (03 Feb 2023) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (07 Feb 2023) by Leena Järvi

RR by Anonymous Referee #4 (01 Jun 2023)

ED: Publish subject to minor revisions (review by editor) (26 Jun 2023) by Leena Järvi

AR by Junsu Gil on behalf of the Authors (29 Jun 2023) Author's response Author's tracked changes Manuscript

ED: Publish as is (08 Aug 2023) by Leena Järvi

AR by Junsu Gil on behalf of the Authors (09 Aug 2023)

Short summary

In this study, a framework for calculating reactive nitrogen species using a deep neural network (RND) was developed. It runs as simple Python code and provides high-accuracy reactive nitrogen oxide data. In the first version (RNDv1.0), the model calculates nitrous acid (HONO) in urban areas, which plays an important role in producing O₃ and fine aerosol.