the Creative Commons Attribution 4.0 License.
Neural networks for data assimilation of surface and upper-air data in Rio de Janeiro
Abstract. The practical feasibility of neural network models for data assimilation using local observational data in the WRF model is evaluated for the Rio de Janeiro metropolitan region in Brazil. Surface and multi-level variables retrieved from airport meteorological stations are used: air temperature, relative humidity, and wind (speed and direction). In addition, 6-hour forecasts from high-resolution WRF simulations are used, with the domain centered on the city of Rio de Janeiro and nested grids of 8 and 2.6 km. Periods of 168 h from 2015–2019 are used, with 6 h and 12 h assimilation cycles for surface and upper-air data, respectively, applied to 6-hour forecast fields. The observed data (interpolated to grid points close to the airport locations, with their influence computed in the surroundings) and the short-range forecasts are used as inputs for training the models, and the 3D-Var analysis on the 6-hour forecast fields at each grid point is used as the target variable. The neural network models are built using two different approaches: the WEKA multilayer perceptron model and TensorFlow's deep learning implementation. The year 2019 is used as an independent dataset for validating forecasts from the trained models. Applied to 6-hour forecast fields, the neural network models are able to emulate the 3D-Var results for surface and multi-level variables, with better results for the NN-TensorFlow implementation. The main result is the reduction in CPU time enabled by the neural network models: data assimilation CPU time is reduced by factors of 121 and 25 for NN-TensorFlow and NN-WEKA, respectively, compared with the 3D-Var method under the same hardware configuration.
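The training setup described in the abstract (forecast and observation as inputs, analysis as target, one sample per grid point) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the "analysis" target is a fixed-weight blend rather than a real 3D-Var analysis, and the small NumPy perceptron stands in for the WEKA and TensorFlow models, whose actual architectures are not given on this page. The names make_samples and train_mlp are hypothetical.

```python
import numpy as np

def make_samples(n, rng):
    """Synthetic per-grid-point samples: (forecast, observation) -> analysis."""
    forecast = rng.normal(25.0, 3.0, size=n)                # e.g. 6-hour forecast temperature
    observation = forecast + rng.normal(0.0, 1.0, size=n)   # nearby observed value
    # Stand-in target: a fixed-weight blend, NOT a real 3D-Var analysis.
    analysis = forecast + 0.6 * (observation - forecast)
    inputs = np.stack([forecast, observation], axis=1)
    return inputs, analysis[:, None]

def train_mlp(x, y, hidden=8, lr=0.05, epochs=2000, seed=0):
    """Train a one-hidden-layer perceptron with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    # Standardize inputs and target so one learning rate suits all weights.
    x_mean, x_std = x.mean(axis=0), x.std(axis=0)
    y_mean, y_std = y.mean(axis=0), y.std(axis=0)
    xs, ys = (x - x_mean) / x_std, (y - y_mean) / y_std
    w1 = rng.normal(0.0, 0.5, size=(x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    for _ in range(epochs):
        h = np.tanh(xs @ w1 + b1)                            # hidden activations
        grad_out = 2.0 * (h @ w2 + b2 - ys) / len(xs)        # d(MSE)/d(prediction)
        grad_h = (grad_out @ w2.T) * (1.0 - h ** 2)          # back-propagate through tanh
        w2 -= lr * (h.T @ grad_out)
        b2 -= lr * grad_out.sum(axis=0)
        w1 -= lr * (xs.T @ grad_h)
        b1 -= lr * grad_h.sum(axis=0)

    def predict(x_new):
        h = np.tanh(((x_new - x_mean) / x_std) @ w1 + b1)
        return (h @ w2 + b2) * y_std + y_mean                # undo target scaling

    return predict

rng = np.random.default_rng(42)
x_train, y_train = make_samples(500, rng)
x_test, y_test = make_samples(200, rng)     # held-out set, analogous to the 2019 data
emulate = train_mlp(x_train, y_train)
rmse = float(np.sqrt(np.mean((emulate(x_test) - y_test) ** 2)))
print(f"held-out RMSE of the emulator: {rmse:.3f}")
```

The point of the sketch is only the data flow: once trained, the emulator replaces the analysis step with a single forward pass, which is where a CPU-time saving of the kind reported in the abstract would come from.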
This preprint has been withdrawn.
Interactive discussion
Status: closed
-
CEC1: 'Comment on gmd-2022-50', Juan Antonio Añel, 25 Oct 2022
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
First, your manuscript does not contain the mandatory Code Availability section. You have included some information about WEKA and TensorFlow inline in the text, but it belongs in the specific section required at the end of the manuscript. Also, in the text, you point to web pages that are not trustworthy repositories for scientific archival. Therefore, you must publish the WEKA and TensorFlow codes in a new repository, one from our list of suitable ones. The same applies to WRF. You must also indicate the specific version of the model that you use.
Second, none of the repositories that you provide in your current Data Availability section comply with our requirements. This is especially striking in the case of GitHub. GitHub is not a suitable repository, and it instructs authors to use other alternatives for long-term archival and publishing, such as Zenodo (which you can create directly from GitHub).
Therefore, please publish all the software used in your manuscript in one of the appropriate repositories, and reply to this comment with the relevant information (link and DOI) as soon as possible, as it should be available for the Discussions stage. Please also include the relevant primary input/output data, and reply to this comment with the DOI and link for those repositories so that they are available during the Discussions stage (as requested). Moreover, please include in any potential revised version of your manuscript the modified 'Code and Data Availability' section with the requested information.
Please, be aware that failing to comply promptly with this request will result in rejecting your manuscript for publication.
Regards,
Juan A. Añel
Geosci. Model Dev. Exec. Editor
Citation: https://doi.org/10.5194/gmd-2022-50-CEC1
-
AC1: 'Reply on CEC1', Vinícius Almeida, 31 Dec 2022
Dear Editor,
Ref.: https://gmd.copernicus.org/preprints/gmd-2022-50/
Ref.: https://doi.org/10.5194/gmd-2022-50
A "Code Availability" section was included after the "Conclusions" section (revised manuscript).
It is important to point out that in July 2022 the data were migrated from GitHub (https://github.com/aa-vinicius/data-assimilation-nn) to Zenodo (https://doi.org/10.5281/zenodo.6806170) in order to comply with the journal's rules.
This modification had already been made and sent to Polina Shvedko <polina.shvedko@copernicus.org> on 11 July.
I look forward to further observations.
Best Regards,
Vinícius.
Citation: https://doi.org/10.5194/gmd-2022-50-AC1
-
RC1: 'Comment on gmd-2022-50', Anonymous Referee #1, 21 Nov 2022
The motivation to replace data assimilation with a neural network is attractive. Data assimilation does demand high computational costs, e.g., propagating ensemble members forward in EnKF, or maintaining the adjoint and running the optimization in 4D-Var. In this work, simple MLP models are tested as a replacement for 3D-Var assimilation in a relatively small city region with a limited number of observations. I would suggest that the authors make substantial modifications before submitting it again.
Major ones:
- Assimilation methods like 4D-Var or EnKF do require huge computational effort. However, the computational complexity of 3D-Var is proportional to the size of the model or the observations, and the cost is usually trivial, as illustrated in Table 4 (several seconds). Even when handling larger models or very large data sets such as remote-sensing observations, the issue could easily be solved through regional analysis. The choice of 3D-Var only weakly supports the motivation.
- In Figures 3 and 4, the authors provide very limited samples or snapshots of the analysis for testing their trained NN models, without stating the overall performance over the whole testing dataset.
- Page 9, line 206: only 5 airport measurements are assimilated for the analysis. Meanwhile, are these same data used to generate the pseudo-observations for validating the analysis? That is not the correct way to use the measurements. Cross-validation is required. Please check the reference: Peter Rayner, Data assimilation using an ensemble of models: a hierarchical approach, ACP, 2020.
- In Table 3, NN-TensorFlow outperforms 3D-Var? This is not convincing; after all, isn't the 3D-Var analysis the learning target of the NN? Performance should be examined in depth.
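The cross-validation requested in the third comment above can be sketched as a leave-one-station-out loop: each station is withheld in turn, the field is estimated at its location from the remaining stations, and the estimate is compared with the withheld measurement. This is only an illustration under stated assumptions: the five station positions and values are made up, and inverse-distance weighting (the hypothetical idw_estimate) stands in for the actual 3D-Var or NN analysis, which would be rerun without the withheld station.

```python
import numpy as np

def idw_estimate(target_xy, station_xy, station_values, power=2.0):
    """Inverse-distance-weighted estimate at target_xy (stand-in for a real analysis)."""
    dist = np.linalg.norm(station_xy - target_xy, axis=1)
    weights = 1.0 / np.maximum(dist, 1e-6) ** power   # guard against zero distance
    return float(np.sum(weights * station_values) / np.sum(weights))

def leave_one_station_out(station_xy, station_values, estimator=idw_estimate):
    """Withhold each station in turn and return estimate-minus-observation errors."""
    n = len(station_values)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                      # every station except station i
        estimate = estimator(station_xy[i], station_xy[keep], station_values[keep])
        errors[i] = estimate - station_values[i]
    return errors

# Five hypothetical stations (coordinates in km, temperature in deg C); not the paper's data.
station_xy = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0], [5.0, 5.0]])
station_values = np.array([26.0, 27.0, 25.0, 26.5, 26.2])

errors = leave_one_station_out(station_xy, station_values)
cv_rmse = float(np.sqrt(np.mean(errors ** 2)))
print(f"leave-one-station-out RMSE: {cv_rmse:.3f}")
```

Because the withheld station never contributes to its own estimate, the resulting errors measure skill at unobserved locations rather than the fit to the assimilated data.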
Minor:
Since the CPU time for assimilation with 3D-Var, NN-TF, and NN-WEKA is reported in Table 4, it is essential to state the size of the problem (the vectors x and y in Eq. (1)) and the solver/environment for 3D-Var and the NN. Otherwise, the comparison is unfair.
How the NN is trained is unclear. What is the output, actually? The analysis over the whole model domain? Or is it trained grid point by grid point? How many samples are in the 4-year dataset?
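For context on the problem size raised in the first minor comment: the manuscript's Eq. (1) is not reproduced on this page, but assuming it is the standard 3D-Var cost function, the vectors in question would be the model state x and the observation vector y in

```latex
J(\mathbf{x}) = \frac{1}{2}(\mathbf{x}-\mathbf{x}_b)^{\mathrm{T}}\mathbf{B}^{-1}(\mathbf{x}-\mathbf{x}_b)
              + \frac{1}{2}\bigl(\mathbf{y}-H(\mathbf{x})\bigr)^{\mathrm{T}}\mathbf{R}^{-1}\bigl(\mathbf{y}-H(\mathbf{x})\bigr)
```

Here x_b is the background state (in this setting, the 6-hour forecast), B and R are the background and observation error covariance matrices, and H is the observation operator. These symbols follow the usual textbook convention and are an assumption about the manuscript's notation. The lengths of x and y are what determine the cost of minimizing J, which is why the referee asks for them alongside the timings.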
Citation: https://doi.org/10.5194/gmd-2022-50-RC1
-
AC2: 'Reply on RC1', Vinícius Almeida, 12 Jan 2023
Dear Sir/Madam,
The authors would like to thank the editors for comments/suggestions/corrections, helping to improve the present version of the paper. We have carefully revised the manuscript.
Parts of the text were rewritten and reorganized. Attached, we present a document with point-by-point answers to all questions.
-
RC2: 'Comment on gmd-2022-50', Anonymous Referee #2, 03 Jan 2023
I liked this paper, easy to read and understand. I have given this major corrections just due to my major review questions. If they are easy to answer (which I imagine they are), then it's not really anything major at all, but if they are not then it's kind of major, so I put major just to be safe. See my attached reviews.
-
AC3: 'Reply on RC2', Vinícius Almeida, 12 Jan 2023
Dear Sir/Madam,
The authors would like to thank the editors for comments/suggestions/corrections, helping to improve the present version of the paper. We have carefully revised the manuscript.
Parts of the text were rewritten and reorganized. Attached, we present a document with point-by-point answers to all questions.
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
945 | 344 | 49 | 1,338 | 38 | 45 |