the Creative Commons Attribution 4.0 License.
Optimized Dynamic Mode Decomposition for Reconstruction and Forecasting of Atmospheric Chemistry Data
Abstract. We introduce the optimized dynamic mode decomposition algorithm for constructing an adaptive and computationally efficient reduced order model and forecasting tool for global atmospheric chemistry dynamics. By exploiting a low-dimensional set of global spatio-temporal modes, interpretable characterizations of the underlying spatial and temporal scales can be computed. Forecasting is also achieved with a linear model that uses a linear superposition of the dominant spatio-temporal features. The DMD method is demonstrated on three months of global chemistry dynamics data, showing its significant performance in computational speed and interpretability. We show that the presented decomposition method successfully extracts known major features of atmospheric chemistry, such as summertime surface pollution and biomass burning activities. Moreover, the DMD algorithm allows for rapid reconstruction of the underlying linear model, which can then easily accommodate non-stationary data and changes in the dynamics.
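As a rough illustration of the "linear superposition" forecast described in the abstract, the sketch below evaluates x(t) = Σ_j b_j φ_j exp(ω_j t) for a given set of modes, continuous-time eigenvalues, and amplitudes. The function name `dmd_forecast`, the toy modes, and the time unit (days) are assumptions made for illustration; they are not taken from the manuscript, and the sketch does not perform the optimized DMD fit itself.

```python
import numpy as np

def dmd_forecast(modes, omegas, amplitudes, times):
    """Evaluate x(t) = sum_j b_j * phi_j * exp(omega_j * t) at each time."""
    modes = np.asarray(modes, dtype=complex)            # (n_features, r)
    omegas = np.asarray(omegas, dtype=complex)          # (r,)
    amplitudes = np.asarray(amplitudes, dtype=complex)  # (r,)
    times = np.asarray(times, dtype=float)              # (n_times,)
    # Row j holds b_j * exp(omega_j * t) over all requested times.
    dynamics = amplitudes[:, None] * np.exp(np.outer(omegas, times))
    return modes @ dynamics                             # (n_features, n_times)

# Hypothetical toy model: a conjugate pair oscillating with period 1
# (e.g. a diurnal cycle if t is in days) plus one slowly decaying mode.
modes = np.array([[1.0, 1.0, 1.0],
                  [1j, -1j, 0.5]])
omegas = np.array([2j * np.pi, -2j * np.pi, -0.1])
amplitudes = np.array([0.5, 0.5, 1.0])
times = np.linspace(0.0, 10.0, 241)
forecast = dmd_forecast(modes, omegas, amplitudes, times)
```

Because every eigenvalue in this toy example has a non-positive real part, each term either oscillates or decays, so the forecast stays bounded at any horizon; a single eigenvalue with a positive real part would grow exponentially instead.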
Status: final response (author comments only)
RC1: 'Comment on gmd-2024-77', Anonymous Referee #1, 21 Oct 2024
It is always exciting to see DMD variants applied to datasets in earth systems science. This paper considers one such application and sets the stage for more detailed analyses of such datasets using OptDMD and BOP-DMD. The authors begin with a small example of atmospheric chemistry data that emphasizes the superiority of OptDMD over exact DMD. Then, OptDMD is used to reconstruct and forecast atmospheric chemistry data, and BOP-DMD is used to provide uncertainty quantification and other forecasts. These forecasts, along with their errors, appear stable over the prediction windows. The authors also use the dynamic modes detected by OptDMD to visualize the spatial modes of these data.
One of my general concerns is a lack of scientific interpretation of the results. Some of my questions include: why choose only certain chemical species, latitudes, longitudes, and elevations for these experiments? Can you provide a more in-depth interpretation of the spatial modes?
Further comments are below:
- On lines 6-7 the authors claim that their method “successfully extracts known major features of atmospheric chemistry, such as summertime surface pollution and biomass burning activities.” But the only other time “pollution” is mentioned is on lines 339-343, where the authors state that these methods have “the potential to produce a reliable estimate of business-as-usual patterns.” I may have missed something, but I couldn’t find anything in the results where the authors specifically point out the detection of summertime surface pollution and biomass-burning activities.
- Line 59 “highlighted” should be “highlights”
- Can the authors elaborate on why they choose to only consider eigenvalues with a non-positive real part, other than the fact that it produces “accurate eigenvalues and high-fidelity stable and robust forecasts?” Does this choice to restrict the eigenvalues align with some physical assumption in atmospheric chemistry?
- Figure 1. Should $x_m$ be bold-face? Can you be more specific in Figure 1 with what exactly “the data $\mathbf{x}(t_k)$” represents in terms of the global chemistry discretization? I imagine this changes per experiment, but maybe you can be more specific.
- Line 87 shouldn’t “unpractical” be “impractical”?
- Line 123 what does elevation = 1 mean physically? Are there units here?
- Lines 130-133, 207, 251, 339, please use ` for the left quotation mark
- Lines 133-134 Should the sentence “On the bottom right … “ belong in the figure caption?
- Figure 2- Why did the authors choose latitude 30? A space is missing between the meridian and (Lon=...
- Lines 160-163 is there any physical interpretation why this restriction of eigenvalues makes sense? Could this have something to do with energy conservation?
- Lines 177 and 184 the reference to Algorithm 1 is missing.
- Figures 3, 5, 6, 10, 11, 12, 13, 15, 16 Please label the rows of the figures.
- Lines 254-255 can you specify exactly how many modes were used for each method in a table?
- Line 259 remove the degree sign from 12
- Line 264-265 Why does it make sense to have “high-variance features at the coastlines and within hot spots in the land?” Has this been seen before in prior work? If so, please provide a citation.
- Figures 15,16 what does i,j represent in $\langle _{i,j}^2 \rangle$?
- Line 269-271 why is the analysis restricted to only 6 longitudes, 1 latitude, and 1 elevation? Which longitude is used? Also, a space is missing between surface and (elevation=1). Again, what does elevation=1 mean physically? What are the units?
- Figures 7, 14, 15, 16 shouldn't Tend be TEND?
- Figure 8, 9 what do the subscripts i,j in $\phi_{i,j}$ represent? Also, shouldn't it be “shows” instead of “show?”
- Figures 10-13 does it make more sense to plot the MSE and correlation between STARTData and the OptDMD reconstructions/predictions at each time step rather than the raw signals? By doing this, one could compress these four figures into one or two figures for easier comparison of the results across different chemical species and CONC vs TEND predictions.
- Figures 15-16 What is the advantage of plotting the uncertainty quantification of these eigenvalues? What does this tell us about the model and the resulting predictions? Does this provide us with any insight into the physical processes?
There are certainly other typos that I have missed. Please take a very careful pass through the manuscript to check for any other typos before submitting revisions.
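The suggestion above for Figures 10-13 (plotting MSE and correlation at each time step rather than the raw signals) could be prototyped along the following lines. This is a minimal sketch under one possible reading of the comment: both arrays are assumed to be laid out as (spatial points, time steps), and the correlation is taken across spatial points at each time step. The function name `per_step_scores` and the synthetic arrays are hypothetical and do not come from the manuscript.

```python
import numpy as np

def per_step_scores(reference, prediction):
    """Return (mse, corr), each of shape (n_times,), comparing matching columns."""
    reference = np.asarray(reference, dtype=float)    # (n_points, n_times)
    prediction = np.asarray(prediction, dtype=float)  # (n_points, n_times)
    if reference.shape != prediction.shape:
        raise ValueError("reference and prediction must have the same shape")
    # Mean squared error over spatial points, one value per time step.
    mse = np.mean((reference - prediction) ** 2, axis=0)
    # Pearson correlation over spatial points, one value per time step.
    ref_c = reference - reference.mean(axis=0)
    pred_c = prediction - prediction.mean(axis=0)
    denom = np.sqrt((ref_c ** 2).sum(axis=0) * (pred_c ** 2).sum(axis=0))
    with np.errstate(divide="ignore", invalid="ignore"):
        corr = np.where(denom > 0, (ref_c * pred_c).sum(axis=0) / denom, np.nan)
    return mse, corr

# Synthetic stand-ins for the reference data and a reconstruction of it.
rng = np.random.default_rng(0)
reference = rng.normal(size=(50, 8))
prediction = reference + 0.1 * rng.normal(size=(50, 8))
mse, corr = per_step_scores(reference, prediction)
```

Two short curves per species (MSE and correlation against time) would then replace the raw-signal panels, which is what makes the comparison across species and across CONC and TEND more compact.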
Citation: https://doi.org/10.5194/gmd-2024-77-RC1
RC2: 'Comment on gmd-2024-77', Narendra Ojha, 12 Nov 2024
Atmospheric chemistry simulations over large spatial regions are computationally expensive, and alternative modeling approaches, such as statistical and AI/ML methods, are increasingly being developed. In this regard, Velagar et al. have presented a comprehensive study exploring the application of Dynamic Mode Decomposition for simulating spatio-temporal variability in chemical species. Out of several experiments, they suggest that optimized DMD with constraints can be a better model for atmospheric chemistry applications. While the study is novel, detailed, and in general well written, some comments need to be addressed before publication, as listed below.
The DMD approach is explored to reduce computation time. If possible, provide some estimate of how much time this method takes for the considered periods of training / simulations, and how this varies with complexity (number of chemical species, etc.). The authors propose interpretability as an additional advantage (in the abstract). This should be elaborated (and compared with other modeling approaches) in the discussion. I am not quite sure how this approach has more interpretability than, for example, machine-learning approaches, which also provide the relative significance of different features in governing variability.
l.15: 5-dimensional data! Would it not be better to call this 4-D data? Variability in chemical species is 4-D, e.g., O3(lon, lat, alt, time).
l.52-55: “these limitations make their use in global atmospheric modeling problematic.” The machine learning approaches for atmospheric modeling are still in developmental stages, with several successful applications (e.g., Arcomano et al., GRL, 2020; Kochkov et al., Nature, 2024). Is your statement inclined more towards atmospheric chemistry? The discussion may be put in a better way and may be supported with references (or this sentence may be removed).
l.73-74: “Understanding the composition of atmosphere……(Jacob, 1999).” This is a general introductory text and may be moved somewhere in introduction or may be deleted since similar information and citation has appeared already.
l.105: Model output was for 1 year. Why were only about 40/60 days of data later used for training and validation? Was this to save computation time? How does the length of the data impact the performance?
l.112: nfeatures = 143 + 91 + 3 + 143 = 380; mention what each number is; (143 is given to be chemical species, what about other numbers here)
l.117: define abbreviation SVD at first usage
l.133: “turn-off of dynamics during night times”. Do you mean “variability due to photo-chemistry is absent at night”? If so, you may re-phrase it that way so as not to confuse the reader with atmospheric dynamics (which is active at night also!).
l.157-158: Yes, optDMD is clearly better but is there a more quantitative evaluation of the “time evolution” shown in the figure? Is the performance constant with time? How long in time (beyond shown), performance may sustain?
Page 10: A few question marks appear after “Algorithm”. Check and refer correctly. In several places citations also have issues, which should be proofread and corrected.
l.210: Correct “Nitrous Oxide” to “Nitric Oxide” and “Nitrous dioxide” to “Nitrogen dioxide”
l.220: “The results are consistent for all chemical species”. What exactly does this statement mean? Is Fig 5 being referred to? Describe the rationale behind choosing the ranks (r=25 for CONC and r=50 for TEND).
A scope of improvement is that several figures on results (Fig 7, 8, 9) have limited discussion and that too is qualitative. Explain what authors observe in relative errors (Fig 7) and how they decide on number of modes. Fig 8, 9: are you comparing these maps with GEOS-Chem based maps for evaluation? Add some quantitative discussion in line with major results.
l.276-278: missing spikes are attributed to the selection of the fewest modes possible. What is the rationale behind not taking a few more modes and trying to improve model performance? Air quality spikes are often of significant interest.
Fig. 10: the legend STARTData does not seem to be defined in manuscript. Please check.
If possible, do include results on ozone (O3) also, similar to those for CO and NO (Fig 8-9). Ozone is affected by photochemistry as well as transport (and is a precursor of OH). Getting O3 distributions right could be very useful.
References: The first reference, “Global modeling of tropospheric…….”, is incomplete; please add the names of the authors.
References
Arcomano, T., Szunyogh, I., Pathak, J., Wikner, A., Hunt, B. R., & Ott, E. (2020). A machine learning-based global atmospheric forecast model. Geophysical Research Letters, 47, e2020GL087776. https://doi.org/10.1029/2020GL087776
Kochkov, D., Yuval, J., Langmore, I. et al. (2024). Neural general circulation models for weather and climate. Nature, 632, 1060–1066. https://doi.org/10.1038/s41586-024-07744-y
Citation: https://doi.org/10.5194/gmd-2024-77-RC2