Deep learning applied to CO2 power plant emissions quantification using simulated satellite images
Joffrey Dumont Le Brazidec
Pierre Vanderbecken
Alban Farchi
Grégoire Broquet
Gerrit Kuhlmann
Marc Bocquet
Abstract. The quantification of emissions of greenhouse gases and air pollutants through the inversion of plumes in satellite images remains a complex problem that current methods can only assess with significant uncertainties. The launch of the CO2M satellite constellation, planned for 2026, is expected to provide high-resolution images of CO2 column-averaged mole fractions (XCO2), opening up new possibilities. However, the inversion of future CO2 plumes from CO2M will encounter various obstacles. One challenge is the low signal-to-noise ratio of the CO2 plumes, caused by the variability of the background and by instrumental errors in the satellite measurements. Moreover, uncertainties in the transport and dispersion processes further complicate the inversion task.
To address these challenges, deep learning techniques, such as neural networks, offer promising solutions for retrieving emissions from plumes in XCO2 images. Deep learning models can be trained to identify emissions from plume dynamics simulated using a transport model. It then becomes possible to extract relevant information from new plumes and predict their emissions.
In this paper, we employ convolutional neural networks (CNNs) to estimate the emission fluxes from a plume in a pseudo XCO2 image. The dataset used to train and test these methods includes pseudo-images based on simulations of hourly XCO2, NO2 and wind fields near various power plants in Eastern Germany, tracing plumes from anthropogenic and biogenic sources. CNN models are trained to predict emissions from three power plants that exhibit diverse characteristics. The power plants used to assess the deep learning model's performance are not used to train the model. We find that the CNN model outperforms state-of-the-art plume inversion approaches, achieving highly accurate results with an absolute error about half that of the cross-sectional flux method. Furthermore, we show that our estimates are only slightly affected by the absence of NO2 fields or of a detection mechanism as additional information. Finally, interpretability techniques applied to our models confirm that the CNN automatically learns to identify the XCO2 plume and to assess emissions from the plume concentrations. These promising results suggest a high potential for CNNs in estimating local CO2 emissions from satellite images.
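As a rough sketch of the approach described in the abstract, the model below regresses a scalar emission rate from a multi-channel scene (e.g. XCO2 and the two wind components, with NO2 or a segmentation map as an optional extra channel). The architecture, layer sizes and channel choices are assumptions for illustration, not the network used in the paper.

```python
import torch
import torch.nn as nn

class EmissionRegressor(nn.Module):
    """Illustrative CNN mapping a multi-channel scene to a scalar emission rate."""
    def __init__(self, in_channels: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),            # global average pooling
        )
        self.head = nn.Linear(64, 1)            # predicted emission rate (e.g. in MtCO2/yr)

    def forward(self, x):                       # x: (batch, channels, height, width)
        return self.head(self.features(x).flatten(1))

# Usage: a batch of 64x64 scenes with XCO2, u-wind and v-wind channels
model = EmissionRegressor(in_channels=3)
y_hat = model(torch.randn(8, 3, 64, 64))        # -> tensor of shape (8, 1)
```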
Status: final response (author comments only)
RC1: 'A useful exploratory paper that would benefit from more caveats', Evan D. Sherwin, 23 Aug 2023
This paper uses simulated power plant emissions data, generated via the COSMO-GHG model as part of the SMARTCARB project, to simulate carbon dioxide and nitrogen dioxide retrieval via the Copernicus CO2 Monitoring (CO2M) satellite. Developing algorithms for the CO2M satellite is valuable, as the satellite itself will not launch until 2026, limiting possibilities for algorithm development using non-simulated data.
The authors focus on the task of quantification, rather than detection, of CO2 emissions from power plants with a known location. The type of power plant is not specified, but presumably they use coal or natural gas.
The authors use simulated data from eight power plants, as well as the city of Berlin as the basis for their CO2 quantification efforts, based primarily on a convolutional neural network (CNN) machine learning approach.
For each of three selected power plants (Lippendorf, Turow, and Boxberg), the authors train and validate a bespoke version of their CNN model on all power plants but the selected one, which is used as a test dataset.
The authors compare quantification error metrics for the baseline CNN with two alternate CNN specifications including the NO2 field and a segmentation map, respectively, as well as with what the authors claim is a standard application of a cross-sectional flux method.
The authors apply two interpretability analyses based on analysis of pixel gradients and feature permutation. The results suggest that the CNN is indeed primarily focusing on CO2 emissions from the desired power plant.
While acknowledging some of the limitations associated with this simulated data approach, the authors conclude that a CNN-based approach is promising for CO2 quantification with the CO2M satellite once it launches.
This paper is a valuable exercise and indeed provides suggestive evidence that a computer vision-based approach such as a CNN can be valuable in CO2 quantification with satellites such as CO2M.
However, the approach employed in this paper has several limitations that should be more clearly addressed before it is published.
1. The train/validation/test approach taken by the authors does not include a true test set. In standard machine learning, a model is trained and all aspects (including network architecture and hyperparameters) are validated and finalized before any version of the model sees the test set. In the approach employed by the authors, a version of the model is trained and validated on all but the selected power plant, and is then tested on that power plant. However, the fact that this process is repeated at least three times means that any hyperparameter tuning that takes place for the model for the first power plant will translate over to all subsequent models.
The authors should clearly acknowledge this limitation and clarify that future work with truly held-out test sets is needed to validate the true performance of such models.
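To make the protocol concrete, a leave-one-source-out split of the kind described here can be sketched as follows (the helper function and the plant list are illustrative, not the authors' pipeline); unless the architecture and hyperparameters are frozen before the first split is evaluated, tuning on one split leaks into all the others.

```python
def leave_one_source_out_splits(sources):
    """Yield (train_and_validation_sources, test_source) pairs."""
    for test_source in sources:
        train_val = [s for s in sources if s != test_source]
        yield train_val, test_source

# Illustrative subset of the SMARTCARB sources
for train_val, test in leave_one_source_out_splits(
        ["Lippendorf", "Turow", "Boxberg", "Janschwalde"]):
    print(train_val, "->", test)
```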
On a related note, it would be valuable to more clearly situate this work in the computer vision remote sensing literature. This is a huge field, one of the most active at the intersection of climate change and AI. How has this work learned from the prior body of accumulated knowledge in this field? What is new in this particular paper? Are there methodological innovations, or does the novelty come solely from the application to CO2M-like data?
2. The authors train a new, bespoke model for each of the three power plants they focus on in this study. In some cases, they even alter the training dataset to only include other power plants that have emissions similar in magnitude to what they know emissions from the test power plant are. Furthermore, it appears that the test power plant is always in the center of the scene.
Are the authors proposing that a new CNN be trained for every potential CO2 source targeted by CO2M? This seems very inefficient and prone to overfitting. It should be possible to train a generalizable model that both detects and quantifies CO2 emissions at a wide variety of sites. If the authors think this is not the case, they should state this very clearly and explain the rationale for source-specific models, including how they plan to get true emission rate values for all sources for which they plan to train a model.
Many power plants in the United States and I think across Europe have continuous emissions monitoring systems that provide ground-truth data. See Cusworth et al. 2021 for more detail [1]. The main value provided by CO2 remote sensing is for CO2 sources that do not have this sort of ground truth.
3. The authors acknowledge this to a certain extent, but the structure of the simulated data is likely quite different from that of true CO2 emissions data.
True background noise in greenhouse gas remote sensing is generally not purely Gaussian, but includes significant surface artifacts due to highly reflective/absorptive surface features, as illustrated in Zhang et al. 2023 [4]. A model trained only on simulated Gaussian noise may experience difficulty when given more realistic data.
Probably more importantly, the authors use simulated wind data from ERA-5 that appears to be much more uniform and less turbulent than true wind fields. All the plumes shown in this paper appear to be more or less Gaussian, with a little variability in direction (presumably caused by slow changes in wind speed over time).
4. This raises the related question of limited references to the previous literature in CO2/greenhouse gas remote sensing.
CO2M will not be the first CO2 remote sensing instrument. For example, Cusworth et al. 2021 use the PRISMA satellite and the AVIRIS-NG aircraft (based on a spectrometer very similar to the upcoming Carbon Mapper satellite constellation) to detect CO2 emissions from power plants [1]. As you can see from the emissions detected in this paper, they are not really following neat Gaussian plume-style dispersion.
More accurately capturing realistic emission shapes will likely require large-eddy simulation, e.g. the approach employed in Gorroño et al. 2023 in the context of satellite-based methane sensing [2].
The paper mentions the OCO-2 and OCO-3 satellites as already doing CO2 monitoring, but does not include a clear assessment of their CO2 quantification capabilities. How much of an advance would we expect CO2M to be?
It would also be valuable to situate this work in the context of other remote sensing-based GHG monitoring initiatives, such as Climate TRACE [3].
GHGSat also has targeted CO2 detection capabilities: https://www.ghgsat.com/en/newsroom/ghgsat-to-launch-worlds-first-commercial-co2-satellite/
5. It is difficult to tell how the cross-sectional flux algorithm was implemented and how representative it is of the current standard of practice in the field. Please explain this more clearly, including any ways in which your implementation differs from current standard practice for this method.
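For reference, the textbook form of the cross-sectional flux (CSF) method integrates the background-subtracted column enhancement along a transect perpendicular to the plume and multiplies by the advection speed. The sketch below is a minimal, single-transect version with assumed constants; background removal, transect placement and wind averaging are precisely the implementation details the comment asks to be documented.

```python
import numpy as np

def csf_emission_estimate(delta_xco2_ppm, pixel_width_m, wind_speed_ms,
                          air_column_kg_m2=1.0e4, m_co2=44.01, m_air=28.97):
    """Single-transect cross-sectional flux estimate (simplified).

    delta_xco2_ppm   : background-subtracted XCO2 (ppm) along a transect
                       drawn perpendicular to the plume axis
    pixel_width_m    : along-transect pixel size (m)
    wind_speed_ms    : effective wind speed normal to the transect (m/s)
    air_column_kg_m2 : assumed dry-air column mass (~1e4 kg/m2 near sea level)
    Returns an emission rate in kg CO2 per second.
    """
    # ppm enhancement -> CO2 column mass enhancement (kg/m2)
    delta_col = np.asarray(delta_xco2_ppm) * 1e-6 * (m_co2 / m_air) * air_column_kg_m2
    # integrate across the transect to get a line density (kg/m), then advect
    line_density = np.sum(delta_col) * pixel_width_m
    return line_density * wind_speed_ms

# Example: convert the result to MtCO2/yr for comparison with reported fluxes
q_kg_s = csf_emission_estimate(delta_xco2_ppm=[0.2, 0.5, 0.8, 0.5, 0.2],
                               pixel_width_m=2000.0, wind_speed_ms=5.0)
print(q_kg_s * 3.15e7 / 1e9, "MtCO2/yr")
```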
6. What do your error metrics mean? Figure 6 and similar figures have no negative values. Are these simply reporting error magnitudes, or are there no underestimates?
Absolute value of % error is not a very informative metric here, as an error of -99% is quite different from an error of +99%.
It would be better to include negative values in error distributions. It would also be useful to compare error metrics to errors achieved in past studies, e.g. Cusworth et al. 2021 and Zhang et al. 2023 [1,4]. It might also be valuable to compare your error metrics with those achieved in satellite-based remote sensing of methane, e.g. Sherwin et al. 2023 [5], but this is not necessary.
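The sketch below, with made-up numbers, shows the distinction being requested: signed relative errors (negative values flag underestimates) are reported alongside the absolute summary statistic.

```python
import numpy as np

def signed_percent_error(predicted, true):
    """Signed relative error in percent; negative values are underestimates."""
    predicted, true = np.asarray(predicted, float), np.asarray(true, float)
    return 100.0 * (predicted - true) / true

err = signed_percent_error(predicted=[9.5, 12.0, 20.0], true=[10.0, 10.0, 25.0])
print(err)                   # [-5. 20. -20.]
print(np.mean(np.abs(err)))  # mean absolute percent error: 15.0
```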
7. This study does not appear to include many/any zeroes (instances with zero CO2 flux from the target source). This is a significant limitation and its implications should be discussed in more detail.
While this study does not focus on detection, it is presumably still possible to have a false positive (i.e. to estimate nonzero emissions when the power plant is not emitting).
This issue is related to the question of class imbalance that the authors note. Class imbalance is very common in computer vision, and a balanced training dataset is not always possible, especially for models aiming to detect/quantify features of multiple sizes. Enforcing an artificially balanced dataset could easily lead to this type of effect if one is not careful.
To help clarify these points, I recommend including more detailed summary statistics and/or full time series trend plots of power plant emission rates in the supplementary information.
Furthermore, re-training the Boxberg model on a more representative training dataset (presumably after the main Boxberg model had already seen the test data) means that this latter model in particular really does not have a test set in the true sense of the word. This highlights the exploratory nature of this work, which is still valuable, but requires additional testing on independent data to claim external validity.
Smaller points
- For simulated satellite images, please include a length scale in kilometers or other appropriate units of distance.
- Why are hourly emission rates reported in MtCO2/yr instead of an hourly unit?
- Figure 2: Are the results shown here from training, test, or validation data?
- Suggest “Targetted” -> “Targeted”
- L183: When the model takes 4-5 images as input, are these representative of 4-5 separate satellite overpasses? If so, are they from different simulated times (presumably with different emission rates in each)? This should be clarified.
- L287: What is the meaning of the “Precisely” in “Precisely, the segmentation model does not discriminate between plume pixels with high amplitude and those with low amplitude.”
- The results section would be easier to read with a single multi-panel figure for the three power plants side by side, and with one big table instead of one for each power plant. The surrounding text could also be consolidated to be less repetitive.
- Figure 10: Why does only Boxberg have a residual density plot? Why does this plot have negative residuals when none of the other density plots do?
- In the overfitting section, please include citations about overfitting in ML remote sensing models to support your points here.
- Figure 11: Please include a sentence in the figure caption explaining how it suggests overfitting (presumably the fact that validation error decreases monotonically with number of epochs, while test error does not).
- Also, make sure to explain what “None”, “Segmentation”, and “NO2” mean in the figure caption.
- Also, why are there epochs for the test set? Were the authors applying archived versions of the model from each epoch to the test dataset and computing error?
- L360: If you do not include a result in the paper (or at least in the supporting information), then it is best not to reference it in the paper.
- Not really clear to me whether there is improvement on the overfitting front around Figure 12. I may be missing something. Where are the results that suggest this?
- When were hyperparameter values set? Before or after any of the models saw test data?
- Figure 13: The gradient method shows that the model exclusively focuses on the plume in the center of the image. This seems like a sign that the model was able to pick up on structured elements of the data it is given, which will not necessarily be present in real satellite data.
- Table 6: Please explain in the table caption what “Seg.” means. What are the units of the numbers in this table? Are these percent error? What do the colors mean? Simply saying in the caption that the colors will be discussed later is fairly confusing to the reader. Please explain in the table caption what “Fourth feature” means.
- In the feature permutation analysis section, please include citations to other studies that do this, or to the method itself (a generic sketch of the method is given after this list).
- To what extent are the hypotheses listed here supported by the analysis in this paper?
- How were the colors chosen? Was this arbitrary, or was there a clear method developed in advance of the analysis?
- Have previous papers done this sort of color-based analysis before?
- The point about clouds is worth highlighting further. I recommend including some summary statistics of cloudiness in Germany, e.g. from https://earthobservatory.nasa.gov/global-maps/MODAL2_M_CLD_FR.
- L459: Suggest “oppurtunities” -> “opportunities”
- L476: “We demonstrated that the design of a "universal" CNN, trained on a small power plant subset and highly accurate on all of them, is possible.”
- Unless I am missing something, this paper does not do this. As I understand it, the authors train one CNN per target power plant, using other power plants as training and validation data. They appear to use the same network architecture in each case, but these plant-specific models are definitely not a “universal” CNN that is highly accurate on all of them.
- Also, it looks like these bespoke models have significant difficulty if the training dataset includes lower-emitting power plants but the test dataset is a higher-emitting power plant.
- L488: “The training dataset for each CNN is restricted to a dataset consisting of all other power plants except their target.” You mean the training and validation datasets, right?
- What does the terrain around the power plants look like? Would be good to include satellite images of the three test power plants studied in the main text (together with their surrounding scenes), perhaps including images of the rest of the plants in the SI.
- Would be good to have numbers in the abstract, e.g. the error profile of the best-performing method
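Referring back to the feature permutation point above, a generic sketch of channel-wise permutation importance for an image regressor is given below; the function and its arguments are illustrative rather than the authors' exact procedure.

```python
import numpy as np

def permutation_importance(model_fn, X, y, channel, n_repeats=10, seed=None):
    """Mean increase in MAE when one input channel is shuffled across samples.

    model_fn : callable mapping an array of shape (N, C, H, W) to N predictions
    X, y     : evaluation images and reference emission rates
    channel  : index of the channel to permute (e.g. XCO2, u-wind, NO2)
    """
    rng = np.random.default_rng(seed)
    base_mae = np.mean(np.abs(model_fn(X) - y))
    increases = []
    for _ in range(n_repeats):
        X_perm = X.copy()
        # shuffle the chosen channel across samples, keeping the others intact
        X_perm[:, channel] = X_perm[rng.permutation(len(X)), channel]
        increases.append(np.mean(np.abs(model_fn(X_perm) - y)) - base_mae)
    return float(np.mean(increases))
```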
Works cited
- [1] Cusworth, D. H. et al. Quantifying Global Power Plant Carbon Dioxide Emissions With Imaging Spectroscopy. AGU Advances 2 (2021).
- [2] Gorroño, J., Varon, D. J., Irakulis-Loitxate, I. & Guanter, L. Understanding the potential of Sentinel-2 for monitoring methane point emissions. Atmos. Meas. Tech. 16, 89–107 (2023).
- [3] Couture, H. D. et al. Towards Tracking the Emissions of Every Power Plant on the Planet. In: Tackling Climate Change with Machine Learning, 9 (2020).
- [4] Zhang, Z., Cusworth, D. H., Ayasse, A., Sherwin, E. D. & Brandt, A. R. Measuring carbon dioxide emissions from liquefied natural gas (LNG) terminals with imaging spectroscopy. ESS Open Archive (2023). doi:10.22541/essoar.169160359.98787752/v1.
- [5] Sherwin, E. D. et al. Single-blind validation of space-based point-source detection and quantification of onshore methane emissions. Sci. Rep. 13, 3836 (2023).
Citation: https://doi.org/10.5194/gmd-2023-142-RC1
RC2: 'Comment on gmd-2023-142', Anonymous Referee #2, 26 Aug 2023
The authors present a method to constrain power plants' CO2 emissions with a deep learning model. The model was trained and tested with simulated CO2 concentrations from the SMARTCARB simulations. The analysis demonstrates the superior performance of the deep learning model in handling the low signal-to-noise ratio issue, and it is significantly better than the traditional CSF method. The topic is interesting, and the method is helpful for better inversion of CO2 emissions in the future. However, there are still issues that need to be addressed before the paper can be published; in particular, an extended comparative analysis is suggested to provide a better understanding of the potential advantages and limitations of this method.
Comments:
- Line 42: the authors indicate that the purpose of this study is to address the second and third problems in CO2 inversions: 2) the low signal-to-noise ratio and 3) the uncertainty in the transport and dispersion processes. I agree that the results demonstrate a remarkable ability to address the low signal-to-noise ratio issue; however, it is unclear how this analysis can address the uncertainty in the transport and dispersion processes, because both the training and the test are based on SMARTCARB simulations, which assumes no systematic errors in the simulations.
- Sections 5.1-5.3: Here the model performance is demonstrated for Lippendorf, Turow, and Boxberg individually. It is suggested to demonstrate the performance for all PPs, including the city of Berlin, and to provide a comparative discussion of the model performance over these PPs to investigate possible consistencies and discrepancies, which would provide a better understanding of the potential advantages and limitations of this method.
- Sections 5.1.1, 5.2.1 and 5.3.1 show the model performance, while Section 5.1.2 shows the effect of segmentation and NO2 fields, and Section 5.3.2 presents the overfitting investigation. The organization of these sections is inconsistent and needs to be improved.
Technical Comments:
- Abstract: The abbreviations, such as CO2M, CO2 and NO2, should be defined.
- It would be better to list the first, second and third problems more clearly, for example as 1), 2) and 3); otherwise, readers have to check Lines 34-41 carefully to determine which problems are the second and third.
- Figure 2: How is the targeted plume obtained?
- Line 131: The title of Section 4 should be “Deep learning method for the inversion of XCO2”.
- Table 1: It would be better to have a map to show the locations of these PPs.
- Fig. 3: The model input features are the XCO2 image, u-wind, v-wind, and an additional NO2 field or the segmentation model output contour. Figure 3 does not show all of these and may be misinterpreted by the reader as indicating that only the four features drawn in the figure are used as input.
- Lines 159-164: How was the range of these factors determined?
- Line 169: What kind of standardization is used?
- It could be better to provide a brief explanation or definition for the kernel density (e.g., Fig. 6).
Citation: https://doi.org/10.5194/gmd-2023-142-RC2