the Creative Commons Attribution 4.0 License.
GraphFlow v1.0: approximating groundwater contaminant transport with graph-based methods – an application to fault scenario selection
Abstract. Groundwater contaminant transport problems remain computationally demanding, which often limits the exploration of conceptual uncertainty; this uncertainty is mainly related to large-scale structural features and stems from their limited characterization. Here, to facilitate the exploration of geological conceptual uncertainty, we further develop the use of graph representations of geological models to approximate groundwater flow and transport. We test our approach on a faulted medium composed of multiple heterogeneous layers. The observed rank correlation between the shortest-path distribution from a contaminant source to the model domain outlet and the cumulative mass distribution at the outlet makes it possible to perform scenario selection. The scenario selection approach relies on a metric combining the Jaccard dissimilarity and the Wasserstein distance to compare binary images. Among a set of eight alternative scenarios, in which each of three faults acts either as a flow barrier or as a preferential path, we show that graph-based approximations make it possible to retain or reject scenarios with confidence, as well as to estimate the individual probability of each fault acting as a barrier or a path. This methodological framework opens up possibilities for a more thorough exploration of conceptual geological uncertainty in processes affected by flow and transport.
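The rank correlation mentioned in the abstract can be made concrete with a short Spearman sketch: rank both quantities (tied values receive their average rank), then take the Pearson correlation of the ranks. This is an illustration under assumptions, not the authors' code; the per-cell arrays `distance` and `mass` are hypothetical values in arbitrary units, not data from the study.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last index of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-cell values (arbitrary units), for illustration only.
distance = [1.0, 2.5, 2.5, 4.0, 6.0]  # graph shortest-path distance from the source
mass = [0.9, 0.6, 0.7, 0.3, 0.1]      # cumulative mass reaching each cell
rho = spearman(distance, mass)
print(round(rho, 3))  # → -0.975
```

A coefficient close to ±1 means the two quantities order the cells almost identically; only the ordering matters, not the absolute values, which is the property that would let a cheap graph distance stand in for a transport simulation when ranking.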
Status: open (until 07 Dec 2024)
RC1: 'Comment on gmd-2024-154', Anonymous Referee #1, 29 Oct 2024
Positive Aspects
- The graph-based approach simplifies complex geological models and reduces the computational costs.
- The distance map provides information about potential contaminant transport pathways.
- A new similarity measure is used to compare the distance map to the cumulative mass distribution.
General Comments
- The term "groundwater" is often associated with specific subsurface conditions and flow regimes. While the principles of flow and transport in porous media can be applied to groundwater systems, the broader context of the study seems to be more general. More accurate and inclusive terminology would avoid potential misunderstandings; one option is to refer to porous media.
- Including fault scenarios might seem unnecessary if the method doesn't perform well for cases without faults, as Appendix A shows.
- Justify the Fault Scenarios: If the fault scenarios are crucial for real-world applications, provide stronger justification. Perhaps there are specific geological settings where faults significantly impact flow and transport.
- For this specific scenario, explore the limitations of the graph-based approach to justify the range of metric values considered acceptable.
- Appendix A needs to include details of the parametrization of the MODFLOW simulation.
- The method still relies on a 3D simulation (MODFLOW) to generate the "ground truth" against which the graph-based method is compared. This limits the method's independence and its potential for significant computational savings. While the graph-based method can provide a quick and potentially accurate approximation, the authors could consider validation against simplified analytical solutions, sensitivity analysis, or machine learning techniques. This would provide a more rigorous comparison without relying on numerical simulations.
- Similarity measure: A similarity coefficient of 0.3 might seem low, especially considering that a perfect match would be 1.0. While a higher similarity coefficient would be ideal, a value of 0.3 can still be considered reasonable given the complexity of the problem, but this needs to be explicitly acknowledged. The authors should provide a detailed discussion of the factors influencing the similarity coefficient and explain why this value is acceptable in the context of their study. Additionally, the authors could explore ways to improve the accuracy of the graph-based method, such as refining the graph construction by experimenting with different graph configurations that better capture the underlying geological features.
- A comprehensive evaluation of the graph-based method requires a clear understanding of the underlying physics-based model, including its setup and initial conditions. The authors should provide a detailed description of the MODFLOW simulations, including:
  - Model Domain: The spatial extent and discretization of the model domain.
  - Hydrogeological Properties: The values assigned to hydraulic conductivity, porosity, and other relevant parameters.
  - Boundary Conditions: The types of boundary conditions applied to the model boundaries.
  - Initial Conditions: The initial distribution of hydraulic head and contaminant concentration.
- Comparing a single MODFLOW scenario to multiple graph-based scenarios can be misleading, as it doesn't directly assess the accuracy of each individual graph-based scenario. A more appropriate approach would be to compare each corresponding pair of scenarios.
- The paper should be understandable to a broad audience without requiring extensive external references. Consider providing a brief explanation of the algorithms used:
  - Dijkstra's algorithm
  - Other algorithms and metrics (Jaccard dissimilarity, Wasserstein distance, Otsu thresholding)
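As a brief illustration of two of the techniques listed above, the sketch below binarizes two maps with Otsu's method and compares the resulting masks with the Jaccard dissimilarity, 1 - |A ∩ B| / |A ∪ B|. It is a stdlib-only sketch under assumptions: maps are flattened lists of floats, this Otsu variant searches exhaustively over the distinct data values rather than over a fixed histogram (library implementations usually bin the data), cells above the threshold are taken as foreground, and the two 3×3 maps are hypothetical. How the manuscript combines the Jaccard term with the Wasserstein distance is not reproduced here.

```python
def otsu_threshold(values):
    """Threshold t maximising the between-class variance of {v <= t} vs {v > t}.

    Exhaustive search over the distinct data values (no histogram binning).
    """
    levels = sorted(set(values))
    if len(levels) < 2:
        raise ValueError("need at least two distinct values")
    n = len(values)
    best_t, best_score = levels[0], -1.0
    for t in levels[:-1]:  # the largest value would leave the upper class empty
        low = [v for v in values if v <= t]
        high = [v for v in values if v > t]
        w_low, w_high = len(low) / n, len(high) / n
        mu_low, mu_high = sum(low) / len(low), sum(high) / len(high)
        score = w_low * w_high * (mu_low - mu_high) ** 2  # between-class variance
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def binarize(values):
    """Foreground mask: True where a value exceeds its Otsu threshold."""
    t = otsu_threshold(values)
    return [v > t for v in values]


def jaccard_dissimilarity(a, b):
    """1 - |A and B| / |A or B| for equal-length boolean masks (0.0 if both empty)."""
    if len(a) != len(b):
        raise ValueError("masks must have the same length")
    inter = sum(1 for p, q in zip(a, b) if p and q)
    union = sum(1 for p, q in zip(a, b) if p or q)
    return 0.0 if union == 0 else 1.0 - inter / union


# Two hypothetical 3x3 maps, flattened row by row.
map_a = [0.1, 0.2, 0.9, 0.15, 0.8, 0.85, 0.1, 0.9, 0.95]
map_b = [0.1, 0.7, 0.9, 0.1, 0.2, 0.8, 0.15, 0.9, 0.85]
d = jaccard_dissimilarity(binarize(map_a), binarize(map_b))
print(round(d, 3))  # → 0.333
```

Here both maps are thresholded at 0.2, the masks share 4 of the 6 cells flagged by either map, and the dissimilarity is therefore 1 - 4/6.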
Specific Comments
- Abstract:
[2] The phrase "large-scale structural features" could be more specific. Explicitly mention the geological features, e.g. "large-scale geological features, such as faults, fractures, and stratigraphic variations", and their typical scales relative to the domain extent.
- Introduction:
[42-43] The paper should clearly state how the methodology “improves the consistency for subsurface flow”. The authors should provide a more precise explanation of why faults are relevant for contaminant transport in porous media. The manuscript should also provide a deeper analysis of the role of heterogeneity within the graph-based approach.
[47] Consider addressing the role of heterogeneity in the main body of the manuscript.
- Method:
[60] Figure 1. There are no dimensions indicated in the figure. Is there a reason for the orientation of the scheme?
[70-73] The description of the experimental setting should be more specific about the position of the source points relative to the grid size. The authors indicate only one coordinate point; it is unclear where the 10 random positions fall on the model grid.
[75-80] This section should also address how the authors evaluate the role of heterogeneity in the simulation domain for the different subsurface properties: it describes variability in the behavior of the faults but does not address the effect of hydraulic conductivity or porosity on the approach. Appendix A should be referenced here.
[98] Figure 2 shows the hydraulic conductivity values of one scenario. The color bar should be properly labeled, and the relative position of the two plots needs to be adjusted.
[100] Equation 2. This equation needs to be properly referenced and described in the text. The variables are not defined.
[105] Equation 3. This equation needs to be properly referenced and described in the text.
[126] Is the function “get_shortest_paths” the same as Dijkstra's algorithm?
[140] Figure 3. At this stage of the reading, it is still not clear what s32 is. The figure quality needs improvement. Include units for the color bars. Panels c and d should be moved further down, as it is not clear at this point what they mean, and they are not formatted properly. The labels of panels c and d should indicate the modeling framework used (MODFLOW, GRAPHFLOW). Furthermore, using a histogram to compare the output of 80 simulations with the new methodology against a single MODFLOW scenario is confusing, as it does not show the performance of each simulation against its corresponding physics-based simulation.
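Regarding the question at [126]: on a graph with non-negative edge weights, a shortest-path routine is typically Dijkstra's algorithm with predecessor tracking, so that the vertex path, and not only the distance, can be returned. The stdlib-only sketch below shows that pattern on a small hypothetical graph (node names and weights are invented); it does not reproduce the manuscript's graph or its `get_shortest_paths` call. If that function is the python-igraph method of the same name (an assumption), the authors should be able to confirm from its documentation which algorithm it applies to weighted graphs.

```python
import heapq


def dijkstra_path(graph, source, target):
    """Shortest path by Dijkstra's algorithm on non-negative edge weights.

    graph: dict mapping node -> list of (neighbour, weight) pairs.
    Returns (distance, [source, ..., target]), or (inf, []) if unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale entry left behind by a later distance improvement
        visited.add(u)
        if u == target:
            break  # target's distance is final once it is popped
        for v, w in graph.get(u, []):
            if w < 0:
                raise ValueError("Dijkstra requires non-negative weights")
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u  # remember how v was reached
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Hypothetical graph: "src" is a source cell, "out" an outlet cell.
graph = {
    "src": [("a", 1.0), ("b", 4.0)],
    "a": [("b", 1.5), ("out", 5.0)],
    "b": [("out", 1.0)],
}
print(dijkstra_path(graph, "src", "out"))  # → (3.5, ['src', 'a', 'b', 'out'])
```

The detour through "a" wins (1.0 + 1.5 + 1.0 = 3.5) over the direct edges, which is exactly the kind of preferential path a distance map is meant to expose.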
- Metrics
[148] Figure 4 needs quality improvement. Some recommendations: use the same font size across plots, add labels and units of measure to the color bars, and adjust the formatting. Since this figure shows the workflow of the proposed metric, use more descriptive text next to each panel.
[178] Variables are formatted differently from those in the previous equation. The 2-Wasserstein distance (W2) needs to be numbered.
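For reference when the 2-Wasserstein distance is numbered and its variables are defined: in one dimension, for two empirical samples of equal size with equal weights, the optimal coupling pairs sorted values, so W2 = sqrt((1/n) Σ_i (x_(i) - y_(i))^2), where x_(i) denotes the i-th smallest value. The sketch below assumes exactly that 1D, equal-size case; this comment does not state whether the manuscript applies W2 to 1D distributions or to 2D pixel locations of the binary images, and the 2D case would require a general optimal-transport solver.

```python
import math


def wasserstein2_1d(x, y):
    """2-Wasserstein distance between two equal-size 1D empirical samples.

    In 1D the optimal transport plan matches the i-th smallest value of x
    with the i-th smallest value of y, so no optimisation is needed.
    """
    if len(x) != len(y) or not x:
        raise ValueError("samples must be non-empty and of equal size")
    xs, ys = sorted(x), sorted(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs))


print(wasserstein2_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))  # → 1.0 (pure shift)
print(wasserstein2_1d([3.0, 1.0, 2.0], [1.0, 2.0, 3.0]))  # → 0.0 (same values)
```

The shift example illustrates why W2 complements an overlap measure: it grows with how far mass has to move, whereas a Jaccard dissimilarity saturates at 1 as soon as two masks stop overlapping.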
- Method of scenario selection
[205-214] This section seems to address a different problem: the uncertainty of uncharacterized faults. However, the proposed methodology to validate the graph model has not been discussed up to this point. Consider including the evaluation of the model with the proposed metric first. This analysis should reflect the desirable range of the metric and its limitations.
- Results
[265] In this section, the authors should provide a thorough justification of why a metric value of 0.3 is considered valid. Based on the plots presented in Figure 5, for a validation coefficient of 0.31, the cumulative mass and the shortest distances seem to differ.
[272] How does the discretization of the domain affect the binary maps and, consequently, its validation?
Figure 5. The quality of this figure needs to be improved. Consider including the name of the scenario presented in each plot.
[276] It is not stated what position 5 refers to.
Figure 7. This plot compares 8 different graph-based scenarios against a single scenario solved with a physics-based model. In the following paragraph, the authors should explain why two different scenarios lead to similar or equal validation metrics; without such an explanation, the result is misleading, as it could suggest that the proposed validation metric is not robust.
Table 2. The caption and names of the scenarios don’t match.
Technical corrections
The figures in the manuscript could be significantly improved in terms of clarity and readability, to enhance their visual appeal and the understanding of the results. The font size for labels, axis titles, and legends should be increased to improve visibility. Clear and concise labels should be used to identify the different components of the figures; avoid abbreviations or overly technical terms. Employ distinct color bars for different variables to facilitate comparison and interpretation. Consider the overall layout of the figures, ensuring that the elements are well organized and easy to follow.
Citation: https://doi.org/10.5194/gmd-2024-154-RC1
Data sets
GraphFlow Leonard Moracchini and Guillaume Pirot https://doi.org/10.5281/zenodo.13328938
Model code and software
GraphFlow Leonard Moracchini and Guillaume Pirot https://doi.org/10.5281/zenodo.13328938
Interactive computing environment
GraphFlow Leonard Moracchini and Guillaume Pirot https://doi.org/10.5281/zenodo.13328938
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote
---|---|---|---|---|---
128 | 16 | 10 | 154 | 1 | 0