the Creative Commons Attribution 4.0 License.
GEOMAPLEARN 1.0: Detecting geological structures from geological maps with machine learning
Abstract. The increasing availability of large geological datasets together with modern methods of data analysis facilitate a data science approach to geology in which inferences are drawn from geological data using automated methods based on statistics and machine learning. Such methods offer the potential for faster and less subjective interpretations of geological data than are possible from a human interpreter, but translating the understanding of a trained geologist to an algorithm is not straightforward. In this paper, we present automated workflows for detecting geological folds from map data using both unsupervised and supervised machine learning. For the unsupervised case, we use regular expression matching to identify map patterns suggestive of folds along lines crossing the map. We then use the hdbscan clustering algorithm to cluster these possible fold identifications into a smaller number of distinct folds, the number of which is not known a priori. For the supervised learning case, we use synthetic models of folds to train a convolutional neural network to identify folds using map and topographic data. We test both methods on synthetic and real datasets.
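The regular-expression step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the letter encoding (one letter per unit, `A` the oldest), the pattern `MIRROR`, and the function `find_fold_candidates` are all assumptions made for illustration. The idea is that the units crossed by a sample line form a string, and a mirrored sequence of units around a core suggests a fold.

```python
import re

# Hypothetical pattern: two flanking units mirrored around a single core unit,
# e.g. "CBABC". The backreferences enforce the mirror symmetry, and the
# lookahead lets overlapping candidates be reported.
MIRROR = re.compile(r"(?=(\w)(\w)(\w)\2\1)")


def find_fold_candidates(sequence):
    """Return (start_index, kind) for each mirrored pattern in a unit sequence.

    Assumed encoding: each letter is one map unit crossed by the sample line,
    with 'A' the oldest unit and later letters progressively younger.
    """
    candidates = []
    for match in MIRROR.finditer(sequence):
        outer, inner, core = match.group(1), match.group(2), match.group(3)
        if core < inner < outer:
            kind = "anticline"  # oldest unit in the core
        elif core > inner > outer:
            kind = "syncline"  # youngest unit in the core
        else:
            continue  # mirrored, but ages are not monotonic toward the core
        candidates.append((match.start(), kind))
    return candidates


print(find_fold_candidates("CBABCDCB"))
# [(0, 'anticline'), (3, 'syncline')]
```

In a full workflow, candidates found along many sample lines would then be grouped into distinct folds by a clustering step such as HDBSCAN; that step is omitted here.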
Status: final response (author comments only)
CEC1: 'Comment on gmd-2024-35', Juan Antonio Añel, 15 Jun 2024
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
To replicate your work it is necessary to use some shapefiles (the Lavelanet and Esternay map shapefiles). However, you have not published them. You state that these have been provided to you by the BRGM, but this is your primary affiliation, so it turns out that they are assets provided by your own institution. We cannot accept this. Our policy is clear: all the code and data necessary to produce a manuscript must be published, when the manuscript is submitted, in one of the acceptable repositories according to our policy. Therefore, please publish your shapefiles in one of the appropriate repositories and reply to this comment with the relevant information (link and DOI) as soon as possible, as we cannot accept manuscripts in Discussions that do not comply with our policy. As it stands, the current situation with your manuscript is irregular.
Also, in any revised version of your manuscript, you must include a modified 'Data Availability' section containing the requested information (link and DOI of the public repository containing the data).
If you think that your case for not sharing the data falls under one of the exceptions that we can consider (publishing the data is out of your control because a law, regulation or mandate forbids it), please reply to this comment with evidence of it. The Topical Editor and I can guide you on it.
Please, note that if you do not fix this problem, we will have to reject your manuscript for publication in our journal.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/gmd-2024-35-CEC1
AC1: 'Reply on CEC1', David Oakley, 20 Jun 2024
Dear Editors,
Thank you for bringing this issue to our attention. Although the shapefiles that we used are produced by the BRGM, they were not produced by us or as part of our present work, and we do not control their licensing. To rectify the situation, we propose to redo our analyses of the Lavelanet and Esternay areas using open-access map shapefiles, specifically those of the Bd Charm-50 dataset available at: http://infoterre.brgm.fr/formulaire/telechargement-cartes-geologiques-departementales-150-000-bd-charm-50. These maps are very similar to the ones we had originally used, so we do not anticipate any major changes to our results. When we have completed these analyses, we will publish a new version of the Zenodo repository containing the data and associated codes for the Lavelanet and Esternay sites.
Sincerely,
David Oakley
Citation: https://doi.org/10.5194/gmd-2024-35-AC1
AC2: 'Reply on CEC1', David Oakley, 11 Jul 2024
Dear Editors,
As we proposed in the previous comment, we have reproduced the analyses of the Lavelanet and Esternay maps using the open-access BD Charm-50 map data as well as open-access elevation data (from the BD ALTI 25M dataset, available at https://geoservices.ign.fr/bdalti).
The new version of the code including the data for these two maps has been archived at: https://zenodo.org/records/12710554
The DOI for the new version (version 1.1) is: https://doi.org/10.5281/zenodo.12710554
The new data are very similar, but not identical, to the ones we used in the original version of the manuscript. Figures 12 and 14 show essentially no changes, but Figure 13 does have some visible changes due to the differences in the data. We have, therefore, produced a modified version of Figure 13, which is attached to this comment and which will be incorporated into the revised version of the manuscript. While the modified version of Figure 13 does show some differences in the folds detected compared to the previous version, it illustrates the same point: changing the clustering parameters changes the number and extent of folds detected.
Sincerely,
David Oakley
RC1: 'Comment on gmd-2024-35', Anonymous Referee #1, 25 Jun 2024
The authors present a very interesting study of evaluating the potential of both unsupervised and supervised machine learning approaches for the automatic detection of fold structures from geological maps.
Especially the consideration of an unsupervised machine learning approach holds interesting aspects. Nonetheless, it would be advantageous to clarify a couple of points and extend more in the direction of generalizability.
Major Comments:
- The title of the paper refers to the detection of geological structures, whereas the rest of the document focuses only on the detection of fold structures. The authors make clear that an extension to more general settings is desired for the future. It is clear how this can be achieved for the supervised approach, but it might prove very challenging for the unsupervised technique. For isolated geological structures, this is likely possible, but a combination of different structures might lead to problems in the unsupervised approach and potentially also in the supervised technique. Therefore, it would be highly advantageous to have an example combining more than one geological feature, to be able to judge the potential capabilities of both approaches in a more general setting. Without this example, it is difficult to see whether the presented unsupervised approach in particular is extendable to complex structures or limited to simpler settings.
- For the unsupervised approach, the first step extracts rays. For the application to other studies, it would be interesting to know what distance between the rays is generally desired. Are there any rules of thumb? How would too large a distance affect the results? And would too small a distance significantly impact the efficiency/cost of the approach?
- For the supervised approach, it would be good to extend the results/discussion to address whether there is a dependency on the hyperparameters. Furthermore, additional details regarding the architecture of the U-Net should be provided (e.g., number of hidden layers, number of neurons per layer, learning rate, ...).
Minor Comments:
- Abstract: Not every reader might be familiar with the hdbscan clustering algorithm. Therefore, it would be useful to add a very brief explanation in the abstract.
- Abstract: The abstract is a bit generic and could be made more specific to better highlight the novelty of the approach presented in the paper.
- Introduction: The authors present previous work for both the automatic detection of geological structures from geological maps and the automatic classification of lithologies from remote sensing and geophysical data. It would be useful to extend this to include also the usage of machine learning approaches in the field of geological modeling. Especially since in this field also unsupervised approaches (Wang et al., 2017) have been tested, and it would be interesting to know if these approaches could potentially work also in the current settings.
- Equations 1 and 2: It might be better to move the description of equations 1 and 2 above the equations. Otherwise, these equations might be confusing for the reader at first since the type of notation might not be expected.
- Figure 1A: It would be good to add a verb to "Intersection of grid and map polygons" to unify this point with the rest of the figure.
- Figures: The resolution of some figures is relatively low. It might be advantageous to switch from bitmaps to vector graphics.
References:
- Wang, Hui, et al. "A segmentation approach for stochastic geological modeling using hidden Markov random fields." Mathematical Geosciences 49 (2017): 145-177.
Citation: https://doi.org/10.5194/gmd-2024-35-RC1
RC2: 'Comment on gmd-2024-35', Anonymous Referee #2, 07 Sep 2024
The authors explore the possibility of automatically detecting fold structures from geological maps using both unsupervised and supervised machine learning. This is interesting and innovative work.
Major Comments:
1) In the grid-of-sample-lines scheme, recognition efficiency is low when there are many nodes, while folds are easily missed when there are few nodes. Folded strata are usually distributed in long strips due to compression (except for some special domes). Therefore, drawing sample lines based on the direction of the exposed strata would be expected to reduce the number of sample lines and improve recognition efficiency. I suggest adding a relevant discussion to the Discussion section;
2) The use of 250 m as the distance (lines 90-95) in the attitude calculation may be appropriate for the experiments in this paper. The value of this parameter should be related to the scale of the folds and the scale of the map. Geological maps of different scales reflect fold structures of different scales, and the parameter values should differ accordingly. It is recommended to add an appropriate discussion;
3) Because the unsupervised learning method relies on a clustering algorithm, its level of automation is limited. In the scheme of Figure 3, only the midpoints of the possible fold segments are identified. If the boundary points of the folds were identified at the same time, it might be possible to segment the midpoints using the boundary points, which could avoid the clustering step and improve the automation level of the algorithm;
4) Horizontal rock layers (horizontal structures) distributed along both sides of a valley may produce patterns similar to folds in geological maps. I suggest giving this appropriate consideration.
Minor Comments:
1) The paper mainly explores an automatic detection method for folds, and it is suggested that the keyword "fold" be included in the title of the paper;
2) Why focus on detecting geological folds? Please state the reasons in the manuscript;
3) How could the two methods be combined? Please provide more suggestions.
Citation: https://doi.org/10.5194/gmd-2024-35-RC2
RC3: 'Comment on gmd-2024-35', Guillaume Caumon, 10 Sep 2024
This paper proposes and compares two new methods to characterize folds from geological maps. One uses regular expressions and a set of rules applied to scanlines, while the other uses a supervised approach using synthetic training models and a convolutional neural network. The paper is very nicely written and clearly explained, and both methods are interesting and relevant to the considered problem. The application cases are convincing and the discussion highlights both the merits of the methods and the avenues for future improvements. Therefore, I recommend publication subject only to minor revisions.
General Comments:
- Estimating the dip of layers from unit polygons is a good idea but may be sensitive to noise or degenerate configurations (n aligned points only constrain one of the two orientation angles to describe the surface orientation). It also relies on labels on the (closed) unit polygons to make sure that faults and unconformities are properly handled. Please add a few more details about how this is done (See also Fernandez, Journal of Structural Geology 2005).
- It could be relevant to include and discuss how the proposed method compares to Jessell et al. (GMD, 2021 - map2loop paper). More fundamentally, geometric 3D interpolation from map data could also automatically generate fold geometry and the associated characteristics without the need for pre-computing surface orientations; see Caumon et al., IEEE Trans. Geosci. and Remote Sensing 2013.
- The scan line strategy proposed in the unsupervised method is nice to generate uniformly oriented lines. I wonder if it generates uniformly located lines, and whether varying line densities could be a source of sampling bias. (A way to check could be to estimate the number of times that each pixel of the map is intersected by a line). Given the large number of lines used in the paper, this is probably not a big problem, but I prefer asking.
- The training and data assume that no growth strata are present on the anticline. Is there a way to account for unconformities with the proposed regular expression approach? Adding some elements on this would be welcome in the discussion.
- Instead of constraining the dip of layers in the unsupervised method, could you consider the difference between the topographic and layer slopes along the scan line? Indeed, this difference is what makes the distinction between a true fold and an apparent fold.
- If I understand correctly, the ‘?’ in the regular expression may possibly lead to underestimating the lateral extent of the anticlines / synclines in the unsupervised method when more than 4 units are folded. However, this does not seem to happen. Please comment.
- In several figures, only the geology is shown, which makes the 3D perception of the topography difficult. If possible, it would be nice to overlay topographic contours onto the colored geological units (as in classical geological maps) to help the reader better perceive the various configurations.
- I like the discussion of the relative merits of the supervised and unsupervised methods. To echo a point of the discussion, I hypothesize that the independence assumption between the geological model and the topography could make "realistic" intersection patterns (as determined by erosion) marginal in the training set. If so, I suspect that generating more realistic erosion patterns and the associated geomorphological features in the training set could possibly help the CNN. Another potential bias may come from the topography, which is always located at 75% of the rock thickness. Maybe making this average elevation random could be a computationally efficient way to augment the training data set and generate more representative configurations. As these suggestions may or may not improve the results, I don't see their implementation as needed for the paper to be published.
- Lines 380-385: Could simply turning the training data upside down be done to detect synclines?
- Line 419: small typo: ‘Table S1’ should read Table A1.
- I agree with some previous comments that call for choosing a more specific title.
Citation: https://doi.org/10.5194/gmd-2024-35-RC3
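The coverage check suggested in RC3 (counting how many times each map pixel is intersected by a scan line) could be sketched as follows. This is an illustration only, not the procedure used in GEOMAPLEARN: the line-generation rule (a uniformly random orientation through a uniformly random point), the function name `line_coverage`, and all parameter values are assumptions.

```python
import math
import random


def line_coverage(n_lines, grid_size, seed=0):
    """Count how many random lines cross each cell of a square grid.

    Assumption: each line has a uniformly random orientation and passes
    through a uniformly random point of the grid. The real scan-line
    scheme may differ.
    """
    rng = random.Random(seed)
    counts = [[0] * grid_size for _ in range(grid_size)]
    step = 0.5  # sampling step along the line, in cell units
    reach = grid_size * math.sqrt(2)  # longest possible chord of the grid
    n_steps = int(2 * reach / step)
    for _ in range(n_lines):
        x0 = rng.uniform(0, grid_size)
        y0 = rng.uniform(0, grid_size)
        theta = rng.uniform(0, math.pi)
        dx, dy = math.cos(theta), math.sin(theta)
        hit = set()  # cells touched by this line, each counted once
        for k in range(n_steps + 1):
            t = -reach + k * step
            x, y = x0 + t * dx, y0 + t * dy
            if 0 <= x < grid_size and 0 <= y < grid_size:
                hit.add((int(y), int(x)))
        for row, col in hit:
            counts[row][col] += 1
    return counts


counts = line_coverage(n_lines=500, grid_size=20)
flat = [c for row in counts for c in row]
print("min hits:", min(flat), "max hits:", max(flat))
```

A large spread between the minimum and maximum hit counts would indicate that some parts of the map are sampled more densely than others, which is the kind of bias the comment asks about.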
Model code and software
GEOMAPLEARN David Oakley and Thierry Coowar https://doi.org/10.5281/zenodo.11073379
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
390 | 79 | 60 | 529 | 15 | 18 |