the Creative Commons Attribution 4.0 License.
An emulation-based approach for interrogating reactive transport models
Harold J. Bradbury
Jennifer L. Druhan
Alexandra V. Turchyn
Download
- Final revised paper (published on 05 Dec 2023)
- Supplement to the final revised paper
- Preprint (discussion started on 04 Oct 2022)
- Supplement to the preprint
Interactive discussion
Status: closed
RC1: 'Comment on egusphere-2022-729', Anonymous Referee #1, 03 Jan 2023
Fotherby et al. present an emulation-based approach for interrogating reactive transport models. The manuscript is interesting and well written. It is also important because the modeling community is exploring a trade-off between computationally expensive mechanistic models and inexpensive hybrid models. This manuscript develops emulators from RTM runs and touches on mechanistic and hybrid aspects, but it does not demonstrate the practical use of the emulators. The authors should demonstrate the emulators' utility by using emulators built from 1D simulations (as in the manuscript) to reproduce 2D (if not 3D) simulations. Otherwise, this manuscript is just another way of doing sensitivity analysis. In addition, because the emulators were developed on previous work done at the Rifle site by one of the coauthors, the manuscript rehashes that work along with its methodological details. I would like to see more discussion of scientific insights, which is currently limited.
I have some questions as follows:
(1) Lines 16–18: "we use the emulator to explore how varying the boundary conditions in the RTM describing the aquifer impacts the rates and volumes of mineral precipitation." Were the new boundary conditions implemented in the RTM runs used to train the emulators? If yes, what new insights can be drawn? If not, how can emulators extrapolate to something they have not seen before? If the new boundary conditions lie within the bounds over which the RTMs have already been run, this is not a big deal, because simple interpolation might also work well.
(2) I have a similar question about the recognition of an unanticipated dependency of pyrite precipitation on pCO2 in the injection fluid, arising from the stoichiometry of the microbially mediated sulfate reduction reaction. The authors claim that this complex relationship was made apparent by the emulator, while the underlying RTM was not specifically constructed to create such a feedback. I wonder how emulators can see further than the RTMs when they are reduced representations of them. How can a reduced-order model reveal anything beyond the capabilities of the RTM it approximates?
(3) Did the authors explore the unanticipated dependency of pyrite precipitation on pCO2 using CrunchTope or other simulators (Geochemist's Workbench, TOUGHREACT)? This is a key conclusion of the manuscript, so the authors need to back it up with RTM simulations.
(4) It needs to be clarified how emulators can find local maxima, which it seems RTMs cannot. The authors claim to demonstrate that emulators can be used to maximise specific mineral precipitation or dissolution reactions by finding local maxima. Is this something like hot spots and hot moments?
(5) Lines 63–66: "Unfortunately, due to the computational expense of many modern multi-component RTMs …" If the authors can run 10K simulations to build emulators, in what sense are the RTMs computationally expensive? The authors might instead mean that 2D and 3D RTMs are expensive. My suggestion would therefore be to demonstrate that emulators built using 1D simulations can reproduce at least 2D (if not 3D) simulations. Otherwise, this manuscript is just another way of doing sensitivity analysis.
(6) Lines 173 and 176–178: The authors acknowledge that the ranges of concentrations are somewhat higher than those in natural systems. They then performed 10,927 runs, of which 9416 provided usable data. How do they ensure that the synthetic data are realistic?
(7) Figure 3: the emulator fits for NH4+ and Ca2+ fail to capture the trends, and for Ca2+ the fitted trend is inverted. Was the emulator exposed to this particular RTM run? The emulators are not able to reproduce these RTM runs.
(8) The authors should define several test experiments of differing complexity, exclude them (and conditions close to them) from the training data, and test the emulators' ability to reproduce those runs. This exercise would show how far the emulators can be used.
Citation: https://doi.org/10.5194/egusphere-2022-729-RC1
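Point (8) describes what is often called a leave-region-out test. The sketch below is one way it could be set up, assuming training inputs are drawn uniformly at random within fixed bounds (a Monte Carlo design, as both reviews describe) and that "close conditions" means points within a chosen distance in normalised parameter space. The parameter bounds, held-out experiments, radius and function names are all hypothetical and are not taken from the manuscript.

```python
import math
import random


def sample_conditions(bounds, n, seed=0):
    """Draw n boundary-condition vectors uniformly within per-parameter (lo, hi) bounds."""
    rng = random.Random(seed)
    return [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in range(n)]


def normalised_distance(a, b, bounds):
    """Euclidean distance after rescaling each parameter to the unit interval."""
    return math.sqrt(sum(((x - y) / (hi - lo)) ** 2
                         for x, y, (lo, hi) in zip(a, b, bounds)))


def leave_region_out(samples, held_out, bounds, radius):
    """Exclude from training every sample within `radius` of any held-out experiment."""
    train, excluded = [], []
    for s in samples:
        if any(normalised_distance(s, h, bounds) < radius for h in held_out):
            excluded.append(s)
        else:
            train.append(s)
    return train, excluded


# Hypothetical bounds for two boundary conditions (e.g. log10 pCO2 and sulfate in mM).
bounds = [(-3.5, -1.0), (1.0, 20.0)]
samples = sample_conditions(bounds, n=1000)
# Hypothetical held-out "test experiments".
held_out = [(-3.0, 18.0), (-2.25, 10.5)]
train, excluded = leave_region_out(samples, held_out, bounds, radius=0.1)
print(len(train), len(excluded))
```

The RTM would then be run at the held-out conditions and an emulator trained only on `train` compared against those runs; unlike a purely random split, this probes how the emulator behaves away from its training data.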
RC2: 'Comment on egusphere-2022-729', Anonymous Referee #2, 25 Mar 2023
The paper claims to present a novel emulation approach for reactive transport models. The authors examine a multi-component reactive (0-D) system with 20 species. A dataset is generated by varying the RT parameters (Monte Carlo type) and this dataset is used to train/validate an XGBoost ML emulator model. Comparison via plotting between some outputs of the ML-based emulator and the RT model shows a decent match. The authors also discuss this method's pros and cons, which is appreciated. Overall the paper is well-written. However, there are some significant concerns:
1. The novelty in this paper is extremely weak. ML-based emulators for reactive transport are not novel. The authors should refer to previous works and clearly identify the novelty in their work. See references:
1. https://arxiv.org/pdf/2107.07598
2. https://doi.org/10.1016/j.jcp.2021.110147
3. https://doi.org/10.1007/s10596-021-10126-2
2. The motivation to build emulators in this paper is also weak. In my opinion, the whole point of building an emulator is to gain computational speedup. I do not see the benefit of building an emulator for a simple 0-D model (even if it is complex in terms of 20 species) since the 0-D is computationally very cheap. Perhaps using higher dimensional datasets would make a stronger case for the need for emulators.
3. A case is made that emulators can lead to new insights. However, the insights drawn in this paper could perhaps be drawn directly from the 0-D RT models. It is unclear what benefit the emulator offers over the 0-D model in obtaining these insights, and this needs to be carefully addressed.
4. No metrics of the ML emulator's accuracy relative to the 0-D model are shown; only a few plots are presented. The authors are referred to the above references, where detailed metrics are shown both for the ML training/validation and for the comparison with the RT model.
Citation: https://doi.org/10.5194/egusphere-2022-729-RC2
AC1: 'Response to reviewers', Angus Fotherby, 21 Apr 2023
Reviewer 1
We thank the reviewer very much for taking the time to read and review our manuscript, as well as for their insightful questions and comments. The reviewer is correct in pointing out that we restrict ourselves to emulating 1D reactive transport models only. The reason for this is twofold. First, this manuscript is primarily designed as a proof of concept for the technique, and as such we wanted to apply it to previously published reactive transport models that had a high degree of geochemical complexity but were less focussed on the transport aspect of the system, which suited method development. Second, we wanted to apply our emulator technique to solve a simple geochemical optimisation problem for each system (Section 4), to show that the approach has applicability beyond performing sensitivity analysis, and we feel this capability is best demonstrated in a 1D system with geochemical complexity. We acknowledge that transport is an important part of geochemical modelling and that transport in natural systems is not typically 1D; this represents a future challenge for emulator development in this space, but the primary focus here is on geochemistry rather than transport.
The reviewer suggests that we should expand our scope to include 2 or 3D models lest our approach be “just another way of doing sensitivity analysis”. We appreciate this comment but suggest that it is for future work. All emulators of reactive transport models, regardless of whether they deal with transport in 1, 2 or 3 dimensions, are extremely well equipped for doing sensitivity analysis. In the revised manuscript, we acknowledge this as a potential (and useful) aspect of the approach we present (e.g. line 459) but we also demonstrate the ability of the emulator to solve simple geochemical optimisation problems in two very different geochemical systems (Section 4, and Supplementary Section 3.2), which is a future direction in which we hope to take this research, among other things.
The reviewer does, however, erroneously suggest that the manuscript rehashes work previously done at Old Rifle by one of the co-authors. This is not the case: this work has not been conducted before. This is the first time an emulator has been developed to reproduce a previously published RTM for Old Rifle (line 135).
We now turn to the reviewer’s specific comments in turn.
- The reviewer queries our statement that “we use the emulator to explore how varying the boundary conditions in the RTM describing the aquifer impacts the rates and volumes of mineral precipitation” (lines 17–19). The answer is yes: the new boundary conditions were used as labels for the net pyrite precipitated in the column (see Section 3.4.1, Data Strategy, lines 230–232 and 239–243). The emulator must be trained on some dataset in order to emulate the system, but the reviewer is right that an interpolation scheme could also be used; a full discussion of the advantages of emulation over interpolation is given in Section 4.3.1. The reviewer is also right to point out the potential hazards of emulators extrapolating beyond the trained region. Although we do not do this in this manuscript, there is a brief discussion of such extrapolation in lines 321–331.
- The reviewer also asks about our discovery of an unanticipated dependence of pyrite precipitation on pCO2 and how it is that an emulator, which is necessarily a reduced-order model, can provide more insight into the underlying RTM. This is a result of the growing complexity and sophistication of modern RTMs, which is one of our motivations in developing this methodology. Modern RTMs draw large suites of chemical and mineralogical data from vast databases, which constitute large sets of non-linear equations all coupled through transport and fluid chemistry—it is inevitable that in their development there will be feedbacks between quantities that are not realised. There is nothing inherent about reduced dimensionality that prevents such feedbacks from being captured in a dataset and learned by an emulator. In fact, we suggest that emulators are remarkably well placed for exploring such feedbacks because of the speed at which they can be interrogated. We acknowledge in lines 389–394 that such feedbacks need to be (subsequently) tested in the field and lab but suggest that the use of emulators in this way could be an interesting way to direct future research.
- The reviewer asks whether we validated our finding that pyrite precipitation depends on pCO2 using a reactive transport model. The results of our exploration and verification of this dependency in CrunchTope are shown in Figure 3 of the revised manuscript. We also discuss the mechanism by which this dependency arises in lines 350–383.
- Reactive transport models are forward simulations built on a mechanistic understanding of the geochemical systems they attempt to model. As such, they have no built-in capability for finding the conditions under which a given geochemical quantity is maximised. Emulators, on the other hand, are reduced representations of RTMs that synthesise a large amount of data about the underlying model and can be run very quickly, making them ideally placed for finding maxima and minima. In this work, we do this by interrogating the emulator at regular intervals across the input space to find an approximation of the maxima. The reviewer is correct in pointing out that this is not made explicit, and we clarify it in the revised manuscript (lines 404–409). This approach is a simple one, but we also discuss more sophisticated future applications of emulation techniques for this purpose (Bayesian optimisation) in Section 4.3.3.
- The reviewer correctly points out that our claim that RTMs are computationally expensive seems to be at odds with the large dataset used to train the emulator, and suggests that emulators of 2D or 3D RTMs would enhance the manuscript. We would suggest that modelling 1D flow is no guarantee of computational speed, and that this manuscript is focussed more on the geochemistry than on the transport (see our overarching comment above).
- The reviewer asks how we ensure that our synthetic data are realistic. This is an excellent point. The excluded data came from runs that either failed to complete before an arbitrary cut-off time or had extreme boundary conditions that failed to speciate; this is now clarified in the revised manuscript, lines 194–199. We ensure that the synthetic data are realistic by using a well-validated underlying RTM with a sound set of physical processes governing its behaviour. The developer of an RTM will know what assumptions have gone into their model, and hence where it is valid to probe with the emulator and where it is not.
- The reviewer asks about the data shown in Figures 3A and 3C and whether the emulator was exposed to data for the specific boundary conditions represented by the black crosses. The answer is no: the emulator was not exposed to those exact conditions, as the training data were generated by random sampling (see Section 3.1). We clarify this in the revised manuscript (lines 301–303) and now also state it in the figure caption. The reviewer also suggests that the blue fit lines do not capture the trends shown in the underlying black data in Figures 3A and 3C. These fits are indeed slightly offset, but in both cases the error is small and does not greatly affect the conclusions we draw. We suggest that this may be because weak or absent dependencies get swamped by the larger signals in the dataset and are thus slightly drawn down on average. We thank the reviewer for the suggestion that the manuscript would benefit from a discussion of this point and have added one (lines 306–310).
- The reviewer is alluding to testing the emulator on unseen data here. This has been done for both models and is shown in Figure 3 and Figure S8. We have also clarified our validation strategy and provided validation testing scores in the revised manuscript, lines 269–277.
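The response to point (4) above describes interrogating the emulator at regular intervals to approximate the maxima, which amounts to a grid search: the fast emulator is evaluated at every point of a regular grid over the input space and the best point is kept. A minimal sketch follows, assuming a generic `emulator` callable that maps a boundary-condition vector to a scalar (for example, net pyrite precipitated); the toy quadratic below is a hypothetical stand-in, not the trained emulator from the manuscript.

```python
import itertools


def linspace(lo, hi, n):
    """Return n evenly spaced values from lo to hi inclusive."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def grid_search_max(emulator, bounds, n_per_dim):
    """Evaluate `emulator` at every grid point and return (best_point, best_value)."""
    axes = [linspace(lo, hi, n_per_dim) for lo, hi in bounds]
    best_point, best_value = None, float("-inf")
    for point in itertools.product(*axes):
        value = emulator(point)
        if value > best_value:
            best_point, best_value = point, value
    return best_point, best_value


def toy_emulator(x):
    """Hypothetical stand-in for a trained emulator, with a single peak at (0.3, 0.7)."""
    return -((x[0] - 0.3) ** 2) - (x[1] - 0.7) ** 2


point, value = grid_search_max(toy_emulator, [(0.0, 1.0), (0.0, 1.0)], n_per_dim=11)
```

The number of evaluations grows as `n_per_dim ** d`, which is affordable only because each emulator call is cheap, and the grid spacing limits how close the answer can get to the true optimum; this is one motivation for the Bayesian optimisation mentioned in Section 4.3.3.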
We thank the reviewer for their time and help in improving this manuscript.
Reviewer 2
We thank the reviewer for taking the time to read and assess our manuscript. We would first point out that the two reactive transport models that we emulate in this paper are 1D models (e.g., see line 215), not 0-D models. There are figures in the Supplement that make this clearer, and perhaps the paper would benefit from returning those to the main manuscript body, but we have added a clarifying line earlier in the manuscript (line 98). Otherwise, we thank the reviewer for their concise and accurate overview. We now respond to the reviewer’s specific concerns in turn:
- The reviewer correctly points out that the novelty of this paper is now a lot weaker than when it was first conceived in 2020 and initially reviewed in early 2021, as emulator approaches have since been applied in the RTM context in a variety of ways. We apologise for missing the suggested papers when this manuscript was posted to the GMD preprint server nearly a year ago; these demonstrate how emulators can be used to speed up RTM simulations by replacing the geochemical solver with an emulator. Our approach is closer in nature to the second paper suggested to us (Ahmmed et al., 2021), which tests the ability of a wide variety of machine learning approaches to predict the degree of mixing and the production of a hypothetical species C from two reactants A and B. We extend this underlying principle to RTMs of real-world systems, developing new ways to explore geochemical parameter spaces and the effect of changing those geochemical parameters on the overall outcome of reactive transport simulations, with an eye towards predicting system outcomes in real-world scenarios. We have added a paragraph to this effect in the revised manuscript (lines 74–85). We have removed references to the novelty of this approach in the revised manuscript, as it is no longer the case, and have included references to the suggested papers. It is unfortunate that this paper spent nearly two years in review, so that we are not the first to show this approach.
- The reviewer suggests that the motivation to build emulators is weak in the original submission. Leaving aside the point that the models we are emulating are 1D rather than 0-D, we suggest that emulators are useful because they enable optimisation routines for RTMs (e.g., Section 4.3.3) and facilitate efficient exploration of the geochemical space, which can allow for the discovery of new feedbacks (which the reviewer touches on in their next point) as well as efficient sensitivity analysis. We address these points in Sections 4.2 and 4.3, but we have added a comment clarifying that we are primarily interested in exploring the high-dimensional geochemical space (lines 96–98). We thank the reviewer for the suggestion.
- The reviewer indicates that a careful discussion is needed of how the emulator approach can be used to gain new insights into the geochemical behaviour of RTMs. Of course, in a sufficiently simple model, coupled geochemical behaviour can be deduced by reasoning about the governing equations. However, modern RTMs draw large suites of chemical and mineralogical data from vast databases, which constitute large sets of non-linear equations all coupled through transport and fluid chemistry; it is inevitable that, in their development, some feedbacks between quantities will be overlooked. The reduced representation of the emulator allows investigators to quickly test a large variety of different hypotheses. Ultimately, we suggest that the benefit lies in providing another avenue for discovery and investigation, and this is borne out by the fact that the original RTM for Old Rifle was published in 2012, but the effect of pCO2 on pyrite precipitation was not reported until 2016 (see line 351). We agree that this point needs to be clarified and have added a subsection discussing it under Section 4.3.
- The reviewer rightly points out that there is a lack of structured discussion of the testing and training metrics for our emulators in the main body of the manuscript, although some is given in the Supplement (e.g., Figure S2). We have updated the manuscript to include our emulator testing and training results, lines 269–277.
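The testing and training scores discussed above are typically summary metrics computed separately on training data and on data withheld from training. As an illustration only, assuming paired lists of RTM outputs and emulator predictions for held-out runs, the root-mean-square error and the coefficient of determination (R²) could be computed as below; which metrics the revised manuscript actually reports is not stated here, and the values are hypothetical, not data from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between RTM outputs and emulator predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out values (not data from the manuscript).
rtm_output = [1.0, 2.0, 3.0]
emulator_output = [1.0, 2.0, 4.0]
print(rmse(rtm_output, emulator_output))      # sqrt(1/3), about 0.577
print(r2_score(rtm_output, emulator_output))  # 0.5
```

Reporting both metrics for the training and held-out sets makes overfitting visible: a held-out score much worse than the training score is the usual warning sign.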
We thank the reviewer for giving their time and effort to improve this manuscript.
Citation: https://doi.org/10.5194/egusphere-2022-729-AC1