These authors contributed equally to this work.

Geoscientific models are based on geoscientific data; hence, building better models, in the sense of attaining better predictions, often means acquiring additional data. In decision theory, the question of which additional data are expected to best improve predictions and decisions falls within the realm of value of information and Bayesian optimal survey design. However, these approaches often evaluate the optimality of one additional data acquisition campaign at a time. In many real settings, certainly in those related to the exploration of Earth resources, a long sequence of data acquisition campaigns may need to be planned. Geoscientific data acquisition can be expensive and time-consuming, requiring effective measurement campaign planning to optimally allocate resources. Each measurement in a data acquisition sequence has the potential to inform where best to take the following measurements; however, directly optimizing a closed-loop measurement sequence requires solving an intractable combinatorial search problem. In this work, we formulate the sequential geoscientific data acquisition problem as a partially observable Markov decision process (POMDP). We then present methodologies to solve the sequential problem using Monte Carlo planning methods. We demonstrate the effectiveness of the proposed approach on a simple 2D synthetic exploration problem. Tests show that the proposed sequential approach is significantly more effective at reducing uncertainty than conventional methods. Although our approach is discussed in the context of mineral resource exploration, it likely has bearing on other types of geoscientific model questions.

As the world weans itself off fossil fuels over the next decades, new forms of energy will heavily rely on Earth materials, in particular minerals. Rare earth elements are used in a variety of clean-energy technologies (Haque et al., 2014). Fully electrifying the light-duty auto fleet requires discovering new ore deposits of critical electric vehicle (EV) materials: copper, nickel, cobalt, and lithium (Sovacool et al., 2020). Increasing the required supply of these critical minerals requires a yet unattained discovery rate of new deposits. Mineral exploration is slow, requiring extensive guidance from human experts. As a result, the rate of new discoveries has declined over the last decades, since deposits with sections visible at the surface have mostly been discovered (Davies et al., 2021). At the same time, the demand will continue to increase, making minerals a targeted commodity subject to international conflict (National Research Council, 2008) as well as social and environmental concerns (Agusdinata et al., 2018). Enhancing and speeding up mineral exploration at a planet-wide scale is required. Our approach, using artificial intelligence for effective planning of exploration endeavors, aims to contribute to this challenge.

Mineral exploration requires making sequential decisions about what type of data to acquire, where to acquire them, and at what resolution with the goal of detecting an economically mineable deposit. In other words, mineral exploration is a sequential decision-making problem under uncertainty. These types of problems have previously been studied under several non-sequential frameworks in various areas of the geosciences. Optimizing spatial designs of experiments is a well-studied topic. McBratney et al. (1981) described a method for designing optimal sampling schemes based on the theory of regionalized variables (Matheron, 1971) by modeling spatial dependence with semi-variograms. The 1990s saw a significant debate arising in the soil sciences community (Brus and Gruijter, 1997; Van Groeningen et al., 1999; Lark, 2002; Heuvelink et al., 2006) around adaptation of geostatistics and their role in optimal survey design. Likewise, geostatistics-based optimal design of environmental monitoring has been significantly developed (De Gruijter et al., 2006; Melles et al., 2011). Geostatistical methods are often not Bayesian, which may be a disadvantage when the spatial structures (e.g., variograms) are uncertain themselves. A method for Bayesian optimal design in spatial analysis was developed by Diggle and Lophaven (2006).

Optimal placement of drill holes for mineral exploration and mining (resource delineation) has received significant attention. Some methodologies aim to minimize the uncertainty in spatial properties through use of geostatistical algorithms that model the effect of measured data on spatial uncertainty (Pilger et al., 2001; Koppe et al., 2011, 2017; Caers et al., 2022; Hall et al., 2022). Others rely on decision theoretic concepts of value of information to quantify the dollar value of gathered information to reduce uncertainty in an economic property of interest (Froyland et al., 2004; Eidsvik and Ellefmo, 2013; Soltani-Mohammadi and Hezarkhani, 2013). Bickel et al. (2008) recognize the sequential nature of the problem and illustrate the fact that sequential information gathering is superior to non-sequential schemes, a concept that goes back to the 1970s (Miller, 1975).

The above methodologies evaluate the performance of a given spatial survey design, but do not address the combinatorial problem of creating optimal survey plans. In general, the number of sequences to evaluate grows exponentially with the number of surveys. For example, when planning 10 surveys at 100 possible locations, there are more than 17 trillion possible sets of survey locations (100 choose 10), and more still if the order of the surveys is taken into account. Many problems will likely require more than 10 data acquisition actions to discover a mineral deposit that is economically feasible. Therefore, methodologies (like Emery et al., 2008) that use optimization in combination with geostatistics are likely intractable for many practical problems.
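The growth in the number of candidate plans can be checked directly for the 10-surveys-at-100-locations example. The following is a small illustrative sketch; the variable names are ours, and the three counting conventions (unordered sets, ordered sequences, ordered sequences with revisits) are shown side by side because the text does not fix one.

```python
import math

n_locations = 100  # candidate survey locations (from the example in the text)
n_surveys = 10     # number of surveys to plan

# Unordered sets of distinct locations: "100 choose 10".
n_sets = math.comb(n_locations, n_surveys)

# Ordered sequences of distinct locations.
n_ordered = math.perm(n_locations, n_surveys)

# Ordered sequences in which a location may be revisited.
n_with_revisits = n_locations ** n_surveys

print(f"unordered sets:      {n_sets:.3e}")
print(f"ordered sequences:   {n_ordered:.3e}")
print(f"with revisits:       {n_with_revisits:.3e}")
```

Whichever convention is used, exhaustive evaluation of every plan is out of reach once each evaluation involves a geostatistical simulation.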

Sequential planning methods solve for each action in a sequence only after observing the results of each previous action. Planning is typically done in either an open-loop or closed-loop fashion. Open-loop methods solve for each action in the sequence that gives the best immediate return according to some metric, without considering how the information learned from taking that action is likely to impact future decisions. Closed-loop methods solve for actions that maximize the expected return of all remaining actions in a sequence. Closed-loop methods tend to outperform open-loop methods, especially on tasks in which a lot of information is learned each step (Norvig and Russell, 2020, p. 120–122). Closed-loop methods, however, tend to require significantly more computational effort than open-loop approaches.

Recent work has applied Bayesian optimization to develop open-loop solutions to sequential experiment design (Shahriari et al., 2016). Marchant et al. (2014) specifically consider the application of Bayesian optimization to spatial–temporal measurement sequences. Receding horizon control has been used in sequential resource development (Grema and Cao, 2013) in conjunction with general particle swarm optimization. While these methods may be tractable, they are likely suboptimal over the entire measurement sequence, since each action only optimizes its own return.

Closed-loop methods solve for optimal conditional sequences of actions. Common closed-loop methods include reinforcement learning, dynamic programming, and Monte Carlo planning. These methods search for optimal actions through extensive interaction with a simulation of the target environment. Because of the large amounts of data required, these methods were initially developed on virtual domains such as video games (Chaslot et al., 2008). Recently, learning-based approaches have achieved state-of-the-art performance in several real-world domains including autonomous driving (Brechtel et al., 2014) and robotic control (Grigorescu et al., 2020). Little work has been done, however, in applying these approaches to resource exploration. Torrado et al. (2017) proposed a Monte Carlo planning method for a similar task of optimal sequential reservoir development. This work, to the authors' knowledge, is the first proposal for a general approach to optimal closed-loop decision-making for geoscientific sequential data acquisition planning. In this work, we propose an approach based on Monte Carlo planning.

Our development will be illustrated on an analogue case setup that contains many elements common to resource exploration planning. In that sense, we aim for a modular development in which several components (inverse modeling, geological modeling, data forward modeling) can be swapped out without changing the sequential data acquisition methodology.

Specifically, we will focus on the exploration of one or more ore bodies in the subsurface. The elements of the problem definition consist of (1) a description of the state of knowledge of the physical world, (2) a description of data that exist or are planned to be acquired on the physical world, and (3) rewards and costs associated with the exploration endeavor.

Knowledge and uncertainty about the subsurface is commonly represented by probability distributions over the parameters of the subsurface system. Gridded models describing parametric distributions over geological, geophysical, and geochemical properties may be too high-dimensional for practical use in decision-making. A realization (in geostatistical jargon) generated from a probability distribution over the subsurface represents a plausible representation of the physical world. An ensemble of plausible realizations is a tractable method to represent the distribution over the subsurface. The variation between multiple realizations is an empirical representation of uncertainty (lack of knowledge).
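The idea of representing uncertainty by an ensemble can be illustrated with a toy example. This is a minimal sketch under stated assumptions: `sample_realization` is a hypothetical stand-in for a geostatistical simulator (it draws a 1D bump with a random center and width), and the cell count and ensemble size are arbitrary choices.

```python
import math
import random
import statistics


def sample_realization(rng, n_cells=50):
    """Draw one toy 1D realization: a bump with random center and width.

    Hypothetical stand-in for a geostatistical simulator.
    """
    center = rng.uniform(0, n_cells)
    width = rng.uniform(3.0, 8.0)
    return [math.exp(-0.5 * ((x - center) / width) ** 2) for x in range(n_cells)]


rng = random.Random(0)
ensemble = [sample_realization(rng) for _ in range(200)]

# Cell-wise statistics across realizations: the mean is the ensemble estimate,
# and the standard deviation is an empirical measure of uncertainty.
cell_mean = [statistics.fmean(cell) for cell in zip(*ensemble)]
cell_std = [statistics.pstdev(cell) for cell in zip(*ensemble)]

print(f"largest cell-wise standard deviation: {max(cell_std):.3f}")
```

Any summary needed for decision-making (for example, a distribution of ore volumes) can be computed in the same way, one realization at a time.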

A subsurface ore body may be hard to identify in a real setting for various reasons. In geophysical surveys, many other geological features may resemble ore bodies. An ore body is also not necessarily a perfect anomaly in a homogeneous geological setting. Tectonic, metamorphic, sedimentary, and other alteration processes may have changed the nature of the original ore body. In Fig. 1, we show how we created an analogue situation that mimics many of these elements. Figure 1 represents a simplified 1D depiction, though the methodology will be applied to 2D and 3D settings. Figure 1 should only be referenced as a template containing the challenges present in mineral exploration.

First, we represent the mineralization by the function in Fig. 1a. The
example shows a unimodal function; however, multiple such
mineralization bumps may be present. Second, we introduce a “geological
background variation” as shown in Fig. 1b. This represents all geological
processes that have altered the original ore body shape. This variation is
not entirely random and has some structure. In our setting, we model it as a
Gaussian process with known correlation structure (variogram). In practice,
a much more complex model of the background geology may be used with the
presented methods, and hence the noise term in this simple example is used
to develop a methodology. By adding the “mineralization field” to the
“geological background field”, we obtain the “measurable variation”
shown in Fig. 1c. When a threshold

The next element is the set of measurements available to be taken.
Measurements are indirect indicators of what is desired: the economic
parameters of the ore body, which in our setting is the ore body volume.
Measurements generally do not directly observe this value; however, they may
reduce the uncertainty in it. Such uncertainty quantification is generally
conducted with Bayesian approaches. Bayesian methods require stating
measurement likelihood functions and prior distributions. In our setting,
the various alternative realizations constitute samples of the prior. In
this work, we consider taking point measurements of the total variational
field, as shown in Fig. 1c. We also consider taking only one measurement
at a time because measuring may be expensive, and the results may inform
where to best take the next measurement. Note that in this work, we will not
perform a traditional geostatistical conditional simulation using the
measurements as hard data because the function

Example 1D mineralization. Panel

We test the presented methodology on a 2D case that is analogous to the 1D
example. The 2D case setup is shown in Figs. 2 and 3. We define the
mineralization

Two-dimensional exploration problem. The mineralization field

Two-dimensional economic field. The massive ore field

The question we will address is the following: what is the optimal sequence of data acquisition that best informs a “mine” vs. “do not mine” decision based on a mineable volume exceeding some minimum threshold?

In this paper, we will need to merge nomenclature and mathematical notations
of two different domains: geosciences–geostatistics and artificial
intelligence (AI). Here we list some nomenclature from each field that
describes the same concept (see also Table 1).

A state is an instantiation of a set of parameters describing the world.
For example, a geostatistical realization is a set of geological parameters
representing the “state” of the subsurface in a gridded model. A state is
referred to as

Belief over a state is a probability distribution of instantiations of a set
of parameters. In probability theory, one defines a probability density over all possible outcomes
of a geological model. This density is very
high-dimensional in our setting. In AI, one uses

Belief update equals Bayesian update. A belief update requires stating the
prior and the likelihood model. The likelihood in AI is termed the
observation model

Observation space is the set of all possible outcomes of the measurements. In
AI observations are denoted as

Comparison between AI and geostatistical nomenclature.

This work frames mineral exploration as a sequential decision process. In a sequential problem, a decision-making agent must take a sequence of actions to reach a goal. Information gained from each action in the sequence can inform the choice of subsequent actions. An optimal action sequence will account for the expected information gain from each action and its impact on future decisions. This type of conditional planning may be referred to as closed-loop or feedback control. We will use the mineral exploration problem outlined above as a working example for the remainder of this section.

A sequential decision problem can be modeled formally as a Markov decision
process (MDP). An MDP is a mathematical description of a sequential decision
process defined by a collection of probability distributions, spaces, and
functions. The full MDP is typically defined by the tuple

In many decision-making problems, such as all subsurface problems, the state
at each time step (the geological model) is not fully known. In this case,
agents make decisions based on imperfect observations of the relevant states
of their environments. Sequential problems with state uncertainty are
modeled as

To solve a POMDP, an agent must account for all the information gained from
the sequence of previous observations when taking an action. It is common to
represent the information gained from an observation sequence as a

Each decision in the sequence is made using the belief updated from the preceding observation. The process is depicted in Fig. 4. An optimal choice in a sequential problem should consider all subsequent steps in the sequence. However, the number of trajectories of actions and observations reachable from a given state grows exponentially with the length of the sequence. As a result, optimizing conditional plans exactly is generally intractable. Instead, most POMDPs are solved approximately using stochastic planning and learning methods.

Exploration Markov decision process. At each decision step, the agent
selects an action

Monte Carlo tree search (MCTS) is a class of stochastic planning algorithms
that is commonly used to solve MDPs and POMDPs. MCTS methods solve for
actions each time a decision is made by simulating the potential outcome of
available action sequences. They use the simulations to estimate the expected
value of each available action and then recommends the action with the
highest expected value. Each simulated trajectory is recorded in a tree
graph, as shown in Fig. 5. Each time a simulation is generated, the
trajectory is added to the tree. Future action sequence trials are guided by
the information in the tree at the start of that trial. MCTS algorithms are
considered

Monte Carlo search tree. Each simulation in an MCTS algorithm is
encoded into a search tree. The example tree is rooted at the belief,

We propose formulating the mineral exploration problem as a sequential decision problem. A sequential plan allows information from each measurement in the sequence to inform the choice of subsequent measurements.

We now return to the template example introduced in Fig. 1 and state the
elements of the POMDP.

In this section, we present a method to solve the example 2D mineral exploration POMDP. The methods presented may be generalized to additional mineral exploration problems. Algorithms to solve POMDPs can typically be applied to any valid POMDP model, though with differing effectiveness. The remaining subsections are divided into the tasks required to solve the POMDP: belief updating and searching over the large, combinatorial space of possible action sequences.

The proposed solver is based on Monte Carlo tree search (MCTS), a class of stochastic planning algorithms that is commonly used to solve MDPs and POMDPs. MCTS methods solve for actions each time a decision is made by simulating the potential outcomes of available action sequences. They use the simulations to estimate the expected value of each available action and then recommend the action with the highest expected value. Each simulated trajectory is recorded in a tree graph, as shown in Fig. 5. Each time a simulation is generated, the trajectory is added to the tree. Future action sequence trials are guided by the information in the tree at the start of that trial. MCTS algorithms are considered online planners, since they solve for an optimal action from a given starting state and therefore require computation every time a decision is made.
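The loop of simulating outcomes, estimating action values, and recommending the best action can be sketched at a single decision level. This is a minimal sketch and not the solver used in this work: it keeps no tree below the root, it uses the UCB1 rule as an assumed way of choosing which action to try next, and `plan_with_ucb`, `toy_simulate`, and the action names are hypothetical.

```python
import math
import random


def plan_with_ucb(actions, simulate, n_trials=2000, c=1.0, seed=0):
    """Estimate action values by simulation, picking trial actions with UCB1.

    simulate(action, rng) returns one sampled return; it is a hypothetical
    stand-in for rolling out a full trajectory.
    """
    rng = random.Random(seed)
    counts = {a: 0 for a in actions}
    values = {a: 0.0 for a in actions}
    for t in range(1, n_trials + 1):
        untried = [a for a in actions if counts[a] == 0]
        if untried:
            action = untried[0]  # try every action once first
        else:
            # Optimistic choice: running mean plus an exploration bonus.
            action = max(
                actions,
                key=lambda a: values[a] + c * math.sqrt(math.log(t) / counts[a]),
            )
        ret = simulate(action, rng)
        counts[action] += 1
        values[action] += (ret - values[action]) / counts[action]  # running mean
    best = max(actions, key=lambda a: values[a])
    return best, values, counts


def toy_simulate(action, rng):
    """Noisy return with an (assumed) true mean per action."""
    true_means = {"north": 0.2, "center": 0.8, "south": 0.5}
    return rng.gauss(true_means[action], 0.5)


best, values, counts = plan_with_ucb(["north", "center", "south"], toy_simulate)
print(best, counts)
```

Because the planner is run again from the current state at every decision, its cost is paid online, which is the trade-off noted above.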

Reinforcement-learning-based approaches may also be used to solve POMDPs,
though they are likely not as well suited as the presented Monte Carlo
method. Reinforcement learning methods learn the optimal action for each
possible encountered state

Belief updating in AI is the equivalent of inverse modeling in the
geosciences. In our setting, we have indirect measurements

A particle set is an ensemble of realizations of the state variable with a
sample distribution approximating the true state distribution. The initial
particle set is generated by first sampling an ensemble from the uniform
prior distribution. For an

When new information

Once a weight has been calculated for each particle in the set, a new
ensemble is generated. The new set is generated by sampling

UPDATEBELIEF. Pseudo algorithm for model inversion (belief update) using a hierarchical particle filter.
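The weight-then-resample step described above can be sketched as a basic importance-resampling update. This is a simplified illustration, not the hierarchical filter of this work: each particle is reduced to a single hypothetical scalar parameter, the Gaussian measurement likelihood and its `obs_std` are assumptions, and `update_belief` and `predict_obs` are names introduced here.

```python
import math
import random


def update_belief(particles, predict_obs, observation, obs_std, rng):
    """Reweight particles by measurement likelihood and resample the ensemble.

    predict_obs(particle) returns the measurement that particle would produce.
    A Gaussian likelihood with standard deviation obs_std is assumed.
    """
    weights = [
        math.exp(-0.5 * ((observation - predict_obs(p)) / obs_std) ** 2)
        for p in particles
    ]
    if sum(weights) == 0.0:
        # Every particle is inconsistent with the data; fall back to uniform weights.
        weights = [1.0] * len(particles)
    # Sample with replacement, in proportion to the weights.
    return rng.choices(particles, weights=weights, k=len(particles))


rng = random.Random(1)
prior = [rng.uniform(0.0, 10.0) for _ in range(5000)]  # uniform prior ensemble
posterior = update_belief(prior, lambda p: p, observation=7.0, obs_std=0.5, rng=rng)

posterior_mean = sum(posterior) / len(posterior)
print(f"posterior mean: {posterior_mean:.2f}")
```

Plain resampling of this kind only duplicates existing particles, so repeated updates reduce ensemble diversity; practical filters typically add a step that generates new particles.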

To solve the POMDP, we search for the optimal action at each step using a
variant of POMCPOW (Partially Observable Monte Carlo Planning with
Observation Widening; Sunberg and Kochenderfer, 2018), a Monte Carlo tree
search algorithm for POMDPs. At each time step

POMCPOW generates a fixed number of trial trajectories

At each step of a simulated trial, POMCPOW simulates taking the action with the
highest upper confidence bound on its estimated value. In this way, POMCPOW
optimistically explores the action space. This strategy has been proven to
converge to the optimal action in the limit of infinite samples. After all

For POMDPs with large action spaces, POMCPOW limits how often new actions
can be added to the search tree through a progressive widening rule. Under
progressive widening, the total number of child action nodes that a given
belief node may have is defined as a function of the total number of times
that node has been visited in previous trials. The limit is defined as

In this section, we present the results of solving the problem for the mineral field shown in Fig. 6. In all problems, rewards are measured in units of massive ore; one pixel in the massive ore map (Fig. 3) represents one unit of ore. In all the problems studied, the massive ore threshold was set to 0.7 and the extraction cost was set to 150 units. This example case has a total volume of 158 units of massive ore, making it a marginally profitable case. The measurement cost was 0.1 units per measurement taken. In this example, we constrained each measurement to be taken at most 10 distance units from the previous measurement; each pixel is one distance unit.
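The accounting in this setup can be written out in a few lines. This is a hedged sketch: it assumes that massive ore is counted as the pixels whose value exceeds the threshold, that abandoning yields zero payoff, and that measurement costs are simply subtracted; the function names and the choice of 20 measurements are hypothetical.

```python
def massive_ore_volume(field, threshold=0.7):
    """Count pixels above the massive-ore threshold (one unit of ore per pixel)."""
    return sum(1 for row in field for value in row if value > threshold)


def campaign_return(volume, n_measurements, mine,
                    extraction_cost=150.0, measurement_cost=0.1):
    """Net return in ore units: mining payoff (if any) minus measurement costs."""
    payoff = (volume - extraction_cost) if mine else 0.0
    return payoff - measurement_cost * n_measurements


# Toy 2-by-3 field: three pixels exceed the 0.7 threshold.
toy_field = [[0.9, 0.6, 0.8],
             [0.3, 0.75, 0.7]]
toy_volume = massive_ore_volume(toy_field)

# The illustration case holds 158 units of ore; 20 measurements is a made-up count.
net_if_mined = campaign_return(158, n_measurements=20, mine=True)
net_if_abandoned = campaign_return(158, n_measurements=20, mine=False)
print(toy_volume, net_if_mined, net_if_abandoned)
```

With a margin of only 8 units between the true volume and the extraction cost, a few tens of measurements consume a noticeable share of the achievable return, which is why the case is described as marginally profitable.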

Figure 7 shows the mean and standard deviation mineralization

Illustration case. Panel

Initial ore belief. Panel

Initial belief ore histogram. The figure shows the distribution of massive ore volumes in the initial belief ensemble. The vertical line shows the actual volume of ore in the illustration case.

We ran POMCPOW for 10 000 trial simulations (trajectories) per step. The resulting actions taken in the first five steps are shown in Fig. 9. As can be seen, the standard deviation of the belief about the ore quantity decreases as measurements are taken, and the expected value tends toward the true value. The agent tends to take an “extent-finding” approach, whereby it alternates taking actions closer to and then farther from the expected center of the ore body. This pattern may be interpreted as searching for the maximum extent of the ore body edge.

Initial measurement trajectory. Each panel shows the belief resulting from the measurements taken by the agent. The circles show the locations at which measurements were taken. The arrows indicate the sequence in which actions were taken.

The complete 22-measurement trajectory is shown in Fig. 10, along with the final histogram. At the conclusion of the measurements, the algorithm correctly decided to mine the deposit. As can be seen, at the time it made its decision, the expected value of the ore quantity was approximately 1 standard deviation above the extraction cost threshold of 150. The agent did not stop exploring once the expected value exceeded the threshold, but only once it had exceeded it by a significant margin. This suggests that the agent would stop only when the value of the information gained by a measurement was exceeded by the cost of the measurement.
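The stopping behavior described above can be related to a standard value-of-information calculation on the belief ensemble. The sketch below computes the expected value of perfect information for the MINE or ABANDON decision, which is an upper bound on what any single measurement can be worth. It is an illustration under assumptions, not the computation performed inside POMCPOW; the function names and the two example beliefs are hypothetical.

```python
import statistics


def value_deciding_now(volumes, extraction_cost):
    """Best expected payoff if MINE/ABANDON is chosen with the current belief."""
    return max(0.0, statistics.fmean(volumes) - extraction_cost)


def value_with_perfect_information(volumes, extraction_cost):
    """Expected payoff if the true volume were revealed before deciding."""
    return statistics.fmean([max(0.0, v - extraction_cost) for v in volumes])


def expected_value_of_perfect_information(volumes, extraction_cost):
    return (value_with_perfect_information(volumes, extraction_cost)
            - value_deciding_now(volumes, extraction_cost))


cost = 150.0
wide_belief = [120.0, 140.0, 160.0, 180.0]    # belief straddles the threshold
narrow_belief = [168.0, 170.0, 172.0, 174.0]  # belief lies clear of the threshold

evpi_wide = expected_value_of_perfect_information(wide_belief, cost)
evpi_narrow = expected_value_of_perfect_information(narrow_belief, cost)
print(evpi_wide, evpi_narrow)
```

When the whole belief lies on one side of the extraction cost, no measurement can change the decision, so even a measurement cost of 0.1 units is not worth paying; when the belief straddles the cost, further measurements can still pay for themselves.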

Complete measurement trajectory. The panel on the left shows the complete trajectory of all measurements taken in the illustration case. The panel on the right shows the resulting histogram.

To test the proposed approach, we conducted experiments on a variety of
problem configurations. For these experiments, we tested three different
ore settings.

Single body, fixed position: a single mineralization process generated an ore body with a known centroid location at the center of the exploration domain.

Single body, variable position: a single mineralization process generated an ore body with an unknown centroid location somewhere in the exploration domain.

Two bodies, variable positions: two mineralization processes generated ore bodies, both with unknown centroid locations within the exploration domain.

We also tested the performance of POMCPOW against a baseline grid-pattern
approach. In this method, measurements were taken at locations defined by

We tested grids with 4, 9, and 16 measurements, as well as a single point fixed at the center of the exploration area. We also tested a baseline in which measurement locations were selected at random at each step. This allows us to understand the improvement of the approaches relative to an achievable lower bound.
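A regular grid baseline of this kind can be generated as follows. This is a sketch under assumptions: the points are placed at the centers of an n-by-n partition of a square domain, which may differ from the spacing rule used in the experiments, and `grid_locations` and the domain size of 50 units are hypothetical.

```python
def grid_locations(n_per_side, domain_size):
    """Return n_per_side**2 evenly spaced (x, y) points in a square domain.

    Points are the cell centers of an n-by-n partition (assumed spacing rule).
    """
    step = domain_size / n_per_side
    coords = [step * (i + 0.5) for i in range(n_per_side)]
    return [(x, y) for x in coords for y in coords]


# Baselines with 1, 4, 9, and 16 measurements, keyed by measurement count.
grids = {n * n: grid_locations(n, domain_size=50.0) for n in (1, 2, 3, 4)}

for n_points, points in grids.items():
    print(n_points, points[:2])
```

With this rule, the one-point case reduces to a single measurement at the center of the domain.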

Baseline grid patterns. The panels show the baseline grid
patterns for 2-by-2, 3-by-3, and 4-by-4 grids, each with a total of 4, 9, and
16 measurements, respectively. The grids cover the extent of a

We ran Monte Carlo tests on the problem configurations described. For each case, we generated a set of 100 mineral-field realizations, each one assumed as a possible truth. For each realization, measurements were taken according to the constrained and unconstrained POMCPOW solvers, the grid policy, and the random policy. The change in mean error and standard deviation for all the approaches was calculated. For the POMCPOW solver, we also measured the expected number of measurements as a function of the total deposit size and the accuracy of the final MINE or ABANDON decision.

The data from the tests suggested that different behavior emerged through POMCPOW for cases that were non-economic, highly economic, and borderline economic. To investigate this, we solved one case at each economic level for the three deposit settings using POMCPOW with action constraints. At the end of this section, we present the results of these trials and a plot of the observed trend in the Monte Carlo data.

In this section, we present the results for the Monte Carlo tests on the
case with a single, unimodal mineralization process located at the center of
the exploration domain. For every solver, we measured the belief accuracy by
calculating the relative mean absolute error (RMAE) of the estimated deposit
volume resulting from each measurement. The relative MAE is the estimate
error relative to the true deposit volume and is defined as

Relative MAE single mineralization, fixed location. The plot shows the mean relative absolute error after a given number of measurements taken under each tested method. The mean absolute error is shown along with 1 standard error bounds for each trend.

We also measured the change in uncertainty (belief) by calculating the
standard deviation resulting from each measurement. After each measurement,
we calculated the ratio of the resulting volume standard deviation relative
to the initial belief standard deviation (the Bayesian prior of volume).
After measurement

Single-body, fixed-location standard deviation ratios. The plot shows the mean standard deviation ratio after a given number of measurements taken under each tested method. The mean ratio is shown along with 1 standard error bounds for each trend.

In addition to the belief trends shown above, we also further analyzed the behavior of the POMCPOW methods with and without action distance constraints. For each, we examined the accuracy of the algorithm in making its final MINE or ABANDON decision, as well as how many measurements it took before reaching a decision. We also looked at the general trend in where it took measurements relative to the mineralization centroid location. These are presented in the following subsections.

The final decision results for the POMCPOW solver with constraints on the maximum distance between measurement locations are shown in Table 2. This table presents the proportions of profitable and unprofitable deposits that POMCPOW decided to MINE or ABANDON at the end of each trial. A deposit is profitable if the ore volume exceeds the extraction threshold. A decision to MINE a profitable deposit or to ABANDON an unprofitable deposit is considered correct. The total amount of ore in profitable deposits that was mined is also presented. The average number of measurements taken before making a decision is shown for each deposit type and for all cases.

Single-body, fixed-location POMCPOW results with action constraints.

Among the assumed “true” deposits, 32 % are profitable. Among all the profitable cases, there is a total of 1154 units of ore, with POMCPOW deciding to mine 1097 units corresponding to 95 % of profitable ore correctly extracted. On average, POMCPOW took 1.8 more measurements in profitable cases than in unprofitable cases.

POMCPOW was able to decide when to terminate taking measurements at any point during the campaign. If it did not decide to terminate, it was limited to a total of 25 measurements. Figure 15 below shows the histogram of the number of measurements taken by POMCPOW before termination over the Monte Carlo trials.

Measurement histogram, POMCPOW with action constraints, single body with fixed location. This figure shows a histogram of the number of measurements taken by the POMCPOW solver over all Monte Carlo trials. The trials were limited to a maximum of 25 measurements.

We recorded the distance between each measurement in the sequence and the center of the mineralization. The average distance for each point in the sequence is shown for 10 measurements in Fig. 16, along with 1 standard error bars. One notices how the agent starts away from the center of the ore body, steps in toward the center, and then gradually steps away from the center.

Measurement distance to center, POMCPOW with action constraints, single body with fixed location. The plot shows the average distance between the measurement location and the mineralization center for the measurements at each time step. 1 standard error bars are also presented. The dotted line is the maximum ore body radius, and the dash–dotted line is the mean ore body radius. Note that the intelligent agent steps further out because of the imperfect measuring of the ore body size.

The final decision results for the POMCPOW solver with no constraints on measurement locations are shown in Table 3. The same set of trial deposits was used to test both the constrained and unconstrained cases, and the table reports the same quantities as in the constrained case.

Single-body, fixed-location POMCPOW results without action constraints.

Among all the profitable cases, there is a total of 1154 units of ore, with POMCPOW deciding to mine 1058 units corresponding to 91.6 % of profitable ore correctly extracted. On average, POMCPOW took 1.7 more measurements in profitable cases than in unprofitable cases.

As in the constrained test, we plot the number of measurements taken before making the final decision in Fig. 17. We also present the average distance from the deposit center in Fig. 18.

Measurement histogram, POMCPOW without action constraints, single body with fixed location. This figure shows a histogram of the number of measurements taken by the POMCPOW solver over all Monte Carlo trials. The trials were limited to a maximum of 25 measurements.

Measurement distance to center, POMCPOW without action constraints, single body with fixed location. The plot shows the average distance between the measurement location and the mineralization center for the measurements at each time step. 1 standard error bars are also presented.

In this section, we present the results for the Monte Carlo tests on the case with a single, unimodal mineralization process located at a variable, unknown point in the exploration domain. For every solver, we measured the belief accuracy by calculating the relative mean absolute error (RMAE) of the estimated deposit volume resulting from each measurement. The resulting trends are shown in Fig. 19 with 1 standard error bounds.

Relative MAE for single mineralization, variable location. The plot shows the mean relative absolute error after a given number of measurements taken under each tested method. The mean absolute error is shown along with 1 standard error bounds for each trend.

We also measured the change in belief uncertainty by calculating the standard deviation ratios of the belief volume estimate resulting from each measurement. The mean standard deviation ratios over the Monte Carlo trials for each of the solvers are shown in Fig. 20 along with 1 standard error bounds.
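The two belief metrics used in these comparisons can be sketched from their prose descriptions. The exact definitions are assumptions here: the volume estimate is taken to be the mean of the belief ensemble, the relative error is its absolute deviation from the true volume divided by the true volume, and the standard deviation ratio compares the current belief to the initial one. The function names and the example numbers are hypothetical.

```python
import statistics


def relative_abs_error(belief_volumes, true_volume):
    """|belief mean - true volume| / true volume for a single trial."""
    return abs(statistics.fmean(belief_volumes) - true_volume) / true_volume


def std_ratio(belief_volumes, initial_volumes):
    """Standard deviation of the current belief relative to the initial belief."""
    return statistics.pstdev(belief_volumes) / statistics.pstdev(initial_volumes)


true_volume = 158.0
initial_belief = [100.0, 140.0, 180.0, 220.0]  # volumes in the prior ensemble
updated_belief = [150.0, 155.0, 160.0, 165.0]  # volumes after some measurements

error_before = relative_abs_error(initial_belief, true_volume)
error_after = relative_abs_error(updated_belief, true_volume)
ratio = std_ratio(updated_belief, initial_belief)
print(f"{error_before:.4f} {error_after:.4f} {ratio:.3f}")
```

Averaging the per-trial relative errors and ratios over the Monte Carlo realizations gives trend lines of the kind plotted against the number of measurements.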

Single-body, variable-location standard deviation ratios. The plot shows the mean standard deviation ratio after a given number of measurements taken under each tested method. The mean ratio is shown along with 1 standard error bounds for each trend.

The final decision results for the POMCPOW solver with distance constraints on measurement locations are shown in Table 4. The same set of trial deposits was used to test both the constrained and unconstrained cases.

Single-body, variable-location POMCPOW results with action constraints.

For the deposits tested, 19 % were profitable. Among all the profitable cases, there was a total of 814 units of ore, with POMCPOW deciding to mine 778 units corresponding to 95.6 % of profitable ore correctly extracted. On average, POMCPOW took 4.0 more measurements in profitable cases than in unprofitable cases.

We plotted the number of measurements taken before making the final decision in Fig. 21. We also present the average distance from the deposit center in Fig. 22.

Measurement histogram, POMCPOW with action constraints, single body with variable location. This figure shows a histogram of the number of measurements taken by the POMCPOW solver over all Monte Carlo trials. The trials were limited to a maximum of 25 measurements.

Measurement distance to center, POMCPOW with action constraints, single body with variable location. The plot shows the average distance between the measurement location and the mineralization center for the measurements at each time step. 1 standard error bars are also presented.

The final decision results for the POMCPOW solver with no constraints on measurement locations are shown in Table 5.

Single-body, variable-location POMCPOW results without action constraints.

Among all the profitable cases, there was a total of 814 units of ore, with POMCPOW deciding to mine 754 units corresponding to 92.6 % of profitable ore correctly extracted. On average, POMCPOW took 4.4 more measurements in profitable cases than in unprofitable cases.

As in the constrained test, we plotted the number of measurements taken before making the final decision in Fig. 23. We also present the average distance from the deposit center in Fig. 24.

Measurement histogram, POMCPOW without action constraints, single body with variable location. This figure shows a histogram of the number of measurements taken by the POMCPOW solver over all Monte Carlo trials. The trials were limited to a maximum of 25 measurements.

Measurement distance to center, POMCPOW without action constraints, single body with variable location. The plot shows the average distance between the measurement location and the mineralization center for the measurements at each time step. 1 standard error bars are also presented.

In this section, we present the results for the Monte Carlo tests on the case with two mineralization processes located at variable, unknown points in the exploration domain. For every solver, we measured the belief accuracy by calculating the relative mean absolute error (RMAE) of the estimated deposit volume resulting from each measurement. The resulting trends are shown in Fig. 25 with 1 standard error bounds.

Relative MAE, two mineralization processes. The plot shows the mean relative absolute error after a given number of measurements taken under each tested method. The mean absolute error is shown along with 1 standard error bounds for each trend.

We also measured the change in belief uncertainty by calculating the standard deviation ratios of the belief volume estimate resulting from each measurement. The mean standard deviation ratios over the Monte Carlo trials for each of the solvers are shown in Fig. 26 along with 1 standard error bounds.

Two-mineralization-process standard deviation ratios. The plot shows the mean standard deviation ratio after a given number of measurements taken under each tested method. The mean ratio is shown along with 1 standard error bounds for each trend.

The final decision results for the POMCPOW solver with distance constraints on measurement locations are shown in Table 6. The same set of trial deposits was used to test both the constrained and unconstrained cases.

Multi-body POMCPOW results with action constraints.

For the deposits tested, 19 % were profitable. Among all the profitable cases, there was a total of 808 units of ore, with POMCPOW deciding to mine 713 units corresponding to 88.2 % of profitable ore correctly extracted. On average, POMCPOW took 4.7 more measurements in profitable cases than in unprofitable cases.

We plotted the number of measurements taken before making the final decision in Fig. 27.

Measurement histogram, POMCPOW with action constraints, multiple ore bodies. This figure shows a histogram of the number of measurements taken by the POMCPOW solver over all Monte Carlo trials. The trials were limited to a maximum of 25 measurements.

The final decision results for the POMCPOW solver with no constraints on measurement locations are shown in Table 7.

Multi-body POMCPOW results without action constraints.

Among all the profitable cases, there was a total of 814 units of ore, with POMCPOW deciding to mine 764 units corresponding to 93.0 % of profitable ore correctly extracted. On average, POMCPOW took 3.8 more measurements in profitable cases than in unprofitable cases.

As in the constrained test, we plotted the number of measurements taken before making the final decision in Fig. 28.

Measurement histogram, POMCPOW without action constraints, multiple ore bodies. This figure shows a histogram of the number of measurements taken by the POMCPOW solver over all Monte Carlo trials. The trials were limited to a maximum of 25 measurements.

Deposit size study results for the case of a single body with fixed centroid location. The sub-economic, borderline, and economic cases are shown in the left, center, and right columns, respectively. The top row shows the massive ore present in the tested case. The center row shows the trajectory taken by POMCPOW and the standard deviation of the resultant belief. The bottom row shows the histogram of the ore volumes in the final belief along with the true massive ore volume.

Deposit size study results for the case of a single body with variable centroid location. The sub-economic, borderline, and economic cases are shown in the left, center, and right columns, respectively. The top row shows the massive ore present in the tested case. The center row shows the trajectory taken by POMCPOW and the standard deviation of the resultant belief. The bottom row shows the histogram of the ore volumes in the final belief along with the true massive ore volume.

The POMCPOW solver was allowed to terminate the measurement campaign at any point before the maximum of 25 measurements were taken. We hypothesized that the size of the deposit being measured would impact how many measurements POMCPOW decided to take. To test this, we ran POMCPOW on three different deposit sizes for each of the three problem configurations.

Sub-economic: the total massive ore was below the economic cutoff threshold by more than 30 % of the threshold value.

Borderline–economic: the total massive ore was within 10 % of the economic cutoff threshold value.

Economic: the total massive ore was above the economic cutoff threshold by at least 20 % of the economic threshold value.
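The three size classes above can be expressed as a small classification rule. The sketch below is illustrative only: the function name, the cutoff value of 100 units, and the treatment of the 10 % bound as inclusive are assumptions. Deposits falling between the stated bands are returned as unclassified.

```python
def classify_deposit(total_ore, threshold):
    """Label a deposit by its ore volume relative to the economic cutoff."""
    relative_margin = (total_ore - threshold) / threshold
    if relative_margin < -0.30:
        return "sub-economic"  # below the cutoff by more than 30 %
    if abs(relative_margin) <= 0.10:
        return "borderline-economic"  # within 10 % of the cutoff
    if relative_margin >= 0.20:
        return "economic"  # above the cutoff by at least 20 %
    return None  # falls in a gap not covered by the three study classes


# Hypothetical ore volumes against a hypothetical cutoff of 100 units.
labels = [classify_deposit(v, 100.0) for v in (60.0, 95.0, 125.0, 80.0)]
print(labels)
```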

Deposit size study results for multi-body case. The sub-economic, borderline, and economic cases are shown in the left, center, and right columns, respectively. The top row shows the massive ore present in the tested case. The center row shows the trajectory taken by POMCPOW and the standard deviation of the resultant belief. The bottom row shows the histogram of the ore volumes in the final belief along with the true massive ore volume.

The number of measurements taken in each tested configuration is summarized in Table 8. In all three problem configurations, POMCPOW made significantly fewer measurements on the sub-economic deposits than it did on the borderline or economic deposits. In the single-body cases, POMCPOW measured the borderline–economic deposits more than the economic case. In the multi-body case, POMCPOW reached the maximum of 25 measurements for both the borderline and economic cases.

Deposit size study summary. The total number of measurements taken by POMCPOW before terminating the measurement campaign is shown for each test configuration and deposit size. Cases in which the maximum 25 measurements were taken are shown in bold.

We examined the results of the Monte Carlo studies for a trend in the measurement campaign length. There was a positive correlation between the size of the mineral deposit and the number of measurements taken in the single-body cases. This trend is shown in Fig. 32. The multi-body cases did not have a significant number of trials with fewer than 10 measurements.

Measurement campaign length and deposit size. The mean deposit size is shown for different measurement campaign lengths, along with 1 standard error bounds.

In all three deposit configurations tested in the Monte Carlo studies, the measurements taken by POMCPOW tended to improve the RMAE and the standard deviation ratio of the resulting belief significantly more quickly than the grid-pattern and random methods. In all cases, POMCPOW tended to reach the accuracy and precision of the full 16-measurement grid after just 7 to 10 measurements. As the complexity of the problem increased (more uncertainty, more bodies), the difference in performance between POMCPOW and the grid-pattern method increased. In the single-body cases, the performance of the POMCPOW solver with and without action constraints was not significantly different. In several cases, the constrained trajectories outperformed the unconstrained trajectories in terms of both belief accuracy and variance. This suggests that the POMCPOW solver did not completely converge in the unconstrained cases, since the constrained trajectories are necessarily a subset of those reachable in the unconstrained case. This is likely a result of the unconstrained problem having significantly more locations for POMCPOW to select from at each step: larger search spaces tend to require more trial simulations for POMCPOW to converge, and in the presented experiments the POMCPOW trials were run with the same number of rollouts in both the constrained and unconstrained cases. In the multi-body cases, the unconstrained solver did tend to outperform the constrained solver. This suggests that the constraints pose a more significant limitation to the solution in the multi-body case than in the single-body case.

In the single-body cases, the final MINE or ABANDON decisions made by POMCPOW were accurate in both economic and non-economic cases, with the correct decision made in over 90 % of cases in most test configurations. The accuracy in non-economic cases tended to be slightly higher than in economic cases. This is likely the result of sub-economic deposits being more common in the prior distribution than economic deposits and of the expected ore volume of the initial belief starting below the economic threshold. The percentage of profitable ore mined tended to be higher than the rate of correct mining decisions. For example, in the single-body fixed-location case with measurement constraints, POMCPOW correctly identified approximately 89 % of the profitable cases, though it mined 95 % of all the profitable ore. This suggests that the economic cases which POMCPOW failed to correctly identify were only marginally economic.
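The gap between the two rates follows directly from how they are weighted. The sketch below, using hypothetical deposits and a hypothetical cutoff of 100 units, shows that missing a single marginal deposit lowers the case-level rate more than the ore-weighted rate. The function name and the assumption that "profitable ore" means the total ore in profitable deposits are illustrative.

```python
def decision_metrics(deposits, threshold):
    """Return (fraction of profitable cases mined, fraction of profitable ore mined).

    deposits: list of (true_ore_volume, decision), decision in {"MINE", "ABANDON"}.
    """
    profitable = [(v, d) for v, d in deposits if v > threshold]
    mined = [(v, d) for v, d in profitable if d == "MINE"]
    case_rate = len(mined) / len(profitable)
    ore_rate = sum(v for v, _ in mined) / sum(v for v, _ in profitable)
    return case_rate, ore_rate


# Hypothetical trials: the only missed profitable deposit is marginal (102 units).
trials = [
    (180.0, "MINE"),
    (150.0, "MINE"),
    (140.0, "MINE"),
    (102.0, "ABANDON"),
    (60.0, "ABANDON"),
]
case_rate, ore_rate = decision_metrics(trials, threshold=100.0)
print(f"cases identified: {case_rate:.1%}, ore mined: {ore_rate:.1%}")
```

Because each missed deposit counts equally in the case rate but only by its volume in the ore rate, small missed deposits pull the two numbers apart in the direction reported above.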

The accuracy of the final POMCPOW decisions decreased significantly in the multi-body cases. In approximately 32 % of profitable cases, the algorithm incorrectly decided to abandon the prospect. Inspection of the test results suggested that this was due to the belief model (Bayes' model) failing to correctly resolve one of the two ore bodies before making a decision. An example of this is shown in Fig. 33, where the algorithm incorrectly abandoned the marginally economic deposit after seven measurements before resolving both bodies. This behavior is likely caused by the belief incorrectly concentrating probability on sub-economic, single-body cases, not by the POMCPOW algorithm. The observed belief behavior was likely a result of the particle ensemble failing to retain a sufficient number of multi-body instances. Many methods have been proposed to monitor and prevent this type of particle filter degeneracy (Thrun et al., 2005), and hence future research will focus on including better particle filter methods for these types of problems.
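One widely used diagnostic for particle filter degeneracy is the effective sample size of the particle weights. The sketch below is a generic illustration of that diagnostic, not the belief update used in this study; the function names and the resampling threshold of half the ensemble size are assumptions.

```python
def effective_sample_size(weights):
    """Effective sample size, 1 / sum(w_i^2), computed on normalized weights."""
    total = sum(weights)
    normalized = [w / total for w in weights]
    return 1.0 / sum(w * w for w in normalized)


def needs_resampling(weights, fraction=0.5):
    """Flag degeneracy when the effective sample size drops below a fraction
    of the ensemble size (the 0.5 default is a common heuristic, assumed here)."""
    return effective_sample_size(weights) < fraction * len(weights)


uniform = [1.0, 1.0, 1.0, 1.0]          # all particles equally plausible
degenerate = [0.97, 0.01, 0.01, 0.01]   # one particle dominates the belief

print(effective_sample_size(uniform), needs_resampling(uniform))
print(effective_sample_size(degenerate), needs_resampling(degenerate))
```

A low effective sample size signals that the belief is being carried by very few particles, which is the condition under which hypotheses such as a second ore body can drop out of the ensemble.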

Multi-body failure example. This figure shows an example of an incorrect ABANDON decision made based on the multi-body case. In this trial, the belief converged too quickly to a sub-economic case with a single ore body before resolving the second ore body in the southwest.

Interesting emergent behavior was observed in the single-body cases. The initial measurement was not typically taken at the center of the belief distribution but was instead offset slightly. The subsequent measurements tended to step in towards the center before gradually moving outward. This behavior can be understood as an intuitive extent-finding methodology. Each measurement is taken to try to locate the edge of the deposit, where the most information about the deposit size can be learned. As more information is gained near the center, where positive observations are more likely, the measurements tend to move outward to where more informative, but higher-variance, data may be gathered.

One important feature of the defined POMDP is that it allows the solver to make a variable number of measurements before concluding. In each case studied, a wide variety of trajectory lengths were observed. Because there is a cost per measurement and a time discount on the eventual reward, POMCPOW tended to prefer shorter measurement campaigns when possible, with fewer than five measurements being the mode in most cases. However, clear evidence of truncation at the upper end can be seen in the measurement histograms, suggesting that in some cases, more than the maximum allowed 25 measurements would have been taken had the limit not been imposed. In general, it was observed that POMCPOW took more measurements on cases that we would consider more difficult. In cases that were borderline–economic, in which resolving the deposit size with good fidelity was necessary to make the correct final decision, POMCPOW tended to take more measurements. For clearly sub-economic cases, POMCPOW abandoned after just a few measurements. For clearly economic cases, POMCPOW took more measurements than in clearly sub-economic cases. This is likely caused by the initial belief starting with an expected sub-economic value, which requires more Bayesian updates to converge toward an economic value than toward a sub-economic one. We also noted that fewer measurements were taken in the fixed-location cases than in the variable-location cases. This is likely the result of the latter cases requiring the POMCPOW solver to localize the deposit in addition to measuring its extent.
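The preference for short campaigns can be illustrated with a simple discounted-return calculation. The sketch below assumes a fixed cost per measurement and a single terminal reward received after the last measurement; the function name and all numerical values are hypothetical and are not the reward parameters used in the study.

```python
def campaign_return(n_measurements, terminal_reward, measurement_cost, discount):
    """Discounted return of taking n measurements and then a terminal reward.

    Measurement t (t = 0, 1, ...) costs measurement_cost * discount**t, and the
    terminal reward is discounted by discount**n_measurements.
    """
    costs = sum(measurement_cost * discount**t for t in range(n_measurements))
    return terminal_reward * discount**n_measurements - costs


# Same hypothetical terminal reward reached after a short and a long campaign.
short = campaign_return(3, terminal_reward=100.0, measurement_cost=1.0, discount=0.95)
long = campaign_return(10, terminal_reward=100.0, measurement_cost=1.0, discount=0.95)
print(f"3 measurements: {short:.2f}, 10 measurements: {long:.2f}")
```

Under these assumptions, additional measurements only pay off when the information they provide raises the expected terminal reward by more than the accumulated cost and discounting, which is consistent with longer campaigns appearing mainly in the difficult, borderline cases.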

The hyperparameters of POMCPOW were set through a basic grid search over widening and search parameters. To limit the computational expense, the total number of trial trajectories was fixed at 10 000, which allowed the study to be run within tractable computational limits. Changing progressive widening parameters also changed the computational expense and depth of search and therefore the greediness of the resultant policy. Overly aggressive widening tended to result in short-sighted policies that are one-step greedy, since the Monte Carlo estimates for each action will tend to be dominated by very short horizon trajectories. In our problem, this would tend to result in the degenerate policy of always abandoning the prospect on the first step, since that was the only action with a non-negative expected one-step return.
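The effect of the widening parameters on search depth can be seen from the standard progressive widening rule, in which a node with visit count N adds a new child only while its number of children is at most k N^alpha. The sketch below applies that rule to a single node; the parameter names `k` and `alpha`, the chosen values, and the helper names are illustrative assumptions and are not the settings used in the study.

```python
def should_widen(n_children, n_visits, k, alpha):
    """Standard progressive widening test: add a child while |C| <= k * N**alpha."""
    return n_children <= k * n_visits**alpha


def children_after(total_visits, k, alpha):
    """Count the children a single node accumulates over a number of visits."""
    n_children = 0
    for n_visits in range(total_visits):
        if should_widen(n_children, n_visits, k, alpha):
            n_children += 1
    return n_children


visits = 100
narrow = children_after(visits, k=1.0, alpha=0.2)  # conservative widening
wide = children_after(visits, k=4.0, alpha=0.8)    # aggressive widening

# Average number of simulations available to refine each child's value estimate.
print(f"conservative: {narrow} children, {visits / narrow:.1f} visits per child")
print(f"aggressive:   {wide} children, {visits / wide:.1f} visits per child")
```

With aggressive widening, nearly every visit creates a new child, so each action is evaluated from very few simulations and the tree stays shallow, which matches the one-step-greedy behavior described above.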

In this work, we presented a Bayesian sequential decision-making approach to improving geoscientific models through sequential data acquisition planning, with application to mineral exploration. We presented a framework to model challenges like mineral exploration problems by means of partially observable Markov decision processes (POMDPs). We demonstrated the general method with a specific example case in which we solved a 2D mineral exploration problem with a known exploration area. To solve this problem, we developed a hierarchical Bayesian belief using a particle filter, Gaussian process regression, and the Monte Carlo search algorithm POMCPOW.

The results of our studies demonstrate that a closed-loop sequential decision-making approach significantly outperforms a typical fixed-pattern grid approach. The measurements recommended by POMCPOW improved the accuracy and variance of the belief over the deposit extent significantly faster than the baseline methods. The behavior that emerged from POMCPOW was intuitive and tended to result in shorter measurement campaigns than a fixed pattern while achieving comparable accuracy.

The methods presented in this work are general to many areas of resource exploration. The belief and solver presented for the test case are not necessarily required to implement this approach. Future work should apply these methods to higher-fidelity exploration problems using more realistic geological models and measurement simulations, such as geophysical surveys. The POMCPOW solver was chosen because it is generally applicable to many POMDPs without modification. However, as seen in the unconstrained cases, POMCPOW may not have converged to an approximately optimal solution. Future work should investigate modifications to the baseline POMCPOW algorithm to improve its performance in exploration tasks. Extensions to POMCPOW should be explored to use the fact that the deposit state underlying the belief is static to reduce the variance of the value estimates and the required sample complexity of the search. Future work should also investigate other solver types, such as point-based value iteration (PBVI), that may handle high-variance beliefs more efficiently.

The current version of Intelligent Prospector is available from the project
website at

JM developed the code, methodologies, and conceptualization. JC developed methodologies and conceptualization as well as providing project supervision.

The contact author has declared that none of the authors has any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This paper was edited by Thomas Poulet and reviewed by two anonymous referees.