The rigorous quantification of uncertainty in geophysical inversions is a challenging problem. Inversions are often ill-posed and the likelihood surface may be multi-modal; properties of any single mode become inadequate uncertainty measures, and sampling methods become inefficient for irregular posteriors or high-dimensional parameter spaces. We explore the influence of different choices made by the practitioner on the efficiency and accuracy of Bayesian geophysical inversion methods that rely on Markov chain Monte Carlo sampling to assess uncertainty, using a multi-sensor inversion of the three-dimensional structure and composition of a region in the Cooper Basin of South Australia as a case study. The inversion is performed using an updated version of the Obsidian distributed inversion software. We find that the posterior for this inversion has a complex local covariance structure, hindering the efficiency of adaptive sampling methods that adjust the proposal based on the chain history. Within the context of a parallel-tempered Markov chain Monte Carlo scheme for exploring high-dimensional multi-modal posteriors, a preconditioned Crank–Nicolson proposal outperforms more conventional forms of random walk. Aspects of the problem setup, such as priors on petrophysics and on 3-D geological structure, affect the shape and separation of posterior modes, influencing sampling performance as well as the inversion results. The use of uninformative priors on sensor noise enables optimal weighting among multiple sensors even if noise levels are uncertain.

Construction of 3-D geological models is plagued by the limitations
on direct sampling and geophysical measurement

The incompleteness and uncertainty of the information contained in
geophysical data frequently mean that there are many possible worlds
consistent with the data being analyzed

Bayesian statistical techniques provide a powerful framework for
characterizing and fusing disparate sources of probabilistic information

Although Bayesian methods provide rigorous uncertainty quantification,
implementing them in practice for complicated forward models with many
free parameters has proven difficult in other geoscientific contexts, such as
landscape evolution

While 1-D inversions for specific sensor types may use some quite
sophisticated sampling methods

The above methods are nonparametric in that the model parameters simply
form a 3-D field of rock properties to which sensors respond.
Although this type of method is flexible, parametric models, in which
the parameterized elements correspond more directly to geological
interpretation, comprise a more transparent and parsimonious approach.

In this paper, we revisit the inversion problem of

In this section we present a brief overview of the Bayesian forward-modeling approach to geophysical inversion. We also discuss implementing Bayesian inference via sampling with MCMC methods. We then present the background of the original Moomba inversion problem, commenting on choices made in the inversion process before we begin to explore different choices in subsequent sections.

A Bayesian inversion scheme for geophysical forward models comprises
three key elements:

the underlying parameterized representation of the simulated volume
or history, which we call the world or world view,
denoted by a vector of world parameters

a probability distribution

a probability distribution

Indeed, there is considerable flexibility in choosing the above elements
even in a fully probabilistic context.
For example, the partitioning of information into “data” and
“prior knowledge” is neither unique nor cut-and-dried. However,
there are guiding principles: the ideal set of parameters

The implicit assumption behind the use of mean square error as a
(log) likelihood – that the residuals of the data for each sensor from
the corresponding forward model are independent Gaussian – may not be
true if the data have been interpolated, resampled, or otherwise modified
from original point observations. For example, gravity anomaly and magnetic
anomaly measurements are usually taken at ground level along access trails
to a site or along spaced flight lines in the case of aeromagnetics.
In online data releases, the original measurements may then be interpolated
or resampled onto a grid, changing the number and spacing of points and
introducing correlations on spatial scales comparable to the scale of
the smoothing kernel.
This resampling of observations onto a regular grid may be useful
for traditional inversions using Fourier transform techniques.
However, if used uncritically in a Bayesian inversion context,
correlations in residuals from the model may then arise from the resampling
process rather than from model misfit,
resulting in stronger penalties in the likelihood for what would otherwise
be plausible worlds and muddying questions around model inadequacy.
If such correlations are known to exist, they can be modeled explicitly as
part of the likelihood. For example,
autoregressive models are already being used as error models
for 1-D inversions of magnetotelluric and seismic data
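As a concrete illustration of how correlated residuals can enter the likelihood, the following sketch evaluates the exact log likelihood of a residual series under a stationary AR(1) error model. This is a hypothetical helper for illustration, not Obsidian's sensor likelihood:

```python
import numpy as np

def ar1_loglike(residuals, sigma, phi):
    """Log likelihood of residuals under a stationary AR(1) error model,
    r_t = phi * r_{t-1} + eps_t with eps_t ~ N(0, sigma^2).

    Illustrative sketch of a correlated error model; not Obsidian code.
    """
    r = np.asarray(residuals, dtype=float)
    # Stationary marginal variance of the first residual
    var0 = sigma**2 / (1.0 - phi**2)
    ll = -0.5 * (np.log(2 * np.pi * var0) + r[0] ** 2 / var0)
    # Remaining residuals conditioned on their predecessors
    innov = r[1:] - phi * r[:-1]
    ll += -0.5 * np.sum(np.log(2 * np.pi * sigma**2) + innov**2 / sigma**2)
    return ll
```

With `phi = 0` this reduces to the usual independent Gaussian likelihood, so the AR(1) model strictly generalizes the common mean-square-error assumption.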

The inference process expresses its results in terms of either

The posterior distribution

An MCMC algorithm comprises a sequence of world parameter vectors

Metropolis–Hastings algorithms form a large class of sampling algorithms limited only by the forms of proposals. Although proofs that the chain will eventually sample from the posterior are important, clearly chains based on efficient proposals are to be preferred. A proposal's efficiency will depend on the degree of correlation between consecutive states in the chain, which in turn can depend on how well matched the proposal distribution is to the properties of the posterior.

One simple, commonly used proposal distribution is a (multivariate)
Gaussian random walk (GRW) step
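To make the proposal mechanics concrete, one Metropolis–Hastings step with an isotropic GRW proposal can be sketched as follows. The function names are hypothetical and this is not the Obsidian implementation:

```python
import numpy as np

def grw_step(x, log_post, step_size, rng):
    """One isotropic Gaussian random walk Metropolis step.

    `log_post` evaluates the log posterior density at a parameter vector.
    Because the Gaussian proposal is symmetric, the acceptance ratio
    reduces to the posterior density ratio. Illustrative sketch only.
    """
    proposal = x + step_size * rng.standard_normal(x.shape)
    log_alpha = log_post(proposal) - log_post(x)
    if np.log(rng.uniform()) < log_alpha:
        return proposal, True   # accept
    return x, False             # reject
```

Iterating this step produces a chain whose stationary distribution is the posterior; the `step_size` is the single tuning parameter whose adaptation is discussed below.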

The SGR method

Many other types of proposals can be used in Metropolis–Hastings
sampling schemes with information from ensembles of particles

The posterior distributions arising in geophysical inversion problems are
also frequently multi-modal; MCMC algorithms to sample such posteriors
need the ability to escape from, or travel easily between, local modes.
Parallel-tempered MCMC, or PTMCMC

Figure

Since only samples from the

Parallel-tempered relaxation of a bimodal distribution.

Even without regard to multiple modes, PTMCMC can also help to reduce
correlations between successive posterior samples.
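The swap move that couples the temperature ladder can be sketched as follows, assuming likelihood-only tempering with inverse temperatures `betas` (this is an illustrative sketch, not Obsidian's scheduler):

```python
import numpy as np

def pt_swap(states, log_likes, betas, rng):
    """Propose swaps between chains at adjacent temperatures.

    `states[i]` and `log_likes[i]` belong to the chain at inverse
    temperature `betas[i]` (beta = 1 is the posterior; beta -> 0
    approaches the prior). For likelihood-only tempering the swap
    acceptance ratio is exp((beta_i - beta_{i+1}) * (L_{i+1} - L_i)).
    Illustrative sketch only.
    """
    for i in range(len(betas) - 1):
        log_alpha = (betas[i] - betas[i + 1]) * (log_likes[i + 1] - log_likes[i])
        if np.log(rng.uniform()) < log_alpha:
            states[i], states[i + 1] = states[i + 1], states[i]
            log_likes[i], log_likes[i + 1] = log_likes[i + 1], log_likes[i]
    return states, log_likes
```

A high-likelihood state found by a hot, freely exploring chain is always swapped toward the cold chain (the exponent is positive), which is what lets the cold chain escape local modes.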

Because MCMC guarantees results only in the limit of large samples,
criteria are still required to assess the algorithm's performance.
Suppose for the discussion below that up to the assessment point,
we have obtained

For Metropolis–Hastings MCMC, the acceptance fraction of proposals is
easily measured and for a chain that is performing well should be

We examine correlations between samples within each chain separated
by a lag time
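A common single-number summary of these correlations is the integrated autocorrelation time. The following is a sketch of a standard FFT-based estimator with a self-consistent window (Sokal's heuristic), not Obsidian's built-in diagnostics:

```python
import numpy as np

def autocorr_time(chain, c=5.0):
    """Integrated autocorrelation time of a 1-D chain.

    Uses the FFT to compute the autocovariance and truncates the sum
    at the smallest window M with M >= c * tau(M). Illustrative sketch.
    """
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    n = len(x)
    f = np.fft.rfft(x, n=2 * n)             # zero-padded FFT
    acf = np.fft.irfft(f * np.conj(f))[:n]  # autocovariance at lags 0..n-1
    acf = acf / acf[0]                      # normalize so rho(0) = 1
    taus = 2.0 * np.cumsum(acf) - 1.0       # running estimate of tau
    mask = np.arange(n) >= c * taus         # Sokal window condition
    window = np.argmax(mask) if mask.any() else n - 1
    return taus[window]
```

Roughly `n / tau` effectively independent samples are obtained from a chain of length `n`, which is why we report worst-case autocorrelation times across dimensions in the experiments below.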

The results from this procedure must still be evaluated according to how well the underlying statistical model describes the geophysical data and whether the results are geologically plausible – although this is not unique to MCMC solutions. The distribution of residuals of model predictions (forward-modeled datasets) from the observed data can be compared to the assumed likelihood. The standard deviation or variance of the residuals (relative to the uncertainty) provides a convenient single-number summary, but the spatial distribution of residuals may also be important; outliers and/or structured residuals will indicate places where the model fails to predict the data well and highlight parts of the model parameterization that need refinement.
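The residual check described above can be reduced to a small helper that standardizes residuals by the assumed noise level; under an adequate Gaussian likelihood the standardized residuals should have a standard deviation near 1 and no extreme outliers. This is a hypothetical helper for illustration:

```python
import numpy as np

def residual_summary(observed, predicted, noise_std):
    """Summarize standardized residuals for one sensor.

    If the assumed Gaussian noise model is adequate, `std` should be
    near 1 and `max_abs` should be consistent with Gaussian tails.
    Hypothetical diagnostic helper, not Obsidian code.
    """
    z = (np.asarray(observed) - np.asarray(predicted)) / noise_std
    return {"std": float(np.std(z)), "max_abs": float(np.max(np.abs(z)))}
```

A `std` well above 1 suggests underestimated noise or model inadequacy; clusters of large `max_abs` values at particular locations point to structure the parameterization cannot represent.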

Finally, representative instances of
the world itself should be visualized to check for surprising features.
Given the complexity of real-world data, the adequacy of a given model is in
part a matter of scientific judgment or fitness for a particular applied
purpose to which the model will be put.
We will use the term

For our experiments we use a customized fork

Obsidian was designed to run on large distributed architectures
such as supercomputing clusters.

Obsidian's world is parameterized as a series of discrete units, each with its own spatially constant rock properties, separated by smooth boundaries. Each unit boundary is defined by a set of control points that specify the subsurface depth of the boundary at given surface locations. The depth to each unit boundary at any other location is calculated using a two-dimensional Gaussian process regression (kriging) through the control points; each unit is truncated against the overlying unit to allow for the lateral termination of units and ensure a strict stratigraphic sequence.
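The kriging step can be illustrated with a minimal zero-mean GP interpolator through control points; the squared-exponential kernel and hyperparameters here are placeholders, not Obsidian's actual kernel or mean functions:

```python
import numpy as np

def gp_depth_surface(ctrl_xy, ctrl_depth, query_xy, length_scale=5.0, jitter=1e-8):
    """Interpolate layer-boundary depths between control points using
    2-D Gaussian process regression (simple kriging).

    `ctrl_xy` is (n, 2) surface locations of control points, `ctrl_depth`
    their boundary depths, and `query_xy` (m, 2) locations to evaluate.
    Illustrative sketch; kernel choice and mean handling differ in Obsidian.
    """
    def kernel(a, b):
        # Squared-exponential covariance on 2-D surface coordinates
        d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
        return np.exp(-0.5 * d2 / length_scale**2)

    K = kernel(ctrl_xy, ctrl_xy) + jitter * np.eye(len(ctrl_xy))
    Ks = kernel(query_xy, ctrl_xy)
    return Ks @ np.linalg.solve(K, ctrl_depth)
```

The surface passes through the control points (up to the jitter) and varies smoothly in between, which is what lets a handful of control-point depths parameterize an entire boundary.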

For a world with

The control point depth offsets within each unit

The likelihood for each Obsidian sensor

The sampling algorithm used by Obsidian is an adaptive form of
PTMCMC, described in detail in

The goal of the original Moomba inversion problem

The chosen region was a portion of the Moomba gas field with dimensions of
35 km

The original choices of how to partition knowledge between prior and
likelihood struck a balance between
the accuracy of the world representation and computational efficiency.
The empirical covariances of the petrophysical sample measurements for each
layer were used to specify a multivariate Gaussian prior on that layer's
rock properties; although these measurements could be construed as data,
the simplifying assumption of spatially constant mean rock properties
left little reason to write their properties into the likelihood.
The gravity, magnetic, magnetotelluric, and thermal data all directly
constrained rock properties relevant to the geothermal application and were
explicitly forward-modeled as data. “Contact points” from drilled wells,
which would directly constrain the layer depths in the neighborhood of a
drilled hole as part of the likelihood, were available; they were used to
inform the prior but were not treated as sensors in the likelihood.
Treating the seismic measurements as data would have dramatically increased
computational overhead relative to the use of interpreted reflection horizons
as mean functions for layer boundary depths in the prior. Using interpreted
seismic data to inform the mean functions of the layer boundary priors also
reduced the dimension of the parameter space, letting the control points
specify long-wavelength deviations from seismically derived
prior knowledge at finer detail: each reflection horizon was
interpolated onto a

Given this knowledge of the local geology

Figure

To demonstrate the impact of problem setup and proposal efficiency in a Bayesian MCMC scheme for geophysical inversion, we run a series of experiments altering the prior, likelihood, and proposal for the Moomba problem. We approach this variation as an iterative investigation into the nature of the data and the posterior's dependence on them, motivating each choice with the intent of relating our findings to related 3-D inversion problems.

The experiments described in this section were run on the Artemis high-performance computing cluster at the University of Sydney. Artemis's standard job queue provides access to 56 nodes with 24 Intel Xeon E5-2680-V3 (2.5 GHz) cores each and 80 nodes with 32 Intel Xeon E5-2697A-V4 (2.6 GHz) cores each. Each run used 32 cores and ran for up to 8 h of wall time.

The datasets we use for our experiments are the gravity anomaly, total magnetic intensity, and magnetotelluric readings originally distributed as an example Moomba configuration with v0.1.1 of the Obsidian source code. In order to focus on information that may be available in an exploration context (i.e., publicly available geophysical surveys without contact points), we omit the thermal sensor readings, relying instead on a joint inversion of gravity, magnetic, and magnetotelluric data. Maps of the locations of these sensor readings, referring to the coordinate system of the inversion, are shown in Fig. 2.

Locations of sensor readings used in the inversions in this paper.

All experiments use PTMCMC sampling, with 4 simultaneous
temperature ladders or “stacks” of chains, each with 8 temperatures
unless otherwise specified. The posterior is formally defined in terms
of samples over the world parameters, so when quantifying predictions for
particular regions of the world and their uncertainty (such as entropy),
the parameter samples are each used to create a voxelized realization of
the 3-D world, and the average observable is calculated over these voxelized
samples. A quantitative summary of our results is shown
in Table

the shortest (

the standard deviations,

the mean information entropy

the CPU time spent per worst-case autocorrelation time as a measure of computational efficiency.

Performance metrics for each run, including the following:
best-case (

The initial work of

The iGRW proposal is the simplest proposal available, but as noted above, it loses efficiency in high-dimensional parameter spaces, and it is unable to adapt if the posterior is highly anisotropic – for example, if parameters are scaled inappropriately or are highly correlated. To maintain the target acceptance rate, the adapted step size approaches the scale of the posterior's narrowest dimension, and the random walk will then slowly explore the other dimensions using this small step size. The time it takes for a random walk to cover a distance scales as the square of that distance, so we might expect the worst-case autocorrelation time for random walk MCMC in a long, narrow mode to scale as the condition number of the covariance matrix for that mode.

The adaptive (anisotropic) Gaussian random walk

A third proposal, addressing high-dimensional parameter spaces, is the
preconditioned Crank–Nicolson (pCN) proposal
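A sketch of the standard pCN update follows, assuming a Gaussian prior with mean `prior_mean` and covariance Cholesky factor `prior_chol`; the helper names are ours and this is not the Obsidian implementation:

```python
import numpy as np

def pcn_step(x, log_like, prior_mean, prior_chol, beta, rng):
    """One preconditioned Crank-Nicolson step for a Gaussian prior
    N(mu, C) with Cholesky factor L (C = L L^T).

    The proposal x' = mu + sqrt(1 - beta^2) (x - mu) + beta * L xi
    preserves the prior exactly, so the acceptance ratio involves only
    the likelihood. Illustrative sketch only.
    """
    xi = prior_chol @ rng.standard_normal(x.shape)
    proposal = prior_mean + np.sqrt(1.0 - beta**2) * (x - prior_mean) + beta * xi
    log_alpha = log_like(proposal) - log_like(x)
    if np.log(rng.uniform()) < log_alpha:
        return proposal, True
    return x, False
```

Unlike a GRW step, a pCN step with a flat likelihood is accepted with probability 1 in any dimension, and with `beta = 1` it is an independent draw from the prior.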

Our first three runs (A, B, C) use the iGRW, pCN, and aGRW (with

There are nevertheless differences in efficiency among the samplers. The pCN proposal has not only the lowest median autocorrelation, but also the lowest worst-case autocorrelation across dimensions. The aGRW proposal has the largest spread in autocorrelation times across dimensions, with its median performance comparable to iGRW and its worst-case performance at least 3 times worse (it had still failed to converge after over 1000 h of CPU time). Repeat trials running for twice as many samples with 12 chains per stack instead of 8 (runs A1, B1, C1) produced similar results, although we were then able to reliably measure the worst-case autocorrelation time for aGRW. For all samplers, but most noticeably aGRW, the step size can take a long time to adapt. Large differences are sometimes seen in the adapted step sizes between chains at similar temperatures in different stacks, and the step sizes do not always increase monotonically with temperature.

The differences are shown in Fig.

Trace plots

These behaviors suggest that the local shape of the posterior varies across
parameter space, so proposals that depend on a global fixed scaling
across all dimensions are unlikely to perform well. The clearly superior
performance of pCN for this problem is nevertheless intriguing, since for
a sufficiently small step size near

The different proposals vary in performance when hopping between modes despite the fact that all three proposals are embedded within a PTMCMC scheme with a relatively simple multivariate Gaussian prior, to which aGRW should be able to adapt readily. We believe pCN will prove to be a good baseline proposal for tempered sampling of high-dimensional problems because of its prior-preserving properties, which ensure peak performance when constraints from the data are weak. As the chain temperature increases, the tempered posterior density approaches the prior so that pCN proposals with a properly adapted step size will smoothly approach independent draws from the prior with an acceptance probability of 1. The result is that when used as the within-chain proposal in a high-dimensional PTMCMC algorithm, pCN proposals will result in near-optimal behavior for the highest-temperature chain and should explore multiple modes much more easily than GRW proposals.
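This argument can be stated compactly. The following is a standard form of the pCN rule under likelihood-only tempering; the notation here is ours, not Obsidian's:

```latex
% pCN update for a Gaussian prior \pi_0 = \mathcal{N}(\mu, C), targeting the
% tempered posterior \pi_T(x) \propto \exp(-\Phi(x)/T)\,\pi_0(x),
% where \Phi is the negative log likelihood:
x' = \mu + \sqrt{1-\beta^2}\,(x-\mu) + \beta\,\xi,
\qquad \xi \sim \mathcal{N}(0, C),
\qquad
\alpha(x, x') = \min\Bigl\{1,\ \exp\bigl[\bigl(\Phi(x)-\Phi(x')\bigr)/T\bigr]\Bigr\}.
```

Because the proposal preserves the prior exactly, the acceptance ratio contains only likelihood terms; as $T \to \infty$ the exponent vanishes and $\alpha \to 1$ for any step size $\beta$, with $\beta = 1$ giving an independent draw from the prior.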

This behavior stands in contrast to GRW proposals, for which the acceptance fraction given any particular tuning will approach zero as the dimension increases. In fact, aGRW's attempt to adapt globally to proposals with local structure may mean mid-temperature chains become trapped in low-probability areas and break the diffusion of information down to lower temperatures from the prior. A more detailed study of the behavior of these proposals within tempered sampling schemes would be an interesting topic for future research.

In the fiducial Moomba configuration used in

If specific informative prior knowledge about observational errors exists, using such a prior, or even fixing the noise level outright, makes sense. In cases in which the amplitude of the noise term is not well constrained, using a broader prior on the noise term may be preferable. When more than one sensor with unknown noise variance is used, identical broad priors allow the data to constrain the relative influence of each sensor on the final results. The trade-off is that a more permissive prior on the noise variance could mask structured residuals due to model inadequacy or non-Gaussian outliers in the true noise distribution.

The idea that such broad assumptions could deliver competitive results
arises from the incorporation of Occam's razor into Bayesian reasoning,
as demonstrated in

Typical residuals from the

Under this new likelihood the residuals from the gravity observations increase (by about a factor of 1.5–2), while the residuals from the magnetic sensors decrease (by a factor of 3–4). This rebalancing of residuals among the sensors with an uninformative prior can be used to inform subsequent rounds of modeling more readily.

The inference also changes: in run D, a granite bridge runs from the main
outcrop to the eastern edge of the modeled volume, with the presence of
granite in the northwest corner being less certain. Agreement with run B and
with the

The weight given to the gravity sensor is thus
an important factor determining the behavior of the inversion throughout half
the modeled volume. With weakened gravity constraints, the two modes for the
inferred rock density in layer 3 separate widely
(see Fig.

The comparison map for the inversion of

Without more information – a seismic sensor in our inversion,
priors based on the specific seismic interpretations of

Indeed, one potential weakness of this approach to balancing sensors is model inadequacy: the residuals from the inference may include systematic residuals from unresolved structure in the model, in addition to sensor noise. The presence of such residuals is a model selection question that in a traditional inversion context would be resolved by comparing residuals to the assumed noise level, but this depends strongly upon informative prior knowledge of the sensing process for all sensors used in the inversion. The remaining experiments will use the Cauchy likelihood unless otherwise specified.

Slices through the voxelized posterior probability of occupancy by granite for each run at a depth of 3.5 km.

Gravity anomaly at the surface (

Magnetic anomaly at the surface (

Volume renderings of the posterior mean for runs B

The boundary conditions that Obsidian imposes on world voxelizations
assume that rock properties rendered at a boundary edge
(north–south, east–west) extend indefinitely off the edges, e.g.,

For geophysical sensors with a localized response, one way to mitigate this is
to include in the world representation a larger area than the sensor data
cover, with a margin whose width is comparable to the scale of the boundary
artifacts, so that the model can respond to edge effects for sensors with a finite area of response. In run E,
we add a boundary zone of width 5 km around the margins of the world
while increasing the number of control points in the granite intrusion layer
boundary from 49 (

In cases in which samples of rock for a given layer are few or unavailable, the empirical covariance used to build the prior on rock properties may be highly uncertain or undefined. In these cases, the user may have to resort to a broad prior on rock properties. The limiting case is when no petrophysical data are available at all. Similarly, definitive data on layer depths may become unavailable in the absence of drill cores, or at least seismic data, so a broad prior on control point depths may also become necessary.

We rerun the main Moomba analysis using two new priors. The first (run J)
simulates the absence of petrophysical measurements. The layer depth priors
are the same as the

The run J voxelization shows reasonable correspondence with the baseline run D, though with larger uncertainty, particularly in the northwest corner. In the absence of petrophysical samples but taking advantage of priors on overlying structure from seismic interpretations, a preliminary segmentation of granite from basement can thus still be obtained using broad priors on rock properties. Although the algorithm cannot reliably infer the bulk rock properties in the layers, the global prior on structure is enough for it to pick out the shapes of intrusions by looking for changes in bulk properties between layers.

The second run (run K) removes structural prior information instead of
petrophysical prior information. The priors on rock properties are as in
the

Run K yields no reliable information about the location
of granite at 3.5 km of depth. This seems to be solely due to the
uncertain thickness of layers of sedimentary rock that are constrained to be
nearly uniform horizontal slabs in run J, corresponding to a known
insensitivity to depth among potential-field sensors.
When these strong priors are relaxed, the resulting models suffer a crisis
of identifiability. Further variations on runs J and K show that
replacing these multiple thin layers with a single uniform slab of

As mentioned above and in

Our experiments show concrete examples of how the efficiency of MCMC
sampling changes with assumptions about the prior, likelihood, and proposal
distributions for an Obsidian inversion, particularly as tight constraints
on the solution are relaxed and uncertainty increases.
Unrealistically tight constraints can hamper sampling, but relaxing priors
or likelihoods may sometimes widen the separation between modes
(as shown in Fig.

While any single data source may be easy to understand on its own, unexpected interactions between parameters can also arise. Structural priors from seismic data or geological field measurements appear to play a crucial role in stabilizing the inversions in this paper, as seen by the collapse of our inversion after relaxing them.

Our findings reinforce the impression that to make Bayesian inversion
techniques useful in this context, the computational burden must be
reduced by developing efficient sampling methods. Three complementary ways forward present themselves:

developing MCMC proposals, or nonparametric methods to approximate probability distributions, that function in (relatively) high-dimensional spaces and capture local structure in the posterior;

developing fast approximate forward models for complex sensors (especially seismic) that deliver detailed information at depth, along with new ways of assessing and reducing model inadequacy; and

developing richer world parameterizations of 3-D geological models that faithfully represent real-world structure in as few dimensions as possible.

Taking derivatives of a complex forward model by finite differences
is also likely to be prohibitively expensive, and practitioners
may not have the luxury of rewriting their forward model code to return
derivatives.
This is one goal of writing fast emulations of forward models,
particularly emulations for which derivatives can be calculated analytically

Another source of overall model inadequacy comes from the world
parameterization, which can be viewed as part of the prior.
Obsidian's world parameterization is tuned to match sedimentary basins
and is thus best suited for applications such as oil, gas, and
geothermal exploration; it is too limited to represent more complex
structures, particularly those with folds and faults, that might arise
in hard rock or mining scenarios.
The GemPy package developed by

We have performed a suite of 3-D Bayesian geophysical inversions for the
presence of granite at depth in the Moomba gas field of the Cooper Basin, including
altering aspects of the problem setup to determine their effects on the
efficiency and accuracy of MCMC sampling. Our main findings are as follows.

Parameterized worlds have much lower dimensionality than nonparametric worlds, and the parameters also offer a more interpretable description of the world – for example, boundaries between geological units are explicitly represented. However, the resulting posterior has a complex local covariance structure in parameter space, even for linear sensors.

Although isotropic random walk proposals explore such posteriors inefficiently, poorly adapted anisotropic random walks are even less efficient. A modified high-dimensional random walk such as pCN outperforms these proposals, and the prior-preserving properties of pCN make it especially attractive for use in tempered sampling.

The shape of the posterior and number of modes can also depend in complex ways upon the prior, making tempered proposals essential.

Hierarchical priors on observational noise provide a way to capture uncertainty about the weighting among datasets, although this may also make sampling more challenging, such as when priors on world parameters are relaxed.

Useful information about structures at depth can sometimes be obtained through sensor fusion even in the absence of informative priors. However, direct constraints on 3-D geometry from seismic interpretations or structural measurements seem to play a privileged role among priors owing to the relatively weak constraints on depth of structure afforded by potential-field methods.

The code for version 0.1.2 of Obsidian is available at

The usual mean square likelihood often used in geophysical sensor inversions
assumes the residuals of the sensor measurements from each forward model are
independent Gaussian-distributed with some variance

Unless the noise levels in the sensors are themselves targets for inference,
sampling will be more efficient if their values are integrated out beforehand.
If the conditional likelihood

For a single observation
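One standard form such a marginalization can take is shown below, assuming independent Gaussian residuals $r_1, \dots, r_n$ with unknown common scale $\sigma$ and a Jeffreys prior $p(\sigma) \propto 1/\sigma$; this reconstruction is ours and not necessarily the exact form used in Obsidian:

```latex
p(r_1, \dots, r_n)
= \int_0^\infty \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}
  \exp\!\left(-\frac{r_i^2}{2\sigma^2}\right) \frac{\mathrm{d}\sigma}{\sigma}
\;\propto\; \left(\sum_{i=1}^{n} r_i^2\right)^{-n/2}.
```

The marginal likelihood depends on the residuals only through their sum of squares and has much heavier tails than a Gaussian with fixed $\sigma$, which is what allows the data to set the relative weighting of sensors with unknown noise levels.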

The supplement related to this article is available online at:

The study was conceptualized by SC, who with MG provided funding and resources. RS was responsible for project administration and designed the methodology under supervision from SC and GH. RS and DK carried out the development of the Obsidian code resulting in v0.1.2, carried out the main investigation and formal analysis, and validated and visualized the results. RS wrote the original draft text, for which all coauthors provided a review and critical evaluation.

The authors declare that they have no conflict of interest.

This work is part of the Lloyd's Register Foundation–Alan Turing Institute Programme for Data-Centric Engineering. Richard Scalzo thanks Lachlan McCalman, Simon O'Callaghan, and Alistair Reid for useful discussions about the development of Obsidian up to v0.1.1. The authors acknowledge the Sydney Informatics Hub and the University of Sydney’s high-performance computing cluster, Artemis, which have contributed to the results reported in this paper. This research was undertaken with the assistance of resources and services from the National Computational Infrastructure (NCI), which is supported by the Australian Government.

This paper was edited by Thomas Poulet and reviewed by two anonymous referees.