Surf3DNet1.0: A deep learning model for regional-scale 3D subsurface structure mapping

This study introduces an efficient deep learning approach based on convolutional neural networks with joint autoencoder and adversarial structures for 3D subsurface mapping from surface observations. The method was applied to delineate palaeovalleys in an Australian desert landscape. The neural network was trained on a 6,400 km² domain using land surface topography as 2D input and an airborne electromagnetic (AEM)-derived probability map of palaeovalley presence as 3D output. The trained neural network has a maximum square error < 0.10 and produces a square error < 0.10 across 93% of the validation areas, demonstrating that it reliably reconstructs 3D palaeovalley patterns beyond the training area. Owing to its generic structure, the neural network and the training algorithm designed in this study have broad application potential for constructing 3D geological features (e.g. ore bodies, aquifers) from 2D land surface observations.


Introduction
Imaging the Earth's subsurface is crucial for the exploration and management of mineral, energy and groundwater resources; its reliability depends on the availability and quality of geological data. Although the amount and quality of geological data obtained from borehole logs, geophysical prospecting and remote sensing have increased dramatically over the past decades, their spatial distribution is highly uneven. Most data exist as land surface observations in a limited number of highly developed areas such as mining and oil fields. Predictive models are much needed to extract hidden information from locally rich/big datasets in order to predict 3D regional-scale subsurface structures in data-poor areas.
Commonly used methods for modelling complex geological structures include geostatistical approaches such as sequential Gaussian or indicator simulation (Lee et al., 2007), transition probability simulation (Felletti et al., 2006; Weissmann and Fogg, 1999), or multiple-point simulation (MPS) methods (Hu and Chugunova, 2008; Mariethoz and Caers, 2014; Strebelle, 2002). However, they often present drawbacks such as not being able to represent elongated and connected geological structures such as palaeovalleys, being inefficient in capturing essential features and patterns from very large training datasets, or presenting a high computational cost. A fast and reliable tool for high-resolution 3D subsurface imaging based on multiple-support big datasets is still lacking.

https://doi.org/10.5194/gmd-2020-106 Preprint. Discussion started: 19 June 2020 © Author(s) 2020. CC BY 4.0 License.
Deep learning approaches specialised in big data mining have the potential to fill this gap (Gu et al., 2017; Hinton and Salakhutdinov, 2006; Marcais and de Dreuzy, 2017). Applications in the geosciences include earthquake detection based on seismic monitoring (Mousavi and Beroza, 2019; Perol et al., 2018), or disaster recognition from remote sensing data (Amit et al., 2016; Längkvist et al., 2016), among others. A recent breakthrough in deep learning is 2D-to-3D image processing (Niu et al., 2018; Sinha et al., 2017; Wu et al., 2016; Yi et al., 2017). Such approaches point to a novel way to rapidly and automatically identify covered 3D subsurface structures directly from readily available 2D surface observations (digital elevation models, land cover maps, signals captured by airborne geophysical surveys).
To this end, we designed a deep convolutional neural network (CNN) with joint autoencoder (Kingma and Welling, 2013) and adversarial structures (Goodfellow et al., 2014). The autoencoder component features large input and output images connected by a small latent space. This structure is advantageous for the fusion of complex input data and 3D image reconstruction. Its training involves direct back-propagation according to a voxel-wise independent heuristic criterion, and thus often needs a large training dataset to constrain the model and avoid overfitting (Laloy et al., 2018). Generative adversarial learning tries to generate multiple images inheriting the probability structure of one real image, which relaxes the need for a very large training dataset. The proposed approach successfully generates regional-scale 3D palaeovalley patterns from 2D digital terrain information in an Australian desert landscape, demonstrating that the interplay between the autoencoder and adversarial components provides a generic tool to exploit geophysical, land surface and other data more effectively to generate realistic regional-scale 3D geological structures.

Method
The adversarial neural network for 3D subsurface imaging involves three steps: (1) patch extraction and representation, (2) nonlinear mapping and reconstruction, and (3) statistical expression of the generated image (Fig. 1). The first step is referred to as the 'encoder' (Fig. 1a), which is employed to fuse the information contained in the 2D land surface observation images (input data) into a low-dimension layer by successive convolutions (Hinton and Salakhutdinov, 2006):

$$ \mathbf{z} = f(\mathbf{W} * \mathbf{x} + \mathbf{b}), \qquad (1) $$

where $\mathbf{x}$ is the input image, $*$ denotes the convolution operation, $f$ is a nonlinear function referred to as the 'activation function', $\mathbf{W}$ is a matrix of weights and $\mathbf{b}$ is a bias vector in the encoder.
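The convolution of Eq. (1) can be illustrated with a minimal NumPy sketch of a single-channel layer. The choice of a ReLU activation, the filter size and the stride here are placeholders for illustration; the paper does not specify them at this point.

```python
import numpy as np

def relu(x):
    """Placeholder activation f; the paper does not name the function used."""
    return np.maximum(x, 0.0)

def conv2d(image, kernel, bias, stride=2):
    """Single-channel 'valid' convolution followed by the activation, as in Eq. (1)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    oh = (ih - kh) // stride + 1
    ow = (iw - kw) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel) + bias
    return relu(out)
```

With a stride of 2, each layer halves the image dimensions, which is how successive convolutions compress an 800×800 input toward a small latent code.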
The encoder can be designed to contain multiple layers, where the number of layers is defined as the 'depth'. Each layer can contain multiple images, with the number of images defined as the 'width'. The images in one layer are convolved to generate the images in the next layer, with the convolution filters moving at a given stride in the vertical and horizontal directions.
The weights and biases in the encoder are trained to ensure that the code follows a standard normal distribution, by minimizing the Kullback-Leibler divergence (L1), defined as (Kullback and Leibler, 1951):

$$ L_1 = \frac{1}{2}\sum_{i=1}^{N}\left(\mu_i^2 + \sigma_i^2 - \ln\sigma_i^2 - 1\right), \qquad (2) $$

where N is the number of codes in the final output layer of the encoder, and $\mu$ and $\sigma$ are the mean and standard deviation of the codes, respectively. The input data uses the 2D MrVBF (multi-resolution valley bottom flatness, an index calculated from a globally available digital elevation model); the output is a 3D probability map of palaeovalley presence. For convenience of 3D convolution, the 2D input image (800×800×1) is simply repeated in 10 layers to form a 3D input dataset (800×800×10). Following a structure optimization by trial and error, the encoder is designed to contain 4 layers, with a width of 64, 32, 32 and 1 in each layer, respectively; the decoder contains 6 layers, with a width of 1, 16, 32, 32, 64 and 128, respectively; the discriminator contains 4 layers with a width of 128, 64, 32 and 1, respectively.
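Equation (2) is the standard variational-autoencoder form of the KL divergence between the code distribution and a standard normal; a sketch under that assumption (the exact normalization used in the paper is not stated):

```python
import numpy as np

def kl_to_standard_normal(mu, sigma):
    """KL divergence between N(mu, sigma^2) codes and N(0, 1), per Eq. (2).

    Assumes the standard VAE closed form; mu and sigma are arrays with one
    entry per code in the final encoder layer.
    """
    return 0.5 * np.sum(mu ** 2 + sigma ** 2 - np.log(sigma ** 2) - 1.0)
```

The loss is zero exactly when every code has zero mean and unit standard deviation, and grows as the code distribution drifts away from N(0, 1).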
In the second step, the codes are converted into a 3D output image by deconvolution (referred to as 'decoder', Fig. 1a), which is a process involving a zero-padding before the convolution (Fig. 1c). The combination of decoder and encoder forms a 'generator', linking input and output images. The generated 3D image is referred to as 'simulated image'.
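The 'zero-padding before the convolution' of Fig. 1c is the classic way to express a transposed convolution: padding the code by kernel_size − 1 on each side and convolving with stride 1 yields an output larger than the input. A minimal single-channel sketch (2D for brevity; the paper's decoder is 3D):

```python
import numpy as np

def deconv2d(code, kernel):
    """Enlarge `code` by zero-padding then convolving (a 'full' convolution)."""
    kh, kw = kernel.shape
    # Pad by kernel_size - 1 on every side so the output grows.
    padded = np.pad(code, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    oh = padded.shape[0] - kh + 1
    ow = padded.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out
```

A 2×2 code convolved with a 3×3 kernel this way produces a 4×4 image; stacking such layers lets the decoder expand a small latent code back to the full 3D output grid.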
To ensure that the simulated image is comparable to a real image, a voxel-wise independent heuristic criterion is minimized. The mean square error (L2) between the simulated and real images at all voxels is used as the criterion to update the weights and biases in the decoder, expressed as:

$$ L_2 = \frac{1}{M}\sum_{i=1}^{M}\left(G_i(z) - y_i\right)^2, \qquad (3) $$

where M is the number of voxels in the output 3D image, $y$ is the real image, $z$ is the code generated from the encoder, and $G(\cdot)$ represents the convolutional calculations in the decoder (in the same form as Eq. 1).
However, if only a limited number of real 3D images are available to train the network, the use of a voxel-wise independent criterion often leads to an overfitting problem. Goodfellow et al. (2014) proposed a generative adversarial network structure, which adds a 'discriminator' that converts the simulated and real images to vectors by an identical convolution process (Fig. 1a). Adversarial criteria are proposed, typically expressed by binary cross-entropy functions as:

$$ L_3 = -\frac{1}{V}\sum_{j=1}^{V} \ln D_j\big(G(z)\big), \qquad (4) $$

and

$$ L_4 = -\frac{1}{V}\sum_{j=1}^{V} \Big[\ln D_j(y) + \ln\big(1 - D_j(G(z))\big)\Big], \qquad (5) $$

where V is the size of the output vector of the discriminator, and $D(\cdot)$ represents the calculations (Eq. 1) in the discriminator.
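The adversarial criteria are the standard binary cross-entropy GAN losses; a sketch, assuming the discriminator outputs probabilities in (0, 1):

```python
import numpy as np

def generator_adversarial_loss(d_fake):
    """L3: small when the discriminator scores simulated images near 1 ('real')."""
    return -np.mean(np.log(d_fake))

def discriminator_loss(d_real, d_fake):
    """L4: small when real images score near 1 and simulated images near 0."""
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))
```

Minimizing L4 sharpens the discriminator, while minimizing L3 pushes the generator toward images the discriminator cannot tell from real ones; training alternates between the two objectives.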

The weights in the discriminator are trained to minimize L4, which attempts to distinguish the vectors generated from the real and simulated 3D images. The weights in the generator are trained to minimize L3, which attempts to fool the discriminator so that it cannot identify the vector generated from the simulated 3D image. In this way, the generator can produce images aligned with the real image in terms of probability structure (Goodfellow et al., 2014).
Finally, while the loss function L4 is minimized to optimize the weights in the discriminator, a comprehensive loss function Lg is minimized to optimize the weights in the generator:

$$ L_g = a L_1 + b L_2 + c L_3, \qquad (6) $$

where a, b and c are the coefficients on each loss function. This loss function makes it convenient to switch the neural network structure between semi-supervised learning with an additional adversarial neural network (by setting the coefficient c to a non-zero value) and supervised learning with merely the autoencoder neural network (by setting c to zero).
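The weighted combination of the three loss terms, and the supervised/semi-supervised switch via c, can be sketched directly (the coefficient values below are placeholders, not the paper's tuned values):

```python
def generator_loss(l1, l2, l3, a=1.0, b=1.0, c=1.0):
    """Combined generator loss Lg = a*L1 + b*L2 + c*L3.

    l1: KL divergence of the codes, l2: voxel-wise MSE, l3: adversarial term.
    Setting c = 0 recovers purely supervised autoencoder training.
    """
    return a * l1 + b * l2 + c * l3
```

This single scalar is what back-propagation differentiates to update the encoder and decoder weights jointly.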
The hyperparameters (including the width, depth, filter size and the coefficients in the generator loss function, etc.) defining the neural network structure are determined by trial-and-error tests (Supplementary materials). The weights and biases in the generator and discriminator are trained to minimize Lg and L4 using a stochastic gradient descent algorithm, adaptive moment estimation (ADAM) (Kingma and Ba, 2014). We implemented the above convolutional neural network using the TensorFlow Python library (Abadi et al., 2016). Once the neural network is trained, the 'generator' in the network (Fig. 1a) is used independently to generate 3D subsurface structures from the 2D land surface observations.
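ADAM keeps running first- and second-moment estimates of the gradient and applies a bias-corrected update; a minimal sketch of one step (the hyperparameter values are the common defaults from Kingma and Ba, 2014, not necessarily those used in this study):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One ADAM update for parameters w given gradient grad at step t (t >= 1)."""
    m = beta1 * m + (1 - beta1) * grad        # first moment (running mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (running mean of squares)
    m_hat = m / (1 - beta1 ** t)              # bias correction for the warm-up phase
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```

Iterating this step on, say, the gradient of f(w) = w² moves w steadily toward the minimum at 0, with the per-parameter scaling making the step size robust to the magnitude of the gradients.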

Results
We use a CSIRO dataset to test the effectiveness of our deep-learning approach in predicting 3D palaeovalley patterns in the Anangu Pitjantjatjara Yankunytjatjara (APY) lands of South Australia (Fig. 2a and 2b).
The palaeovalley aquifer index (PAI) is obtained by linearly rescaling the logarithm of the AEM-derived electrical conductivity (EC), where max and min represent the maximum and minimum logarithm of EC values over the entire dataset, respectively. PAI ranges from 0.0 to 1.0 and is calculated over the first 100 m depth in the AEM-surveyed area; it is considered a ground-truth 3D probability map of palaeovalley occurrences with a spatial resolution of 400 m × 400 m × 10 m.
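A min-max rescaling of log EC can be sketched as follows. The exact formula in the original is not reproduced here, and the direction of the relation (higher EC mapping to higher PAI) is an assumption:

```python
import numpy as np

def pai_from_ec(ec):
    """Min-max rescale log10(EC) into [0, 1] as a palaeovalley aquifer index.

    Assumes PAI increases with EC; the source text only states that PAI is
    derived from the normalized logarithm of EC over the whole dataset.
    """
    log_ec = np.log10(ec)
    lo, hi = log_ec.min(), log_ec.max()
    return (log_ec - lo) / (hi - lo)
```

Whatever the sign convention, the rescaling guarantees the 0.0-1.0 range required for a probability-like index.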

Furthermore, another 19 validation areas west of the training domain (Fig. 2d) are used to monitor the decay in the accuracy of predicted palaeovalley patterns. This is done using two different models: semi-supervised learning with an additional adversarial neural network and supervised learning with only the autoencoder neural network (controlled by coefficient c in Eq. 6).
As shown in Fig. 4, an extremely small error (<0.01) can be achieved when constructing palaeovalleys in the training area. While the autoencoder learning generally performs better than the adversarial learning in terms of mean squared errors, its prediction errors (especially the 95% quantile) increase much faster with the separation distance between the validation and training areas. This indicates that using the autoencoder only can potentially lead to very large errors (or poor predictions) in case of discrepancies between training and prediction areas. We hypothesize that these errors are due to model overfitting when only the autoencoder learning is used. To confirm this, we conduct an overfitting test based on a synthetic ground truth, with a random MrVBF input following a uniform distribution (i.e. non-informative) ranging from 0 to 1, which should result in a uniform PAI distribution. As shown in Fig. 4c, a uniform PAI can be generated by adversarial learning, while the PAI predicted using only the autoencoder learning shows structured patterns. This means that the weights trained by purely supervised learning inherit too much information hidden in the training dataset, which makes them inflexible in predicting 3D palaeovalley patterns with strong variations from the input image. On the other hand, adversarial learning is much more
robust to discrepancies and the accuracy decays only slightly in predicting 3D structures in areas further away from the training area, which is a highly desired property in real world applications.

Conclusions
This study developed an efficient and reliable adversarial convolutional neural network simulator for generating 3D subsurface structures directly from 2D land-surface data. The proposed generic structure of the convolutional neural network is composed of an 'encoder' to fuse 2D input data into low-dimension codes following a normal distribution, a 'decoder' to nonlinearly map the low-dimension codes into 3D subsurface images, and a 'discriminator' to statistically express the generated and real subsurface images as vectors for adversarial semi-supervised learning based on a single training image.
The neural network was successfully tested in mapping the 3D palaeovalley systems in the northeast Great Victoria Desert, Australia. Training and validation used the multi-resolution valley bottom flatness (MrVBF) index (input) and the 3D palaeovalley aquifer index (PAI) (output) on an area of 80×80 km². The neural network, trained to a maximum error <0.1, can predict the 3D PAI with errors <0.1 at over 90% of the validation zones.
The outstanding performance of the deep-learning neural network for 3D subsurface structure imaging makes it applicable as a generic novel tool for making better use of existing multiple-support geophysical, land surface and remote sensing data, supporting better management of limited resources such as groundwater.

The original data of MrVBF and AEM were provided by John Gallant and Tim Munday, respectively, and