Mapping high-resolution basal topography of West Antarctica from radar data using non-stationary multiple-point geostatistics (MPS-BedMappingV1)

The subglacial bed topography is critical for modeling the evolution of Thwaites Glacier in the Amundsen Sea Embayment (ASE), where rapid ice loss threatens the stability of the West Antarctic Ice Sheet. However, mapping of subglacial topography is subject to high uncertainty, mainly because the bed topography is measured by airborne ice-penetrating radar along flight lines separated by gaps of up to tens of kilometers. Deterministic interpolation approaches do not reflect such spatial uncertainty. While traditional geostatistical simulation can model this uncertainty, it may be difficult to apply because of the significant non-stationary spatial variation of topography over such a large area. In this study, we develop a non-stationary multiple-point geostatistical approach to interpolate large areas with irregular geophysical data and apply it to model the spatial uncertainty of the entire ASE basal topography. We collect 166 high-resolution topographic training images (TIs) to train the filling of radar data gaps, thereby simulating realistic topographic maps. The TIs are extensively sampled from deglaciated regions in the Arctic as well as Antarctica. To address the non-stationarity in topographic modeling, we introduce a Bayesian framework that models the posterior distribution of non-stationary training images assigned to the local modeling domain. Sampling from this distribution then provides candidate training images for local topographic modeling with uncertainty, constrained to the radar flight line data. Compared to traditional MPS approaches that do not consider TI sampling, our approach demonstrates a significant improvement in topographic modeling quality and in the efficiency of the simulation algorithm. Finally, we simulate multiple realizations of high-resolution ASE topographic maps and use them to investigate the impact of basal topography uncertainty on subglacial hydrological flow patterns.


Introduction
https://doi.org/10.5194/gmd-2021-297. Preprint. Discussion started: 14 September 2021. © Author(s) 2021. CC BY 4.0 License.

The topography beneath the Greenland and Antarctic ice sheets is essential for nearly every ice sheet investigation, including modeling subglacial hydrology (MacKie et al., 2021), interpreting geologic conditions (Holschuh et al., 2020), estimating ice volume and sea level rise contributions (Fretwell et al., 2013), and ice sheet modeling for sea level rise projections (Le clec'h et al., 2019; Seroussi et al., 2017). The characterization of subglacial topography is particularly important for Thwaites Glacier in the Amundsen Sea Embayment, which is experiencing accelerating ice loss (Rignot et al., 2019) that could destabilize the West Antarctic Ice Sheet (Joughin et al., 2014). Subglacial topography is predominantly measured with airborne ice-penetrating radar along flight lines separated by up to tens of km (Fretwell et al., 2013; Herzfeld et al., 1993). Large gaps in the data remain, which are generally interpolated deterministically using methods such as kriging (Fretwell et al., 2013) or ice sheet model inversions (Huss and Farinotti, 2012; Morlighem et al., 2017, 2020). These approaches produce topography that is unrealistically smooth and provide limited morphological information. Furthermore, deterministically interpolated topography does not sample the uncertainty space, making it difficult to quantify uncertainty in ice sheet models with respect to topographic uncertainty.

These issues have previously been addressed with two-point geostatistical simulation, such as sequential Gaussian simulation (SGSIM) (MacKie et al., 2021). The objective of geostatistical simulation is to generate multiple realizations of phenomena that reproduce the spatial variability of observations, as modeled by a variogram or spatial covariance, and that can be used to quantify uncertainty (e.g. Deutsch and Journel, 1998).
Thwaites Glacier has previously been simulated by Goff et al. (2014), though only one realization was generated. Geostatistical simulation has also been applied in Antarctica and Greenland to quantify uncertainty in subglacial hydrology (MacKie et al., 2020, 2021; Zuo et al., 2020). However, spatial variation over very large areas is inherently non-stationary. For example, the Greenland and Antarctic ice sheets are thousands of km across and contain a globally wide range of topographic and geologic settings. This means that the nature of spatial variation changes significantly, and possibly in complex ways, over the domain of interest. Traditional geostatistical ways of dealing with non-stationary data are the modeling of trend functions (e.g. Pyrcz and White, 2015) or the use of covariates (e.g. Almeida and Journel, 1994; MacKie et al., 2021). However, such approaches typically model only the variation in the mean (trend) or some degree of correlation (co-simulation). Another approach is to use a non-stationary spatial covariance model (Schmidt and O'Hagan, 2003). Such an approach becomes exceedingly difficult to apply over large areas because of the use of Markov chain Monte Carlo in its Bayesian inference. Regardless, all these approaches are limited to expressing non-stationarity in terms of a mean or covariance function only.
Non-stationary bed topography can be directly measured using high-resolution remote sensing data such as satellite images, but only in deglaciated areas (Porter et al., 2018). Deglaciated topographic images reveal glacial morphologies resembling the topography beneath the ice sheets (King et al., 2009; Margold et al., 2015; Spagnolo et al., 2017). They therefore bear significant information on the subglacial topography. Exposed topography has previously been used to perform deterministic interpolations (Clarke et al., 2009). However, satellite imagery of deglaciated topography has not been explored to stochastically simulate subglacial topography.
Recent developments in multiple-point geostatistics (MPS) have shown great potential in using high-resolution training images (e.g. satellite images) to fill remote sensing gaps (e.g. Gravey and Mariethoz, 2020; Mariethoz et al., 2012; Yin et al., 2017; Zakeri and Mariethoz, 2021; Zuo et al., 2020). MPS approaches use the training images (TIs) as explicit prior models to generate realistic topographic models and quantify spatial uncertainty. The simulation of non-stationary and morphologically complex topography can also be achieved with MPS (Hoffimann et al., 2017a, 2019; Mariethoz and Caers, 2014). Compared to alternative machine learning or deep learning approaches (Laloy et al., 2018; Mo et al., 2020), MPS has flexible conditioning capability and can accommodate sparse and non-uniform sampling in space. It can generate multiple topographic model realizations conditioned to the radar line observations, without requiring a large amount of training data.
We review three categories of approaches for building non-stationary geospatial models using MPS. One simple way is to divide the non-stationary simulation domain into several stationary zones and then use a different stationary TI for each zone (Strebelle, 2002; Wu et al., 2008). Another way instead divides non-stationary TIs into small stationary zones; MPS then uses patterns from the different TI zones to fill different locations in the simulation domain (Honarkhah and Caers, 2012; Zhou et al., 2014). But zonation in either the simulation domain or the training images can make it difficult to maintain smooth transitions between the modeled patterns. Therefore, a third way is most commonly used: it incorporates spatially continuous non-stationary maps (so-called "auxiliary variables") through ad-hoc weighting (Chugunova and Hu, 2008; Mariethoz et al., 2010a; Oriani et al., 2014; Zuo et al., 2020). Such auxiliary variables determine which TI patterns should fill which location in the simulation domain, in a spatially smooth manner. The limitation is that the ad-hoc weights do not scale to the complexity of bed topography, and the determination of the weights is subjective. More importantly, auxiliary variables are very difficult to obtain in subglacial topographic modeling. Another challenge in non-stationary modeling is how to choose training images (Tahmasebi, 2018). This is particularly important as MPS modeling relies on the spatial information provided by the training images. Hoffimann et al. (2019) introduced an approach to generate time-series training images to model the spatial and temporal evolution of geomorphology; a training image transitional model in time was proposed to reproduce non-stationary geomorphologic evolution. However, in subglacial topographic modeling, there are no readily available training images because subglacial topographic measurements are made only along flight lines.
Satellite altimetry observations from deglaciated areas in the Arctic offer a potential source of training imagery. However, training images retrieved from the Arctic would naturally be non-stationary due to the variability of the landscape. Furthermore, the Arctic provides a vast amount of deglaciated topographic data, which presents a significant computational burden for MPS simulation algorithms. We therefore need a strategy to explicitly specify which training images or patterns should go where when filling the radar line gaps.

In this paper, we generalize a geospatial modeling framework to fill irregular geophysical data gaps over large areas. We address non-stationary topographic modeling by properly selecting non-stationary topographic training images for MPS.
We first collect a large number of topographic images to serve as training images. These images are taken from deglaciated areas in the Arctic and Antarctica. To assign the training images to local areas, we develop a probabilistic method for estimating the posterior distribution over the prior set of training images. The posterior distribution is conditioned (constrained) on the radar flight line data. The posterior TI probability is calculated using kernel density estimation conditioned to the actual radar line observations. Such a TI sampling scheme avoids the use of auxiliary variables with arbitrary ad-hoc weightings. We demonstrate our method on the entire Amundsen Sea Embayment in West Antarctica. This region has alternating areas of sparse and dense measurements, with a variety of radar line spacings and orientations. We show that the training image sampling process accommodates a range of data configurations and generates realistic non-stationary topographic realizations that reflect the subglacial topographic uncertainty in the ASE. We use the topographic simulations to model subglacial hydrologic flow in order to investigate the impact of topographic uncertainty on hydrologic uncertainty.

Radar data set & training images
The topographic data for the ASE include seafloor bathymetry measurements from the International Bathymetric Chart of the Southern Ocean (IBCSO) (Arndt et al., 2013), subaerial topography from the Reference Elevation Model of Antarctica (REMA) (Howat et al., 2019), and ice-penetrating radar measurements of subglacial topography (Blankenship et al., 2001; Gogineni, 2012; Holschuh et al., 2020; Holt et al., 2006; Vaughan et al., 2006; Young et al., 2016). The data are gridded at a 500-meter resolution (Figure 1). The swath bathymetry data (Arndt et al., 2013) and subglacial swath radar data (Holschuh et al., 2020) provide some training imagery. To have a more extensive representation of the subglacial topography, we augment the available training data with deglaciated subaerial topography from the ArcticDEM (Porter et al., 2018). The Arctic and much of North America were formerly covered by the Laurentide and Cordilleran ice sheets and share morphological similarities with Antarctic subglacial topography (King et al., 2009; Margold et al., 2015). While the seafloor and subaerial topography may have experienced additional depositional processes after deglaciation, any topographic alterations are likely minimal at a 500 m resolution. We sampled a total of 166 candidate training images to capture a variety of geologic settings (Figure 2). Each training image has a size of 100 × 100 km². The training image data repository is publicly accessible from

Overview
Multiple-point geostatistics (Journel and Zhang, 2006; Mariethoz and Caers, 2014; Srivastava, 2018; Strebelle, 2002) is the field of study that focuses on the digital representation of physical reality by reproducing high-order statistics inferred from training images. The emphasis in MPS lies on capturing higher-order statistics (hence multiple-point statistics) from training images that have been selected to be representative of a specific area of study. In that sense, it differs from spatial covariance-based (variogram) methods (e.g. Gaussian process regression or kriging) (Matheron, 1963; Williams and Rasmussen, 1996), which are based on spatial correlation (two values at a time). Both MPS and covariance-based methods have the ability to interpolate data exactly. Exact interpolation, if desired, is also where geostatistics differs from machine learning or computer vision methods, where such exact interpolation is not usually considered important.

Several MPS simulation algorithms (e.g. Gravey and Mariethoz, 2020; Hoffimann et al., 2017b; Mariethoz et al., 2010; Strebelle and Journel, 2001) have been developed that use training images to generate multiple realizations that interpolate the data exactly. The algorithm used in this work is Direct Sampling (DS) (Mariethoz et al., 2010b; Mariethoz and Renard, 2010), which is introduced below. These algorithms do not address the challenge of selecting the training images themselves. For example, if an area of the simulation grid contains dense data, few training images may be compatible with that data. On the other hand, an area with sparse data may have many compatible training images. Finally, training images selected for two adjacent areas are not necessarily independent from each other.
To date, there has not been any attempt to use MPS to interpolate ice-penetrating radar measurements of topography at the scale of the Amundsen Sea Embayment. In doing so, additional challenges occur that are not present in smaller study areas. These challenges include the limited number of training images, non-stationarity over the ASE, and the running time cost of generating high-resolution topographic maps. Before moving to the methodology that addresses these challenges, we first introduce the Direct Sampling method and a probabilistic framework for representing training images in metric spaces.

Direct Sampling (DS)
Direct sampling is a widely used MPS approach for spatial modeling and gap filling (Mariethoz, 2018; Mariethoz et al., 2012; Zuo et al., 2020). Figure 3 provides a simplified example of DS in the context of flight lines. The values in the grid indicate elevation. In general, there are two major components within DS. First, the algorithm visits an unknown location in the simulation grid and collects neighboring observed points as conditioning data. For example, in Figure 3(a), three conditioning points are detected near an unknown location (marked with "?"). DS records the values and relative locations of the known points. Second, a search is launched to find a similar structure in the TI, where similarity is defined by a chosen distance metric. As Figure 3(d) shows, the search finds a matching structure. The central value of the matching instance is pasted into the simulation grid, thereby predicting the value at the unknown point. This simulation procedure is repeated until there are no unknown points left in the grid.

Figure 3. Conceptual example of the DS point simulation. (a) Radar lines on the simulation grid; (b) Three known points (value: 37, 80, 86) constitute a conditioning data pattern; (c) A mismatch pattern in TI; (d) A similar pattern in TI.
Based on the explanation above, there are three main parameters in DS. The first is n, the number of neighboring conditioning data points. It plays a key role in extracting complex patterns from the training images. In general, n ≥ 30 is suggested for a continuous simulation case (Bruna et al., 2019; Meerschman et al., 2013). Another parameter is the distance threshold t. It determines whether to accept a TI pattern based on the mismatch distance to the conditioning points: a TI pattern with mismatch distance below t is accepted and pasted into the simulation grid. t therefore significantly affects the simulation efficiency and quality. A small value of t can improve modeling quality but results in a high computational burden. t = 0.1 is generally recognized as the upper bound in most cases (Meerschman et al., 2013; Zuo et al., 2020). The third main DS parameter is f, the fraction of the TI that is scanned. With the intention of saving time and avoiding verbatim copying, a recommended value of f is between 0.1 and 0.5 (Mariethoz and Caers, 2014).
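As a rough illustration of the DS loop and the roles of n, t, and f, the following self-contained Python sketch fills missing cells of a small grid from a training image. It is a didactic toy under our own simplifications (Manhattan neighbor search, RMSE pattern distance, random path), not the reference DS implementation, and all names are ours.

```python
import math
import random

def direct_sampling(grid, ti, n_neighbors=4, t=0.1, f=0.5, seed=0):
    """Simplified Direct Sampling: fill None cells of `grid` by scanning `ti`.

    grid: 2D list mixing numbers (conditioning data) and None (unknowns).
    ti:   2D list of numbers (training image), values assumed in [0, 1].
    n_neighbors, t, f: the DS parameters n, t, f described in the text.
    """
    rng = random.Random(seed)
    rows, cols = len(grid), len(grid[0])
    ti_rows, ti_cols = len(ti), len(ti[0])
    unknowns = [(i, j) for i in range(rows) for j in range(cols)
                if grid[i][j] is None]
    rng.shuffle(unknowns)  # random simulation path

    for (i, j) in unknowns:
        # 1) collect the n closest informed neighbors (offset + value)
        informed = [(abs(i - a) + abs(j - b), a - i, b - j, grid[a][b])
                    for a in range(rows) for b in range(cols)
                    if grid[a][b] is not None]
        informed.sort()
        pattern = informed[:n_neighbors]

        # 2) scan a random fraction f of the TI; accept the first node
        #    whose pattern mismatch falls below the threshold t
        ti_nodes = [(a, b) for a in range(ti_rows) for b in range(ti_cols)]
        rng.shuffle(ti_nodes)
        best_val = ti[ti_nodes[0][0]][ti_nodes[0][1]]
        best_dist = float("inf")
        for (a, b) in ti_nodes[:max(1, int(f * len(ti_nodes)))]:
            ds, cnt = 0.0, 0
            for (_, di, dj, v) in pattern:
                if 0 <= a + di < ti_rows and 0 <= b + dj < ti_cols:
                    ds += (ti[a + di][b + dj] - v) ** 2
                    cnt += 1
            dist = math.sqrt(ds / cnt) if cnt else 1.0
            if dist < best_dist:
                best_val, best_dist = ti[a][b], dist
            if dist <= t:
                break
        grid[i][j] = best_val  # paste the central node value
    return grid
```

Filling a 2x2 grid with two known cells from a constant training image returns the constant everywhere, which shows the conditioning and pasting steps at work.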

A metric space for training images
A metric space expresses the relationship between objects through a distance function defining the similarity between any two objects. In metric spaces, we do not know the exact coordinates of objects, only how far apart they are. Metric spaces are therefore useful for representing high-dimensional objects such as training images. In this paper, we employ metric spaces for two purposes: 1) to visualize the difference between training images and 2) to estimate the probabilities of training images occurring over some area.
To define a meaningful distance between any two training images, we create a set of representative patterns for each training image. We are mainly concerned with the TI morphological features when creating the representative patterns used to compute the distance. This requires first removing the effects of the original TI elevations. To do so, we rescale each TI to range between 0 and 1 by min-max normalization (Han et al., 2012). Then, as in other MPS approaches such as SNESIM (Strebelle, 2002) and DISPAT (Honarkhah and Caers, 2010), we extract the spatial patterns from each TI with a fixed template. We then use classical agglomerative hierarchical clustering (Romary et al., 2015) to divide the spatial patterns of each TI into a finite number of groups. The group number in agglomerative hierarchical clustering is determined by a distance threshold (between the clustered groups). Following the distance threshold commonly used in the DS approach, we set it to 0.1 (Meerschman et al., 2013) of the maximum pattern distance of the TI. A TI with more complex spatial patterns will therefore have more clustered groups. The medoid pattern of each group is taken as the representative pattern of that group. Figure 4 shows a few representative patterns. The distance used in the clustering is the normalized Euclidean distance.

After clustering and medoid selection, each training image is represented by a set of representative patterns. We define the difference between any two training images as the difference between their sets of representative patterns. To do this, we use the modified Hausdorff distance (Dubuisson and Jain, 1994; Huttenlocher et al., 1993). This distance is commonly used to define the difference between shapes of high-dimensional objects. If we call the set of representative patterns for training image A 𝒜 and that for training image B ℬ, then the modified Hausdorff distance is

d_MH(𝒜, ℬ) = max{ (1/|𝒜|) Σ_{a ∈ 𝒜} min_{b ∈ ℬ} d(a, b), (1/|ℬ|) Σ_{b ∈ ℬ} min_{a ∈ 𝒜} d(b, a) }    (1)

where a is any representative pattern in 𝒜, b is any pattern in ℬ, d(·,·) is the Euclidean distance between any two representative patterns, and |𝒜| and |ℬ| are the sizes of 𝒜 and ℬ, respectively. In essence, the modified Hausdorff distance represents the maximum of the expected minimum distances between the two TIs' representative patterns. Once a distance is defined, we can visualize the metric space in a low-dimensional Cartesian space using multi-dimensional scaling, or MDS (Scheidt et al., 2018). MDS projects high-dimensional objects into a 2D Cartesian space, where the distance between points in that space approximates the modified Hausdorff distance. Figure 5 shows the projection of the 166 training images in 2D; each dot represents a training image. Training images that are similar map close to each other in the scatterplot.
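The modified Hausdorff distance between two sets of representative patterns is straightforward to compute; a minimal sketch for sets of equal-length pattern vectors (function names are ours):

```python
def modified_hausdorff(A, B):
    """Modified Hausdorff distance (Dubuisson and Jain, 1994) between two
    sets A, B of equal-length pattern vectors: the maximum of the two
    mean minimum Euclidean distances."""
    def euclid(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5

    def mean_min(X, Y):
        # expected distance from each pattern in X to its nearest pattern in Y
        return sum(min(euclid(x, y) for y in Y) for x in X) / len(X)

    return max(mean_min(A, B), mean_min(B, A))
```

For two single-pattern sets the distance reduces to the plain Euclidean distance between the patterns, and it is zero for identical sets, as expected of a dissimilarity measure.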

Illustration case & overview of the mapping strategy
To illustrate the proposed methodology, we focus on a small area of the ASE overlapping Pine Island Glacier (see Figure 6).
In this area, we observe a variety of radar line geometries and densities, as well as elevation changes. This smaller area is divided into 4 subareas. Strategies for such subdivision will be discussed later in the application to the entire domain.
Direct sampling, by construction, avoids boundary artifacts, because the data template is not aware of the subareas. With this strategy, two problems now need to be addressed. First, we need to find training images that are consistent with the radar data within a selected area; there could be multiple such training images. Second, we need to model the training image cross-correlation between different areas, since the training images of two adjacent areas are not necessarily independent. Our approach is to model the posterior distribution of each area through a probability aggregation problem.

Formulation of the problem through probability aggregation
Our goal is to estimate, for each area A_1, …, A_n, the posterior distribution of the training image given the flight radar line data:

P(TI(A_k) | d_{A_1}, …, d_{A_n})    (2)

TI(A_k) is a discrete random variable that has 166 possible outcomes. To achieve this, we first estimate the individual conditional probabilities P(TI(A_k) | d_{A_i}), then aggregate them into a single estimate for Eq (2). We use a simple aggregation model based on log-ratios (Allard et al., 2012). For each individual conditional probability, the log-ratio is

r_i = ln[ (1 − P(TI(A_k) | d_{A_i})) / P(TI(A_k) | d_{A_i}) ]    (3)

To aggregate these individual conditional probabilities, the log-ratios are summed relative to the prior:

r − r_0 = Σ_{i=1}^{n} (r_i − r_0)    (4)

Here, r is the log-ratio of the posterior in Eq (2) and r_0 is the log-ratio of the prior. The prior is a uniform distribution over all training images; thus r_0 is calculated as

r_0 = ln[ (1 − 1/166) / (1/166) ]    (5)

Then, we can solve Eq (4) for r and invert the log-ratio to get P(TI(A_k) | d_{A_1}, …, d_{A_n}). However, in summing, we make a conditional independence assumption (Allard et al., 2012). Indeed, summing logarithms is equivalent to taking products of the actual probabilities, which entails a form of conditional independence. Assuming conditional independence when that assumption is untrue in reality often results in overconfidence and too small an uncertainty.
To mitigate this issue, we add an additional weight term:

r − r_0 = Σ_{i=1}^{n} w_{ki} (r_i − r_0)    (6)

Logically, we would like the weights to account for the correlation between data in different regions. For example, if the data of region A_i are highly correlated with the data in region A_j, then they are likely redundant with respect to the training image selection.
Hence, we make the weight w_{ij} a function of the correlation structure between the different subareas. In the next section we detail the subtasks ahead: 1) modeling and estimating P(TI(A_i) | d_{A_i}) and 2) calculating the weights w_{ij}.

3.4 Probability of training images given radar line data
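The log-ratio aggregation of Eqs (3) through (6) for a single candidate TI can be sketched as follows. Passing no weights recovers the unweighted sum of Eq (4), and the 1/166 uniform prior matches Eq (5); all names are ours, and in practice the resulting values must still be renormalized over all 166 TIs to form a proper posterior.

```python
import math

def aggregate_logratios(cond_probs, weights=None, n_ti=166):
    """Aggregate conditional probabilities P(TI | d_Ai) for ONE candidate TI
    into P(TI | all data) by weighted log-ratio summation (Allard et al., 2012).

    cond_probs: list of P(TI | d_Ai), one entry per data subarea.
    weights:    optional weights w_i; defaults to all ones (Eq (4)).
    n_ti:       number of training images defining the uniform prior.
    """
    if weights is None:
        weights = [1.0] * len(cond_probs)
    # prior log-ratio r_0 for the uniform prior P = 1/n_ti (Eq (5))
    r0 = math.log((1.0 - 1.0 / n_ti) / (1.0 / n_ti))
    # weighted sum of log-ratio deviations from the prior (Eq (6))
    r = r0 + sum(w * (math.log((1.0 - p) / p) - r0)
                 for w, p in zip(weights, cond_probs))
    # invert the log-ratio: r = ln((1-P)/P)  =>  P = 1/(1 + e^r)
    return 1.0 / (1.0 + math.exp(r))
```

Two sanity checks: a single conditional probability equal to the prior leaves the posterior at the prior, and a single probability of 0.5 (log-ratio zero) yields 0.5.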

Most probable set of training images
A direct estimate of P(TI(A_i) | d_{A_i}) is challenging because the data d_{A_i} are very high-dimensional. We turn this high-dimensional problem into a low-dimensional one as follows. Using the data d_{A_i} in area A_i, we find those training images that constitute a set of most probable training images, i.e. those images closest to the radar line data in that area. Term this set 𝒯*. Then, given this set, we replace the radar line data with the most probable set:

P(TI(A_i) | d_{A_i}) ≈ P(TI(A_i) | 𝒯*)    (7)

To determine this set, we solve the following optimization problem:

𝒯* = argmin_I D( I(𝒯), d_{A_i} )    (8)

where 𝒯 = {TI^(1), TI^(2), …, TI^(166)} is the total set of training images and I is an indicator function which returns 𝒯*, a subset of 𝒯 of size s. We explain how to determine the size of 𝒯* in the Appendix. The distance D in Eq (8) measures the distance between the radar line data and any given set of training images. To calculate D, we rely on the same modified Hausdorff distance approach as in section 3.1.3:

D( I(𝒯), d_{A_i} ) = d_MH( 𝒫_TI, 𝒫_d )    (9)

where 𝒫_d is now the set of patterns extracted from the radar lines. By pattern, we mean the radar line data scanned within a given template. We use flexibly sized templates when scanning the radar lines over each subarea A_i; the template radius is randomly chosen up to a maximum of 15 pixels, so as to include about 40 measurement points (Mariethoz and Caers, 2014, p. 160). 𝒫_TI is the set of TI representative patterns obtained in section 3.1.3. Figure 7 provides an illustration of this idea.

290
We use particle swarm optimization (PSO) to minimize the distance function D( I(𝒯), d_{A_i} ). As a heuristic optimization approach, PSO has the advantages of requiring few parameters, easy implementation, and fast convergence with good accuracy (Rezaee Jordehi and Jasni, 2013). These characteristics make PSO a preferred optimizer for our initial training image selection. In conjunction with PSO, we employ the profile log-likelihood function to find the optimal size of 𝒯*.
A detailed explanation of the PSO algorithm and the profile log-likelihood implementation is provided in the Appendix.
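For readers unfamiliar with PSO, the sketch below minimizes a generic continuous objective over a box. It is only illustrative: the study's actual use, selecting a discrete subset of TIs, requires an encoding layer on top of such an optimizer, and the parameter values here are common textbook defaults, not those of the paper.

```python
import random

def pso_minimize(f, dim, bounds, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer for a continuous objective f
    over the box [lo, hi]^dim. Returns (best position, best value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]              # personal bests
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                # inertia + cognitive pull + social pull
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

On a simple convex objective such as the 2D sphere function, the swarm reliably concentrates near the minimum within a few dozen iterations.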

Kernel density estimation of P(TI(A_i) | d_{A_i})
We use the optimal set of training images 𝒯* to infer P(TI(A_i) | d_{A_i}). We assume that the TIs near 𝒯* in the MDS plot (Figure 5) tend to have a similarly high probability of being assigned to the radar data subarea. This is because, from the TI distribution in the MDS metric space, we can observe that the spatial patterns of nearby TIs look similar. We therefore use a Gaussian kernel density estimation (KDE) to predict the probability of each TI. The probability of each TI is estimated according to its distance to 𝒯* in the MDS plot (Figure 5):

P(TI | d_{A_i}) ∝ (1/s) Σ_{k=1}^{s} K_h( d(TI, 𝒯*_k) )    (10)

Here, 𝒯*_k is the k-th TI selected using PSO, s is the size of the set 𝒯*, and d(TI, 𝒯*_k) is the distance between a TI and 𝒯*_k in the MDS space. K_h is the Gaussian kernel function

K_h(d) = (1 / (h √(2π))) exp( −d² / (2h²) )    (11)

The bandwidth h plays the role of the standard deviation of the Gaussian kernel. We choose the optimal bandwidth by Silverman's rule of thumb (Silverman, 1981). Figure 9 shows the KDE-estimated probability of each TI for subarea A1.
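Eqs (10) and (11) amount to a Gaussian kernel density evaluated at each TI's MDS coordinates; a compact sketch with a fixed bandwidth rather than Silverman's rule, and with hypothetical names:

```python
import math

def kde_ti_probability(ti_xy, selected_xy, h=1.0):
    """Assign each candidate TI a probability from its 2D MDS coordinates
    via a Gaussian kernel density centred on the PSO-selected TIs.

    ti_xy:       list of (x, y) MDS coordinates, one per candidate TI.
    selected_xy: MDS coordinates of the optimal set T*.
    h:           kernel bandwidth (fixed here for simplicity).
    """
    def gauss(d):
        # Gaussian kernel K_h(d), Eq (11)
        return math.exp(-d * d / (2.0 * h * h)) / (h * math.sqrt(2.0 * math.pi))

    scores = []
    for (x, y) in ti_xy:
        # mean kernel value over the selected set, Eq (10)
        s = sum(gauss(math.hypot(x - sx, y - sy)) for (sx, sy) in selected_xy)
        scores.append(s / len(selected_xy))
    total = sum(scores)
    return [s / total for s in scores]  # normalise to a discrete distribution
```

A TI sitting on top of a selected TI in the MDS plane receives far more mass than a distant one, and the returned values sum to one so they can be sampled directly.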

Aggregation by weighting log-ratios
Next, we aggregate the KDE-estimated probabilities by weighting the log-ratios to obtain the posterior P(TI(A_k) | d_{A_1}, …, d_{A_n}). The weights w_{ij} required in the log-ratio aggregation of Eq (6) reflect the spatial correlation between the radar line subareas. We use a variogram-based approach proposed by Fouedjio (2020):

d(A_i, A_j) = [ Σ_{x ∈ A_i} Σ_{x' ∈ A_j} K(c_i − x) K(c_j − x') (z(x) − z(x'))² ] / [ 2 Σ_{x ∈ A_i} Σ_{x' ∈ A_j} K(c_i − x) K(c_j − x') ]    (12)

where d(A_i, A_j) is the dissimilarity between areas A_i and A_j. Following Fouedjio (2020), c_i and c_j are the spatial center locations of A_i and A_j, respectively; x and x' are the radar data locations in A_i and A_j; K(•) is the Epanechnikov kernel function; and z(x) is the radar data value measured at location x. Using the calculated d(A_i, A_j), the weights w_{ij} are simply:

w_{ij} = d(A_i, A_j) / Σ_{j'} d(A_i, A_j')    (13)

With w_{ij}, we can aggregate the probabilities using Eq (6). Figure
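Our reading of the kernel-weighted dissimilarity of Eq (12) can be sketched as below; the exact normalization in Fouedjio (2020) may differ, so treat this as an assumption-laden illustration with names of our own. The weights of Eq (13) then follow by normalizing each row of the dissimilarity matrix.

```python
import math

def epanechnikov(u):
    """Epanechnikov kernel: K(u) = 0.75 (1 - u^2) for |u| <= 1, else 0."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def subarea_dissimilarity(pts_i, pts_j, ci, cj, bandwidth):
    """Kernel-weighted mean squared difference between the radar values of
    two subareas (our reading of the Fouedjio (2020)-style statistic).

    pts_i, pts_j: lists of ((x, y), z) radar samples in subareas A_i, A_j.
    ci, cj:       subarea centre locations.
    bandwidth:    kernel bandwidth scaling the distance to the centre.
    """
    num = den = 0.0
    for (p, zp) in pts_i:
        kp = epanechnikov(math.dist(ci, p) / bandwidth)
        for (q, zq) in pts_j:
            kq = epanechnikov(math.dist(cj, q) / bandwidth)
            num += kp * kq * (zp - zq) ** 2
            den += kp * kq
    return num / (2.0 * den) if den else 0.0
```

Two subareas with identical radar values have zero dissimilarity (fully redundant data), while differing values raise it, which is the behavior the weighting scheme relies on.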

Direct sampling with TI sampling
Using the aggregated posterior TI probability P(TI(A_i) | d_{A_1}, d_{A_2}, d_{A_3}, d_{A_4}), we can now sample training images from the posterior distribution (Figure 10) in each subarea. Figure 11 plots two realizations of sampled training images on the radar line map. We observe that the sampled topography TIs differ between the realizations. For example, the TIs sampled for A1 tend to have higher elevations and more mountain peaks than those for A2. A2 and A4 tend to have larger-scale low-elevation valleys that can be related to warm water routing, while the TIs in A1 and A3 have more small-scale valleys. For each realization of the training image set, we run a DS simulation. In the end, multiple realizations of topographic models are generated from the multiple realizations of TIs. Figure 12 shows two example realizations of the DS simulation results. We can observe large valleys in the A2 and A4 areas, while the A1 and A3 areas mainly have high-elevation peaks. The non-stationarity of both simulated topography realizations also agrees well with their sampled TIs, when compared to the TIs in Figure 11.

Comparison with traditional MPS modeling and two-point geostatistical modeling
We compare our results to a conventional MPS simulation without the proposed TI sampling. Here, we use the same DS simulation parameters as in our TI sampling approach, except that the training images are different: in the conventional test, we run the DS simulation by scanning all 166 TIs to fill the radar line gaps. Figure 13a shows one realization of the simulated result. It is obvious that the conventional approach results in a much noisier topographic model; there are significant line artifacts that make the model unrealistic. For a more detailed understanding, we take a cross-section A-A' across Pine Island Glacier and plot the comparison in Figure 14. We can observe that DS without TI sampling creates a significant number of unrealistic elevation peaks and troughs. Especially near the main channel of Pine Island Glacier (marked by the dashed box in Figure 14), where most radar data are available, unrealistic channels are simulated between the radar data observations. This suggests that, when using all 166 TIs without proper sampling, DS draws from too large a set of patterns, many of them likely incompatible with the sparse data. Our TI sampling approach avoids this problem by limiting the algorithm to a small number of the most suitable TIs, thereby improving the result. More importantly, avoiding the channel artifacts is critical for modeling subglacial hydrological flow (see the subglacial hydrological flow section below). In terms of running time, the conventional DS approach with 166 TIs took nearly 21 hours to simulate one realization; with our TI sampling approach, it took less than 1 hour. Our initial DS implementation tests were run on a PC with an Intel i9-11900 2.5 GHz processor and 32 GB of RAM. We further compare against two-point geostatistical modeling with kriging and sequential Gaussian simulation (SGSIM). Figures 13b and 13c plot the topographic modeling results from kriging and SGSIM.
We can observe that kriging produces the smoothest topographic model. The over-smoothed features are very clear in the detailed cross-section in Figure 14. After all, kriging is a deterministic modeling approach; it cannot capture local-scale elevation variations or quantify the spatial uncertainty. Our SGSIM approach uses local ordinary kriging; in this way, non-stationarity is addressed by limiting the neighborhood of spatial inference. The limitation of SGSIM, an approach based on spatial covariances, lies in its limited ability to capture complex morphological features, especially where the radar line data are very sparse (see the circle highlighted in Figure 13c). In Figure 15, we also compare the empirical variograms of the topographic maps modeled by the four different approaches. It shows that DS using sampled TIs reproduces the observed radar data variogram.
The SGSIM maps also reproduce the variograms of the observed radar data, because SGSIM directly uses the radar data variograms for modeling. However, DS without TI sampling shows a nugget (noise) effect. Overall, the TI sampling approach performs best in terms of modeling speed, simulation quality, and capturing the spatial uncertainty.

Training image sampling and DS simulation
We apply the methodology to fill the radar line gaps of the entire ASE area and generate high-resolution topography maps. To address the spatial non-stationarity and sample TIs, we first divide the whole ASE area into local subareas. We use the following recursive dividing steps to partition the entire ASE into subareas based on the radar line data density:

Step 1. Equally divide the ASE area into four subareas.

Step 2. For each subarea, if it contains more than N radar data points, continue to divide it into four equally sized subareas.

Step 3. Repeat step 2 until the number of data measurements in each subarea is below the threshold N.

In this case, we set N = 10000, which divides the ASE into a total of 56 subareas. Figure 16 shows the final ASE subareas with the corresponding radar data density. Next, we calculate the posterior TI probability for each subarea over the candidate set {TI_1, …, TI_166}. We then use the posterior probability to sample one single TI for each subarea. Figure 17 plots two realizations of the sampled TIs across the entire ASE. For each TI realization, we run DS to fill the radar line gaps and generate high-resolution topography maps. To reflect the spatial uncertainty, 20 topography map realizations are simulated using 20 realizations of TI sets. The generated high-resolution bed topography is shown in Figure 18.
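The recursive density-based subdivision in steps 1–3 can be sketched as a quadtree-style routine; this is a minimal illustration with our own function and variable names, not the paper's code:

```python
import numpy as np

def subdivide(points, bounds, n_max=10000):
    """Recursively split a rectangular area into quadrants until each leaf
    contains at most n_max radar data points.

    points : (n, 2) array of radar point coordinates (x, y)
    bounds : (xmin, xmax, ymin, ymax) of the current area
    Returns a list of leaf bounds tuples.
    """
    xmin, xmax, ymin, ymax = bounds
    inside = ((points[:, 0] >= xmin) & (points[:, 0] < xmax) &
              (points[:, 1] >= ymin) & (points[:, 1] < ymax))
    if inside.sum() <= n_max:
        return [bounds]                       # stop: few enough data points
    xm, ym = 0.5 * (xmin + xmax), 0.5 * (ymin + ymax)
    leaves = []
    for qb in [(xmin, xm, ymin, ym), (xm, xmax, ymin, ym),
               (xmin, xm, ym, ymax), (xm, xmax, ym, ymax)]:
        leaves += subdivide(points[inside], qb, n_max)   # recurse on quadrant
    return leaves
```

Applied with N = 10000 to the ASE radar points, a subdivision of this kind yields density-adaptive subareas of the sort shown in Figure 16: small subareas where radar coverage is dense, large ones where it is sparse.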

Uncertainty in subglacial hydrological flow
We use the topographic realizations to investigate the sensitivity of subglacial water routing to topographic uncertainty. A water routing model was applied to the 20 realizations to model the flow of water at the ice/bed interface. The direction of water flow is determined by calculating the hydraulic potential, φ, using the Shreve equation (Shreve, 1972):

φ = ρ_w g h + ρ_i g H,

where ρ_w is the density of water (1000 kg m⁻³), ρ_i is the density of ice (917 kg m⁻³), g is gravitational acceleration, h is bed elevation, and H is ice thickness. The hydrological model is implemented using the Antarctic Mapping Tools (Greene et al., 2017) and the FLOWobj function and multiple flow directions (MFD) algorithm from the TopoToolbox package (Schwanghart and Scherler, 2014). These functions use the hydraulic potential gradient to compute flow accumulation, i.e., the number of pixels that flow into another pixel. We assume spatially uniform basal melt rates and that the water pressure is equal to the ice overburden pressure.
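The potential calculation itself is straightforward; a minimal sketch is below (the routing and flow accumulation in the paper are done with TopoToolbox, which is not reproduced here; constant and function names are ours):

```python
import numpy as np

# Physical constants as stated in the text
RHO_W = 1000.0   # density of water, kg m^-3
RHO_I = 917.0    # density of ice, kg m^-3
G = 9.81         # gravitational acceleration, m s^-2

def hydraulic_potential(bed_elevation, ice_thickness):
    """Shreve (1972) hydraulic potential, assuming water pressure equals
    the ice overburden pressure: phi = rho_w*g*h + rho_i*g*H (units: Pa)."""
    h = np.asarray(bed_elevation, dtype=float)
    H = np.asarray(ice_thickness, dtype=float)
    return RHO_W * G * h + RHO_I * G * H
```

Because ρ_i/ρ_w ≈ 0.92, the ice-surface slope dominates the potential gradient roughly tenfold over the bed slope, which is why uncertainty in the bed mainly perturbs routing where surface slopes are small.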

The water routing models vary across the realizations (Figure 19). In particular, the Thwaites Glacier tributaries flowing towards the grounding line (the area where the ice meets the ocean and decouples from the bed) show significant differences across the realizations. These tributaries are located near a system of active subglacial lakes, i.e., lakes at the ice/bed interface that periodically drain and refill (Smith et al., 2017). These lakes are hypothesized to be hydrologically connected, with a drainage and refill cycle that depends on the level of connectivity (Smith et al., 2017). Lake drainage events are associated with increases in ice velocity (Stearns et al., 2008), making it important to characterize the connectivity of active lake systems. Our results could be used to investigate the nature of hydrological drainage at Thwaites and to highlight areas that require additional observational constraints.

Conclusions
We developed a non-stationary multiple-point geostatistical approach to fill large-scale geophysical data gaps and applied it to map high-resolution subglacial topography in West Antarctica. The radar data gaps were filled using morphological features learned from high-resolution topographic training images. To reflect the geospatial uncertainty, we modeled multiple realizations of topography maps using 166 high-resolution training images from the Arctic and Antarctica. These training images represent the diversity of subglacial geologic settings. We have placed them in a publicly accessible repository for training subglacial topography models. The TI repository can be further expanded in the future as additional swath bathymetry and swath radar measurements are acquired.
Our major contribution was a probabilistic method to model posterior TI probabilities and then sample TIs to capture the global non-stationarity in subglacial topography. This was achieved by probabilistically assigning non-stationary TIs from the provided repository to the local radar data. We used the collected 166 topographic training images as the prior. The posterior distribution was calculated based on the distance between each TI and the local radar data. To address the spatial correlation across the global area, we aggregated the TI probabilities between local areas based on their spatial correlation. The aggregated posterior TI distribution allowed us to sample training images. Finally, we ran direct sampling to fill the radar line gaps.
Multiple realizations of high-resolution topography maps were generated using multiple realizations of sampled training images. This non-stationary TI sampling framework avoided the use of auxiliary variables and arbitrary ad-hoc weighting. It significantly improved the topography modeling quality from DS and dramatically reduced the DS running time when given a large number of training images. Compared to traditional deterministic interpolation (kriging) and two-point geostatistical simulation (SGSIM), our approach was shown to provide more realistic topographic maps for spatial uncertainty quantification, whilst retaining the spatial correlation measured by the radar data.

We applied the proposed approach to fill the radar line gaps for the entire Amundsen Sea Embayment in West Antarctica. The improved modeling efficiency enabled us to simulate 20 realistic high-resolution topographic maps on a local PC. We then used the 20 topographic realizations to investigate the sensitivity of subglacial water routing to topographic uncertainty. The results reveal significant variability in the Thwaites Glacier tributaries across realizations. These tributaries are near a system of active subglacial lakes, which are hypothesized to be hydrologically connected and could have the potential to influence ice sheet velocity. The high hydrological uncertainty in this area highlights the need for additional measurement constraints. These findings demonstrate the utility of geostatistically simulating subglacial topography rather than performing deterministic interpolation. Our non-stationary MPS framework provides a path forward for implementing geostatistical simulations at continental scales.

Appendix: Particle swarm optimization (PSO) and optimal TI numbers
We perform particle swarm optimization to minimize the distance function between the selected TI set TI_O and the radar line data defined in Eq. (8). Following the PSO algorithm (Rezaee Jordehi and Jasni, 2013), we start with a random initialization of selected TIs. Each individual TI selection is regarded as an individual particle x_i. To find the TIs that minimize the distance function, each particle explores the whole TI space iteratively with a velocity v_i. The position (TI index) of x_i at time step t+1 is determined by its previous position x_i(t) and searching "velocity" v_i(t+1):

x_i(t+1) = x_i(t) + v_i(t+1)    (15)

The velocity v_i(t+1) is determined by the particle's currently logged best TI index p_i^best and the best TI index g^best for the whole swarm:

v_i(t+1) = ω v_i(t) + c_1 r_1 (p_i^best − x_i(t)) + c_2 r_2 (g^best − x_i(t)),

where v_i(t) is the velocity from the previous time step and ω is the "inertia weight" that controls the contribution of v_i(t) to v_i(t+1). A smaller ω means less influence from the previous velocity and thus a higher PSO exploration capability. Here we set ω = 0.8 following Han et al. (2010). r_1 and r_2 are two random numbers for the stochastic update of the velocity; they are uniformly distributed on the interval [0, 1]. c_1 and c_2 are the acceleration parameters that pull the particles towards p_i^best and g^best; c_1 = c_2 = 2 is recommended for most optimization problems according to Ozcan and Mohan (1999), and we adopt these settings. The swarm size S also affects PSO performance. So far, there are no exact rules for the selection of the swarm size (Rezaee Jordehi and Jasni, 2013). Here we use the size of TI_O to determine the swarm size: we create S = 10 × |TI_O| particles in the PSO population to balance searching ability and running time.
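The PSO updates above can be sketched in one dimension, i.e., for a single TI index per particle, whereas the paper optimizes a set of TI indices jointly; function and parameter names here are ours:

```python
import numpy as np

def pso_minimize(distance, n_ti, n_particles=200, n_iter=100,
                 w=0.8, c1=2.0, c2=2.0, seed=0):
    """Minimal particle swarm search over integer TI indices.

    distance : callable mapping an integer TI index to a scalar distance
    n_ti     : number of candidate training images
    Positions are updated with x(t+1) = x(t) + v(t+1) and
    v(t+1) = w*v(t) + c1*r1*(pbest - x) + c2*r2*(gbest - x),
    then rounded/clipped to valid TI indices for evaluation.
    """
    rng = np.random.default_rng(seed)
    x = rng.integers(0, n_ti, n_particles).astype(float)  # initial TI indices
    v = np.zeros(n_particles)                             # initial velocities
    pbest = x.copy()
    pbest_f = np.array([distance(int(round(xi))) for xi in x])
    gbest = pbest[np.argmin(pbest_f)]
    for _ in range(n_iter):
        r1 = rng.random(n_particles)
        r2 = rng.random(n_particles)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, 0, n_ti - 1)                   # stay in index range
        f = np.array([distance(int(round(xi))) for xi in x])
        better = f < pbest_f                              # personal bests improve
        pbest[better] = x[better]
        pbest_f[better] = f[better]
        gbest = pbest[np.argmin(pbest_f)]                 # swarm best
    best = int(round(gbest))
    return best, distance(best)
```

Because personal bests never worsen, the returned distance is monotonically non-increasing over iterations; the swarm size and iteration count trade off search quality against the cost of each distance evaluation.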

Another important question is how to determine the optimal number of TIs in TI_O. To specify this number, we use a profile log-likelihood approach from Zhu and Ghodsi (2006) and Honarkhah and Caers (2010). Specifically, we expect the distance between the training images and the radar line data to decrease as more training images are included. The PSO-optimized distance should drop dramatically until the optimal number of TIs is reached, and then start flattening out. Hence, there will be an elbow point corresponding to the optimal number of TIs. Following Honarkhah and Caers (2010), the elbow is found by maximizing the profile log-likelihood. We find the optimal number of TIs using the following steps.
3. Calculate the profile log-likelihood L(q) for each candidate number of TIs q as

L(q) = Σ_{i=1}^{q} log f(d_i; μ_1, σ) + Σ_{i=q+1}^{n} log f(d_i; μ_2, σ),

where f is the Gaussian density, μ_1 and μ_2 are the means of the first q and the remaining n − q optimized distances d_i, and σ² is the common scale variance pooled from the sample variances σ_1² and σ_2² of the two groups.

4. Obtain the optimal (elbow) size q̂ from the empirical maximum of L(q).
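Steps 3 and 4 can be sketched as follows; this is a simplified version that uses the maximum-likelihood common variance rather than the exact pooled estimator of Zhu and Ghodsi (2006), and the names are illustrative:

```python
import numpy as np

def elbow_point(d):
    """Profile log-likelihood elbow detection.

    d : 1-D array of PSO-optimized distances for q = 1..n training images.
    For each candidate split q, model d[:q] and d[q:] as Gaussians with
    separate means and a common variance, and return the q maximizing
    the total log-likelihood.
    """
    d = np.asarray(d, dtype=float)
    n = len(d)
    ll = np.full(n - 1, -np.inf)
    for q in range(1, n):
        d1, d2 = d[:q], d[q:]
        mu1, mu2 = d1.mean(), d2.mean()
        # common (ML) variance across the two segments
        var = (((d1 - mu1) ** 2).sum() + ((d2 - mu2) ** 2).sum()) / n
        if var == 0:
            continue
        # total residual sum of squares divided by var equals n, so the
        # quadratic term of the Gaussian log-likelihood reduces to n/2
        ll[q - 1] = -0.5 * n * np.log(2 * np.pi * var) - 0.5 * n
    return int(np.argmax(ll)) + 1
```

The likelihood is highest where the split separates the steeply dropping distances from the flat tail, which is exactly the elbow behavior described above.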

We use the illustrative case area A1 as an example to show how to select q̂. Figure 20a plots the PSO distance function between the A1 radar lines and the TIs for varying numbers of TIs. We observe a fast drop of the distance at the beginning, after which the distance decreases only slowly beyond q = 3. To find the exact elbow, we calculate the log-likelihood values and plot them in Figure 20b, which clearly indicates that the optimal number of TIs is 3.