1 ATAT 1 . 1 , an Automated Timing Accordance Tool for 1 comparing icesheet model output with geochronological data 2

Earth’s extant ice sheets are of great societal importance given their ongoing and potential future 8 contributions to sea-level rise. Numerical models of ice sheets are designed to simulate ice sheet behaviour in 9 response to climate changes, but to be improved require validation against observations. The direct observational 10 record of extant ice sheets is limited to a few recent decades, but there is a large and growing body of 11 geochronological evidence spanning millennia constraining the behaviour of palaeo-ice sheets. Hindcasts can be 12 used to improve model formulations and study interactions between ice sheets, the climate system and landscape. 13 However, ice-sheet modelling results have inherent quantitative errors stemming from parameter uncertainty and 14 their internal dynamics, leading many modellers to perform ensemble simulations, while uncertainty in 15 geochronological evidence necessitates expert interpretation. Quantitative tools are essential to examine which 16 members of an ice-sheet model ensemble best fit the constraints provided by geochronological data. We present 17 an Automated Timing Accordance Tool (ATAT version 1.1) used to quantify differences between model results 18 and geochronological-data on the timing of ice sheet advance and/or retreat. To demonstrate its utility, we perform 19 three simplified ice-sheet modelling experiments of the former British-Irish Ice Sheet. These illustrate how ATAT 20 can be used to quantify model performance, either by using the discrete locations where the data originated 21 together with dating constraints or by comparing model outputs with empirically-derived reconstructions that have 22 used these data along with wider expert knowledge. The ATAT code is made available and can be used by ice23 sheet modellers to quantify the goodness of fit of hindcasts. ATAT may also be useful for highlighting data 24 inconsistent with glaciological principles or reconstructions that cannot be replicated by an ice sheet model. 25


Introduction 26
Numerical models have been developed which simulate ice sheets under a given climate forcing (e.g. Greve, 1995;  conditions (e.g. basal topography) and parameterisations of physical processes (e.g. basal sliding, calving), as well 32 as the difficulty of predicting future climate, lead to model-based uncertainty in these predictions (Applegate et  There is another uncertainty which hinders ice-sheet models from being able to accurately predict the evolution 235 of ice-sheets, which is the presence of instabilitieswe use this term in the technical sense of a small perturbation 236 that leads to the whole ice-sheet system amplifying this small perturbation to the extent it can leave a mark in the 237 geological record. A classic example of this in ice-sheet dynamics is the marine ice-sheet instability (MISI), first 238 discussed in the1970s (Hughes, 1973;Weertman, 1974, Mercer, 1978 and more recently put on a sounder 239 mathematical footing (Schoof 2007(Schoof , 2012. 240 The MISI actually refers to an instability in grounding-line (GL) position on a reverse slope, where the water 241 depth is shallowing in the direction of ice flow. Since ice flux increases with ice thickness, a straightforward 242 argument leads to the conclusion that if the GL advances into shallower water, the efflux will decrease, the ice 243 sheet will gain mass and the advance continue. If, on the other hand, the GL retreats, the flux will increase, the 244 ice-sheet will lose mass and the retreat continue. In principle, given the right parameterisations and basal 245 topography, ice-sheet models should be able to predict the 'trajectory' of GL migration arising as a consequence 246 of the MISI. However, the MISI is one of the class of instabilities that lead to poor predictability; certain small 247 variations of parameters and specifications will lead to large-scale changes in the 'trajectory', in this case the 248 retreat history. A well-known analogy is the 'butterfly effect', which originated in atmospheric modelling work 249 (Lorenz, 1963); the butterfly effect is concerned with the consequences of the statement "small causes can have 250 larger effects". Recent work has also shown that additional physical processes, such as ice-shelf buttressing 251 (Gudmunsson, 2012) and the effect that the gravitational pull of ice-sheets has on sea level (Gomez et al., 2012) 252 have additional effects on grounding line stability. Given that most of the palaeo-ice sheets during the last glacial 253 cycle had extensive marine margins and overdeepened basins, with isostatic adjustment creating further zones of 254 reverse slope, capturing grounding line processes is important for simulating these ice-sheets. 255

Considerations when comparing geochronological data and ice-sheet model output 256
Sections 2.1 and 2.2 make it clear that several factors must be considered in order to satisfactorily compare 257 geochronological data and ice-sheet model output (Table 2). Most critically, the two datasets involved in any 258 comparison have varying spatial properties. Raw geochronological data is unevenly distributed and located at 259 specific points, with horizontal position accurate to a metre or so; such data may be used to plot ice-margin 260 fluctuations of the order of tens of kilometres ( Figure 2C). Ice-sheet models typically produce results on evenly-261 spaced points (at ~5 km to 20 km resolution) that are distributed over and beyond the maximum area of the palaeo-262 ice sheet (Table 2; Figure 2B). Consequently, in comparing the two, a choice must be made; either 263 geochronological data should be gridded (coarsened) to the resolution of the ice-sheet model, or the ice-sheet 264 model results must be interpolated to a higher resolution. Both options have drawbacks, as the former removes 265 spatial accuracy from geochronological data while the latter relies upon interpolation beyond model resolution 266 and, more seriously, model physics. A second problem lies in the spatial organisation of the data (Table 2). Ice-267 sheet models produce a regular grid of data ( Figure 2B), meaning that no location is more significant than any 268 other when comparing the modelled deglacial chronology with that inferred from geological data. Conversely, 269 owing to the uneven distribution of raw geochronological data, some regions of a palaeo-ice sheet may be better 270 8 constrained than others ( Figure 2C). As noted by Briggs and Tarasov (2013), any comparison that does not treat 271 the uneven spatial distribution of geochronological data may favour sites where numerous dates exist over more 272 isolated locations. One approach to overcoming these disparities is to use an interpolation scheme (e.g. empirical 273 reconstruction, Bayesian sequence) on the raw geochronological data. This produces a geochronological 274 framework by combining evidence on pattern and timing to yield a distribution that is spatially more uniform and 275 a spatial resolution similar to that of palaeo-ice sheet model output ( Figure 2D). 276 The temporal intervals between and precision of geochronological data and ice sheet model output also vary 277 (Table 2). The time intervals between geochronometric data are determined by the number of available 278 observations, and precision determined by sources of uncertainty. Conversely, ice sheet models produce output at 279 regular intervals and are temporally exact, which is to be contrasted with 'correct'. Since the output interval of an 280 ice-sheet model is generally determined by the user (see Section 2.2) it is pertinent to consider an appropriate 281 time-interval of ice-sheet model output for comparison with geochronological data. For example, radiocarbon 282 dates have precision typically in the order of hundreds of years but do not directly constrain ice extent, whilst 283 empirically reconstructed isochrones are typically produced for thousand-year time-slices (e.g. Hughes et al., 284 2016). In reality, ice-sheets may respond to events at faster time-scales than this, but in the absence of internal 285 instabilities (e.g. MISI) palaeo-ice sheet models are ultimately limited by the temporal resolution of the available 286 climate forcing data. Thus, to gain insight into controls on palaeo-ice sheet behaviour, it may be necessary to 287 create model output with a greater (centurial) temporal resolution than the uncertainty associated with 288 geochronology. 289 Both geochronological data and ice-sheet model output have sources of uncertainty which must also be considered 290 when comparing the two. For geochronological data, uncertainty is typically expressed as a standard deviation 291 from the reported age, and are therefore easy to consider when comparing to an ice sheet model. For ice-sheet 292 models, individual model runs do not currently express uncertainty, and it is only when multiple, ensemble, runs 293 which systematically vary parameters and boundary conditions are conducted that uncertainty in all output 294 variables can be expressed. Therefore, any comparison between geochronological data and model simulations 295 must either compare to all members of an ensemble experiment in turn, or against amalgamated output from an 296 ensemble which considers model uncertainty. Having said this, statistical techniques exist to derive probability 297 distribution functions for individual quantities (e.g. Ritz et al., 2015). Such ensemble runs typical comprise 298 hundreds to thousands of individual runs (Tarasov and Peltier, 2004;Robinson et al., 2011). Given the volume of 299 data this produces, one appealing application of a quantitative comparison between geochronological data and ice 300 sheet model output would be to act as a filter for scoring ice-sheet model runs and reducing predictive uncertainty 301 by only using the parameter combinations that were successful. However, if all possible parameters have been 302 modelled, (i.e. the full 'phase-space' of the model has been explored (cf. Briggs and Tarasov, 2013)), and very 303 few (or no) model runs conform to a certain set of geochronological data or an empirical reconstruction, this may 304 provide a basis to question aspects of the evidence (e.g. re-examining the stratigraphic context of a dated sample 305 site or questioning the basis of the reconstructed isochrone). Of course, a third possibility that both data and model 306 are incorrect cannot be excluded. 307 We therefore suggest that any comparison between ice-sheet model experiments and geochronological data should 308 consider: 309 i) That both ice-sheet models and geochronological data have inherent uncertainties; 310 9 ii) That geochronological data typically provide a constraint on just the absence of ice; such that ice must have 311 withdrawn from a site sometime (50 years? 500 years? 5000 years?) prior to the date (which can be any point 312 within the full range of the stated uncertainty). It is thus a limit in time and not a direct measure of glacial activity. 313 Figure 3 illustrates this for advance and retreat constraints. It is most often the case that dated material is taken 314 close to the stratigraphic boundary or landform representing ice presence, in which case a date might be considered 315 as a 'tight constraint' (e.g. the ice withdrew and very soon afterwards (50 years) marine fauna colonised the area 316 and deposited the shells used in dating). Sometimes however there may have been a large (centuries to millennia) 317 interval of time between the withdrawal and the age of the shell chosen as a sample, in which case the date will 318 provide a 'loose' limiting constraint; it might be much younger than ice retreat (Figure 3). 319 iii) There is inherent value to the expert interpretation of stratigraphic and geomorphological information, meaning 320 an ice-free age reported for a site is likely as close as possible (tight constraint) to a glacial event. However, this 321 interpretation could be subject to change; 322 iv) Geochronological data exist as spatially distributed dated sites (e.g. Figure 2C) which can be built into a 323 spatially coherent reconstruction (e.g. Figure 2D); 324 v) A great input uncertainty in a palaeo-ice sheet model is the climate, which can lead to changes in the spatial 325 extent and timing of ice sheet activity. 326 vi) A factor which requires further investigation is the relationship between the operation of a physical instability 327 (e.g. the MISI) and the practical ability of models to predict retreat or advance rates; the presence of an instability 328 can result in extreme sensitivity to parameter ignorance or over-simplified model physics. 329 vii) Other uncertainties can also lead to variations in ice-sheet model results; these can be accounted for in an 330 ensemble of hundreds to thousands of simulations. 331 Given the above, it is unlikely that a single procedure could capture model-data conformity. ATAT therefore 332 implements several ways of measuring data-model discrepancies and produces output maps (described in the 333 following two sections) to help a user assess which model runs best agree with the available geochronological 334 data. One approach is to transform the geochronological data points (x,y,t) to a gridded field (raster) that define 335 age constraints of ice advance and another grid for retreat . Both of these data types also require an associated grid 336 that reports the uncertainty range as error ( Figure 4). These age grids may then be quantitatively compared to 337 equivalent grids (age of advance grid and age of retreat grid) derived from the ice sheet model outputs. 338 Alternatively, one might prefer to compare model runs against the geochronological data (points) combined with 339 expert-sourced interpretive geomorphological and geological data, in which age constraints from dated sites have 340 been spatially extrapolated using moraines and the wider retreat pattern. In this case ATAT allows the model 341 outputs to be compared to the 'lines on maps' type of reconstruction subsequent to conversion from age isolines 342 to a grid of ages ( Figure 4). 343

Description of tool 344
ATAT is written in Python, and utilises several freely available modules. Access to these modules may require a 345 Python package manager, such as 'pip' or 'anaconda'. ATAT can therefore be run from the command line on any 346 operating system, or by using a Python interface such as IDLE. 347

Required data and processing 348
ATAT requires two datasets as an input: (i) an ice-sheet model output; and (ii) gridded geochronological data. 349 Table 3 provides the required variables and standard names for each dataset. In order to determine the advance 350 age or deglacial age predicted by the ice sheet model, ATAT requires either an ice thickness (where the model 351 does not produce ice shelves) or a grounded ice-mask variable (where ice shelves are modelled). In the latter case, 352 the user is asked to define the value which represents grounded ice. 353 Empirical advance and deglacial geochronological data (Table 1) require separate input files (NETCDF format), 354 as model-data comparison for these two scenarios are run separately in ATAT. Table 1 Table  359 2), this was preferred to the alternative solution of regridding an ice sheet model onto a higher resolution grid, as 360 this may introduce the false impression of high resolution modelling sensitive to boundary conditions (e.g. 361 topography) beyond the actual model resolution. 362 Preparation of the geochronological data to be the same format and grid resolution as the ice sheet model output 363 requires use of a GIS software package such as ESRI ArcMap or QGIS. Users must define deglacial/advance ages 364 based either upon the availability of geochronological data in a cell, or based upon an empirical reconstruction 365 ( Figure 4). These ages must be calibrated to a calendar which is the same as that output by the ice-sheet model (in 366 our case the 365-day calendar in units of seconds since 1-1-1). Where there are no data (i.e. outside the ice-sheet 367 limit), the grid value must be kept at 0. When multiple dates are contained within a cell, expert judgement is 368 required to ascertain which date is most representative of the deglaciation of a region. This assessment should be 369 based upon the quality of sample taken; criteria for establishing this quality are considered in Small et al. (2017). 370 In the case where a profile of dates has been collected (for example up a vertical section at the side of a valley, or 371 from multiple depths of a marine core) the date which most closely defines the timing of final deglaciation of an 372 area should be chosen, as this is the focus of ATAT. The assembly of this geochronological database input into 373 ATAT should consider the reliability of ages, removing outliers and unreliable ages (see Small et al. (2017) for a 374 discussion of this issue). In particular, loose constraints, such as cosmogenic dates which display inheritance or 375 radiocarbon dates effected by a depositional hiatus, should be removed as this have the potential to bias results. 376 In a comparable manner, the attribution of error to each cell is also reliant upon expert interpretation. The 377 magnitude of error may vary between the source of geochronological data (radiocarbon, cosmogenic nuclide or 378 luminescence) and user choice for experimental design (e.g. 1, 2 or 3 sigma). A single error value must be given 379 for each dated cell, corresponding to the maximum threshold beyond which the user deems it is unacceptable for 380 a model prediction to occur (Figure 3). Given that creating this input data may involve many expert decisions (e.g. 381 which date has the relevant stratigraphic setting, which date(s) are most reliable?), this part of the process is not 382 yet automated within ATAT. This data preparation stage is therefore the most time-consuming and user-intensive 383 part of the process. However, users only need to define the data-based advance/deglacial grid once to compare to 384 multiple model outputs. Future work should consider alternatives means of choosing dates and identifying outliers, 385 such as Bayesian age modelling (e.g. Chivverell et al., 2013). The input data NetCDF file should also contain the 386 variables latitude, longitude, base topography (the topography that the ice-sheet modelling is conducted on and 387 the elevation of the geochronological sample (Table 3). 388 ATAT is called from a suitable python command-line environment, using several system arguments to define 389 input variables (Table 1; Figure 5). Users must define whether they are testing a deglacial or advance scenario. 390 ATAT only considers the last time that ice advances over an area. Therefore, caution must be undertaken when 391 defining advance data in regions where multiple readvances occur, and users should consider limiting the time 392 interval of the ice sheet model tested when examining specific events (e.g. a well-dated readvance or ice sheet 393 build-up). The location of the file containing the geochronological data grid (e.g. Figure 5) is then required. From 394 this file, the age and error grids are converted to arrays. For the age data, null values are masked out using the 395 numpys masked array function. A second array that accounts for error is then created, the properties of which 396 depends upon whether a deglacial or advance scenario is being tested. For a deglacial scenario, a model prediction 397 will be unacceptable if the cell is ice-covered after the range of the date error is accounted for, but the cell may 398 become deglaciated any time before this. Therefore, the associated error value is added onto the cell date, to create 399 a maximum age at which a cell must be deglaciated by to conform to the ice sheet model (Figure 3). The opposite 400 is true for advance ages; ice can cover a cell any time after the date and associated error, but cannot cover the cell 401 before the date of the advance. In order to allow for advances which occur after the date and its error, associated 402 error is therefore subtracted from the date cell ( Figure 3). To account for the uneven spatial distribution of dates, 403 a weighting for each date is then calculated based upon their spatial proximity. This weighting is used later when 404 comparing the data to the model output. To calculate this weighting (wi), ATAT defines a local spatial density of 405 dated values based upon a kernel search of 10 neighbouring cells. 406 The user must define the path to the ice sheet model output, from which the modelled deglacial age will be 407 calculated and eventually compared to the data (Figure 4). The user must also define whether to base deglacial 408 timing on an ice thickness or grounded extent mask variable (Table 2). If the user selects thickness, the margin is 409 defined by an increase from 0 ice thickness. For the mask, the user is also asked to supply the number which refers 410 to grounded ice extent. The timing of advance is then determined by the change of a cell to this number (Figure  411 5). The margin position recreated by the ice-sheet model has a spatial uncertainty due to downscaling issues and 412 fluctuations which may occur between recorded outputs. To account for this, ATAT calculates a second set of 413 modelled deglacial ages, whereby the deglaciated region at each modelled time output is expanded to all cells 414 which neighbour the originally identified deglaciated or advanced over cells. Furthermore, the spatial resolution 415 of ice-sheet models typically means that the emergence of ice-free topography at the edge or within an ice-sheet 416 (e.g. in situations such as steep-sided valleys or nuntaks) are poorly represented. To account for this, ATAT firstly 417 calculates the modelled ice-sheet surface at each time output by adding ice thickness to the input base topography. 418 Where the modelled surface elevation is below that of the sample elevation, these cells are identified as being 419 deglaciated ( Figure 5). The downscaling of topography onto ice-sheet model grids also introduces a vertical 420 uncertainty. This is accounted for in ATAT through calculating the difference between sample elevation and the 421 reference elevation. A second metric which identifies cells as having been deglaciated if they are also within this 422 vertical uncertainty is also calculated ( Figure 5). 423

Model-data comparison 424
Once the required variables have been retrieved from the NETCDF data and manipulated, ATAT compares the 425 geochronological age and modelled age at each location ( Figure 4). Firstly, the grid cells which have data are 426 categorised as to whether there is model-data agreement, based on the criteria shown in Figure 3. Since all dating 427 techniques only record the absence of ice, geochronological data provides only a one-way constraint on palaeo-428 ice sheet activity. For deglacial ages, deglaciation could occur any time before the geochronological data provided 429 and within the error of the date (i.e. deglacial ages are minimum constraints), but deglaciation must not occur after 430 the error of the date is considered (Figure 3). For advance ages, advance must have happened after the date or 431 within error beforehand (i.e. advance ages are maximum constraints), but palaeo-ice sheet advance cannot occur 432 in the time period before that dated error (Figure 3). Once ATAT has determined whether each cell conforms to 433 these criteria, a map is produced identifying at which locations the ice sheet model agrees with the 434 geochronological data. 435 Though the criteria described above and illustrated in Figure  to create a metric that doesn't account for dating error but may give an indication of how close a model-run gets 461 to dated cells. Dated locations are also categorised according to whether model-data agreement occurs within 462 dating error, and whether the addition of horizontal (ice margin) and vertical (ice surface) downscaling uncertainty 463 means that model-data agreement occurs. The RMSE and wRMSE are calculated for these categories to create a 464 metric which accounts for data and model uncertainty ( Figure 5). ATAT then produces a .csv file containing all 465 calculated statistics per ice-sheet model output file. We suggest that the most rigorous metric, the wRMSE of 466 dates which conform within geochronological data and model downscaling uncertainty (Figure 5) likely that in an ensemble there will be no single model run which has significantly better metrics than others, so 475 ATAT may best be used to choose members which pass a user-defined threshold of combined metrics. 476 Pragmatically, we envisage that ATAT could be used in the following ways, though others may exist. The maximum extent of ice for each experiment is shown in Figure 6 and the timing of advance and retreat is 522 shown in Figure 7. Potentially unrealistic ice sheets occur in the North Sea, perhaps due to the choice of domain 523 not including the influence of the Fennoscandian ice sheet in this area. As noted above, we do not expect these 524 model runs to fully replicate the reconstructed characteristics of the BIIS (e.g. Clark et al., 2012). However, it is 525 worth noting general, visually-derived, observations regarding the outputs shown in Figure 6. For larger 526 temperature offsets, the ice sheet gets bigger, the timing of maximum extent gets progressively later and the 527 modelled ice sheet gets thicker ( Figure 6). In all experiments, there is generally a gradual advance toward the 528 maximum extent followed by retreat (Figure 7). This pattern is interrupted by a later readvance that corresponds 529 to the timing of the Younger Dryas in the GRIP record; this causes ice to regrow over high elevation areas such 530 as Scotland and central Wales. The extent of this readvance increases with decreased temperature offsets between 531 experiments (Figure 7). Smaller readvances, occurring around 16.5 ka also occur (Figure 7). 532

Geochronological data 533
Ice-sheet advance dates were taken from the compilation of Hughes et al. (2016) and gridded to the ice sheet 534 model domain (Figure 4). In total, 61 cells were represented with advance dates ( Figure 8A). Considering now 535 ice-sheet retreat ( Figure 8B), dates deemed reliable or probably reliable by Small et al. (2017) were used (i.e. 536 those given a 'traffic light rating' of green or amber). For the dated advance and retreat locations, the 537 geochronological data in each cell was assigned an error corresponding to that which was reported in the literature. 538 We also compared our results to the 'likely' empirical reconstruction of Hughes et al. (2016), based on that of 539 Clark et al. (2012) (Figure 8C), using the minimum and maximum bounding envelopes to assign an error to each 540 cell of the ice sheet grid ( Figure 8D). The largest errors occur in the North Sea region, where there is a lack of 541 empirical data (e.g. Figures 8A and B). 542 Table 4 shows selected statistics derived by ATAT when comparing the three ice-sheet modelling experiments 544 (Figures 6 and 7) against the three categories of data (Advance, Retreat, Isochrones; Figure 8). wRMSE was not 545 calculated for the DATED isochrone reconstruction, as grid points are distributed evenly and therefore have equal 546 spatial weighting (Table 4). Experiment C produces modelled ice-sheets with the greatest areal extent, and 547 therefore performs best at correctly covering the dated areas (Table 4). However, none of the three experiments 548 perform particularly well when compared with the data or the empirical reconstruction regarding timing and 549 results in high (>2000 year) RMSEs (Table 4). The application of ATAT and the results from these simplified 550 experiments allow us to suggest directions for analysing future experiments. 551

Results 543
All three experiments produced large RMSEs, in the order of thousands of years, when compared to all three 552 categories of data (Table 4). For advance ages, the three simulations conform to a large number of dated locations 553 (e.g. 72% of ages in Experiments B and C; Table 4). However, the RMSEs of advance ages are high (Table 4). 554 This shows that, while the models perform well at matching the constraint of covering an area in ice after an 555 advance age (Figure 3), the models often glaciate a region much later than required. Advance dates are particularly 556 difficult to obtain from the stratigraphic record, and often there may be a long hiatus between the initial deposition 557 of datable material and the subsequent advance of a glacier. Future experiments with large ensembles should 558 therefore consider the number of advance dates conformed to (rather than the RMSE) as a more robust guide for 559 model performance during ice advance. 560 For the retreat comparisons, the three modelling experiments conform to a larger percentage of sites, seemingly 561 outperforming the empirically-derived DATED reconstruction (Table 4). However, where model-data agreement 562 occurs, the RMSE produced are much higher when the model is compared to the DATED reconstruction. This is 563 due to the reconstruction containing large uncertainties in regions which lack geochronological control (for 564 example in the North Sea, Figure 8). These uncertainties, a product of spatial interpolation across regions with 565 sparse information, are much greater than those associated with individual dates. Figure 9A shows examples of 566 output maps from ATAT which display the spatial pattern of agreement and the magnitude of the difference 567 between Experiment C and the DATED reconstruction. This shows that due to the uncertainty associated with 568 North Sea glaciation, even where the model produces an unrealistic artefact, there is data-model agreement. 569 Furthermore, ATAT produces a map which displays the number of years between data-based and modelled retreat 570 and/or advance (e.g. Figure 9B). Figure 9B, which compares Experiment C to the DATED isochrones, shows that 571 the timing of model-data disagreement is spatially variable. If more modelling simulations were conducted, such 572 maps may reveal regions of reconstruction or particular dates which are difficult to simulate in the model. In such 573 cases, data or model re-evaluation may be required and herein lies the potential utility of this ATAT tool in making 574 sense of ensemble model runs. However, such model-data comparison awaits a full-ensemble simulation which 575 accounts for model uncertainty (e.g. Hubbard et al., 2009). 576

Summary and concluding remarks 577
Here we present ATAT, an automated timing-accordance tool for comparing ice-sheet model output with 578 geochronological data and empirical ice sheet reconstructions. We demonstrate the utility of ATAT through three 579 simplified simulations of the former British-Irish Ice Sheet. Note that a larger ensemble model of hundreds to 580 thousands of runs is required for model evaluation (e.g. Hubbard et al., 2009). ATAT enables users to quantify 581 the difference between the simulated timing of ice sheet advance and retreat and those from a chosen dataset, and 582 allows production of cumulative ice coverage agreement maps that should help distinguish between less and more 583 promising runs. We envisage that this tool will be especially useful for ice-sheet modellers through justifying 584 model choice from an ensemble, quantifying error and tuning ice-sheet model experiments to fit geochronological 585 data. Ideally, this tool should be used in combination with other evaluation methods, such as fit to relative sea-586 level records. In the case where locations or regions of data cannot be fit by a model, and all model uncertainty 587 has been accounted for in an ensemble simulation, the comparisons made in ATAT may also highlight that data 588 re-evaluation is necessary. ATAT is supplied as supplementary material to this article. 589 From the command line, launch the ATAT script using python ("python ATATv1.1.py"). Eight command-line 609 arguments (A1 -A8), separated by a space should then follow. 610 A1 dictates whether deglacial or advance ages are being tested. Type "DEGLACIAL" or "ADVANCE" 611 accordingly. 612 A2 is the path to the geochronological data file (e.g. "/home/ATAT/geochron.nc") 613 17 A3 defines whether the model extent is based on thickness or a mask. Type THK or MSK accordingly. 614 A4 is the path to the ice-sheet model output file (e.g. "/home/ATAT/icesheetmodel1.nc") 615 A5 is the value of the ice-sheet output mask. A value is required even if A3 = THK, but can be any value as it will 616 be ignored. 617 A6 to A8 control output maps. A6 defines whether the output map should consider margin uncertainty, with a 618 value of BORDER or NONE. 619 A7 defines whether the model-data offset map displaces RMSE (option "NONE") or wRMSE ("WEIGHTED"). 620 A8 specifies which dates are plotted on the difference map, and can be "ALL" for all dates, "COVERED" for 621 those which at some point where covered by ice and "INERROR" to display only those dates where model-data 622 agreement within dating error occurred. 623

Code Availability
An example command would be "python ATATv1. Input geochronological data can be created in a GIS environment such as ArcMap or QGIS. Here, the user must 627 discern the appropriate geochronological data for each grid cell. Since geochronological data is usually stored as 628 point data, this must be gridded to single grid points as positive values, with surrounding areas of no data assigned 629 a value of 0. When comparing to a reconstruction (e.g. Hughes et al., 2016), cells outside the reconstruction 630 should be assigned a value of 0. Those within the reconstruction should be assigned a value corresponding to the 631 reconstructed age of retreat. The gridded data must be converted to NetCDF format, the details of which are shown 632 in Table 3. We emphasise that the quality of geochronological data used must be considered, and an example of