Global sensitivity analysis of parameter uncertainty in landscape evolution models

. The evaluation and veriﬁcation of landscape evolution models (LEMs) has long been limited by a lack of suitable observational data and statistical measures which can fully capture the complexity of landscape changes. This lack of data limits the use of objective function based evaluation proliﬁc in other modelling ﬁelds, and restricts the application of sensitivity analyses in the models and the consequent assessment of model uncertainties. To overcome this deﬁ-ciency, a novel model function approach has been developed, with each model function representing an aspect of model behaviour, which allows for the application of sensitivity analyses. The model function approach is used to assess the relative sensitivity of the CAESAR-Lisﬂood LEM to a set of model parameters by applying the Morris method sensitivity analysis for two contrasting catchments. The test revealed that the model was most sensitive to the choice of the sediment transport formula for both catchments, and that each parameter inﬂuenced model behaviours differently, with model functions relating to internal geomorphic changes responding in a different way to those relating to the sediment yields from the catchment outlet.


Introduction
Landscape evolution models (LEMs) investigate how the Earth's surface evolves over timescales ranging from hundreds to millions of years (Coulthard and Van De Wiel, 2012;Martin and Church, 2004;Pazzaglia, 2003;Tucker and Hancock, 2010;Van De Wiel et al., 2011). They represent the Earth's surface with a regular or irregular mesh and simulate how the surface evolves over time as a function of tectonic processes, and erosion and deposition from Earth surface processes. LEMs have proved to be very useful scientific tools to understand how Earth surface processes interact to shape the landscape.
More recently, LEMs have improved considerably in their ability to simulate the physical environment, and this has developed in parallel with improvements in computational efficiency and power. This allows LEMs to go beyond highly simplified models of landform development and also to incorporate increasingly complex processes such as pedogenesis (Vanwalleghem et al., 2013;Welivitiya et al., 2016) and periglacial processes Egholm et al., 2015). Other processes are now being handled in more detail such as hydrodynamic flow models and aeolian processes (Adams et al., 2017;Liu and Coulthard, 2017). These developments led  to describe them as "second generation" LEMs that extend previous explanatory and explorative models used for prediction of future changes in landscapes, such as for the mining industry (e.g. Hancock et al., 2017;Saynor et al., 2012).
However, more detailed physical representations of the processes that shape the Earth's surface involve a larger number of parameters that are typically estimated from proxy data or theoretical considerations, or are completely unknown (Oreskes et al., 1994;Petersen, 2012). If LEMs are to be operationally used for prediction or as decision-making tools in the future, their outputs must be evaluated against the uncertainty in the input parameters -a task that is increasingly difficult for a large number of parameters. Sensitivity analysis (SA) investigates how variations in the output of a numerical model can be attributed to its input factors (Pianosi et al., 2016). This is useful for identifying key parameters for later calibration, although SA has rarely been conducted for LEMs. Thus, the aim of this study is to conduct a SA of the widely used and highly parameterised LEM CAESAR-Lisflood  -in particular, we wish to be able to detect the parameters that have the greatest influence on the model's simulation output. As model sensitivity may be influenced by different landscapes, we run the SA in two individual and distinct catchments.

Sensitivity analysis and landscape evolution models
The application of SA in environmental modelling has a history spanning 4 decades (Norton, 2008) and forms an important component of using models for decision-making, including model development, calibration, and uncertainty analysis (Yang, 2011). SA addresses five key questions Neumann, 2012;Song et al., 2012Song et al., , 2015: 1. Which parameters have the greatest influence on the model?
2. If additional data could be used to reduce the uncertainty in a parameter, which would most reduce the model output variance?
3. Are there parameters with such low influence that their values could be fixed without impacting the model outputs?
4. If parameter values emerge as incorrect, how will they influence model outputs?

Which parameters influence model outputs in different regions (parameter space)?
Clearly, based on the above, an appraisal of model sensitivity is important to fully understand and apply model results. In a review of the applications of SA in environmental models, Yang (2011) identified two common approaches to SA -local and global. Local SAs are limited, considering only the impacts of factors on model outputs locally, i.e. within a restricted region of the model's parameter space. Conversely, global SAs typically utilise Monte Carlo methods to assess the sensitivity of impacts across the whole parameter space (Yang, 2011). For complex models with nonlinear behaviours, the use of local SAs can be highly biased as they neglect the non-linear interactions between parameters (Oakley and O'Hagan, 2004;Pappenberger et al., 2006;Yang, 2011). Global SAs are more computationally expensive, but as the methods are more reliable, they are attractive to modellers (Yang, 2011).
The use of SA as a routine component of model assessment and calibration is common place in climatic, meteorological, hydrological, hydraulic, and many other modelling fields. However, for LEMs there are surprisingly few examples of SA being carried out. This can be explained by three interrelated issues: (i) LEMs typically have a large number of model parameters; (ii) long model runtimes can make multiple simulations for SA impractical; and (iii) model behaviour can be highly non-linear (e.g. Coulthard and Van De Wiel, 2007;Larsen et al., 2014;Van De Wiel and Coulthard, 2010), leading to potentially complex SA interpretations. Large numbers of model parameters and long runtimes, in particular, make Monte Carlo methods extremely time consuming -and therefore often unviable.
There are several studies on how LEMs respond to variable forcing, process changes, and model parameters, including changes in climate variability and precipitation resolution (Armitage et al., 2018;Coulthard and Skinner, 2016;Ijjasz-Vasquez et al., 1992;Tucker and Bras, 2000), channel widths (Attal et al., 2008), vegetation (Collins, 2004;Istanbulluoglu and Bras, 2005), and variations in initial conditions (Hancock, 2006;Hancock et al., 2016;Ijjasz-Vasquez et al., 1992;Willgoose et al., 2003). Campforts et al. (2017) investigated how different numerical solvers affect LEM simulation. However, few studies explicitly perform SA and most of the applications described above are exploring LEM sensitivity to processes, or changes in environmental conditions, and are more correctly referred to as exploratory tests (Larsen et al., 2014). Conversely, investigations to ascertain the model's response to potential uncertainties (e.g from model parameterisation) can be deemed as true SA (e.g. Armitage et al., 2018;Coulthard and Skinner, 2016;Hancock et al., 2016).
Hydrological models faced similar issues to LEMs in the past, i.e. model complexity and long processing times when applying SA. To overcome these problems, hydrologists used the Morris method (MM; Morris, 1991). The MM can be regarded as a global SA, although it actually performs multiple local SAs sampled from across the full parameter spacethis produces a series of local evaluations, the mean of which is an approximation of the global variance (van Griensven et al., 2006;Norton, 2009;Saltelli et al., 2000). The main strength of the MM is its computational efficiency. Herman et al. (2013) showed that the MM could estimate similar variance in model outputs to the Sobol variance-based global SA method (Sobol, 2001), yet required 300 times fewer evaluations, and significantly less data storage for an application to a distributed catchment hydrological model. The robustness of this approach has been further shown by numerous studies (e.g. Brockmann and Morgenroth, 2007;Pappenberger et al., 2008;Yang, 2011). However, the MM cannot provide a full quantitative assessment of parameter sensitivity and is dependent upon the user-defined bounds to the parameter space. It can successfully rank parameters between the least and most influential to model outputs, but cannot determine parameters' exact relative influence (Brockmann and Morgenroth, 2007). These advantages and limitations mean that MM has primarily been used during the prescreening stage of models, isolating the most influential parameters for further SA with quantitative, yet more computationally expensive, methods (e.g. Ratto et al., 2007;Song et al., 2015;Yang, 2011;Ziliani et al., 2013). Ziliani et al. (2013) performed a two-stage SA for the CAESAR LEM, utilising the MM (as adapted by Campolongo et al., 2007). Whilst this study demonstrated the feasibility of applying the MM as a global SA to a reach-scale LEM, it was applied as a prescreening stage to identify the most relevant parameters for model calibration. In contrast, our study focuses on SA as a tool to investigate parameter influence on model behaviour.

Metrics for landscape evolution model assessment
Evaluating LEMs is challenged by the paucity of comprehensive field data against which they can be assessed and the lack of measures for calibration and validation Hancock and Willgoose, 2001;Tucker and Hancock, 2010). Moreover, some LEMs (e.g. CAESAR-Lisflood) simulate short-term (annual to decadal) and long-term (millennial timescales and longer) landscape changes, necessitating data and methods to assess them across variable timescales. Thus, while SAs of environmental models often rely on objective functions (e.g. the Nash-Sutcliffe score between observed and simulated values; Nash and Sutcliffe, 1970), this approach is generally not practical for LEMs. With few exceptions (e.g. Ziliani et al., 2013), results from LEMs are therefore frequently assessed qualitatively, relying on visual interpretation of the simulated landforms or cross-section profiles (e.g. Coulthard and Skinner, 2016;Hancock et al., 2010Hancock et al., , 2015Hancock and Coulthard, 2012).
Catchment outlet statistics, such as sediment yield time series, allow for comparison between simulations to indicate a catchment's response to perturbations (e.g. Coulthard and Skinner, 2016;Hancock and Coulthard, 2012). However, sediment yield time series rarely provide a sufficiently complete picture of a catchment's geomorphic response. For example, Coulthard and Skinner (2016) showed that simulations calibrated to provide equivalent sediment yields produced different landforms. For planning purposes these internal catchment changes are likely to be more useful than catchment sediment yields. Moreover, changing topography potentially instigates a feedback process that leads to complex, often non-linear catchment behaviour (Coulthard andVan De Wiel, 2007, 2013;Hancock et al., 2016;Jerolmack and Paola, 2010;Van De Wiel and Coulthard, 2010). Finally, the spatially and temporally het-erogeneous response of erosion and deposition patterns in LEMs also makes "pixel-to-pixel" comparisons difficult. For example, in a valley reach, gross patterns of erosion and deposition may be identical but with the channel on the other side of the valley -yielding a poor pixel-to-pixel comparison.
Only a few studies have tested metrics to compare topographic data or physical experiments to simulated elevation changes by LEMs (Hancock et al., , 2011Hancock and Willgoose, 2001;Ibbitt et al., 1999). However, although the metrics often suggested a good agreement, visual analysis of the final DEMs indicated clear differences between the physical models and the simulations (Hancock and Willgoose, 2001). Therefore, there is a clear need for better statistical methods for critically evaluating and comparing landscapes that can also be used for evaluating the accuracy (or otherwise) of LEMs.
The paucity of observational data and the lack of measures that amalgamate the complexity of spatio-temporal landscape change into a single metric have prevented the objective function approach from being commonly utilised in modelling landscape evolution. Instead, LEMs can be evaluated by observing the changes in model outputs reflective of model behaviour -these model functions can be used in lieu of objective functions to allow the sensitivity of LEMs to be assessed. Model functions would be best used as a set in combination to allow assessment across a range of model behaviours, and would also be transferable across a range of catchments. Such an approach formalises existing methods of evaluating LEM outputs and provides a framework from which multi-criteria objective function approaches can be applied when suitable observations become available.

A global SA for a catchment LEM
This study uses MM to assess the sensitivity of the CAESAR-Lisflood model to a range of user-defined parameters; therefore, it demonstrates the first application a global SA to a catchment LEM. We selected 15 model parameters (here we consider the choice of sediment transport formula as a parameter) either due to their known importance to the model or because the model's response to the parameter is presently poorly understood. Although not all of the 15 model parameters are universal between LEMs, many LEMs have equivalents. Moreover, we developed a set of 15 model functions that reflect core behavioural responses of the model. These will indicate whether the same parameters influence all behaviours, or whether the different behaviours respond to different parameters. The choice of 15 model parameters and 15 model functions is coincidental. We conducted the SAs in two catchments with contrasting environmental settings to assess how transferable an individual SA is to different conditions.
It is important to state that this study is an illustration of the potential for using the MM to inform an operator of how model parameter choices can impact the performance and behaviour of their model. It is not an attempt to reproduce or calibrate the CAESAR-Lisflood model to real-world observations, although the model has been previously applied to each catchment.

Methods
We apply the MM to perform a global SA on the CAESAR-Lisflood model for two contrasting catchments (more detail in Sect. 2.3): the Upper Swale, UK (181 km 2 , temperate, perennial), and Tin Camp Creek, Australia (0.5 km 2 , tropical, ephemeral). Each individual simulation runs for a 30-year period, where the first 10 years are used as a spin-up to reduce the impacts of transient model behaviour; therefore, output analysis starts after year 10 of the simulation. The CAESAR-Lisflood model is used in catchment mode, the simulations have no representation of suspended sediments or bed rock, and the dune and soil evolution modules are not used. Form drag is not directly considered within the model but is reflected within the setting of the Manning n roughness coefficient. For each catchment, we assess the 15 user-defined parameters against a set of 15 model functions. Finally, we also assess the changes in elevations across different sections of the catchments.
For clarity, we define some terms here that are frequently used throughout this paper: -Parameter -adjustable value within a model. The value is determined during model set-up and remains constant throughout a given simulation. The value is often based on recorded values or adjusted during calibration.
-Objective function -an error score between model outputs and observations used to evaluate model performance.
-Model function -a measure derived from model outputs used to evaluate model behaviour in lieu of an adequate objective function.
-Elementary effect (EE) -a value used as part of the Morris method, indicating the change in function value (objective or model) resulting from a change of parameter value during a single repeat.
-Main effect (ME) -the mean of the elementary effects from all repeats, for a specified parameter and a specified function.

CAESAR-Lisflood
The LEM used is the CAESAR-Lisflood model . CAESAR-Lisflood is a second generation LEM, capable of simulations with greater physical realism than first generation models but also with increased complexity -the model features a large number of fixed, physically based, or user-defined parameters. This additional complexity may result in an increased non-linearity and sensitivity to model parameters. We used CAESAR-Lisflood v1.8f, without any additional modifications to the model's functionality from the version freely available online. A full description of the CAESAR-Lisflood model can be found in , and its core functionality is only summarised here. The model utilises an initial DEM built from a regular grid of cells; in the catchment mode (as used in this model set-up) it is driven by a rainfall time series, which can be lumped or spatially distributed (Coulthard and Skinner, 2016). At each time step the rainfall input is converted to surface runoff using TOPMODEL (Beven and Kirkby, 1979), and it is distributed across the catchment and routed using the LISFLOOD-FP component (Bates et al., 2010). The CAESAR component of the model drives the landscape development using sediment transport formulae based on flow depths and velocities derived from the LISFLOOD-FP component. Bedload is distributed to neighbouring cells proportionally based on relative bed elevations. This study did not use the suspended sediment processes in the model. The model can handle nine different grain sizes, and information is stored in surface and subsurface layers where only the top surface layer is "active" for erosion and deposition. A comprehensive description of this process can be found in Van De Wiel et al., 2007).
CAESAR-Lisflood has been freely available and since 1996, and there have been over 60 published studies using the model over a wide range of temporal and spatial scales (Skinner and . These previous studies provide useful background regarding model parameter interactions, and help to inform the choice of the user-defined parameters utilised for the SA as described in Sect. 2.4. Some studies have also investigated the model's sensitivities to external factors -for example, Coulthard and Skinner (2016) investigated the sensitivity of the CAESAR-Lisflood model to the spatial and temporal resolution of precipitation. Other studies have investigated the influence of individual processes or forcings. For example, Coulthard and Van De Wiel (2017) examined how land use influences the outputs of the model.

Morris method
Our study used the MM described in Ziliani et al. (2013), i.e. the original MM of Morris (1991), as extended by Campolongo et al. (2007), and applied the "sensitivity" package in the R statistical environment (Pujol, 2014) to generate the parameter sets for the SA.
To set up the MM we selected a number of parameters to be assessed, specifying a minimum and maximum range for each, in addition to a number of iterative steps. The parameter values were equally spaced based on the range and number of steps -for example, a parameter with a range of 2 to 10 and 5 iterative steps would have available values of 2, 4, 6, 8, and 10. This was done for each parameter and, where possible, the same number of iterative steps was used for each.
The MM samples the global parameter space by performing multiple local SAs referred to as repeats. The first simulation in each repeat is made up of a randomly assigned selection of parameter values from the available values. To set up the second simulation in the repeat a single parameter is randomly selected and its value changed by a random number of iterative steps -if we use the example above, if simulation 1 used the value 4, changing this to 2 or 6 would be one iterative step change (where one step is a change in value of 2), to 8 would be two steps, and using 10 would be three steps. For simulation 3 in the repeat another randomly selected parameter is changed although previously changed parameters are no longer available to be selected. This is continued until no further parameters are available to be changed; therefore, in our study each repeat contains 16 tests -1 starting set of parameters, and 15 parameter changes. In this study we used 100 repeats, for a total of 1600 individual simulationsfor comparison, the implementation of the MM by Ziliani et al. (2013) used 10 repeats.
The sensitivity of the model to changes in parameter values is evaluated by the changes of objective function val-ues between sequential tests within repeats relative to the number of incremental steps the parameter value has been changed by. The change in objective function score between two sequential tests divided by the number of incremental step changes is an elementary effect (EE) of that objective function and the parameter changed, as shown by Eq. (1): Here d ij is the value of the j th EE (j = 1, . . ., r; where r is the number of repetitions -r = 100 in this study) of the ith parameter (e.g. i = 1 refers to sediment transport formula, see Table 1), x i is the value of the ith parameter, k is the number of parameters investigated (here 15), y (x 1 , x 2 , . . ., x k ) is the value of the selected objective function, and i is the change in incremental steps parameter i was altered. After all 1600 tests were performed, the main effect (ME) for each objective function and parameter was calculated from the mean of the relevant EEs -the higher the ME the greater the model's sensitivity. Alongside the ME, the standard deviation of the EEs was also calculated as this provides an indication of the non-linearity within the model. The Swale catchment, UK, is a medium sized basin (181 km 2 ) with 500 m of relief (Fig. 1). It has been used extensively in previous CAESAR/CAESAR-Lisflood applications Coulthard and Macklin, 2001;Coulthard and Skinner, 2016;Coulthard and Van De Wiel, 2013). For this SA, it represents a medium basin in a temperate climate. All simulations on the Swale use a 50 m resolution DEM based on airborne lidar. Precipitation inputs are 10 years of NIMROD composite radar rainfall estimates (Met Office, 2003), applied at a 1 h temporal and 5 km spatial resolution, and repeated three times for a 30-year time series.

Tin Camp Creek, Australia
The Tin Camp Creek catchment is a small sub-catchment (0.5 km 2 ) of the full Tin Camp Creek system Hancock, 2006) (Fig. 1). The basin has 45 m of relief and is in the tropical region of the Northern Territory, Australia. In contrast to the Swale, Tin Camp Creek is a small basin and the region has pronounced wet and dry seasons, with short intense rainstorms a feature of wet season precipitation. The DEM has a 10 m grid cell resolution produced from high-resolution digital photogrammetry (Hancock, 2012).The rainfall input is taken from observations from a single rain gauge at Jabiru Airport, providing a 1 h lumped (single catchment-average) resolution time series for 23 years, with the first 7 years repeated to produce a continuous 30-year time series.

Stream orders
The changes in the mean elevation across different areas of the catchments were assessed as an illustration of spatial differences in geomorphic change. Each basin was subdivided into regions corresponding to the watersheds of five stream orders based on the proportion of the catchment drained in the initial DEM -1st ≤ 1 %; 2nd ≥ 1 %; 3rd ≥ 10 %; 4th ≥ 25 %; and 5th ≥ 50 % (see Fig. 1). This method is novel and was developed to provide a consistent method of subdividing both catchments independent of factors such as connectivity and DEM resolution.

User-defined parameters
The MM implemented here used 15 user-defined parameters, each with 5 iterative step values (as described in Sect. 2.2). The only exception was the choice of sediment transport formula parameter (SED, Table 1) where only two options were available. The parameters, their ranges, and available values are shown in Table 1. The MM varies the value of each parameter tested once per repeat, and here we use 100 repeats. Therefore, careful consideration was required in the selection of parameters as each parameter tested added 100 model runs to the test -there are 49 user-defined parameters in the version of the CAESAR-Lisflood model used (v1.8), and even excluding parameters associated with dune and soil development, there are still 35 user-defined parameters. To test each would require 3600 model runs for each catchment, although the inclusion of some parameters is likely to add little value. Thus, the abovementioned parameters were narrowed to a set of 15 user-defined parameters (Table 1) with the selection based largely on prior knowledge of the importance of these parameters, or due to a lack of previous knowledge of the influence of the parameters on the model -full justification of the selection of parameters, and descriptions of their purpose within the model, can be found in the Supplement Sect. S1.
The MM is subjective in that the relative sensitivities shown depend on the minimum and maximum range values set by the user. Therefore, it is necessary to set each parameter's range to be broadly equal to the others in order to obtain useful information. To be consistent, where possible we used a default value taken from past calibrations and varied this by ±25 % and ±50 %. There are some instances where this was not appropriate and a minimum and maximum bound was set instead, with five iterative steps of equal distance determined (for example, the Manning n roughness for Tin Camp Creek where ±50 % would have resulted in physically unrealistic values -see Table 1 for values used).
Here we considered the selection of the sediment transport formula as a parameter despite the fact that doing so is to change the functional form of the model. For clarity, and in line with how the choice is presented within the graphical user interface of the model, we henceforth consider this choice in the same way as a parameter. The sediment transport formulae employed for SED were from Einstein (derived for sand-bed rivers) (Einstein, 1950) and Wilcock and Crowe (formulated on sediment ranges between 0.5 and 64 mm) (Wilcock and Crowe, 2003). These were not selected as representing the best fit for the catchments simulated but because they are the formulae available in the unmodified version of CAESAR-Lisflood. The sediment transport formulae parameter was applied as a binary choice, with the model switching from one formula to the other once per repeat (no other parameter values were varied when this occurs, as per the description of the MM in Sect. 2.2). It was assumed that this change constituted a single iterative step change for calculating related EEs.
Grain size distribution has been shown to influence erosion patterns and erosion rate . It is more difficult to define iterative steps for the sediment grain size sets which include nine different grain sizes and proportions in each. Instead, these were skewed by altering the proportions of the five smallest grain sizes ±25 % and 50 %, and the opposite for the four largest grain sizes, before adjusting the final proportions to equal one based on the relative values. This produces two sets biased for smaller grain sizes (sets 1 and 2), and two sets biased for larger grain sizes (sets 4 and 5), as well as the default grain size set (Set 3; Fig. 2). Note, that the grain size sets presented in Fig. 2 contain non-cohesive silts and this requires an extrapolation of the sediment transport formulae .

Model functions
The common method of assessing a model's sensitivity to parameter values via SA, and the method employed by the MM, is to observe the variations to objective function measures. However, the difficulties in applying an objective function approach to LEMs were previously highlighted in Sect. 1.2; thus, in order to apply a SA a novel approach is required. The method we developed eschews the objective function approach and instead assesses the model against a series of model functions designed to reflect some of the core behaviours displayed in the model -these can be seen in Table 2. This represents a philosophical difference to traditional applications of SA -here we are not testing the model against To summarise the large amount of information produced, the ME of each parameter and model function combination was normalised based on the proportion of the ME for the highest ranking parameter for that model function; therefore the highest ranked parameter for each model function always scored a rank of 1. The scores for each parameter were aggregated for across all model functions based on the mean of the scores. The model functions were subdivided into core behaviour groups (Table 2), and the scores were aggregated again for each core behaviour. The same was also carried out, separately, for the standard deviations of each parameter and model function. Figure 3 shows the spread of parameter influence for both catchments, where a higher mean of the aggregated MEs indicates greater sensitivity in the model to that parameter, and a higher standard deviation shows greater non-linearity when interacting with other parameters. Table 3 shows the param- eters ranked for both catchments, based on the aggregated mean ME values. The most influential parameter is SED (see Table 1 for full description of parameter abbreviations), which is ranked first for both catchments and is also the most influential by a reasonable margin, with an aggregated mean at least 0.2 higher than the second ranked parameter. Other parameters, such as VEG, IOD, MNR, MinQ, and GSS, rank highly or mid range. There is a visually close correlation between the most influential parameters and those that display the most non-linearity (Fig. 3).

Catchment sediment yield vs. internal geomorphology
The core behaviours of catchment sediment yield and internal geomorphology show a different response to the changes in parameter values, as can be seen in Fig. 4, and also in the rankings in Table 4. For both catchments, SED is ranked as the most influential parameter for catchment sediment yields. For influence on the internal geomorphology, SEC ranks higher in the Tin Camp Creek catchment. The Upper Swale catchment displays a similar response to both behaviours, with SED and MNR most influential and by similar amounts, although GSS has less influence on internal geomorphology. The change in response is more varied for Tin Camp Creek -SED is less influential on internal geomorphology, and SEC is the most influential with a higher aggregated mean. Furthermore, GSS is slightly less influential, MNR is slightly more influential, and VEG is more influential on the internal geomorphology than it is on catchment sediment yield. For both model functions, there is again a strong visual correlation between those parameters showing the most influence and those showing the most non-linear behaviour. . Aggregated scores for all elementary effects where the numbers 1-15 represent the following: 1 is the sediment transport formula (SED); 2 is the maximum erode limit (MEL); 3 is the in channel lateral erosion rate (CLR); 4 is the lateral erosion rate (LAT); 5 is the critical vegetation shear stress (VEG); 6 is the grass maturity rate (MAT); 7 is the soil creep rate (SCR); 8 is the slope failure threshold (SFT); 9 is the in/out difference (IOD); 10 is the minimum Q value (MinQ); 11 is the maximum Q value (MaxQ); 12 is the slope for edge cells (SEC); 13 is the evaporation rate (EVR); 14 is the Manning n roughness coefficient (MNR); and 15 is the grain size set (GSS).

Changes in the mean elevations
The test results were binned by the parameter values used, and the mean changes in the mean elevations across the five stream orders were calculated - Fig. 5 illustrates how changes in parameter values might influence the spatial patterns of landscape change using SED, VEG, and GSS as examples. For SED ( Fig. 5a and b), the most obvious differ- Table 3. Parameters ranked by means for each catchment from the aggregated scores for all elementary effects. SED is the sediment transport formula; MEL is the maximum erode limit; CLR is the in channel lateral erosion rate; LAT is the lateral erosion rate; VEG is the vegetation critical shear stress; MAT is the grass maturity rate; SCR is the soil creep rate; SFT is the slope failure threshold; IOD is the in/out difference; MinQ is the minimum Q value; MaxQ is the maximum Q value; SEC is the slope for edge cells; EVR is the evaporation rate; MNR is the Manning n roughness coefficient; and GSS is the grain size set.
Rank (by mean: Upper Tin Camp 1 is the most influential) Swale Creek  (Fig. 5b) the spatial changes are similar, but for the larger Swale (Fig. 5a) there are differences in relative rates in 2nd and 4th order areas. In the Swale, VEG (Fig. 5c) appears to have little impact on the patterns and scale of changes, yet in Tin Camp Creek (Fig. 5d) there is a reduction in the erosion rates across the catchment with higher VEG values, except in the 5th order areas which remain at a similar level. Finally, both catchments show a reduction in rates of erosion with a greater proportion of larger grain sizes, although this is more pronounced in 4th order areas in Tin Camp Creek (Fig. 5f).

Discussion
The results reveal some important general insights into the application of SA to LEMs, in addition to insights into specific behaviours of the CAESAR-Lisflood model. Here  where the numbers 1-15 represent the following: 1 is the sediment transport formula (SED); 2 is the maximum erode limit (MEL); 3 is the in channel lateral erosion rate (CLR); 4 is the lateral erosion rate (LAT); 5 is the critical vegetation shear stress (VEG); 6 is the grass maturity rate (MAT); 7 is the soil creep rate (SCR); 8 is the slope failure threshold (SFT); 9 is the in/out difference (IOD); 10 is the minimum Q value (MinQ); 11 is the maximum Q value (MaxQ); 12 is the slope for edge cells (SEC); 13 is the evaporation rate (EVR); 14 is the Manning n roughness coefficient (MNR); and 15 is the grain size set (GSS).

Model functions
Our findings show that different model functions provide us with different indications of model sensitivity. This has important implications for how to measure LEM performance -and more widely how to quantify and assess geomorphic change within a basin. For example, Fig. 4 and Table 4 show how any LEM assessment must depend on the applied metric for comparison. Model functions that quantify sediment yield (derived at the catchment outlet) indicate different sensitivities compared to model functions that quantify the in- ternal landform response (based on spatial measures from within the catchment). Whilst at-a-point sediment yields are straightforward to extract from simulation data and easily related to field measurements (e.g. gauges, although these have their own associated uncertainties), similar or identical yields may conceal very different behaviours within the basin. This highlights an important aspect of LEM calibration: changes in sediment yields from a catchment outlet only provide partial information on what is changing internally. Therefore, we argue that metrics incorporating "spatial" changes in the basin (as well as bulk figures) are vital for assessing LEM performance. (i.e. time series of high resolution DEM data from lidar/photogrammetry). This is especially important as the shape of the landscape -where material has been eroded and deposited -is effectively the basins geomorphic memory and will directly influence subsequent model performance.
For other basin-scale models (e.g. hydrological models) this aspect is possibly not as important over longer-terms given the limited temporal extent memory of basin antecedence. Some of the challenges of LEM output comparison are similar to those of meteorology/climatology and may require a shift in expectation from end users as to what is possible. For example, predicting detailed patterns of local erosion and deposition is akin to predicting weather (low comparability MaxQ SFT EVR EVR especially over longer timescales) but more general (spatial and temporal) patterns of basin change are similar to climate predictions (better comparability especially for longer timescales).

Sediment transport formulae
Our SA shows that the choice of sediment transport formula (SED) has a very strong impact on the model functions. As sediment transport formulae are also integrated into other LEMs and geomorphic models they will affect their outcomes too. Looking at the sediment transport formulae themselves, Gomez and Church (1989) tested 11 different sediment transport formulae using the same data sets and showed widespread variation in predictions -in some cases over orders of magnitude. The variation in the model performance can be explained by the derivation of the sediment transport formulae themselves, which are often theory-based but fitted to limited laboratory and field data, sometimes representing temporal averages over equilibrium conditions (Gomez and Church, 1989). The formulae do not, and were likely never intended to, represent the full variation of actual flow conditions in a natural river. As LEMs commonly amalgamate a set of geomorphic models or transport formulae, their perfor-mance hinges on a number of individual model components. Therefore, when applied to different situations, they may not be appropriate (Coulthard et al., 2007a).

Implications for calibrating LEMs
This, however, presents a challenge, as it is highly likely that the sediment transport formula to be used was neither designed nor calibrated for a particular model application. The SIBERIA model Hancock and Willgoose, 2001;Willgoose et al., 2003) overcomes this issue by incorporating a version of the Einstein sediment transport formula (Einstein, 1950) that is calibrated or tuned to field data on erosion rates. However, even when calibrated, LEMs (and their sediment transport formulae) face another hurdle due to the non-stationarity of basin sediment yields. For example, a calibrated LEM will be adjusted to perform for a set of observed sediment outputs or erosion and deposition patterns. If, due to climate change for example, sediment supply, rainfall, or channel flows increase outside of the range of the initial calibration, then that initial calibration may no longer be valid (Coulthard et al., 2007b). This is similar to issues faced when calibrating hydrological models (e.g. Li et al., 2012), although the non-linear sediment response of LEMs like CAESAR-Lisflood  may make LEMs more sensitive to these problems. This non-linear sediment response to hydrological increases can be traced to the calculation of sediment transport as a square or cubic function of flow velocity. Furthermore, this analysis suggests that detailed justification and calibration of model choices around sediment transport will lead to the most effective gains in model skill.

Full uncertainty analysis
It is important to note that the MM does not provide an absolute value of sensitivity, but ranks each factor based on its relative influence on the model. This means it can be used to assess the main sources of uncertainty on a particular model set-up. The next step is then to establish how the uncertainty caused by model parameters (e.g. the choice of sediment transport formula) compares to other identified sources of uncertainty, such as rainfall input uncertainty, DEM observation and resolution uncertainty, and the length of the spin-up period. For example, it may be that the choice of the sediment transport formula may only be a minor source of uncertainty compared to the DEM resolution, or equally, it might be the most significant source of uncertainty in a LEM's output. Importantly, whilst the simulation of the long-term development of landscapes may be somewhat resilient to some uncertainties, e.g. initial conditions , any attempt to reproduce, predict, or forecast physical changes should have the same appreciation of uncertainty and rigorous testing that is applied to models in other fields (e.g. hydrology and hydraulics). There are many methods avail-able, but when discussing CAESAR-Lisflood the applications applied to LISFLOOD-FP seem a reasonable place to start. LISFLOOD-FP has been rigorously tested and benchmarked for decision-making purposes Neelz and Pender, 2013), and the use of SA to assess model response and uncertainty is standard practise (Di Baldassarre et al., 2009;Fewtrell et al., 2008Fewtrell et al., , 2011Hall et al., 2005;Bates, 2001, 2002;Hunter et al., 2008;Neal et al., 2011;Sampson et al., 2012), often as a stage of calibration using the GLUE method (Aronica et al., 2002;Bates et al., 2004;Horritt et al., 2006;Pappenberger et al., 2007;Wong et al., 2015). Uncertainty in model predictions can be accounted for by utilising probabilistic measures and uncertainty cascades (for example, Pappenberger et al., 2005;Stephens et al., 2012). This is not considered unique to CAESAR-Lisflood, and any application of an LEM or other geomorphic model for operational, decision-making, or forecasting applications should take full consideration of all associated uncertainties.

Limitations
The main limitation of the MM is the subjectivity in the selection of parameter values and ranges. Here, this has been mitigated by consistently selected ranges of ±50 % of a default value obtained from previous calibrations (where feasible). An issue emerges with categorical parameters, such as SED, where multiple values cannot be placed in a spectrum across a range between minimum and maximum values. The MM has no formal method for dealing with such categorical parameters, so here it has been assumed that switching from one formula to another is a single iterative step change, and this would be the same even with more choices available. This reflects the purpose of the MM, which is to inform the user about the relative importance of choices of parameter values on the performance/behaviour of the model. However, to assess the impact of this single step-change assumption, we performed a further analysis, where it was assumed that switching formula was a change of four iterative steps. This analysis shows that the relative sensitivity of the model to the sediment transport formula choice becomes less important, with other parameters such as Manning n roughness and grain size sets increasing in relative influence (see Supplement Sect. S2 for full results of this analysis).
An obvious limitation to this exercise is computational resource. This study incorporated 1600 individual model runs to test the behavioural response of the model to 15 parameters, in just two catchments, and this partly influenced the choice to limit simulation periods to 20 years. The bulk of the simulations used Intel i7-5960X processors and solid state drives (SSD), yet the runtimes varied considerably depending on the parameter sets chosen. As an indication, the mean simulation runtime for the first repeat in each catchment was 11 h and 23 min for the Swale and 21 min for Tin Camp Creek. We used the batch mode functionality of CAESAR-Lisflood to run simulations of each repeat (16 model runs each) consecutively, and distributed batches across different machines -this is feasible for the model set-ups described. However, for long-term simulations for catchments the size of the Upper Swale, individual model runs can take several weeks and running several runs consecutively becomes prohibitive. One solution would be to distribute the jobs on high performance computing (HPC) facilities, where the time for a single model run would not significantly decrease, but several hundred, or even thousands, of individual model runs could be performed coincidently.
Here, the methodology was only applied to the CAESAR-Lisflood model, and although some of the findings will be unique to CAESAR-Lisflood and the model set-ups presented, they have implications for all LEMs. Importantly, the methodology can serve as a highly useful tool for users to determine the behaviour of any LEM model set-up prior to calibration and/or simulation.

Conclusions
The feasibility of performing global SA to a highly parameterised catchment LEM was demonstrated through the application of the MM to the CAESAR-Lisflood model. The analysis was repeated over two different catchments suggesting some model behaviours are universal, and others vary depending on the catchment characteristics; this provides crucial information to inform future model developments. This analysis also confirms that the sediment transport formulae are a significant source of uncertainty in LEMs, and in the CAESAR-Lisflood model the use of one formula over another can result in order of magnitude differences in sediment yields when all other factors are kept constant. Another finding with relevance to SA and the calibration of LEMs was the influence of parameters on each model function, showing that one aspect of model behaviour (e.g. catchment sediment yield) is not fully reflective of other, albeit related, model behaviours (e.g. internal geomorphology).
In addition to the above, the results reveal the parameters in CAESAR-Lisflood which exert the greatest influence, and whilst we can only apply this to the CAESAR-Lisflood model itself, it is likely that LEMs with comparable parameters will display similar behaviours. Some of the most influential parameters, like Manning n roughness coefficient, grain size distributions, and vegetation critical shear stress are physically based, so any uncertainty can be reduced by more detailed field measurements. We also show that parameters that determine the numerical efficiency of CAESAR-Lisflood exert a medium influence on the simulation results. Although some parameters exerted less influence on model behaviour relative to others, there were no parameters which did not influence the model in some way.
The application of a global SA should become a vital step in any investigation using LEMs. This paper has demon-strated that the use of the MM is efficient for this purpose and yielded some valuable insights into model behaviour that can ultimately feed back into model set-up, as well as future model development.
Author contributions. All Authors contributed to writing the text and to all stages of editing. The research design and analysis was led by CJS with extensive support from TJC. In addition, WS proposed the original idea and helped with dissemination of the tests. MJVDW developed the model functions used, and GH provided data and support for the Tin Camp Creek tests.
Competing interests. The authors declare that they have no conflict of interest.