The evaluation and verification of landscape evolution models (LEMs) has long been limited by a lack of suitable observational data and statistical measures which can fully capture the complexity of landscape changes. This lack of data limits the use of objective function based evaluation prolific in other modelling fields, and restricts the application of sensitivity analyses in the models and the consequent assessment of model uncertainties. To overcome this deficiency, a novel model function approach has been developed, with each model function representing an aspect of model behaviour, which allows for the application of sensitivity analyses. The model function approach is used to assess the relative sensitivity of the CAESAR-Lisflood LEM to a set of model parameters by applying the Morris method sensitivity analysis for two contrasting catchments. The test revealed that the model was most sensitive to the choice of the sediment transport formula for both catchments, and that each parameter influenced model behaviours differently, with model functions relating to internal geomorphic changes responding in a different way to those relating to the sediment yields from the catchment outlet. The model functions proved useful for providing a way of evaluating the sensitivity of LEMs in the absence of data and methods for an objective function approach.

Landscape evolution models (LEMs) investigate how the Earth's surface evolves over timescales ranging from hundreds to millions of years (Coulthard and Van De Wiel, 2012; Martin and Church, 2004; Pazzaglia, 2003; Tucker and Hancock, 2010; Van De Wiel et al., 2011). They represent the Earth's surface with a regular or irregular mesh and simulate how the surface evolves over time as a function of tectonic processes, and erosion and deposition from Earth surface processes. LEMs have proved to be very useful scientific tools to understand how Earth surface processes interact to shape the landscape.

More recently, LEMs have improved considerably in their ability to simulate the physical environment, and this has developed in parallel with improvements in computational efficiency and power. This allows LEMs to go beyond highly simplified models of landform development and also to incorporate increasingly complex processes such as pedogenesis (Vanwalleghem et al., 2013; Welivitiya et al., 2016) and periglacial processes (Andersen et al., 2015; Egholm et al., 2015). Other processes are now being handled in more detail such as hydrodynamic flow models and aeolian processes (Adams et al., 2017; Coulthard et al., 2013; Liu and Coulthard, 2017). These developments led Coulthard et al. (2013) to describe them as “second generation” LEMs that extend previous explanatory and explorative models used for prediction of future changes in landscapes, such as for the mining industry (e.g. Hancock et al., 2017; Saynor et al., 2012).

However, more detailed physical representations of the processes that shape the Earth's surface involve a larger number of parameters that are typically estimated from proxy data or theoretical considerations, or are completely unknown (Oreskes et al., 1994; Petersen, 2012). If LEMs are to be operationally used for prediction or as decision-making tools in the future, their outputs must be evaluated against the uncertainty in the input parameters – a task that is increasingly difficult for a large number of parameters. Sensitivity analysis (SA) investigates how variations in the output of a numerical model can be attributed to its input factors (Pianosi et al., 2016). This is useful for identifying key parameters for later calibration, although SA has rarely been conducted for LEMs. Thus, the aim of this study is to conduct a SA of the widely used and highly parameterised LEM CAESAR-Lisflood (Coulthard et al., 2013) – in particular, we wish to be able to detect the parameters that have the greatest influence on the model's simulation output. As model sensitivity may be influenced by different landscapes, we run the SA in two individual and distinct catchments.

The application of SA in environmental modelling has a history spanning 4
decades (Norton, 2008) and forms an important component of using models for
decision-making, including model development, calibration, and uncertainty
analysis (Yang, 2011). SA addresses five key questions (Cariboni et al.,
2007; Neumann, 2012; Song et al., 2012, 2015):

Which parameters have the greatest influence on the model?

If additional data could be used to reduce the uncertainty in a parameter, which would most reduce the model output variance?

Are there parameters with such low influence that their values could be fixed without impacting the model outputs?

If parameter values emerge as incorrect, how will they influence model outputs?

Which parameters influence model outputs in different regions (parameter space)?

Clearly, based on the above, an appraisal of model sensitivity is important to fully understand and apply model results. In a review of the applications of SA in environmental models, Yang (2011) identified two common approaches to SA – local and global. Local SAs are limited, considering only the impacts of factors on model outputs locally, i.e. within a restricted region of the model's parameter space. Conversely, global SAs typically utilise Monte Carlo methods to assess the sensitivity of impacts across the whole parameter space (Yang, 2011). For complex models with non-linear behaviours, the use of local SAs can be highly biased as they neglect the non-linear interactions between parameters (Oakley and O'Hagan, 2004; Pappenberger et al., 2006; Yang, 2011). Global SAs are more computationally expensive, but as the methods are more reliable, they are attractive to modellers (Yang, 2011).

The use of SA as a routine component of model assessment and calibration is common place in climatic, meteorological, hydrological, hydraulic, and many other modelling fields. However, for LEMs there are surprisingly few examples of SA being carried out. This can be explained by three interrelated issues: (i) LEMs typically have a large number of model parameters; (ii) long model runtimes can make multiple simulations for SA impractical; and (iii) model behaviour can be highly non-linear (e.g. Coulthard and Van De Wiel, 2007; Larsen et al., 2014; Van De Wiel and Coulthard, 2010), leading to potentially complex SA interpretations. Large numbers of model parameters and long runtimes, in particular, make Monte Carlo methods extremely time consuming – and therefore often unviable.

There are several studies on how LEMs respond to variable forcing, process changes, and model parameters, including changes in climate variability and precipitation resolution (Armitage et al., 2018; Coulthard and Skinner, 2016; Ijjasz-Vasquez et al., 1992; Tucker and Bras, 2000), channel widths (Attal et al., 2008), vegetation (Collins, 2004; Istanbulluoglu and Bras, 2005), and variations in initial conditions (Hancock, 2006; Hancock et al., 2016; Ijjasz-Vasquez et al., 1992; Willgoose et al., 2003). Campforts et al. (2017) investigated how different numerical solvers affect LEM simulation. However, few studies explicitly perform SA and most of the applications described above are exploring LEM sensitivity to processes, or changes in environmental conditions, and are more correctly referred to as exploratory tests (Larsen et al., 2014). Conversely, investigations to ascertain the model's response to potential uncertainties (e.g from model parameterisation) can be deemed as true SA (e.g. Armitage et al., 2018; Coulthard and Skinner, 2016; Hancock et al., 2016).

Hydrological models faced similar issues to LEMs in the past, i.e. model complexity and long processing times when applying SA. To overcome these problems, hydrologists used the Morris method (MM; Morris, 1991). The MM can be regarded as a global SA, although it actually performs multiple local SAs sampled from across the full parameter space – this produces a series of local evaluations, the mean of which is an approximation of the global variance (van Griensven et al., 2006; Norton, 2009; Saltelli et al., 2000). The main strength of the MM is its computational efficiency. Herman et al. (2013) showed that the MM could estimate similar variance in model outputs to the Sobol variance-based global SA method (Sobol, 2001), yet required 300 times fewer evaluations, and significantly less data storage for an application to a distributed catchment hydrological model. The robustness of this approach has been further shown by numerous studies (e.g. Brockmann and Morgenroth, 2007; Pappenberger et al., 2008; Yang, 2011). However, the MM cannot provide a full quantitative assessment of parameter sensitivity and is dependent upon the user-defined bounds to the parameter space. It can successfully rank parameters between the least and most influential to model outputs, but cannot determine parameters' exact relative influence (Brockmann and Morgenroth, 2007). These advantages and limitations mean that MM has primarily been used during the prescreening stage of models, isolating the most influential parameters for further SA with quantitative, yet more computationally expensive, methods (e.g. Ratto et al., 2007; Song et al., 2015; Yang, 2011; Ziliani et al., 2013).

Ziliani et al. (2013) performed a two-stage SA for the CAESAR LEM, utilising the MM (as adapted by Campolongo et al., 2007). Whilst this study demonstrated the feasibility of applying the MM as a global SA to a reach-scale LEM, it was applied as a prescreening stage to identify the most relevant parameters for model calibration. In contrast, our study focuses on SA as a tool to investigate parameter influence on model behaviour.

Evaluating LEMs is challenged by the paucity of comprehensive field data against which they can be assessed and the lack of measures for calibration and validation (Hancock et al., 2016; Hancock and Willgoose, 2001; Tucker and Hancock, 2010). Moreover, some LEMs (e.g. CAESAR-Lisflood) simulate short-term (annual to decadal) and long-term (millennial timescales and longer) landscape changes, necessitating data and methods to assess them across variable timescales. Thus, while SAs of environmental models often rely on objective functions (e.g. the Nash–Sutcliffe score between observed and simulated values; Nash and Sutcliffe, 1970), this approach is generally not practical for LEMs. With few exceptions (e.g. Ziliani et al., 2013), results from LEMs are therefore frequently assessed qualitatively, relying on visual interpretation of the simulated landforms or cross-section profiles (e.g. Coulthard and Skinner, 2016; Hancock et al., 2010, 2015; Hancock and Coulthard, 2012).

Catchment outlet statistics, such as sediment yield time series, allow for comparison between simulations to indicate a catchment's response to perturbations (e.g. Coulthard et al., 2012; Coulthard and Skinner, 2016; Hancock and Coulthard, 2012). However, sediment yield time series rarely provide a sufficiently complete picture of a catchment's geomorphic response. For example, Coulthard and Skinner (2016) showed that simulations calibrated to provide equivalent sediment yields produced different landforms. For planning purposes these internal catchment changes are likely to be more useful than catchment sediment yields. Moreover, changing topography potentially instigates a feedback process that leads to complex, often non-linear catchment behaviour (Coulthard and Van De Wiel, 2007, 2013; Hancock et al., 2016; Jerolmack and Paola, 2010; Van De Wiel and Coulthard, 2010). Finally, the spatially and temporally heterogeneous response of erosion and deposition patterns in LEMs also makes “pixel-to-pixel” comparisons difficult. For example, in a valley reach, gross patterns of erosion and deposition may be identical but with the channel on the other side of the valley – yielding a poor pixel-to-pixel comparison.

Only a few studies have tested metrics to compare topographic data or physical experiments to simulated elevation changes by LEMs (Hancock et al., 2010, 2011; Hancock and Willgoose, 2001; Ibbitt et al., 1999). However, although the metrics often suggested a good agreement, visual analysis of the final DEMs indicated clear differences between the physical models and the simulations (Hancock and Willgoose, 2001). Therefore, there is a clear need for better statistical methods for critically evaluating and comparing landscapes that can also be used for evaluating the accuracy (or otherwise) of LEMs.

The paucity of observational data and the lack of measures that amalgamate the complexity of spatio-temporal landscape change into a single metric have prevented the objective function approach from being commonly utilised in modelling landscape evolution. Instead, LEMs can be evaluated by observing the changes in model outputs reflective of model behaviour – these model functions can be used in lieu of objective functions to allow the sensitivity of LEMs to be assessed. Model functions would be best used as a set in combination to allow assessment across a range of model behaviours, and would also be transferable across a range of catchments. Such an approach formalises existing methods of evaluating LEM outputs and provides a framework from which multi-criteria objective function approaches can be applied when suitable observations become available.

This study uses MM to assess the sensitivity of the CAESAR-Lisflood model to a range of user-defined parameters; therefore, it demonstrates the first application a global SA to a catchment LEM. We selected 15 model parameters (here we consider the choice of sediment transport formula as a parameter) either due to their known importance to the model or because the model's response to the parameter is presently poorly understood. Although not all of the 15 model parameters are universal between LEMs, many LEMs have equivalents. Moreover, we developed a set of 15 model functions that reflect core behavioural responses of the model. These will indicate whether the same parameters influence all behaviours, or whether the different behaviours respond to different parameters. The choice of 15 model parameters and 15 model functions is coincidental. We conducted the SAs in two catchments with contrasting environmental settings to assess how transferable an individual SA is to different conditions.

It is important to state that this study is an illustration of the potential for using the MM to inform an operator of how model parameter choices can impact the performance and behaviour of their model. It is not an attempt to reproduce or calibrate the CAESAR-Lisflood model to real-world observations, although the model has been previously applied to each catchment.

We apply the MM to perform a global SA on the CAESAR-Lisflood model for two
contrasting catchments (more detail in Sect. 2.3): the Upper Swale, UK
(181 km

For clarity, we define some terms here that are frequently used throughout this paper:

Parameter – adjustable value within a model. The value is determined during model set-up and remains constant throughout a given simulation. The value is often based on recorded values or adjusted during calibration.

Objective function – an error score between model outputs and observations used to evaluate model performance.

Model function – a measure derived from model outputs used to evaluate model behaviour in lieu of an adequate objective function.

Elementary effect (EE) – a value used as part of the Morris method, indicating the change in function value (objective or model) resulting from a change of parameter value during a single repeat.

Main effect (ME) – the mean of the elementary effects from all repeats, for a specified parameter and a specified function.

The LEM used is the CAESAR-Lisflood model (Coulthard et al., 2013). CAESAR-Lisflood is a second generation LEM, capable of simulations with greater physical realism than first generation models but also with increased complexity – the model features a large number of fixed, physically based, or user-defined parameters. This additional complexity may result in an increased non-linearity and sensitivity to model parameters. We used CAESAR-Lisflood v1.8f, without any additional modifications to the model's functionality from the version freely available online.

A full description of the CAESAR-Lisflood model can be found in Coulthard et al. (2013), and its core functionality is only summarised here. The model utilises an initial DEM built from a regular grid of cells; in the catchment mode (as used in this model set-up) it is driven by a rainfall time series, which can be lumped or spatially distributed (Coulthard and Skinner, 2016). At each time step the rainfall input is converted to surface runoff using TOPMODEL (Beven and Kirkby, 1979), and it is distributed across the catchment and routed using the LISFLOOD-FP component (Bates et al., 2010). The CAESAR component of the model drives the landscape development using sediment transport formulae based on flow depths and velocities derived from the LISFLOOD-FP component. Bedload is distributed to neighbouring cells proportionally based on relative bed elevations. This study did not use the suspended sediment processes in the model. The model can handle nine different grain sizes, and information is stored in surface and subsurface layers where only the top surface layer is “active” for erosion and deposition. A comprehensive description of this process can be found in Van De Wiel et al., 2007).

CAESAR-Lisflood has been freely available and since 1996, and there have been over 60 published studies using the model over a wide range of temporal and spatial scales (Skinner and Coulthard, 2017). These previous studies provide useful background regarding model parameter interactions, and help to inform the choice of the user-defined parameters utilised for the SA as described in Sect. 2.4. Some studies have also investigated the model's sensitivities to external factors – for example, Coulthard and Skinner (2016) investigated the sensitivity of the CAESAR-Lisflood model to the spatial and temporal resolution of precipitation. Other studies have investigated the influence of individual processes or forcings. For example, Coulthard and Van De Wiel (2017) examined how land use influences the outputs of the model.

Elevation map for the Upper Swale catchment, UK

Our study used the MM described in Ziliani et al. (2013), i.e. the original MM of Morris (1991), as extended by Campolongo et al. (2007), and applied the “sensitivity” package in the R statistical environment (Pujol, 2014) to generate the parameter sets for the SA.

To set up the MM we selected a number of parameters to be assessed, specifying a minimum and maximum range for each, in addition to a number of iterative steps. The parameter values were equally spaced based on the range and number of steps – for example, a parameter with a range of 2 to 10 and 5 iterative steps would have available values of 2, 4, 6, 8, and 10. This was done for each parameter and, where possible, the same number of iterative steps was used for each.

The MM samples the global parameter space by performing multiple local SAs referred to as repeats. The first simulation in each repeat is made up of a randomly assigned selection of parameter values from the available values. To set up the second simulation in the repeat a single parameter is randomly selected and its value changed by a random number of iterative steps – if we use the example above, if simulation 1 used the value 4, changing this to 2 or 6 would be one iterative step change (where one step is a change in value of 2), to 8 would be two steps, and using 10 would be three steps. For simulation 3 in the repeat another randomly selected parameter is changed although previously changed parameters are no longer available to be selected. This is continued until no further parameters are available to be changed; therefore, in our study each repeat contains 16 tests – 1 starting set of parameters, and 15 parameter changes. In this study we used 100 repeats, for a total of 1600 individual simulations – for comparison, the implementation of the MM by Ziliani et al. (2013) used 10 repeats.

The sensitivity of the model to changes in parameter values is evaluated by
the changes of objective function values between sequential tests within
repeats relative to the number of incremental steps the parameter value has
been changed by. The change in objective function score between two
sequential tests divided by the number of incremental step changes is an
elementary effect (EE) of that objective function and the parameter changed,
as shown by Eq. (1):

User-defined parameters used and the min and max values for the two study catchments.

After all 1600 tests were performed, the main effect (ME) for each objective function and parameter was calculated from the mean of the relevant EEs – the higher the ME the greater the model's sensitivity. Alongside the ME, the standard deviation of the EEs was also calculated as this provides an indication of the non-linearity within the model.

The Swale catchment, UK, is a medium sized basin (181 km

The Tin Camp Creek catchment is a small sub-catchment (0.5 km

The changes in the mean elevation across different areas of the catchments
were assessed as an illustration of spatial differences in geomorphic change.
Each basin was subdivided into regions corresponding to the watersheds of
five stream orders based on the proportion of the catchment drained in the
initial DEM – 1st

The MM implemented here used 15 user-defined parameters, each with 5 iterative step values (as described in Sect. 2.2). The only exception was the choice of sediment transport formula parameter (SED, Table 1) where only two options were available. The parameters, their ranges, and available values are shown in Table 1.

The MM varies the value of each parameter tested once per repeat, and here we use 100 repeats. Therefore, careful consideration was required in the selection of parameters as each parameter tested added 100 model runs to the test – there are 49 user-defined parameters in the version of the CAESAR-Lisflood model used (v1.8), and even excluding parameters associated with dune and soil development, there are still 35 user-defined parameters. To test each would require 3600 model runs for each catchment, although the inclusion of some parameters is likely to add little value. Thus, the abovementioned parameters were narrowed to a set of 15 user-defined parameters (Table 1) with the selection based largely on prior knowledge of the importance of these parameters, or due to a lack of previous knowledge of the influence of the parameters on the model – full justification of the selection of parameters, and descriptions of their purpose within the model, can be found in the Supplement Sect. S1.

The MM is subjective in that the relative sensitivities shown depend on the
minimum and maximum range values set by the user. Therefore, it is necessary
to set each parameter's range to be broadly equal to the others in order to
obtain useful information. To be consistent, where possible we used a
default value taken from past calibrations and varied this by

Here we considered the selection of the sediment transport formula as a parameter despite the fact that doing so is to change the functional form of the model. For clarity, and in line with how the choice is presented within the graphical user interface of the model, we henceforth consider this choice in the same way as a parameter. The sediment transport formulae employed for SED were from Einstein (derived for sand-bed rivers) (Einstein, 1950) and Wilcock and Crowe (formulated on sediment ranges between 0.5 and 64 mm) (Wilcock and Crowe, 2003). These were not selected as representing the best fit for the catchments simulated but because they are the formulae available in the unmodified version of CAESAR-Lisflood. The sediment transport formulae parameter was applied as a binary choice, with the model switching from one formula to the other once per repeat (no other parameter values were varied when this occurs, as per the description of the MM in Sect. 2.2). It was assumed that this change constituted a single iterative step change for calculating related EEs.

Grain size distribution has been shown to influence erosion patterns and
erosion rate (Hancock and Coulthard, 2012). It is more difficult to define
iterative steps for the sediment grain size sets which include nine different
grain sizes and proportions in each. Instead, these were skewed by altering
the proportions of the five smallest grain sizes

Sediment grain size distribution sets for the Upper
Swale

The common method of assessing a model's sensitivity to parameter values via SA, and the method employed by the MM, is to observe the variations to objective function measures. However, the difficulties in applying an objective function approach to LEMs were previously highlighted in Sect. 1.2; thus, in order to apply a SA a novel approach is required. The method we developed eschews the objective function approach and instead assesses the model against a series of model functions designed to reflect some of the core behaviours displayed in the model – these can be seen in Table 2. This represents a philosophical difference to traditional applications of SA – here we are not testing the model against its skill in simulating the physical environment, but rather assessing how the model responds behaviourally to changes in the user-defined parameters detailed in Sect. 2.4. The 15 model functions (Table 2) are simple, scalable, and transferable between different catchment types, and can be applied to simulations of different time frames. The model functions are based on outputs which are not unique to CAESAR-Lisflood, so they can be applied to other LEM and geomorphic models

Model functions and the associated core behaviours.

The model functions were applied to the MM as described in Sect. 2.2, substituting the model functions in place of the objective functions with no further changes to the method. Model function values were calculated at the end of each simulation.

To summarise the large amount of information produced, the ME of each parameter and model function combination was normalised based on the proportion of the ME for the highest ranking parameter for that model function; therefore the highest ranked parameter for each model function always scored a rank of 1. The scores for each parameter were aggregated for across all model functions based on the mean of the scores. The model functions were subdivided into core behaviour groups (Table 2), and the scores were aggregated again for each core behaviour. The same was also carried out, separately, for the standard deviations of each parameter and model function.

Figure 3 shows the spread of parameter influence for both catchments, where a higher mean of the aggregated MEs indicates greater sensitivity in the model to that parameter, and a higher standard deviation shows greater non-linearity when interacting with other parameters. Table 3 shows the parameters ranked for both catchments, based on the aggregated mean ME values. The most influential parameter is SED (see Table 1 for full description of parameter abbreviations), which is ranked first for both catchments and is also the most influential by a reasonable margin, with an aggregated mean at least 0.2 higher than the second ranked parameter. Other parameters, such as VEG, IOD, MNR, MinQ, and GSS, rank highly or mid range. There is a visually close correlation between the most influential parameters and those that display the most non-linearity (Fig. 3).

Aggregated scores for all elementary effects where the numbers 1–15 represent the following:
1 is the sediment transport formula (SED); 2 is the maximum erode
limit (MEL); 3 is the in channel lateral erosion rate (CLR); 4
is the lateral erosion rate (LAT); 5 is the critical vegetation
shear stress (VEG); 6 is the grass maturity rate (MAT); 7 is the soil creep rate (SCR);
8 is the slope failure threshold (SFT); 9 is the in/out difference (IOD); 10 is the minimum

Parameters ranked by means for each catchment from the
aggregated scores for all elementary effects. SED is the sediment
transport formula; MEL is the maximum erode limit; CLR is the in
channel lateral erosion rate; LAT is the lateral erosion rate; VEG
is the vegetation critical shear stress; MAT is the grass
maturity rate; SCR is the soil creep rate; SFT is the slope
failure threshold; IOD is the in/out difference; MinQ is the minimum

The core behaviours of catchment sediment yield and internal geomorphology show a different response to the changes in parameter values, as can be seen in Fig. 4, and also in the rankings in Table 4. For both catchments, SED is ranked as the most influential parameter for catchment sediment yields. For influence on the internal geomorphology, SEC ranks higher in the Tin Camp Creek catchment. The Upper Swale catchment displays a similar response to both behaviours, with SED and MNR most influential and by similar amounts, although GSS has less influence on internal geomorphology. The change in response is more varied for Tin Camp Creek – SED is less influential on internal geomorphology, and SEC is the most influential with a higher aggregated mean. Furthermore, GSS is slightly less influential, MNR is slightly more influential, and VEG is more influential on the internal geomorphology than it is on catchment sediment yield. For both model functions, there is again a strong visual correlation between those parameters showing the most influence and those showing the most non-linear behaviour.

Aggregated scores for sediment yield

Parameters ranked by means for each catchment from the
aggregated scores for catchment sediment yields (SY) and internal
geomorphology (IG) elementary effects. SED is the sediment transport
formula; MEL is the maximum erode limit; CLR is the in channel
lateral erosion rate; LAT is the lateral erosion rate; VEG
is the vegetation critical shear stress; MAT is the grass
maturity rate; SCR is the soil creep rate; SFT is the slope
failure threshold; IOD is the in/out difference; MinQ is the minimum

The test results were binned by the parameter values used, and the mean changes in the mean elevations across the five stream orders were calculated – Fig. 5 illustrates how changes in parameter values might influence the spatial patterns of landscape change using SED, VEG, and GSS as examples. For SED (Fig. 5a and b), the most obvious difference is the scale of changes seen using each formula, with formulae from Einstein generally showing greater change. For Tin Camp Creek (Fig. 5b) the spatial changes are similar, but for the larger Swale (Fig. 5a) there are differences in relative rates in 2nd and 4th order areas. In the Swale, VEG (Fig. 5c) appears to have little impact on the patterns and scale of changes, yet in Tin Camp Creek (Fig. 5d) there is a reduction in the erosion rates across the catchment with higher VEG values, except in the 5th order areas which remain at a similar level. Finally, both catchments show a reduction in rates of erosion with a greater proportion of larger grain sizes, although this is more pronounced in 4th order areas in Tin Camp Creek (Fig. 5f).

Illustration of changes in the mean elevations for Upper
Swale

The results reveal some important general insights into the application of SA to LEMs, in addition to insights into specific behaviours of the CAESAR-Lisflood model. Here we discuss model functions (Sect. 4.1), sediment transport formulae (Sect. 4.2), implications for calibrating LEMs (Sect. 4.3), full uncertainty analyses of LEMs (Sect. 4.4), and limitations of this study (Sect. 4.5).

Our findings show that different model functions provide us with different indications of model sensitivity. This has important implications for how to measure LEM performance – and more widely how to quantify and assess geomorphic change within a basin. For example, Fig. 4 and Table 4 show how any LEM assessment must depend on the applied metric for comparison. Model functions that quantify sediment yield (derived at the catchment outlet) indicate different sensitivities compared to model functions that quantify the internal landform response (based on spatial measures from within the catchment). Whilst at-a-point sediment yields are straightforward to extract from simulation data and easily related to field measurements (e.g. gauges, although these have their own associated uncertainties), similar or identical yields may conceal very different behaviours within the basin. This highlights an important aspect of LEM calibration: changes in sediment yields from a catchment outlet only provide partial information on what is changing internally. Therefore, we argue that metrics incorporating “spatial” changes in the basin (as well as bulk figures) are vital for assessing LEM performance. (i.e. time series of high resolution DEM data from lidar/photogrammetry). This is especially important as the shape of the landscape – where material has been eroded and deposited – is effectively the basins geomorphic memory and will directly influence subsequent model performance. For other basin-scale models (e.g. hydrological models) this aspect is possibly not as important over longer-terms given the limited temporal extent memory of basin antecedence. Some of the challenges of LEM output comparison are similar to those of meteorology/climatology and may require a shift in expectation from end users as to what is possible. For example, predicting detailed patterns of local erosion and deposition is akin to predicting weather (low comparability especially over longer timescales) but more general (spatial and temporal) patterns of basin change are similar to climate predictions (better comparability especially for longer timescales).

Our SA shows that the choice of sediment transport formula (SED) has a very strong impact on the model functions. As sediment transport formulae are also integrated into other LEMs and geomorphic models they will affect their outcomes too. Looking at the sediment transport formulae themselves, Gomez and Church (1989) tested 11 different sediment transport formulae using the same data sets and showed widespread variation in predictions – in some cases over orders of magnitude. The variation in the model performance can be explained by the derivation of the sediment transport formulae themselves, which are often theory-based but fitted to limited laboratory and field data, sometimes representing temporal averages over equilibrium conditions (Gomez and Church, 1989). The formulae do not, and were likely never intended to, represent the full variation of actual flow conditions in a natural river. As LEMs commonly amalgamate a set of geomorphic models or transport formulae, their performance hinges on a number of individual model components. Therefore, when applied to different situations, they may not be appropriate (Coulthard et al., 2007a).

This, however, presents a challenge, as it is highly likely that the sediment transport formula to be used was neither designed nor calibrated for a particular model application. The SIBERIA model (Hancock et al., 2010, 2016, 2017; Hancock and Willgoose, 2001; Willgoose et al., 2003) overcomes this issue by incorporating a version of the Einstein sediment transport formula (Einstein, 1950) that is calibrated or tuned to field data on erosion rates. However, even when calibrated, LEMs (and their sediment transport formulae) face another hurdle due to the non-stationarity of basin sediment yields. For example, a calibrated LEM will be adjusted to perform for a set of observed sediment outputs or erosion and deposition patterns. If, due to climate change for example, sediment supply, rainfall, or channel flows increase outside of the range of the initial calibration, then that initial calibration may no longer be valid (Coulthard et al., 2007b). This is similar to issues faced when calibrating hydrological models (e.g. Li et al., 2012), although the non-linear sediment response of LEMs like CAESAR-Lisflood (Coulthard et al., 2012) may make LEMs more sensitive to these problems. This non-linear sediment response to hydrological increases can be traced to the calculation of sediment transport as a square or cubic function of flow velocity. Furthermore, this analysis suggests that detailed justification and calibration of model choices around sediment transport will lead to the most effective gains in model skill.

It is important to note that the MM does not provide an absolute value of sensitivity, but ranks each factor based on its relative influence on the model. This means it can be used to assess the main sources of uncertainty on a particular model set-up. The next step is then to establish how the uncertainty caused by model parameters (e.g. the choice of sediment transport formula) compares to other identified sources of uncertainty, such as rainfall input uncertainty, DEM observation and resolution uncertainty, and the length of the spin-up period. For example, it may be that the choice of the sediment transport formula may only be a minor source of uncertainty compared to the DEM resolution, or equally, it might be the most significant source of uncertainty in a LEM's output.

Importantly, whilst the simulation of the long-term development of landscapes may be somewhat resilient to some uncertainties, e.g. initial conditions (Hancock et al., 2016), any attempt to reproduce, predict, or forecast physical changes should have the same appreciation of uncertainty and rigorous testing that is applied to models in other fields (e.g. hydrology and hydraulics). There are many methods available, but when discussing CAESAR-Lisflood the applications applied to LISFLOOD-FP seem a reasonable place to start. LISFLOOD-FP has been rigorously tested and benchmarked for decision-making purposes (Hunter et al., 2005; Neelz and Pender, 2013), and the use of SA to assess model response and uncertainty is standard practise (Di Baldassarre et al., 2009; Fewtrell et al., 2008, 2011; Hall et al., 2005; Horritt and Bates, 2001, 2002; Hunter et al., 2008; Neal et al., 2011; Sampson et al., 2012), often as a stage of calibration using the GLUE method (Aronica et al., 2002; Bates et al., 2004; Horritt et al., 2006; Hunter et al., 2005; Pappenberger et al., 2007; Wong et al., 2015). Uncertainty in model predictions can be accounted for by utilising probabilistic measures and uncertainty cascades (for example, Pappenberger et al., 2005; Stephens et al., 2012). This is not considered unique to CAESAR-Lisflood, and any application of an LEM or other geomorphic model for operational, decision-making, or forecasting applications should take full consideration of all associated uncertainties.

The main limitation of the MM is the subjectivity in the selection of parameter
values and ranges. Here, this has been mitigated by consistently selected
ranges of

An obvious limitation to this exercise is computational resource. This study incorporated 1600 individual model runs to test the behavioural response of the model to 15 parameters, in just two catchments, and this partly influenced the choice to limit simulation periods to 20 years. The bulk of the simulations used Intel i7-5960X processors and solid state drives (SSD), yet the runtimes varied considerably depending on the parameter sets chosen. As an indication, the mean simulation runtime for the first repeat in each catchment was 11 h and 23 min for the Swale and 21 min for Tin Camp Creek. We used the batch mode functionality of CAESAR-Lisflood to run simulations of each repeat (16 model runs each) consecutively, and distributed batches across different machines – this is feasible for the model set-ups described. However, for long-term simulations for catchments the size of the Upper Swale, individual model runs can take several weeks and running several runs consecutively becomes prohibitive. One solution would be to distribute the jobs on high performance computing (HPC) facilities, where the time for a single model run would not significantly decrease, but several hundred, or even thousands, of individual model runs could be performed coincidently.

Here, the methodology was only applied to the CAESAR-Lisflood model, and although some of the findings will be unique to CAESAR-Lisflood and the model set-ups presented, they have implications for all LEMs. Importantly, the methodology can serve as a highly useful tool for users to determine the behaviour of any LEM model set-up prior to calibration and/or simulation.

The feasibility of performing global SA to a highly parameterised catchment LEM was demonstrated through the application of the MM to the CAESAR-Lisflood model. The analysis was repeated over two different catchments suggesting some model behaviours are universal, and others vary depending on the catchment characteristics; this provides crucial information to inform future model developments. This analysis also confirms that the sediment transport formulae are a significant source of uncertainty in LEMs, and in the CAESAR-Lisflood model the use of one formula over another can result in order of magnitude differences in sediment yields when all other factors are kept constant. Another finding with relevance to SA and the calibration of LEMs was the influence of parameters on each model function, showing that one aspect of model behaviour (e.g. catchment sediment yield) is not fully reflective of other, albeit related, model behaviours (e.g. internal geomorphology).

In addition to the above, the results reveal the parameters in
CAESAR-Lisflood which exert the greatest influence, and whilst we can only
apply this to the CAESAR-Lisflood model itself, it is likely that LEMs with
comparable parameters will display similar behaviours. Some of the most
influential parameters, like Manning

The application of a global SA should become a vital step in any investigation using LEMs. This paper has demonstrated that the use of the MM is efficient for this purpose and yielded some valuable insights into model behaviour that can ultimately feed back into model set-up, as well as future model development.

The data produced in this study can be made available on
request from the corresponding author. The CAESAR-Lisflood model used (version 1.8f)
is freely available under a GNU licence from Zenodo
(

The supplement related to this article is available online at:

All Authors contributed to writing the text and to all stages of editing. The research design and analysis was led by CJS with extensive support from TJC. In addition, WS proposed the original idea and helped with dissemination of the tests. MJVDW developed the model functions used, and GH provided data and support for the Tin Camp Creek tests.

The authors declare that they have no conflict of interest.

The authors wish to thank the two reviewers, Andy Wickert and Daniel Hobley, for their insightful and helpful comments which have improved this paper. The Landscape Evolution Model Sensitivity Investigation Project (LEMSIP) has emerged from the Field and Computer Simulation in Landscape Evolution (FACSIMILE) network. The aims of the project are to collate and generate knowledge pertaining to the sensitivities and uncertainties associated with landscape evolution models, and how these influence the simulation of landscape development. The authors wish to thank the Young Geomorphologists group who donated computational resource. This work was supported by the NERC Flooding from Intense Rainfall (FFIR) project, Susceptibility of Basins to Intense Rainfall and Flooding (SINATRA) project no. NE/K008668/1. Edited by: Bethanna Jackson Reviewed by: Daniel Hobley and Andrew Wickert