Landslide susceptibility (LS) assessment provides a relative estimate of landslide spatial occurrence based on local terrain conditions. A literature review revealed that LS evaluation has been performed in many study areas worldwide using different methods, model types, partitions of the territory (mapping units), and a large variety of geo-environmental data. Among the different methods, statistical models have been widely used to evaluate LS, but only a minority of articles present a complete and comprehensive LS assessment that includes model performance analysis, prediction skill evaluation, and estimation of errors and uncertainty.

The aim of this paper is to describe LAND-SE (LANDslide Susceptibility Evaluation), software that performs susceptibility modelling and zonation using statistical models and quantifies model performance and the associated uncertainty. The software is implemented in R, a free software environment for statistical computing and graphics. This gives users the possibility to extend and improve the code with additional models, evaluation tools, or output types. The paper describes the software structure, explains input and output, and illustrates specific applications with maps and graphs. The LAND-SE script is delivered with a basic user guide and three example data sets.

Landslide susceptibility (LS) is the likelihood of a landslide occurring in an area based on local terrain conditions (Brabb, 1984). In mathematical language, LS quantifies the spatial probability of landslide occurrence in a mapping unit, without considering the temporal probability of failure or the magnitude of the expected landslides. Landslide susceptibility has been evaluated in many locations around the world since the early 1980s. Authors have evaluated LS using different partitions of the territory as mapping units, diverse combinations of explanatory variables, and distinct methods. Methods for LS evaluation and mapping can be broadly grouped into geomorphological mapping, analysis of landslide inventories, heuristic- or index-based methods, statistically based models, and geotechnical or physically based models (Guzzetti et al., 1999). Among the different approaches, statistical models have been widely used to assess LS. A recent review of papers on statistical models (Malamud et al., 2014) showed that more than 95 different model types have been proposed in the literature. Malamud et al. (2014) grouped them in 20 classes, with the most frequent corresponding to logistic regression, neural networks, and data overlay. According to them, the large number of statistical models described in the literature is probably related to the recent increase in commercial and open-source packages for statistical analysis that can combine and integrate geographical data and/or Open Source Geographic Information Systems (GIS) (e.g. SAGA GIS, GRASS GIS). The review also revealed that authors do not always present a complete and comprehensive assessment of model performance, prediction skill, and errors and uncertainty.
Given the large variety of applications of statistical approaches and the scarcity of model evaluation and error quantification, we have implemented LAND-SE (LANDslide Susceptibility Evaluation), software developed to prepare landslide susceptibility models and zonations at basin and regional scales, with specific functions focused on the evaluation of results and the estimation of uncertainty. The software is implemented in R, a free software environment for statistical computing and graphics (R Core Team, 2015). This gives users the possibility to extend and improve the code with additional models, evaluation tools, or output types.

The paper describes the LAND-SE structure, explains its input and output, illustrates them with maps, graphs, and some applications, and provides a basic user guide. A description of the characteristics and results of statistical models, and of the advantages and disadvantages of model evaluation tools and matrices, is outside the scope of this paper. We introduce a test area only to demonstrate possible applications and the different outputs of LAND-SE.

The paper is structured as follows: Sect. 2 describes the software, its modelling approaches, and the main output types; Sect. 3 presents the test area used to illustrate the range of applications and the different outputs of LAND-SE; and Sect. 4 presents some final remarks. The paper is completed by a Supplement containing the software code, a user guide, and example data sets.

LAND-SE, a software for landslide susceptibility modelling and zonation, was implemented as an improvement of the code proposed by Rossi et al. (2010). The new version is coded in R (R Core Team, 2015) and is open source. The software can perform and combine different statistical susceptibility modelling methods, evaluate the results, and estimate the associated uncertainty. Compared to the previous version (Rossi et al., 2010), the main improvements are (i) the possibility to use different cartographic units (pixel-based vs. polygon-based); (ii) the capacity to perform different types of validation analyses (spatial, temporal, random); (iii) the ability to evaluate model prediction skill and performance using success and prediction rate curves (Chung and Fabbri, 1999, 2003); (iv) the possibility to provide results in standard geographical formats (shapefiles, geotiff); (v) an optimization and stabilization of the modelling algorithms; and (vi) the possibility to use additional computational parameters to tune the calculation procedure for the analysis of large data sets. This version also involved a substantial restructuring of the computer code (code refactoring), which allows new single statistical approaches (e.g. support vector machines, regression tree-based approaches) to be added as new modules while preserving the basic software structure. The new structure simplifies the maintenance and improvement of the source code.

Logical schema of the LAND-SE software for landslide susceptibility modelling and zonation.

Figure 1 shows the logical schema of LAND-SE, subdivided into the following five functions:

input data preparation

single susceptibility models and zonation

combination of single models using a logistic regression approach

evaluation of single and combined LS models

estimation of uncertainty of single and combined LS models.

The input data preparation follows two steps: (i) the choice of the cartographic unit and (ii) the selection of the criteria for the definition of the training and validation data sets.

LAND-SE is designed to use different cartographic units, reducible to pixels or to polygon-like subdivisions (e.g. slope units, geomorphological subdivisions, administrative boundaries). The input data shall be provided in tabular format, where each line represents one mapping unit with the associated attributes. Since raster data cannot be used directly as input, a preliminary conversion is required to perform the pixel-based analysis.
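The preliminary raster-to-table conversion can be sketched as follows. This is a minimal illustration in Python, not part of LAND-SE (which is an R script); the function name `raster_to_table`, the column names (`id`, `row`, `col`, `slope`, `landslide`), and the no-data value are assumptions made for the example, and the actual input layout expected by LAND-SE is the one described in its user guide.

```python
def raster_to_table(layers, nodata=None):
    """Flatten co-registered 2-D grids into one table row per pixel.

    `layers` maps a column name to a grid (a list of rows); all grids must
    share the same shape. Pixels where any layer equals `nodata` are skipped.
    """
    names = list(layers)
    grids = [layers[name] for name in names]
    n_rows, n_cols = len(grids[0]), len(grids[0][0])
    for grid in grids:
        if len(grid) != n_rows or any(len(r) != n_cols for r in grid):
            raise ValueError("all layers must have the same shape")
    table = []
    for r in range(n_rows):
        for c in range(n_cols):
            values = [grid[r][c] for grid in grids]
            if nodata is not None and nodata in values:
                continue  # pixel outside the study area in at least one layer
            row = {"id": r * n_cols + c, "row": r, "col": c}
            row.update(zip(names, values))
            table.append(row)
    return table


# Hypothetical 2 x 2 rasters: one explanatory layer and the grouping variable.
slope = [[10.0, 25.0], [-9999, 40.0]]
landslide = [[0, 1], [0, 1]]
table = raster_to_table({"slope": slope, "landslide": landslide}, nodata=-9999)
```

Each resulting row is one mapping unit (here, one pixel) with its attributes, which is the tabular shape the text describes; keeping `row` and `col` makes it possible to map model output back onto the grid.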

The choice of the mapping unit is crucial because it also determines how landslides are sampled to prepare the training and prediction (validation) subsets for the susceptibility modelling. In grid-based susceptibility assessments, several strategies are used to sample landslide pixels; the most frequent are (1) a single pixel selected as the centroid of the entire landslide or the scarp area; (2) all the pixels within the entire landslide body or the scarp area; (3) the main scarp upper edge (MSUE) approach, which selects pixels on and around the landslide crown-line; and (4) the seed-cell approach, which selects pixels within a buffer polygon around the upper landslide scarp area and sometimes part of the flanks of the accumulation zone (Atkinson et al., 1998; Atkinson and Massari, 2011; Goetz et al., 2015; Heckmann et al., 2014; Hussin et al., 2016; Regmi et al., 2014; Van Den Eeckhaut et al., 2010). The sensitivity of the models to different landslide mapping strategies and the significance of different variable combinations can be analysed with LAND-SE by preparing different input files. Given the large number of possible variations required to set up this type of evaluation, we decided not to include such functionalities in the current LAND-SE release, but we designed and implemented a command line interface (see Sect. S5 of the LAND-SE User Guide v1.0 in the Supplement) to make this analysis possible using external procedures.

Different criteria can be adopted to identify and separate the training and validation data sets. Temporal, spatial, or random subdivisions can be selected, and the choice determines the type of validation analysis. When temporal validation is selected, secondary information not used in the model training must be provided for the area under analysis. With a temporal subdivision, the training and validation sets are composed of the same mapping units and the analysis is performed using the same explanatory variables but a different grouping variable (e.g. a different landslide inventory map, often more recent than the one used during the training phase). In contrast, in the spatial and random approaches, the training and validation data sets contain different mapping units, characterized by different grouping and explanatory variables. The main difference between spatial and random validation is the method chosen to separate the two data sets: in the first case, the data sets are spatially distinct (the two areas can be contiguous or not); in the second, the subdivision is performed using a random selection. For the pixel-based approach, the definition of the training and validation data sets can follow the same criteria, but in the literature the subdivision is commonly performed using a random selection (Van Den Eeckhaut et al., 2010; Felicísimo et al., 2013; Petschko et al., 2014).
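The three subdivision criteria can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not LAND-SE code: the function names, the `northing`, `ls_old`, and `ls_new` fields, and the 70 % training fraction are hypothetical choices made for the example.

```python
import random


def random_split(units, train_fraction=0.7, seed=0):
    """Random validation: disjoint sets of mapping units chosen at random."""
    rng = random.Random(seed)
    shuffled = units[:]
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def spatial_split(units, in_training_area):
    """Spatial validation: units are separated by where they are located."""
    training = [u for u in units if in_training_area(u)]
    validation = [u for u in units if not in_training_area(u)]
    return training, validation


def temporal_split(units, old_key, new_key):
    """Temporal validation: same units and explanatory variables, but the
    grouping variable comes from two different landslide inventories."""
    training = [dict(u, landslide=u[old_key]) for u in units]
    validation = [dict(u, landslide=u[new_key]) for u in units]
    return training, validation


# Ten hypothetical mapping units with two landslide inventories each.
units = [
    {"id": i, "northing": float(i), "ls_old": i % 2, "ls_new": int(i >= 5)}
    for i in range(10)
]
train_r, valid_r = random_split(units, 0.7, seed=42)
train_s, valid_s = spatial_split(units, lambda u: u["northing"] >= 5.0)
train_t, valid_t = temporal_split(units, "ls_old", "ls_new")
```

In the random and spatial cases the two sets share no mapping units, whereas in the temporal case they contain the same units and differ only in the `landslide` column.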

LAND-SE uses different supervised multivariate statistical models to evaluate the landslide spatial probability, identifying and quantifying the relation between dependent and independent variables. In accordance with previous studies (Carrara et al., 1991; Rossi et al., 2010; Guzzetti et al., 2006, 2012), the dependent variable (or grouping variable) is computed as the absence/presence of landslides in the mapping units and is usually derived from a landslide inventory. The independent variables (explanatory variables) are obtained from available thematic information (morphometry, land cover/use, lithology, etc.). Each model is executed in two phases: a training phase, where the model reconstructs the relationships between the dependent and the independent variables, and a validation phase, where these relationships are verified in different conditions. LAND-SE calculates landslide susceptibility with four single models (Rossi et al., 2010): (i) linear discriminant analysis (LDA) (Fisher, 1936; Brown, 1998; Venables and Ripley, 2002), (ii) quadratic discriminant analysis (QDA) (Venables and Ripley, 2002), (iii) logistic regression (LR) (Cox, 1958; Brown, 1998; Venables and Ripley, 2002), and (iv) neural network (NN) modelling (Ripley, 1996; Venables and Ripley, 2002). The logistic regression model was significantly improved with respect to Rossi et al. (2010) by replacing the previous code based on the “Zelig” package (Owen et al., 2013) with more stable and better-performing code based on the “glm” function, included in the well-tested base R implementation (R Core Team, 2015).

LAND-SE uses a combination model (CM) based on a logistic regression approach, where the grouping variable is the presence or absence of landslides in the mapping units, and the explanatory variables are the forecasts of the selected single susceptibility models (Rossi et al., 2010). As for the single logistic regression model, the original code based on the Zelig package was replaced with the glm function. LAND-SE allows the user to enable or disable the execution of the combined model and to select different combinations of single models.
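The idea of the combination model can be sketched as a logistic regression whose predictors are the single-model forecasts. The Python sketch below is only an illustration: LAND-SE fits the model with R's glm function, whereas here a plain batch gradient descent is swapped in to keep the example dependency-free, and the forecast values, learning rate, and number of iterations are invented for the example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def linear_predictor(beta, x):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def fit_logistic(X, y, learning_rate=0.5, n_iter=2000):
    """Fit a logistic regression by batch gradient descent on the log-loss.

    Returns [intercept, coefficient_1, ..., coefficient_p].
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(linear_predictor(beta, xi)) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b - learning_rate * g / n for b, g in zip(beta, grad)]
    return beta


def predict(beta, X):
    return [sigmoid(linear_predictor(beta, xi)) for xi in X]


# Hypothetical forecasts of three single models (e.g. LDA, QDA, LR) for
# eight mapping units, and the observed presence (1) / absence (0).
forecasts = [
    [0.90, 0.80, 0.85], [0.80, 0.70, 0.90], [0.70, 0.90, 0.60],
    [0.60, 0.45, 0.70], [0.40, 0.35, 0.30], [0.30, 0.20, 0.40],
    [0.20, 0.30, 0.10], [0.10, 0.10, 0.20],
]
presence = [1, 1, 1, 0, 1, 0, 0, 0]

beta = fit_logistic(forecasts, presence)
combined = predict(beta, forecasts)  # combined susceptibility per unit
```

Enabling or disabling a single model in the combination corresponds, in this sketch, to adding or dropping the matching column of `forecasts`.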

In the training phase, LAND-SE reconstructs the relationships between dependent and independent variables and evaluates the prediction skills of single and combined models (i.e. the capability to predict the original data). In the validation phase, LAND-SE verifies the results in different conditions and evaluates the models' capability to predict independent data. Model output of both phases is evaluated using the same tools, in particular the following statistical metrics and indices:

the dependence among explanatory variables (Belsley, 1991; Hendrickx, 2012);

contingency tables (i.e. confusion matrices) (Jolliffe and Stephenson, 2003);

contingency plots or fourfold plots summarising the mapping units correctly and incorrectly classified by the models (Jolliffe and Stephenson, 2003);

error maps showing the geographical distribution of the mapping units correctly and incorrectly classified by the models (Rossi et al., 2010);

plots showing receiver operating characteristic (ROC) curves (Green and Swets, 1966; Mason and Graham, 2002; Fawcett, 2006) and the associated area under curve (AUC) statistics;

evaluation plots showing the variation of the sensitivity (hit rate), the specificity (1 - false positive rate), and Cohen's kappa index (Cohen, 1960);

success and prediction rate curves (Chung and Fabbri, 1999, 2003).
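Several of the metrics listed above can be computed directly from the observed grouping variable and the modelled probabilities. The following Python sketch is an illustration only, not LAND-SE code; the 0.5 classification threshold, the function names, and the example values are assumptions.

```python
def contingency_table(observed, probability, threshold=0.5):
    """Return (TP, FP, FN, TN) for a given probability threshold."""
    tp = fp = fn = tn = 0
    for y, p in zip(observed, probability):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def sensitivity(tp, fp, fn, tn):
    """Hit rate: fraction of landslide units classified as unstable."""
    return tp / (tp + fn)


def specificity(tp, fp, fn, tn):
    """1 - false positive rate: fraction of stable units classified as stable."""
    return tn / (tn + fp)


def cohen_kappa(tp, fp, fn, tn):
    """Agreement between model and observations corrected for chance."""
    n = tp + fp + fn + tn
    agreement = (tp + tn) / n
    chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return (agreement - chance) / (1 - chance)


def roc_auc(observed, probability):
    """Area under the ROC curve, computed as the probability that a landslide
    unit receives a higher score than a stable unit (ties count one half)."""
    pos = [p for y, p in zip(observed, probability) if y == 1]
    neg = [p for y, p in zip(observed, probability) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical observations and modelled probabilities for eight units.
observed = [1, 1, 1, 1, 0, 0, 0, 0]
probability = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

table = contingency_table(observed, probability)
kappa = cohen_kappa(*table)
auc = roc_auc(observed, probability)
```

Repeating the first three functions over a range of thresholds gives the kind of evaluation plot described above, where sensitivity, specificity, and kappa vary with the probability threshold.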

For each single and combined model, LAND-SE evaluates and quantifies the
uncertainty adopting a “bootstrapping” approach. Bootstrapping is a
resampling technique for estimating the distributions of statistics based on
independent observations. Bootstrapping can refer to any test or metric that
relies on random sampling with replacement (Efron, 1979; Davison and
Hinkley, 2006). The technique has been widely used to estimate errors and
uncertainties associated with classification models (among others, Kuhn
and Kjell, 2013). In the training phase, a user-specified number of runs is
performed, varying the selected data set. Descriptive statistics for the
probability (susceptibility) estimates, including the mean (

The sampling procedure implemented in LAND-SE focuses only on the estimation of the uncertainty associated with the susceptibility zonation. However, the software also outputs estimates of the performance variability in the training and validation phases, providing confidence levels in the ROC plots (NCAR, 2014) and in the fourfold or contingency plots (Meyer et al., 2015). In addition, analyses that investigate the sensitivity or variability of model results when varying inputs (e.g. using sampling procedures) are facilitated by the LAND-SE command line interface, which makes these analyses possible using external procedures.
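The bootstrapping approach to uncertainty can be sketched as follows: the training set is resampled with replacement, the model is refitted at every run, and the spread of the resulting probabilities is summarised for each mapping unit. This Python sketch is an illustration under stated assumptions, not LAND-SE code: the stand-in model (landslide frequency per land use class) is not one of the LAND-SE models, and the choice of mean and standard deviation as summary statistics, the number of runs, and all names and values are hypothetical.

```python
import random
import statistics


def fit_class_frequency(units):
    """Stand-in model: landslide frequency observed in each land use class."""
    counts = {}
    for u in units:
        n, k = counts.get(u["landuse"], (0, 0))
        counts[u["landuse"]] = (n + 1, k + u["landslide"])
    overall = sum(u["landslide"] for u in units) / len(units)
    freq = {c: k / n for c, (n, k) in counts.items()}
    # Classes absent from the resampled set fall back to the overall frequency.
    return lambda u: freq.get(u["landuse"], overall)


def bootstrap_susceptibility(units, fit, n_runs=200, seed=0):
    """Return (mean, standard deviation) of the estimates for each unit."""
    rng = random.Random(seed)
    estimates = [[] for _ in units]
    for _ in range(n_runs):
        sample = [rng.choice(units) for _ in units]  # sampling with replacement
        model = fit(sample)
        for i, u in enumerate(units):
            estimates[i].append(model(u))
    return [(statistics.mean(e), statistics.stdev(e)) for e in estimates]


# Sixteen hypothetical slope units: bare soil is mostly unstable, forest stable.
units = (
    [{"landuse": "bare", "landslide": 1}] * 6
    + [{"landuse": "bare", "landslide": 0}] * 2
    + [{"landuse": "forest", "landslide": 1}] * 1
    + [{"landuse": "forest", "landslide": 0}] * 7
)
summary = bootstrap_susceptibility(units, fit_class_frequency)
```

Plotting the standard deviation against the mean for every mapping unit gives one possible form of uncertainty plot; the same loop works unchanged with any `fit` function that returns a predictor.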

LAND-SE can be executed in two different modes: the

To show LAND-SE functionalities and output types, we use as an example the
landslide susceptibility modelling and zonation from two articles
published by Reichenbach et al. (2014, 2015). In the selected area, we
perform the following analyses using different configurations:

polygon-based landslide susceptibility zonation

pixel-based landslide susceptibility zonation

landslide susceptibility scenarios zonation.

Shaded relief of the study area located in the Briga catchment, along the Ionian coast of Sicily (Italy). Red polygons show landslides triggered by the 1 October 2009 rainfall event.

A small area was selected to show applications and output of LAND-SE. The
area is located in the eastern portion of the Briga catchment (Fig. 2), in
the Messina province (Sicily, southern Italy). The elevation ranges from
sea level to about 500 m and the terrain gradient from 0 to
80

After the event, a detailed landslide inventory map at 1 : 10 000 scale was prepared for the entire Briga catchment (Ardizzone et al., 2012). The inventory was obtained through a combination of field surveys carried out from October to November 2009 and visual interpretation of pre-event and post-event stereoscopic and pseudo-stereoscopic aerial photographs. The inventory map shows the distribution and types of landslides triggered by the 1 October 2009 rainfall event (Fig. 2), and the distribution and types of pre-existing landslides. In addition, two maps reporting the land use in different periods were prepared using available aerial photographs and very high resolution (VHR) satellite imagery (Reichenbach et al., 2014, 2015). The first map was derived from the analysis of the same black and white aerial photographs used to map pre-event landslides. The second map was obtained from the analysis of two QuickBird satellite images, the first taken on 2 September 2006 and the second on 8 October 2009 (Mondini et al., 2011).

In the area, landslide susceptibility zonation was prepared using two mapping
units: pixels and slope units. Slope units (SU) are terrain subdivisions
bounded by drainage and divide lines (Carrara et al., 1991). SU were outlined
using a 5 m resolution DEM (digital elevation model) obtained by resampling the
VHR DEM provided by the Italian national Department for Civil Protection and
using

For the pixel-based analysis, we used the VHR DEM (1 m

Landslide susceptibility maps (CM) for the training
data set

This example illustrates a landslide susceptibility zonation prepared using the slope unit as mapping unit. Two spatial criteria were used to define the training and validation data sets, the first based on a random selection and the second on the subdivision of the entire catchment into two contiguous areas (north and south).

In the first case, the training set contained 70 % of the total slope units and the validation set corresponded to the entire basin. Landslide susceptibility models were trained using a subset of the available data and the results were applied in validation to the entire study area. Figure 3 shows the main graphical and geographical output obtained during the training and validation phases, including susceptibility, error, and uncertainty maps, fourfold (contingency) plot, success and prediction rate curves, ROC plot, and evaluation and uncertainty plots. For simplicity, the figure shows only the results of the combined model, but output for each single model is available and can be exploited for further analysis. In the example, the random selection criterion resulted in similar training and validation performances (Fig. 3). This application simulates LS zonation for a large territory where landslide information is patchy and does not cover the entire study area. In such conditions, training cannot be performed on the entire area, and a random selection of the training data set within the surveyed area is a reasonable solution.

Landslide susceptibility maps (CM) for the training data set
(

Pixel-based landslide susceptibility map (CM) of the test
area

In the second case, the SU located north of the main river of the Briga catchment were used as the training set and the SU located in the southern portion as the validation set. Figure 4 shows the output, including susceptibility maps for the combined model, success and prediction rate curves, and ROC plots. As shown in Fig. 4, the spatial subdivision resulted in good model skill in the training phase but reduced validation performance, indicating poor spatial transferability (von Ruette et al., 2011; Petschko et al., 2014) of the model (i.e. poor applicability of the resulting model coefficients to different study areas). This type of application simulates LS zonation for areas where the landslide information required to train the model is available only for a portion of the area. Results obtained in the training phase are then applied to estimate susceptibility in the portion of the territory where landslide data are not available. This application can be useful to evaluate the possibility of using the same model output in different portions of the territory or in different areas.
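The comparison between success and prediction rate curves used above to judge transferability can be sketched as follows. This Python sketch is an illustration, not LAND-SE code; it assumes the usual construction in which mapping units are ranked from most to least susceptible and the cumulative landslide area is plotted against the cumulative study area. The function names and all values are hypothetical.

```python
def rate_curve(susceptibility, landslide_area, unit_area):
    """Cumulative fraction of landslide area versus cumulative fraction of
    study area, with mapping units ranked from most to least susceptible."""
    order = sorted(
        range(len(susceptibility)),
        key=lambda i: susceptibility[i],
        reverse=True,
    )
    total_area = sum(unit_area)
    total_landslide = sum(landslide_area)
    x, y = [0.0], [0.0]
    cum_area = cum_landslide = 0.0
    for i in order:
        cum_area += unit_area[i]
        cum_landslide += landslide_area[i]
        x.append(cum_area / total_area)
        y.append(cum_landslide / total_landslide)
    return x, y


def area_under_curve(x, y):
    """Trapezoidal area under a rate curve (0.5 means no better than chance)."""
    return sum((x[i + 1] - x[i]) * (y[i + 1] + y[i]) / 2 for i in range(len(x) - 1))


# Four hypothetical slope units of equal area, with landslide areas taken
# from the inventory used in training and from the one used in validation.
susceptibility = [0.9, 0.7, 0.4, 0.1]
unit_area = [1, 1, 1, 1]
training_landslides = [6, 3, 1, 0]
validation_landslides = [2, 3, 3, 2]

success_x, success_y = rate_curve(susceptibility, training_landslides, unit_area)
prediction_x, prediction_y = rate_curve(susceptibility, validation_landslides, unit_area)
success_area = area_under_curve(success_x, success_y)
prediction_area = area_under_curve(prediction_x, prediction_y)
```

In this invented example the success rate curve encloses a larger area than the prediction rate curve, which is the pattern the text associates with a model that fits the training data well but transfers poorly.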

This example shows a landslide susceptibility zonation prepared using the pixel as mapping unit. A random selection was chosen to prepare the training set, and the validation was performed by applying the results to the entire study area. For this purpose, the training set included all the pixels corresponding to landslides and an equal number of pixels in stable areas. Figure 5 shows the main output of the combined model prepared for the entire area during the validation phase, including susceptibility, error, and uncertainty maps, fourfold (contingency) plot, prediction rate curve, ROC plot, and evaluation and uncertainty plots.
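The balanced selection of training pixels described above can be sketched as follows. This is an illustrative Python sketch, not the procedure shipped with LAND-SE; the function name, the `landslide` field, and the example grid are assumptions.

```python
import random


def balanced_sample(pixels, seed=0):
    """Keep every landslide pixel and draw, without replacement, an equal
    number of pixels from the stable areas."""
    rng = random.Random(seed)
    unstable = [p for p in pixels if p["landslide"] == 1]
    stable = [p for p in pixels if p["landslide"] == 0]
    if len(stable) < len(unstable):
        raise ValueError("not enough stable pixels to match the landslide pixels")
    return unstable + rng.sample(stable, len(unstable))


# One hundred hypothetical pixels, ten of which fall inside landslides.
pixels = [{"id": i, "landslide": int(i < 10)} for i in range(100)]
training = balanced_sample(pixels, seed=1)
```

The validation step then applies the trained model to all pixels, so only the training table is reduced in size.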

This example simulates a common and widespread susceptibility zonation approach that exploits pixel-based analysis at basin and regional scale. In such conditions, reasonable calculation times can be reached by training the model with a randomly selected subset and applying the results to the entire study area. Dealing with a large data set, we found that training the models using reduced (randomly selected) samples only slightly affects the susceptibility model results and performance, with a minor increase in the model uncertainty. As shown in Fig. 5, although the training was performed with a subset of the data, the model performance for the entire study area is adequate and acceptable.

This example illustrates how LAND-SE can be used to evaluate the impact of different land use scenarios on landslide susceptibility zonation (Reichenbach et al., 2014, 2015) by comparing the distribution of stable/unstable slope units and the success rate curves. The effects of the current, past, and possible future land use distributions on landslide susceptibility classes were evaluated. Single models (linear discriminant analysis, quadratic discriminant analysis, and logistic regression) and a combined model were prepared, using the 2009 landslide events as grouping variable and morphological and land use classes as explanatory variables.

To evaluate the influence of land use change on landslide susceptibility zonation, results obtained with the 2009 land use map were applied using the 1954 land use distribution. Figure 6 portrays (on the left) the combined model prepared using the 2009 land use map, and (on the right) the zonation obtained by applying the results to the 1954 land use cover. Zonation maps obtained with the same models but using the 1954 land use map show a significant reduction in the number of unstable SU. Success rate curves reveal a decrease in the model fitting performance when using the 1954 land use map, due to a reduction of slope units classified as unstable and an increase in stable terrain. In particular, the expansion of bare soil to the detriment of forested areas, in the 56 years from 1954 to 2009, determined a general increase in the susceptibility.

Moreover, to estimate the effect of land use distribution, we designed
different scenarios obtained by changing the 2009 land use cover using a
heuristic and empirical approach. Assuming an increase in the forested
areas, we considered three types of changes computed at the slope unit
scale, resulting in the following scenarios: (i) 75 % decrease in the
pasture extent (scenario 1); (ii) 75 % reduction of both pasture and
cultivated areas (scenario 2); and (iii) 75 % decrease in bare soil where
the slope-unit mean angle was greater than 15

A recent review of landslide statistical models revealed a large variety of model types but a significant scarcity of complete and comprehensive evaluations of model performance and prediction skill (Malamud et al., 2014). Moreover, assessment of the input data quality (Ardizzone et al., 2002), discussion of the scale of applicability, and quantification of the errors and uncertainty associated with the models are limited. In recent years there has been an increased number of commercial and open-source packages for statistical analysis that integrate geographical data and/or Open Source GIS, but software dedicated to landslide susceptibility zonation using statistical models is not available.

LAND-SE is open-source software that performs LS modelling, zonation, results evaluation, and associated uncertainty estimation, using graphs, maps, and statistical metrics, and thereby fills a gap in the large variety of statistical tools already available. LAND-SE is mainly designed to evaluate landslide susceptibility from basin (medium) to regional scale (small to very small scale). The quality, as well as the significance, of model output is highly related to the scale, accuracy, and resolution of the landslide and environmental input data. In the field of landslide susceptibility zonation, LAND-SE is designed to be properly and productively used by experienced geomorphologists. Experienced practitioners are expected to use the code with the support of experts in the field of environmental planning and management for an accurate and reliable interpretation and exploitation of the results. A proper LAND-SE execution requires (i) a basic knowledge of the R language to run the script; (ii) experience with multivariate statistical models and with their evaluation skills/metrics (ROC plot, contingency table and plots, success/prediction rate curves, etc.); (iii) GIS skills to prepare and handle input data; and (iv) specific expertise for an accurate and reliable interpretation of the results. All the modelling types implemented in LAND-SE are essentially statistical classification techniques applicable to any multivariate analysis with a binary grouping (dependent or response) variable. This makes the code flexible and appropriate for use in other scientific fields, with minor customization and tailoring, by users with different expertise.

We think further improvements may include additional models (e.g. forest tree analysis), tools for input data preparation, and tools for the visualization of results now available only in textual format (e.g. collinearity tests, number of significant variables). Moreover, the software can be applied and customized to different applications, providing users with the possibility to extend and improve the code with additional models, evaluation tools, or output types. LAND-SE can also be used to prepare models to predict particular types of slope movements (e.g. debris flow source areas, Carrara et al., 2008) or can be customized to evaluate the probability of spatial occurrence of completely different natural phenomena.

The LAND-SE code is provided as a Supplement together with

the software user guide (LAND-SE_UserGuide_v3_23sept2016.pdf);

data sets containing the software script (LAND-SE_v1r0b30_20160118.R), the configuration files (LAND-SE_configuration_spatial_data.txt, LAND-SE_configuration.txt), and input files (training.txt, training.shp, validation.txt, validation.shp) relative to three example applications: (i) polygon-based landslide susceptibility zonation with a random selection of the training data set and a validation on a larger area; (ii) polygon-based landslide susceptibility zonation with training and validation performed in two different contiguous areas; (iii) pixel-based landslide susceptibility zonation with a random selection of the training data set and a validation on a larger area.

© Mauro Rossi. LAND-SE is free software; it can be redistributed or modified under the terms of the GNU General Public License (either version 2 of the license, or any later version) as published by the Free Software Foundation. The program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See the GNU General Public License for more details.

The LAND-SE code, the software user guide and three example data sets are
available at

The implementation and improvement of LAND-SE with respect to the version published by Rossi et al. (2010) was supported by the FP7 LAMPRE Project (Landslide Modelling and Tools for vulnerability assessment preparedness and recovery management, EC contract no. 31238).

Edited by: L. Gross
Reviewed by: S. Nasiri and two anonymous referees