1 Introduction

Unexpected environmental events such as tsunamis, floods, earthquakes and landslides that cause significant environmental and human loss on the surface of the earth are called natural disasters [48]. Landslides are among the most dangerous and costly natural disasters; a landslide is a form of mass wasting, also known as a landslip or mudslide, that covers a wide range of ground movements [99]. Gravity is the main driving force behind landslide occurrences [56]. Landslides are estimated to have constituted almost 9 percent of worldwide natural disasters during the 1990s [35]. Almost 600 people are killed every year throughout the world as a result of slope instability [100, 102]. People and property such as buildings, roads, communication networks, houses, agricultural land and forest are destroyed every year by landslides; thus, a huge amount of money is spent globally to mitigate the destruction they cause [14].

The Himalayan region frequently faces different types of natural hazards, and landslides are among the most prominent, damaging property, agriculture and human lives [9]. The area selected for the present study has also suffered considerable damage from landslides triggered by heavy rainfall; thus, the area is a suitable site for evaluating the frequency and distribution of landslides [28]. Rai et al. [78] stated that more than 20,000 landslides were recorded in one day. According to Basu and Pal [9], several landslides have been induced in different parts of Darjeeling Himalaya by intensive rainfall, earthquakes and the expansion of anthropogenic activities such as road construction, building construction and resort development. In 2015, more than 38 people were killed and 500 people were displaced by landslides during the monsoon season in this area [92]. Thus, landslide mitigation and susceptibility studies have become one of the most needed fields of study in this region.

There are several quantitative and qualitative methods for preparing landslide susceptibility maps [3]. Quantitative approaches used by a number of researchers include bivariate regression analysis [11, 51], multivariate regression analysis [62, 75, 74], logistic regression analysis [73, 2, 32], fuzzy logic [71] and artificial neural networks [57, 72]. Many researchers have worked with more than one model and compared them to find out which is most accurate [9]. Recently, machine learning (ML) techniques have become popular in the spatial prediction of natural hazards such as wildfire [45], sinkholes [91], groundwater and flood [1, 12, 16, 17, 40, 49, 50, 59, 76, 77, 82, 93], drought [80], gully erosion [7, 98], earthquakes [4], land/ground subsidence [96] and landslides [68, 70, 79, 87, 67, 97]. ML is a subfield of artificial intelligence (AI) that uses computational techniques to analyze and forecast information by learning from training data. ML algorithms that have been used for landslide prediction include support vector machine [20, 30, 47, 69, 95], artificial neural network [31, 86], decision trees such as naïve Bayes tree (NBT) [22, 85], radial basis function (RBF) [38], kernel logistic regression (KLR) [13, 21], Bayes’ net (BN) [19], bivariate statistical index (SI) [23], stochastic gradient descent (SGD) [94], particle swarm optimization (PSO) [18], best-first decision tree (BFDT) [24], random subspace-based support vector machines (RSSVM) [39] and logistic model tree (LMT) [20]. Ensemble models have been used in landslide susceptibility mapping due to their novelty and their ability to comprehensively assess landslide-related parameters for discrete classes of independent factors [15, 16, 25, 44, 49, 63, 67, 87, 93]. In the present study, frequency ratio (FR) and logistic regression (LR) models have been applied to obtain landslide susceptibility maps (spatial prediction) of the area using ArcGIS software (version 10.2).
The frequency ratio model is useful for analyzing slope instability and is regarded as one of the best quantitative approaches [43]. Literature reviews show that the logistic regression model is about as accurate as support vector machines, classification trees and likelihood ratios [29, 83]. The logistic regression model is one of the most effective mathematical methods for finding the relationship between landslide causative factors and landslide locations [8, 28].

Landslide susceptibility mapping is important for ensuring the safety of human life and mitigating the negative impact on the regional as well as the national economy of a country [48]. Such a map helps government agencies, policymakers and planners to reduce the damage that a landslide incident can cause. The main objective of this study is to locate probable landslide susceptible areas within the basin using the FR and LR models and to compare their results using the ROC curve to identify the more suitable of the two models.

2 Study area

The Relli Khola, or Relli River, is a small Himalayan river in the Indian state of West Bengal that flows through Kalimpong district. The river originates between the Alagara and Lava forest ranges at an elevation of 2400 meters, at a place known as Tiffin Dara, and joins the Teesta River as one of its left bank tributaries. The length of the river is almost 27.38 km. A small village known as Relli is situated on its bank, and a fair is held there annually during Makar Sankranti (January 14). The basin extends from 26°58′2″ to 27°5′31″N and from 88°26′32″ to 88°39′14″E and covers about 170.61 sq km. Being a part of the Himalayan region, the area is characterized by intensive rainfall (200–250 cm annually) and a moderate temperature. The basin is situated over two geological units: from top to bottom, these are Darjeeling Gneiss and a unit of slate, schists and quartzite. The basin is known for the natural beauty of its surroundings, and its culture is diverse, as people from different parts live in the area. The area is not safe from natural hazards, however; earthquakes and landslides are common phenomena in the basin. Because much of the basin is forested, it suffers relatively few landslide incidents, but where land is open and deforestation has taken place, landslides damage the area. Landslides have mainly been observed in the lower course of the river, in Komesi forest, Suruk Khasmahal and Mezok forest, but because these places are sparsely inhabited, the damage is not large. The portion close to Kalimpong town is highly settled and has a gentle uniform slope; thus, no noticeable landslides are found in this portion (Fig. 1).

Fig. 1 Location map of the study area

3 Data and methods

A digital elevation model (DEM) with a resolution of 30 m × 30 m has been extracted from ASTER GDEM data of October 2011, downloaded from USGS Earth Explorer on January 18, 2018. The DEM has been used to derive slope, aspect, relief parameters (relative relief, maximum relief, dissection index, TWI, stream power index, etc.) and drainage parameters such as drainage density, drainage frequency, stream junction frequency, stream junction angle and infiltration number of the study area using ArcGIS 10.2 software. To apply the models, 20 parameters have been selected: (a) drainage density, (b) drainage texture, (c) infiltration number, (d) stream frequency, (e) stream junction frequency, (f) stream power, (g) lithology, (h) soil, (i) relative relief, (j) slope, (k) maximum relief, (l) drainage intensity, (m) ruggedness number, (n) rainfall, (o) dissection index, (p) aspect, (q) relief class, (r) distance from stream, (s) TWI and (t) land use land cover (lulc). These factors have been further grouped into five categories: (1) drainage factors, (2) relief factors, (3) hydrological factors, (4) lithological factors and (5) triggering factors. The methodology for each parameter is described in detail in the following sections (Table 1).

Table 1 Database of the current study

3.1 Drainage factors

3.1.1 Drainage density (Dd)

Drainage density is the length of streams per unit area of a river basin. Landslides are prominent in areas where drainage density is high and the soil layer is thin [65]. The drainage density of the Relli Khola river basin ranges from 0.13 to 5.84 km/sq km (Fig. 2a). The formula is given below [41].

$${\text{Dd}} = \frac{{L\mu }}{A}$$
(1)

where Dd is drainage density, \(L\mu\) is the total length of streams and A is the total area. The grid method has been used to calculate the drainage density across the basin.

Fig. 2 Raster layers of drainage parameters: a drainage density, b stream frequency, c drainage intensity, d drainage texture, e stream junction frequency, f infiltration number

3.1.2 Stream frequency (Fs)

Stream frequency is one of the important factors in measuring landslide susceptibility. Stream frequency (Fs) is the number of streams per unit area of the basin [41]. The stream frequency of the basin ranges from 0 to 20 streams/sq km (Fig. 2b). A value close to 0 means less diversity of slope and lower landslide susceptibility, and a higher value means high diversity of slope and a high probability of landslides. Stream frequency is calculated in the following way [41].

$${\text{Fs}} = \frac{{N\mu }}{A}$$
(2)

where Fs is stream frequency, \(N\mu\) is the total number of streams and A is the total area of the basin or region adopted for the present study.

3.1.3 Drainage intensity (Id)

Drainage intensity (Id) denotes the ratio between stream frequency (Fs) and drainage density (Dd) [33]. The value ranges from 0.26 to 8.86 (Fig. 2c). High drainage intensity indicates a high probability of landslide, and low drainage intensity indicates less probability of landslide. Faniran [33] calculated drainage intensity in the following way.

$${\text{Id}} = \frac{{\text{Fs}}}{\text{Dd}}$$
(3)

where Id is drainage intensity, Fs is the stream frequency and Dd is the drainage density.

3.1.4 Drainage texture (T)

Drainage texture is also an important morphometric factor that indicates the relative spacing of streams per unit length. Drainage texture is the ratio between the number of streams and the length of the perimeter of the basin [42]. The value of drainage texture ranges from 0 to 6 streams per km (Fig. 2d). Horton [42] gave the following formula for calculating drainage texture.

$$T = \frac{{N\mu }}{P}$$
(4)

where T is the drainage texture of the basin, \(N\mu\) is the number of streams and P is the perimeter of the basin.

3.1.5 Stream junction frequency

Stream junction frequency is the number of stream junction points within a unit area of a drainage basin. Being a part of the source region, i.e., a mountainous region, the river has many stream junction points throughout the basin. Stream junction frequency indirectly indicates slope instability, because a break of slope occurs where two or more streams join at a single point. The value ranges from 0 to 10 stream junctions/sq km (Fig. 2e). Stream junction frequency is calculated using the following formula.

$${\text{Fsj}} = \frac{{fj}}{A}$$
(5)

where Fsj is the frequency of stream junctions, fj is the number of stream junction points and A is the area of the basin.

3.1.6 Infiltration number (If)

Faniran [33] also defines the infiltration number (If) as the product of stream frequency (Fs) and drainage density (Dd). The infiltration number of the basin ranges from 0 to 176.19 (Fig. 2f). A value close to 0 indicates high infiltration and low surface runoff, and a higher value indicates the opposite, i.e., low infiltration and high surface runoff. The infiltration number is calculated in the following way [33].

$${\text{If}} = {\text{Fs}} \times {\text{Dd}}$$
(6)

where If is the infiltration number, Fs is the stream frequency and Dd is drainage density.
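As a minimal sketch of Eqs. 1, 2, 3 and 6, the following Python function derives the four drainage indices for a single grid cell. The function name, the dictionary keys and the sample values are hypothetical and are not taken from the study; returning no value for drainage intensity in a cell without streams is likewise an assumption of this sketch.

```python
def drainage_metrics(stream_length_km, stream_count, area_sq_km):
    """Return Dd, Fs, Id and If for one grid cell (Eqs. 1, 2, 3 and 6)."""
    if area_sq_km <= 0:
        raise ValueError("area must be positive")
    dd = stream_length_km / area_sq_km   # Eq. 1: drainage density (km/sq km)
    fs = stream_count / area_sq_km       # Eq. 2: stream frequency (streams/sq km)
    # Eq. 3 is undefined where no stream crosses the cell (Dd = 0);
    # returning None in that case is an assumption of this sketch.
    intensity = fs / dd if dd > 0 else None
    infiltration = fs * dd               # Eq. 6: infiltration number
    return {"Dd": dd, "Fs": fs, "Id": intensity, "If": infiltration}


# Hypothetical 1 sq km cell crossed by 4 streams with a total length of 2.5 km.
cell = drainage_metrics(stream_length_km=2.5, stream_count=4, area_sq_km=1.0)
```

Applying the function to every cell of the grid would give the per-cell values from which raster layers such as those in Fig. 2 can be interpolated.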

3.2 Relief parameter

3.2.1 Slope

Slope is one of the most dominant factors of landslide occurrences [27, 6, 36]. Occurrences of landslides are directly affected by the slope angle of an area [53]. ASTER GDEM data have been used to prepare the slope map of the basin, in degrees, at 30 m resolution in ArcGIS 10.2. The slope of the basin ranges from 0 to 67.48 degrees (Fig. 3a).

Fig. 3 Raster layers of relief factors: a slope, b aspect, c dissection index, d distance from river, e maximum relief, f relative relief, g relief and h ruggedness number

3.2.2 Aspect

Aspect describes the direction in which a slope faces. It is also an important landslide factor, because aspect-related parameters such as exposure to sunlight, drying winds, rainfall (degree of saturation) and discontinuities also influence landslides [52]. The aspect map of the basin has been prepared from ASTER GDEM data with 30 m resolution in ArcGIS 10.2. Figure 3b shows the aspect map of the basin.

3.2.3 Dissection index

Dissection index is one of the most important factors for understanding relief; it is defined as the ratio between relative relief and absolute relief [64]. The DI of the area ranges from 0.06 to 0.58 (Fig. 3c). The lower portion of the basin is highly dissected, whereas the upper portion is less dissected, which indicates that relative relief is low in the upper portion and high in the lower portion of the basin. It is calculated using the following formula [64].

$${\text{DI}} = \frac{{Rr}}{Ar}$$
(7)

where DI is the dissection index of the basin, Rr is relative relief and Ar is the absolute relief of the basin. In general, DI can range from 0 to 1, where 0 means a complete absence of dissection and 1 means a vertical cliff.

3.2.4 Distance from stream

Distance from stream has been measured using ArcGIS 10.2 software. It is also an important factor for landslides. Areas close to streams receive water from them, which weakens the rock or soil and makes it prone to erosion or sliding. Therefore, landslides are more likely close to streams. The map shows that the distance from the stream in the basin ranges from 0 to 704.96 m (Fig. 3d).

3.2.5 Maximum relief

Maximum relief is simply defined as the highest altitude of an area. It is also known as absolute relief. The map has been prepared using the grid method in ArcGIS 10.2: the maximum relief of each grid cell has been used to prepare the maximum relief map with the IDW (inverse distance weighted) method. The value of maximum relief ranges from 286 to 2378 m (Fig. 3e). The source region has the highest maximum relief in the basin.

3.2.6 Relative relief

Relative relief is another way of representing the slope of a terrain. It is the difference between the highest altitude and the lowest altitude. Therefore, a zone of high relative relief is more likely to experience landslides. This map has also been prepared using the grid method and the IDW technique in ArcGIS. The maximum and minimum values of relative relief of the basin are 489 and 88 meters, respectively (Fig. 3f). Smith [88] gave the following formula for calculating relative relief.

$${\text{RR}} = H - h$$
(8)

where RR is relative relief, H is the highest altitude and h is the lowest altitude of the basin.

3.2.7 Relief class

Relief classes have been derived from the DEM classification of the basin, mainly to understand which range of relief dominates the areas of high landslide occurrence. The minimum and maximum reliefs of the basin are 179 and 2378 meters, respectively (Fig. 3g).

3.2.8 Ruggedness number (Nr)

Ruggedness number is a unitless value as both relative relief and drainage density are expressed in the same units and help to combine the slope steepness and [90]. The value of the ruggedness number of the basin ranges from 0 to 1.88 (Fig. 3h). The ruggedness number is calculated by using the following formula.

$$Nr = \frac{{{\text{Dd}} \times Rr}}{K(1000)}$$
(9)

where Nr is the ruggedness number, Dd is drainage density, Rr is relative relief and K is a conversion constant (5280 when relative relief is expressed in feet and drainage density in miles/sq mile, and 1000 when relative relief is expressed in meters and drainage density in km/sq km).
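The two ratio-type relief indices (Eqs. 7 and 9) can be illustrated with a short Python sketch. It assumes relief in meters and drainage density in km/sq km, so that a conversion constant of 1000 makes the ruggedness number dimensionless; the function names and the sample values are hypothetical and are not taken from the study.

```python
def dissection_index(relative_relief_m, absolute_relief_m):
    """Eq. 7: ratio of relative relief to absolute relief (0 to 1)."""
    if absolute_relief_m <= 0:
        raise ValueError("absolute relief must be positive")
    return relative_relief_m / absolute_relief_m


def ruggedness_number(dd_km_per_sq_km, relative_relief_m, k=1000.0):
    """Eq. 9: drainage density times relative relief, divided by K.

    k=1000 assumes relief in meters and drainage density in km/sq km.
    """
    return dd_km_per_sq_km * relative_relief_m / k


# Hypothetical grid cell: 400 m relative relief, 2000 m absolute relief,
# drainage density of 3 km/sq km.
di = dissection_index(400.0, 2000.0)
nr = ruggedness_number(3.0, 400.0)
```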

3.3 Hydrological parameters

3.3.1 Stream power index (SPI)

Stream power index is a measure of the power of a stream to move sediment; it thus expresses the potential of flowing water to perform geomorphic work such as incising, widening or aggrading a channel. If discharge or slope increases, stream power increases proportionally. Stream power is low in flat areas and high in rugged topography. The stream power index is calculated with the following equation (Fig. 4a) [61].

$${\text{SPI}} = {\text{As}} \times \tan \beta$$
(10)

where As is the specific catchment area (sq m/m) and β is the slope gradient.

Fig. 4 Raster layers of hydrologic parameters: a stream power index (SPI), b topographic wetness index (TWI)

3.3.2 Topographic wetness index (TWI)

Beven and Kirkby [10] developed the topographic wetness index (TWI) which is commonly used to measure and quantify the topographic control on hydrological processes [89]. If the moisture in the soil is high, the strength of soil will decrease and this enhances landslides. Wilson and Gallant [101] defined TWI in the following way (Fig. 4b).

$${\text{TWI}} = \ln \left( {\frac{{\text{As}}}{\tan \beta }} \right)$$
(11)

where As is the specific catchment area (sq m/m) and β is the slope gradient.
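A minimal Python sketch of the two hydrological indices follows. It takes the slope in degrees and, for TWI, uses the logarithmic form ln(As / tan β), which is the commonly used definition of the index. The function names, the sample values and the small minimum slope used to avoid division by zero on flat cells are assumptions of this sketch and are not taken from the study.

```python
import math


def stream_power_index(specific_catchment_area, slope_deg):
    """SPI (Eq. 10): specific catchment area times tan(slope)."""
    return specific_catchment_area * math.tan(math.radians(slope_deg))


def topographic_wetness_index(specific_catchment_area, slope_deg,
                              min_slope_deg=0.1):
    """TWI in its common logarithmic form ln(As / tan(slope)).

    Slopes flatter than min_slope_deg are raised to that value so that
    tan(slope) is never zero; this threshold is an assumption.
    """
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(specific_catchment_area / math.tan(beta))


# Hypothetical cell: specific catchment area of 100 sq m/m on a 45 degree slope.
spi = stream_power_index(100.0, 45.0)
twi = topographic_wetness_index(100.0, 45.0)
```

For the same catchment area, a gentler slope lowers SPI and raises TWI, which matches the text: flat areas have low stream power but tend to be wetter.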

3.4 Lithological factors

3.4.1 Lithology

The Himalayan mountain region lies in the convergence zone of two plates, (a) the Indian Plate and (b) the Eurasian Plate. Therefore, small and medium earthquakes have been occurring over the years. Landslides are also related to earthquakes: an earthquake can increase the rate and dimension of landslides. The lithological unit map has been collected from the Geological Survey of India, and the units have been grouped into two classes according to their character and lithological age (Fig. 5a).

Fig. 5 Lithological parameters: a lithology and b soil

3.4.2 Soil

Five soil classes are observed in the basin (Fig. 5b). Soil influences the land use pattern. Shallow soils on steep slopes are affected most by landslides [84]. The soil map has been collected from the National Bureau of Soil Survey and Land Use Planning (NBSS&LUP), Kolkata.

Fig. 6 Raster layers of triggering factors: a rainfall and b land use land cover

3.5 Triggering factors

3.5.1 Rainfall

Rainfall is the most important triggering factor of landslides. In Darjeeling Himalaya, intensive rainfall (almost 300 to 350 cm annually) is recorded every year. The duration of rainfall is also important for landslides, which mainly occur during the monsoon season (July–August). The mean annual rainfall map of the study area has been prepared using TRMM (Tropical Rainfall Measuring Mission) data of the last 19 years (1998–2017), downloaded from the https://pmm.nasa.gov/data-access/downloads/trmm website. The thematic layer of rainfall has been prepared using IDW (inverse distance weighted) interpolation in a GIS platform. Figure 6a shows the annual rainfall map of the basin.

3.5.2 Land use land cover (lulc)

The upper layer of the earth’s crust is used by people for different purposes. Each land use land cover class has a different influence on landslides, e.g., forest can reduce the landslide rate, whereas open bare surfaces and built-up areas can increase it. Five land use classes have been identified based on supervised image classification (Fig. 6b). Landsat 8 OLI images with a resolution of 30 m × 30 m, downloaded from USGS Earth Explorer on January 18, 2018, have been used to extract the land use map using ArcGIS 10.2 software.

3.6 Landslide inventory map

Guzzetti et al. [37] stated that the landslide inventory map is an important part of landslide susceptibility, hazard and risk assessment. A landslide distribution map, or landslide inventory map (Fig. 7), has been prepared to determine the landslide affected area (%) and the frequency of landslides in each class of the possible landslide causing factors [60]. The landslide locations have been identified from Google Earth imagery and cross-checked through an extensive ten-day field survey (December 30, 2017, to January 8, 2018). In the current study, a total of 67 landslides have been identified. Of these, 47 (70%) have been used as a training data set and 20 (30%) as a validation data set. The landslide inventory map has then been prepared in the GIS environment to run the models and identify the probable landslide susceptible areas (Fig. 7). All the possible landslide causing factors have been combined with this landslide inventory map to understand the degree of importance of each factor [60].
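The 70/30 partition of the inventory can be sketched in Python as follows. The text does not state how the 47 training and 20 validation landslides were chosen, so the random shuffle, the fixed seed, the function name and the landslide identifiers 1 to 67 are all assumptions made only for illustration.

```python
import random


def split_inventory(landslide_ids, train_fraction=0.7, seed=42):
    """Shuffle landslide identifiers and split them into training and validation sets."""
    ids = list(landslide_ids)
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# 67 mapped landslides, identified here by hypothetical IDs 1..67.
train, validation = split_inventory(range(1, 68))
```

With 67 landslides and a fraction of 0.7, the split yields 47 training and 20 validation landslides, matching the counts given in the text.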

Fig. 7 Landslide inventory map of the study area

3.7 Frequency ratio model

The frequency ratio model is a well-accepted and popular quantitative approach for landslide susceptibility mapping [60]. Lee and Talib [55], Lee and Pradhan [54], Jadda et al. [46], Avinash and Ashamanjari [5] and Mondal and Maiti [60] successfully applied the frequency ratio model to prepare susceptibility maps. To obtain the frequency ratio of each class of all the data layers, the following equation has been applied.

$${\text{FR}} = \frac{{N_{{{\text{pix}}(S_{i} )}} / N_{{{\text{pix}}(N_{i} )}} }}{{\sum\nolimits_{i} {N_{{{\text{pix}}(S_{i} )}} } / \sum\nolimits_{i} {N_{{{\text{pix}}(N_{i} )}} } }}$$
(12)

where \(N_{{{\text{pix}}(S_{i} )}}\) is the number of landslide pixels in class i, \(N_{{{\text{pix}}(N_{i} )}}\) is the total number of pixels of class i in the watershed, \(\sum\nolimits_{i} {N_{{{\text{pix}}(S_{i} )}} }\) is the total number of landslide pixels and \(\sum\nolimits_{i} {N_{{{\text{pix}}(N_{i} )}} }\) is the total number of pixels in the watershed. Landslide pixels number 400 out of the 189,500 pixels (almost 170.61 sq km) in the watershed. Most of the landslides in the study area are rainfall-induced shallow landslides. A frequency ratio greater than 1 indicates a strong, positive relationship between landslide occurrence and the concerned class of the data layer, and hence a high landslide probability; a frequency ratio less than 1 indicates a weak or negative relationship and a low landslide probability, whereas a value of 1 indicates an average relationship. After calculating the frequency ratios, the frequency ratio rasters of all parameters have been summed to obtain the landslide susceptibility index value (LSIV) using the following equation.

$$LSIV = Fr_{1 } + Fr_{2} + \cdots + \, Fr_{n}$$
(13)

where LSIV is the landslide susceptibility index value, \(Fr_{1}\), \(Fr_{2}\), …, \(Fr_{n}\) are the frequency ratios of the raster data layers and n is the total number of factors in the study [20]. A higher value indicates higher landslide susceptibility, and a lower value indicates lower susceptibility.
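Equations 12 and 13 can be sketched in Python as follows. The class names, the per-class pixel counts and the two extra frequency ratios are invented for illustration; only the totals of 400 landslide pixels and 189,500 basin pixels come from the text. The sketch assumes that every class contains at least one pixel.

```python
def frequency_ratios(landslide_pixels, class_pixels):
    """Frequency ratio (Eq. 12) for every class of one factor.

    landslide_pixels: class name -> number of landslide pixels in that class
    class_pixels:     class name -> total number of pixels in that class
    """
    total_landslide = sum(landslide_pixels.values())
    total_pixels = sum(class_pixels.values())
    if total_landslide == 0 or total_pixels == 0:
        raise ValueError("need at least one landslide pixel and one pixel")
    overall_density = total_landslide / total_pixels
    return {
        name: (landslide_pixels.get(name, 0) / count) / overall_density
        for name, count in class_pixels.items()
    }


def susceptibility_index(fr_per_factor):
    """LSIV (Eq. 13): sum of the frequency ratios a pixel receives from each factor."""
    return sum(fr_per_factor)


# Hypothetical slope classes; the per-class split is invented.
slope_landslides = {"gentle": 50, "moderate": 150, "steep": 200}
slope_pixels = {"gentle": 100_000, "moderate": 60_000, "steep": 29_500}
slope_fr = frequency_ratios(slope_landslides, slope_pixels)

# LSIV of a pixel in the "steep" class that also receives hypothetical
# frequency ratios of 1.5 and 0.8 from two other factors.
pixel_lsiv = susceptibility_index([slope_fr["steep"], 1.5, 0.8])
```

In this example the steep class receives a frequency ratio above 1 and the gentle class one below 1, which is how the interpretation rule in the text separates landslide-prone classes from the rest.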

3.8 Logistic regression model

The logistic regression model, a form of multivariate analysis, models a dichotomous dependent variable coded as 1 or 0 (presence or absence) as a function of one or more independent variables [58]. The general purpose of the model is to find the best-fitting description of the relationship between the dependent variable (landslide occurrence) and many independent variables, e.g., slope, lulc, lithology, rainfall, etc. (Kavzoglu et al. 2013). Dai and Lee [27] argued that the advantage of the model is that the dependent variable can have only two values, presence (value 1) or absence (value 0). The logistic regression model is based on the generalized linear model and can be calculated by the following equation.

$$P = \frac{1}{{1 + e^{ - z} }}$$
(14)

where P is the probability of landslide occurrence and z is the linear model defined as follows.

$$z \, = \, b_{0} + \, b_{1} x_{1} + \, b_{2} x_{2} + \cdots + b_{n} x_{n}$$
(15)

where \(b_{0}\) is the intercept of the model, n is the number of independent variables, \(b_{1}\), \(b_{2}\), …, \(b_{n}\) are the coefficients and \(x_{1}\), \(x_{2}\), …, \(x_{n}\) are the landslide causing factors.

P, the probability of vulnerability, varies from 0 to 1. A value nearer to 1 indicates high vulnerability, and a value nearer to 0 indicates very low vulnerability. The entire logistic regression calculation has been done in SPSS software (Fig. 8).
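Equations 14 and 15 can be illustrated with the following Python sketch, which evaluates the probability for one pixel from an intercept, a list of coefficients and the matching factor values. The coefficients and factor values below are hypothetical and are not the fitted values of Table 4; the study itself performed the regression in SPSS. The two-branch form of the logistic function is a choice made here to avoid numerical overflow for large negative z.

```python
import math


def landslide_probability(intercept, coefficients, factor_values):
    """Eqs. 14 and 15: P = 1 / (1 + exp(-z)) with z = b0 + sum(b_i * x_i)."""
    if len(coefficients) != len(factor_values):
        raise ValueError("one coefficient is needed per factor value")
    z = intercept + sum(b * x for b, x in zip(coefficients, factor_values))
    # Both branches are algebraically equal to 1 / (1 + exp(-z));
    # the second one avoids overflow when z is a large negative number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


# Hypothetical model with two factors (e.g., slope in degrees and rainfall in cm);
# z evaluates to about 0 for these values, so P lies at the midpoint of its range.
p = landslide_probability(-2.0, [0.05, 0.004], [30.0, 125.0])
```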

Fig. 8 Landslide sites from Google Earth imagery: a near Suruk Khasmahal (26°59ʹ31.03ʹʹN, 88°28ʹ14.43ʹʹE), b Mezok forest (26°59ʹ36.23ʹʹN, 88°27ʹ28.89ʹʹE), c near the confluence with the Teesta River (27°00ʹ22.89ʹʹN, 88°27ʹ15.37ʹʹE); and from the field: d Dalapchan Slip Reserve Forest (27°04ʹ53.47ʹʹN, 88°32ʹ08.78ʹʹE), e near Komesi Forest (27°01ʹ37.43ʹʹN, 88°29ʹ05.67ʹʹE) and f Suruk Khasmahal (26°58ʹ51.62ʹʹN, 88°28ʹ25.30ʹʹE)

4 Results and analysis

FR and LR models have been developed to prepare two landslide vulnerability maps (Fig. 9) based on raster inputs, and each vulnerability map has been classified into five classes indicating varying intensity of vulnerability [66].

Fig. 9 Landslide susceptibility maps of the Relli Khola river basin: a frequency ratio model and b logistic regression model

4.1 Frequency ratio model

The susceptibility map based on the frequency ratio model has been classified into five classes, namely very high, high, moderate, low and very low, which represent 4.05%, 4.36%, 9.12%, 14.34% and 68.11% of the total basin area, respectively (see Table 2). The lower portion of the basin is mainly characterized as a high landslide-prone area (Fig. 9). Landslide-prone areas are mostly concentrated along the river channel of the basin.

Table 2 Area under different landslide vulnerable classes based on frequency ratio model and logistic regression model

The frequency ratios of the classes of each parameter help us to understand how strongly each subclass is associated with landslide occurrence. For example, the frequency ratios of river and barren land (subcategories of lulc) are 12.84 and 5.74, respectively, indicating that these zones are vulnerable to landslides (Table 3). The frequency ratio of the fourth class (5.56–8.17) of topographic wetness index (TWI) is high (2.04) compared to the other subclasses of this parameter (Table 3). Therefore, this zone is highly landslide prone and contributes more to landslide occurrences. Thus, the model helps us to understand the importance of each subclass.

Table 3 Class frequency ratios of selected parameters

Slope is a major landslide parameter: a steeper slope indicates a higher risk of landslides. Table 3 shows that the frequency ratio of the fifth slope subclass (38.37 to 67.48 degrees) is high (2.97). For maximum relief, the result is different, i.e., higher values have a low frequency ratio and lower values have a high frequency ratio (Table 3). It seems that maximum altitude alone is not responsible for landslides and that other parameters play a dominant role in the basin’s landslides. Relative relief agrees with the expectation that higher relative relief (rugged topography) has a high frequency ratio, i.e., a high probability of landslides (Table 3). Two classes of land use land cover (lulc), i.e., (a) river and water body and (b) barren land, are dominated by landslides, as the frequency ratios of these two subclasses are 12.84 and 5.74, respectively (Table 3). Vegetation cover is the lulc subclass least affected by landslides, which suggests that it hinders landslide incidents. The slate, schists and quartzite unit has a high frequency ratio (Table 3). Higher values of stream power index (SPI) indicate a high risk of landslide occurrence. The frequency ratio of the fifth class of ruggedness number is the highest (1.72), and it decreases gradually through the lower classes. It seems that high ruggedness number values play a significant role in landslide occurrences.

4.2 Logistic regression model

Logistic regression is a useful method for determining the magnitude of the correlation between landslide locations and causative factors [34]. Table 2 shows that 5.75 sq km (3.37%) and 1.86 sq km (1.09%) of the total area of the Relli Khola basin fall under the very high and high landslide susceptibility zones, respectively. This map shows almost the same pattern as the frequency ratio map: the lower portion of the basin is dominated by highly landslide-prone areas. Landslides depend on different factors (physiographic or anthropogenic); logistic regression is useful for predicting future landslide trends based on these factors and also helps to identify the most dominant factors of landslide occurrence [34]. Among the 20 parameters, drainage density, slope, aspect, lulc, soil and rainfall have been identified as the dominant factors of landslide occurrence (Table 4). The significance levels of these parameters are 95 to 100%. Steep slopes are responsible for slope failure: areas with steep slopes have rugged topography and a strong gravitational pull along the slope and hence are at high risk of landslides. Lulc, which describes the cover of the uppermost layer of the land surface, is also a major factor of landslide occurrence.

Table 4 Coefficient of logistic regression for different factors

Some land uses are more prone to sliding than others; bare or open land, for example, tends to have many landslides. On the other hand, forest-covered areas have a low tendency toward landslides. Rainfall can increase the rate of landslides, as it weakens the soil and detaches soil particles from one another.

Table 4 shows the logistic regression result calculated in SPSS software. The result shows that the rate of landslide occurrences is noticeably and positively determined by stream frequency (Wald = 0.028, Exp(B) = 7.067, df = 1), stream power (Wald = 0.001, Exp(B) = 1.001, df = 1), lithology (Wald = 0.604, Exp(B) = 1.830, df = 1), relative relief (Wald = 0.005, Exp(B) = 1.005, df = 1), slope (Wald = 0.048, Exp(B) = 1.049, df = 1), ruggedness number (Wald = 2.152, Exp(B) = 8.604, df = 1) and rainfall (Wald = 0.004, Exp(B) = 1.004, df = 1) (Table 4). For the categorical factors soil and aspect, some subclasses have a positive effect and some a negative effect, whereas lulc has a positive effect (Table 4). The other parameters and their logistic regression results are also shown in Table 4. This result indicates that landslide occurrence is not controlled by a single parameter; rather, it is the result of multiple factors.

5 Validation of FR and LR models

After a landslide susceptibility map has been prepared, validation is necessary; otherwise, the map has no use and no scientific importance [26]. A receiver operating characteristic (ROC) curve has been used to validate the models. The ROC curve measures goodness of fit through the area under the curve (AUC) [9]. The AUC value falls into one of five categories that express the accuracy level: excellent (0.90–1.00), good (0.80–0.90), fair (0.70–0.80), poor (0.60–0.70) and fail (0.50–0.60) [81].

Figure 10 shows the ROC curves of the landslide maps produced with the FR and LR models. The ROC curves of both models have been prepared using SPSS software. The AUC values of the FR and LR models are 0.814 and 0.751, respectively. The FR model thus shows a good accuracy level, with 81% of the area under the curve, and the LR model a fair accuracy level, with 75%. Thus, both models have considerable accuracy and can be used for further study.
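The validation step can be illustrated with a small Python sketch. It assumes that each validation location carries a susceptibility score and computes the AUC as the probability that a landslide location receives a higher score than a non-landslide location, which is the rank-based equivalent of the area under the ROC curve. The study itself used SPSS, so this is not the authors' procedure; the function names, the sample scores and the assignment of boundary values to the higher category are assumptions.

```python
def auc_from_scores(landslide_scores, non_landslide_scores):
    """AUC as the share of (landslide, non-landslide) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    if not landslide_scores or not non_landslide_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for pos in landslide_scores:
        for neg in non_landslide_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(landslide_scores) * len(non_landslide_scores))


def auc_category(auc):
    """Map an AUC value to the five accuracy categories quoted in the text."""
    if auc >= 0.90:
        return "excellent"
    if auc >= 0.80:
        return "good"
    if auc >= 0.70:
        return "fair"
    if auc >= 0.60:
        return "poor"
    return "fail"   # values below 0.60, including those below 0.50 (assumption)


# Hypothetical susceptibility scores at three landslide and three stable locations.
example_auc = auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])

fr_rating = auc_category(0.814)   # AUC reported for the FR model
lr_rating = auc_category(0.751)   # AUC reported for the LR model
```

Applied to the reported values, the lookup returns "good" for the FR model and "fair" for the LR model, in line with the interpretation in the text.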

Fig. 10 Validation of landslide susceptibility maps prepared with the frequency ratio (FR) and logistic regression (LR) models using the ROC curve

6 Conclusion

Logistic regression and frequency ratio models have been used in the present study of the landslide susceptibility of the Relli Khola river basin, a small tributary basin of the Teesta River in Darjeeling Himalaya. A total of 20 parameters have been identified to prepare landslide susceptibility maps of the basin. Of the total basin area, about 7–14 sq km falls under high landslide susceptibility zones. Both maps show that the high landslide susceptibility zones are located in the lower portion of the basin and along the river channel, where the soil is saturated and the surface is open to the sky and unprotected, whereas the upper portion of the basin is much less susceptible because it is covered with healthy forest that protects the soil from sliding. The ROC curves show that the susceptibility maps have satisfactory accuracy (Fig. 10). The resulting maps provide the spatial distribution of landslide susceptibility, but they cannot forecast the time, magnitude or frequency of landslide occurrences. Therefore, the government and responsible authorities should take steps to mitigate the problem and should ensure that development projects, markets, towns, roads, tourist places, etc., do not grow in the susceptible areas of the basin in the future. These maps will also be useful for planners and decision-makers in framing policies to limit the destruction caused by landslides.