An ecologically constrained procedure for sensitivity analysis of Artificial Neural Networks and other empirical models

Abstract

Sensitivity analysis applied to Artificial Neural Networks (ANNs) as well as to other types of empirical ecological models allows assessing the importance of environmental predictive variables in affecting species distribution or other target variables. However, approaches that only consider values of the environmental variables that are likely to be observed in real-world conditions, given the underlying ecological relationships with other variables, have not yet been proposed. Here, a constrained sensitivity analysis procedure is presented, which evaluates the importance of the environmental variables considering only their plausible changes, thereby exploring only ecologically meaningful scenarios. To demonstrate the procedure, we applied it to an ANN model predicting fish species richness, as identifying relationships between environmental variables and fish species occurrence in river ecosystems is a recurring topic in freshwater ecology. Results showed that several environmental variables played a less relevant role in driving the model output when the sensitivity analysis allowed them to vary only within an ecologically meaningful range of values, i.e. avoiding values that the model would never handle in its practical applications. By comparing percent changes in mean square error (MSE) between constrained and unconstrained sensitivity analysis, the relative importance of environmental variables was found to be different, with habitat descriptors and urbanization factors playing a more relevant role according to the constrained procedure. The ecologically constrained procedure can be applied to any sensitivity analysis method for ANNs, and it can also be applied to other types of empirical ecological models.

1. Introduction

Fish assemblage diversity in freshwater ecosystems constitutes a valuable natural resource in economic, scientific, cultural and educational terms [1]. Its conservation and management face threats such as overexploitation of inland waters, flow modification, water pollution, habitat degradation and invasion by exotic species [2], [3]. Identifying the relationships between fish species richness and habitat complexity at a local scale is one of the primary concerns in understanding how environmental descriptors actually affect fish biodiversity [4], [5], [6].

In this respect, the ecological variables that can be taken into account are often characterized by complex and non-linear dependencies [7]. Ecological models have been increasingly applied in the management and conservation of freshwater fish communities, especially to predict spatial patterns of fish occurrence [8], [9]. In particular, Artificial Neural Network (ANN) modeling has proved to be a valuable method for assessing whether predictable relationships between environmental descriptors and fish species richness exist in small stream environments [10], [11], [12].

While in the past ANNs were defined as “black boxes” since the computational processes taking place inside them are not easy to untangle, at present several methodologies have been developed to assess the contribution of each variable to the prediction process. For deeper elucidations, Olden et al. [13] provided a comprehensive review and comparison of these methodologies.

In particular, sensitivity analysis is the term used to define a collection of methods that evaluate how sensitive model output is to changes in the values of predictive variables [14]. In ecology, the main sensitivity analysis methods applied to ANNs can be classified into four categories: (i) the Lek’s profiles method [15], [16]; (ii) the Perturbation method [17], [18]; (iii) the Partial Derivatives method [19], [20], [21], [22]; (iv) the Weights method, developed by Garson [23] and then implemented by Olden & Jackson [24]. Lek’s profiles study each input variable by keeping all other parameters at fixed values, while in the Perturbation method each input variable is perturbed according to empirically established ranges while all others are kept untouched. The Partial Derivatives method involves small changes in each input variable and the evaluation of their relative contribution by computing the partial derivatives of the ANN output with respect to changes in the input. In the Weights method the connection weights of the ANN model are partitioned to evaluate the relative importance of each input variable and its positive or negative contribution to the model output. In the application of the first three methods, the values assigned to input variables can be devoid of real ecological meaning, i.e. they can be out of the range that is likely to be observed in real-world conditions. In these cases, environmental variables are forced to values that are only aimed at evaluating the model output, with no attention to the actual probability of recording those values given the (fixed) values of all the other variables. In fact, while the above-mentioned methods may provide valuable information about the way the “black-box” model works, the role of ecological relationships in constraining the multidimensional space where meaningful data patterns exist is not fully taken into account.

With regard to the Weights method, instead, the estimation of the importance of input variables based on the connection weights may be unbalanced in cases where a constrained training procedure is applied to the ANN model for optimization purposes [25] (NB: in this sentence the term constrained refers to the training procedure developed by Scardi [25] and has nothing to do with the constrained perturbation of input variables illustrated here).

Therefore, although all those methodologies proved to be means of determining the overall numerical influence of each predictor variable on the model output, approaches that only consider changes consistent with the ecological relationships among environmental variables have never been proposed. It is well known in ecology that most environmental variables are far from independent of each other [26], [27] and therefore not all the combinations of their values are likely to occur (e.g. river slope tends to increase with elevation, as does the water oxygen concentration, and a river cannot be very steep in a floodplain). As these relationships constrain each variable in the complex multidimensional space that represents the abiotic conditions found in an ecosystem, some combinations of values are more easily found, while others just cannot occur. For instance, it would be highly unlikely for the maximum width of a stream channel to occur in a headwaters reach.

These issues raise the question: what is the point of perturbing or fixing variables at values which are ecologically meaningless? Evaluating the model output response in areas of the multidimensional space where environmental descriptors take far-fetched values may not be useful from an ecological perspective. Indeed it would make more sense to evaluate how sensitive model output is to changes in predictive variables values taking into account only plausible perturbations, i.e. changes which are consistent with the ecological relationships between environmental variables.

This study demonstrates an example of a new type of sensitivity analysis, using a case study about an ANN model aimed at predicting fish species richness in central Italian rivers. The goal of this work is to evaluate the real contribution of each predictive variable to species richness estimates by taking into full account the underlying ecological relationships and constraints. This way, all the perturbations applied to predictive variables reflect plausible environmental conditions, thus evaluating shifts in fish species richness only among ecologically meaningful scenarios.

2. Material and methods

2.1. Study area and data collection

Data have been obtained from 368 sites that have been sampled from 2009 to 2014 in central Italy [28], [29] (Fig 1). Most rivers in this area are characterized by a Mediterranean climate, hydrological regimes affected by rainfall variability and strong seasonal discharge variation, with high flows in spring and fall, and droughts in summer [30].

Fig 1. Sampling sites.

Elevation map of the river basins of the Latium and Umbria administrative regions in central Italy. Black dots mark the position of sampling sites. The image was obtained by using QGIS 2.18 (http://www.qgis.org).

https://doi.org/10.1371/journal.pone.0211445.g001

Fish sampling and environmental data acquisition were carried out according to the official Italian sampling protocol [31]. It generally consists of electrofishing sampling using a standard electro-fish shoulder-bag (4 kW, 0.3–6 Ampere, 150–600 Volt). All available habitats were sampled along a stream channel 40–70 m long (the transect length was about 20 times the width of the wetted channel). Field activities were carried out outside parks or protected areas. No endangered or protected species were involved, and no specimens were harmed or collected during the study. The occurrence of 55 fish species and values for 27 environmental variables (Table 1) were recorded at each site during sampling activities. Most of these variables had already been considered in previous studies [9], [32], [33].

Table 1. Environmental variables used as input to the ANN model.

All environmental data have been obtained according to the official Italian sampling protocol [31].

https://doi.org/10.1371/journal.pone.0211445.t001

Channel width was always less than 20 m, since sample sites were primarily located within foothills and mountain zones. Thus, the sampling method (electrofishing) was standardized across sites, whereas wider rivers would have required nets or other gear.

2.2. Data set processing

All quantitative or semi-quantitative environmental data were normalized in the [0, 1] interval, while qualitative data (e.g. wetlands or islands presence) were coded as binary values (0–1). Data normalization is a common procedure in ANN model development [16], [17], since it transposes the predictive input variables into the data range on which sigmoid activation functions are based, thereby helping to approach global minima on the error surface. As very steep slopes were only observed at two sites (13.4% and 23.4% respectively), slope data were normalized, omitting these two values, relative to the third steepest slope value (9%). The maximum normalized value, i.e. 1, was assigned to these outliers after normalization. This solution was adopted to prevent the compression of the normalized slope values into a very narrow range because of a couple of cases that cannot be regarded as part of a continuum. Species richness values were also normalized in the [0, 1] interval.
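The slope normalization described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `normalize_capped` is hypothetical, the lower bound is assumed to be the smallest observed value, and the three lowest slope values in the example are invented (only 9%, 13.4% and 23.4% come from the text).

```python
def normalize_capped(values, cap=None):
    """Min-max scale values to [0, 1]; values above `cap` are assigned 1."""
    # The bounds are computed only from values not treated as outliers
    kept = [v for v in values if cap is None or v <= cap]
    lo, hi = min(kept), max(kept)
    if hi == lo:
        raise ValueError("cannot normalize a constant variable")
    # Outliers would exceed 1 after scaling, so they are clipped to 1
    return [min((v - lo) / (hi - lo), 1.0) for v in values]


# Slope in percent; 0.0, 1.5 and 4.5 are hypothetical values
slopes = [0.0, 1.5, 4.5, 9.0, 13.4, 23.4]
scaled = normalize_capped(slopes, cap=9.0)
```

Without the cap, the 23.4% site would define the upper bound and the 9% site would map to about 0.38, which is the compression the procedure is meant to avoid.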

The whole data set was divided into three subsets (i.e. training, validation and test). The training set included 50% of records, while the validation and test sets each included 25% of records. Records were assigned to each subset by sorting all data according to ascending values of fish species richness and by dividing the resulting ordered sequence into groups of four records. Then the first and third record in each group of four records were assigned to the training set, while the second was assigned to the validation set and the fourth to the test set. This procedure made it possible to avoid unbalanced levels of species richness in the three data subsets.
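A minimal sketch of this assignment rule is given below. The function name `split_by_richness` and the richness values used in the example are hypothetical; only the rule itself (sort, then assign positions 1 and 3 of each group of four to training, 2 to validation, 4 to test) and the total of 368 records come from the text.

```python
def split_by_richness(records, key):
    """Split records into training, validation and test sets (50/25/25)."""
    ordered = sorted(records, key=key)  # ascending species richness
    train, valid, test = [], [], []
    for position, record in enumerate(ordered):
        slot = position % 4  # position inside the current group of four
        if slot in (0, 2):
            train.append(record)  # first and third record of each group
        elif slot == 1:
            valid.append(record)  # second record
        else:
            test.append(record)   # fourth record
    return train, valid, test


# Hypothetical richness values for 368 sites
richness = [(i * 7) % 18 for i in range(368)]
train, valid, test = split_by_richness(richness, key=lambda r: r)
```

With 368 records this yields 184, 92 and 92 records, and each subset spans the whole range of richness values because consecutive records in the sorted sequence go to different subsets.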

2.3. Artificial neural network modeling

In this study, a three-layered feedforward network with bias has been trained in order to predict species richness. The optimal number of neurons in the hidden layer was determined by comparing the performance of different networks with 1 to 30 hidden neurons. A sigmoid transfer function was used both for hidden and output layers, thus enabling the network to learn non-linear relationships between input and output vectors [34]. Mean Square Error (MSE) was computed for the validation set to quantify the goodness of fit of the ANNs during training. The training procedure was terminated as soon as the validation MSE stopped decreasing monotonically, thus preventing the overtraining of the model during the learning process. This approach favors better generalization of ANN models while predicting new cases, as previously described in several ecological papers [25], [26]. Several values of learning rate and momentum (range 0.1–0.5) were tested to optimize learning performance. ANN training and testing were performed in the R environment [35] by using the functions of the package h2o [36].
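The stopping rule can be illustrated with a short sketch. It is not the h2o implementation: the function name `last_improving_epoch` and the MSE history are hypothetical, and the rule is reduced to its simplest form, i.e. stop at the first epoch where the validation MSE fails to decrease.

```python
def last_improving_epoch(validation_mse):
    """Return the index of the last epoch before validation MSE stops decreasing."""
    best = 0
    for epoch in range(1, len(validation_mse)):
        if validation_mse[epoch] >= validation_mse[epoch - 1]:
            break  # the monotonic decrease ends here
        best = epoch
    return best


# Hypothetical validation MSE recorded after each training epoch
history = [0.050, 0.031, 0.020, 0.012, 0.013, 0.011]
stop_at = last_improving_epoch(history)  # 3: the weights of epoch 3 would be kept
```

Note that the later value 0.011 is ignored: under a strictly monotonic rule, training has already ended when the MSE rose to 0.013.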

2.4. Constrained sensitivity analysis

In order to use a sensitivity analysis aimed at perturbing environmental predictive variables in an ecologically sound perspective, the dependencies between all environmental variables were first investigated.

In particular, for each jth environmental variable, the following steps were performed:

  1. A Euclidean distance matrix was computed between the test set observations taking into account all the environmental variables but excluding the jth variable.
  2. For the ith observation, neighboring observations were selected by taking those within the first quartile of the (dmax−dmin) distribution, where dmax and dmin were respectively the maximum and the minimum distance between the ith observation and all other observations.
  3. The minimum (jmin) and maximum (jmax) values of the jth environmental variable were selected within the neighboring, i.e. most similar, observations. This defined the range of values that the jth variable can take for the ith observation.
  4. The jth variable was perturbed in the [jmin, jmax] range while all other variables were kept untouched.
  5. Five perturbed values in the [jmin, jmax] range for each predictive variable were then passed to the data pattern fed to the ANN model, whose output was compared to the target (i.e. observed) fish species richness.
  6. The same process was iterated for each observation (i.e. for each sampling site in the test set).
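The steps above can be sketched in Python as follows (the authors' own R code is in the S1 File and may differ). Several points are assumptions made for illustration: neighbors are taken as the observations whose distance is at most dmin + 0.25 (dmax − dmin), which is one possible reading of step 2; the observed value of the jth variable is included in the range; the five perturbed values are drawn uniformly at random; and the function names and the tiny two-variable data set are hypothetical.

```python
import math
import random


def constrained_range(data, i, j):
    """Return (jmin, jmax) for variable j given the neighbours of observation i."""
    def distance(a, b):
        # Euclidean distance over every variable except the jth one (step 1)
        return math.sqrt(sum((a[k] - b[k]) ** 2 for k in range(len(a)) if k != j))

    dists = [(distance(data[i], row), row) for r, row in enumerate(data) if r != i]
    d_min = min(d for d, _ in dists)
    d_max = max(d for d, _ in dists)
    cutoff = d_min + 0.25 * (d_max - d_min)  # assumed reading of step 2
    values = [row[j] for d, row in dists if d <= cutoff]
    values.append(data[i][j])  # assumption: the observed value stays in range
    return min(values), max(values)  # step 3


def constrained_patterns(data, i, j, n_values=5, rng=None):
    """Copies of observation i with variable j redrawn inside its range."""
    rng = rng or random.Random(0)
    j_min, j_max = constrained_range(data, i, j)
    patterns = []
    for _ in range(n_values):
        row = list(data[i])                  # all other variables untouched
        row[j] = rng.uniform(j_min, j_max)   # steps 4 and 5
        patterns.append(row)
    return patterns


# Hypothetical normalized data: columns are (elevation, slope)
data = [[0.10, 0.05], [0.15, 0.10], [0.20, 0.08], [0.80, 0.70], [0.90, 0.95]]
slope_range = constrained_range(data, i=0, j=1)
patterns = constrained_patterns(data, i=0, j=1)
```

For the low-elevation site in the example, only the two other low-elevation sites qualify as neighbors, so slope is perturbed within (0.05, 0.10) rather than across the full [0, 1] interval. Step 6 corresponds to looping over `i` for every test-set observation, and feeding each pattern to the trained model.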

The results of this constrained sensitivity analysis were then compared to those obtained from simple input perturbation, i.e. by adding white noise in the [-0.5, 0.5] range to each input variable while keeping all the others untouched.
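For comparison, the unconstrained perturbation might look like the sketch below. The function name `unconstrained_patterns` is hypothetical, the noise is assumed to be uniform on [-0.5, 0.5], and the number of perturbed copies per pattern (five) is borrowed from the constrained procedure rather than stated in the text. Perturbed values are deliberately not clipped to [0, 1], since the text does not say whether they were.

```python
import random


def unconstrained_patterns(pattern, j, n_values=5, half_width=0.5, rng=None):
    """Copies of `pattern` with noise added to variable j only."""
    rng = rng or random.Random(0)
    perturbed = []
    for _ in range(n_values):
        row = list(pattern)  # copy, so the observed pattern is not modified
        row[j] += rng.uniform(-half_width, half_width)  # assumed uniform noise
        perturbed.append(row)
    return perturbed


# Hypothetical normalized pattern with three variables
observed = [0.2, 0.6, 0.9]
noisy = unconstrained_patterns(observed, j=1)
```

Unlike the constrained version, the range of the noise here does not depend on the values of the other variables, so implausible combinations can be generated.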

The method was entirely implemented in the R programming language [35]. An example code is provided in the S1 File.

3. Results and discussion

3.1. Artificial neural network model

The best ANN architecture for predicting fish species richness on the basis of our environmental predictive variables had 8 hidden neurons and therefore a 27-8-1 structure. It explained a fairly large share of variance, ranging from R² = 0.771 for the training/validation set to R² = 0.675 for the test set (Fig 2).

Fig 2. Predicted vs. observed species richness.

Values on axes refer to normalized species richness. The determination coefficient for the ANN model was R² = 0.771 for the training/validation set and R² = 0.675 for the test set.

https://doi.org/10.1371/journal.pone.0211445.g002

The MSE (obtained from normalized data) varied correspondingly: MSE = 0.00756 for the training/validation set and MSE = 0.01001 for the test set. It seems that very low observed values of species richness are hardly reproduced by the model, possibly because the absence of species that could have been found on the basis of their ecological niche might depend on other factors (e.g. pressures not described by the available environmental variables) in species-poor situations. On the contrary, the highest values in the training set are slightly underestimated, while they match the observed values in the test set. However, the overall agreement between observed and predicted values is quite good with both data sets and is comparable to the level obtained in similar cases [12], [37], [38].

The average residuals relative to the normalized training data set as well as those relative to the normalized test set were very small (0.0017 and 0.0016, respectively), thus showing that the model was not systematically biased. In fact, when compared to the test set data, model predictions about species richness differed in only one species in 46% of the cases.

Although all the levels of species richness were included in both the training and test data sets, the model was less accurate when the highest species richness values were involved. This effect was most likely related to the difficulty of the ANN in identifying less frequent patterns (those with high species richness in this case), as already shown by Ozesmi et al. [39], thereby more easily leading to incorrect estimations. In fact, species richness values higher than 11 (normalized value = 0.631) were not frequently found, amounting to less than 5% of the whole data set.

3.2. Sensitivity analysis

3.2.1. Constrained perturbations.

All the methods for analyzing the sensitivity of ANNs relative to predictive variables are based on the assessment of changes in output values obtained as a consequence of known changes in input values. The procedure we present here has been implemented by constraining the random perturbation method [17], [18], but its rationale (i.e. the same constraints) can be applied to any other method [21], [24].

In order to outline the differences between the way input data are perturbed by any unconstrained procedure and the way they are perturbed by our constrained approach, Fig 3 shows observed (dark circles) and perturbed (light circles) values for three environmental variables (Slope, Riffles and Conductivity) in scatter plots against elevation. Elevation is obviously not independent of some environmental variables and constrains their values according to the procedure outlined in section 2.4. In particular, in this example, constrained ranges are clearly visible for slope (positively correlated with elevation) and conductivity (negatively correlated with elevation), while perturbations of riffles values are very close to the maximum potential range in the three upper quartiles of the elevation range, as a consequence of a much looser dependence of this variable on elevation.

Fig 3. Constrained perturbations for slope, riffles and conductivity vs. elevation values.

Perturbed values were obtained by applying the procedure outlined in section 2.4. The effect of the constraint is more evident for Slope and Conductivity, given their stronger dependence on elevation, than for Riffles, where it only limits the variability at low Elevation. Both observed (dark dots) and perturbed (lighter dots) values are shown.

https://doi.org/10.1371/journal.pone.0211445.g003

The effect of unconstrained random perturbations of slope and conductivity (i.e. assuming complete independence between variables) would have been to fill up all graphs, whereas points representing constrained perturbations of the two environmental variables occupy only a portion of the two-dimensional space, thus showing that some combinations of values are very unlikely to be observed. Perturbations shown in Fig 3 consist only of values that are more likely to be found in real-world conditions, although their range is large enough to allow assessing their impact on model behavior. Fig 3 obviously depicts a very simplified set of relationships (only 3 out of 27 predictive variables). In practice, however, the same concept was applied to an n-dimensional space, where n is the number of environmental predictive variables used for the model development, thus defining an n-dimensional envelope that constrains the random perturbation of each environmental variable, excluding very unlikely patterns (e.g. very steep slope at very low elevation) from the sensitivity analysis.

3.2.2. MSE percentages differences.

The percent increase in MSE obtained by constrained perturbation of each variable for the test set is shown in Fig 4 versus the percent increase obtained by unconstrained perturbation. Unconstrained perturbations obviously induce larger increases in MSE, as they modify known data patterns to a larger extent. Although ANNs may respond to changes in a single input variable in a non-monotonic way, thus potentially making a large change in an input value less influential than a smaller one, in practice larger changes in input variables are clearly associated with larger increases in MSE. However, very large increases in MSE obtained from data patterns that are unlikely to occur in practical applications of the model are not useful, and possibly misleading, when it comes to the very purpose of sensitivity analysis, i.e. inferring the role each input variable plays relative to the target variable.
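The quantity plotted in Fig 4 can be computed as in the sketch below, assuming that the percent increase is measured relative to the MSE obtained from the unperturbed test set. The function names and the numbers in the example are hypothetical.

```python
def mse(predicted, observed):
    """Mean square error between model outputs and target values."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def percent_mse_increase(baseline_mse, perturbed_mse):
    """Percent increase in MSE caused by perturbing one input variable."""
    return 100.0 * (perturbed_mse - baseline_mse) / baseline_mse


# Hypothetical values: MSE before and after perturbing one variable
baseline = 0.010
perturbed = 0.012
increase = percent_mse_increase(baseline, perturbed)  # roughly 20 (percent)
```

A value of 100% therefore means that perturbing the variable doubled the error of the model on the test set.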

Fig 4. Percent increase in MSE obtained by constrained vs. unconstrained perturbations.

Constrained sensitivity analysis clearly reduces maximum perturbations for the environmental variables, thus resulting in smaller increases in MSE for all of them (all points are below the unit slope line). However, the effect of the constraint is larger for some variables (e.g. Slope, SLP; pH, PHP; Source distance, SOD; Sampled area, SAA). See Table 1 for the names of environmental variables corresponding to other point labels.

https://doi.org/10.1371/journal.pone.0211445.g004

While all the input variables are more sensitive to unconstrained perturbations, some show negligible differences between the two perturbation strategies, while others exhibit sharp differences. According to changes in MSE, the input variables that showed the largest differences between the two perturbation methods were Slope (101.1% and 17.1%, for unconstrained and constrained perturbations, respectively), pH (57.8%; 9.3%), Source distance (52.7%; 14.5%) and Sampled area (38.5%; 10.1%).
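The four pairs of values reported above are enough to illustrate how the constraint changes both the magnitude and the order of the MSE increases. The sketch below uses only those four variables, so the ranks it produces are relative to this subset and not to the full set of 27 predictors; the variable and function names are hypothetical.

```python
# Percent increase in MSE: (unconstrained, constrained), as reported in the text
increase = {
    "Slope": (101.1, 17.1),
    "pH": (57.8, 9.3),
    "Source distance": (52.7, 14.5),
    "Sampled area": (38.5, 10.1),
}


def ranking(index):
    """Variable names sorted by decreasing MSE increase (0 = unconstrained)."""
    return sorted(increase, key=lambda name: increase[name][index], reverse=True)


unconstrained_rank = ranking(0)
constrained_rank = ranking(1)

# How many times smaller the MSE increase becomes under the constraint
shrink = {name: round(u / c, 1) for name, (u, c) in increase.items()}
```

Within this subset, pH drops from second to last place, and its MSE increase shrinks by a factor of about 6.2, the largest of the four.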

Variables whose perturbations affected the model to a very limited extent (less than 10% increase in MSE), i.e. those in the lower left corner of Fig 4, do not deserve any further comment, because they certainly play a less important role. Other variables, however, are associated with changes in MSE between 10% and 30%, and in some cases (e.g. Conductivity, Pools and Anthropic disturbance) their constrained perturbation induces changes in MSE almost as large as their unconstrained perturbation, and even larger than those induced by the constrained perturbation of the variables that were “most influential” in the unconstrained analysis (Slope, pH, Source distance and Sampled area).

In ecology, it is well known that fish species composition in lotic ecosystems tends to follow a typical longitudinal pattern [4] (i.e. differences in fish guild occurrences and abundances) and that fish species richness generally tends to increase with the distance from the river source [40]. Of course, there are field conditions that can be regarded as exceptions to this general trend. In fact, habitat features [41], [42], hydrological factors [43] or urbanization [44] may strongly affect fish species diversity. It is clear that environmental variables like slope, pH or distance from source may provide information about the riverine trait where a site to be modeled is located (e.g. mountain or hilly region) [45], thereby providing valuable input information to the ANN model about expected species richness and inducing large changes in MSE when their values are perturbed. However, unconstrained perturbations, especially with those variables, may result in combinations of values, e.g. a steep slope too far from the source, that are unlikely or even impossible in real-world situations, but that could trigger large changes in MSE.

Sensitivity analysis based on unconstrained perturbations can be deeply affected by this problem and the reason is that any model (and ANNs are no exception) is fitted to known data patterns, which obviously include only the combinations of input values that actually occur in real-world situations. Extreme values may occur, but only in combination with a narrow range of values for other variables. Moreover, environmental variables are often strongly correlated with each other and their correlations make the range of ecologically meaningful variation in their values even narrower. For instance, pH usually decreases as the distance from river source increases, while conductivity increases [46]. These relationships make perturbations for Slope, pH, Sampled area, Source distance and Elevation strictly related to the ecological context, thereby defining a narrower, but more realistic range of values that can be safely used in practical applications of the model. Therefore, the MSE increase associated with large perturbations of these variables has very little importance relative to real-world applications of the model.

3.2.3. Importance of the environmental variables.

Changes in MSE after perturbation of each environmental variable were sorted in decreasing order after the application of a conventional scheme for sensitivity analysis and after the application of the constrained procedure. The outcome relative to the unconstrained procedure can be regarded as a different and simplified view of Fig 4. In fact, the bar diagram in Fig 5 just shows the increase in MSE caused by the perturbation of each variable, on the left after unconstrained perturbation and on the right after constrained perturbation. MSE% scales show the percent increase in MSE and are different in the two cases, as constrained perturbation cannot induce a level of increase in MSE as large as that induced by unconstrained perturbations.

Fig 5. Unconstrained and constrained sensitivity analysis compared.

Bars show the percent increase in MSE caused by the perturbation of each variable. Black bars (left) are for unconstrained perturbations while blue bars (right) are for constrained ones. Environmental variables are ordered according to the rank of their importance in the unconstrained sensitivity analysis. As the increase in percent MSE was smaller in constrained sensitivity analysis, the MSE axis was scaled accordingly to better show the relative length of the bars.

https://doi.org/10.1371/journal.pone.0211445.g005

In fact, all variables were obviously associated with smaller changes in MSE when the constrained procedure for sensitivity analysis was applied and the largest differences in the rank of variable importance occurred for Slope, Conductivity, pH, Sampled area, Pools and Anthropic disturbance, while less important environmental variables showed only minor shifts in their relative importance. pH was one of the most important variables according to the conventional procedure of sensitivity analysis based on unconstrained variable perturbation, but it only ranked eighth in sensitivity analysis based on constrained perturbations. Similar downgrades in importance were also observed for Slope and Sampled area. They are not surprising, as they occurred because of the narrower range of perturbed values these variables can assume under the constrained procedure for sensitivity analysis. In fact, this procedure takes only into account an amount of variability that is consistent with the observed relationships between variables and with the environmental context of each data pattern. As a consequence, environmental variables that had an intermediate relative importance according to the unconstrained procedure (e.g. Conductivity, Pools and Anthropic disturbance), gained a more relevant role as potential drivers of the local fish species richness.

While this result cannot be formally validated, as the true relative importance of the environmental variables is obviously unknown, it demonstrated an important feature of the constrained sensitivity analysis. The unconstrained procedure suggested a ranking of variable importance that showed what made the ANN model learn to recognize the riverine trait where sampling sites are located, thus obtaining estimates for fish species richness. However, species richness was also affected by variables that convey information about some relevant local conditions, like habitat features, hydrologic factors or urbanization. As a matter of fact, several studies showed that, at local scale, urbanization and/or flow regulation may strongly modify the expected fish species richness [40], [47]. Results obtained from the constrained sensitivity analysis showed indeed how, at any given site, fish species diversity is highly affected by environmental factors such as habitat descriptors (e.g. Pools; Bars & islands) and anthropic disturbance (Conductivity; Anthropic disturbance). As conductivity can be considered an indirect measure of water pollution [48], [49] and anthropic disturbance in most cases is related to urbanization, it is reasonable that they had a strong impact on fish assemblage diversity and composition.

In this work we focused on the estimation of variable importance taking into account first-order effects, as one input variable at a time was perturbed, while all other variables were kept untouched. Estimating the model output response to two-way [22] or more complex interactions between variables is certainly feasible in a constrained sensitivity analysis, but the problems related to the complexity of the procedure remain unsolved, making the analysis of higher order interactions between predictive variables practical only when their number is very small.

A very common goal in ANN modeling is the reduction of the number of input variables. The reason for that reduction is twofold: it might reduce the cost of predictive information and it might help to fight the curse of dimensionality [50]. The first problem depends on the way predictive information is collected: if all predictive data are already available, or if they are collected with no additional costs, e.g. during the same field activities, then the overall cost of predictive information will not be affected. The second problem is strictly related to the ratio between the number of available records and the number of input variables. According to Theodoridis & Koutroumbas [51], acceptable values for that ratio are in the 2 to 10 range, with smaller values that might result in a reduced prediction ability of the model.

As our data set was already available and all the predictive variables are routinely included in monitoring activities, no reduction in the cost of information could be achieved. Moreover, the number of available records (N = 368) is quite large relative to the number of predictive variables (p = 27) and therefore the ratio between the two (N/p = 13.63) is even larger than the upper limit of the above-mentioned range. Therefore, reducing the number of input variables was not needed, while preserving the full set allowed testing the constrained sensitivity analysis on a wider spectrum of variables. Moreover, preserving the full set of input variables made it possible to exploit all the potential high-order relationships between variables that a trained ANN is able to capture and embed in its synaptic weights.

However, selecting the most important variables on the basis of a sensitivity analysis can be needed in data-limited scenarios and therefore we checked the effects of a reduced set of input variables, selected through a constrained sensitivity analysis, on the performance of the resulting ANN model. A subset of input variables was selected, including only those whose constrained perturbation induced increases in MSE larger than 10% (Fig 5), i.e. conductivity, pools, slope, anthropic disturbance, source distance and sampled area. Then a new ANN model with a 6-4-1 structure was trained and the determination coefficient for the test set was R² = 0.44. Even if model accuracy in predicting fish species richness values considerably decreased, the variance explained by the model using the selected variables was still acceptable, especially in the light of the exclusion of 21 variables out of 27.

As far as we know, problems related to the scaling of ANN input variables (e.g. because of heterogeneity in their units) have already been tackled [52], [53], but methods that define to what extent normalized input variables can be perturbed or changed in a sensitivity analysis, while preserving reasonable quantitative relationships with each other, have never been implemented. From an ecological point of view, the method we propose showed which environmental variables may actually induce changes in fish species richness in real-world conditions (i.e. with values that vary within a realistic range). From a conservation perspective, assigning the highest degree of importance to variables that are very unlikely to change at the local scale (e.g. slope) would be meaningless, whereas regarding as more influential those variables that may have a real impact on fish assemblage richness, such as the level of water pollution or alterations of river traits due to urbanization [54], [55], is certainly more appropriate.

4. Conclusions

While several methods are available to test the sensitivity of ANNs or of any other type of model, we based our analysis on the perturbation method, because it most closely matches the rationale of the procedure we propose. However, the same rationale may be adapted to any other method (e.g. Partial Derivatives or Lek’s profiles method). Its only goal is to avoid data patterns that are unlikely to occur in real-world conditions and that are therefore of little use for opening the “black box” of an ANN, or of any other type of empirical model, and for elucidating how it works and which ecological relationships it has captured.

Of course, it was not possible to validate the approach we proposed by means of statistical analyses or by any other method. However, it showed that the variables that influence fish species richness according to a procedure that considers only combinations of values likely to occur in real-world situations are not the same as those that would have been selected by a procedure that does not take the ecological relationships between environmental variables into account. Thus, our constrained approach to sensitivity analysis can be regarded as a more realistic way to look into model behavior, focusing on a meaningful subset of the multidimensional space in which the model can theoretically be applied. In fact, investigating how a model behaves in a region of its potential input space that will never be used in practical applications seems pointless.

The procedure we proposed is only aimed at demonstrating a concept, and further developments can be envisaged for future applications, particularly as regards the selection of the number of neighboring observations or of the maximum distance to them. This would make it possible to investigate different levels of constrained perturbation and their effects on the resulting ranking of environmental variable importance.
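To make the role of the number of neighboring observations more concrete, the following Python sketch shows one possible way to constrain a perturbation. It is not the algorithm provided in S1 File. It assumes that inputs are normalized to [0, 1], that neighbors are found by Euclidean distance on all variables except the perturbed one, and that a shifted value is clipped to the range observed among the k nearest neighbors. The function names, the toy data and the toy model are all hypothetical.

```python
import math


def mse(model, X, y):
    """Mean squared error of a callable model over rows X and targets y."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def neighbor_range(X, i, j, k):
    """Min and max of variable j among observation i and its k nearest
    neighbors; distances use every variable except j (an assumption)."""
    def dist(a, b):
        return math.sqrt(sum((a[c] - b[c]) ** 2
                             for c in range(len(a)) if c != j))
    others = sorted((r for r in range(len(X)) if r != i),
                    key=lambda r: dist(X[i], X[r]))
    values = [X[i][j]] + [X[r][j] for r in others[:k]]
    return min(values), max(values)


def pct_mse_change(model, X, y, j, delta, k=None):
    """Percent change in MSE after shifting variable j by delta.
    k=None: unconstrained, values clipped only to [0, 1].
    k=int:  values clipped to the range seen among the k nearest neighbors."""
    base = mse(model, X, y)  # must be non-zero for a percent change
    perturbed = []
    for i, row in enumerate(X):
        lo, hi = (0.0, 1.0) if k is None else neighbor_range(X, i, j, k)
        new_row = list(row)
        new_row[j] = min(max(row[j] + delta, lo), hi)
        perturbed.append(new_row)
    return 100.0 * (mse(model, perturbed, y) - base) / base


# Toy data: two strongly correlated, normalized inputs (illustration only).
X = [[0.1, 0.15], [0.2, 0.2], [0.3, 0.35],
     [0.5, 0.5], [0.7, 0.65], [0.9, 0.9]]
y = [0.3, 0.35, 0.7, 0.95, 1.4, 1.75]
model = lambda row: row[0] + row[1]  # stand-in for a trained ANN

unconstrained = pct_mse_change(model, X, y, j=0, delta=0.5)
constrained = pct_mse_change(model, X, y, j=0, delta=0.5, k=2)
print(constrained < unconstrained)  # True
```

In this toy case the constrained perturbation produces a smaller increase in MSE than the unconstrained one, because the first variable is never pushed away from the values that co-occur with similar values of the second. Changing k, or replacing it with a maximum distance, changes how tightly the perturbation is constrained.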

Supporting information

S1 File. Constrained sensitivity analysis algorithm.

R code implementing the constrained sensitivity analysis algorithm.

https://doi.org/10.1371/journal.pone.0211445.s001

(R)

S2 File. Data set.

Data set used for the Artificial Neural Network modeling. All values were normalized as described in the Material and Methods section.

https://doi.org/10.1371/journal.pone.0211445.s002

(CSV)

References

  1. Dudgeon D, Arthington AH, Gessner MO, Kawabata Z-I, Knowler DJ, Lévêque C, et al. Freshwater biodiversity: importance, threats, status and conservation challenges. Biological Reviews. 2006 May;81(2):163–82. pmid:16336747
  2. Postel S, Richter B. Rivers for Life: Managing Water for People and Nature. Island Press; 2012.
  3. Revenga C, Campbell I, Abell R, Villiers, Bryer M. Prospects for monitoring freshwater ecosystems towards the 2010 targets. Philosophical Transactions of the Royal Society of London B: Biological Sciences. 2005 Feb 28;360(1454):397–413. pmid:15814353
  4. Jackson DA, Peres-Neto PR, Olden JD. What controls who is where in freshwater fish communities: the roles of biotic, abiotic, and spatial factors. Can J Fish Aquat Sci. 2001 Jan 1;58(1):157–70.
  5. Albaret J-J, Simier M, Darboe FS, Ecoutin J-M, Raffray J. Fish diversity and distribution in the Gambia Estuary, West Africa, in relation to environmental variables. Aquat Living Resour. 2004 Jan 1;17(1):35–46.
  6. Radinger J, Hölker F, Horký P, Slavík O, Dendoncker N, Wolter C. Synergistic and antagonistic interactions of future land use and climate change on river fish assemblages. Glob Change Biol. 2016 Apr 1;22(4):1505–22.
  7. Levin SA. Ecosystems and the Biosphere as Complex Adaptive Systems. Ecosystems. 1998 Sep 1;1(5):431–6.
  8. Joy MK, Death RG. Predictive modelling and spatial mapping of freshwater fish and decapod assemblages using GIS and neural networks. Freshwater Biology. 2004 Aug 1;49(8):1036–52.
  9. Franceschini S, Gandola E, Martinoli M, Tancioni L, Scardi M. Cascaded neural networks improving fish species prediction accuracy: the role of the biotic information. Scientific Reports. 2018 Mar 15;8(1):4581.
  10. Oberdorff T, Guégan J-F, Hugueny B. Global Scale Patterns of Fish Species Richness in Rivers. Ecography. 1995;18(4):345–52.
  11. Guégan J-F, Lek S, Oberdorff T. Energy availability and habitat heterogeneity predict global riverine fish diversity. Nature. 1998 Jan 22;391(6665):382–4.
  12. Olaya-Marín EJ, Martínez-Capel F, Vezza P. A comparison of artificial neural networks and random forests to predict native fish species richness in Mediterranean rivers. Knowl Managt Aquatic Ecosyst. 2013;(409):07.
  13. Olden JD, Joy MK, Death RG. An accurate comparison of methods for quantifying variable importance in artificial neural networks using simulated data. Ecological Modelling. 2004 Nov 1;178(3):389–97.
  14. Schwartzman GL, Kaluzny SP. Ecological simulation primer. New York: Macmillan; 1987.
  15. Lek S, Belaud A, Baran P, Dimopoulos I, Delacoste M. Role of some environmental variables in trout abundance models using neural networks. Aquatic Living Resources. 1996 Jan;9(1):23–9.
  16. Lek S, Delacoste M, Baran P, Dimopoulos I, Lauga J, Aulagnier S. Application of neural networks to modelling nonlinear relationships in ecology. Ecological Modelling. 1996 Sep 1;90(1):39–52.
  17. Scardi M, Harding LW. Developing an empirical model of phytoplankton primary production: a neural network case study. Ecological Modelling. 1999 Aug 17;120(2):213–23.
  18. Mattei F, Franceschini S, Scardi M. A depth-resolved artificial neural network model of marine phytoplankton primary production. Ecological Modelling. 2018 Aug 24;382:51–62.
  19. Dimopoulos Y, Bourret P, Lek S. Use of some sensitivity criteria for choosing networks with good generalization ability. Neural Process Lett. 1995 Dec 1;2(6):1–4.
  20. Dimopoulos I, Chronopoulos J, Chronopoulou-Sereli A, Lek S. Neural network models to study relationships between lead concentration in grasses and permanent urban descriptors in Athens city (Greece). Ecological Modelling. 1999 Aug 17;120(2):157–65.
  21. Gevrey M, Dimopoulos I, Lek S. Review and comparison of methods to study the contribution of variables in artificial neural network models. Ecological Modelling. 2003 Feb 15;160(3):249–64.
  22. Gevrey M, Dimopoulos I, Lek S. Two-way interaction of input variables in the sensitivity analysis of neural network models. Ecological Modelling. 2006 May 15;195(1):43–50.
  23. Garson GD. Interpreting Neural-network Connection Weights. AI Expert. 1991 Apr;6(4):46–51.
  24. Olden JD, Jackson DA. Illuminating the “black box”: a randomization approach for understanding variable contributions in artificial neural networks. Ecological Modelling. 2002 Aug 15;154(1):135–50.
  25. Scardi M. Advances in neural network modeling of phytoplankton primary production. Ecological Modelling. 2001 Dec 1;146(1):33–45.
  26. Lek S, Scardi M, Verdonschot PFM, Descy J-P, Park Y-S. Modelling Community Structure in Freshwater Ecosystems. Springer Science & Business Media; 2005. 526 p.
  27. Lecours V, Brown CJ, Devillers R, Lucieer VL, Edinger EN. Comparing Selections of Environmental Variables for Ecological Studies: A Focus on Terrain Attributes. PLOS ONE. 2016 Dec 21;11(12):e0167128. pmid:28002453
  28. Larsen S, Mancini L, Pace G, Scalici M, Tancioni L. Weak Concordance between Fish and Macroinvertebrates in Mediterranean Streams. PLOS ONE. 2012 Dec 10;7(12):e51115. pmid:23251432
  29. Carosi A, Ghetti L, Forconi A, Lorenzoni M. Fish community of the river Tiber basin (Umbria-Italy): temporal changes and possible threats to native biodiversity. Knowl Manag Aquat Ecosyst. 2015;(416):22.
  30. Sarrocco S, Maio G, Celauro D, Tancioni L. Carta della biodiversita' ittica delle acque correnti del Lazio. Analisi della fauna ittica. Roma: Agenzia Regionale Parchi; 2012.
  31. Scardi M, Tancioni M, Martone M. Protocollo di campionamento e analisi della fauna ittica dei sistemi lotici. Rome: APAT; 2007.
  32. Olden JD, Jackson DA. Fish–Habitat Relationships in Lakes: Gaining Predictive and Explanatory Insight by Using Artificial Neural Networks. Transactions of the American Fisheries Society. 2001 Sep 1;130(5):878–97.
  33. Joy MK, Death RG. Predictive modelling of freshwater fish as a biomonitoring tool in New Zealand. Freshwater Biology. 2002 Nov 1;47(11):2261–75.
  34. Lek S, Guegan J-F. Artificial Neuronal Networks: Application to Ecology and Evolution. Springer Berlin Heidelberg; 2000. 296 p.
  35. R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. 2008. ISBN 3-900051-07-0, URL http://www.R-project.org.
  36. The H2O.ai Team. h2o: R Interface for H2O. Version 3.14.0.3. 2017. URL https://cran.r-project.org/web/packages/h2o/index.html
  37. Joy MK, Death RG. Modelling of freshwater fish and macro-crustacean assemblages for biological assessment in New Zealand. In: Modelling Community Structure in Freshwater Ecosystems. Springer, Berlin, Heidelberg; 2005. p. 76–89.
  38. Gevrey M, Park YS, Oberdorff T, Lek S. Predicting fish assemblages in France and evaluating the influence of their environmental variables. In: Modelling Community Structure in Freshwater Ecosystems. Springer, Berlin, Heidelberg; 2005. p. 54–63.
  39. Özesmi SL, Tan CO, Özesmi U. Methodological issues in building, training, and testing artificial neural networks in ecological applications. Ecological Modelling. 2006 May 15;195(1):83–93.
  40. Ibarra AA, Park Y-S, Brosse S, Reyjol Y, Lim P, Lek S. Nested patterns of spatial diversity revealed for fish assemblages in a west European river. Ecology of Freshwater Fish. 2005 Sep 1;14(3):233–42.
  41. Hoeinghaus DJ, Winemiller KO, Birnbaum JS. Local and regional determinants of stream fish assemblage structure: inferences based on taxonomic vs. functional groups. Journal of Biogeography. 2007 Feb 1;34(2):324–38.
  42. Teresa FB, Casatti L. Influence of forest cover and mesohabitat types on functional and taxonomic diversity of fish communities in Neotropical lowland streams. Ecology of Freshwater Fish. 2012 Jul 1;21(3):433–42.
  43. Lamouroux N, Poff NL, Angermeier PL. Intercontinental Convergence of Stream Fish Community Traits Along Geomorphic and Hydraulic Gradients. Ecology. 2002 Jul 1;83(7):1792–807.
  44. Cunico AM, Allan JD, Agostinho AA. Functional convergence of fish assemblages in urban streams of Brazil and the United States. Ecological Indicators. 2011 Sep 1;11(5):1354–9.
  45. Giller P, Malmqvist B. The Biology of Streams and Rivers. Oxford, New York: Oxford University Press; 1998. 304 p. (Biology of Habitats Series).
  46. Cushing CE, Allan JD. Streams: Their Ecology and Life. Gulf Professional Publishing; 2001. 392 p.
  47. Ferreira FC, Petrere M. Anthropic effects on the fish community of Ribeirão Claro, Rio Claro, SP, Brazil. Braz J Biol. 2007 Feb;67(1):23–32. pmid:17505746
  48. Vega M, Pardo R, Barrado E, Debán L. Assessment of seasonal and polluting effects on the quality of river water by exploratory data analysis. Water Research. 1998 Dec 1;32(12):3581–92.
  49. Morrison G, Fatoki OS, Persson L, Ekberg A. Assessment of the impact of point source pollution from the Keiskammahoek Sewage Treatment Plant on the Keiskamma River: pH, electrical conductivity, oxygen-demanding substance (COD) and nutrients. Water SA. 2001 Jan 1;27(4):475–80.
  50. Bellman RE. Dynamic Programming. Princeton: Princeton University Press; 1957.
  51. Theodoridis S, Koutroumbas K. Pattern Recognition - 4th Edition. Burlington: Academic Press; 2008.
  52. Lu M, AbouRizk SM, Hermann UH. Sensitivity Analysis of Neural Networks in Spool Fabrication Productivity Studies. Journal of Computing in Civil Engineering. 2001 Oct 1;15(4):299–308.
  53. Nourani V, Sayyah Fard M. Sensitivity analysis of the artificial neural network outputs in simulation of the evaporation process at different climatologic regimes. Advances in Engineering Software. 2012 May 1;47(1):127–46.
  54. Morgan RP, Cushman SF. Urbanization effects on stream fish assemblages in Maryland, USA. Journal of the North American Benthological Society. 2005 Sep 1;24(3):643–55.
  55. Nelson KC, Palmer MA, Pizzuto JE, Moglen GE, Angermeier PL, Hilderbrand RH, et al. Forecasting the combined effects of urbanization and climate change on stream ecosystems: from impacts to management options. J Appl Ecol. 2009 Feb;46(1):154–63. pmid:19536343