Introduction

The first, and crucial, step in non-native species (NNS) risk analysis is to identify which species are likely to become invasive in the risk assessment (RA) area of interest and therefore warrant a full, comprehensive RA to assess their potential adverse impacts on native species and ecosystems (Ricciardi and Rasmussen 1998; Copp et al. 2005a, b). The development of risk analysis protocols for aquatic NNS in recent decades has been strongly influenced by decision-support (DS) tools developed for the analysis of weeds and plant pests (see Baker et al. 2005), with perhaps the most widely-used risk identification tools being the Weed Risk Assessment (WRA) for non-native terrestrial plants (Pheloung et al. 1999) and its direct derivative, the Fish Invasiveness Screening Kit (FISK) for freshwater fishes (Copp et al. 2005a, b)—sometimes also referred to, albeit less correctly, as the Fish Invasiveness Scoring Kit (e.g. Gozlan et al. 2010; Onikura et al. 2011; Tricarico et al. 2010; Verbrugge et al. 2012; Lawson et al. 2013; Puntila et al. 2013; Vilizzi and Copp 2013).

A user-friendly DS tool based in Excel®, the WRA had been applied globally (Gordon et al. 2008) prior to its adaptation in 2005 (Copp et al. 2005a, b) to create FISK and its sister ‘-ISK’ toolkits for freshwater invertebrates (FI-ISK), marine fish (MFISK), marine invertebrates (MI-ISK) and amphibians (Amph-ISK). The first application and calibration of these screening tools was with FISK v1 to identify potentially invasive freshwater fishes in England & Wales (Copp et al. 2009). This DS tool was subsequently applied to several RA areas in Europe, Asia, and North and South America (Copp 2013; Table 1), before being replaced by a much-improved version (v2: Lawson et al. 2013). Global applications of FISK v1 and v2 include RA areas in the six inhabited continents (see Table 1), which contrasts with the more limited geographical application of the other ‘-ISK’ tools, i.e. FI-ISK (Tricarico et al. 2010; Urho et al. 2012; Chucholl 2013; Škraba et al. 2013; Papavlasopoulou et al. 2014; Patoka et al. 2014; Loureiro et al. 2015; Kotovska et al. 2016; Tovar Hernández 2016; Chucholl and Wendler 2017; Patoka et al. 2017; Uderbayev et al. 2017; Vodovsky et al. 2017; Weiperth et al. 2018), MFISK (Copp et al. 2013), MI-ISK (Drolet et al. 2016; Perdikaris et al. 2016a), and Amph-ISK (Kopecký et al. 2016).

Table 1 Original and new or re-computed thresholds (Thr) and corresponding mean, lower and upper confidence intervals (LCI and UCI, respectively) for the Area Under the Curve (AUC) based on Receiver Operating Characteristic (ROC) curve analysis applied to the taxa screened under the Fish Invasiveness Screening Kit (FISK) according to Risk Assessment Area (RA area) (see also Fig. 1). For each RA area, the Köppen-Geiger climate class (or classes) is provided (A = Tropical; B = Dry; C = Temperate; D = Continental: Peel et al. 2007). FISK applications are grouped according to version (v1 and v2) and information is given whether (ROC-based) calibration was performed, in which case further distinction is made whether the a priori classification (Not impl. = Not implemented) of the taxa was after FishBase (Froese and Pauly 2018) and the Global Invasive Species Database (GISD: www.iucngisd.org/gisd/), here referred to as ‘Global’, or RA area-specific. No AUCs were computed for those applications with low sample sizes, nor were they re-computed for those studies providing an RA area-specific a priori classification of the taxa (unless additional taxa were available from the original study). Original threshold values are given with the number of digits reported in the source study; new or re-computed threshold values with two digits for comparative purposes. AUC and LCI values < 0.5 in italics. See Appendix Table A2 in Supplementary Material for the complete list of screened taxa by RA area

An initial summary of these applications (Copp 2013) reported calibrated thresholds to distinguish between the categories of low-to-medium and high risk of the screened species becoming invasive in the RA area, which henceforth are referred to as medium- and high-risk species. Also, Copp’s (2013) summary emphasised the importance of screening species within the context of a defined RA area and recommended that species assessments should ideally be carried out by more than one assessor. With the release of the Aquatic Species Invasiveness Screening Kit (AS-ISK: Copp et al. 2016), which replaces all previous ‘-ISK’ toolkits, the aim of the present study was to provide a meta-analysis and overall evaluation of the worldwide applications of the FISK DS tool, including an update of taxon-specific risk levels. The specific objectives were to: (i) evaluate the extent and variation of the FISK scores across RA areas; (ii) measure the certainty in question-specific responses making up the FISK risk screening protocol; (iii) assess the capability of FISK as a screening DS tool to distinguish correctly between non-invasive and invasive taxa and categorise them according to risk level; and (iv) provide climate-specific, generalised thresholds for risk level categorisation so as to identify the most potentially invasive species on a global level. It is anticipated that the findings of the present study will reveal the robustness of FISK as a DS tool, as well as similarities and differences in thresholds and scores depending on RA areas. Identifying such generalisations and specificities within FISK will help improve its accuracy and enable better-informed and more effective strategies for the management and conservation of freshwater ecosystems.
Finally, the global and climate-specific thresholds identified in the present study will contribute to the transferability and calibration of thresholds to be developed for the AS-ISK tool, and the evaluation of a wide range of FISK applications will allow identification of minimum sample sizes for RA area-specific calibration in AS-ISK.

Methods

Toolkit description

As with its parent screening tool, the WRA, both versions of the FISK (henceforth referred to simply as ‘FISK’ unless the version is indicated) consist of 49 questions (Qs) and related guidance (cf. Gordon et al. 2010), which are arranged into two main sections and eight categories (Copp et al. 2005a, b). The Biogeography/Historical section includes the categories Domestication/Cultivation (three Qs), Climate and distribution (five Qs) and Invasive elsewhere (five Qs); the Biology/Ecology section includes the categories Undesirable (or persistence) traits (12 Qs), Feeding guild (four Qs), Reproduction (seven Qs), Dispersal mechanisms (eight Qs) and Tolerance attributes (five Qs). Following revision and upgrade of FISK v1 to FISK v2 to allow incorporation of broader climatic zones (Lawson et al. 2013), changes were made to the formulation of 36 out of the 49 Qs in total, even though their arrangement into the original categories and sections was preserved (Appendix Table A1 in Supplementary Material).

In FISK, each answered question (including ‘Don’t know’ responses) results in a score that is either directly related to the question itself or, in certain cases, indirectly computed (by means of a weighting system) from a ‘parent’ question, and the Q-specific score has a value ranging from − 1 to 2 (Copp et al. 2005a). The ‘Don’t know’ response indicates the inability by the assessor to provide information on a certain ecological aspect of the species being evaluated, either due to unavailability of information or, possibly, overall non-applicability of a certain question. The summation of the Q-specific values provides an outcome score ranging (theoretically) from a minimum of − 15 to a maximum of 57. Based on this score, the potential risk of a species being invasive is then categorised as ‘low’, ‘medium’ or ‘high’, so that a species categorised as high-risk is regarded as invasive and considered for a full RA (Copp et al. 2005a; Britton et al. 2011). Distinction between medium and high risk species is made with reference to a threshold value that is generally ‘calibrated’ to be RA area-specific (see Copp 2013; Hill et al. 2017); whereas, distinction between low- and medium-risk species is based upon a fixed threshold of 1 (Copp et al. 2005a), which is independent of the RA area.
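The two-threshold categorisation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of FISK itself: the function name `risk_level` is hypothetical, the default high-risk threshold of 19 is the generic value reported for England & Wales, and the handling of scores falling exactly on a threshold (treated here as belonging to the higher category) is an assumption.

```python
def risk_level(score, high_threshold=19.0, low_threshold=1.0):
    """Categorise a FISK outcome score as 'low', 'medium' or 'high'.

    Assumptions (not confirmed by the source): a score equal to a
    threshold falls into the higher category, and the default
    high-risk threshold is the generic value of 19.
    """
    if score < low_threshold:
        return "low"      # below the fixed low/medium threshold of 1
    if score < high_threshold:
        return "medium"   # between the fixed and the calibrated threshold
    return "high"         # at or above the (RA area-specific) threshold


if __name__ == "__main__":
    for s in (-9, 0.5, 1, 18.5, 19, 44):
        print(s, risk_level(s))
```

In practice the `high_threshold` argument would be replaced by the value calibrated for the RA area under study.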

As each Q-related response in FISK for any given assessment is allocated a certainty level (1 = very uncertain; 2 = mostly uncertain; 3 = mostly certain; 4 = very certain), the ‘certainty factor’ (CF) for the assessment is computed as:

$$ {\text{CF}} = \sum \left( {{\text{CQ}}_{i} } \right)/\left( {4 \times 49} \right)\quad (i = 1, \ldots ,49) $$

where $\text{CQ}_{i}$ is the certainty level for $\text{Q}_{i}$, 4 is the maximum achievable certainty level (as above), and 49 is the total number of Qs comprising FISK. The CF ranges from a minimum of 0.25 (i.e. all 49 questions with certainty level equal to 1) to a maximum of 1 (i.e. all 49 questions with certainty level equal to 4).
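As a worked illustration of the certainty factor, the following Python sketch sums the 49 question-specific certainty levels and divides by the maximum attainable total (4 × 49 = 196). The function name `certainty_factor` and the input validation are illustrative assumptions and are not taken from the FISK software.

```python
def certainty_factor(certainty_levels, n_questions=49, max_level=4):
    """Return the certainty factor (CF) for one FISK assessment.

    certainty_levels: one integer per question, each from 1 (very
    uncertain) to max_level (very certain).
    """
    levels = list(certainty_levels)
    if len(levels) != n_questions:
        raise ValueError("expected one certainty level per question")
    if any(level not in range(1, max_level + 1) for level in levels):
        raise ValueError("certainty levels must lie between 1 and max_level")
    return sum(levels) / (max_level * n_questions)


if __name__ == "__main__":
    print(certainty_factor([1] * 49))  # lower bound: 0.25
    print(certainty_factor([4] * 49))  # upper bound: 1.0
```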

Data sources and processing

Data sets were collated from all applications of FISK as retrievable from the scientific literature. Applications consisted primarily of peer-reviewed papers, but also reports, and were identified according to the RA area under investigation. Three unpublished data sets were also included in the review, and a few extra (unpublished) assessments were added to four of the published applications (Table 1). Given the changes in FISK v2 relative to FISK v1 (see “Toolkit description”), for analytical purposes the applications were grouped according to the version used. For this reason, the two applications for Belarus (Table 1) were kept separate for analytical purposes throughout except with reference to species’ distribution. For each RA area, the (Q-related) assessor-specific responses and corresponding certainty levels for each taxon screened were then retrieved, whenever possible, from the original ‘output spreadsheet’ as generated by FISK (the program is available for free download at www.cefas.co.uk/nns/tools/).

For each taxon screened, whenever applicable the scientific name used in the original study was updated to the most recent taxonomy after FishBase (Froese and Pauly 2018), followed by ‘cross-checking’ for the existence of at least one peer-reviewed published study that adopted the updated scientific name. This criterion also applied (in principle) to the common name, except for those taxa for which an ‘official’ name is not (yet) available, in which case the most frequently used common name in English, or the vernacular name as per the original study, was employed. The taxonomic order and family were also retrieved for each taxon screened.

FISK applications were distinguished into those that provided calibration of the outcome scores and those that did not. In the former case, a distinction was made whether the a priori classification of the taxa into either ‘non-invasive’ or ‘invasive’ (a requirement for calibration: see below) was according to both FishBase and the Global Invasive Species Database (GISD: www.iucngisd.org/gisd/), and hereafter referred to as ‘global’, or whether it was specific to the RA area under investigation (e.g. based on local lists of invasive species). Regardless of the type of a priori classification (i.e. global or RA area-specific), all studies that provided a calibrated threshold relied upon Receiver Operating Characteristic (ROC) analysis (Bewick et al. 2004), which also involves computation of the Area Under the Curve (AUC). Conversely, those studies that did not provide a calibrated threshold typically employed the ‘generic’ (reference) threshold of 19 originally set for England & Wales (Copp et al. 2009; but see Neal et al. 2017).

For each RA area, the corresponding Köppen-Geiger climate class (A = Tropical; B = Dry; C = Temperate; D = Cold (continental): Peel et al. 2007) was identified, noting that in several cases more than one climate class applied to a certain RA area. However, the E class (Polar and Alpine), if present in the RA area, was not included as alpine and polar streams and lakes are inhabited by a very limited number of fish species compared to neighbouring continental habitats, and statistically would represent a ‘naughty noughts’ component in the data set (Martin et al. 2005).

Data analysis

Scoring and certainty

The shape of the global distribution of FISK scores was tested in R x64 v3.4.3 (R Development Core Team 2015) using package moments v0.14 (Komsta and Novomestky 2015), with normality, skewness and kurtosis evaluated by the Jarque–Bera (JB), D’Agostino and Anscombe tests, respectively.
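For readers unfamiliar with these shape statistics, the sketch below computes moment-based skewness, (non-excess) kurtosis and the Jarque–Bera statistic in plain Python. It is not the code used in the study (which relied on the R package moments); the function names are hypothetical, the p-value uses the asymptotic chi-square approximation with two degrees of freedom, and the separate D’Agostino and Anscombe tests are not reproduced here.

```python
import math


def central_moment(values, order):
    """Population central moment of the given order."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** order for v in values) / n


def jarque_bera(values):
    """Return (skewness, kurtosis, JB statistic, asymptotic p-value).

    Kurtosis is reported on the scale where a normal distribution
    equals 3, so values below 3 indicate a platykurtic distribution.
    """
    n = len(values)
    m2 = central_moment(values, 2)
    if m2 == 0:
        raise ValueError("all values are identical")
    skewness = central_moment(values, 3) / m2 ** 1.5
    kurtosis = central_moment(values, 4) / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Survival function of a chi-square distribution with 2 d.f.
    p_value = math.exp(-jb / 2.0)
    return skewness, kurtosis, jb, p_value


if __name__ == "__main__":
    # Hypothetical scores, for illustration only
    print(jarque_bera([1, 2, 3, 4, 5]))
```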

Differences between mean scores for the taxa classified a priori into non-invasive and invasive (based on the original, updated or new a priori classification, as applicable: see Outcomes), and according to FISK version and RA area, were tested by Permutational (Univariate) Analysis of Variance (PERANOVA). This was based on a partial-hierarchical design (cf. Vilizzi 2005) with factors Category (non-invasive, invasive), Version (v1, v2) and RA area (see Table 1, but excluding Great Lakes Basin, Northeast of Pará Basin, Puerto Rico, Rhine Basin and River Oder Estuary, due to low sample sizes) nested within Version, and with all factors fixed. PERANOVA was carried out in PERMANOVA + v1.0.8 for PRIMER v6.1.18 (Anderson et al. 2008), following normalisation of the data, using a Euclidean distance, 9999 permutations of the residuals under a reduced model (because of the nested design: Anderson and Robinson 2001), and with statistical effects evaluated at α = 0.05 (including a posteriori pair-wise comparisons, in case of significance). Notably, the advantage of PERANOVA compared to ‘traditional’ (fully parametric) ANOVA is that the stringent assumptions of normality and homoscedasticity, which very often prove unrealistic when dealing with ecological data sets, are ‘relaxed’ considerably.

Differences between certainty values in the assessments according to FISK version, Section, Category within Section, and Question within Category within Section (see “Toolkit description” and Appendix Table A1 in Supplementary Material) were also tested by PERANOVA. This relied again on a partial-hierarchical design with factors Version, Section, Category(Section) and Question(Category(Section)) all fixed, and using the same computational settings as for the PERANOVA on the mean scores.

Outcomes

For those FISK applications that relied on the global (i.e. FishBase and GISD based) a priori classification (see “Data sources and processing”), corresponding thresholds and AUCs were re-computed by ROC analysis following an update of the a priori classification for each taxon assessed whenever applicable. This was because of the change in status (i.e. from non-invasive to invasive or vice versa) for some taxa since implementation of the original screening study, which was in some cases also ‘augmented’ by inclusion of one or more (published or unpublished) assessment(s) for the RA area under investigation (Table 1). Conversely, thresholds and corresponding AUCs were computed ex novo both for those applications that did not originally implement calibration (but under the constraint of there being a representative sample size) and for the three unpublished data sets (Table 1).

Statistically, a ROC curve is a graph of sensitivity versus 1 − specificity (or alternatively, sensitivity versus specificity) for each threshold value, where in the present context sensitivity and specificity are the proportions of a priori invasive and non-invasive taxa, respectively, that are correctly identified by FISK as such. A measure of the accuracy of the calibration analysis is the AUC, which typically ranges from 0.5 to 1.0, and the closer to 1.0 the better the ability to differentiate between invasive and non-invasive taxa. If the AUC is equal to 1.0, then the test is 100% accurate, because both sensitivity and specificity are 1.0, and there are neither ‘false positives’ (a priori non-invasive taxa categorised as high risk, hence invasive) nor ‘false negatives’ (a priori invasive taxa categorised as low risk, hence non-invasive). Conversely, if the AUC is equal to 0.5, then the test performs no better than chance, as it cannot discriminate between ‘true positives’ (a priori invasive taxa categorised as high risk, hence invasive) and ‘true negatives’ (a priori non-invasive taxa categorised as low risk, hence non-invasive).
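The quantities defined above can be illustrated with a minimal Python sketch. It is not the pROC-based procedure used in the study: the function names and the example scores are hypothetical, a taxon is assumed to be flagged as invasive when its score is at or above the threshold, and the AUC is obtained from its equivalent rank interpretation (the probability that a randomly chosen a priori invasive taxon scores higher than a randomly chosen non-invasive one, with ties counted as one half).

```python
def sensitivity_specificity(scores, invasive, threshold):
    """Sensitivity and specificity of a score threshold.

    scores: FISK outcome scores; invasive: matching booleans giving the
    a priori classification. Assumes both classes are present.
    """
    pairs = list(zip(scores, invasive))
    tp = sum(1 for s, inv in pairs if inv and s >= threshold)
    fn = sum(1 for s, inv in pairs if inv and s < threshold)
    tn = sum(1 for s, inv in pairs if not inv and s < threshold)
    fp = sum(1 for s, inv in pairs if not inv and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, invasive):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, inv in zip(scores, invasive) if inv]
    neg = [s for s, inv in zip(scores, invasive) if not inv]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical data, for illustration only
    example_scores = [5, 10, 12, 20, 25, 30]
    example_invasive = [False, False, True, False, True, True]
    print(sensitivity_specificity(example_scores, example_invasive, 19))
    print(auc(example_scores, example_invasive))
```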

Differences between original and re-computed threshold values were evaluated in R using the Wilcoxon test. Differences between threshold values (original or re-computed) under FISK v1 and v2 were tested by PERANOVA based on a one fixed-factor design and using the same settings as above (see “Scoring and certainty”) but under a full model (because of the single factor: Anderson and Robinson 2001). The best FISK threshold value that maximises the true positive rate and minimises the false positive rate was then determined using Youden’s J statistic (Youden 1950). Differences between application-specific AUCs were tested for all possible pair-wise combinations of RA areas, but separately under FISK v1 and v2 and after excluding those AUCs equal to 1 or less than 0.5 (Zhang and Pepe 2005). ROC analyses were carried out in R with package pROC (Robin et al. 2011) using the default 2000 bootstrap replicates for computation of the AUC confidence intervals and the DeLong test for a posteriori pair-wise comparisons with Bonferroni-corrected significance values.
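Threshold selection by Youden’s J (J = sensitivity + specificity − 1) can be sketched as follows. This is an illustrative stand-in rather than the pROC routine used in the study: the function name and example data are hypothetical, candidate thresholds are taken to be the observed scores themselves, a taxon is flagged as invasive when its score is at or above the candidate, and ties in J are resolved in favour of the lowest threshold.

```python
def youden_threshold(scores, invasive):
    """Return (threshold, J) maximising Youden's J statistic.

    scores: FISK outcome scores; invasive: matching booleans giving the
    a priori classification. Assumes both classes are present.
    """
    pos = [s for s, inv in zip(scores, invasive) if inv]
    neg = [s for s, inv in zip(scores, invasive) if not inv]
    best_threshold, best_j = None, float("-inf")
    for candidate in sorted(set(scores)):
        sensitivity = sum(1 for s in pos if s >= candidate) / len(pos)
        specificity = sum(1 for s in neg if s < candidate) / len(neg)
        j = sensitivity + specificity - 1.0
        if j > best_j:  # strict comparison keeps the lowest tied threshold
            best_threshold, best_j = candidate, j
    return best_threshold, best_j


if __name__ == "__main__":
    # Hypothetical data, for illustration only
    example_scores = [3, 8, 15, 18, 22, 27, 35]
    example_invasive = [False, False, False, True, False, True, True]
    print(youden_threshold(example_scores, example_invasive))
```

With these made-up scores, the candidate 18 captures all three a priori invasive taxa while misclassifying one of the four non-invasive taxa, giving the largest J.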

Log-linear analysis (Quinn and Keough 2002) was used to determine the effects of RA area (except for Great Lakes Basin, Northeast of Pará Basin, Puerto Rico, Rhine Basin and River Oder Estuary, because of the low sample sizes: Table 1), a priori classification (non-invasive, invasive: original or updated a priori classification, as applicable), and risk level (low, medium, high: see “Toolkit description”) on the number of taxa screened, and separately for FISK v1 and v2. In both cases, a null model (that is, with all frequencies being equal) was initially fitted and terms were added sequentially starting from all possible combinations of the individual factors and two-way interactions up to a saturated model (that is, one including the highest three-way interaction term). Significance of terms included sequentially (α = 0.05) was then tested by an analysis of deviance based on a chi-square test. Fitting of log-linear models was performed in R using library MASS v7.3-47 (Venables and Ripley 2002) under a Poisson distribution.

Following Smith et al. (1999), three measures of accuracy were defined, namely (i) for a priori invasive taxa, (ii) for a priori non-invasive taxa, and (iii) overall:

$$ A_{i} = \left( {I_{r} /I_{t} } \right) \times 100 $$

where $I_{r}$ is the number of a priori invasive taxa that were rejected by FISK (i.e. high risk), and $I_{t}$ the total number of a priori invasive taxa screened. Similarly:

$$ A_{n} = \left( {N_{a} /N_{t} } \right) \times 100 $$

where $N_{a}$ is the number of a priori non-invasive taxa accepted by FISK (i.e. low and medium risk) and $N_{t}$ the total number of a priori non-invasive taxa screened. Overall accuracy is then given by:

$$ A_{o} = \left[ {\left( {N_{a} + I_{r} } \right)/\left( {N_{t} + I_{t} } \right)} \right] \times 100 $$

Notably, in all cases values above 50% are indicators of the accuracy of the screening tool.
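The three accuracy measures can be computed together, as in the Python sketch below. The function name, the input layout and the example data are hypothetical; taxa are assumed to be ‘rejected’ (high risk) when their score is at or above the threshold; and overall accuracy is computed as the percentage of all screened taxa that were correctly handled, i.e. (N_a + I_r)/(N_t + I_t) × 100.

```python
def accuracy_measures(records, threshold):
    """Return accuracy (%) for invasive, non-invasive and all taxa.

    records: iterable of (score, is_invasive_a_priori) pairs.
    """
    records = list(records)
    i_t = sum(1 for _, inv in records if inv)        # a priori invasive
    n_t = sum(1 for _, inv in records if not inv)    # a priori non-invasive
    if i_t == 0 or n_t == 0:
        raise ValueError("both a priori classes must be represented")
    # Invasive taxa correctly rejected (categorised as high risk)
    i_r = sum(1 for s, inv in records if inv and s >= threshold)
    # Non-invasive taxa correctly accepted (low or medium risk)
    n_a = sum(1 for s, inv in records if not inv and s < threshold)
    return {
        "invasive": i_r / i_t * 100.0,
        "non_invasive": n_a / n_t * 100.0,
        "overall": (n_a + i_r) / (n_t + i_t) * 100.0,
    }


if __name__ == "__main__":
    # Hypothetical data, for illustration only
    example = [(3, False), (8, False), (15, False), (18, True),
               (22, False), (27, True), (35, True)]
    print(accuracy_measures(example, threshold=19))
```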

To identify the taxa posing a high-risk level of invasiveness at the global (worldwide) scale, ROC analysis was applied to the combined data set (hence, regardless of RA area) but after excluding those (few) taxa other than species, sub-species or hybrids. Also, given the global level of analysis, the a priori classification for all taxa was in all cases after FishBase and GISD (hence, global: see “Data sources and processing”).

Climate

Following identification of the global threshold (see Outcomes), the taxa categorised globally as high risk were further grouped according to the number of climate classes (see “Data sources and processing”) in the different RA areas for which they were screened, and were additionally ‘flagged’ both for their being listed in the GISD (i.e. invasive) and for their a priori classification (i.e. non-invasive or invasive). Taxa evaluated across all climate classes were then (loosely) regarded as carrying a ‘high confidence’ of being high risk, those evaluated across three classes as ‘medium confidence’, and those evaluated for two classes as ‘low confidence’; whereas, the remaining high-risk taxa evaluated for only one climate class were regarded as amenable to further screenings. Notably, the confounding of climate classes with RA area (i.e. due to the presence of two or more classes within a single RA area: see Table 1) and the more limited climatic scope of FISK v1 versus v2 (see “Toolkit description”) were not accounted for at this more generic level of analysis for climate-related patterns.
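The confidence labelling described above amounts to a simple lookup on the number of distinct Köppen-Geiger classes (A, B, C, D) for which a high-risk taxon was screened. A minimal Python sketch follows; the function name and the label strings are illustrative assumptions.

```python
CLIMATE_CLASSES = {"A", "B", "C", "D"}  # Tropical, Dry, Temperate, Continental


def confidence_label(climate_classes):
    """Label a globally high-risk taxon by its climate-class coverage."""
    n_classes = len(set(climate_classes) & CLIMATE_CLASSES)
    if n_classes == 0:
        raise ValueError("no valid climate class supplied")
    labels = {
        4: "high confidence",
        3: "medium confidence",
        2: "low confidence",
        1: "further screening",
    }
    return labels[n_classes]


if __name__ == "__main__":
    print(confidence_label(["A", "B", "C", "D"]))
    print(confidence_label(["C", "C", "D"]))  # duplicates count once
```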

To unravel the confounding effect of climate class with RA area, assessments were selected from a subset of the RA areas and only for those applications under FISK v2 because of the DS tool’s wider climatic applicability (see “Toolkit description”). The subset was chosen so that each RA area was either entirely comprised within a single climate class or within a ‘predominant’ climate class with respect to the ‘secondary’ one(s)—this implied that all other RA areas spanning across three or all four climate classes were excluded from the subset. PERANOVA (one-factor design, Euclidean distance, 9999 permutations of the residuals under a full model) was then used to test for differences in mean outcome scores amongst climate classes for each taxon in the subset. Using the global a priori classification for the taxa, ROC analysis was then implemented separately on each climate class, and corresponding thresholds and AUCs were computed and statistically compared (as per Outcomes).
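The idea behind a one-factor permutational test on univariate scores can be sketched in plain Python. This is a simplified stand-in and not the PERMANOVA+ implementation used in the study: it permutes the raw observations among groups rather than the residuals, the function names and example data are hypothetical, and the random seed is fixed only to make the example reproducible. For a single variable with Euclidean distance, the pseudo-F statistic coincides with the classical one-way ANOVA F ratio, which is what is computed here.

```python
import random


def f_statistic(groups):
    """Classical one-way ANOVA F ratio for a list of groups."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    if ss_within == 0:
        return float("inf")
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def permutation_anova(groups, n_perm=9999, seed=0):
    """Return (observed F, permutational p-value)."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed - 1e-12:
            extreme += 1
    # The observed arrangement counts as one of the permutations
    return observed, (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical scores for three climate classes
    print(permutation_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]], n_perm=999))
```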

Results

Applications and assessments

In total, 36 FISK applications were available for 35 RA areas in 45 countries across the six inhabited continents (Fig. 1). Of these applications, 11 were carried out under FISK v1 and 25 under FISK v2 (Table 1). The RA areas consisted of: (i) groups of countries (Croatia and Slovenia, European Union) or parts of countries (Conterminous USA, England & Wales); (ii) ‘extensive’ geographical areas (Anatolia and Thrace, Balkans, Iberian Peninsula); (iii) individual countries (Belarus, Greece, Mexico, Moldova, Netherlands, Portugal, Serbia, Singapore, South Africa), other political entities (Scotland, Puerto Rico) and states (Florida, Pennsylvania); (iv) regions (Catalonia, Flanders, Northern Kyushu Island, Southern Finland); and (v) river or lake drainage basins (Gangneungnamdae Stream Basin, Great Lakes Basin, Murray-Darling Basin, Northeast of Pará Basin, Rhine Basin, River Neretva Basin, River Oder Estuary, São Camilo Stream Basin, Upper River Paraná Basin), or waterbody elements thereof (Lagoa dos Patos, Lake Balaton).

Fig. 1

Map showing the countries/political entities including the Risk Assessment Areas (RA areas) for which the Fish Invasiveness Screening Kit (FISK) was applied. Note that in some cases the RA area was only part of a certain country/political entity. See also Table 1

Based on all FISK applications, 1973 assessments in total were made by 70+ experts on 372 taxa. These comprised 1 genus, 354 species, 4 sub-species, 8 hybrids and 5 haplotypes in 19 orders and 62 families (Appendix Table A2 in Supplementary Material). Most of the taxa screened (62.4% of the total) belonged to the orders Cypriniformes and Perciformes, followed by Siluriformes, Characiformes, Salmoniformes and Cyprinodontiformes (29.3%), and with the remaining taxa (8.3%) distributed across an additional 13 orders each representing < 2% of the total (Fig. 2a). Cyprinidae were by far the most highly represented family, followed by Cichlidae and Salmonidae (and together representing 49.2% of the taxa), and with all other families each including < 4% of the taxa (Fig. 2b). The most widely-screened species (60% of the RA areas in both cases) were grass carp and common carp (Ctenopharyngodon idella and Cyprinus carpio, Cyprinidae) for 21 out of the 35 RA areas in total, rainbow trout (Oncorhynchus mykiss, Salmonidae) for 17 RA areas, silver carp and topmouth gudgeon (Hypophthalmichthys molitrix and Pseudorasbora parva, Cyprinidae) for 16 RA areas, and with an additional 15 species screened for at least 10 RA areas (Fig. 2c). Overall, 47 of the 51 species listed as invasive in GISD, hence excluding yellowfin goby (Acanthogobius flavimanus, Gobiidae), alewife (Alosa pseudoharengus, Clupeidae), dusky millions fish (Phalloceros caudimaculatus, Poeciliidae), and ‘planitilapia’ (Sarotherodon occidentalis, Cichlidae), were screened with FISK.

Fig. 2

Number and corresponding proportion of the taxa screened with FISK according to a order and b family. c Proportion of species screened for more than ten RA areas. See also Appendix Table A2 in Supplementary Material

Replicated assessments (i.e. by more than a single assessor) were available for all taxa screened for a certain RA area in 9 out of the 36 applications in total (Appendix Table A3 in Supplementary Material). For FISK v1, England & Wales had 2 assessors for all taxa (5 assessors in total) and Northern Kyushu Island had 5 assessors for all taxa, with 3 taxa evaluated twice by the same assessor. For FISK v2, there were 2 assessors for all taxa for Anatolia and Thrace, for Greece and for the River Neretva Basin, whereas the Conterminous USA had 2 to 5 assessors for all taxa (7 assessors in total), the Iberian Peninsula had 3 assessors for all taxa as did South Africa (6 assessors in total), and Lake Balaton had 3 to 4 assessors for all taxa (4 assessors in total). Of the other 27 applications, 6 included replicated assessments for most or part of the taxa: using FISK v1, Flanders had 2 assessors for 21 out of 22 taxa; and using FISK v2, the Balkans had 2 to 4 assessors for 12 out of 43 taxa (7 assessors in total), Croatia and Slovenia had 2 assessors for 23 out of 40 taxa, Florida had 2 to 5 assessors for 75 out of 97 taxa (5 assessors in total), Mexico had 2 assessors for 18 out of 30 taxa (3 assessors in total), and Portugal had 2 assessors for 39 out of 40 taxa. The remaining 21 applications consisted of un-replicated assessments. For the 601 replicated assessments in total, the difference (Δ) between the min and max score value was equal to 0 in 29 cases (i.e. 4.8% of the total), and in the other 572 cases it ranged from 0.5 to 26 (Appendix Table A3 in Supplementary Material). Statistics for Δ were: mean = 7.0 ± 0.2 SE, median = 5.5, and 5th and 95th percentiles = 0.5 and 18.0, respectively.

Scoring and certainty

FISK scores ranged from − 9 [golden mbuna (Melanochromis auratus, Cichlidae): Conterminous USA] to 44 [goldfish Carassius auratus (Cyprinidae) and common carp: Iberian Peninsula], with a mean of 15.4, a median of 15.0, and 5th and 95th percentiles of 0 and 33.0, respectively. The distribution of the scores was not normal (JB = 39.623, P < 0.001), but slightly skewed to the right (skewness = 0.186, z = 3.320, P < 0.001) and platykurtic (kurtosis = 2.405, z = − 7.721, P < 0.001) (Fig. 3).

Fig. 3

Frequency distribution of the scores for the taxa screened with FISK (see also Appendix Table A4 in Supplementary Material)

The overall mean score for FISK v1 was significantly higher than for FISK v2 (19.4 ± 0.6 SE vs 14.3 ± 0.3 SE). There were also differences between taxa classified a priori into non-invasive and invasive, although conditional upon RA area within FISK version (Table 2), with mean scores for the a priori non-invasive taxa being in most cases significantly lower than those for the a priori invasive taxa (Table 3). Notable exceptions were the applications for Moldova, Netherlands, Pennsylvania and the Upper River Paraná Basin (FISK v1), and for the Gangneungnamdae Stream Basin (FISK v2), for which there were no statistically significant differences (Table 2). However, in the case of Pennsylvania and the Upper River Paraná Basin, this was most likely an outcome of the low sample sizes (cf. Table 1), as the mean score values for the a priori non-invasive taxa were consistently lower than those for the a priori invasive taxa (Table 3). The same was true for the Gangneungnamdae Stream Basin application, for which the difference was significant only at the ‘heuristic’ α = 0.10 level. This contrasted with the very similar mean score values (between a priori non-invasive and invasive taxa) for the Moldova and Netherlands applications, the latter also limited by a relatively small sample size (Table 3).

Table 2 Permutational Analysis of Variance (PERANOVA) results for the FISK scores of the taxa classified a priori into non-invasive and invasive (Classification) and according to both FISK Version (v1 and v2) and Risk Assessment Area (RA area, nested within Version). Significant effects (α = 0.05) in bold and heuristically (α = 0.10) in italics, including a posteriori pair–wise comparisons for a priori non-invasive versus invasive. MS = mean square; # = permutational. Great Lakes Basin, Northeast of Pará Basin, Puerto Rico, Rhine Basin and River Oder Estuary RA areas not included due to low sample sizes; for Portugal, Alaunocara sp. not included due to a not applicable a priori classification (see Table 1). See also Table 3
Table 3 Number of taxa with corresponding mean ± SE (standard error) score classified a priori as non-invasive and invasive (new or updated a priori classification, as applicable) across 34 RA areas grouped according to FISK version (see Table 1). In italics, RA areas not included in the statistical analyses (cf. Table 2) due to low sample sizes

Certainty values could not be retrieved for 11 of the 36 FISK applications reviewed, nor were they available for one of the two replicated assessments on ide (golden orfe) (Leuciscus idus, Cyprinidae) for England and Wales (Table 1). Based on the 24 FISK applications for which certainty values were available, there were significant differences in certainty between Sections, Categories within Section, and Questions within Category within Section (Table 4). Mean certainty was higher for the Biogeography/Historical versus the Biology/Ecology Section (3.47 ± 0.03 vs 3.34 ± 0.02). At the Category(Section) level (Fig. 4a): for the Biogeography/Historical section, mean certainty for Invasive elsewhere (3.35 ± 0.05) was lower compared to both Domestication/Cultivation and Climate and Distribution (3.58 ± 0.04 and 3.52 ± 0.04, respectively), which did not differ significantly; for the Biology/Ecology section, mean certainty for Feeding guild and for Reproduction (3.60 ± 0.04 and 3.56 ± 0.03) was higher than for Undesirable (or persistence) traits, Dispersal mechanisms, and Tolerance attributes (3.37 ± 0.03, 3.23 ± 0.04 and 3.14 ± 0.06, respectively). At the Question(Category(Section)) level, in the Biogeography/Historical section (Fig. 4b): Domestication/Cultivation Qs 1 and 2 had higher mean certainty (3.72 ± 0.04 and 3.64 ± 0.06) relative to Q3 (3.38 ± 0.09), and that for Climate and Distribution Q8 was higher (3.80 ± 0.03) than all of the other Qs (i.e. 4–7) in the section (mean certainty = 3.45); in the Biology/Ecology section (Fig. 4c): Undesirable (or persistence) traits Q25 had lower mean certainty (2.90 ± 0.15) than most of the other Qs (i.e. 14–24) in the section (mean certainty = 3.41), and Dispersal mechanisms Q44 also had lower mean certainty (2.80 ± 0.15) than all other Qs (i.e. 37–43) in the section (mean certainty = 3.24); whereas, there were Q-wise differences in the Feeding guild, Reproduction, and Tolerance attributes sections.

Table 4 PERANOVA results for the certainty levels in FISK assessments according to FISK version (v1 and v2), Section, Category(Section) and Question(Category(Section)) (see Appendix Table A1). Statistically significant effects (α = 0.05) in bold, including a posteriori pair-wise comparisons (for the Questions (Qs), only the significant comparisons or sets thereof are given for conciseness). Certainty values not available for all RA areas (see Table 1). See also Fig. 4
Fig. 4 Mean (± SE) certainty for (a) the FISK Categories of questions (Qs) within each corresponding Section; (b) and (c) the FISK Qs within each corresponding Category and Section. Black and light gray bars indicate statistically significantly higher and lower certainty, respectively, of one Q versus all or most of the others within each grouping (i.e. Category or Section); dark gray bars indicate either no statistically significant differences with all other Qs or differences only with some of them. Statistical results in Table 4

Overall, 1516 assessments (76.8% of the total) included ‘Don’t know’ responses. These ranged from a minimum of 1 to a maximum of 28 per assessment, with a mean value of 4.9 ± 0.1 SE, a median of 3, and 5% and 95% percentiles of 1 and 14, respectively (Fig. 5a). The three questions that resulted in the largest proportion of assessments with ‘Don’t know’ responses were (FISK v1 and FISK v2 Q formulations given, whenever applicable): Q44 (Is dispersal of the species density dependent?), Q25 (Does the species require minimum population size to maintain a viable population?) and Q47 (Is the species susceptible to piscicides? /Is the species readily susceptible to piscicides at the doses legally permitted for use in the risk assessment area?). Also, Q41 (Does natural dispersal occur as a function of dispersal of larvae (along linear and/or ‘stepping stone’ habitats)?) and Q49 (Are there effective natural enemies of the species present in the risk assessment area?) resulted in > 20% of assessments including a ‘Don’t know’ response, and another seven Qs (11, 48, 19, 10, 12, 43, 39, in order of proportions) in > 10% (Fig. 5b).

Fig. 5 (a) Frequency distribution for the number of ‘Don’t know’ responses in each assessment. (b) Proportion of ‘Don’t know’ responses according to Question number

Outcomes

Of the 15 applications in total that provided score calibration, eleven did so relative to FishBase and GISD (hence, global), whereas the other four based their a priori classification on local, RA area-specific literature (Table 1). Owing to the change in status (i.e. from non-invasive to invasive, or vice versa) of some taxa since implementation of the original screening study, the original a priori classification of the taxa screened was updated for the eleven applications relying on the global calibration plus the application for South Africa, which was augmented by inclusion of an additional three species (Table 1). Re-computation of corresponding thresholds and AUCs for the 12 applications above resulted in several changes in the a priori classification of some taxa (i.e. from non-invasive to invasive, or vice versa) except for Serbia and South Africa, with the proportion of changes in status ranging from 8.2% (Greece) to 25.0% (Southern Finland) of the total taxa originally screened for the corresponding RA area (Appendix Table A4 in Supplementary Material). Because of the change in status, four species [tench (Tinca tinca, Cyprinidae) for Anatolia and Thrace; ide for the Iberian Peninsula; grass carp and pumpkinseed (Lepomis gibbosus, Centrarchidae) for Southern Finland] changed their risk level from medium to high, and one species, namely racer goby (Babka gymnotrachelus, Gobiidae) for Croatia and Slovenia, from high to medium (Appendix Table A4 in Supplementary Material).

Thresholds and AUCs were computed ex novo for 13 of the 21 applications that did not originally provide a calibrated threshold (but relied on the threshold of 19 originally set for England & Wales), as 5 of these applications had too small a sample size for successful ROC implementation (i.e. Great Lakes Basin, Northeast of Pará Basin, Puerto Rico, Rhine Basin and River Oder Estuary: Table 1). In addition, thresholds and AUCs were computed ex novo for the (unpublished) applications for Portugal, Scotland and Singapore (Table 1). Original threshold values ranged from 6 (Conterminous USA) to 24 (Mexico); whereas, based on the computed and re-computed thresholds (as applicable), the range was from 7.17 (Conterminous USA) to 32 (Moldova) (Table 1). Despite a lack of statistically significant differences between original and re-computed thresholds for the 12 applications above (Wilcoxon test: V = 24, P = 0.760), there was a substantial increase in threshold value for Southern Finland (Δ = 10.25) and a slighter one for Anatolia and Thrace and for South Africa (2.50 and 0.97, respectively); whereas, a decrease occurred for Croatia and Slovenia and for the Balkans (− 5.00 and − 3.94, respectively), and a minor one for Serbia and for the River Neretva Basin (− 2.00 and − 1.38, respectively). On the other hand, there was a very minor change for England & Wales, the Iberian Peninsula and Lake Balaton (0.25, 0.17 and − 0.25, respectively), and no change for Florida and Greece (cf. Table 1). Finally, the mean threshold value (new or re-computed, as applicable) under FISK v1 was significantly higher than under FISK v2 (20.9 ± 4.8 SE vs 15.9 ± 4.7 SE: F#(1, 29) = 8.06, P# = 0.007; # = permutational).

Original mean AUC values were always above 0.5, thereby confirming the ability of FISK to differentiate between a priori invasive and non-invasive taxa, although the LCIs for the River Neretva Basin and Serbia fell below it (Table 1). Amongst the new and re-computed AUC values, only the one for Moldova fell below 0.5, and the LCI for the Netherlands also was below it (Table 1). Under FISK v1, new and re-computed AUCs ranged from 0.459 (Moldova) to 0.912 (Catalonia), whereas those for Lagoa dos Patos, Pennsylvania and São Camilo Stream Basin were equal to 1; under FISK v2, AUCs ranged from 0.710 (Lake Balaton) to 0.989 (Portugal), whereas for Serbia the AUC was equal to 1. Overall, there were no statistically significant differences between AUCs under both FISK v1 and v2 (Bonferroni-corrected pair-wise comparisons at α = 0.05/15 ≅ 0.003 and α = 0.05/120 ≅ 0.0004, respectively).
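The Bonferroni-corrected significance levels quoted above follow from dividing α by the number of pair-wise comparisons. As a minimal sketch in Python, assuming that the 15 and 120 comparisons correspond to all possible pairs among 6 and 16 sets of AUCs, respectively (these group counts are inferred here from the denominators and are not stated in the text):

```python
from math import comb

def bonferroni_alpha(n_groups, alpha=0.05):
    """Return (number of pairs, per-comparison alpha) for all
    pair-wise comparisons among n_groups."""
    n_pairs = comb(n_groups, 2)
    return n_pairs, alpha / n_pairs

# Group counts of 6 and 16 are an assumption inferred from the
# reported denominators (15 and 120), not values given in the text.
for n in (6, 16):
    pairs, corrected = bonferroni_alpha(n)
    print(n, pairs, round(corrected, 4))
```

Under this assumption the corrected levels round to 0.003 and 0.0004, matching the values reported.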

Number and corresponding percentage of RA area-wise risk levels for the taxa classified a priori into non-invasive and invasive (updated categorisation, whenever applicable) under FISK v1 and v2 are given in Table 5 (see also: Appendix Table A3 in Supplementary Material, for the risk level outcomes of all taxa assessed according to FISK version and RA area; and Appendix Table A4 in Supplementary Material, for the change in risk level of some taxa resulting from their re-classification in a priori status and re-computation of thresholds). With FISK v1, there were two statistically significant interaction terms, namely between risk level and RA area, and between risk level and a priori classification (Table 6). The former interaction term simply reflected the structure of the data set comprising the different proportions of low, medium and high risk taxa depending upon RA area. Conversely, the latter (and ecologically relevant) interaction term reflected: (i) the proportion of correct categorisations of a priori invasive taxa as high risk (i.e. true positives: 38.0%) and a priori non-invasive taxa as low risk (i.e. true negatives: 2.0%); (ii) the proportion of incorrect categorisations of a priori non-invasive taxa as high risk (i.e. false positives: 8.6%) and a priori invasive taxa as low risk (i.e. false negatives: 0%); and (iii) the remaining 35.9% and 15.5% of the a priori non-invasive and invasive taxa, respectively, categorised as medium risk (Table 5). With FISK v2, there were three statistically significant interaction terms, namely between risk level and RA area, risk level and a priori classification, and a priori classification and RA area (Table 6). As with FISK v1, the first and third interaction terms simply reflected the structure of the data set, comprising respectively the different proportions of low, medium and high risk taxa and of a priori non-invasive and invasive taxa depending upon RA area. Conversely, the second (and ecologically relevant) interaction term reflected: (i) the proportion of correct categorisations, i.e. true positives (35.8%) and true negatives (6.1%); (ii) the proportion of incorrect categorisations, i.e. false positives (6.9%) and false negatives (0%); and (iii) the remaining 42.8% and 8.4% of the a priori non-invasive and invasive taxa, respectively, categorised as medium risk (Table 5).

Table 5 Number and percentage of RA area-wise risk levels according to FISK (v1 and v2) for the taxa classified a priori into non-invasive and invasive (updated categorisation whenever applicable: see Table 1). Thresholds to distinguish between medium and high risk taxa in Table 1; low risk outcomes all based on a fixed threshold of 1. Great Lakes Basin, Northeast of Pará Basin, Puerto Rico, Rhine Basin and River Oder Estuary RA areas not included due to low sample sizes (see Table 1). Statistical results in Table 6
Table 6 Log-linear analysis results of the RA area-based risk levels (Table 5) for the taxa screened under FISK (v1 and v2). Statistically significant (α = 0.05) effects in bold type. Risk level = low, medium, high; A priori classification: non–invasive, invasive (new or re-computed, as applicable: see Table 1); RA area as per Table 1 (Great Lakes Basin, Northeast of Pará Basin, Puerto Rico, Rhine Basin and River Oder Estuary not included due to low sample sizes). Also, for Portugal, peacock cichlid not included due to not applicable a priori classification

All three measures of accuracy had a mean value well above 50% (Ai = 81.0 ± 3.8 SE; An = 85.8 ± 3.2 SE; Ao = 82.5 ± 2.8 SE), which confirmed the accuracy of the screening tool (Table 7). However, for the Netherlands application, accuracy was in all cases below the acceptable threshold, and the same was true of Ai for the Moldova and Northern Kyushu Island applications.

Table 7 RA area-wise accuracy of FISK (v1 and v2) for a priori invasive taxa (Ai), a priori non-invasive taxa (An), and overall (Ao) (see text for computational details). Accuracy values based on the outcomes of Table 5

Based on the number of RA areas (but after excluding Moldova because of the unreliable ROC outcomes: see above), common carp (the most widely screened species) posed a high risk level of invasiveness in all the 21 RA areas for which it was investigated (Table 8). Amongst the other species screened for at least ten RA areas, goldfish and brown bullhead (Ameiurus nebulosus, Ictaluridae) were also categorised as carrying a high risk in all areas investigated; whereas, grass carp, rainbow trout, silver carp, bighead carp (Hypophthalmichthys nobilis, Cyprinidae), largemouth (black) bass (Micropterus salmoides, Centrarchidae), Nile tilapia (Oreochromis niloticus, Cichlidae), eastern mosquitofish (Gambusia holbrooki, Poeciliidae) and round goby (Neogobius melanostomus, Gobiidae) were categorised as high risk from 67% to 91% of the RA areas. Finally, brook trout (Salvelinus fontinalis, Salmonidae) was categorised as medium risk in eight out of the 13 RA areas for which it was screened, and Atlantic salmon (Salmo salar, Salmonidae) was never categorised as high risk in the five RA areas where it was studied.

Table 8 RA area-based risk levels (no low-risk taxa identified) for the freshwater fish species screened with FISK and listed as invasive in GISD. Medium- and high-risk categories are based on the originally computed, new or re-computed thresholds as applicable (see Table 1). Number and percentage of RA areas are also indicated. 1 = Anatolia and Thrace; 2 = Balkans; 3 = Belarus (FISK v1); 4 = Belarus (FISK v2); 5 = Catalonia; 6 = Conterminous USA; 7 = Croatia and Slovenia; 8 = European Union; 9 = Flanders; 10 = Florida; 11 = England & Wales; 12 = Greece; 13 = Iberian Peninsula; 14 = Lagoa dos Patos; 15 = Lake Balaton; 16 = Mexico; 17 = Murray-Darling Basin; 18 = Netherlands; 19 = Northern Kyushu Island; 20 = Pennsylvania; 21 = Portugal; 22 = River Neretva Basin; 23 = São Camilo Stream Basin; 24 = Scotland; 25 = Serbia; 26 = Singapore; 27 = South Africa; 28 = Southern Finland; 29 = Upper River Paraná Basin

A global threshold of 15.5 was identified by ROC analysis (AUC = 0.851, LCI = 0.805, UCI = 0.896). As a result, of the 366 taxa screened (after excluding one genus and five haplotypes), 112 (30.6%) were categorised as high risk, of which 61 (16.7%) were true positives and 51 (13.9%) false positives; whereas, all 34 (9.3%) taxa categorised as low risk were true negatives, and no false negatives occurred. Of the remaining 220 (60.1%) taxa, 203 (55.5%) and 17 (4.6%) a priori non-invasive and invasive, respectively, were categorised as medium risk (Appendix Table A5 in Supplementary Material). Cypriniformes and Perciformes, but also Siluriformes, Cyprinodontiformes and Salmoniformes, were the orders with the largest proportion of high risk taxa (Fig. 6a); at the family level, Cyprinidae were by far the most highly represented in number of high risk taxa, followed by Cichlidae, Centrarchidae, Poeciliidae and Salmonidae (Fig. 6b).
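The tallies reported for the global threshold can be cross-checked with a few lines of arithmetic. The sketch below simply re-derives the percentages from the counts given in the text; the dictionary keys are illustrative labels chosen here, not terminology from the original tables.

```python
# Counts as reported in the text for the global threshold.
total = 366
counts = {
    "high_invasive_TP": 61,        # a priori invasive, high risk
    "high_non_invasive_FP": 51,    # a priori non-invasive, high risk
    "medium_non_invasive": 203,
    "medium_invasive": 17,
    "low_non_invasive_TN": 34,     # a priori non-invasive, low risk
    "low_invasive_FN": 0,          # a priori invasive, low risk
}

# The six cells should account for every taxon screened.
assert sum(counts.values()) == total

# Percentages of the total, rounded to one decimal place.
percentages = {k: round(100 * v / total, 1) for k, v in counts.items()}
high_risk_pct = round(100 * (61 + 51) / total, 1)

for label, pct in percentages.items():
    print(f"{label}: {pct}%")
print(f"high risk overall: {high_risk_pct}%")
```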

Fig. 6 Number and corresponding proportion of the high risk taxa (global threshold of 15.42) according to (a) order and (b) family. See also Appendix Table A5 in Supplementary Material

Climate

With the caveat for the confounding of climate class with RA area, 24 (21.4%) of the globally high risk species were screened for all climate classes (i.e. A, B, C and D). Except for molly (Poecilia sphenops, Poeciliidae) and giant snakehead (Channa micropeltes, Channidae), these species were classified a priori as invasive; also, except for molly, fathead minnow (Pimephales promelas, Cyprinidae), sailfin molly (Poecilia latipinna, Poeciliidae), channel catfish (Ictalurus punctatus, Ictaluridae) and giant snakehead, these species were all listed in the GISD (Fig. 7). Of the other 88 (78.6%) globally high risk species, 7 (6.3%) were screened for climate classes A, B and C, 44 (39.3%) for B, C and D, 2 (1.8%) for A and C, 9 (8.0%) for B and C, and 8 (7.1%) for C and D (Fig. 8). Finally, the remaining 18 taxa were screened either for the A, C or D climate class only [2 (1.8%), 12 (10.7%) and 4 (3.6%) taxa, respectively].

Fig. 7 Mean ± SE (standard error) scores for the taxa screened with FISK across all four climate classes (A = Tropical; B = Dry; C = Temperate; D = Continental: Peel et al. 2007) occurring in the corresponding RA areas. Black circle: a priori invasive; Black square: listed in the Global Invasive Species Database (GISD: www.iucngisd.org/gisd/)

Fig. 8 Mean ± SE (standard error) scores for the taxa screened with FISK across three or two climate classes occurring in the corresponding RA areas. Black circle: a priori invasive; Black square: listed in GISD

After excluding Anatolia and Thrace, the Conterminous USA, European Union and Mexico due to the diversity of climate classes encountered across the corresponding RA areas, the subset of (quasi) ‘non-climate-class-confounded’ RA areas consisted of the following climate classes (p.p. = pro parte, indicating the predominant climate class): A—Florida p.p., Northeast of Pará Basin, Puerto Rico p.p., Singapore; B—Murray-Darling Basin p.p. and South Africa p.p.; C—Croatia and Slovenia p.p., Gangneungnamdae Stream Basin, Greece p.p., Portugal, Rhine Basin, River Neretva Basin, Scotland, Serbia; D—Balkans p.p., Belarus, Great Lakes Basin, Lake Balaton, River Oder Estuary, Southern Finland (see Table 1). However, given that only four species (namely, grass carp, common carp, bighead carp and Nile tilapia) were screened across all climate classes, thereby making for too small a sample size, subsequent analysis focused on the subset of 27 species in total screened for climate classes B, C and D (Table 9).

Table 9 Number of assessments (n), mean ± SE score, a priori classification (after FishBase and GISD: N = non-invasive, Y = invasive) and corresponding risk level for the species screened with FISK v2 according to climate class (B, C, D) separately and combined after removing the confounding with RA area (see text for explanation). Risk levels determined according to climate-class specific ROC–based thresholds (B = 21.4; C = 12.1; D = 8.2) and corresponding intervals for the scores. B: Low = [− 15, 1[, Medium = [1, 21.4[, High = [21.4, 57]; C: Low = [− 15, 1[, Medium = [1, 12.1[, High = [12.1, 57]; D: Low = [− 15, 1[, Medium = [1, 8.2[, High = [8.2, 57] (note the reverse bracket notation indicating in all cases an open interval)

There were statistically significant differences in mean scores for the species screened under the three climate classes. Specifically, the mean score was higher for climate B (21.7 ± 1.3 SE) relative to C (17.6 ± 1.5 SE: t# = 2.04, P# = 0.005) and D (15.4 ± 1.3 SE: t# = 3.39, P# = 0.002); whereas, there were no significant differences between C and D (t# = 1.13, P# = 0.260). ROC analysis yielded thresholds and AUCs of 21.4 and 0.818 (LCI = 0.651, UCI = 0.984), 12.1 and 0.907 (LCI = 0.783, UCI = 1.000), and 8.2 and 0.821 (LCI = 0.627, UCI = 1.000) for climate classes B, C and D, respectively. However, despite the sharp decrease in threshold from climate class B to C and D, there were no significant differences between corresponding AUCs (P > 0.05). Based on the climate class-specific thresholds for the 27 species screened: for climate class B, 17 (63.0%) species were categorised as high risk, of which 16 (59.3%) were true positives and 1 (3.7%) a false positive (and there were no low risk species present); of the remaining 10 (37.0%) species, 6 (22.2%) a priori non-invasive and another 4 (14.8%) a priori invasive were categorised as medium risk; for climate class C, 21 (77.8%) species were categorised as high risk, of which 19 (70.4%) were true positives and 2 (7.4%) were false positives (no low risk species present); of the remaining 6 (22.2%) species, 5 (18.5%) a priori non-invasive and 1 (3.7%) a priori invasive were categorised as medium risk; for climate class D, 23 (85.2%) species were categorised as high risk, of which 20 (74.1%) were true positives and 3 (11.1%) were false positives, and 2 (7.4%) were categorised as low risk (true negatives); the remaining 2 (7.4%) species were both a priori non-invasive and categorised as medium risk.
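The assignment of risk levels from the climate class-specific thresholds can be sketched as a simple lookup over half-open score intervals. This is an illustrative Python sketch rather than code from the FISK toolkit: the function name and constants are hypothetical, whereas the thresholds, the fixed low/medium boundary of 1 and the theoretical score range of − 15 to 57 are those reported in the text and in the caption of Table 9.

```python
# Climate class-specific ROC-based thresholds reported in the text.
THRESHOLDS = {"B": 21.4, "C": 12.1, "D": 8.2}
LOW_CUTOFF = 1.0               # fixed boundary between low and medium risk
SCORE_MIN, SCORE_MAX = -15, 57 # theoretical range of FISK scores

def risk_level(score, climate_class):
    """Classify a FISK score as 'low', 'medium' or 'high'.

    Intervals: low = [-15, 1[, medium = [1, threshold[,
    high = [threshold, 57].
    """
    if not SCORE_MIN <= score <= SCORE_MAX:
        raise ValueError("score outside the theoretical FISK range")
    threshold = THRESHOLDS[climate_class]
    if score < LOW_CUTOFF:
        return "low"
    if score < threshold:
        return "medium"
    return "high"

# A hypothetical score of 15 is medium risk under class B
# but high risk under classes C and D.
for cls in ("B", "C", "D"):
    print(cls, risk_level(15, cls))
```

The same score can therefore fall into different risk levels depending on climate class, which is why the choice of threshold matters for species with intermediate scores.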

Across the three climate classes B, C and D, 15 species in total (55.6%) were categorised as high risk (including the highest scoring), 9 (33.3%) as both medium and high risk, 1 (3.7%) as medium risk, and 2 (7.4%) as both low and medium risk (Table 9). By parsing the reviewed data, the following ‘globally’ high risk species were identified (i.e. in order of decreasing scores > 20): common carp, black bullhead (Ameiurus melas, Ictaluridae), round goby, Chinese sleeper (Perccottus glenii, Odontobutidae), brown bullhead, eastern mosquitofish, largemouth bass, pumpkinseed and pikeperch (Sander lucioperca, Percidae) (Table 9).

Discussion

Scope and extent of applications

In the last decade, and especially following release of FISK v2 (Lawson et al. 2013), a large number of FISK-based applications has been made worldwide, with RA areas consisting of geo-political, biogeographical and hydrologic entities and spanning some five orders of magnitude in size, i.e. from Lake Balaton (592 km²) to the Conterminous USA (8,080,464 km²) (Table 1). This outcome is remarkable, especially when comparing FISK to other risk screening/assessment protocols (see Roy et al. 2018). In this respect, the Invasive Species Environmental Impact Assessment Protocol (ISEIA: Branquart 2009), the Trinational Risk Assessment Guidelines for Aquatic Invasive Species for North America (Mendoza et al. 2009), the German-Austrian Black List Information System (GABLIS: Essl et al. 2011), and the Generic Impact Scoring System (GISS: Nentwig et al. 2016) have all been employed so far in a restricted number of countries to screen a considerably smaller number of freshwater fish taxa compared to FISK. Also, the Australian Freshwater Fish Model (Bomford 2008), likely due to its intrinsically limited geo-political scope and conception (cf. Kumschick and Richardson 2013), has remained confined to a few local applications. Finally, the lack of uptake of FISK in Australia, save for the Murray-Darling Basin (Vilizzi and Copp 2013), is remarkable given that this DS tool was derived from the Australian government’s officially-recognised WRA (Pheloung et al. 1999; see also www.agriculture.gov.au/biosecurity/risk-analysis/weeds/system).

Like its geographical extent, the large number of taxa screened with FISK indicates consensus as to the adoption of this DS tool. In fact, it is noteworthy that almost all invasive species listed in GISD have been screened under FISK (Table 8) and that, as in other studies (e.g. Alcaraz et al. 2005), the most widely-represented orders and families of invasive taxa comprised only a few taxonomic entities deviating from the world’s freshwater richness. Also, like the WRA, FISK has been found to be applicable to taxonomic entities other than species (i.e. sub-species, hybrids and haplotypes), hence confirming the flexibility of the tool (Gordon et al. 2016). Finally, the large spectrum of taxa screened with FISK has allowed for comparative studies with other risk classification protocols, with special emphasis on issues of performance, standardisation, and ability to communicate with managers and stakeholders (Verbrugge et al. 2012; van der Veer and Nentwig 2015).

Nearly half of the FISK applications reviewed in the present study included replication of all or part of the assessments. Replication is important for assessing the accuracy of scoring systems in general (Makowski and Mittinty 2010) as well as uncertainty in the assessment process (e.g. Hill et al. 2014). Given the (theoretical) range in scores of FISK spanning across 72 units (i.e. − 15 to 57: see “Toolkit description” in “Methods”), the median value of 15.0 found in the present study indicates overall close agreement between/amongst assessors, even though in some cases larger values were encountered. However, despite intrinsic disagreements between/amongst assessors, ‘global’ ROC curves (i.e. based on mean score values from all assessors) could always be computed, namely in those (calibrated) studies relying on multiple assessors, due to the lack of statistically significant differences in assessor-specific ROC curves (i.e. Copp et al. 2009; Almeida et al. 2013; Tarkan et al. 2014; Lawson et al. 2015; Ferincz et al. 2016; Perdikaris et al. 2016b; Piria et al. 2016; Glamuzina et al. 2017). Conversely, in their five-assessor study, Onikura et al. (2011) removed (from computation of mean score values) the minimum and maximum scores for each taxon screened; whereas, in their evaluation of bias between assessors, Marr et al. (2017) found that the mean FISK score for the species evaluated by four of the six assessors in total participating in that study was within 10% of the overall mean score, with mean FISK scores from the other two assessors being about 30% away from the latter. Finally, in their application for Portugal, Range, Mourão, Magalhães, & Ribeiro (unpublished) evaluated differences in scores amongst three assessors and, despite some disagreements, pointed to overall similarities between assessments for the same taxa. Clearly, replication of assessments is encouraged whenever feasible (Copp 2013; Roy et al. 2018), as it will help to reduce uncertainty and variability in the risk screening/assessment process by eliciting multiple expert opinions and associated confidence levels, thereby making it possible to derive a measure of the degree of agreement between experts (Vanderhoeven et al. 2017).

Scoring and certainty

The overall range in FISK scores, albeit extensive, was still six units above the (theoretical) minimum score value and more than twice as many units (i.e. 13) below the (theoretical) maximum score value. However, the minimum and maximum possible values of − 15 and 57 for the FISK scores have been demonstrated to be hardly achievable in practice, hence making such values mainly of theoretical relevance (Vilizzi and Copp, unpublished). This is due to the constraints imposed computationally by Q1 (Domestication/Cultivation), Qs 4, 5 and 8 (Climate and distribution) and, in FISK v2, ‘cognitively’ by the four Feeding guild questions (i.e. Qs 26–29), which assign a taxon to a certain guild and are partly mutually exclusive (i.e. a taxon is very unlikely to belong to all four guilds; Appendix Table A1 in Supplementary Material). As a result, ‘real-world’ FISK scores are necessarily expected to be confined within a more restricted range of ‘ecologically-meaningful’ values as opposed to the full, ‘computationally-possible’ set of all values. Finally, the observed right-skewness (i.e. towards higher values) in the overall distribution of FISK scores reviewed in the present study would indicate a propensity to assess proportionally more taxa likely to be invasive in the RA area under study, as in the case of those taxa included in local ‘black/grey’ lists and/or global databases of invasive organisms (e.g. Essl et al. 2011; Matthews et al. 2017).

In the present study, reporting of the statistically-significant higher value in overall mean score under FISK v1 relative to FISK v2 per se was mainly driven by ‘illustrative’ rather than statistical reasons. This is because, as a rule in experimental design, the statistical significance of an interaction term overrides the significance of its component terms (e.g. Quinn and Keough 2002). Thus, the difference in mean scores between the two FISK versions should rather be explained relative to the individual RA areas under study. In such a case, comparison of overall mean FISK scores, regardless of the version used and hence unconfounded by other factors, is possible only in replicated screening studies of the same taxa for a certain RA area. However, for FISK no such studies are available, whereas comparisons between FISK v2 and AS-ISK using the same taxa are provided in Glamuzina et al. (2017) and Tarkan et al. (2014, 2017).

As expected, the mean scores for a priori non-invasive taxa were in most cases (i.e. RA areas) significantly lower than those for a priori invasive taxa. This supported the validity of the a priori classification in general (i.e. either global or RA area-specific), which is an essential component for evaluating the accuracy of any screening tool (Gordon et al. 2008). On the other hand, the lack of statistically significant differences in mean scores for the Pennsylvania and Upper River Paraná Basin applications would point to a minimum sample size required for more reliable a priori classifications, which, based on the available data, can be empirically (and provisionally) identified as being ≈ 15–20 taxa.

Like the mean score values, the observed differences in certainty between sections need to be evaluated at the hierarchical level of significance of the corresponding nested level of the factors, namely Category(Section) and Question(Category(Section)). Thus, for the Biogeography/Historical section, the lower certainty for the Invasive elsewhere category of questions is likely attributable to the need by the assessor to determine the existence of impacts in the taxon’s introduced range (cf. Qs 10–12: Appendix Table A1 in Supplementary Material). However, such impacts may be difficult to determine in some cases due to lack of experimental evidence, which for the lesser-studied taxa often relies on circumstantial (or even anecdotal) evidence. For the Biology/Ecology section, the higher certainty for the Feeding guild and Reproduction relative to the Undesirable (or persistence) traits, Dispersal mechanisms and Tolerance attributes categories can again be explained by the easier availability (e.g. FishBase) of ecological information for the first two categories of questions relative to the other three. And the same argument applies to the lowest hierarchical level at the Question(Category(Section)), where more subtle differences in certainty at the question level were revealed.

As a derivative of the WRA, FISK has preserved the original 49-question template making up the original risk screening questionnaire, and has adapted some of those questions for application to freshwater fishes (Copp et al. 2005a, b; Kumschick and Richardson 2013; Appendix Table A1 in Supplementary Material). In the present evaluation, three of the 49 Qs stood out for having received ‘Don’t know’ responses in a large proportion of assessments. Thus, minimum population size to maintain a population (Q25) and density-dependent dispersal (Q44) are amongst the most difficult aspects of fish population dynamics to estimate (Rose et al. 2001), and this resulted in > 50% of assessments receiving ‘Don’t know’ responses to those questions. Larval dispersal via linear/stepping-stone habitats (Q41) also received a high proportion of ‘Don’t know’ responses. Indeed, such life-history parameters can only be obtained through studies of age-growth, reproduction and early life-history (Beddington and Kirkwood 2005), which may be limited or even lacking for several taxa. Similarly, knowledge of the susceptibility of a certain taxon to piscicides (Q47) would require data from field experiments and/or laboratory studies (e.g. Allen et al. 2006), which, like predation/habitat competition (Q49), may again not be available in several cases. And although a ‘Don’t know’ response may highlight the need for research on that topic, this response should be avoided in NNS risk analysis protocols (R.H.A. Baker, personal communication) and for this reason was removed as a response option when FISK v2 was adapted to create AS-ISK (Copp et al. 2016).

Outcomes

The wide range in FISK threshold values recorded in the present review emphasises the importance of conducting RA area-specific calibrations whenever possible (Kumschick and Richardson 2013). In this regard, the major constraint that can be envisaged is the lack of sufficient sample sizes, which in the present study was empirically identified at a minimum of 15–20 taxa (see “Scoring and certainty”). In fact, the ‘transferability’ of a threshold from another RA area (as in the case of some FISK applications) may often represent a weak compromise given the observed discrepancies from threshold values computed ex novo (Table 1). On the other hand, setting a reference threshold for those studies limited to the evaluation of a restricted number of taxa (or just one species) would represent the only available option, in which case computation of global and/or climate-class specific thresholds (see “Climate”) in RA studies is still recommended. Importantly, both the computation and transferability of thresholds as well as the identification of a minimum sample size for successful calibration are important outcomes of the present study in view of the future adoption and implementation of the new derivative AS-ISK DS tool (Copp et al. 2016).

Because the information base for risk assessments is ever increasing, it is important to remember that risk analysis is a dynamic process. Therefore, when new data are available for a taxon, a risk screening (and even a full risk assessment) may be advisable to ensure that the risk ranking of that taxon is as accurate as possible to inform decision makers of any change in risk posed by the taxon being evaluated. In the present study, this was exemplified by the change in a priori invasiveness status for several taxa, which caused five of these to change in risk level following screening (discussion in Appendix A1 in Supplementary Material). Similarly, the original mean FISK score of 36 attributed to topmouth gudgeon for England & Wales (Copp et al. 2005a) increased to 43 a few years later when the species was re-assessed in light of new data becoming available (Copp et al. 2009). Finally, regarding the three FISK applications included in the present review as ‘unpublished data’ (i.e. Portugal, Scotland and Singapore), an overall discussion of the corresponding FISK outcomes is provided in Appendix A2 in Supplementary Material.

The mean ROC values (both original and re-computed, as applicable) were in all cases (except for the Moldova application) significantly greater than 0.5, and consistently so across all RA areas, indicating that FISK was able to separate invasive from non-invasive taxa accurately and to a greater degree than would be expected by chance alone. This outcome is similar to that for the WRA, FISK’s parent DS tool, as revealed by a meta-analysis of seven WRA applications across three continents (Gordon et al. 2008). On the other hand, the LCI values below threshold observed in the present study for the Moldova and Netherlands applications point to some discrepancies in the correct distinction between invasive and non-invasive taxa. Such discrepancies were likely due to the original selection of taxa, which was not balanced between invasive and non-invasive, but rather meant to provide a representative number of (mostly invasive) taxa. In this respect, it is unknown whether a similar variation in ROC values (i.e. causing some of them to fall below threshold) was also present in the WRA applications reviewed by Gordon et al. (2008), where only standard errors were reported. Overall, the low-to-very-low proportion of false positives and absence of false negatives across the FISK v1 and v2 applications, but also globally (Appendix Table A5 in Supplementary Material), is an indicator of the accuracy of this DS tool (Kumschick and Richardson 2013), as also measured explicitly by the corresponding Ai, An and Ao values, which were in all cases close to or above 80% (Smith et al. 1999).
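To make the quantities discussed above concrete, the following minimal sketch computes the area under the ROC curve, a candidate threshold, and the three accuracy measures for a small set of made-up scores. It is an illustration under stated assumptions and not the procedure used in the reviewed applications: the scores are hypothetical, the threshold is chosen by maximising Youden's J (one common criterion, assumed here), and Ai, An and Ao are taken to mean the proportions of a priori invasive, non-invasive and all taxa that are correctly classified.

```python
from typing import List, Tuple


def auc(inv: List[float], non: List[float]) -> float:
    """Area under the ROC curve as the probability that a randomly chosen
    invasive taxon scores higher than a randomly chosen non-invasive one
    (ties count as one half)."""
    wins = 0.0
    for a in inv:
        for b in non:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(inv) * len(non))


def youden_threshold(inv: List[float], non: List[float]) -> Tuple[float, float]:
    """Return (threshold, J) maximising J = sensitivity + specificity - 1.
    A taxon is classed as high risk when its score is >= threshold.
    Assumption: Youden's J is used as the selection criterion."""
    best_t, best_j = None, -1.0
    for t in sorted(set(inv) | set(non)):
        sensitivity = sum(s >= t for s in inv) / len(inv)
        specificity = sum(s < t for s in non) / len(non)
        j = sensitivity + specificity - 1.0
        if j > best_j:  # strict '>' keeps the lowest threshold among ties
            best_t, best_j = t, j
    return best_t, best_j


def accuracies(inv: List[float], non: List[float], t: float) -> Tuple[float, float, float]:
    """Assumed definitions: Ai = invasive taxa classed high risk,
    An = non-invasive taxa classed below high risk, Ao = overall."""
    true_pos = sum(s >= t for s in inv)
    true_neg = sum(s < t for s in non)
    ai = true_pos / len(inv)
    an = true_neg / len(non)
    ao = (true_pos + true_neg) / (len(inv) + len(non))
    return ai, an, ao


# Hypothetical scores, for illustration only (not data from any FISK study).
invasive_scores = [22.0, 30.0, 18.0, 25.0, 12.0]
non_invasive_scores = [5.0, 10.0, 14.0, 8.0, 16.0]

area = auc(invasive_scores, non_invasive_scores)
threshold, j_stat = youden_threshold(invasive_scores, non_invasive_scores)
ai, an, ao = accuracies(invasive_scores, non_invasive_scores, threshold)
print(f"AUC={area:.2f} threshold={threshold} Ai={ai:.0%} An={an:.0%} Ao={ao:.0%}")
```

An area of 0.5 corresponds to classification no better than chance, which is why a lower confidence limit that falls below 0.5 weakens the evidence of discrimination even when the mean value is well above it.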

Whilst the setting of RA area-specific thresholds is desirable to evaluate the sensitivity of an RA tool (Kumschick and Richardson 2013), under certain circumstances this may not be possible. This was the case for those FISK applications in which only one or a few taxa were evaluated (Table 1) and for which the authors relied on the ‘reference’ threshold of 19 originally set for England & Wales (Copp et al. 2009). However, that threshold was intended for use in that RA area, which may or may not be (at least climatically) relevant to the other RA areas to which it was applied (i.e. Belarus, Moldova, Northeast of Pará Basin, River Oder Estuary, and Puerto Rico, the latter having mistakenly used 18: Table 1). To this end, the global threshold of 15.5 identified in the present study would be more appropriate than the original (19) used in those FISK applications. Based on this cut-off value, the finding that Cypriniformes, Perciformes, Siluriformes, Cyprinodontiformes and Salmoniformes were the taxonomic orders with the largest proportion of high risk taxa is remarkable, as the same conclusion was reached at the smaller scale of the Iberian Peninsula (Alcaraz et al. 2005), hence suggesting that patterns of invasiveness risk may be consistent at different geographical scales.
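The practical consequence of replacing the reference threshold of 19 with the global threshold of 15.5 can be sketched as follows. This is a hypothetical illustration: the taxon labels and scores are invented, the function names are not part of FISK, and the sketch assumes that a score at or above the threshold is classed as high risk and that only the two categories distinguished here (low-to-medium and high) are needed.

```python
from typing import Dict, List

REFERENCE_THRESHOLD = 19.0  # originally set for England & Wales (Copp et al. 2009)
GLOBAL_THRESHOLD = 15.5     # global threshold identified in the present study


def risk_level(score: float, threshold: float) -> str:
    """Assumed rule: scores at or above the threshold are high risk."""
    return "high" if score >= threshold else "low-to-medium"


def reclassified(scores: Dict[str, float], old: float, new: float) -> List[str]:
    """Names of taxa whose risk level differs between the two thresholds."""
    return [
        name
        for name, score in scores.items()
        if risk_level(score, old) != risk_level(score, new)
    ]


# Hypothetical taxa and scores, for illustration only.
example_scores = {"taxon A": 12.0, "taxon B": 17.0, "taxon C": 24.5}

changed = reclassified(example_scores, REFERENCE_THRESHOLD, GLOBAL_THRESHOLD)
for name in changed:
    score = example_scores[name]
    print(
        name,
        risk_level(score, REFERENCE_THRESHOLD),
        "->",
        risk_level(score, GLOBAL_THRESHOLD),
    )
```

Only taxa scoring from 15.5 up to, but not including, 19 change category, so lowering the threshold can only move taxa from low-to-medium to high risk and never the reverse.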

Climate

As shown in the present review, RA areas consisted mainly of geo-political entities and, less often, bio-geographical units (Table 1). This is a logical outcome of RA studies whose purpose is to inform local managers and stakeholders about the risks involved in the introduction/translocation of (potentially invasive) taxa (e.g. Pheloung et al. 1999; Copp et al. 2009; Neal et al. 2017; Dodd et al. 2019). As a result, RA areas rarely coincide with definite climatic entities, except for those limited in geographical extent (e.g. basins, water bodies), which however are generally constrained by their intrinsically small size. This was evinced in the present study by the difficulties encountered in ‘teasing out’ the confounding effect of climate class with RA area, which in the case of tropical climate (i.e. class A) resulted in too small a sample size of taxa to allow computation of threshold values and related categorisation of risk levels. In this respect, further risk screening studies in tropical (i.e. class A) regions using AS-ISK would help fill the current gap in knowledge about the potential invasiveness of non-native freshwater fishes.

The significantly-higher mean FISK score and corresponding risk threshold found for climate B, relative to climates C and D, can be explained by the fact that most aquaculture and aquarium trade species are of tropical or warm origin and thus have less chance to thrive and establish in temperate or cold climates. For example, in the climate C and D regions of Japan, some of these species are reported only from sites with hot spring water inflows and industrial effluent (Japan Wildlife Research Center 2008). Also, based on temperature tolerances, only nine of 308 ornamental fish species investigated could potentially survive winter temperatures in the Great Lakes (Chapman 2000), a pattern similar to the failed introductions of ornamental fishes in the Iberian Peninsula, such as the tinfoil barb (Barbonymus schwanenfeldii, Cyprinidae) (Gante et al. 2008). Impacts can be quite severe in Mediterranean-type climate (class B) regions because the native (especially endemic) biota are often naïve to introduced predators (Ribeiro and Leunda 2012; Weyl et al. 2014) and depauperate in species diversity (e.g. Murray-Darling Basin, Australia: Lintermans 2007). Such predatory pressure can be of concern for conservation, as these (Mediterranean) areas usually act as hot-spots of biodiversity for a highly-endemic fish fauna (Reyjol et al. 2007). Fish introductions, together with the availability of small-scale habitats (i.e. streams: Whiterod et al. 2015, 2017), have therefore resulted in local extirpations and fragmentation of native fish communities as well as high ecological impacts by predators such as largemouth bass, smallmouth bass (Micropterus dolomieu, Centrarchidae) and pikeperch (Ellender and Weyl 2014; van der Walt et al. 2016).

Lack of native predatory fish in many dry (class B) regions enhances the tendency for stocking alien predatory species, mainly for sport fishing, even though some of these introductions have eventually failed [e.g. Eurasian perch (Perca fluviatilis, Percidae) in South Africa], most likely due to the species’ preference for cooler waters (Ribeiro et al. 2009; Weyl et al. 2014). However, these same species may thrive in class B regions by taking advantage of disturbed aquatic environments such as reservoirs, where hydrological conditions are more stable/homogeneous. This has been widely documented for a variety of piscivorous species in Mediterranean fresh waters (Clavero et al. 2013), including northern pike (Esox lucius, Esocidae), largemouth bass, European catfish (Silurus glanis, Siluridae), Eurasian perch and pikeperch. By contrast, temperate (class C) climate regions are generally characterised by a rich ichthyofauna and fewer available niches, with freshwater aquaculture activities relying on a few species, thereby leading to underestimation of the risks of translocations of native species (Musil et al. 2010). Finally, in cold (continental) climates (e.g. central-eastern Europe), introductions and aquaculture activities generally tend to include more (regionally) domesticated non-native but thermophilic species (i.e. common carp, silver carp and bighead carp) that are perceived as economically valuable (Varadi 2008), even though harsh winter conditions tend to reduce propagule pressure and establishment success of (sub)tropical species (Musil et al. 2010).

Conclusions

Overall, the most frequently-screened species were common carp and grass carp, followed by rainbow trout, silver carp and topmouth gudgeon. Of these, only common carp was amongst the ‘globally high risk’ species; surprisingly, topmouth gudgeon was not. This is despite the elevated risk posed by topmouth gudgeon, which is the only freshwater fish species for which the U.K. government established a national eradication programme (Britton et al. 2010). However, unlike common carp, topmouth gudgeon has not been assessed within any RA area with predominant climate class B, but only for RA areas with predominant or full climate class C and D, hence causing this species to fall outside the criteria set in the present study for global potential invasiveness. A similar reasoning applies to gibel carp (Carassius gibelio, Cyprinidae), which was also high risk in all RA areas for which it was assessed (although not listed in the GISD). The other ‘globally high risk’ species were: black bullhead, brown bullhead, eastern mosquitofish, pumpkinseed, largemouth bass, round goby, Chinese sleeper, and pikeperch (species-specific discussion in Appendix A3 in Supplementary Material).

In virtually all cases and consistently so across all RA areas, FISK was able to distinguish accurately between invasive and non-invasive taxa to a greater degree than would be expected by chance alone, with ROC values that were significantly > 0.5. The global threshold score for distinguishing between species that pose a low-to-medium risk of being invasive and those of high risk, i.e. 15.5, provides a reliable basis for the evaluation of species invasiveness risk in an RA area for which no calibration was possible due to an insufficient number of assessments. This threshold score also represents an improvement over the past practice of using the original threshold score of 19, which was calibrated for England & Wales as the RA area. Further, the observed patterns of certainty associated with responses to FISK questions appear to be a direct reflection of the available scientific information (both peer-reviewed and grey literature) about the species assessed: the most data-deficient aspects were the minimum population size (required to maintain a population) and dispersal-related factors (density-dependence, reliance on habitat connectivity).

In conclusion, the present study provides the means for existing risk rankings (using FISK) to be adjusted, providing a stronger evidence base for the categorisation of species, e.g. which ones to: (i) subject to a comprehensive risk assessment and possibly immediate management action (e.g. eradication, control) to avoid or minimise adverse impacts; (ii) restrict or ban with regard to importation and/or sale as ornamental or fishery enhancement species; (iii) include in policy and legislation regarding NNS; and (iv) highlight for international coordination with neighbouring countries, especially for transboundary drainage basins. It also provides a basis to fine tune NNS risk assessment procedures for countries that encompass more than one climate class.