Ecological Modelling

Volume 162, Issue 3, 15 April 2003, Pages 211-232
Evaluating predictive models of species’ distributions: criteria for selecting optimal models

https://doi.org/10.1016/S0304-3800(02)00349-6

Abstract

The Genetic Algorithm for Rule-Set Prediction (GARP) is one of several current approaches to modeling species’ distributions using occurrence records and environmental data. Because of stochastic elements in the algorithm and underdetermination of the system (multiple solutions with the same value for the optimization criterion), no unique solution is produced. Furthermore, current implementations of GARP utilize only presence data, rather than both presence and absence (the more general case). Hence, variability among GARP models, which is typical of genetic algorithms, and complications in interpreting results based on asymmetrical (presence-only) input data make model selection critical. Generally, some locality records are randomly selected to build a distributional model, with others set aside to evaluate it. Here, we use intrinsic and extrinsic measures of model performance to determine whether optimal models can be identified based on objective intrinsic criteria, without resorting to an independent test data set. We modeled potential distributions of two rodents (Heteromys anomalus and Microryzomys minutus) and one passerine bird (Carpodacus mexicanus), creating 20 models for each species. For each model, we calculated intrinsic and extrinsic measures of omission and commission error, as well as composite indices of overall error. Although intrinsic and extrinsic composite measures of overall model performance were sometimes loosely related to each other, none was consistently associated with expert-judged model quality. In contrast, intrinsic and extrinsic measures were highly correlated for both omission and commission in the two widespread species (H. anomalus and C. mexicanus). Furthermore, a clear inverse relationship existed between omission and commission there, and the best models were consistently found at low levels of omission and moderate-to-high commission values. In contrast, all models for M. minutus showed low values of both omission and commission. Because models are based only on presence data (and not all areas are adequately sampled), the commission index reflects not only true commission error but also a component that results from undersampled areas that the species actually inhabits. We here propose an operational procedure for determining an optimal region of the omission/commission relationship and thus selecting high-quality GARP models. Our implementation of this technique for H. anomalus gave a much more reasonable estimation of the species’ potential distribution than did the original suite of models. These findings are relevant to evaluation of other distributional-modeling techniques based on presence-only data and should also be considered with other machine-learning applications modified for use with asymmetrical input data.

Introduction

Predictive modeling of species’ distributions now represents an important tool in biogeography, evolution, ecology, conservation, and invasive-species management (Busby, 1986, Nicholls, 1989, Walker, 1990, Walker and Cocks, 1991, Sindel and Michael, 1992, Wilson et al., 1992, Box et al., 1993, Carpenter et al., 1993, Austin and Meyers, 1996, Kadmon and Heller, 1998, Yom-Tov and Kadmon, 1998, Corsi et al., 1999, Peterson et al., 1999, Peterson et al., 2000, Fleishman et al., 2001, Peterson and Vieglais, 2001, Boone and Krohn, 2002, Fertig and Reiners, 2002, Scott et al., 2002). These approaches combine occurrence data with ecological/environmental variables (both biotic and abiotic factors: e.g. temperature, precipitation, elevation, geology, and vegetation) to create a model of the species’ requirements for the examined variables. Primary occurrence data exist in the form of georeferenced coordinates of latitude and longitude for confirmed localities that typically derive from vouchered museum or herbarium specimens (Baker et al., 1998, Funk et al., 1999, Soberon, 1999, Ponder et al., 2001, Stockwell and Peterson, 2002a). Absence data are rarely available, especially in poorly sampled tropical regions where modeling may hold greatest value (Stockwell and Peters, 1999, Anderson et al., 2002a). The environmental variables typically examined in such modeling efforts encompass only relatively few of the possible ecological-niche dimensions (Hutchinson, 1957). Nevertheless, currently available digital environmental coverages (digitized computer maps) provide many variables that commonly influence species’ macrodistributions (Grinnell, 1917a, Grinnell, 1917b; Root, 1988, Brown and Lomolino, 1998).

The resulting model is then projected onto a map of the study region, showing the species’ potential geographic distribution (e.g. Chen and Peterson, 2000, Peterson and Vieglais, 2001). Models are generally based on the species’ fundamental niche (Hutchinson, 1957; including factors controlling distributions put forward in Grinnell, 1917b; see also MacArthur, 1968, Wiens, 1989, Morrison and Hall, 2002). Thus, some areas indicated by the model as regions of potential presence may be occupied by closely related species, or may represent suitable areas to which the species has failed to disperse or in which it has gone extinct. Rather than a drawback, however, this “overprediction” resulting from the niche-based nature of the models actually allows for synthetic evolutionary and ecological applications comparing potential and realized distributions (Peterson et al., 1999, Peterson and Vieglais, 2001; Anderson et al., 2002a, Anderson et al., 2002b).

The Genetic Algorithm for Rule-Set Prediction (GARP: http://biodi.sdsc.edu/; see http://beta.lifemapper.org/desktopgarp/ for software download) is an expert-system, machine-learning approach to predictive modeling (Stockwell and Peters, 1999). Genetic algorithms constitute one class of artificial-intelligence applications and were inspired by models of genetics and evolution (Holland, 1975). They have been applied to various problems not amenable to traditional computational methods because the search space of all possible solutions is too large to search exhaustively in a reasonable amount of time (Stockwell and Noble, 1992). Genetic algorithms present a heuristic solution to this dilemma by scanning broadly across the search space and refining solutions that show high values for the optimization (fitness) criterion. GARP has proven especially successful in predicting species’ potential distributions under a wide variety of situations (Peterson and Cohoon, 1999; Peterson et al., 1999, Peterson et al., 2001, Peterson et al., 2002a, Peterson et al., 2002b, Peterson et al., 2002c; Godown and Peterson, 2000, Sanchez-Cordero and Martinez-Meyer, 2000, Peterson, 2001, Elith and Burgman, 2002; Feria-A. and Peterson, 2002; Stockwell and Peterson, 2002a, Stockwell and Peterson, 2002b; but see Lim et al., 2002). Chen and Peterson (2000), Peterson and Vieglais (2001), and Anderson et al. (2002a) provide general explanations of the GARP modeling process and interpretation of potential distributions; see Stockwell and Noble (1992) and Stockwell and Peters (1999) for technical details.

GARP reduces error in predicted distributions by maximizing both significance and predictive accuracy, a novel goal for such analytical systems (Stockwell and Peters, 1999). The algorithm is largely successful in doing so without overfitting or overly specializing rules, which is especially important when models are based on occurrence data compiled without a fixed study design (Peterson and Cohoon, 1999). Owing to stochastic elements in the algorithm (such as mutation and crossing over; Holland, 1975, Stockwell and Noble, 1992), however, no unique solution is produced; indeed, the underdetermination of the system yields multiple solutions holding the same value for the optimization criterion. Hence, the variability among resulting models (typical of most machine-learning problems) requires careful examination of possible sources of error in order to select the most predictive models.

A common strategy for evaluating model quality has been to divide known localities randomly into two groups: training data used to create the model and an independent test data set used to evaluate model quality (Fielding and Bell, 1997, Fielding, 2002). One-tailed χ²-statistics (or binomial probabilities, if sample sizes are small) are often employed to determine whether test points fall into regions of predicted presence more often than expected by chance, given the proportion of map pixels predicted present by the model (e.g. Peterson et al., 1999, Anderson et al., 2002a). These tests using independent test data thus provide extrinsic measures of model significance (departure from random predictions). However, by excluding part of the data set from the model-building stage, the algorithm cannot take advantage of all known locality records. Clearly, an optimal model would incorporate data from all available records of the species.
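The extrinsic significance test described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used in the study: the function names and the example counts (20 test points, 16 inside the prediction, half of the map predicted present) are hypothetical, and the χ² P-value relies on the standard identity that the upper tail of a χ² distribution with 1 degree of freedom equals erfc(sqrt(χ²/2)).

```python
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more test points
    falling in predicted presence if the prediction were random."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def chi2_one_tailed(k, n, p):
    """One-tailed chi-square test (1 d.f.) of k hits among n test points,
    where p is the proportion of map pixels predicted present."""
    expected_in = n * p
    expected_out = n * (1 - p)
    chi2 = (k - expected_in) ** 2 / expected_in + ((n - k) - expected_out) ** 2 / expected_out
    two_tailed = math.erfc(math.sqrt(chi2 / 2))
    # Halve the two-tailed probability only when the departure is in the
    # desired direction (more hits than expected by chance).
    p_value = two_tailed / 2 if k > expected_in else 1 - two_tailed / 2
    return chi2, p_value


# Hypothetical example: 16 of 20 test points inside a prediction covering 50% of the map.
chi2, p_value = chi2_one_tailed(16, 20, 0.5)
exact_p = binomial_upper_tail(16, 20, 0.5)
```

With small samples such as this one, the binomial tail is the safer of the two quantities, which matches the text's remark that binomial probabilities are used when sample sizes are small.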

One tactic for managing the variability among models has been to make multiple models and determine how many models predict particular pixels as present (Anderson et al., 2002a, Lim et al., 2002; Peterson et al., unpublished data). Anderson et al. (2002a) tempered among-model variation by making three GARP models per species and creating a composite prediction based on all three models. In further analyses, map pixels predicted present by at least two of the models were then considered “predicted presence”. Similarly, Lim et al. (2002) created five models per species and deemed pixels predicted by three or more of them as predicted presence in subsequent analyses. More recently, Peterson et al. (unpublished data) have made larger numbers of models and summed them (for each model, value of 1 for a pixel of presence; value of 0 for predicted absence). In such an approach, the value of a pixel in the composite (summed) map thus equals the number of models predicting presence in that cell. Summing models may reveal a consistent signal that holds up across many different independent random walks of model generation. The above methods weigh all model replicates equally; in contrast, we herein compare such equal-weight tactics with a best-subsets approach.
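The equal-weight compositing described above amounts to a cell-wise sum of binary maps followed by an optional vote threshold. The sketch below assumes each model is a small nested list of 0/1 values on the same grid; `sum_models` and `majority_presence` are hypothetical names, and real predictions would normally be raster layers rather than Python lists.

```python
def sum_models(models):
    """Cell-wise sum of binary prediction grids (1 = presence, 0 = absence).
    Each cell of the result is the number of models predicting presence there."""
    rows, cols = len(models[0]), len(models[0][0])
    return [[sum(m[r][c] for m in models) for c in range(cols)] for r in range(rows)]


def majority_presence(models, min_votes):
    """Binary composite: presence where at least min_votes models agree."""
    return [[1 if votes >= min_votes else 0 for votes in row] for row in sum_models(models)]


# Three hypothetical 2 x 2 model outputs.
m1 = [[1, 0], [1, 1]]
m2 = [[1, 1], [0, 1]]
m3 = [[0, 0], [0, 1]]

composite = sum_models([m1, m2, m3])            # [[2, 1], [1, 3]]
two_of_three = majority_presence([m1, m2, m3], 2)  # [[1, 0], [0, 1]]
```

A threshold of 2 with three models corresponds to the two-of-three rule attributed to Anderson et al. (2002a); a threshold of 3 with five models corresponds to the rule attributed to Lim et al. (2002).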

Two types of error are possible in predictive models of species’ distributions: false negatives (omission error or underprediction) and false positives (commission error or overprediction). The relative proportions of these errors are typically expressed in a confusion matrix, or error matrix (Fielding and Bell, 1997). Four elements are present in a confusion matrix (Table 1). Element a represents known distributional areas correctly predicted as present, and d reflects regions where the species has not been found and that are classified by the model as absent. Thus, a and d are considered correct classifications; in contrast, c and b are usually interpreted as errors. Element c denotes omission: pixels of known distribution predicted absent by the model. Conversely, b is a measure of areas of absence (or “pseudo-absence”—see below) incorrectly predicted present (commission). Unfortunately, when known presence points are few in number and true absence points are not available, problems arise with some measures derived from the confusion matrix (Fielding and Bell, 1997).

GARP creates a confusion matrix by intrinsically re-sampling map pixels with replacement. First, 1250 map pixels are chosen randomly with replacement from those pixels holding localities of known occurrence (training points). The quantity a is the number of those pixels that coincide with areas of predicted presence; the number falling outside the prediction equals c. Thus, a+c=1250 for GARP models in which all pixels are predicted as either present or absent (in some models, the rule-set may not make a decision for every pixel; such pixels are then coded as “no data” in the prediction—see below). Likewise, 1250 pixels are re-sampled with replacement from the remaining pixels of the study area (any pixels without confirmed presence data in the training set). These pixels are referred to as background points or pseudo-absence points (Stockwell and Peters, 1999), highlighting the difference between models based on typical biodiversity information (positive occurrence records from zoological museums or herbaria, as here) and those that also include true absence data (e.g. Corsi et al., 1999, Fertig and Reiners, 2002). Background pixels that fall into regions of predicted presence yield b, whereas background pixels of predicted absence produce d; b+d=1250 for models with a presence/absence prediction for all pixels (but less if not all cells are predicted either present or absent).
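The re-sampling scheme just described can be illustrated with a short Python sketch. It is not GARP's code: the data layout (a dictionary from pixel id to 1, 0, or None for "no data"), the function name, and the fixed random seed are assumptions made for the example.

```python
import random


def resampled_confusion_matrix(prediction, training_pixels, n_draws=1250, seed=0):
    """Return (a, b, c, d) from two rounds of sampling with replacement.

    prediction: dict mapping pixel id -> 1 (present), 0 (absent) or None (no data).
    training_pixels: set of pixel ids holding known occurrence records.
    """
    rng = random.Random(seed)
    presence_pool = sorted(training_pixels)
    background_pool = sorted(p for p in prediction if p not in training_pixels)

    a = b = c = d = 0
    # Training pixels: predicted present -> a, predicted absent -> c.
    for pixel in rng.choices(presence_pool, k=n_draws):
        if prediction[pixel] == 1:
            a += 1
        elif prediction[pixel] == 0:
            c += 1
    # Background (pseudo-absence) pixels: predicted present -> b, absent -> d.
    for pixel in rng.choices(background_pool, k=n_draws):
        if prediction[pixel] == 1:
            b += 1
        elif prediction[pixel] == 0:
            d += 1
    return a, b, c, d


# Toy 2 x 2 study area with a single training pixel that the model predicts present.
toy_prediction = {(0, 0): 1, (0, 1): 1, (1, 0): 0, (1, 1): 0}
a, b, c, d = resampled_confusion_matrix(toy_prediction, {(0, 0)})
```

Pixels coded as None are drawn but not counted, so a+c and b+d fall below 1250 when the rule-set leaves some pixels undecided, as the text notes.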

As mentioned above, distributional-modeling algorithms like GARP are often used with only presence data. For most species, data regarding absence are not available (Stockwell and Peters, 1999, Peterson, 2001). In addition, when a potential distribution based on the species’ fundamental niche is desired, use of absence data could adversely affect the model-building process by inhibiting inclusion of areas that hold suitable environmental conditions where the species is not present due to historical restrictions or biological interactions (Peterson et al., 1999, Anderson et al., 2002b). However, despite the practical necessity and theoretical justification for using only presence data in modeling ecological niches, this asymmetry in input data (errors in pseudo-absences but not in presences) requires that interpretation of the confusion matrix be amended. In such cases, whereas element c represents pure omission error, element b includes the contributions of both true and apparent commission error.

Apparent commission error derives from potentially habitable regions correctly predicted as presence, but that cannot be demonstrated as such because no verification of the species exists there. The lack of verification of the species may have various causes (Karl et al., 2002). In certain cases, some areas lacking documentation of the species stem from historical causes or biotic interactions (Peterson, 2001). For example, disjunct areas of potential habitat with no records of the species often correspond to historical restrictions or the historical effects of speciation (e.g. failure of the species to disperse to a region of suitable habitat; Peterson et al., 1999, Peterson and Vieglais, 2001, Anderson et al., 2002a). Similarly, competition between related species showing parapatric distributions likely restricts many species’ realized distributions (Peterson, 2001, Anderson et al., 2002b). Other biological interactions—such as predation in some parts of the potential range but not in others—may also limit some species’ distributions. In addition to historical and biotic causes, apparent commission error can also derive from inadequate sampling: map pixels of real presence (at least at some time of the year in some subhabitat) lacking documentation of the species because they have not been adequately sampled by biologists (Karl et al., 2002). This latter form of apparent commission error has recently been recognized in presence/absence data sets where inventories were extensive yet incomplete (Boone and Krohn, 1999, Karl et al., 2000, Schaefer and Krohn, 2002, Stauffer et al., 2002). By definition, it reaches maximum manifestation in presence-only modeling applications like current implementations of GARP. As the goal of presence-only potential-distribution modeling is to determine which of the background (pseudo-absence) pixels actually represent suitable areas for a species—whether or not it actually inhabits them—interpreting measures of commission is critical.

One measure of overall model performance is the correct classification rate of Fielding and Bell (1997) (see Table 2). GARP provides an intrinsic correct classification rate derived from the confusion matrix: (a+d)/(a+b+c+d), equal to the “accuracy” of Stockwell and Peterson (2002b), not that of Anderson et al. (2002a). This quantity ranges from 0 to 1 and is designed to measure overall model adequacy, including contributions of both omission and commission in the denominator. Note that correct classification rate = 1 minus (sum of error terms)/(sum of all terms), that is, 1 − (b+c)/(a+b+c+d). However, because element b is overestimated by the preponderance of background (pseudo-absence) pixels, this statistic is necessarily biased with data sets that lack true absence data (common with biodiversity information; Peterson, 2001, Ponder et al., 2001, Stockwell and Peterson, 2002a). Likewise, the overall Kappa (κ)-statistic of Fielding and Bell (1997) includes elements of both omission and commission and thus suffers from the same problem (see also Fielding, 2002).
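Both composite measures can be computed directly from the four confusion-matrix elements. The sketch below uses the standard definition of Cohen's kappa (observed agreement corrected for chance agreement), which is assumed here to match the κ of Fielding and Bell (1997); the function names and the example counts are hypothetical.

```python
def correct_classification_rate(a, b, c, d):
    """(a + d) / (a + b + c + d), equivalently 1 - (b + c) / (a + b + c + d)."""
    return (a + d) / (a + b + c + d)


def cohen_kappa(a, b, c, d):
    """Cohen's kappa: agreement beyond that expected by chance."""
    n = a + b + c + d
    observed = (a + d) / n
    # Chance agreement from the row and column totals of the 2 x 2 matrix.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return (observed - expected) / (1 - expected)


# Hypothetical matrix with a + c = b + d = 1250, as in a fully decided GARP model.
rate = correct_classification_rate(1200, 600, 50, 650)   # 0.74
kappa = cohen_kappa(1200, 600, 50, 650)                   # 0.48
```

Because b enters both formulas as an error, any background pixel that is suitable but unsampled lowers both values, which is the bias the paragraph above describes.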

The χ²-statistic based on independent test data can be used as an extrinsic measure of overall performance, because it incorporates both omission (of test points) and commission (via expected frequencies; Table 2). However, this statistic is highly sensitive to the proportional extent of predicted presence, making highly significant results possible with unacceptably high omission rates (e.g. models that only include the core ecological distribution of the species). In addition, χ²-significance values are related to sample size (Peterson, 2001). Hence, it is likely that neither correct classification rates, κ-statistics (both potentially intrinsic), nor χ²-significance values (typically extrinsic) represent reliable measures of overall model performance.

To assess model performance more adequately, other indices that provide intrinsic estimates of each error component can be derived from the confusion matrix (Table 2; reviewed in Fielding and Bell, 1997). The quantity c/(a+c) represents the intrinsic omission error rate, and b/(b+d) represents what we here term the intrinsic commission index (false negative and false positive rates, respectively, of Fielding and Bell (1997)). The intrinsic omission error reflects the proportion of known localities (training points) that fall outside the predicted region (by re-sampling with replacement to produce the confusion matrix). The intrinsic commission index mirrors the proportion of pixels predicted present by the model (proportion of re-sampled background points falling into regions of predicted presence). Owing to the general scarcity of confirmed presence data, however, this latter index includes contributions of (1) true commission error (overprediction) as well as of (2) apparent commission error (correctly predicted areas not verifiable as such, primarily because of the lack of adequate sampling). The aim of predictive modeling is precisely to determine this latter quantity, as well as the geographic distribution of those pixels. To emphasize the dual nature of b/(b+d), we term it the intrinsic commission index rather than intrinsic commission error. One of our aims is to discriminate between its two components.

Extrinsic measures of omission and commission exist parallel to the respective intrinsic ones (Table 2). Where out_test = the number of test points falling outside predicted areas and n_test = the number of test points, out_test/n_test represents extrinsic omission error. Likewise, the proportion of pixels predicted present can serve as an extrinsic commission index. In fact, because the number of training points is usually extremely small in comparison with the number of background pixels in the overall study region, the intrinsic commission index will converge on this extrinsic measure with adequate re-sampling.
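The intrinsic and extrinsic indices defined in the last two paragraphs reduce to four simple ratios. The following sketch uses hypothetical function names and made-up counts; the prediction grid is assumed to be a nested list of 0/1 values with no undecided pixels.

```python
def intrinsic_omission_error(a, c):
    """Proportion of re-sampled training points falling outside the prediction."""
    return c / (a + c)


def intrinsic_commission_index(b, d):
    """Proportion of re-sampled background points falling inside the prediction."""
    return b / (b + d)


def extrinsic_omission_error(out_test, n_test):
    """Proportion of independent test points falling outside the prediction."""
    return out_test / n_test


def extrinsic_commission_index(prediction_grid):
    """Proportion of all map pixels predicted present (1 = present, 0 = absent)."""
    cells = [value for row in prediction_grid for value in row]
    return sum(cells) / len(cells)


# Hypothetical values: a = 1200, b = 600, c = 50, d = 650; 2 of 25 test points omitted.
omission_in = intrinsic_omission_error(1200, 50)        # 0.04
commission_in = intrinsic_commission_index(600, 650)    # 0.48
omission_ex = extrinsic_omission_error(2, 25)           # 0.08
commission_ex = extrinsic_commission_index([[1, 1, 0, 0], [1, 0, 0, 0]])  # 0.375
```

Keeping omission and commission as separate numbers, rather than folding them into one composite, is what allows models to be compared in omission/commission space.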

In the present study, we evaluate model performance based on both intrinsic and extrinsic criteria, with the goal of identifying optimal models based on intrinsic measures only. If that were possible, optimal models could then be identified even when generated using all known locality data. We approach this problem by examining measures of omission and commission, as well as composite indices designed to reflect both quantities. Because measures of commission are dependent on the proportional extent of areas potentially inhabitable by the species within the study region, we examine in detail three cases whose modeled ecological niches show geographic manifestations occupying varying proportions of the respective study areas. Current implementations of GARP represent the modification of a general algorithm for the specific case of presence-only (generally museum) data. The present research is also germane to evaluation of other distributional-modeling techniques that use presence-only data. In addition, it may be broadly relevant to machine-learning applications with asymmetrical input data (asymmetrical errors).

Study species

The spiny pocket mouse Heteromys anomalus (Heteromyidae) is a common, medium-sized rodent (50–100 g) that is widespread along the Caribbean coast of South America in northern Colombia and Venezuela, as well as on the nearby islands of Trinidad, Tobago, and Margarita. It has been documented in deciduous forest, evergreen rainforest, cloud forest, and some agricultural areas, typically from sea level to approximately 1600 m (Anderson, 1999, unpublished data; Anderson and Soriano, 1999). We examine

Composite measures of performance

Extrinsic performance measures (χ²) were almost always significant. Seventeen of the 20 models for H. anomalus showed significant deviations from random predictions, in the desired direction (χ² for significant models=4.07–16.95; P<0.05; one-tailed critical value χ²_{1, 0.05}=2.706; the other three models showed non-significant departures in the desired direction). All models were highly significant for both M. minutus (χ²=177.02–684.74; P≪0.05) and C. mexicanus (χ²=42.29–164.50; P≪0.05). The

Measures of overall performance

Considerable variation was present among GARP models, as predicted by the theoretical background of genetic algorithms (Holland, 1975) and indicated by previous work (e.g. Anderson et al., 2002a). Thus, the algorithm generally performed as expected under this domain. Below, however, we consider issues regarding error quantification in this special case of presence-only data. Furthermore, we explore relationships between various indices and expert-judged model quality.

Neither extrinsic nor

Conclusions and recommendations

In the terminology of genetic algorithms, modification of GARP for use with presence-only occurrence data can result in a highly atypical fitness surface. When visualized in omission/commission space, the repercussions of pseudo-absences sometimes create a fitness ridge, rather than the typical global fitness peak. For GARP distributional models, this ridge is likely present for most species having medium-to-large potential distributions in the study region. Solutions along the ridge show

Acknowledgements

This work has been supported by a Grant in Aid of Research (American Society of Mammalogists) and a Roosevelt Postdoctoral Research Fellowship (American Museum of Natural History) to RPA; Subvención CONICIT (S2-2000002353) to DL; and National Science Foundation grants to ATP. Funding sources supporting Anderson’s systematic research on Heteromys appear in the relevant taxonomic works. Víctor Sánchez-Cordero, Mark E. Stahl, Robert S. Voss, Marcelo Weksler, and two anonymous reviewers read

References (73)

  • R.P. Anderson et al., 1999. The occurrence and biogeographic significance of the southern spiny pocket mouse Heteromys australis in Venezuela. Z. Sauget.
  • R.P. Anderson et al., 2002. Geographical distributions of spiny pocket mice in South America: insights from predictive models. Glob. Ecol. Biogeogr.
  • R.P. Anderson et al., 2002. Using niche-based GIS modeling to test geographic predictions of competitive exclusion and competitive release in South American pocket mice. Oikos.
  • AOU, 1998. Check-List of North American Birds, 7th ed. American Ornithologists’ Union, Washington, DC, 829...
  • August, P.V., 1984. Population ecology of small mammals in the llanos of Venezuela. In: Martin, R.E., Chapman, B.R....
  • R.J. Baker et al., 1998. Bioinformatics, museums, and society: integrating biological data for knowledge-based decisions. Occas. Pap. Mus. Tex. Tech Univ.
  • O. Bangs, 1900. List of the mammals collected in the Santa Marta region of Colombia by W.W. Brown, Jr. J. Proc. N. Engl. Zool. Club.
  • R.B. Boone et al., 1999. Modeling the occurrence of bird species: are the errors predictable? Ecol. Appl.
  • Boone, R.B., Krohn, W.B., 2002. Modeling tools and accuracy assessment. In: Scott, J.M., Heglund, P.J., Morrison, M.L.,...
  • E.O. Box et al., 1993. A climatic model for location of plant species in Florida, USA. J. Biogeogr.
  • Brown, J.H., Lomolino, M.V., 1998. Biogeography, 2nd ed. Sinauer Associates, Sunderland, MA, 691...
  • J.R. Busby, 1986. A biogeoclimatic analysis of Nothofagus cunninghamii (Hook.) Oerst. in southeastern Australia. Aust. J. Ecol.
  • M.D. Carleton et al., 1989. Systematic studies of oryzomyine rodents (Muridae, Sigmodontinae): a synopsis of Microryzomys. Bull. Am. Mus. Natl. Hist.
  • G. Carpenter et al., 1993. DOMAIN: a flexible modelling procedure for mapping potential distributions of plants and animals. Biodivers. Conserv.
  • G.-J. Chen et al., 2000. A new technique for predicting distribution of terrestrial vertebrates using inferential modeling. Zool. Res.
  • F. Corsi et al., 1999. A large-scale model of wolf distribution in Italy for conservation planning. Conserv. Biol.
  • A. Díaz de Pascual, 1988. Aspectos ecológicos de una microcomunidad de roedores de selva nublada, en Venezuela. Bol. Soc. Venez. Cienc. Nat.
  • A. Díaz de Pascual, 1994. The rodent community of the Venezuelan cloud forest, Mérida. Polish Ecol. Stud.
  • Elith, J., Burgman, M., 2002. Predictions and their validation: rare plants in the central highlands, Victoria,...
  • ESRI, 1998. ArcView GIS, version 3.1. Environmental Systems Research Institute Inc., Redlands,...
  • Feria-A., T.P., Peterson, A.T., 2002. Prediction of bird community composition based on point-occurrence data and...
  • Fertig, W., Reiners, W.A., 2002. Predicting presence/absence of plant species for range mapping: a case study from...
  • Fielding, A.H., 2002. What are the appropriate characteristics of an accuracy measure? In: Scott, J.M., Heglund, P.J.,...
  • A.H. Fielding et al., 1997. A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ. Conserv.
  • E. Fleishman et al., 2001. Modeling and predicting species occurrences using broad-scale environmental variables: an example with butterflies of the Great Basin. Conserv. Biol.
  • V.A. Funk et al., 1999. Testing the use of specimen collection data and GIS in biodiversity exploration and conservation decision making in Guyana. Biodivers. Conserv.