Abstract
The Random Forest (RF) algorithm is an ensemble of classification or regression trees, and is a widely used and high-performing machine learning technique. It is increasingly used for species distribution modelling (SDM). Many researchers use implementations of RF in the R programming language with default parameters to analyse species presence-only data together with background samples. However, there is good evidence that RF with default parameters does not perform well with such species “presence-background” data. This is often attributed to the typical disparity between the numbers of presence and background samples, known as class imbalance, and several solutions have been proposed.
Here, we first set the context: the background sample should be large enough to represent all environments in the region. We then aim to understand the drivers of poor performance of RF with presence-background data, and explain, test and evaluate suggested solutions. Using simulated and real species data, we compare performance of default RF with other weighting and sampling approaches.
We show that class overlap is an important driver of poor performance, alongside class imbalance. Our results show clear improvement in the performance of RF when class imbalance is explicitly managed by sampling methods, or when the overfitting commonly associated with overlapping classes is avoided by forcing shallow trees.
Presence-background data are a particular case of class imbalance in which the imbalance is extreme and class overlap is highly likely. Without compromising the environmental representativeness of the sampled background, we show several approaches to fitting RF that ameliorate the effects of imbalance and overlap, and allow excellent predictive performance. Understanding the problems of RF with presence-background data gives new insights into how best to fit models, and should guide future efforts to deal with such data.
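As an illustration only, the sampling idea mentioned above can be sketched as a balanced bootstrap, in which each tree is grown on all-sized draws of presences matched by an equal number of background points. This is a minimal sketch and not the authors' implementation: the function name `balanced_bootstrap`, the toy class sizes, the number of trees, and the choice to sample with replacement are all assumptions made for the example.

```python
import random


def balanced_bootstrap(labels, rng):
    """Return row indices for one tree (hypothetical helper).

    Draws a bootstrap sample of the presences (label 1) and an
    equal-sized random draw of background points (label 0), both
    with replacement, so each tree sees balanced classes.
    """
    presence = [i for i, y in enumerate(labels) if y == 1]
    background = [i for i, y in enumerate(labels) if y == 0]
    n = len(presence)
    presence_draw = [rng.choice(presence) for _ in range(n)]
    background_draw = [rng.choice(background) for _ in range(n)]
    return presence_draw + background_draw


# Toy data (assumed sizes): 20 presences and 1000 background points.
labels = [1] * 20 + [0] * 1000
rng = random.Random(0)

# One balanced sample per tree; 50 trees is an arbitrary choice.
samples = [balanced_bootstrap(labels, rng) for _ in range(50)]

# Distinct background points used by at least one tree.
covered_background = len(
    {i for sample in samples for i in sample if labels[i] == 0}
)
print(len(samples[0]), covered_background)
```

Each tree sees only a small, balanced subset, but a different background subset is drawn per tree, so the forest as a whole can still draw on much of the large background sample. Limiting tree depth, the other remedy named above, is a separate setting of the tree-growing step and is not shown here.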
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Email: rvalavi{at}student.unimelb.edu.au