Random forests in non-invasive sensorimotor rhythm brain-computer interfaces: a practical and convenient non-linear classifier

David Steyrl; Reinhold Scherer; Josef Faller; Gernot R. Müller-Putz

doi:10.1515/bmt-2014-0117

Publicly available. Published by De Gruyter, April 1, 2015.

From the journal Biomedical Engineering / Biomedizinische Technik

Abstract

There is general agreement in the brain-computer interface (BCI) community that linear classifiers are preferable, although non-linear classifiers can provide better results in some cases. One main reason is that non-linear classifiers often involve a number of parameters that must be chosen carefully. However, new non-linear classifiers have been developed over the last decade. One of them is the random forest (RF) classifier. Although RFs are popular in other fields of science, they are not common in BCI research. In this work, we address three open questions regarding RFs in sensorimotor rhythm (SMR) BCIs: parametrization, online applicability, and performance compared to regularized linear discriminant analysis (LDA). We found that the performance of RFs is constant over a large range of parameter values. We demonstrate, for the first time, that RFs are applicable online in SMR-BCIs. Further, we show in an offline BCI simulation that RFs statistically significantly outperform regularized LDA by about 3 percentage points. These results confirm that RFs are practical and convenient non-linear classifiers for SMR-BCIs. RFs have further useful properties, such as independence from feature distributions, maximum margin behavior, multiclass capability, and advanced data mining capabilities. Taking these into account, we argue that RFs should be taken into consideration for future BCIs.

Keywords: brain-computer interfaces; machine learning; random forests; regularized linear discriminant analysis; sensorimotor rhythms

Introduction

People with severe paralysis caused, for example, by amyotrophic lateral sclerosis or brain-stem stroke have few remaining options to communicate with their environment. One option is brain-computer interfaces (BCIs) [6, 23, 28, 34, 39, 47]. BCIs are devices that open a direct information path from the human brain to a computer. Many BCI applications are conceivable [48]. For example, BCIs are used to control devices such as motor neuroprostheses that restore grasping and reaching movements in people with high spinal cord injury [22, 31, 32, 35, 38].

One type of non-invasive BCI is based on classifying sensorimotor rhythm (SMR) patterns in the electroencephalogram (EEG). Individuals can modulate these SMR patterns through motor imagery (MI) [35, 36]. MI does not depend on external stimuli, and both able-bodied and motor-impaired individuals can typically use it [25]. Unambiguously assigning SMR patterns to the corresponding MI tasks is challenging. The induced patterns vary within and between individuals with respect to amplitude, frequency band, spatial distribution, and timing. Machine learning is commonly used to address this challenge [9]. In the typical machine-learning-based approach, an individual's EEG is recorded before BCI use in so-called training or calibration runs, and MI-task-specific SMR patterns are identified. Subsequently, a classifier is trained to detect these patterns and to assign labels to them. The labels are translated into commands, so that online classification drives an application.

In the last decade, a debate took place about the optimal classification method for this challenge [26, 29]. In summary, there was agreement that SMR patterns offer a multitude of possible features for classification. However, time restrictions commonly limit the number of trials per MI task to a few, typically fewer than 100. Consequently, the statistical properties of SMR patterns are poorly estimated, and the estimates are susceptible to noise and outliers. The complexity of the classification method should therefore be restricted to reduce the risk of overtraining. As a consequence, linear discriminant analysis (LDA) is today by far the most common classifier in BCI research [26]. Further, noise and outliers should be tackled by regularization. Surprisingly, regularized classifiers are used much less often than their non-regularized counterparts [26].

However, there was also agreement that non-linear classifiers can provide better results. Unfortunately, non-linear methods often have a number of parameters that must be chosen very carefully for good results. This is particularly hard at the low trials-to-features ratios that are characteristic of SMR-BCIs.

BCI researchers want a practical classifier with three properties. First, it should offer a complex model that can capture non-linear relationships. Second, it should resist overtraining even at low trials-to-features ratios. Third, it should be robust against outliers and noise. In addition, advanced data mining capabilities that give insight into the structure of the data would be highly appreciated.

A candidate for such a convenient method is the random forest (RF) classifier [11]. This is an ensemble classifier that consists of many, typically hundreds of, decision tree classifiers [11]. RF classifiers achieve top classification accuracies on various data and are very popular in many fields of science, for example, in microarray gene expression data analysis [27, 45]. Some works on RF classifiers for non-invasive BCIs have been published over the last decade [1, 2, 5, 17, 42–44, 46]. Nevertheless, RF classifiers are still not common in the BCI field.

One reason is that RF classifiers are not yet implemented in the typical BCI research toolboxes. Besides the missing implementations, open research questions also hinder a broader use of RF classifiers in SMR-based BCIs. We identified three major open questions:

(i) Do the RF classifiers' parameters influence performance in SMR-BCIs, and if so, how?
(ii) Can the promising offline results be transferred to online operation [42, 43]?
(iii) How do RF classifiers perform in combination with, and compared to, standard SMR-BCI methods such as common spatial patterns (CSP) filtering and LDA classifiers [40]?

In this work, we answer these open questions to provide a basis for a broader use of RF classifiers in the BCI research field. In particular, we answer them for two-class SMR-BCIs in low trials-to-features ratio scenarios (ratio of approximately 1). Further, this work aims to provide evidence that RF classifiers are a practical and convenient non-linear method. We also want to stimulate a debate and an exchange of opinions about the optimal classification method in non-invasive BCI research.

Materials and methods

The three questions raised in the introduction depend on each other. For example, the results on the parameters' influence are used to compare the performance of RF and LDA. They are also needed to answer the question about online performance. Therefore, we have to anticipate particular results when describing the methods.

To guide the reader through the work, we give an overview of the upcoming sections here. “Decision trees and random forests” gives an overview of RF classifiers. The following sections address three topics: the parameters' influence on RF performance (question i), online applicability and performance (question ii), and the combination of RF classifiers with CSP and their comparison with LDA (question iii). Finally, “Conclusion and outlook” concludes the work and gives an outlook on further capabilities of the RF classifier.

Decision trees and random forests

Decision tree classifiers, or trees for short, have been known for a long time. Trees have a sound theoretical foundation and provide an easily interpretable model [12]. However, building optimal tree classifiers is still not satisfactorily solved, and other methods achieve better classification accuracies on various data. This behavior has been analyzed by decomposing the error into bias and variance. The analysis showed that trees typically have low bias but high variance, which means that they tend to overtrain the data [11].

Bagging, an acronym for “bootstrap aggregating”, is a method that uses many classifiers of the same type to reduce the overall variance [10]. Bagging consists of two steps:

(i) taking repeated bootstrap samples from all available training data and building a separate classifier on each sample;
(ii) letting each of the classifiers vote for a class and taking the class with the most votes as the prediction of bagging (majority voting).

Bagging is particularly successful in reducing variance if the base classifiers (trees) are uncorrelated. Hence, all base classifiers should be different. This relation can be described by an upper bound on the prediction error:

$$PE \le \frac{\bar{\rho}\,(1 - s^{2})}{s^{2}} \qquad (1)$$

where PE denotes the prediction error and ρ̄ denotes the average correlation between the base classifiers. s denotes the strength of the base classifiers, i.e., their expected margin, which is closely related to their accuracy [11].

Decision trees are particularly suitable for bagging because they are unstable classifiers [10]. Unstable means that small variations in the training set can cause large variations in the classifier's model. In other words, tree classifiers are typically different (uncorrelated), even when they are built with similar training sets. Introducing additional randomness into the decision tree learning process can reduce the correlation between trees further.

Binary decision tree learning is carried out as follows (cf. Figure 1, modified from [12]):

(i) The procedure starts with all available training data at the so-called root node.
(ii) The best split based on the training data values is calculated. It divides the training data of the node into two subgroups (child nodes) that are as pure as possible.
(iii) Step (ii) is repeated for each child node (i.e., the procedure is recursive) until each terminal node contains trials of one class only.

The additional randomness mentioned above can be introduced at each node. The data dimensions available for calculating the split are restricted to a separate random subset [19]. Consequently, the learning process forms very different trees, even if two trees receive similar bootstrap samples.
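To make the roles of the two quantities in Eq. (1) concrete, the following minimal Python sketch evaluates the bound for a fixed strength and a decreasing average correlation. The values are illustrative only; they are not estimates from the data of this work. The function name `breiman_bound` is our own.

```python
def breiman_bound(rho_bar: float, s: float) -> float:
    """Upper bound of Eq. (1) on the prediction error: rho_bar * (1 - s**2) / s**2."""
    if not 0.0 < s <= 1.0:
        raise ValueError("the strength s must lie in (0, 1]")
    return rho_bar * (1.0 - s**2) / s**2


# Illustrative values: same strength, decreasing average correlation between trees.
for rho_bar in (0.6, 0.3, 0.1):
    print(f"rho_bar = {rho_bar:.1f}, s = 0.8 -> bound = {breiman_bound(rho_bar, 0.8):.4f}")
```

With s fixed, the bound shrinks in proportion to ρ̄. This is why the RF method aims to decorrelate its trees rather than only to make each tree stronger.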

Figure 1:

Panel (A) shows a possible binary decision tree and panel (B) shows the associated partitioning of the feature space. Modified from [12].

Many trees form a forest, and randomness is used to reduce the correlation between them. This type of classifier is therefore called a random forest; it was invented by Leo Breiman [11]. In summary, the RF method relies on many low-bias, high-variance, uncorrelated decision trees. It reduces variance by majority voting and thereby forms a highly accurate ensemble classifier.
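The procedure above can be summarized in a short sketch. It is written in Python and uses scikit-learn's `DecisionTreeClassifier` as the base learner, whose `max_features` argument restricts each split to a random subset of dimensions. It is not the MATLAB library [20] used in this work. `SimpleRandomForest` is a hypothetical helper for illustration, and synthetic data stand in for EEG features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


class SimpleRandomForest:
    """Toy random forest: bagged, randomized decision trees with majority voting.

    Assumes binary labels coded as 0 and 1.
    """

    def __init__(self, n_trees=101, n_dims_per_node=None, seed=0):
        self.n_trees = n_trees
        self.n_dims_per_node = n_dims_per_node  # None -> round(sqrt(n_features))
        self.rng = np.random.default_rng(seed)
        self.trees = []

    def fit(self, X, y):
        n_trials, n_features = X.shape
        m = self.n_dims_per_node or round(np.sqrt(n_features))
        self.trees = []
        for _ in range(self.n_trees):
            # Bagging step (i): bootstrap sample of n_trials trials drawn with replacement.
            idx = self.rng.integers(0, n_trials, size=n_trials)
            # Fully grown tree; each node searches its split among m randomly chosen
            # dimensions only (the additional randomness that decorrelates the trees).
            tree = DecisionTreeClassifier(
                max_features=int(m), random_state=int(self.rng.integers(2**31 - 1))
            )
            self.trees.append(tree.fit(X[idx], y[idx]))
        return self

    def predict(self, X):
        # Bagging step (ii): every tree votes, the majority wins (odd n_trees avoids ties).
        votes = np.stack([tree.predict(X) for tree in self.trees])
        return (votes.mean(axis=0) > 0.5).astype(int)


X, y = make_classification(n_samples=200, n_features=120, n_informative=10, random_state=1)
forest = SimpleRandomForest().fit(X[:150], y[:150])
accuracy = float(np.mean(forest.predict(X[150:]) == y[150:]))
print(f"held-out accuracy: {accuracy:.2f}")
```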

In the present work, Jaiantilal’s RF libraries for MATLAB (Mathworks Inc., Natick, MA, USA) were used [20].

Parameters’ influence on random forest performance

RF classifiers have two parameters that strongly affect their accuracy and computational effort: (i) the number of tree classifiers to be built and (ii) the number of randomly chosen data dimensions per node [11]. For example, a sufficient number of tree classifiers is necessary to reduce the overall variance and to ensure accurate RF classifiers. However, the computational effort increases with the number of trees [11]. Hence, a trade-off is necessary. To determine the parameters’ influence, we conducted a cross-validated grid search over different parameter combinations on available BCI data.

Data description:

EEG recordings from ten healthy participants were analyzed (source: Müller-Putz et al. [31]). For details on the paradigm, participants, and recording settings, see [31]. In summary, the participants sat in a comfortable armchair in front of a computer monitor (∼1 m distance) on which instructions were displayed. EEG was measured during sustained kinesthetic imagination of three different movements: right hand, left hand, and feet [33]. In this analysis, we only considered right hand and feet MI trials, because this work focuses on binary classification. A total of 160 trials were collected in a single session. The session was divided into eight consecutive runs, separated by short breaks of 2–5 min. One run consisted of ten trials per condition, so 80 trials per condition were available overall. The sequence of conditions was pseudo-randomized. White arrows (cues) indicated the task to be performed: a right-pointing arrow indicated right hand MI, and a downward-pointing arrow indicated feet MI.

Each trial started with a black screen. At 0 s, a cross appeared in the middle of the screen. At 2 s, a beep sounded to catch the participant’s attention. Starting at 3 s, the cue was shown for 1.25 s. The participants were requested to perform MI for a period of 5 s. Finally, the cross was removed, and the screen remained black for a random period of 0.5–2.5 s (cf. Figure 2A). Participants did not receive feedback during the recording. Thirty-two EEG channels were recorded with Ag/AgCl electrodes at a sampling rate of 1000 Hz, with band-pass filtering between 0.5 and 200 Hz and a notch filter at 50 Hz.

Figure 2:

(A) Course of a trial as used for determining the parameter influences (no feedback). (B) Course of a trial as used for demonstrating the online applicability of RF classifiers (no feedback during training, feedback during evaluation).

Automated outlier rejection:

Outliers in the data were rejected by an automated algorithm [16] that operated in two phases.

The first phase worked on the band-pass filtered EEG of the 15 monopolar channels that make up three Laplacian derivations at C3, Cz, and C4 (cf. Figure 3) [18]. Following Delorme et al. [15], the algorithm removed trials by thresholding their amplitude and two statistical measures: kurtosis and probability.

In the second phase, the algorithm removed trials iteratively. It based the decision on the distribution of each trial's average logarithmic power in six predefined frequency bands: 4–9 Hz, 8–13 Hz, 12–17 Hz, 16–24 Hz, 23–31 Hz, and 30–38 Hz, each computed for the Laplacian derivations at C3, Cz, and C4. Power was evaluated in two time segments within the trials: the reference period from second 1 to 2 and the activity period from second 4 to 8. The algorithm proceeded as follows:

(i) compute the mean and standard deviation of every feature, separately for each time segment;
(ii) remove the single trial whose value in any feature lies farthest beyond three standard deviations from the mean.

Steps (i) and (ii) were repeated until no trial had a value outside three standard deviations. On average, the algorithm rejected 11.9±4.5 (SD)% of the 160 trials per participant.
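The iterative second phase can be sketched in Python with NumPy as follows. The sketch handles a single time segment and assumes that the band-power features have already been computed. The function name is hypothetical, the first phase (amplitude, kurtosis, and probability thresholds [15]) is not shown, and the data are synthetic.

```python
import numpy as np


def iterative_sd_rejection(features, n_sd=3.0):
    """Second-phase outlier rejection for one time segment (illustrative sketch).

    features: (n_trials, n_features) array, e.g. average log band power per band
    and Laplacian channel. Returns the indices of the trials that are kept.
    """
    keep = np.arange(features.shape[0])
    while keep.size > 2:
        x = features[keep]
        # (i) Mean and standard deviation of every feature over the remaining trials.
        mu = x.mean(axis=0)
        sd = x.std(axis=0, ddof=1)
        sd = np.where(sd > 0, sd, np.inf)  # constant features can never be outliers
        z = np.abs(x - mu) / sd  # distance from the mean in units of SD
        if z.max() <= n_sd:
            break  # no trial has a value outside n_sd standard deviations
        # (ii) Remove the single trial holding the most extreme value, then repeat.
        worst_row = np.unravel_index(np.argmax(z), z.shape)[0]
        keep = np.delete(keep, worst_row)
    return keep


rng = np.random.default_rng(0)
log_power = rng.normal(size=(140, 18))  # 140 trials x (6 bands x 3 Laplacian channels)
log_power[7, 4] = 12.0                   # an artifact-contaminated trial
kept = iterative_sd_rejection(log_power)
print(f"rejected {140 - kept.size} trials, trial 7 kept: {7 in kept}")
```

The statistics are recomputed after every single removal, because one extreme trial inflates the standard deviation and can mask milder outliers.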

Figure 3:

Small Laplacian electrode placement scheme centered at C3, Cz, and C4. The distance between neighboring electrodes was 2.5 cm.

Feature extraction:

SMR patterns of different MI tasks can be distinguished by their power spectral densities (PSDs) [35, 36]. Features were extracted as follows:

(i) Starting with the raw EEG of outlier-free trials, Laplacian derivations at positions C3, Cz, and C4 were calculated (cf. Figure 3) [18].
(ii) A 1 s window was cut from each trial. The window started 1.5 s after cue onset, because differences in SMR patterns are typically largest in this time segment [41].
(iii) To approximate PSDs, a discrete Fourier transform (DFT) was applied to each window.
(iv) The DFT magnitudes from 1 to 40 Hz of each Laplacian channel were concatenated into one feature vector.

Hence, for the classifiers, one trial was a vector of 120 values (three Laplacian channels × 40 DFT magnitudes).
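A minimal NumPy sketch of steps (i) to (iv) follows. It rests on several assumptions:

- a sampling rate of 512 Hz, that of the online study (the data of [31] were recorded at 1000 Hz; with a 1000-sample window the same code also yields 1 Hz bins);
- a small Laplacian computed as the center electrode minus the mean of its four neighbours, a common definition;
- a DFT without a taper;
- a synthetic trial with hypothetical function names and channel ordering.

```python
import numpy as np

FS = 512  # sampling rate in Hz (assumption; the online study used 512 Hz)


def small_laplacian(center, neighbours):
    """Small Laplacian derivation: center channel minus the mean of its four neighbours."""
    return center - neighbours.mean(axis=0)


def dft_features(laplacians, cue_onset, fs=FS):
    """laplacians: (3, n_samples) for C3, Cz, C4; cue_onset in samples.

    Returns the 120-value feature vector (3 channels x 40 DFT magnitudes).
    """
    start = cue_onset + int(1.5 * fs)          # window starts 1.5 s after cue onset
    window = laplacians[:, start:start + fs]   # 1 s window -> 1 Hz frequency resolution
    magnitudes = np.abs(np.fft.rfft(window, axis=1))  # rfft bin k corresponds to k Hz
    return magnitudes[:, 1:41].reshape(-1)     # 1..40 Hz, channels concatenated


# Synthetic trial: 15 channels ordered as (center, 4 neighbours) for C3, Cz, C4.
rng = np.random.default_rng(0)
t = np.arange(8 * FS) / FS
eeg = rng.normal(scale=0.1, size=(15, t.size))
eeg[0] += np.sin(2 * np.pi * 10 * t)  # 10 Hz rhythm on the C3 center electrode
laps = np.stack([small_laplacian(eeg[i], eeg[i + 1:i + 5]) for i in (0, 5, 10)])
features = dft_features(laps, cue_onset=3 * FS)
print(features.shape, "strongest C3 bin at", features[:40].argmax() + 1, "Hz")
```

Because the window is exactly 1 s long, DFT bin k falls on k Hz, so slicing bins 1 to 40 directly yields the 1–40 Hz magnitudes.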

Parameters assessment:

For each participant, two parameters were varied: the number of trees (N_t = 10, 20, 50, 100, 200, 500, 1000, 2000, 5000) and the number of randomly chosen data dimensions per node (N_d = 1–25 in steps of two). A 10 times 10-fold cross-validation was carried out for each combination of N_t and N_d. In each fold, the trials-to-features ratio was on average 128.7/120 = 1.07.
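The grid search can be reproduced in outline with scikit-learn. Its `RandomForestClassifier` exposes the two parameters as `n_estimators` (N_t) and `max_features` (N_d). This is a sketch, not the MATLAB code used in this work. Synthetic data replace the EEG features, and the per-participant loop and the statistical comparison are omitted.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

N_TREES = [10, 20, 50, 100, 200, 500, 1000, 2000, 5000]
N_DIMS = list(range(1, 26, 2))  # 1, 3, ..., 25


def rf_grid_search(X, y, n_trees=N_TREES, n_dims=N_DIMS, n_repeats=10, seed=0):
    """Mean repeated 10-fold cross-validation accuracy for every (N_t, N_d) pair."""
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=n_repeats, random_state=seed)
    scores = np.zeros((len(n_trees), len(n_dims)))
    for i, nt in enumerate(n_trees):
        for j, nd in enumerate(n_dims):
            rf = RandomForestClassifier(n_estimators=nt, max_features=nd, random_state=seed)
            scores[i, j] = cross_val_score(rf, X, y, cv=cv, scoring="accuracy").mean()
    return scores


# Full grid: 9 x 13 = 117 combinations (the best one plus 116 compared against it).
# Running it with 5000 trees and 10 x 10 folds is slow, so a reduced grid is shown.
X, y = make_classification(n_samples=141, n_features=120, n_informative=8, random_state=0)
small = rf_grid_search(X, y, n_trees=[10, 50], n_dims=[1, 11], n_repeats=2)
print(np.round(small, 3))
```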

Online SMR-BCI using a random forest classifier

A first online application of RF classifiers in a P300-BCI was reported in [1]. However, it has not yet been shown that RF classifiers can reliably perform online classification in SMR-BCIs. It therefore remains an open question whether offline classification results on SMR-BCI data can also be achieved in online operation. To answer it, we carried out a two-class online SMR-BCI experiment in which RF classifiers controlled a simple feedback.

Participants and experimental design:

The experiment was carried out in compliance with the World Medical Association Declaration of Helsinki and was approved by the local ethics committee. Fourteen volunteers participated in this study. They were aged between 20 and 30 years, five were female, and nine were BCI novices. They had no known medical or neurological diseases, were paid for their participation, and gave written informed consent. The experimental design was similar to that in [31], as summarized in “Data description”.

One session per participant was recorded on a single day. The session consisted of eight consecutive runs with short breaks between them. The first five runs were used to collect training data and the last three for online evaluation. One run was composed of 20 trials. Taken together, 100 trials were recorded for training and 60 trials for evaluation.

Participants were instructed to perform sustained kinesthetic MI during the imagery period [33]. At 0 s, a white cross appeared on the screen; 2 s later, a beep sounded to catch the participant’s attention. The cue was displayed from 3 s to 4 s. Participants were instructed to start MI as soon as they recognized the cue and to continue the indicated MI until the cross disappeared at 8 s. A rest period of random length between 2 s and 3 s separated the trials.

Participants did not receive feedback during training (cf. Figure 2B). During the last three runs, feedback was displayed in the form of a white bar (cf. Figure 2B). Only positive feedback was displayed, to motivate the participants [4] and to prevent non-stationarities due to frustration [21]. Here, positive feedback means that the feedback bar always pointed in the direction of the cue of the respective trial. The length of the bar was proportional to the number of correctly classified samples over the last second: the more correct classifications, the longer the bar. If the other class was detected, i.e., if the majority of the last second’s labels belonged to the other class, the length of the feedback bar was set to zero.
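The feedback rule can be expressed as a small function. This is a sketch under three assumptions:

- the labels of the last second are passed as a sequence (16 labels at the online update rate);
- the bar length is normalized to `max_length`;
- a tie between the classes does not count as a majority for the other class, a case the description above does not specify.

The function name is hypothetical.

```python
import numpy as np


def feedback_bar_length(recent_labels, cue_label, max_length=1.0):
    """Length of the positive-only feedback bar (it always points toward the cue).

    recent_labels: classifier outputs of the last second (16 at 16 Hz).
    """
    recent = np.asarray(recent_labels)
    n_correct = int(np.sum(recent == cue_label))
    n_other = recent.size - n_correct
    if n_other > n_correct:  # the other class holds the majority -> no bar
        return 0.0
    # Otherwise the bar grows in proportion to the number of correct labels.
    return max_length * n_correct / recent.size


print(feedback_bar_length([1] * 12 + [0] * 4, cue_label=1))  # -> 0.75
print(feedback_bar_length([0] * 10 + [1] * 6, cue_label=1))  # -> 0.0
```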

EEG measurement:

EEG was recorded with the following setup:

- a bio-signal amplifier with 15 active Ag/AgCl electrodes (g.USBamp, g.LADYbird, Guger Technologies OG, Schiedlberg, Austria);
- the tools4BCI Signal Server signal acquisition software;
- MATLAB/SIMULINK [13].

The center electrodes were placed at positions C3, Cz, and C4. Four additional electrodes were placed around each center electrode (anterior, posterior, left lateral, and right lateral). The distance from the center electrode to each neighboring electrode was 2.5 cm (cf. Figure 3). The reference electrode was placed on the left mastoid and the ground electrode on the right mastoid. A sampling rate of 512 Hz was used. A band-pass filter (8th order Butterworth filter with cutoff frequencies at 0.1 and 200 Hz) and a 50 Hz notch filter were applied.

Automated outlier rejection:

The training trials were collected in the first five runs. Afterwards, the same trial-based outlier rejection as described in “Automated outlier rejection” was carried out. On average, the algorithm rejected 6.6±2.2 (SD)% of the training trials per participant.

BCI set up:

After outlier rejection, the SMR-BCI was set up. To preserve consistency, the same feature extraction as in the parametrization section (“Feature extraction”) was applied. In short:

(i) Starting with the 15 raw EEG channels, Laplacian derivations at positions C3, Cz, and C4 were calculated.
(ii) For each trial, a 1 s window starting 1.5 s after cue onset was cut out.
(iii) A DFT was applied to each window for PSD estimation.
(iv) The DFT magnitudes from 1–40 Hz of each channel were concatenated into feature vectors.

On average, the trials-to-features ratio was 93.4/120 = 0.78. Subsequently, an RF classifier was trained. Its parameters were chosen based on the results and conclusions of the parametrization evaluation carried out in this work (cf. “Parameters’ influence on random forest performance” in the Results and Discussion). Accordingly, they were set to 1000 trees and 11 randomly chosen data dimensions per node (round(√120) = 11).
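In scikit-learn, the fixed parameter choice could be written as below. This is a sketch, not the MATLAB setup used in this work, and the training data are random placeholders. One detail matters when porting the rule: to our knowledge, scikit-learn's `max_features="sqrt"` truncates rather than rounds, which would give 10 instead of 11 dimensions for 120 features. The rounded integer is therefore passed explicitly. The helper names are hypothetical.

```python
import math

import numpy as np
from sklearn.ensemble import RandomForestClassifier


def dims_per_node(n_features: int) -> int:
    """round(sqrt(number of features)) randomly chosen dimensions per node."""
    return round(math.sqrt(n_features))


def train_rf(X_train, y_train, n_trees=1000, seed=0):
    """RF with the fixed parameters used online: 1000 trees, round(sqrt(d)) dims."""
    rf = RandomForestClassifier(
        n_estimators=n_trees,
        max_features=dims_per_node(X_train.shape[1]),  # explicit int, not "sqrt"
        random_state=seed,
    )
    return rf.fit(X_train, y_train)


print(dims_per_node(120), dims_per_node(90))  # -> 11 9 (DFT features; fbCSP features)

rng = np.random.default_rng(0)
X_train = rng.normal(size=(93, 120))  # about 93 training trials x 120 features
y_train = np.repeat([0, 1], [47, 46])
rf = train_rf(X_train, y_train)
```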

Online data processing:

Online processing of the last three runs followed the scheme depicted in Figure 4. In summary, moving windows of 1 s length were cut out from the three Laplacian derivations of the EEG. A DFT was applied to each window. The feature vector was then assembled by concatenating the DFT magnitudes from 1 to 40 Hz of each Laplacian channel. Subsequently, the RF classifier assigned a label to each window. The number of correct labels over the last second controlled the feedback. The DFT calculation and the classification were repeated 16 times per second, which allowed smooth feedback.
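The windowing scheme can be simulated offline on a recorded signal with the NumPy sketch below. It assumes the 512 Hz sampling rate of the online study, so the update rate of 16 per second corresponds to a hop of 32 samples. Real-time acquisition and the classifier calls are not modeled, and the function name is hypothetical.

```python
import numpy as np

FS = 512                      # sampling rate of the online study (Hz)
UPDATES_PER_S = 16            # classifications per second
WIN = FS                      # 1 s analysis window
HOP = FS // UPDATES_PER_S     # 32 samples between consecutive classifications


def sliding_dft_features(laplacians):
    """Yield (end_sample, feature_vector) for each 1 s window, 16 times per second.

    laplacians: (3, n_samples) array of the C3, Cz and C4 Laplacian derivations.
    """
    for end in range(WIN, laplacians.shape[1] + 1, HOP):
        window = laplacians[:, end - WIN:end]
        magnitudes = np.abs(np.fft.rfft(window, axis=1))[:, 1:41]  # 1..40 Hz
        yield end, magnitudes.reshape(-1)


# Two seconds of synthetic Laplacian signals; in the online system each feature
# vector would be passed to the trained RF, e.g. rf.predict(vector[None, :]).
signals = np.random.default_rng(0).normal(size=(3, 2 * FS))
windows = list(sliding_dft_features(signals))
print(len(windows), windows[0][1].shape)
```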

Figure 4:

Signal processing scheme for the online detection of motor imagery with RF classifiers.

Performance metrics:

Our focus is on binary classification, and both classes had the same number of trials, so we chose classification accuracy as the metric. For each participant, a time course of classification accuracy was calculated by averaging the classification results at each time point over all validation trials. The highest classification accuracy per participant during the feedback period is called the peak accuracy. To indicate the stability and sustainability of classification, the median classification accuracy over the 4 s feedback period is also reported; it is called the median accuracy.
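Both metrics can be computed from a matrix of per-window classification outcomes, as in this minimal NumPy sketch. The toy matrix and time axis are made up. The feedback period boundaries are passed as parameters rather than hard-coded, and the function name is hypothetical.

```python
import numpy as np


def peak_and_median_accuracy(correct, times, fb_start, fb_end):
    """correct: (n_trials, n_timepoints) booleans, True where a window was classified
    correctly; times: (n_timepoints,) in seconds. Returns (peak, median) accuracy
    of the accuracy time course within the feedback period [fb_start, fb_end].
    """
    course = np.asarray(correct, dtype=float).mean(axis=0)  # average over trials
    in_feedback = (times >= fb_start) & (times <= fb_end)
    return float(course[in_feedback].max()), float(np.median(course[in_feedback]))


correct = np.array([[1, 1, 0, 1],
                    [1, 0, 0, 1]], dtype=bool)
times = np.array([0.0, 1.0, 2.0, 3.0])
print(peak_and_median_accuracy(correct, times, fb_start=1.0, fb_end=3.0))  # -> (1.0, 0.5)
```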

BCI simulation: random forest vs. regularized LDA

Some current SMR-BCI systems are based on CSP filtering and LDA classifiers [3, 8, 9, 40]. CSP filtering is applied to counteract the effects of volume conduction in the head. It is a spatial filter that weights channels with the goal of maximizing the difference in variance of the SMR patterns between MI tasks. This weighting improves separability and therefore, in general, also the accuracy of SMR-BCIs.
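The CSP principle can be illustrated with a short NumPy/SciPy sketch. It solves the generalized eigenvalue problem of the two class covariance matrices with `scipy.linalg.eigh`. This is one common formulation of CSP, assumed here for illustration; it is not necessarily the exact implementation used in this work. The trace normalization of the covariances and the synthetic two-class data are likewise assumptions.

```python
import numpy as np
from scipy.linalg import eigh


def csp_filters(trials_a, trials_b):
    """CSP via the generalized eigenproblem C_a w = lambda (C_a + C_b) w.

    trials_*: (n_trials, n_channels, n_samples) band-pass filtered EEG of one class.
    Returns eigenvalues in descending order and the matching filters as columns.
    The first filters maximize the variance of class a relative to class b,
    the last ones do the opposite.
    """
    def mean_cov(trials):
        # Trace-normalized spatial covariance, averaged over trials.
        return np.mean([x @ x.T / np.trace(x @ x.T) for x in trials], axis=0)

    c_a, c_b = mean_cov(trials_a), mean_cov(trials_b)
    eigvals, eigvecs = eigh(c_a, c_a + c_b)  # eigenvalues returned in ascending order
    return eigvals[::-1], eigvecs[:, ::-1]


rng = np.random.default_rng(0)
class_a = rng.normal(size=(20, 4, 256))
class_b = rng.normal(size=(20, 4, 256))
class_a[:, 0] *= 3.0  # class a: strong activity on channel 0
class_b[:, 1] *= 3.0  # class b: strong activity on channel 1
eigvals, W = csp_filters(class_a, class_b)
var_a = np.mean([np.var(W[:, 0] @ x) for x in class_a])
var_b = np.mean([np.var(W[:, 0] @ x) for x in class_b])
print(f"first CSP filter: eigenvalue {eigvals[0]:.2f}, variance ratio a/b {var_a / var_b:.1f}")
```

The eigenvalues lie between 0 and 1. Filters with eigenvalues near 1 or near 0 give the largest variance difference between the classes, which is why filters from both ends of the spectrum are selected.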

First works on the combination of CSP filtering and RF classifiers are available [2, 5]. However, a direct comparison with the common LDA classifier on CSP-filtered features is missing. To provide such a comparison, we performed a BCI simulation with the data of the online SMR-BCI. We created a low trials-to-features ratio scenario by using a modified filter bank CSP approach [3]. We then simulated the application of RF classifiers and regularized LDA classifiers. We chose shrinkage regularization because a convenient, analytic way to determine the regularization parameter is known [7]. This type of regularization is therefore increasingly used. In this work, analytic shrinkage regularized LDA is abbreviated as sLDA.
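In Python, analytic shrinkage LDA is available in scikit-learn, as sketched below. We assume, without having verified it, that scikit-learn's `shrinkage="auto"` (Ledoit-Wolf) estimator matches the analytic method of [7]. Synthetic data replace the CSP features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Shrinkage LDA: with solver="lsqr" (or "eigen"), shrinkage="auto" sets the
# shrinkage intensity analytically (Ledoit-Wolf) instead of by cross-validation.
slda = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")

# Low trials-to-features ratio as in the simulation: 100 trials, 90 features.
X, y = make_classification(n_samples=160, n_features=90, n_informative=10, random_state=0)
slda.fit(X[:100], y[:100])
print(f"held-out accuracy: {slda.score(X[100:], y[100:]):.2f}")
```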

BCI simulation procedure:

In the BCI simulation, the data of each participant were divided into two parts. The first part (runs 1–5) was used to train CSP filters and classifiers, whereas the second part (runs 6–8) was used for validation. The signal processing was as follows:

(i) A filter bank of 8th order Butterworth band-pass filters divided the raw EEG data into 15 overlapping sub-bands. The alpha band was covered by 6–8 Hz, 7–9 Hz, 8–10 Hz, 9–11 Hz, 10–12 Hz, 11–13 Hz, and 12–14 Hz. The beta band was covered by 14–19 Hz, 17–22 Hz, 20–25 Hz, 23–28 Hz, 26–31 Hz, 29–34 Hz, 32–37 Hz, and 35–40 Hz. Because the frequencies given above are 3 dB cutoff frequencies, the overlaps were chosen to prevent gaps in the frequency range of the filter bank.
(ii) A separate set of CSP filters was calculated for each sub-band using the training data. The spatial filters corresponding to the three highest and three lowest eigenvalues of each set were selected. Hence, one CSP calculation per sub-band and six filters per CSP resulted in 90 virtual channels.
(iii) The logarithmic power of each of the 90 virtual channels was used as a feature for classification, which implies a trials-to-features ratio of 100/90 = 1.11. The power was estimated by squaring the signal and averaging over windows of 1 s length; the logarithm was then applied to the power values.
(iv) sLDA and RF classifiers were trained. According to the results of the parameter evaluation in this work, the RF classifier’s parameters were set to 1000 trees and nine randomly chosen data dimensions (round(√90) = 9). Both classifiers were trained using features from a fixed 1 s window starting 1.5 s after cue onset.
(v) After training, the band-pass filtered EEG data of runs 6–8 were CSP filtered and divided into overlapping 1 s windows to simulate sample-by-sample online processing. The logarithmic power was then calculated for each selected CSP channel of each window (90 features per window). These features were classified separately by the two classification methods.

The same performance metrics as in the online section of this work are reported.
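The feature pipeline of steps (i) to (iii), and the computation in step (v) for a single window, can be outlined in Python with SciPy. The sketch rests on assumptions that are not taken from this work:

- a sampling rate of 512 Hz;
- reading “8th order” as a `butter(4, ...)` band-pass design, which doubles the order;
- causal `sosfilt` filtering of each trial separately;
- the trace-normalized CSP formulation;
- synthetic trials and hypothetical function names.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.signal import butter, sosfilt

FS = 512
BANDS = [(6, 8), (7, 9), (8, 10), (9, 11), (10, 12), (11, 13), (12, 14),
         (14, 19), (17, 22), (20, 25), (23, 28), (26, 31), (29, 34), (32, 37), (35, 40)]
# butter(4, ..., btype="bandpass") designs an 8th order band-pass filter (SOS form).
FILTER_BANK = [butter(4, band, btype="bandpass", fs=FS, output="sos") for band in BANDS]


def csp_pairs(x_a, x_b, n_per_side=3):
    """Filters of the 3 lowest and 3 highest generalized eigenvalues (columns)."""
    def mean_cov(trials):
        return np.mean([x @ x.T / np.trace(x @ x.T) for x in trials], axis=0)

    c_a, c_b = mean_cov(x_a), mean_cov(x_b)
    _, vecs = eigh(c_a, c_a + c_b)  # ascending eigenvalues
    return np.hstack([vecs[:, :n_per_side], vecs[:, -n_per_side:]])


def band_windows(trials, sos, start, length=FS):
    """Causally band-pass filter each trial and cut the analysis window."""
    return sosfilt(sos, trials, axis=-1)[..., start:start + length]


def fit_fbcsp(trials, labels, start):
    """One CSP filter set (6 filters) per sub-band, trained on the training trials."""
    sets = []
    for sos in FILTER_BANK:
        x = band_windows(trials, sos, start)
        sets.append(csp_pairs(x[labels == 0], x[labels == 1]))
    return sets


def log_power_features(trials, filter_sets, start):
    """Log power of the 15 x 6 = 90 virtual channels for one window per trial."""
    feats = []
    for sos, w in zip(FILTER_BANK, filter_sets):
        x = band_windows(trials, sos, start)
        virtual = np.einsum("ck,tcs->tks", w, x)  # (trials, 6, samples)
        feats.append(np.log(np.mean(virtual**2, axis=-1)))  # square, average, log
    return np.concatenate(feats, axis=1)


rng = np.random.default_rng(0)
trials = rng.normal(size=(20, 15, 6 * FS))  # 20 synthetic trials, 15 channels, 6 s
labels = np.tile([0, 1], 10)
start = int(4.5 * FS)  # window start: cue at 3 s + 1.5 s (assumed trial layout)
filter_sets = fit_fbcsp(trials, labels, start)
features = log_power_features(trials, filter_sets, start)
print(features.shape)
```

For the simulated online evaluation, `log_power_features` would be called once per window position, with `start` advanced in small steps.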

Results

Parameters’ influence on random forest performance

The best cross-validation accuracies per participant are shown in Table 1; they range from 54% (S10) to 89% (S1). The examined parameter values were number of trees = 10, 20, 50, 100, 200, 500, 1000, 2000, 5000 and number of randomly chosen data dimensions = 1–25 in steps of two. Within this range, the combination of 2000 trees and 25 randomly chosen data dimensions performed best, with an average accuracy of 68.9% (cf. Figure 5). The combination of ten trees and one randomly chosen dimension performed worst, with an average accuracy of 55.7% (cf. Figure 5). This highest average accuracy was compared separately with every other computed average accuracy. In total, 116 two-tailed t-tests were carried out (significance level = 0.05, Bonferroni-Holm corrected for multiple comparisons). Figure 5 shows the results of these comparisons. The region with a white background indicates parameter combinations that did not perform statistically differently from the best combination. The region with a black background indicates combinations that performed statistically significantly worse.
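Holm’s correction is simple to apply; the NumPy sketch below implements the standard step-down procedure. In this work, it would be applied to the 116 p-values of the comparisons with the best combination. The toy p-values are made up. The t-tests themselves (e.g., `scipy.stats.ttest_rel`) are left out, because the exact test variant is not specified here. The function name is hypothetical.

```python
import numpy as np


def holm_bonferroni(p_values, alpha=0.05):
    """Holm's step-down procedure; True where the null hypothesis is rejected."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    reject = np.zeros(m, dtype=bool)
    for rank, idx in enumerate(np.argsort(p)):
        # The k-th smallest p-value (k = rank + 1) is compared with alpha / (m - rank).
        if p[idx] > alpha / (m - rank):
            break  # this and all larger p-values are not significant
        reject[idx] = True
    return reject


print(holm_bonferroni([0.01, 0.04, 0.03, 0.005]))
```

Only the first two sorted p-values (0.005 and 0.01) pass their thresholds (0.0125 and 0.0167). The procedure stops at 0.03, so the hypotheses with p = 0.03 and p = 0.04 are retained.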

Figure 5:

The figure shows binary cross-validation accuracies in percent for different combinations of the number of trees and the number of randomly chosen data dimensions per node for random forest classifiers. The region with a white background indicates combinations that did not perform statistically differently from the best combination (two-tailed t-tests, significance level = 0.05, Bonferroni-Holm corrected for multiple comparisons). The best combination was 2000 trees and 25 randomly chosen data dimensions per node and is marked with a circle. N_t, number of trees; N_d, number of data dimensions.

Table 1

Peak cross-validation accuracies (Peak acc) in percent and the corresponding number of trees (No. trees) and number of randomly chosen data dimensions per node (No. dimensions).

ID	Peak acc	No. trees	No. dimensions
S1	89%	2000	5
S2	73%	5000	17
S3	56%	5000	9
S4	60%	1000	25
S5	73%	200	19
S6	88%	2000	17
S7	64%	2000	25
S8	60%	50	25
S9	83%	1000	11
S10	54%	50	15

Online SMR-BCI using a random forest classifier

The upper significance threshold for peak accuracies (α=0.05) is 62.5% with 30 trials per class [30]. Peak accuracies of all participants exceed this significance threshold; three (21%) are above 90%, six (42%) above 80%, and 11 (79%) above 70%. This 70% threshold is the commonly accepted minimum accuracy for useful BCI control [24]. The average peak accuracy is 78.8%.

Three (21%) median accuracies are above 80% and six (42%) are above 60%. The average median accuracy is 63.5%. All individual results are presented in Figure 6A and Table 2.

Figure 6:

(A) Individual accuracies of the online sensorimotor rhythm (SMR)-brain-computer interface (BCI) using random forest (RF) classifiers. (B) Individual accuracies of the BCI simulation combining filter bank common spatial patterns (CSP) and RF classifiers. (C) Individual accuracies of the BCI simulation combining filter bank CSP and shrinkage linear discriminant analysis classifiers. The chance level for peak accuracies is 62.5% (30 trials per class, α = 0.05).

Table 2

Classification accuracies of the online random forest (RF) based sensorimotor rhythm (SMR)-brain-computer interface (BCI) and the BCI simulations in percent.

ID	Online DFT+RF peak	Online DFT+RF median	Sim fbCSP+RF peak	Sim fbCSP+RF median	Sim fbCSP+sLDA peak	Sim fbCSP+sLDA median
Chance	62.5	–	62.5	–	62.5	–
P1	100.00	91.67	100.00	100.00	100.00	100.00
P2	96.67	86.67	98.33	92.50	100.00	92.50
P3	93.33	83.33	96.67	88.33	95.00	85.00
P4	86.67	68.33	91.67	82.50	90.00	80.00
P5	83.33	66.67	95.00	85.83	88.33	81.67
P6	80.00	60.00	81.67	65.00	78.33	61.67
P7	76.67	65.00	95.00	81.67	93.33	77.50
P8	76.67	56.67	73.33	63.33	65.00	56.67
P9	71.67	56.67	81.67	76.67	78.33	70.00
P10	70.00	50.00	68.33	56.67	68.33	56.67
P11	70.00	50.00	63.33	52.50	60.00	51.67
P12	66.67	55.00	91.67	88.33	86.67	81.67
P13	66.67	48.33	88.33	78.33	83.33	73.33
P14	65.00	50.00	68.33	53.33	71.67	56.67
Average	78.81	63.45	85.24	76.07	82.74	73.21
SD	11.65	14.45	12.47	15.32	12.89	14.85

Highest accuracies per participant are in bold. Chance level calculated for 30 trials per class and α=0.05. DFT, discrete Fourier transform; sLDA, shrinkage linear discriminant analysis.

BCI simulation: random forest vs. regularized LDA

Filter bank CSP and RF classifiers

The peak accuracies of the SMR-BCI simulation are better than chance for all 14 participants (threshold 62.5% [30]). Seven (50%) peak accuracies are above 90%, ten (71%) above 80%, and 11 (79%) above 70%. The average peak accuracy is 85.2%. For comparison, the average peak accuracy of the online SMR-BCI in this work is 78.8%. This increase of 6.4 percentage points is statistically significant (p<0.05, two-tailed t-test).

Two (14%) median accuracies are above 90%, seven (50%) are above 80%, and 11 (79%) are above 60%. The average median accuracy over all participants is 76.1%. The average median accuracy of the online SMR-BCI presented in this work is 63.5%. This increase of 12.6 percentage points is statistically significant (p<0.01, two-tailed t-test).

All individual results are shown in Figure 6B. A comparison with the other results of this work is presented in Table 2.

RF classifiers compared to sLDA classifiers

Average peak accuracy is 82.7% with sLDA classifiers compared to 85.2% with RF classifiers. Average median accuracy is 73.2% with sLDA classifiers compared to 76.1% with RF classifiers. These differences are statistically significant (p<0.05 for peak accuracies, p<0.01 for median accuracies, two-tailed t-tests). All individual results are shown in Figure 6C. A comparison with the other results of this work is presented in Table 2.
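The comparison can be re-run with SciPy from the rounded values in Table 2. We assume paired (per-participant) two-tailed t-tests, which the text does not state explicitly. Because the table values are rounded, the p-values may deviate slightly from those of the original analysis.

```python
import numpy as np
from scipy.stats import ttest_rel

# Table 2, simulation columns for P1..P14 (percent).
rf_peak = np.array([100.00, 98.33, 96.67, 91.67, 95.00, 81.67, 95.00,
                    73.33, 81.67, 68.33, 63.33, 91.67, 88.33, 68.33])
slda_peak = np.array([100.00, 100.00, 95.00, 90.00, 88.33, 78.33, 93.33,
                      65.00, 78.33, 68.33, 60.00, 86.67, 83.33, 71.67])
rf_median = np.array([100.00, 92.50, 88.33, 82.50, 85.83, 65.00, 81.67,
                      63.33, 76.67, 56.67, 52.50, 88.33, 78.33, 53.33])
slda_median = np.array([100.00, 92.50, 85.00, 80.00, 81.67, 61.67, 77.50,
                        56.67, 70.00, 56.67, 51.67, 81.67, 73.33, 56.67])

peak = ttest_rel(rf_peak, slda_peak)        # two-sided by default
median = ttest_rel(rf_median, slda_median)
print(f"peak:   mean diff {np.mean(rf_peak - slda_peak):.2f}, p = {peak.pvalue:.4f}")
print(f"median: mean diff {np.mean(rf_median - slda_median):.2f}, p = {median.pvalue:.4f}")
```

A paired test fits this design, because both classifiers were evaluated on the same participants and data. The per-participant differences are therefore the quantity of interest.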

Discussion

Parameters’ influence on random forest performance

Figure 5 illustrates that RF classifiers are robust against variations of their parameters, provided that the selected values exceed certain data-dependent thresholds. For the present data, these thresholds can be read from Figure 5. Starting at about 500 tree classifiers and 11 randomly chosen data dimensions, no statistically significant difference from the best average cross-validation accuracy was found. One could simply choose high values for the parameters. However, lower values are preferable, because they decrease the computational effort. Breiman and Jaiantilal recommended between 500 and 1000 trees and round(√(number of features)) randomly chosen data dimensions, which results in 11 dimensions in this parameter evaluation [11, 20]. These recommendations performed well on a variety of data [11, 27, 45]. As we showed in this work, they are also suitable for SMR-BCI settings. However, we prefer the more conservative recommendation of 1000 tree classifiers and round(√(number of features)) randomly chosen data dimensions.

RF classifiers built with these parameters are fast enough to enable online operation. For example, an RF classifier with these parameters required on average 21±1.6 (SD) ms for one classification (Intel Core i5 processor at 2.6 GHz, Intel Corporation, Santa Clara, CA, USA).
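A per-classification latency such as 21±1.6 (SD) ms can be measured by timing repeated single-trial predictions. The sketch below uses only the standard library. `classify` and `feature_vectors` are hypothetical placeholders for a trained classifier's prediction function and the incoming feature vectors; they are not part of the original setup. The measured numbers depend entirely on hardware and implementation.

```python
import statistics
import time
from typing import Callable, Sequence


def latency_ms(classify: Callable[[Sequence[float]], int],
               feature_vectors: Sequence[Sequence[float]]) -> tuple[float, float]:
    """Mean and sample SD (in ms) of single-trial classification time."""
    if len(feature_vectors) < 2:
        raise ValueError("need at least two timed classifications for an SD")
    durations = []
    for x in feature_vectors:
        start = time.perf_counter()
        classify(x)  # one classification, as in an online loop
        durations.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(durations), statistics.stdev(durations)


# Usage with a trivial, hypothetical stand-in classifier:
mean_ms, sd_ms = latency_ms(lambda x: int(sum(x) > 0), [[0.1, -0.2]] * 100)
print(f"{mean_ms:.3f} ± {sd_ms:.3f} (SD) ms")
```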

Concluding this section, there is no need for a time-consuming parameter search when using RF classifiers with PSD features. Conveniently, classification accuracies achieved with fixed, pre-selected parameter values are not statistically different from the accuracies achieved with the best parameter values. We recommend fixed parameter values of 1000 tree classifiers and round[sqrt(number of features)] randomly chosen data dimensions per node.
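The two fixed parameters can be illustrated with a compact, pure-Python random forest sketch in the spirit of Breiman [11]. Each tree is grown on a bootstrap sample, and each node searches only `mtry` randomly chosen dimensions for the best Gini split. Prediction is a majority vote over all trees. The sketch uses unpruned trees and no optimizations, so it is for illustration only. It is not the MATLAB implementation [20] used in this work, and the class and function names are our own.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def grow_tree(X, y, mtry, rng, min_size=1):
    """Grow an unpruned tree; leaves are ('leaf', label) tuples."""
    if len(set(y)) == 1 or len(y) <= min_size:
        return ("leaf", Counter(y).most_common(1)[0][0])
    best = None  # (weighted impurity, dimension, threshold, left idx, right idx)
    for d in rng.sample(range(len(X[0])), mtry):  # random subset of dimensions
        values = sorted(set(row[d] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [i for i, row in enumerate(X) if row[d] <= thr]
            right = [i for i, row in enumerate(X) if row[d] > thr]
            score = (len(left) * gini([y[i] for i in left]) +
                     len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, d, thr, left, right)
    if best is None:  # the sampled dimensions are all constant: make a leaf
        return ("leaf", Counter(y).most_common(1)[0][0])
    _, d, thr, left, right = best
    return ("node", d, thr,
            grow_tree([X[i] for i in left], [y[i] for i in left], mtry, rng, min_size),
            grow_tree([X[i] for i in right], [y[i] for i in right], mtry, rng, min_size))


def predict_tree(tree, x):
    """Follow the splits from the root down to a leaf label."""
    while tree[0] == "node":
        _, d, thr, left, right = tree
        tree = left if x[d] <= thr else right
    return tree[1]


class SimpleRandomForest:
    def __init__(self, n_trees=1000, mtry=None, seed=0):
        self.n_trees = n_trees  # recommended fixed value: 1000 trees
        self.mtry = mtry        # None -> round(sqrt(number of features))
        self.rng = random.Random(seed)
        self.trees = []

    def fit(self, X, y):
        n, dims = len(X), len(X[0])
        mtry = self.mtry or max(1, round(math.sqrt(dims)))
        self.trees = []
        for _ in range(self.n_trees):
            idx = [self.rng.randrange(n) for _ in range(n)]  # bootstrap sample
            self.trees.append(grow_tree([X[i] for i in idx], [y[i] for i in idx],
                                        mtry, self.rng))
        return self

    def predict(self, x):
        votes = Counter(predict_tree(t, x) for t in self.trees)
        return votes.most_common(1)[0][0]  # majority vote over all trees
```

For a quick check on a small toy data set, fewer trees suffice, for example `SimpleRandomForest(n_trees=25).fit(X, y).predict(x)`. The defaults mirror the recommended fixed values of 1000 trees and round[sqrt(number of features)] dimensions per node.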

Online SMR-BCI using a random forest classifier

To our knowledge, this is the first online SMR-BCI system that uses RF classifiers. Classification accuracies, all above chance level, range from 65% to 100% (Table 2 and Figure 6A).
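Whether an accuracy is above chance level depends on the number of trials, not only on the nominal 50% of a two-class task [30]. The sketch below implements one common criterion, a one-sided exact binomial test. It is not necessarily the exact procedure used in this work or in [30]. The function (our own name) returns the smallest accuracy that random guessing reaches with probability below `alpha`. This section does not give the number of online trials, so the usage line is purely illustrative.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def chance_level(n_trials: int, n_classes: int = 2, alpha: float = 0.05) -> float:
    """Smallest accuracy that random guessing reaches with probability < alpha.

    Assumes independent trials and a one-sided exact binomial test.
    """
    p = 1.0 / n_classes
    for k in range(n_trials + 1):
        if binomial_tail(k, n_trials, p) < alpha:
            return k / n_trials
    raise ValueError("too few trials to reach significance at this alpha")


print(chance_level(20))  # → 0.75
```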

These results are remarkable because, at low trials-to-features ratios, classifiers with complex non-linear models usually tend to overtrain, that is, to fit the training data too closely, and then generalize poorly to unseen data. Although the trials-to-features ratio was very low in this online setting (0.78), no performance bias was observed after the transition to online operation. This observation is consistent with the literature: Breiman reports using RF classifiers in scenarios with a trials-to-features ratio of approximately one [11].

Concluding this section, the results of our online study (Figure 6A and Table 2) show that RF classifiers are applicable and suitable for online operation and provide reliable online performance. Hence, offline results of RF classifiers can be transferred to online operation. In addition, the results exemplify the resistance of RF classifiers to overtraining, even at very low trials-to-features ratios.

BCI simulation: random forest vs. regularized LDA

As shown many times before, CSP filtering is a powerful spatial filter for SMR-BCIs. In combination with RF classifiers, it statistically significantly increases average classification accuracies compared to the online SMR-BCI system using Laplacian filtering and RF classifiers. Average peak accuracy increases by 6.4 percentage points and average median accuracy by 12.6 percentage points. The higher accuracies are also visible at the individual participant level. For example, the peak classification accuracy of one participant improves by 33.3% (Table 2, P12). Classification stability also improves: the difference between average peak and average median accuracy is reduced from 15.4% in the online SMR-BCI to 9.2% in the BCI simulation. Stable, sustained classification is particularly important for precise control with the Graz-BCI paradigm.

In direct comparison, RF classifiers achieve higher classification accuracies than sLDA classifiers, at least on the present data (Table 2). The average increase in classification accuracy is small (peak 2.5, median 2.9 percentage points) but statistically significant. Peak accuracies are higher for 12 of the 14 participants when using RF classifiers, and there is no difference for one participant. Median accuracies are higher for ten participants, with no difference for three. Thus, the non-linear RF classifier with its complex model outperformed a simpler, regularized LDA classifier in a low trials-to-features scenario, although such scenarios are particularly challenging for complex models.

At least two explanations for the higher accuracy of RF classifiers compared to regularized LDA are possible: (i) RF classifiers are possibly more robust against outliers; and (ii) the statistical distribution of logarithmic power features only approximates a normal distribution, and the features are potentially not optimally linearly separable [42]. Hence, the non-linear model of the RF classifier and its independence from feature distributions are possibly more appropriate.

Concluding this section, in a BCI simulation, RF classifiers achieve statistically significantly higher classification accuracies than sLDA classifiers on the present data.

Conclusion and outlook

This work lays a foundation for a broader use of RF classifiers in the field of SMR-BCIs. We answered open questions and showed that: (i) RF classifiers are easy to handle in terms of parameters. Time-consuming parameter optimization is not required, because RF classifiers with preselected parameters achieve classification accuracies as high as with optimized parameters. (ii) RF classifiers are applicable for online classification and provide reliable performance. (iii) RF classifiers can outperform regularized LDA classifiers in classification accuracy, even in low trials-to-features scenarios.

The RF classifier offers what BCI researchers want: a stable, online-applicable, non-linear classification method that is easy to handle in terms of parameters and is not prone to overtraining, even at low trials-to-features ratios. RF classifiers have maximum margin behavior like support vector machines, which makes them robust against variations or drifts such as those typically present in EEG data [14]. Besides binary classification, RF classifiers have many other attractive features. (i) RF classifiers are inherently multiclass capable. Hence, they are ideal for multiclass BCI systems. (ii) RF classifiers make no assumptions about the statistical distributions of features. Hybrid BCIs, for example, typically use features from different sources and therefore with different statistical distributions [37]. RF classifiers are therefore a promising candidate for merging these features into one classifier model. Even categorical features such as yes/no are possible. (iii) RF classifiers are more than classifiers. They provide important insights into the training data (e.g., an estimate of the generalization error, feature importance ratings, and proximity estimates between samples), which makes them a valuable data mining tool.
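Random-forest proximities are commonly defined on pairs of samples: the proximity is the fraction of trees in which two samples end up in the same terminal node. How the terminal-node indices are obtained depends on the RF implementation. The sketch below assumes they are already available as a list with one row per tree and one entry per sample; this input format and the function name are our own.

```python
def proximity_matrix(leaf_ids):
    """Fraction of trees in which samples i and j share a terminal node.

    leaf_ids[t][i] is the terminal-node index of sample i in tree t.
    """
    n_trees, n = len(leaf_ids), len(leaf_ids[0])
    counts = [[0] * n for _ in range(n)]
    for leaves in leaf_ids:
        for i in range(n):
            for j in range(n):
                if leaves[i] == leaves[j]:
                    counts[i][j] += 1
    return [[c / n_trees for c in row] for row in counts]


prox = proximity_matrix([[0, 0, 1], [2, 3, 3]])
print(prox[0][1], prox[1][2], prox[0][2])  # → 0.5 0.5 0.0
```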

This work confirms that RF classifiers are practical and convenient classifiers for BCI research and we hope that this work will (re)open and stimulate the debate about the optimal classifier for BCI research. Based on the results of this work, we argue that new, multi-functional machine learning methods like RF classifiers should be taken into account when planning future BCIs.

Corresponding author: Reinhold Scherer, Institute for Knowledge Discovery, Laboratory of Brain-Computer Interfaces, Graz University of Technology, Inffeldgasse 13/IV, 8010 Graz, Austria, Phone: +43 316 873 30713, Fax: +43 316 873 30702, E-mail: reinhold.scherer@tugraz.at

Acknowledgments

This work was partly supported by the FP7 Framework EU Research Projects ABC (No. 287774), BackHome (No. 288566), and BNCI Horizon 2020 (No. 609593). This paper only reflects the authors’ views and funding agencies are not liable for any use that may be made of the information contained herein. The authors want to thank Daniel Hackhofer for carrying out the measurements.

References

[1] Akram F, Han HS, Jeon HJ, Park K, Park S-H, Cho J, Kim T-S. An efficient words typing P300-BCI system using a modified T9 interface and random forest classifier. Engineering in Medicine and Biology Society (EMBC), 2013 35th Annual International Conference of the IEEE, pp. 2251–2254, 3–7 July 2013.

[2] AlZoubi O, Koprinska I, Calvo RA. Classification of brain-computer interface data. In Proc. Seventh Australasian Data Mining Conference (AusDM 2008), Glenelg, South Australia. CRPIT, 87. Roddick JF, Li J, Christen P, Kennedy PJ, editors. ACS 2008: 123–131.

[3] Ang KK, Chin ZY, Wang C, Guan C, Zhang H. Filter bank common spatial pattern algorithm on BCI competition IV datasets 2a and 2b. Front Neurosci 2012; 6: 39. doi:10.3389/fnins.2012.00039

[4] Barbero A, Grosse-Wentrup M. Biased feedback in brain-computer interfaces. J Neuroeng Rehabil 2010; 7: 1–4. doi:10.1186/1743-0003-7-34

[5] Bentlemsan M, Zemouri E. Random forest and filter bank common spatial patterns for EEG-based motor imagery classification. Proceedings of the 5th International Conference on Intelligent System Modeling and Simulation 2014 (ISMS 2014). doi:10.1109/ISMS.2014.46

[6] Birbaumer N, Ghanayim N, Hinterberger T, et al. A spelling device for the paralysed. Nature 1999; 398: 297–298. doi:10.1038/18581

[7] Blankertz B, Lemm S, Treder M, Haufe S, Müller KR. Single-trial analysis and classification of ERP components – a tutorial. Neuroimage 2011; 56: 814–825. doi:10.1016/j.neuroimage.2010.06.048

[8] Blankertz B, Losch F, Krauledat M, Dornhege G, Curio G, Müller KR. The Berlin brain-computer interface: accurate performance from first-session in BCI-naive subjects. IEEE Trans Biomed Eng 2008; 55: 2452–2462. doi:10.1109/TBME.2008.923152

[9] Blankertz B, Tomioka R, Lemm S, Kawanabe M, Müller KR. Optimizing spatial filters for robust EEG single-trial analysis. IEEE Signal Process Mag 2008; 25: 41–56. doi:10.1109/MSP.2008.4408441

[10] Breiman L. Bagging predictors. Mach Learn 1996; 24: 123–140. doi:10.1007/BF00058655

[11] Breiman L. Random forests. Mach Learn 2001; 45: 5–32. doi:10.1023/A:1010933404324

[12] Breiman L, Friedman JH, Olshen RA, Stone CJ. CART: classification and regression trees. Wadsworth: Belmont 1983.

[13] Breitwieser C, Daly I, Neuper C, Müller-Putz GR. Proposing a standardized protocol for raw biosignal transmission. IEEE Trans Biomed Eng 2012; 59: 852–859. doi:10.1109/TBME.2011.2174637

[14] Criminisi A, Shotton J, Konukoglu E. Decision forests for classification, regression, density estimation, manifold learning and semi-supervised learning. Microsoft Research Technical Report: TR-2011-114. doi:10.1561/9781601985415

[15] Delorme A, Sejnowski T, Makeig S. Enhanced detection of artifacts in EEG data using higher-order statistics and independent component analysis. NeuroImage 2007; 34: 1443–1449. doi:10.1016/j.neuroimage.2006.11.004

[16] Faller J, Vidaurre C, Solis-Escalante T, Neuper C, Scherer R. Autocalibration and recurrent adaptation: towards a plug and play online ERD-BCI. IEEE Trans Neural Syst Rehabil Eng 2012; 20: 313–319. doi:10.1109/TNSRE.2012.2189584

[17] Farooq F, Kidmose P. Random forest classification for P300 based brain computer interface applications. Proceedings of the 21st European Signal Processing Conference (EUSIPCO) 2013.

[18] Hjorth B. An on-line transformation of EEG scalp potentials into orthogonal source derivations. Electroen Clin Neuro 1975; 39: 526–530. doi:10.1016/0013-4694(75)90056-5

[19] Ho TK. The random subspace method for constructing decision forests. IEEE Trans Pattern Anal Mach Intell 1998; 20: 832–844. doi:10.1109/34.709601

[20] Jaiantilal A. Random forest implementation for MATLAB. Available from: http://code.google.com/p/randomforest-matlab/. Accessed 6 November, 2012.

[21] Jatzev S, Zander TO, DeFilippis M, Kothe C, Welke S, Roetting M. Examining causes for non-stationarities: The loss of controllability is a factor which induces non-stationarities. Proc. of the 4th International Brain-Computer Interface Workshop and Training Course 2008 (Graz, Austria). Graz: Verlag der Technischen Universität 2008: 138–143.

[22] Kreilinger A, Kaiser V, Rohm M, Leeb R, Rupp R, Müller-Putz GR. Neuroprosthesis control via noninvasive hybrid brain-computer interface. IEEE Intell Syst 2014; 28: 40–43.

[23] Kübler A, Furdea A, Halder S, Hammer EM, Nijboer F, Kotchoubey B. A brain-computer interface controlled auditory event-related potential (P300) spelling system for locked-in patients. Ann NY Acad Sci 2009; 1157: 90–100. doi:10.1111/j.1749-6632.2008.04122.x

[24] Kübler A, Neumann N, Kaiser J, Kotchoubey B, Hinterberger T, Birbaumer N. Brain-computer communication: self-regulation of slow cortical potentials for verbal communication. Arch Phys Med Rehabil 2001; 82: 1533–1539. doi:10.1053/apmr.2001.26621

[25] Kübler A, Nijboer F, Mellinger J, et al. Patients with ALS can use sensorimotor rhythms to operate a brain-computer interface. Neurology 2005; 64: 1775–1777. doi:10.1212/01.WNL.0000158616.43002.6D

[26] Lotte F, Congedo M, Lécuyer A, Lamarche F, Arnaldi B. A review of classification algorithms for EEG-based brain-computer interfaces. J Neural Eng 2007; 4: 1–13. doi:10.1088/1741-2560/4/2/R01

[27] Meyer D, Leisch F, Hornik K. The support vector machine under test. Neurocomputing 2003; 55: 169–186. doi:10.1016/S0925-2312(03)00431-4

[28] Millán JD, Rupp R, Müller-Putz GR, et al. Combining brain-computer interfaces and assistive technologies: state-of-the-art and challenges. Front Neurosci 2010; 4: 161. doi:10.3389/fnins.2010.00161

[29] Müller KR, Anderson CW, Birch GE. Linear and nonlinear methods for brain-computer interfaces. IEEE Trans Neural Syst Rehabil Eng 2003; 11: 165–169. doi:10.1109/TNSRE.2003.814484

[30] Müller-Putz GR, Scherer R, Brunner C, Leeb R, Pfurtscheller G. Better than random? A closer look on BCI results. Int J Bioelectromagn 2008; 10: 52–55.

[31] Müller-Putz GR, Scherer R, Pfurtscheller G, Neuper C. Temporal coding of brain patterns for direct limb control in humans. Front Neurosci 2010; 4: 34. doi:10.3389/fnins.2010.00034

[32] Müller-Putz GR, Scherer R, Pfurtscheller G, Rupp R. EEG-based neuroprosthesis control: a step towards clinical practice. Neurosci Lett 2005; 382: 169–174. doi:10.1016/j.neulet.2005.03.021

[33] Neuper C, Scherer R, Reiner M, Pfurtscheller G. Imagery of motor actions: differential effects of kinesthetic and visual-motor mode of imagery in single-trial EEG. Cogn Brain Res 2005; 25: 668–677. doi:10.1016/j.cogbrainres.2005.08.014

[34] Nijboer F, Sellers EW, Mellinger J, et al. A P300-based brain-computer interface for people with amyotrophic lateral sclerosis. Clin Neurophysiol 2008; 119: 1909–1916. doi:10.1016/j.clinph.2008.03.034

[35] Pfurtscheller G, Neuper C. Motor imagery activates primary sensorimotor area in humans. Neurosci Lett 1997; 239: 65–66. doi:10.1016/S0304-3940(97)00889-6

[36] Pfurtscheller G, Neuper C. Motor imagery and direct brain-computer communication. Proc IEEE 2001; 89: 1123–1134. doi:10.1109/5.939829

[37] Pfurtscheller G, Allison BZ, Brunner C, et al. The hybrid BCI. Front Neurosci 2010; 4: 30. doi:10.3389/fnpro.2010.00003

[38] Pfurtscheller G, Müller GR, Pfurtscheller J, Gerner HJ, Rupp R. ‘Thought’ – control of functional electrical stimulation to restore hand grasp in a patient with tetraplegia. Neurosci Lett 2003; 351: 33–36. doi:10.1016/S0304-3940(03)00947-9

[39] Pokorny C, Klobassa D, Pichler G, et al. The auditory P300-based single-switch brain-computer interface: Paradigm transition from healthy subjects to minimally conscious patients. Artif Intel Med 2013; 59: 81–90. doi:10.1016/j.artmed.2013.07.003

[40] Ramoser H, Müller-Gerking J, Pfurtscheller G. Optimal spatial filtering of single trial EEG during imagined hand movement. IEEE Trans Rehab Eng 2000; 8: 441–446. doi:10.1109/86.895946

[41] Scherer R, Pfurtscheller G, Neuper C. Motor imagery induced changes in oscillatory EEG components: speed vs. accuracy. Proc of the 4th International Brain-Computer Interface Workshop and Training Course 2008 (Graz, Austria). Graz: Verlag der Technischen Universität Graz 2008: 186–190.

[42] Steyrl D, Scherer R, Müller-Putz GR. Random forests for feature selection in non-invasive brain-computer interfacing. In: Holzinger A, Pasi G, editors. Human-computer interaction and knowledge discovery in complex, unstructured, big data, lecture notes in computer science. 2013; 7947: 207–216.

[43] Steyrl D, Scherer R, Müller-Putz GR. Using random forests for classifying motor imagery EEG. Proceedings of TOBI workshop IV 2013; 89–90.

[44] Sun S, Zhang C, Zhang D. An experimental evaluation of ensemble methods for EEG signal classification. Pattern Recognit Lett 2007; 28: 2157–2163. doi:10.1016/j.patrec.2007.06.018

[45] Verikas A, Gelzinis A, Bacauskiene M. Mining data with random forests: a survey and results of new tests. Pattern Recognit 2011; 44: 330–349. doi:10.1016/j.patcog.2010.08.011

[46] Weichwald S, Meyer T, Schölkopf B, Ball T, Grosse-Wentrup M. Decoding index finger position from EEG using random forests. Fourth International Workshop on Cognitive Information Processing, Copenhagen, Denmark 2014. doi:10.1109/CIP.2014.6844513

[47] Wolpaw JR, Birbaumer N, McFarland DJ, Pfurtscheller G, Vaughan TM. Brain-computer interfaces for communication and control. Clin Neurophysiol 2002; 113: 767–791. doi:10.1016/S1388-2457(02)00057-3

[48] Wolpaw JR, Winter Wolpaw E. Brain-computer interfaces: something new under the sun. In: Wolpaw JR, Winter Wolpaw E, editors. Brain-computer interfaces: principles and practice. New York: Oxford University Press 2012: 3–12. doi:10.1093/acprof:oso/9780195388855.003.0001

Received: 2014-10-3

Accepted: 2015-3-2

Published Online: 2015-4-1

Published in Print: 2016-2-1
