1 Introduction

According to reports by the World Health Organization (WHO), around 1.3 million people die each year as a result of road traffic accidents. These accidents are the leading cause of death for children and young adults (5–29 years old) worldwide and cost most countries about 3% of their gross domestic product (GDP) [1]. Moreover, the WHO projects that by 2030 road traffic accidents will become the fifth leading cause of death [2, 3]. Stress is closely linked to driving safety. For example, stress can cause road accidents by impairing a driver's judgment in risky situations or by degrading driving performance. As a result, various studies have addressed the problem of detecting stress early in order to lower the risk of traffic accidents [4,5,6].

Over the last few decades, the increase in traffic accidents and fatalities has been attributed to increasing driver drowsiness, exhaustion, and mental stress. To reduce human errors, physiological parameters can be measured continuously to monitor stress and alertness while driving [7, 8]. These parameters include the electrocardiogram (ECG), the electromyogram (EMG), skin conductance (SC), also known as galvanic skin response (GSR), and respiration rate (RR). In most cases, skin conductance and heart rate are the indicators most directly associated with a driver's stress level.

Numerous measurements have been used to assess drivers' stress levels. They fall into three categories: physiological measurements, measurements of facial behavior, and measurements of vehicle motion [9]. The most common vehicle motion attributes include acceleration, braking, lane position, steering angle, and handling movement patterns. These attributes are straightforward to compute, but they are influenced by the vehicle type, driving habits, and road conditions. Facial activities, such as head movement, pupil dilation, blink rate, and eye gaze, can be measured without interfering with driving [10,11,12,13]. However, these measures may be unreliable in some scenarios, such as low illumination, severe weather, night driving, or when the driver wears eyeglasses.

Physiological data, in contrast, are usually not affected by contextual factors unrelated to stress, such as lighting or driving technique. In addition, physiological signals obtained through body-worn devices can provide useful information about a driver's internal state, which can be used to identify stress [14, 15]. The stress response is related to autonomic nervous system activity. For this reason, GSR signals, which reflect sweat gland activity, and signals associated with heart activity are frequently used as accurate stress indicators. In stress recognition, the focus is on acquiring and exploiting a variety of physiological signals from inexpensive and readily accessible sensors.

In driving scenarios, short-term monitoring is critical for safety. However, many stress detection studies have relied on relatively long physiological recordings, usually lasting several minutes [16, 17]. Some recent studies have frequently used short-term ECG signals with high sampling frequencies, recorded under stressful conditions [18, 19]. Short-term GSR signals are becoming more common. Even so, knowledge-based feature engineering combined with traditional machine learning classifiers still requires considerable time and human effort.

The aim of this paper is to develop a solution that detects mental stress in automotive drivers from selected biosignals using different machine learning (ML) techniques. Such a system can be integrated with a Driver Assistance System (DAS) that continuously probes the driver's mental state. It may also issue a warning or take an action to relieve stress and thereby increase safety, for example playing relaxing music or turning on the driver's favorite program. This work investigates the use of the physiological signals ECG, EMG, GSR, and respiration to classify stressed and non-stressed states.

The proposed stress detection approach consists of three main phases. The first phase is biosignal pre-processing, in which the signal is segmented and filtered. The second phase is feature extraction, in which discriminative features are extracted from each biosignal to describe the driver's mental state. The third phase is stress detection. For this phase, the work uses the k-nearest neighbor (KNN), support vector machine (SVM), random forest (RF), multilayer perceptron (MLP), decision tree (DT), and logistic regression (LR) classifiers to detect and classify the stress level.

The main contributions of this paper are summarized as follows:

  • An artificial intelligence-based Driver Assistance System (AI-DAS) is proposed to identify mental stress in automotive drivers using a group of physiological signals, which are easily captured from the driver. These signals are ECG, EMG, hand GSR, foot GSR, and respiration rate.

  • The proposed system employs three phases: pre-processing, feature extraction, and classification.

  • Ten statistical features are extracted from each 1-min segment of the involved signals and fed to the classifiers for identification.

  • Different ML classifiers are adopted to differentiate between stressed and relaxed states, including KNN, SVM, RF, MLP, DT, and LR. These classifiers showed the ability to detect the driver's stress effectively with short training times.

  • A grid search was used to find the optimal hyperparameters of the classifiers.

  • The evaluation experiments were performed on the drivedb dataset. The best model (RF) achieves a classification accuracy of 98.2%, a sensitivity of 97%, and a specificity of 100%.

The rest of the paper is organized as follows. After this introductory section, Sect. 2 is dedicated to the related work. Section 3 introduces the proposed strategy for identifying the stress state in car drivers. Section 4 presents the experimental results of the method on real-world driving experiences. Finally, Sect. 5 presents a conclusion of the study.

2 Related work

This section reviews a number of studies from the literature. Researchers have combined various detection systems to capture mental and behavioral reactions, including in social situations. Lin et al. [20] proposed a fuzzy-assisted Petri net technique for stress detection based on heart rate (HR) and blood pressure (BP) monitoring. HR is obtained from the interval between two successive QRS complexes in the ECG signal. BP is tracked by monitoring the transit time of each pulse. The method relies on how HR and BP vary between stress and non-stress states, using time- and frequency-domain analysis, and the fuzzy-assisted Petri net performs the stress evaluation. The reported accuracy, precision, and recall are 93.55, 89.01, and 89.50%, respectively.

Hu et al. [21] presented a heart rate variability (HRV) analysis to detect mental stress in automotive drivers. HRV captures changes in the rhythm of the heartbeat, and the study demonstrates a synchronization between HRV and the driver's mental state. A KNN classifier is applied to the HRV characteristics to detect the mental condition, and the authors reported an accuracy of 93.7%.

Lee et al. [16] proposed a method to detect stress during driving using short-time physiological signals and convolutional neural networks (CNNs). HR and the GSR signals from the hand and foot are used to identify mental stress in drivers. CNNs extract discriminant features from the signals and classify the stressed and normal states. The authors evaluated the method on 10 s and 30 s signals and reported classification accuracies of 92.33 and 95.67%, respectively.

Tang et al. [22] presented a study on the effect of different activities on the stress state. They measured the GSR signal and used it as an indicator of a person's mental stress. They also showed how activity data could improve the sensitivity of stress detection in seated and standing positions. The system achieved an accuracy of 94.7%.

Mozos et al. [23] proposed a machine learning approach for automatically detecting stress in humans in social situations. The approach combines two sensor systems that record physiological and social reactions. They used various classifiers, such as SVM, AdaBoost, and KNN, to classify the stress state. The results show that combining the signals from both sensors makes it possible to distinguish between stressful and neutral situations. They also assessed the suitability of each sensor separately for real-time stress detection.

To improve instant stress tracking, Giakoumis et al. [24] introduced behavioral parameters related to the subject's activity that can be accessed instantaneously through a computer network. The proposed features are based on video and accelerometer data from the tracked subjects. A stress-inducing protocol based on the Stroop color-word test was used. Nineteen subjects took part in the study. Biosignals (ECG and GSR) were collected from them in addition to video and accelerometer data. Spatio-temporal features were extracted from the video sequences, and an exploratory methodological investigation was conducted. The authors examined various activity-related behavioral features that are potentially useful for automatic stress detection. The results reveal that most of these features correlate directly with self-reported stress.

Numerous physiological markers have been investigated in the literature to detect stress. Table 1 summarizes the physiological signals involved with stress detection in previous studies.

Table 1 Physiological parameters implicated in stress detection

3 The proposed stress detection technique (SDT)

This paper aims to develop an approach that detects mental stress in automotive drivers from selected biosignals using different ML techniques. Such a system can be integrated with a Driver Assistance System (DAS) that continuously probes the driver's mental state. The DAS may issue a warning or take an action to relieve stress and thereby increase safety, for example playing relaxing music or turning on the driver's favorite program [26, 29].

This work investigates the use of the physiological signals ECG, EMG, GSR, and respiration to distinguish between stressed and relaxed states. The proposed stress detection technique (SDT) consists of three main phases:

(1) biosignal pre-processing, in which the signal is segmented and filtered;

(2) feature extraction, in which discriminative features are extracted from each biosignal to describe the driver's mental state;

(3) classification, in which the KNN, SVM, DT, LR, RF, and MLP classifiers detect and classify the stress level.

Figure 1 shows the flow diagram of the proposed method for stress detection.

Fig. 1

Overview of the proposed stress detection technique (SDT)

3.1 Biosignal pre-processing

In the pre-processing phase, the biosignals pass through three main steps: (1) signal segmentation, (2) segment partitioning, and (3) filtering.

3.1.1 Signal segmentation

The first step of the SDT is to separate the driving periods, which involve different stressful events. In this step, each biosignal is subdivided into a number of time intervals, each corresponding to a different driving condition. The marker signal that accompanies each driver's record determines these intervals. It identifies the start, end, and duration of each period. First, the peaks of the marker signal and their locations are identified. Then, these locations are used to divide the driver's signals into subintervals, each with a different stressful event. Figure 2 shows the marker signal of one driver with its peaks annotated, together with a sample of the biosignals to be divided. Each driving period is assigned a stress level, as shown in Table 2.
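The marker-based segmentation described above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that the marker channel and the biosignals are NumPy arrays of equal length. It locates the marker peaks with `scipy.signal.find_peaks` using a hypothetical `height` threshold, then cuts every signal at those peak indices with `np.split`. The function name is illustrative.

```python
import numpy as np
from scipy.signal import find_peaks


def split_by_marker(marker, signals, height=None):
    """Split each biosignal at the peak locations of the marker channel.

    marker  -- 1-D array, the marker signal of one driver's record
    signals -- dict mapping a signal name (e.g. "ECG") to a 1-D array
               with the same length as `marker`
    height  -- minimum peak height for find_peaks (hypothetical setting)

    Returns a dict mapping each name to a list of consecutive segments;
    k marker peaks yield k + 1 segments (one per driving period).
    """
    peaks, _ = find_peaks(marker, height=height)
    return {name: np.split(sig, peaks) for name, sig in signals.items()}
```

For example, a 100-sample marker with spikes at samples 30 and 70 splits each signal into segments of 30, 40, and 30 samples.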

Fig. 2

Signal segmentation

Table 2 Different stress levels for different driving periods

After identifying each driving period, we can separate the biosignal segments corresponding to each period. This process is repeated for the five involved signals: ECG, EMG, fGSR, hGSR, and respiration. Table 3 displays the end samples of each driving period for different drivers.

Table 3 The end samples of each driving period for different drivers

We can also determine the time interval (in minutes) of each driving period, and from it the start and end times of each period. The times are obtained by dividing the sample indices in Table 3 by the product of the sampling frequency (15.5 Hz) and 60 s/min. Table 4 shows the resulting time intervals, in minutes. The displayed values are the end times of each period.
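As a worked example of this conversion, \(t_{\text{min}} = n / (f_s \times 60)\). At \(f_s\) = 15.5 Hz, one minute spans 15.5 × 60 = 930 samples. A hypothetical end sample of 27,900 (not a value from Table 3) therefore corresponds to 27,900 / 930 = 30 min. The helper name below is illustrative.

```python
FS_HZ = 15.5  # sampling frequency of the drivedb signals, as stated in the text


def samples_to_minutes(sample_index, fs=FS_HZ):
    """Convert a sample index to elapsed minutes: n / (fs * 60)."""
    return sample_index / (fs * 60.0)


print(samples_to_minutes(930))    # 1.0  (one minute of samples)
print(samples_to_minutes(27900))  # 30.0 (hypothetical end sample of a period)
```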

Table 4 Time intervals, in minutes, for the driving periods

3.1.2 Segment partitioning

In this step, each driving segment (period) is divided into small partitions of 1-min duration, starting 2 min after the start of each segment. This produces 555 partitions, each labeled as either relaxed or stressed. These partitions are then filtered and analyzed to extract the discriminant features, which are subsequently used to train and test the ML classifiers.
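The partitioning step can be sketched as below. The sketch assumes non-overlapping 1-min windows (930 samples at 15.5 Hz) and skips the first 2 min of each segment. It also discards any trailing remainder shorter than 1 min. The text does not say how such remainders are handled, so that choice is an assumption, as is the function name.

```python
import numpy as np


def partition_segment(segment, fs=15.5, offset_min=2.0, window_min=1.0):
    """Cut one driving-period segment into non-overlapping 1-min partitions.

    Partitioning starts `offset_min` minutes after the segment start; any
    trailing samples that do not fill a whole window are dropped (assumption).
    Returns an array of shape (n_partitions, samples_per_window).
    """
    segment = np.asarray(segment)
    win = int(round(window_min * fs * 60))     # 930 samples at 15.5 Hz
    start = int(round(offset_min * fs * 60))   # 1860 samples skipped
    usable = segment[start:]
    n_parts = len(usable) // win
    return usable[: n_parts * win].reshape(n_parts, win)
```

A 7-min segment (6,510 samples) thus yields five 1-min partitions.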

3.1.3 Filtering

The main objective of the filtering step is to remove noise and artifacts from the biosignals. Each biosignal in the dataset is affected by a different type of noise, so each one is processed separately, as follows:

3.1.3.1 ECG filtering

Baseline wander is one of the main types of noise that may be present in an ECG signal [30]. It is a low-frequency artifact caused by the subject's movement and respiration [31, 32]. This noise can be removed by suppressing the low-frequency components (≤ 0.5 Hz) of the signal. In the proposed approach, the signal is first transformed from the time domain to the frequency domain using the fast Fourier transform (FFT). Then, with the cutoff frequency set to 0.5 Hz, all components between 0 and 0.5 Hz are removed by setting the corresponding values of the spectrum to zero. Finally, the processed signal is transformed back to the time domain using the inverse FFT (IFFT).

A time-domain signal \(x_{n} = \{x_{0}, x_{1}, x_{2}, \ldots, x_{N-1}\}\) can be converted to the frequency domain, \(X_{k} = \{X_{0}, X_{1}, X_{2}, \ldots, X_{N-1}\}\), using Eq. (1). The filtering step, which suppresses the frequency components at or below 0.5 Hz, is given by Eq. (2).

$$X_{k} = \sum_{n = 0}^{N - 1} x_{n}\, e^{-i 2\pi k n / N}, \qquad k = 0, \ldots, N - 1$$
(1)
$$\tilde{X}_{k} = \begin{cases} 0, & \left| f_{k} \right| \le 0.5\ \text{Hz} \\ X_{k}, & \left| f_{k} \right| > 0.5\ \text{Hz} \end{cases}$$
(2)

where N is the number of samples and \(\tilde{X}_{k}\) is the filtered spectrum passed to the IFFT. \(f_{k}\) is the frequency of bin k, with \(f_{k} = k f_{s}/N\) for \(k \le N/2\) and \(f_{k} = (k - N) f_{s}/N\) for \(k > N/2\). The sampling frequency is \(f_{s}\) = 15.5 Hz.
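A minimal NumPy sketch of Eqs. (1) and (2) is given below. It illustrates the idea under stated assumptions and is not the authors' implementation. It uses the real-input FFT (`np.fft.rfft`), which stores only the non-negative frequencies. Zeroing the bins with \(f \le 0.5\) Hz therefore also removes their mirrored negative-frequency counterparts, and `np.fft.irfft` returns a real signal. The function name is illustrative, and the test signal is synthetic.

```python
import numpy as np


def remove_baseline_wander(x, fs=15.5, cutoff_hz=0.5):
    """FFT-based high-pass filter: zero all components with f <= cutoff_hz."""
    x = np.asarray(x, dtype=float)
    X = np.fft.rfft(x)                            # Eq. (1), non-negative bins only
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)   # bin frequencies in Hz
    X[freqs <= cutoff_hz] = 0.0                   # Eq. (2)
    return np.fft.irfft(X, n=len(x))              # back to the time domain (IFFT)


# Synthetic 60-s test: offset + 0.1-Hz drift + 2-Hz component, sampled at 15.5 Hz.
t = np.arange(930) / 15.5
x = 2.0 + np.sin(2 * np.pi * 0.1 * t) + np.sin(2 * np.pi * 2.0 * t)
y = remove_baseline_wander(x)
```

Only the 2-Hz component survives, because the constant offset and the 0.1-Hz drift fall below the cutoff. Zeroing FFT bins acts as an ideal (brick-wall) filter. It is simple, but it can introduce ringing when a segment is not periodic within the window.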

Figure 3 displays the time- and frequency-domain representations of a sample ECG signal before and after the removal of baseline wander.

Fig. 3

An ECG signal in time and frequency domains before and after filtering

3.1.3.2 fGSR and hGSR filtering

The GSR signals from the foot (fGSR) and hand (hGSR) are filtered by removing the DC component (the component with 0 Hz) from the signal [33]. Similar to the ECG signal, the fGSR and hGSR signals are converted to the frequency domain using FFT; then, the frequency component at 0 Hz is set to zero. After that, the signals are converted back to the time domain.
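The same FFT route applies to the GSR signals, as sketched below. The 0-Hz bin equals the sum of the samples, \(X_0 = \sum_n x_n\). Zeroing it and inverting the FFT is therefore equivalent to subtracting the signal mean, which the example confirms. The function name is illustrative.

```python
import numpy as np


def remove_dc(x):
    """Zero the 0-Hz (DC) FFT bin and transform back to the time domain."""
    X = np.fft.fft(np.asarray(x, dtype=float))
    X[0] = 0.0
    # For a real input the remaining spectrum is conjugate-symmetric, so the
    # imaginary part of the IFFT is only round-off and can be dropped.
    return np.fft.ifft(X).real


x = np.array([3.0, 5.0, 4.0, 8.0])
print(remove_dc(x))  # approximately [-2, 0, -1, 3], i.e. x - x.mean()
```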

3.2 Feature extraction

This step extracts discriminant features from each 1-min signal partition, and the ML models are trained on these features. In total, ten statistical features are extracted for every partition, distributed over the five signals: two from ECG, three from fGSR, three from hGSR, one from EMG, and one from respiration. The extracted features are described in Table 5. A peak detection algorithm is applied to the filtered ECG, fGSR, and hGSR signals, with the width between two consecutive peaks set to 5 and the prominence set to 0.1.
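Two of the features are named later in the text: Peaks_fGSR and rms_EMG. A sketch of how such features could be computed follows. It assumes SciPy's `find_peaks` as the peak detector and maps the text's "width between two consecutive peaks" to its `distance` argument, the minimum number of samples between neighboring peaks. Both are assumptions, since the text does not name the library. The function names are illustrative, and the full feature list is given in Table 5.

```python
import numpy as np
from scipy.signal import find_peaks


def count_peaks(x, distance=5, prominence=0.1):
    """Number of peaks in a partition (e.g. a Peaks_fGSR-style feature).

    ASSUMPTION: `distance` stands in for the text's peak spacing of 5 samples.
    """
    peaks, _ = find_peaks(np.asarray(x, dtype=float),
                          distance=distance, prominence=prominence)
    return len(peaks)


def rms(x):
    """Root-mean-square value of a partition (e.g. an rms_EMG-style feature)."""
    x = np.asarray(x, dtype=float)
    return float(np.sqrt(np.mean(x ** 2)))
```

With these settings, a small bump two samples away from a larger peak is suppressed by the spacing constraint. Only the larger peak is counted.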

Table 5 Description of features used to classify the stress state

This feature extraction procedure is repeated for all 555 signal partitions. The features are then standardized using the scikit-learn toolkit: the mean of each feature is removed, and the result is divided by the feature's standard deviation [34]. After that, the data are split into training and testing portions with a 70:30 ratio, which yields 388 partitions for training and 167 partitions for testing. The training portion is used to train the models, and the testing portion is used to verify their performance.
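The split and standardization can be sketched with scikit-learn's `train_test_split` and `StandardScaler`. The feature matrix here is random placeholder data with the stated shape (555 partitions × 10 features), and `random_state` is an arbitrary choice. One variation is flagged in the comments: the scaler is fitted on the training portion only. This is a common way to keep test-set statistics out of training, whereas the text applies standardization before splitting.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(555, 10))  # placeholder features
y = rng.integers(0, 2, size=555)                     # placeholder labels: 0 relaxed, 1 stressed

# 70:30 split -> 388 training and 167 testing partitions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# z = (x - mean) / std per feature; fitted on the training portion only
# (a variation on the order described in the text).
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)
```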

3.3 Classification

Several common machine learning algorithms are employed to classify the stress state of automotive drivers based on the features extracted from their biosignals: KNN, SVM, DT, LR, RF, and MLP. The models in this study are implemented using the scikit-learn toolkit. Grid search [35] is used to find the best structure and hyperparameters for each classifier. Additionally, a trial-and-error procedure is used to refine the hyperparameter combinations and avoid overfitting. Table 6 displays the optimal hyperparameters obtained for each model.
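A grid search over a random-forest classifier can be sketched with scikit-learn's `GridSearchCV`. The data are synthetic (`make_classification`) and shaped like the 555 × 10 feature matrix, and the parameter grid is hypothetical. The grids actually searched to obtain Table 6 are not given in the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the standardized 555 x 10 feature matrix.
X, y = make_classification(n_samples=555, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Hypothetical search space; every combination is scored with 5-fold CV.
param_grid = {"n_estimators": [25, 50], "max_depth": [None, 5]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      cv=5, scoring="accuracy")
search.fit(X_train, y_train)          # refits the best model on all training data

print(search.best_params_)
test_accuracy = search.score(X_test, y_test)  # accuracy of the refitted best model
```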

Table 6 The optimal hyperparameters of the classifiers used in the current study

4 Experimental results

This section presents the evaluation results of the proposed system for stress detection. All pre-processing, feature extraction, and classification experiments are implemented in the Python programming language and executed on a PC with an Intel Core i3 (1.80 GHz) processor and 8 GB of RAM.

4.1 Dataset description

This study uses the drivedb dataset acquired by Healey and Picard [28]. This dataset was collected to determine a driver's relative stress level using four physiological signals: ECG, EMG, skin conductance (also known as GSR), and respiration. The authors acquired the four signals during three periods of driving activity: rest, highway driving, and city driving. These periods were assumed to produce low, medium, and high levels of stress, respectively. The dataset, publicly available on PhysioNet [36, 37], contains data from 17 drivers, but only the records of ten drivers are complete.

The records have durations of 65–93 min, split over driving periods with different stress events, as shown in Fig. 4. The start and end of each period are identified by the marker signal, provided along with the driver’s data. The sampling frequency of all signals is 15.5 Hz.

Fig. 4

An ECG signal in time and frequency domains before and after filtering

In this study, the following signals, listed in Table 7, are used to classify the stress level of the drivers.

Table 7 List of biosignals used in the study

4.2 Evaluation metrics

The following metrics are used to evaluate the performance of the proposed SDT:

  • Classification Accuracy: It is the ratio of instances correctly identified as stressed or relaxed to the total number of instances.

    $${\text{Classification Accuracy}} = \frac{{{\text{TP}} + {\text{TN}}}}{{{\text{TP}} + {\text{TN}} + {\text{FP}} + {\text{FN}}}}$$
    (3)
  • Sensitivity (recall): It is the probability that a positive (stressed) instance is identified as positive.

    $${\text{Sensitivity}} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FN}}}}$$
    (4)
  • Specificity: It is the probability that a negative (relaxed) instance is identified as negative.

    $${\text{Specificity}} = \frac{{{\text{TN}}}}{{{\text{TN}} + {\text{FP}}}}$$
    (5)
  • Precision: It is the proportion of instances identified as positive (stressed) by the model that are actually positive.

    $${\text{Precision}} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FP}}}}$$
    (6)
  • F1-Score: It combines the two metrics, precision and recall, into a single metric by taking the harmonic mean.

    $${\text{F1-Score}} = 2 \times \frac{{\text{Precision}} \times {\text{Recall}}}{{\text{Precision}} + {\text{Recall}}}$$
    (7)
  • Training Time: It is the amount of time (in seconds) each model takes to be trained using the same amount of data.

  • Permutation Importance: It measures the importance of each feature in the classification process. It is computed as the difference between the baseline model's evaluation score and the score obtained after randomly permuting the values of a single feature column, one feature at a time [38].

where TP (true positives) is the number of positive (stressed) instances identified as positive and FP (false positives) is the number of negative (relaxed) instances falsely identified as positive. TN (true negatives) is the number of negative (relaxed) instances identified as negative, and FN (false negatives) is the number of positive (stressed) instances falsely identified as negative by the classifier.
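Eqs. (3) to (7) can be computed directly from the four confusion-matrix counts. The sketch below uses hypothetical counts chosen only to make the arithmetic easy to follow; they are not results from this study. The function name is illustrative, and the function does not guard against zero denominators.

```python
def binary_metrics(tp, tn, fp, fn):
    """Eqs. (3)-(7) from confusion-matrix counts (stressed = positive class)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)          # recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}


# Hypothetical counts: accuracy 0.85, sensitivity 0.80, specificity 0.90,
# precision ~0.889, F1 ~0.842.
m = binary_metrics(tp=40, tn=45, fp=5, fn=10)
```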

4.3 Evaluation results

This work employs the KNN, SVM, DT, LR, RF, and MLP models to identify the stress level and compares the performance of each classifier. Table 8 shows the overall training and testing results of each classifier based on the aforementioned evaluation metrics, and Fig. 5 displays the confusion matrix of each classifier. Figure 6a shows the learning curves, i.e., the training and validation scores of each model for varying training sizes. Figure 6b shows how model accuracy progresses with training time. Both are computed with cross-validation (number of splits = 5). Finally, Fig. 7 compares the ROC curves of the adopted ML models.
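Learning curves of the kind shown in Fig. 6a can be produced with scikit-learn's `learning_curve`. The function refits an estimator on growing training subsets and scores each fit with k-fold cross-validation, here `cv=5` to match the five splits mentioned above. The synthetic data and the choice of logistic regression are placeholders, not the study's setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Placeholder data shaped like the 388-partition training portion.
X, y = make_classification(n_samples=388, n_features=10, random_state=0)

# Five training sizes (20%..100% of each CV training fold), 5-fold CV.
sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.2, 1.0, 5), cv=5, scoring="accuracy")

mean_train = train_scores.mean(axis=1)  # mean training accuracy per size
mean_val = val_scores.mean(axis=1)      # mean validation accuracy per size
```

Plotting `mean_train` and `mean_val` against `sizes` gives one curve pair per classifier.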

Table 8 The overall results of training and testing each classifier
Fig. 5

The confusion matrix for each classifier

Fig. 6

a Accuracy vs. training size and b accuracy vs. training time, for each classifier

Fig. 7

Comparison of ROC curves of classifiers

The results in Table 8 and Figs. 5, 6, and 7 show that the RF model outperforms the other classifiers on the drivedb dataset. It achieves a classification accuracy of 98.2%, a sensitivity of 97%, and a specificity of 100%.

Figure 8 displays the permutation importance scores of each model, which indicate how important each feature is. In each graph, the features appear in descending order of importance, from top to bottom. The graphs show that feature_3 (Peaks_fGSR) and feature_9 (rms_EMG) are the most important features for all classification models and contribute most to the classification.
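The permutation-importance definition from Sect. 4.2 can be sketched in a few lines of NumPy. scikit-learn provides a full implementation as `sklearn.inspection.permutation_importance`. In this sketch the score is accuracy, and the "model" is a hypothetical rule that uses only the first feature, so permuting the second feature cannot change the score. The function name is illustrative.

```python
import numpy as np


def permutation_importance_manual(predict, X, y, n_repeats=5, seed=0):
    """Importance of feature j = baseline accuracy - accuracy after shuffling
    column j (all other columns untouched), averaged over n_repeats shuffles."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(X) == y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - np.mean(predict(X_perm) == y))
        importances[j] = np.mean(drops)
    return importances


# Hypothetical rule-based "model" that only looks at feature 0.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 2))
y = (X[:, 0] > 0).astype(int)
imp = permutation_importance_manual(lambda A: (A[:, 0] > 0).astype(int), X, y)
```

Here `imp[0]` is large, because shuffling feature 0 drops accuracy from 1.0 to roughly chance, while `imp[1]` is exactly zero.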

Fig. 8

Permutation importance for each classifier, indicating the importance of each feature

Table 9 compares the proposed system with some related works on stress detection that use the drivedb database. The comparison shows that the proposed system outperforms the other systems in the literature.

Table 9 Comparison of the current and some related works for stress detection using the drivedb database

5 Conclusion

In this paper, we presented an AI-based Driver Assistance System (AI-DAS) that can automatically detect stress in automotive drivers. In the proposed method, the physiological signals ECG, EMG, fGSR, hGSR, and RR, which are easily captured using wearable sensors, are analyzed and processed. Such an application requires fast processing to track stress in car drivers. To keep processing fast, filtering and feature extraction are performed over short (1-min) periods so that the solution remains practical in real-time operation. In real scenarios, therefore, a 1-min recording of the signals is captured and filtered, and only ten statistical features are extracted from all the signals. Consequently, a stress state can be detected in the minute following its onset. Different ML classifiers are adopted to differentiate between stressed and relaxed states using the publicly available drivedb dataset. The classifiers used in this study are KNN, SVM, DT, LR, RF, and MLP, which showed the ability to detect the driver's stress effectively with short training times. Grid search was used to find the optimal hyperparameters of the classifiers. The experimental results reveal that the RF classifier outperforms the other techniques with a classification accuracy of 98.2%, and it also outperforms methods presented in earlier studies.