Background & Summary

The ability to recognize human emotions based on physiology and facial expressions opens up important research and application opportunities, particularly in healthcare and human-computer interaction. Continuous affect assessment can help patients suffering from affective disorders1 and children with autism spectrum disorder2. On a larger scale, promoting emotional well-being is likely to improve public health and quality of life and to prevent some mental health problems3,4. Emotion recognition could also enhance interaction with robots, which would better and less obtrusively understand the user’s commands, needs, and preferences5. Furthermore, the difficulty of video games could be adjusted based on the user’s emotional feedback6. Recommendations for movies, music, search engine results, user interfaces, and content may be enriched with the user’s emotional context7,8.

To achieve market-ready and evidence-based solutions, the machine learning models that detect and classify affect and emotions need improvement. Such models require large amounts of data collected from several affective outputs to train complex, data-intensive deep learning architectures. Over the last decade, several datasets of physiological responses and facial expressions to affective stimuli have been published, e.g., POPANE9, BIRAFFE10, QAMAF11, ASCERTAIN12, DECAF13, MAHNOB-HCI14, and DEAP15. However, these datasets have limitations, such as relying solely on dimensional scales to capture participants’ emotional states rather than asking about discrete emotions.

The Emognition dataset contains physiological signals and upper-body recordings of 43 participants who watched validated emotionally arousing film clips targeting nine discrete emotions. The autonomic nervous system responses to the stimuli were recorded with consumer-grade wearables: Muse 2 equipped with electroencephalography (EEG), accelerometer (ACC), and gyroscope (GYRO) sensors; Empatica E4 measuring blood volume pulse (BVP), electrodermal activity (EDA), and skin temperature (SKT), and additionally providing interbeat interval (IBI) and ACC data; and Samsung Galaxy Watch measuring BVP and additionally providing heart rate (HR), peak-to-peak interval (PPI), ACC, GYRO, and rotation data. The participants reported their emotions using discrete and dimensional questionnaires. The technical validation confirmed that participants experienced the targeted emotions and that the obtained signals are of high quality.

The Emognition dataset offers the following advantages over previous datasets: (1) the physiological signals have been recorded using wearables, which can be applied unobtrusively in everyday life scenarios; (2) the emotional state has been represented with two types of emotional models, i.e., discrete and dimensional; (3) nine distinct emotions were reported; (4) we put emphasis on the differentiation between positive emotions; thus, this is the only dataset featuring four discrete positive emotions; this differentiation is important because studies have indicated that specific positive emotions may differ in their physiology16,17,18; (5) the dataset enables versatile analyses within emotion recognition (ER) from physiology and facial expressions.

The Emognition dataset may serve to tackle the research questions related to: (1) multimodal approach to ER; (2) physiology-based ER vs. ER from facial expressions; (3) ER from EEG vs. ER from BVP; (4) ER with Empatica E4 vs. ER using Samsung Watch (both providing BVP signal collected in parallel); (5) classification of positive vs. negative emotions; (6) affect recognition – low vs. high arousal and valence; (7) analyses between discrete and dimensional models of emotions.

Methods

Ethics statement

The study was approved by and performed in accordance with the guidelines and regulations of the Wroclaw Medical University, Poland; approval no. 149/2020. The submission to the Ethical Committee covered, among others, participant consent, research plans, recruitment strategy, data management procedures, and GDPR issues. Participants provided written informed consent, in which they declared that they (1) were informed about the study details, (2) understood what the research involved, (3) understood what their consent was needed for, (4) could refuse to participate at any point during the research project, and (5) had the opportunity to ask the experimenter questions and receive answers to them. Finally, participants gave informed consent to participate in the research, agreed to be recorded during the study, and consented to the processing of their personal data to the extent necessary for the implementation of the research project, including sharing their psycho-physiological and behavioral data with other researchers.

Participants

The participants were recruited via a paid advertisement on Facebook. Seventy people responded to the advertisement. We excluded ten non-Polish-speaking volunteers; an additional 15 could not find a suitable date, and two did not show up for the scheduled session. As a result, we collected data from 43 participants (21 females) aged between 19 and 29 (M = 22.37, SD = 2.25). All participants were Polish.

The exclusion criteria were significant health problems, use of drugs and medications that might affect cardiovascular function, prior diagnosis of cardiovascular disease, hypertension, or BMI over 30 (classified as obesity). We asked participants to reschedule if they experienced an illness or a major negative life event. The participants were requested (1) not to drink alcohol and not to take psychoactive drugs 24 hours before the study; (2) to refrain from caffeine, smoking, and taking nonprescription medications for two hours before the study; (3) to avoid vigorous exercise and eating an hour before the study. Such measures were undertaken to eliminate factors that could affect cardiovascular function.

All participants provided written informed consent and received a 50 PLN (ca. $15) online store voucher.

Stimuli

We used short film clips from databases with prior evidence of reliability and validity in eliciting targeted emotions19,20,21,22,23. The source film, selected scene, and stimulus duration are provided in Table 1.

Table 1 Stimuli clips used to elicit emotions.

Measures

We used two types of self-assessment for manipulation checks that accounted for discrete and dimensional approaches to emotions. For the discrete approach, participants reported retrospectively, using single-item rating scales, on how much of the targeted emotions they had experienced while watching the film clips21. The questionnaire was filled in electronically with a tablet, see Fig. 1a. It included nine items corresponding to the selected stimuli. Each emotion-related scale ranged from 1 (not at all) to 5 (extremely). The questionnaire was modeled after the instruments used in previous studies with similar methodology24,25,26,27.

Fig. 1

The English version of the self-reports used in the study: (a) questionnaire for discrete emotions; (b) questionnaire for valence, arousal, and motivation. The original Polish version can be found in the Supp. Mat. Fig. 3.

For the dimensional approach, participants reported retrospectively, using single-item rating scales, on how much valence, arousal, and motivation they experienced while watching the film clips. The 3-dimensional emotional self-report was collected with the Self-Assessment Manikin – SAM28. The SAM is a validated nonverbal visual assessment developed to measure affective responses. Participants reported felt emotions using a graphical scale ranging from 1 (a very sad figure) to 9 (a very happy figure) for valence, Fig. 1b; and from 1 (a calm figure) to 9 (an agitated figure) for arousal, Fig. 1b. We also asked participants to report their motivational tendency using a validated graphical scale modeled after the SAM29, i.e., whether they felt the urge to avoid or approach while watching the film clips, from 1 (figure leaning backward) to 9 (figure leaning forward)30, Fig. 1b. The English versions of the self-reports used in the study are illustrated in Fig. 1.

Apparatus

The behavioral and physiological signals were gathered using three wearable devices and a smartphone:

  • An EEG headband Muse 2 equipped with four EEG electrodes (AF7, AF8, TP9, and TP10), accelerometer (ACC), and gyroscope (GYRO). The data was transmitted to a smartphone in real-time using the Mind Monitor (https://mind-monitor.com) application. At the end of each day, data from the smartphone was transferred to the secure disk;

  • A wristband Empatica E4 monitoring blood volume pulse (BVP), interbeat interval (IBI), electrodermal activity (EDA), acceleration, and skin temperature (SKT). The Empatica E4 was mounted on the participant’s dominant hand. The device was connected wirelessly via Bluetooth to the tablet using a custom-made Android application with Empatica E4 link SDK module31. The data was streamed in real-time to the tablet and after the study to the secure server. The signals obtained with the Empatica E4 were synchronized with the stimuli presented on the tablet;

  • A smartwatch Samsung Galaxy Watch SM-R810 providing heart rate (HR), peak-to-peak interval (PPI), raw BVP – the amount of reflected LED light, ACC, GYRO, and rotation data. A custom Tizen application was developed and installed on the watch to collect and store data locally. At the end of each day, data was downloaded to the secure disk;

  • A smartphone Samsung Galaxy S20+ 5G recording participants’ upper body – head, chest, and hands. The footage also included a small mirror reflecting the tablet screen to enable later synchronization with the stimuli. At the end of each day, recordings were moved to the encrypted offline disk.

The Muse 2 has lower reliability than medical devices, but it is sufficient for nonclinical settings32. It has been successfully used to observe and quantify event-related brain potentials33, as well as to recognize emotions34. The Empatica E4 has been compared with a medical electrocardiograph (ECG) and proved to be a practical and valid tool for studies on HR and heart rate variability (HRV) in stationary conditions35. It was also as effective as the Biopac MP150 in an emotion recognition task36. Moreover, we have used the Empatica E4 for intense emotion detection with promising results in a field study37,38. Samsung Watch devices were successfully utilized (1) to track atrial fibrillation with an ECG patch as a reference39, and (2) to assess sleep quality with a medically approved actigraphy device as a baseline40. Moreover, the Samsung Watch 3 performed well in detecting intense emotions41.

Additionally, a 10.4-inch tablet Samsung Galaxy Tab S6 was used to guide participants through the study. A dedicated application was developed to instruct the participants, present stimuli, collect self-assessments, as well as gather Empatica E4 signals, and synchronize them with the stimuli.

The sampling rate of the collected signals is provided in Table 2. The devices and the experimental stand are illustrated in Fig. 2.

Table 2 Sampling rate of signals and other data available in the Emognition dataset.
Fig. 2

Devices used to gather the physiological data and the experimental stand.

Procedure

The study was conducted between the 16th of July and the 4th of August, 2020. It took place in the UX Wro Lab, a laboratory at the Wrocław University of Science and Technology. Upon arrival, participants were informed about the experimental procedure, Fig. 3. They then signed the written consent. The researcher attached the devices approximately five minutes before the experiment so that the participants could get familiar with them; this also allowed for a proper skin temperature measurement. From this stage until the end of the experiment, the physiological signals were recorded. Next, participants listened to instructions about the control questionnaire and self-assessments. The participants filled out the control questionnaire about their activity before the experiment, e.g., time since the last meal or physical activity and wake-up time. Their responses are part of the dataset.

Fig. 3

The experiment procedure.

The participants were asked to avoid unnecessary actions or movements (e.g., swinging on the chair) and not to cover their faces. They were also informed that they could skip any film clip or quit the experiment at any moment. Once the procedure was clear to the participants, they were left alone in the room but could ask the researcher for help at any time. For the baseline, participants watched dots and lines on a black screen for 5 minutes (physiological baseline) and reported their current emotions (emotional baseline) using the discrete and dimensional measures. The main part of the experiment consisted of ten iterations of (1) a 2-minute washout clip (dots and lines), (2) the emotional film clip, and (3) two self-assessments, see Fig. 3. The order of film clips was counterbalanced using a Latin square: the clip order was randomized for the first participant and then shifted by one clip for each subsequent participant, so that the clip shown first became the last one.
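This rotation can be expressed compactly in code; the following is a minimal sketch, with hypothetical clip labels, illustrating how such counterbalanced orders are generated:

```python
import random

# Hypothetical labels for the ten stimuli (nine target emotions + neutral).
CLIPS = ["amusement", "anger", "awe", "disgust", "enthusiasm",
         "fear", "liking", "sadness", "surprise", "neutral"]

def counterbalanced_orders(clips, n_participants, seed=0):
    """Randomize the clip order for the first participant, then rotate it by
    one clip for each subsequent participant (Latin-square counterbalancing)."""
    rng = random.Random(seed)
    base = clips[:]
    rng.shuffle(base)
    orders = []
    for i in range(n_participants):
        shift = i % len(base)
        # The clip shown first to one participant becomes the last clip
        # shown to the next participant.
        orders.append(base[shift:] + base[:shift])
    return orders

orders = counterbalanced_orders(CLIPS, n_participants=43)
```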

After the experiment, participants provided information about which movies they had seen before the study and other remarks about the experiment. Concluding the procedure, participants received the voucher. The whole experiment lasted about 50 minutes, depending on the time spent on the questionnaires.

Data processing and cleaning

The Empatica E4 was synchronized with the stimuli directly by the custom application built on the Empatica E4 SDK. The Samsung Watch and Muse 2 devices were synchronized using accelerometer signals: all three devices were placed on a table, which was then hit with a fist. The first peak in the ACC signal was used to find the time shift between the devices, Fig. 4. All timestamps were aligned to the Empatica E4 time.

Fig. 4

The time difference between the devices used in the study, identified by recording the ACC signal while the devices were moved according to the synchronization procedure.
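The peak-based alignment can be sketched as follows; this is a minimal illustration of the idea, assuming each device’s accelerometer trace is available as timestamps plus three raw axes, and the detection threshold is an arbitrary assumption:

```python
import numpy as np

def first_peak_time(timestamps, acc_xyz, threshold=2.0):
    """Return the timestamp of the first pronounced accelerometer peak
    (the fist hitting the table during the synchronization procedure).

    timestamps : 1-D array of device timestamps (seconds)
    acc_xyz    : array of shape (N, 3) with the raw accelerometer axes
    threshold  : deviation of the magnitude from its median that counts
                 as the synchronization hit; needs tuning per device
    """
    magnitude = np.linalg.norm(np.asarray(acc_xyz, dtype=float), axis=1)
    deviation = np.abs(magnitude - np.median(magnitude))
    peak_idx = int(np.argmax(deviation > threshold))
    return timestamps[peak_idx]

# Offsets relative to the Empatica E4 clock, e.g.:
#   offset_muse = first_peak_time(muse_ts, muse_acc) - first_peak_time(e4_ts, e4_acc)
# Subtracting offset_muse from the Muse timestamps expresses them in E4 time.
```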

Each device stored data in a different format and structure. We unified the data into JSON format and divided the experiment into segments covering washouts, film clips, and self-assessments separately. We provide the raw recordings from all devices used. Additionally, for some devices and data types, we performed further preprocessing and provide the results alongside the raw data.

For EEG, the raw signal is the signal filtered with a 50 Hz notch filter, which is a standard procedure to remove interference caused by power lines. Besides the raw EEG, the Mind Monitor application provides the absolute band power for each channel and five standard frequency ranges (i.e., delta to gamma, see Table 2). According to the Mind Monitor documentation, these are obtained by (1) using a fast Fourier transform (FFT) to compute the power spectral density (PSD) for frequencies in each channel, (2) summing the PSDs over a frequency range, and (3) taking the base-10 logarithm of the sum to express the result in Bels (B). Details are provided in the Mind Monitor documentation (https://mind-monitor.com).
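The band-power computation can be approximated in Python as follows; this is a minimal sketch using SciPy’s Welch PSD estimate rather than Mind Monitor’s internal FFT, so values will not match the provided ones exactly, and the band edges are conventional assumptions:

```python
import numpy as np
from scipy.signal import welch

FS = 256  # Muse 2 EEG sampling rate in Hz (assumed; the actual value is listed in Table 2)

# Conventional band edges in Hz; Mind Monitor may use slightly different ones.
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 44)}

def absolute_band_power(eeg_channel, fs=FS):
    """Log10 of the summed PSD per band (in Bels), mirroring steps (1)-(3)."""
    freqs, psd = welch(eeg_channel, fs=fs, nperseg=fs)
    return {band: float(np.log10(psd[(freqs >= lo) & (freqs < hi)].sum()))
            for band, (lo, hi) in BANDS.items()}
```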

The processing of the BVP signal from the Samsung Watch PPG sensor consisted of subtracting the mean component, performing an eight-level decomposition with the Coiflet1 wavelet transform, and reconstructing the signal with the inverse wavelet transform based only on the second and third levels. Amplitude fluctuations were reduced by dividing the sample at the center of a one-second sliding window (with an odd number of samples) by the window’s standard deviation. The final step was normalization of the signal to the range [−1, 1].
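A possible reconstruction of this pipeline with the PyWavelets library is sketched below; it is an approximation for illustration, in which the interpretation of the retained levels as detail coefficients D2 and D3 and the sampling rate argument are assumptions:

```python
import numpy as np
import pywt

def process_watch_bvp(bvp, fs):
    """Approximate the Samsung Watch BVP processing described above.

    bvp : 1-D array with the raw BVP/PPG samples
    fs  : sampling rate in Hz (see Table 2)
    """
    # 1. Subtract the mean component.
    x = np.asarray(bvp, dtype=float)
    x = x - x.mean()

    # 2. Eight-level decomposition with the coif1 (Coiflet1) wavelet.
    coeffs = pywt.wavedec(x, "coif1", level=8)

    # 3. Keep only the (assumed) second- and third-level detail coefficients,
    #    zero everything else, and apply the inverse transform.
    kept = [np.zeros_like(c) for c in coeffs]
    kept[-2] = coeffs[-2]  # D2
    kept[-3] = coeffs[-3]  # D3
    x = pywt.waverec(kept, "coif1")[: len(bvp)]

    # 4. Reduce amplitude fluctuations: divide the sample at the center of a
    #    one-second sliding window (odd length) by the window's std deviation.
    win = int(fs) | 1          # force an odd number of samples
    half = win // 2
    padded = np.pad(x, half, mode="edge")
    stds = np.array([padded[i:i + win].std() for i in range(len(x))])
    x = x / np.where(stds > 0, stds, 1.0)

    # 5. Normalize to the range [-1, 1].
    return x / np.max(np.abs(x))
```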

The upper-body recordings were processed with the OpenFace toolkit42,43,44 (version 2.2.0, default parameters) and Quantum Sense software (Research Edition 2017, Quantum CX, Poland). The OpenFace library provides facial landmark points and action units’ values, whereas Quantum Sense recognizes basic emotions (neutral, anger, disgust, happiness, sadness, surprise) and head pose.

Some parts of the signals were of lower quality due to the participants’ movement or improper mounting of the devices. For example, the quality of the EEG signal can be assessed using the Horse Shoe Indicator (HSI) values provided by the device, which represent how well the electrodes fit the participant’s head. For the video recordings, OpenFace provides per-frame information about the detected face and its head pose. We have not removed low-quality signals, so that users of the dataset can decide how to handle them. Any data-related problems that we identified are listed in the data_completeness.csv file.

Data Records

The collected data (physiological signals, upper-body recordings, self-reports, and control questionnaires) are available at the Harvard Dataverse repository45. The types of data available in the Emognition dataset are illustrated in Fig. 5. The upper-body recordings, in MP4 format at full HD resolution (1920 × 1080), occupy 76 GB. The other data are compressed into the study_data.zip package of 1 GB (16 GB after decompression). The data are grouped by participant. Each participant has a dedicated folder containing files from all experimental stages (stimulus presentation, washout, self-assessment) and all devices (Muse 2, Empatica E4, Samsung Watch). In total, each participant has 97 files related to:

  • 10 film clips × 3 devices × 3 phases (washout, stimulus, self-assessment) = 90 files with signals;

  • baseline × 3 devices × 2 phases (baseline, self-assessment) = 6 files with signals;

  • a questionnaires.json file containing the self-assessment responses, the control questionnaire, and some metadata (e.g., demographics and information about wearing glasses).

Fig. 5

Examples of data available in the Emognition dataset: (a) physiological signals recorded with wearable devices: 4 x EEG (Muse 2); BVP, EDA, SKT (Empatica E4); raw BVP, processed BVP, HR (Samsung Watch); ACC from all devices; (b) upper-body recordings capturing the facial reactions to the stimuli, from the left: neutral, disgust, surprise; (c) facial landmarks generated with the OpenFace library facilitating emotion recognition from the face. The participant gave written consent to include her image in this article.

Additionally, facial annotations are provided in two ZIP packages, OpenFace.zip and Quantum.zip. The OpenFace package contains facial landmark points and action unit values (7.4 GB compressed, 25 GB after decompression). The Quantum Sense package contains values of six basic emotions and head position (0.7 GB compressed, 4.7 GB after decompression). The values are assigned per video frame.

The files are in JSON format, except for the OpenFace annotations, which are in CSV format. More technical information (e.g., file naming conventions, variables available in each file) is provided in the README.txt file included in the dataset.
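As a starting point, a participant’s folder can be traversed as in the minimal sketch below; the exact file names follow the conventions documented in README.txt, so the glob pattern and paths here are assumptions for illustration only:

```python
import json
from pathlib import Path

import pandas as pd

def load_participant(folder):
    """Load all JSON files of one participant into a dict keyed by file name,
    separating the questionnaires.json self-reports/metadata."""
    folder = Path(folder)
    records = {}
    for path in sorted(folder.glob("*.json")):
        with open(path, encoding="utf-8") as f:
            records[path.stem] = json.load(f)
    questionnaires = records.pop("questionnaires", None)
    return records, questionnaires

# OpenFace annotations are plain CSV files and can be read with pandas, e.g.
# (hypothetical path): pd.read_csv("OpenFace/participant_01_stimulus.csv")
```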

Technical Validation

To test whether the film clips elicited the targeted emotions, we used repeated-measures analysis of variance (rmANOVA) with Greenhouse-Geisser correction and calculated the recommended effect size for ANOVA tests, partial eta squared (\({\eta }_{p}^{2}\))46,47. To examine differences between conditions (e.g., whether self-reported amusement was higher in response to the amusing film clip than to the other film clips), we calculated pairwise comparisons with Bonferroni correction of p-values for multiple comparisons.
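This analysis can be reproduced in Python, for instance with the pingouin library; the sketch below assumes the self-reports have been reshaped into a long-format table with hypothetical column names (one row per participant and film clip):

```python
import pingouin as pg

def test_emotion(df, emotion="amusement"):
    """Repeated-measures ANOVA (Greenhouse-Geisser correction, partial eta
    squared) plus Bonferroni-corrected pairwise comparisons for one emotion.

    df : long-format DataFrame with columns 'participant', 'condition'
         (film clip), and one column per self-reported emotion.
    """
    anova = pg.rm_anova(data=df, dv=emotion, within="condition",
                        subject="participant", correction=True, effsize="np2")
    # pg.pairwise_ttests in older pingouin versions.
    posthoc = pg.pairwise_tests(data=df, dv=emotion, within="condition",
                                subject="participant", padjust="bonf")
    return anova, posthoc
```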

As summarized in Tables 3 and 4 and Figs. 6 and 7, watching the film clips evoked the targeted emotions. The differences in self-reported emotions between film clips should be interpreted as large48. Pairwise comparisons indicated that self-reported targeted emotions were highest in the corresponding film clip condition (e.g., self-reported amusement in response to the amusing film clip). Furthermore, we observed that some emotions were intense in more than one film clip condition and that some film clips elicited more than one emotion. These are frequent effects in emotion elicitation procedures22,29,30; see Supp. Mat. for details.

Table 3 Results of Repeated Measures Analysis of Variance for Differences Between Conditions in Self-reported Emotions.
Table 4 Results of Repeated Measures Analysis of Variance for Differences Within Conditions in Self-reported Emotions.
Fig. 6

Distribution of self-reported emotions between conditions, i.e., the level of a given emotion elicited by the different films (conditions). The chart titles indicate the reported emotions, the vertical axis denotes the values used in the questionnaires (intensity of emotions: 1–5 for discrete emotions, 1–9 for SAM), and the horizontal axis labels represent the film clips (conditions); AM - amusement, AN - anger, AW - awe, D - disgust, E - enthusiasm, F - fear, L - liking, SA - sadness, SU - surprise, B - baseline, N - neutral. Green indicates the targeted emotion. Boxes depict quartiles of the distributions and whiskers the span from the 5th to the 95th percentile. Values outside the whiskers (diamonds) are classified as outliers.

Fig. 7

Distribution of self-reported discrete emotions within conditions, i.e., which emotions were actually evoked by the films intended to elicit a given emotion. The chart titles denote the film clips (conditions). The vertical axis corresponds to the values used in the questionnaire (intensity of emotions). The horizontal axis labels represent the discrete emotions reported by the participants: AM - amusement, AN - anger, AW - awe, D - disgust, E - enthusiasm, F - fear, L - liking, SA - sadness, SU - surprise. Boxes present quartiles of the distributions, whereas whiskers span from the 5th to the 95th percentile. Values outside the whiskers are treated as outliers (diamonds in the graph).

To validate the quality of the recorded physiological signals, we computed signal-to-noise ratios (SNRs) by fitting a second-order polynomial to the autocorrelation function of each signal. This was done separately for every physiological recording (all participants, baselines, film clips, and experimental stages; see Sec. Data Records). The SNR statistics indicated high signal quality. The mean SNR ranged from 26.66 dB to 37.74 dB, with standard deviations from 2.27 dB to 11.13 dB. For one signal, the minimum SNR was 0.88 dB; however, 99.7% of its recordings had SNR values over 5.15 dB. As the experiments were conducted in a sitting position, we did not analyze the accelerometer and gyroscope signals. For details, see Supp. Mat. Table 3.
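The sketch below illustrates one way to implement such an autocorrelation-based SNR estimate; it is a simplified illustration (the lag range for the polynomial fit is an arbitrary assumption), not the exact code used to produce the reported numbers:

```python
import numpy as np

def estimate_snr_db(signal, max_lag=10):
    """Estimate SNR (in dB) from the autocorrelation function: fit a
    second-order polynomial to lags 1..max_lag, extrapolate it to lag 0 to
    approximate the noise-free signal power, and treat the excess of the
    measured lag-0 autocorrelation over this value as noise power."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    acf = np.correlate(x, x, mode="full")[len(x) - 1:] / len(x)
    lags = np.arange(1, max_lag + 1)
    poly = np.polyfit(lags, acf[1:max_lag + 1], deg=2)
    signal_power = np.polyval(poly, 0.0)     # extrapolated ACF at lag 0
    noise_power = acf[0] - signal_power      # total power minus signal power
    if signal_power <= 0 or noise_power <= 0:
        return float("inf")
    return 10.0 * np.log10(signal_power / noise_power)
```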

Additionally, the Quantum Sense annotations were analyzed to see how well the software recognized emotions. In general, it performed well within conditions but poorly between conditions. The main reasons for wrong or missing annotations were participants covering their faces with their palms or leaning towards the camera. In some cases, participants had already seen the film and reacted differently, e.g., smiled instead of showing disgust. For details, see Supp. Mat. Sec. Analysis of Quantum Sense Results.

Usage Notes

Emotion Recognition

The most common approach to emotion recognition from physiological signals includes (1) data collection and cleaning; (2) signal preprocessing, synchronization, and integration; (3) feature extraction and selection; and (4) machine learning model training and validation. A comprehensive overview of all these stages can be found in our review on emotion recognition using wearables49.
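For instance, steps (3) and (4) can be prototyped with simple per-segment statistics and a scikit-learn classifier; the sketch below is illustrative only (the feature set and model are arbitrary choices) and assumes the signal segments have already been cut and labeled:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

def simple_features(segment):
    """Toy statistical feature vector for one 1-D physiological segment."""
    segment = np.asarray(segment, dtype=float)
    return [segment.mean(), segment.std(), segment.min(), segment.max(),
            np.percentile(segment, 25), np.percentile(segment, 75)]

def evaluate(segments, labels, participant_ids, n_splits=5):
    """Participant-independent cross-validation: no participant appears in
    both the training and the test fold."""
    X = np.array([simple_features(s) for s in segments])
    y = np.array(labels)
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    return cross_val_score(clf, X, y, groups=participant_ids,
                           cv=GroupKFold(n_splits=n_splits))
```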

For further processing of the Emognition dataset, we recommend the following tools, which we analyzed and found useful for feature extraction from physiological data. The pyPhysio library50 (https://github.com/MPBA/pyphysio) facilitates the analysis of ECG, BVP, and EDA signals by providing algorithms for filtering, segmentation, extracting derivatives, and other signal processing. The BioSPPy library (https://biosppy.readthedocs.io/) handles BVP, ECG, EDA, EEG, EMG, and respiration signals; for example, it filters BVP, detects pulse onsets, and computes the instantaneous HR. The Ledalab toolbox51 focuses on the EDA signal and offers both continuous and discrete decomposition analyses. The Kubios software (https://www.kubios.com) enables data import from several HR, ECG, and PPG monitors and calculates over 40 features from the HRV signal. The PyEEG library (https://github.com/forrestbao/pyeeg) is intended for EEG signal analysis and processing, but it works with any time series data.
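For example, a BVP segment from the Empatica E4 (sampled at 64 Hz) can be filtered and summarized with BioSPPy as in the minimal sketch below; the input array is assumed to hold one segment’s raw samples:

```python
from biosppy.signals import bvp

def bvp_summary(raw_bvp, fs=64.0):
    """Filter an Empatica E4 BVP segment with BioSPPy and return the
    filtered signal, the detected pulse onsets, and the instantaneous HR."""
    out = bvp.bvp(signal=raw_bvp, sampling_rate=fs, show=False)
    return out["filtered"], out["onsets"], out["heart_rate"]
```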

Emotion recognition from facial expressions can be achieved by processing video images52. First, the face has to be detected and landmarks (distinctive points in facial regions) need to be identified. Tracking the position of landmarks between video frames allows us to measure muscle activity and encode it into Action Units (AUs). AUs can be used to identify emotions, as proposed by Ekman and Friesen in their Facial Action Coding System (FACS)53.
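As a simple illustration, the AU intensities in the provided OpenFace CSV files can drive a rule-of-thumb smile detector combining the cheek raiser (AU06) and lip corner puller (AU12); the column names follow OpenFace conventions, and the intensity threshold is an arbitrary assumption:

```python
import pandas as pd

def smile_frames(openface_csv, threshold=1.0):
    """Return the frame numbers where AU06 and AU12 intensities both exceed
    a threshold - a crude proxy for a smile/happiness expression."""
    df = pd.read_csv(openface_csv)
    df.columns = df.columns.str.strip()  # OpenFace headers may contain spaces
    mask = (df["AU06_r"] > threshold) & (df["AU12_r"] > threshold)
    return df.loc[mask, "frame"]
```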

Accessing data

The use of the Emognition dataset is limited to academic research purposes due to the consent given by the participants. The data will be made available after completing the End User License Agreement (EULA), which is located in the dataset repository. The EULA should be signed and emailed to the Emognition Group at emotions@pwr.edu.pl. The email must be sent from an academic email address associated with a Harvard Dataverse platform account.