Signal Processing

Volume 120, March 2016, Pages 359-372

Feature extraction from smartphone inertial signals for human activity segmentation

https://doi.org/10.1016/j.sigpro.2015.09.029

Highlights

  • Human activity segmentation using Hidden Markov Models.

  • Frequency-based feature extraction from Inertial Signals.

  • RASTA filtering analysis and delta coefficients.

  • Significant dimensionality reduction.

Abstract

This paper proposes adapting well-known strategies successfully used in speech processing, Mel Frequency Cepstral Coefficients (MFCCs) and Perceptual Linear Prediction (PLP) coefficients, to inertial signal processing. Additional characteristics such as RASTA filtering and delta coefficients are also considered and evaluated. These adaptations have been incorporated into a Human Activity Recognition and Segmentation (HARS) system based on Hidden Markov Models (HMMs) for recognizing and segmenting six different physical activities: walking, walking-upstairs, walking-downstairs, sitting, standing and lying.

All experiments have been done using a publicly available dataset, the UCI Human Activity Recognition Using Smartphones dataset, which includes several sessions with physical activity sequences from 30 volunteers. This dataset has been randomly divided into six subsets in order to perform a six-fold cross-validation procedure. For every experiment, average values over the six folds are reported.

The results presented in this paper significantly improve on the baseline error rates, constituting a relevant contribution to the field. The adapted MFCC and PLP coefficients improve human activity recognition and segmentation accuracy while considerably reducing the feature vector size. RASTA filtering and delta coefficients contribute significantly to reducing the segmentation error rate, yielding the best result: an Activity Segmentation Error Rate lower than 0.5%.

Introduction

Recently, research in multisensor networks has increased significantly due to the reduction in sensor prices. Thanks to this growth of sensor networks, the number of possible research areas has also increased rapidly. One of these areas is Human Activity Recognition (HAR): the recognition of physical human actions. This area of research has received a great deal of attention in the last five years due to the high number of promising applications and the increasing interest shown by government and commercial organizations. HAR can be performed using information obtained from various types of sensors: on-body, object-placed or ambient sensors.

One example of environment sensors is video cameras in monitored areas [1], [2], [3], [4]. Computer vision-based techniques have been widely used for human activity tracking, but human activity is also reflected in a rich variety of acoustic events, produced either by the human body or by objects handled by humans. Determining both the identity of sounds and their position in time may help to detect and describe that human activity [5]. However, environment sensors have disadvantages: many require infrastructure support, for example the installation of video cameras in the monitored areas. Video data can also be obscured by lighting conditions, clothing and background colors, and video recordings raise privacy issues in many scenarios. Additionally, although supervised users (e.g. in a Smart Home) spend most of their time at home, they also move from one place to another: for example, to buy small things, to go for a walk or to visit friends. In this respect, home environmental sensors are limited by their infrastructure and cannot provide monitoring outside the house. This limitation can be overcome using on-body sensors [6], [7].

Body-worn sensors can improve and expand the possibilities of human monitoring systems [8], [9]: not only by measuring body signals (e.g. physiological, motion, location) but also by providing portable, off-site user supervision at any location without the need for fixed infrastructure. The work presented in [8] pioneered an approach for classifying Activities of Daily Living (ADL) using five body-worn accelerometers and well-known machine learning classifiers. Different approaches place motion sensors on different body parts, such as the waist, wrist, chest and thighs, achieving good classification performance [10], [11], [12], [13]. Unfortunately, limitations also arise with the use of body sensors, such as user discomfort while wearing them and the limited energy of mobile devices. These sensors are usually uncomfortable for the average user and do not provide a long-term solution for activity monitoring (e.g. sensors must be repositioned after dressing [8]). Moreover, forgetful elderly users or patients with Alzheimer's disease might not wear the sensors or recharge the battery, complicating their applicability.

In recent years, smartphones have become widespread and have brought new research opportunities for human-centered applications. In these applications, the user is the main source of information and the phone is the firsthand sensing tool. Almost all smartphones include built-in sensors such as microphones, dual cameras, accelerometers and gyroscopes. The use of smartphones with inertial sensors is a very interesting option for monitoring ADL. Smartphone-based applications have advantages compared with other well-known wearable HAR alternatives that use special-purpose devices or on-body sensor networks (e.g. [14], [15]): easy device portability, unobtrusive sensing provided by embedded sensors, and the processing power of modern smartphones, which allows online computation. For these reasons, several works on HAR using smartphones have been developed in the last five years. For instance, in [16] the authors propose using a smartphone for HAR based on its embedded triaxial accelerometer. Additional results have also been presented in [17], [18], [19], [20], [21].

Improvements are still expected in topics such as activity modeling, feature extraction, standardizing performance evaluation metrics [22], and providing public data for evaluation.

This work deals with improving feature extraction from inertial signals. The paper proposes the use of frequency-based features (widely used in speech processing) for human activity segmentation using accelerometer and gyroscope signals from smartphones. The results presented in this paper significantly improve on the baseline results, constituting a relevant contribution to this area.

This paper is organized as follows. Section 2 presents the background, summarizing the main contributions of this paper compared to previous works. Section 3 presents the justification for the work. Section 4 describes the dataset and evaluation metrics. Section 5 presents an overview of the recognition system. Section 6 describes the process of adapting speech techniques to inertial signals. Final results are presented and discussed in Section 7. Finally, Section 8 summarizes the main conclusions.

Background

Feature extraction for HAR depends strongly on the type of sensors used to develop the monitoring system. For example, when considering cameras for HAR, it is possible to use video-based features proposed in a broad research area like video processing [23]. One example is [24], where the authors propose Histograms of Oriented Gradients (HOG): feature descriptors commonly used in computer vision and image processing for object detection. Other examples are the semantic features

Justification

Although some works on feature extraction have been reported, there is still an important lack of agreement on this subject. In order to improve feature extraction for inertial signals obtained from human movements, it is interesting to analyze similar areas with a long research trajectory, such as speech processing. Both speech signals and these inertial signals are generated by human physical movements, so they share important similarities.

The first aspect is that both types of signals contain

Database and evaluation metrics used in the experiments

This work has been developed using the Human Activity Recognition Using Smartphones Data Set, available at the UCI Machine Learning Repository [26], [40]. This dataset contains inertial information (from smartphone sensors) from a group of 30 people aged between 19 and 48 years. Each person performed six different physical activities (walking, walking-upstairs, walking-downstairs, sitting, standing and lying) several times while wearing a smartphone (Samsung Galaxy S II). Using its
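
As a point of reference for reproducing this setup, the sketch below shows one way to load the raw inertial signals, activity labels and subject identifiers from the repository download. The directory layout and file names (e.g. "train/Inertial Signals/body_acc_x_train.txt") are assumptions based on the standard distribution of this dataset and are not described in this excerpt, so they should be checked against the actual download.

```python
import numpy as np
from pathlib import Path

# Hypothetical local path to the dataset download; adjust as needed.
DATA_DIR = Path("UCI HAR Dataset")

def load_split(split="train"):
    """Load raw inertial-signal windows, activity labels and subject ids."""
    channels = []
    for sensor in ("body_acc", "body_gyro", "total_acc"):
        for axis in ("x", "y", "z"):
            fname = DATA_DIR / split / "Inertial Signals" / f"{sensor}_{axis}_{split}.txt"
            # Each row is assumed to be one fixed-length window sampled at 50 Hz.
            channels.append(np.loadtxt(fname))
    X = np.stack(channels, axis=-1)                                   # (windows, samples, 9)
    y = np.loadtxt(DATA_DIR / split / f"y_{split}.txt", dtype=int)    # activity labels 1..6
    subjects = np.loadtxt(DATA_DIR / split / f"subject_{split}.txt", dtype=int)
    return X, y, subjects
```

The subject identifiers make it straightforward to regroup the recordings into the six subsets used for the cross-validation procedure described above.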

System architecture

Fig. 3 shows the general architecture of the HARS system used in this work. The system is composed of five main modules or steps: signal pre-processing, feature extraction, HMM training, Activity Sequence Model (ASM) training, and activity recognition-segmentation.

In the pre-processing module, the sensor signals (accelerometer and gyroscope) are sampled at a 50 Hz rate and filtered for noise reduction. The gravitational and body motion components included in the acceleration signals are separated
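
A minimal sketch of this kind of pre-processing is shown below. The filter type, order and cut-off frequencies are illustrative assumptions (a low-pass filter with a very low cut-off is a common way to isolate the gravitational component), not values quoted from this excerpt.

```python
import numpy as np
from scipy.signal import butter, filtfilt, medfilt

FS = 50.0  # sampling rate in Hz, as stated above

def preprocess_acceleration(acc, noise_cutoff_hz=20.0, gravity_cutoff_hz=0.3):
    """Noise filtering and gravity/body separation for one acceleration axis."""
    # Noise reduction: median filter followed by a low-pass Butterworth filter.
    acc = medfilt(acc, kernel_size=3)
    b, a = butter(3, noise_cutoff_hz / (FS / 2), btype="low")
    acc = filtfilt(b, a, acc)

    # A very low cut-off low-pass filter isolates the slowly varying
    # gravitational component; the body motion component is the remainder.
    b_g, a_g = butter(3, gravity_cutoff_hz / (FS / 2), btype="low")
    gravity = filtfilt(b_g, a_g, acc)
    body = acc - gravity
    return body, gravity
```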

Baseline feature extraction

The baseline feature extraction module obtains vectors combining several measurements from the time and frequency domains. From the time domain, well-known standard measures are considered [33], such as the mean, correlation, SMA and autoregression coefficients [34]. From the frequency domain, the feature vectors include characteristics such as the energy of different frequency bands and the frequency skewness. Other features, such as the angle between vectors (e.g. the mean body acceleration and the y vector), are also
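
The sketch below illustrates a small subset of such baseline features for one tri-axial window. The exact feature list, window length and band edges used in the paper are not reproduced in this excerpt, so the choices here are only examples.

```python
import numpy as np
from scipy.stats import skew

def baseline_features(window, fs=50.0):
    """Example time- and frequency-domain features for a (n_samples, 3) window."""
    feats = {}
    # Time domain: per-axis mean and pairwise inter-axis correlation.
    feats["mean"] = window.mean(axis=0)
    for name, (i, j) in {"corr_xy": (0, 1), "corr_xz": (0, 2), "corr_yz": (1, 2)}.items():
        feats[name] = np.corrcoef(window[:, i], window[:, j])[0, 1]
    # Signal Magnitude Area: average of the summed absolute values over the axes.
    feats["sma"] = np.abs(window).sum(axis=1).mean()
    # Frequency domain: band energies and spectral skewness per axis.
    power = np.abs(np.fft.rfft(window, axis=0)) ** 2
    freqs = np.fft.rfftfreq(window.shape[0], d=1.0 / fs)
    for lo, hi in [(0, 5), (5, 10), (10, 25)]:          # example band edges in Hz
        band = (freqs >= lo) & (freqs < hi)
        feats[f"energy_{lo}_{hi}Hz"] = power[band].sum(axis=0)
    feats["spectral_skewness"] = skew(power, axis=0)
    return feats
```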

Final results

This section presents the final results over the test sets. These experiments have been carried out using the best system configuration obtained from the analyses performed over the validation sets (see previous section). Table 4 includes the final results, which are very similar to those obtained during system development.

In order to obtain the best possible result, Table 5 includes experiments in which the Activity Sequence Model is activated to further improve the results. This

Discussion and conclusions

This work has focused on improving the feature extraction module of a HARS system based on HMMs. The feature extractor tries to obtain the most relevant characteristics from inertial signals for recognizing and segmenting six different physical activities: walking, walking-upstairs, walking-downstairs, sitting, standing and lying. This paper has proposed adapting well-known strategies successfully used in speech processing to this field: MFCCs and PLP coefficients. Additionally characteristics
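
To make the general idea concrete, the sketch below computes cepstral-style coefficients and delta coefficients from a single inertial-signal frame. It is only an illustration of the technique under simple assumptions (a linearly spaced triangular filterbank over 0 to fs/2 and arbitrary coefficient counts); it is not the filterbank design, frame length or RASTA processing actually used in the paper.

```python
import numpy as np
from scipy.fftpack import dct

def cepstral_features(frame, fs=50.0, n_filters=12, n_coeffs=6):
    """Filterbank log-energies followed by a DCT, MFCC-style, for one frame."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    edges = np.linspace(0.0, fs / 2.0, n_filters + 2)   # linear spacing (assumption)
    log_energies = np.empty(n_filters)
    for i in range(n_filters):
        lo, center, hi = edges[i], edges[i + 1], edges[i + 2]
        # Triangular filter rising from lo to center and falling back to hi.
        weights = np.clip(np.minimum((freqs - lo) / (center - lo),
                                     (hi - freqs) / (hi - center)), 0.0, None)
        log_energies[i] = np.log(weights @ power + 1e-10)
    return dct(log_energies, type=2, norm="ortho")[:n_coeffs]

def delta(coeffs, N=2):
    """Standard delta (first-derivative) coefficients over a (frames, dims) array."""
    padded = np.pad(coeffs, ((N, N), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, N + 1))
    return sum(n * (padded[N + n:N + n + len(coeffs)] - padded[N - n:N - n + len(coeffs)])
               for n in range(1, N + 1)) / denom
```

Stacking the cepstral coefficients with their deltas frame by frame yields the kind of compact feature vectors that the paper feeds to the HMM-based segmenter.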

Acknowledgments

The authors want to thank the UCI Machine Learning Repository and especially the researchers who recorded and developed the Human Activity Recognition Using Smartphones Data Set: researchers from Smartlab (Non-Linear Complex Systems Laboratory, DITEN, Università degli Studi di Genova) and the Technical Research Centre for Dependency Care and Autonomous Living (Universitat Politècnica de Catalunya).

References (43)

  • L. Bao, S.S. Intille, Activity recognition from user-annotated acceleration data, in: T. Kanade, J. Kittler, J.M....
  • P. Lukowicz, J.A. Ward, H. Junker, M. Stäger, G. Tröster, A. Atrash and T. Starner, 2004. Recognizing workshop...
  • P. Casale, O. Pujol and P. Radeva, Human activity recognition from accelerometer data using a wearable...
  • N.C. Krishnan, D. Colbry, C. Juillard and S. Panchanathan, Real time human activity recognition...
  • R. Nishkam, D. Nikhil, M. Preetham, and M.L. Littman, Activity recognition from accelerometer data. In Proceedings of...
  • Y. Hanai, J. Nishimura, and T. Kuroda, Haar-like filtering for human activity recognition using 3d accelerometer, in:...
  • A. Mannini et al.

    Machine learning methods for classifying human physical activity from on-body accelerometers

    Sensors

    (2010)
  • L.T. Vinh et al.

    Semi-markov conditional random fields for accelerometer-based activity recognition

    Appl. Intell.

    (2011)
  • M. Berchtold, M. Budde, D. Gordon, H. Schmidtke, M. Beigl, Activity recognition service for mobile phones,...
  • T. Brezmes, J.L. Gorricho, J. Cotrina, Activity recognition from accelerometer data on a mobile phone, Distributed...
  • J.R. Kwapisz, G.M. Weiss and S.A. Moore, Cell phone-based biometric identification, 2010 Fourth IEEE...