
Human activity recognition using segmented body part and body joint features with hidden Markov models

Multimedia Tools and Applications

Abstract

In recent years, human activity recognition from video has attracted considerable attention from computer vision researchers owing to its applications in fields such as surveillance, human-computer interaction (HCI), and smart home healthcare. For instance, activity recognition can be used in a surveillance environment to alert the relevant authority to potentially dangerous behavior. Similarly, it can improve HCI in an entertainment environment, for example by automatically recognizing a player's actions in a game so that an avatar can play on the player's behalf. In a healthcare system, recognizing a patient's actions can facilitate the rehabilitation process. A video-based activity recognition system has several goals, one of which is to provide information about people's behavior so that the system can proactively assist them with their tasks.

A novel approach is proposed here for depth video-based human activity recognition, using joint-based spatiotemporal features of depth body shapes and hidden Markov models (HMMs). From depth video, the body parts of the person performing an activity are first segmented using a trained random forest. Spatial features, consisting of the 3-D body joint pair angles together with the mean of the depth values, the variance of the depth values, and the area of each segmented body part, are combined with motion features representing the magnitude and direction of each joint's motion into the next frame to build the spatiotemporal features of a frame. The activity features are then made more robust by applying generalized discriminant analysis, which discriminates them nonlinearly. Finally, the features are used to train a distinct HMM for each activity, and the trained HMMs are later used for recognition. The proposed approach shows superior recognition performance compared with conventional activity recognition approaches.
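The per-frame feature construction described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function names, the input layout (a depth map, a body-part label map with negative values for background, and lists of 3-D joint coordinates), the definition of a joint pair angle as the angle between two joint position vectors, and the use of a unit vector for motion direction are all assumptions made for illustration. The random forest segmentation that would produce the label map is taken as given.

```python
import math
from itertools import combinations


def pair_angles(joints):
    """Angle in radians between each pair of 3-D joint vectors (assumed definition)."""
    angles = []
    for a, b in combinations(joints, 2):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        if na == 0.0 or nb == 0.0:
            angles.append(0.0)  # degenerate vector: angle undefined, use 0
            continue
        cos = max(-1.0, min(1.0, dot / (na * nb)))  # clamp rounding error
        angles.append(math.acos(cos))
    return angles


def part_stats(depth, labels, num_parts):
    """Mean depth, depth variance, and pixel area for each segmented body part."""
    values = [[] for _ in range(num_parts)]
    for depth_row, label_row in zip(depth, labels):
        for d, part in zip(depth_row, label_row):
            if 0 <= part < num_parts:  # assumed: negative labels mark background
                values[part].append(d)
    feats = []
    for vals in values:
        area = len(vals)
        mean = sum(vals) / area if area else 0.0
        var = sum((v - mean) ** 2 for v in vals) / area if area else 0.0
        feats.extend([mean, var, float(area)])
    return feats


def joint_motion(joints_t, joints_next):
    """Displacement magnitude and unit direction of each joint to the next frame."""
    feats = []
    for p, q in zip(joints_t, joints_next):
        delta = [b - a for a, b in zip(p, q)]
        mag = math.sqrt(sum(c * c for c in delta))
        direction = [c / mag for c in delta] if mag > 0.0 else [0.0, 0.0, 0.0]
        feats.append(mag)
        feats.extend(direction)
    return feats


def frame_features(depth, labels, num_parts, joints_t, joints_next):
    """Concatenate spatial and motion features into one per-frame vector."""
    return (pair_angles(joints_t)
            + part_stats(depth, labels, num_parts)
            + joint_motion(joints_t, joints_next))
```

Calling `frame_features` on every frame of a clip would yield the sequence of spatiotemporal vectors that the abstract then passes to generalized discriminant analysis and, finally, to the per-activity HMMs; those two stages are not shown here.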



Acknowledgments

This work was supported by the Samsung Research Fund, Sungkyunkwan University, 2015.

Author information

Corresponding author

Correspondence to Md. Zia Uddin.

About this article

Cite this article

Uddin, M.Z. Human activity recognition using segmented body part and body joint features with hidden Markov models. Multimed Tools Appl 76, 13585–13614 (2017). https://doi.org/10.1007/s11042-016-3742-2

  • DOI: https://doi.org/10.1007/s11042-016-3742-2
