Abstract
This paper addresses the task of continuous human action recognition, i.e., recognition in videos that contain multiple actions concatenated together. The task is important for applications such as video surveillance and content-based video retrieval. It aims to identify the category of each action and to detect its start and end frames, and it is challenging due to the frequent changes of human actions and the ambiguity of action boundaries. In this paper, a novel and efficient continuous action recognition framework is proposed. Our approach is based on the bag-of-words representation: a local visual pattern is regarded as a word, and an action is modeled by the distribution of words. A generative, translation- and scale-invariant probabilistic Latent Semantic Analysis (pLSA) model is presented. The continuous recognition result is obtained frame by frame and updated over time. Experimental results show that the approach recognizes both isolated and continuous actions effectively and efficiently.
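To make the modeling idea concrete, the sketch below fits a plain pLSA model by EM on a (clips × visual words) count matrix. This is a minimal illustration of the standard pLSA formulation (Hofmann 2001), not the paper's translation- and scale-invariant variant; the function name, matrix layout, and iteration count are illustrative assumptions.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=200, seed=0):
    """Fit plain pLSA by EM.

    counts[d, w] = number of times visual word w occurs in clip d.
    Returns P(z|d) (topic mixture per clip) and P(w|z) (word
    distribution per topic).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random initialization, normalized to valid distributions.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: posterior P(z|d,w) proportional to P(z|d) * P(w|z).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]      # (d, z, w)
        joint /= joint.sum(axis=1, keepdims=True) + 1e-12
        # M-step: reweight the posterior by the observed counts.
        nz = counts[:, None, :] * joint                     # (d, z, w)
        p_w_z = nz.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = nz.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z
```

In a recognition setting, each sliding window of frames would be quantized into such a word histogram and assigned to the action topic with the largest posterior P(z|d); the frame-by-frame updating described in the abstract then amounts to recomputing this assignment as the window advances.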
Acknowledgments
This work was supported by National Natural Science Foundation Program 60973061, National 973 Key Research Program of China 2011CB302203, and the Ph.D. Programs Foundation of the Ministry of Education of China 20100009110004. We thank all anonymous reviewers for the comments and suggestions that helped us improve this work. In particular, the computation of confidence intervals using a Gaussian approximation in this paper was added in response to one reviewer's comments.
Guo, P., Miao, Z., Shen, Y. et al. Continuous human action recognition in real time. Multimed Tools Appl 68, 827–844 (2014). https://doi.org/10.1007/s11042-012-1084-2