Abstract
This paper addresses the task of continuous human action recognition, i.e., recognition in videos that contain multiple actions concatenated together. The task is important for applications such as video surveillance and content-based video retrieval. It aims to identify the category of each action and to detect its start and end frames, and it is challenging due to the frequent changes of human actions and the ambiguity of action boundaries. In this paper, a novel and efficient continuous action recognition framework is proposed. Our approach is based on the bag-of-words representation: a local visual pattern is regarded as a word, and an action is modeled by the distribution of words. A generative, translation- and scale-invariant probabilistic Latent Semantic Analysis (pLSA) model is presented. The continuous recognition result is obtained frame by frame and updated over time. Experimental results show that the approach recognizes both isolated and continuous actions effectively and efficiently.
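To make the modeling idea concrete, the sketch below fits a plain pLSA model by EM on a (clips × visual words) count matrix. This is a minimal illustration of the standard pLSA formulation (Hofmann 2001), not the paper's translation- and scale-invariant variant; the function name, matrix layout, and iteration count are illustrative assumptions.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=200, seed=0):
    """Fit plain pLSA by EM.

    counts[d, w] = number of times visual word w occurs in clip d.
    Returns P(z|d) (topic mixture per clip) and P(w|z) (word
    distribution per topic).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random initialization, normalized to valid distributions.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: posterior P(z|d,w) proportional to P(z|d) * P(w|z).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]      # (d, z, w)
        joint /= joint.sum(axis=1, keepdims=True) + 1e-12
        # M-step: reweight the posterior by the observed counts.
        nz = counts[:, None, :] * joint                     # (d, z, w)
        p_w_z = nz.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = nz.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z
```

In a recognition setting, each sliding window of frames would be quantized into such a word histogram and assigned to the action topic with the largest posterior P(z|d); the frame-by-frame updating described in the abstract then amounts to recomputing this assignment as the window advances.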
Acknowledgments
This work was supported by National Natural Science Foundation Program 60973061, National 973 Key Research Program of China 2011CB302203, and the Ph.D. Programs Foundation of the Ministry of Education of China 20100009110004. We thank all anonymous reviewers for the comments and suggestions that helped us improve this work. In particular, the computation of confidence intervals using a Gaussian approximation in this paper was added in response to one reviewer's comments.
Guo, P., Miao, Z., Shen, Y. et al. Continuous human action recognition in real time. Multimed Tools Appl 68, 827–844 (2014). https://doi.org/10.1007/s11042-012-1084-2