
Continuous human action recognition in real time

Multimedia Tools and Applications

Abstract

This paper addresses the task of continuous human action recognition. By continuous, we mean videos that contain multiple actions performed one after another. The task is important for applications such as video surveillance and content-based video retrieval. It aims to identify the category of each action and to detect its start and end frames. It is challenging because human actions change frequently and action boundaries are ambiguous. In this paper, a novel and efficient continuous action recognition framework is proposed. Our approach is based on the bag-of-words representation: a local visual pattern is regarded as a word, and an action is modeled by its distribution of words. A generative, translation- and scale-invariant probabilistic Latent Semantic Analysis (pLSA) model is presented. The continuous recognition result is obtained frame by frame and updated as the video progresses. Experimental results show that the approach recognizes both isolated and continuous actions effectively and efficiently.
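As background, the standard pLSA decomposition on which the proposed translation- and scale-invariant model builds can be sketched as follows (the notation w_i for a visual word, d_j for a video clip or frame window, and z_k for a latent topic is ours; the invariance extensions described in the paper add structure not reproduced here):

$$P(w_i \mid d_j) \;=\; \sum_{k=1}^{K} P(w_i \mid z_k)\, P(z_k \mid d_j)$$

The parameters P(w_i | z_k) and P(z_k | d_j) are typically estimated with the EM algorithm. One plausible way to realize the frame-by-frame recognition described above, offered here only as an illustration rather than the paper's exact procedure, is to fold in the word histogram of the current frame window as a new document and assign the action label of the topic that maximizes P(z_k | d).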



Acknowledgments

This work is supported by National Natural Science Foundation Program 60973061, National 973 Key Research Program of China 2011CB302203, and the Ph.D. Programs Foundation of the Ministry of Education of China 20100009110004. We thank the anonymous reviewers for their comments and suggestions, which have helped us improve this work. In particular, the computation of confidence intervals using a Gaussian approximation in this paper was added in response to one reviewer's comments.

Author information

Corresponding author

Correspondence to Ping Guo.

Cite this article

Guo, P., Miao, Z., Shen, Y. et al. Continuous human action recognition in real time. Multimed Tools Appl 68, 827–844 (2014). https://doi.org/10.1007/s11042-012-1084-2