Abstract
The increasing demand for human activity analysis in surveillance scenarios has been triggered by the emergence of new features and concepts that help to identify activities of interest. However, the characterisation of individual and group behaviours is not well studied in the video surveillance community, owing not only to its intrinsic difficulty and the large variety of topics involved, but also to the lack of valid semantic concepts that relate human activity to social context. In this paper, we address the topic of social semantic meaning in a well-defined surveillance scenario, namely a shopping mall, and propose new definitions of individual and group behaviour that consider environment context, a relational descriptor that emphasises position and attention-based characteristics, and a new classification approach based on mini-batches. We also present a wide evaluation process that analyses the sociological meaning of the individual features and outlines the performance impact of automatic feature extraction processes on our classification framework. We verify the discriminative value of the selected features, report the descriptor's performance and robustness under different stress conditions, confirm the advantage of the proposed mini-batch classification approach, which obtains promising results, and outline future research lines to improve our novel social behavioural analysis framework.
Notes
Faculdade de Psicologia e de Ciências da Educação da Universidade do Porto—http://sigarra.up.pt/fpceup.
We thank the first author of [7], Isarun Chamveha from the Institute of Industrial Science, The University of Tokyo, for helping us to recalibrate the technique for our data set.
References
Adam A, Rivlin E, Shimshoni I, Reinitz D (2008) Robust real-time unusual event detection using multiple fixed-location monitors. IEEE Trans Pattern Anal Mach Intell 30(3):555–560
Aggarwal J, Ryoo M (2011) Human activity analysis: a review. ACM Comput Surv 43(3):16:1–16:43. doi:10.1145/1922649.1922653
Ali S, Shah M (2008) Floor fields for tracking in high density crowd scenes. In: ECCV, pp 1–14
Babenko B, Yang MH, Belongie S (2009) Visual tracking with online multiple instance learning. In: IEEE conference on computer vision and pattern recognition (CVPR), Miami, FL
Bernardin K, Stiefelhagen R (2008) Evaluating multiple object tracking performance: the CLEAR MOT metrics. J Image Video Process 2008:1:1–1:10. doi:10.1155/2008/246309
Cartwright D, Zander A (eds) (1968) Group dynamics: research and theory, 3rd edn. Harper & Row, New York
Chamveha I, Sugano Y, Sugimura D, Siriteerakul T, Okabe T, Sato Y, Sugimoto A (2011) Appearance-based head pose estimation with scene-specific adaptation. In: IEEE international conference on computer vision workshops (ICCV Workshops), 2011, pp 1713–1720. doi:10.1109/ICCVW.2011.6130456
Chang MC, Krahnstoever N, Ge W (2011) Probabilistic group-level motion analysis and scenario recognition. In: ICCV, pp 747–754
Choi W, Shahid K, Savarese S (2011) Learning context for collective activity recognition. In: CVPR, pp 3273–3280
Ge W, Collins RT, Ruback B (2012) Vision-based analysis of small groups in pedestrian crowds. IEEE Trans Pattern Anal Mach Intell 34(5):1003–1016
Grabner H, Grabner M, Bischof H (2006) Real-time tracking via on-line boosting. Proc BMVC. doi:10.5244/C.20.6
Helbing D, Molnár P (1995) Social force model for pedestrian dynamics. Phys Rev E 51(5):4282–4286. doi:10.1103/physreve.51.4282
Chamveha I, Sugano Y, Sato Y (2013) Social group discovery from surveillance videos: a data-driven approach with attention-based cues. In: Proceedings of the British machine vision conference. BMVA Press
Jin B, Hu W, Wang H (2012) Human interaction recognition based on transformation of spatial semantics. IEEE Signal Process Lett 19(3):139–142
Kalal Z, Matas J, Mikolajczyk K (2011) Tracking learning detection. IEEE Trans Pattern Anal Mach Intell 34:1409–1422
Kalal Z, Mikolajczyk K, Matas J (2012) Tracking-learning-detection. IEEE Trans Pattern Anal Mach Intell 34(7):1409–1422
Khalid S, Naftel A (2005) Classifying spatiotemporal object trajectories using unsupervised learning of basis function coefficients. In: VSSN ’05: proceedings of the third ACM international workshop on Video surveillance and sensor networks, pp 45–52. ACM, New York, NY, USA. doi:10.1145/1099396.1099404
Klügl F, Rindsfüser G (2007) Large-scale agent-based pedestrian simulation. In: Petta P, Müller JP, Klusch M, Georgeff MP (eds) MATES, Lecture notes in computer science, vol 4687. Springer, pp 145–156
Makris D, Ellis T (2005) Learning semantic scene models from observing activity in visual surveillance. IEEE Trans Syst Man Cybern Part B 35(3):397–408
McPhail C, Wohlstein RT (1982) Using film to analyze pedestrian behavior. Sociol Methods Res 10(3):347–375
Owens J, Hunter A (2000) Application of the self-organizing map to trajectory classification. In: Proceedings of the third IEEE international workshop on visual surveillance (VS’2000), VS ’00, pp 77. IEEE Computer Society, Washington, DC, USA
Pereira EM, Cardoso JS, Morla R (2015) Long-range trajectories from global and local motion representations. http://arxiv.org/abs/1509.08647
Pereira EM, Ciobanu L, Cardoso JS (2014) Context-based trajectory descriptor for human activity profiling. In: Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, IEEE, San Diego, CA, USA, pp 2385–2390
Pereira EM, Ciobanu L, Cardoso JS (2015) Social signaling descriptor for group behavior analysis. In: Proceedings of Iberian conference on pattern recognition and image analysis (IbPRIA)
Prisacariu V, Reid I (2009) fasthog—a real-time gpu implementation of hog. Technical Report 2310/09. Department of Engineering Science, Oxford University
Pusiol G, Bremond F, Thonnat M (2010) Trajectory based activity discovery. In: 7th IEEE international conference on advanced video and signal-based surveillance, Boston, USA
Qiu F, Hu X (2010) Modeling group structures in pedestrian crowd simulation. Simul Model Pract Theory 18(2):190–205
Rummel RJ (1976) Understanding conflict and war: the conflict helix, vol 2. Sage Publications, Beverly Hills
Ryoo MS, Aggarwal JK (2009) Spatio-temporal relationship match: video structure comparison for recognition of complex human activities. In: ICCV, IEEE, pp 1593–1600
Smeulders AW, Chu DM, Cucchiara R, Calderara S, Dehghan A, Shah M (2014) Visual tracking: an experimental survey. IEEE Trans Pattern Anal Mach Intell 36(7):1442–1468
Takahashi M, Naemura M, Fujii M, Satoh S (2011) Human action recognition in crowded surveillance video sequences by using features taken from key-point trajectories. Comput Vis Pattern Recogn. doi:10.1109/CVPRW.2011.5981713
Wang X, Ma KT, Ng GW, Grimson WE (2011) Trajectory analysis and semantic region modeling using nonparametric hierarchical Bayesian models. Int J Comput Vis 95(3):287–312. doi:10.1007/s11263-011-0459-6
Wang X, Tieu K, Grimson E (2006) Learning semantic scene models by trajectory analysis. In: Leonardis A, Bischof H, Pinz A (eds) ECCV (3), lecture notes in computer science, vol 3953. Springer, pp 110–123
Wu J, Rehg JM (2009) Beyond the Euclidean distance: creating effective visual codebooks using the histogram intersection kernel. In: IEEE 12th international conference on computer vision, 2009, IEEE, pp 630–637
Wu Y, Lim J, Yang MH (2013) Online object tracking: a benchmark. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2411–2418
Yao B, Jiang X, Khosla A, Lin AL, Guibas LJ, Li FF (2011) Human action recognition by learning bases of action attributes and parts. In: ICCV, pp 1331–1338
Zhou B, Wang X, Tang X (2012) Understanding collective crowd behaviors: learning a mixture model of dynamic pedestrian-agents. In: CVPR, pp 2871–2878
Acknowledgments
This work was financed by the ERDF (European Regional Development Fund) through the Operational Programme for Competitiveness and Internationalisation (COMPETE 2020 Programme) within Project POCI-01-0145-FEDER-006961, and by National Funds through the FCT, Fundação para a Ciência e Tecnologia (Portuguese Foundation for Science and Technology), as part of Project UID/EEA/50014/2013, through the Ph.D. Grant reference SFRH/BD/51430/2011 and postdoctoral Grant SFRH/BPD/85225/2012. The authors would like to thank Amit Adam for supplying the video sequences, and Kelly Rodrigues and the Social Psychology Research Group of the University of Porto for their scientific advice.
About this article
Cite this article
Pereira, E.M., Ciobanu, L. & Cardoso, J.S. Cross-layer classification framework for automatic social behavioural analysis in surveillance scenario. Neural Comput & Applic 28, 2425–2444 (2017). https://doi.org/10.1007/s00521-016-2282-z
DOI: https://doi.org/10.1007/s00521-016-2282-z