UAV surveillance for violence detection and individual identification

Srivastava, Anugrah; Badal, Tapas; Saxena, Pawan; Vidyarthi, Ankit; Singh, Rishav

doi:10.1007/s10515-022-00323-3

Published: 02 March 2022

Volume 29, article number 28 (2022)

Automated Software Engineering

Anugrah Srivastava¹, Tapas Badal¹, Pawan Saxena¹, Ankit Vidyarthi² & Rishav Singh³ (ORCID: orcid.org/0000-0003-2947-9046)

Abstract

Violence detection and face recognition of the individuals involved in violence have a noticeable influence on the development of automated video surveillance research. With increasing risks in society and insufficient staff to monitor them, there is a growing demand for drones and computerized video surveillance. Violence detection is fast and can be used to selectively filter surveillance videos and to identify, or take note of, the individual who is creating the anomaly. Identifying individuals from drone surveillance videos in a crowded area is difficult because of rapid movement, overlapping features, and cluttered backgrounds. The goal is to build a better drone surveillance system that recognizes the individuals involved in violence and raises a distress signal so that help can be offered quickly. This paper uses recently developed deep learning techniques and proposes transfer learning with several hybrid deep learning models combined with LSTM for violence detection. Identifying individuals involved in violence from drone-captured images is complicated by large variations in human facial appearance, so the paper uses a CNN model combined with image processing techniques. For testing, a drone-captured video dataset was developed for an unconstrained environment. Features extracted from a hybrid of inception modules and residual blocks, combined with an LSTM architecture, yielded an accuracy of 97.33%, outperforming the other models tested. For the individual identification module, the best accuracy, 99.20% on our dataset, was obtained by a CNN model with residual blocks trained for face identification.
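The abstract describes per-frame features being passed to an LSTM that classifies a clip as violent or not. The following is a minimal sketch of that data flow only, not the authors' implementation: the class name `TinyLSTMClassifier`, the feature and hidden dimensions, and the randomly initialised weights are all assumptions made for illustration, and the per-frame feature vectors stand in for the output of a pretrained CNN backbone that is not shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, x, b):
    """Return W @ x + b for a list-of-rows matrix W."""
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]


def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyLSTMClassifier:
    """Hypothetical LSTM over per-frame feature vectors with a sigmoid output unit.

    Weights are random (untrained); this only illustrates how a sequence of
    frame features is reduced to a single clip-level probability.
    """

    def __init__(self, feat_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # One weight matrix per gate (input, forget, output, candidate),
        # each applied to the concatenation [x_t, h_{t-1}].
        self.W = {g: random_matrix(hidden_dim, feat_dim + hidden_dim, rng) for g in "ifog"}
        self.b = {g: [0.0] * hidden_dim for g in "ifog"}
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
        self.b_out = 0.0

    def step(self, x, h, c):
        """Advance the LSTM by one frame and return the new (h, c)."""
        z = list(x) + list(h)
        i = [sigmoid(v) for v in affine(self.W["i"], z, self.b["i"])]
        f = [sigmoid(v) for v in affine(self.W["f"], z, self.b["f"])]
        o = [sigmoid(v) for v in affine(self.W["o"], z, self.b["o"])]
        g = [math.tanh(v) for v in affine(self.W["g"], z, self.b["g"])]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

    def predict_proba(self, frame_features):
        """Return a clip-level score in (0, 1) from the final hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in frame_features:
            h, c = self.step(x, h, c)
        logit = sum(w * v for w, v in zip(self.w_out, h)) + self.b_out
        return sigmoid(logit)


if __name__ == "__main__":
    rng = random.Random(1)
    # 16 frames, each with an 8-dimensional stand-in feature vector.
    clip = [[rng.random() for _ in range(8)] for _ in range(16)]
    model = TinyLSTMClassifier(feat_dim=8, hidden_dim=4)
    print("clip-level score:", model.predict_proba(clip))
```

In a trained system the score would be thresholded to decide whether a clip is passed on to the individual identification stage; here the weights are untrained, so the printed value carries no meaning beyond showing the shapes involved.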

Author information

T. Badal, P. Saxena, A. Vidyarthi, and R. Singh contributed equally to this work.

Authors and Affiliations

Computer Science Engineering Department, Bennett University, Greater Noida, India
Anugrah Srivastava, Tapas Badal & Pawan Saxena
Department of CSE and IT, Jaypee Institute of Information Technology, Noida, India
Ankit Vidyarthi
Department of Computer Science and Engineering, National Institute of Technology, Delhi, India
Rishav Singh

Corresponding author

Correspondence to Rishav Singh.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Srivastava, A., Badal, T., Saxena, P. et al. UAV surveillance for violence detection and individual identification. Autom Softw Eng 29, 28 (2022). https://doi.org/10.1007/s10515-022-00323-3

Received: 21 September 2021
Accepted: 03 January 2022
Published: 02 March 2022
DOI: https://doi.org/10.1007/s10515-022-00323-3
