UAV surveillance for violence detection and individual identification

Automated Software Engineering

Abstract

Detecting violence and recognizing the faces of the individuals involved has a noticeable influence on automated video surveillance research. With increasing risks in society and insufficient staff to monitor them, demand for drones and computerized video surveillance is expanding. Violence detection is fast and can be used to selectively filter surveillance videos and to identify or flag the individual who is creating the anomaly. Identifying individuals from drone surveillance videos of a crowded area is difficult because of rapid movement, overlapping features, and cluttered backgrounds. The goal is to build a better drone surveillance system that recognizes the individuals implicated in violence and raises a distress signal so that help can be offered quickly. This paper uses recently developed deep learning techniques and proposes transfer learning with different hybrid deep learning models combined with LSTM for violence detection. Identifying individuals involved in violence from drone-captured images is complicated by variations in human facial appearance; hence the paper uses a CNN model combined with image processing techniques. For testing, a drone-captured video dataset was developed for an unconstrained environment. Features extracted from a hybrid of inception modules and residual blocks, combined with an LSTM architecture, yielded an accuracy of 97.33%, outperforming the other models tested. For the individual identification module, the best accuracy on our dataset, 99.20%, was obtained by a CNN model with residual blocks trained for face identification.
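To make the "CNN features followed by LSTM" data flow concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that some feature extractor (in the paper, a hybrid of inception modules and residual blocks) has already turned each video frame into a fixed-length vector; here those vectors are random stand-ins. The function names, the feature and hidden dimensions, the randomly initialized weights, and the single sigmoid output unit are all hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(feature_dim, hidden, seed=0):
    """Random, untrained weights (hypothetical sizes, illustration only)."""
    rng = np.random.default_rng(seed)
    return {
        "W": rng.normal(0.0, 0.1, (feature_dim, 4 * hidden)),  # input weights
        "U": rng.normal(0.0, 0.1, (hidden, 4 * hidden)),       # recurrent weights
        "b": np.zeros(4 * hidden),                             # gate biases
        "w_out": rng.normal(0.0, 0.1, hidden),                 # classifier weights
        "b_out": 0.0,                                          # classifier bias
    }


def lstm_last_state(frame_features, W, U, b):
    """Run one LSTM layer over (T, D) per-frame features.

    Returns the final hidden state of shape (H,).
    """
    hidden = U.shape[0]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x in frame_features:
        z = x @ W + h @ U + b            # all four gate pre-activations, (4H,)
        i, f, g, o = np.split(z, 4)      # input, forget, candidate, output
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g                # update cell state
        h = o * np.tanh(c)               # update hidden state
    return h


def violence_probability(frame_features, params):
    """Map a clip's per-frame features to a score in (0, 1)."""
    h = lstm_last_state(frame_features, params["W"], params["U"], params["b"])
    return float(sigmoid(h @ params["w_out"] + params["b_out"]))


if __name__ == "__main__":
    params = init_params(feature_dim=8, hidden=4)
    # Stand-in for CNN features of a 16-frame clip (random, not real data).
    clip = np.random.default_rng(1).normal(size=(16, 8))
    print(violence_probability(clip, params))
```

With untrained weights the score is meaningless; the sketch only shows how a sequence of per-frame features is reduced to one clip-level decision, which is the role the LSTM plays in the hybrid models described above.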

Author information

Corresponding author

Correspondence to Rishav Singh.

Cite this article

Srivastava, A., Badal, T., Saxena, P. et al. UAV surveillance for violence detection and individual identification. Autom Softw Eng 29, 28 (2022). https://doi.org/10.1007/s10515-022-00323-3

  • DOI: https://doi.org/10.1007/s10515-022-00323-3
