Abstract
Violence detection and face recognition of the individuals involved have a noticeable influence on automated video surveillance research. With increasing risks in society and insufficient staff to monitor them, there is a growing demand for drones and computerized video surveillance. Violence detection is fast and can be used to selectively filter surveillance videos and to identify or flag the individual causing the anomaly. Identifying individuals in drone surveillance videos of a crowded area is difficult because of rapid movement, overlapping features, and cluttered backgrounds. The goal is to develop a better drone surveillance system that recognizes the individuals involved in violence and raises a distress signal so that help can be offered quickly. This paper builds on recent deep learning techniques and proposes transfer learning with several hybrid deep learning models combined with LSTM for violence detection. Identifying individuals involved in violence from drone-captured images is complicated by large variations in facial appearance; hence the paper uses a CNN model combined with image processing techniques. For testing, a drone-captured video dataset was developed in an unconstrained environment. Features extracted by a hybrid of inception modules and residual blocks, combined with an LSTM architecture, yielded an accuracy of 97.33%, outperforming the other models tested. For the individual identification module, the best accuracy, 99.20% on our dataset, was obtained by a CNN model with residual blocks trained for face identification.
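To make the data flow of a "per-frame features followed by LSTM" violence detector concrete, the following is a minimal, untrained sketch in Python with NumPy. It is not the authors' implementation: the class `TinyLSTMClassifier`, the helper `flag_clips`, the feature and hidden dimensions, and the 0.5 threshold are all hypothetical, and the per-frame feature vectors are assumed to come from some pretrained CNN that is not shown here.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyLSTMClassifier:
    """Single-layer LSTM over per-frame feature vectors, followed by one
    logistic output unit. Weights are random (untrained); the class only
    illustrates how a clip is reduced to one violence score."""

    def __init__(self, feature_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        # One stacked matrix holding the input, forget, cell and output gates.
        self.W = rng.normal(0.0, 0.1, size=(4 * hidden_dim, feature_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.w_out = rng.normal(0.0, 0.1, size=hidden_dim)
        self.b_out = 0.0
        self.hidden_dim = hidden_dim

    def predict_proba(self, frame_features):
        """frame_features: array of shape (num_frames, feature_dim),
        assumed to be produced by a pretrained CNN (hypothetical here)."""
        h = np.zeros(self.hidden_dim)
        c = np.zeros(self.hidden_dim)
        for x in frame_features:
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update cell state
            h = sigmoid(o) * np.tanh(c)                   # update hidden state
        # The final hidden state summarizes the clip; map it to a probability.
        return float(sigmoid(self.w_out @ h + self.b_out))


def flag_clips(clips, model, threshold=0.5):
    """Return the indices of clips whose score reaches the threshold,
    i.e. the clips a surveillance filter would pass on for review."""
    return [k for k, clip in enumerate(clips) if model.predict_proba(clip) >= threshold]
```

In a real system the random weights would be replaced by trained ones and the threshold tuned on validation data; the flagged clip indices are what a downstream alerting or face identification stage would consume.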
Cite this article
Srivastava, A., Badal, T., Saxena, P. et al. UAV surveillance for violence detection and individual identification. Autom Softw Eng 29, 28 (2022). https://doi.org/10.1007/s10515-022-00323-3
DOI: https://doi.org/10.1007/s10515-022-00323-3