Deep Multi-view Representation Learning for Video Anomaly Detection Using Spatiotemporal Autoencoders

Deepak, K.; Srivathsan, G.; Roshan, S.; Chandrakala, S.

doi:10.1007/s00034-020-01522-7

Published: 25 August 2020

Volume 40, pages 1333–1349, (2021)

Circuits, Systems, and Signal Processing

K. Deepak¹,
G. Srivathsan¹,
S. Roshan¹ &
…
S. Chandrakala ORCID: orcid.org/0000-0003-4723-1984¹

Abstract

Visual perception is a transformative technology that can recognize patterns in an environment from visual inputs. Automatic surveillance of human activities has gained significant importance in both public and private spaces. It is often difficult to understand the complex dynamics of events in real-time scenarios because of camera movement, cluttered backgrounds, and occlusion. Existing anomaly detection systems are not efficient because activities show high intra-class variation and inter-class similarity. Hence, there is a need to explore different kinds of information extracted from surveillance videos to improve overall performance. This can be achieved by learning features from multiple forms (views) of the given raw input data. We propose two novel methods based on the multi-view representation learning framework. The first approach is a hybrid multi-view representation learning method that combines deep features extracted from a 3D spatiotemporal autoencoder (3D-STAE) with robust handcrafted features based on spatiotemporal autocorrelation of gradients. The second approach is a deep multi-view representation learning method that combines deep features extracted from two-stream STAEs to detect anomalies. Results on three standard benchmark datasets, namely Avenue, Live Videos, and BEHAVE, show that the proposed multi-view representations modeled with a one-class SVM perform significantly better than most of the recent state-of-the-art methods.
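The following is a minimal sketch of the general idea of combining two feature views and modeling normal behavior with a one-class SVM. It is not the authors' implementation: the abstract does not say how the views are combined, so per-view standardization followed by concatenation is an assumption, and the random arrays are hypothetical stand-ins for the deep (3D-STAE) and handcrafted features. The helper names `standardize` and `fuse_views`, the feature dimensions, and the `nu` and `gamma` settings are all illustrative choices.

```python
import numpy as np
from sklearn.svm import OneClassSVM


def standardize(train, test):
    """Scale each feature using statistics from the training (normal) data only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0) + 1e-8  # avoid division by zero
    return (train - mean) / std, (test - mean) / std


def fuse_views(view_a, view_b):
    """Feature-level fusion by concatenation (an assumed fusion rule)."""
    return np.concatenate([view_a, view_b], axis=1)


rng = np.random.default_rng(0)

# Hypothetical stand-ins for per-clip features of normal training videos.
train_deep = rng.normal(size=(200, 8))   # e.g. autoencoder bottleneck features
train_hand = rng.normal(size=(200, 4))   # e.g. handcrafted gradient features

# Test set: 20 normal-looking clips followed by 5 clearly shifted (anomalous) clips.
test_deep = np.vstack([rng.normal(size=(20, 8)), rng.normal(loc=6.0, size=(5, 8))])
test_hand = np.vstack([rng.normal(size=(20, 4)), rng.normal(loc=6.0, size=(5, 4))])

# Standardize each view separately so neither dominates the fused vector.
train_deep_s, test_deep_s = standardize(train_deep, test_deep)
train_hand_s, test_hand_s = standardize(train_hand, test_hand)

train_fused = fuse_views(train_deep_s, train_hand_s)
test_fused = fuse_views(test_deep_s, test_hand_s)

# Fit on normal data only; predict() returns +1 for inliers and -1 for outliers.
model = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(train_fused)
pred = model.predict(test_fused)
scores = model.decision_function(test_fused)  # lower means more anomalous
```

Standardizing each view before fusion is one simple way to keep a high-variance view from dominating the kernel distance; other fusion rules (weighted concatenation, score-level fusion) would fit the same outline.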

Acknowledgements

The authors would like to acknowledge the following funding agencies: “Council of Scientific and Industrial Research (CSIR)” (09/1095(0043)/19-EMR-I) and “Assistive speech” (No. DST/CSRI/2017/131(G)) project under the Cognitive Science Research Initiative (CSRI) sanctioned by the Department of Science and Technology, Government of India.

Author information

Authors and Affiliations

Intelligent Systems Group, School of Computing, SASTRA University, Thanjavur, 613401, India
K. Deepak, G. Srivathsan, S. Roshan & S. Chandrakala

Corresponding author

Correspondence to S. Chandrakala.

Ethics declarations

Data Availability Statement

The datasets used in the experiments, namely CUHK Avenue, Live Videos (LV), and BEHAVE, are publicly available. The code supporting this work is available from the corresponding author upon request.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Deepak, K., Srivathsan, G., Roshan, S. et al. Deep Multi-view Representation Learning for Video Anomaly Detection Using Spatiotemporal Autoencoders. Circuits Syst Signal Process 40, 1333–1349 (2021). https://doi.org/10.1007/s00034-020-01522-7

Received: 21 January 2020
Revised: 07 August 2020
Accepted: 11 August 2020
Published: 25 August 2020
Issue Date: March 2021
DOI: https://doi.org/10.1007/s00034-020-01522-7
