research-article

TranAD: deep transformer networks for anomaly detection in multivariate time series data

Authors:
Shreshth Tuli (Imperial College London, London, UK)
Giuliano Casale (Imperial College London, London, UK)
Nicholas R. Jennings (Loughborough University, London, UK)

Proceedings of the VLDB Endowment, Volume 15, Issue 6, pp. 1201–1214. https://doi.org/10.14778/3514061.3514067

Published: 01 February 2022

Abstract

Efficient anomaly detection and diagnosis in multivariate time-series data is of great importance for modern industrial applications. However, building a system that can quickly and accurately pinpoint anomalous observations is challenging because of the lack of anomaly labels, high data volatility, and the demand for ultra-low inference times in modern applications. Despite recent developments in deep learning approaches for anomaly detection, only a few of them address all of these challenges. In this paper, we propose TranAD, a deep transformer network-based anomaly detection and diagnosis model that uses attention-based sequence encoders to swiftly perform inference with knowledge of the broader temporal trends in the data. TranAD uses focus score-based self-conditioning to enable robust multi-modal feature extraction and adversarial training to gain stability. Additionally, model-agnostic meta learning (MAML) allows us to train the model using limited data. Extensive empirical studies on six publicly available datasets demonstrate that TranAD can outperform state-of-the-art baseline methods in detection and diagnosis performance with data- and time-efficient training. Specifically, TranAD increases F1 scores by up to 17% and reduces training times by up to 99% compared to the baselines.

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks

Index terms have been assigned to the content through auto-classification.

Published in
Proceedings of the VLDB Endowment, Volume 15, Issue 6
February 2022
179 pages
ISSN: 2150-8097
Editors: Fatma Özcan (Google), Juliana Freire (New York University), Xuemin Lin (University of New South Wales)
Publisher: VLDB Endowment
Badges
- Artifacts Available / v1.1