research-article

Profile Decomposition Based Hybrid Transfer Learning for Cold-Start Data Anomaly Detection

Authors:
Ziyue Li, University of Cologne and The Hong Kong University of Science and Technology, Kowloon, Hong Kong (ORCID: 0000-0003-4983-9352)
Hao Yan, Arizona State University, Tempe, Arizona (ORCID: 0000-0002-4322-7323)
Fugee Tsung, The Hong Kong University of Science and Technology, Kowloon, Hong Kong (ORCID: 0000-0002-0575-8254)
Ke Zhang, The Hong Kong University of Science and Technology, Kowloon, Hong Kong (ORCID: 0000-0002-7827-1770)

ACM Transactions on Knowledge Discovery from Data, Volume 16, Issue 6, Article No.: 121, pp 1–28
https://doi.org/10.1145/3530990

Published: 30 July 2022

Abstract

Anomaly detection is an essential task for quality management in smart manufacturing. An accurate data-driven detection method usually requires sufficient data and labels. In practice, however, manufacturing often involves newly set-up processes for which only very limited data are available for analysis. Borrowing the term from recommender systems, we call such a process a cold-start process. The sparsity of anomalies, the deviation of the profile, and noise further aggravate the detection difficulty.

Transfer learning can help detect anomalies in cold-start processes by transferring knowledge from more experienced processes to new ones. However, existing transfer learning and multi-task learning frameworks are built on task- or domain-level relatedness. We observe instead that, within a domain, some components (background and anomaly) share more commonality across processes, whereas others (profile deviation and noise) do not. To this end, we propose a finer-grained, component-level transfer learning scheme, decomposition-based hybrid transfer learning (DHTL). It first decomposes a domain (e.g., a data source containing profiles) into different components (smooth background, profile deviation, anomaly, and noise); then the transferability of each component is analyzed using expert knowledge; lastly, a different transfer learning technique is tailored to each component accordingly. We adopt a Bayesian probabilistic hierarchical model to formulate parameter transfer for the background, and an “L_{2,1}+L_1”-norm to formulate low-dimensional feature-representation transfer for the anomaly. An efficient algorithm based on block coordinate descent is proposed to learn the parameters. A case study based on glass coating pressure profiles demonstrates the improved accuracy and completeness of the detected anomalies, and a simulation demonstrates the fidelity of the decomposition results.
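To make the decomposition idea concrete, the following is a minimal sketch and not the authors' DHTL algorithm. It assumes a much simpler model in which several profiles share a smooth background spanned by a cubic polynomial basis, and in which the anomaly matrix is penalized by an L_1 norm plus an L_{2,1} norm over rows (time points). The Bayesian hierarchical parameter transfer and the profile-deviation component are omitted. The function names, the basis choice, and the penalty weights `lam_l1` and `lam_l21` are all hypothetical. The two blocks (background coefficients and anomaly) are updated alternately, in the spirit of block coordinate descent.

```python
import numpy as np

def soft_threshold(x, t):
    # Proximal operator of t * ||x||_1 (elementwise shrinkage).
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def row_group_shrink(x, t):
    # Proximal operator of t * sum of row-wise Euclidean norms (the L_{2,1} norm).
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return x * scale

def decompose(Y, basis, lam_l1=0.5, lam_l21=0.5, n_iter=50):
    # Y: (n_points, n_profiles). Alternate between the two blocks:
    #   1) background coefficients by least squares on Y - A,
    #   2) anomaly A by the L_1 + L_{2,1} proximal step on the residual.
    A = np.zeros_like(Y)
    for _ in range(n_iter):
        theta, *_ = np.linalg.lstsq(basis, Y - A, rcond=None)
        residual = Y - basis @ theta
        A = row_group_shrink(soft_threshold(residual, lam_l1), lam_l21)
    return basis @ theta, A

# Synthetic data (an assumption for illustration, not the glass coating data).
rng = np.random.default_rng(0)
n, m = 100, 3
t = np.linspace(0.0, 1.0, n)
basis = np.vander(t, 4, increasing=True)          # columns 1, t, t^2, t^3
true_background = np.outer(1 + 2 * t - t**2, np.ones(m))
true_anomaly = np.zeros((n, m))
true_anomaly[40:45, :] = 3.0                      # anomaly shared by all profiles
Y = true_background + true_anomaly + 0.05 * rng.standard_normal((n, m))

background, anomaly = decompose(Y, basis)
detected = np.flatnonzero(np.linalg.norm(anomaly, axis=1) > 0)
print("rows flagged as anomalous:", detected)
```

In this sketch the row-wise L_{2,1} term encourages all profiles to agree on which time points are anomalous, while the L_1 term keeps individual entries sparse; this is one plausible reading of a shared low-dimensional anomaly representation, stated here as an assumption.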

Index Terms

Computing methodologies
  Machine learning
    Learning paradigms
      Multi-task learning
        Transfer learning
      Unsupervised learning
        Anomaly detection

Published in
ACM Transactions on Knowledge Discovery from Data, Volume 16, Issue 6
December 2022, 631 pages
ISSN: 1556-4681
EISSN: 1556-472X
DOI: 10.1145/3543989
Editor: Charu Aggarwal, IBM T. J. Watson Research, USA
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 30 July 2022
- Online AM: 24 April 2022
- Accepted: 1 April 2022
- Revised: 1 February 2022
- Received: 1 April 2021
Author Tags
Cold-start anomaly detection
Profile decomposition
Hybrid transfer learning
Bayesian probabilistic model and regularization
Qualifiers
- research-article
- Refereed