Abstract
Clustering data streams is an emerging challenge with a wide range of applications in areas including Wireless Sensor Networks, the Internet of Things, finance and social media. In an evolving data stream, a clustering algorithm should both (a) assign observations to clusters and (b) identify anomalies in real time. Current state-of-the-art algorithms do not address requirement (b) because they consider only the spatial proximity of data, which results in (1) poor clustering and (2) poor representation of the temporal evolution of data in noisy environments. In this paper, we propose an online clustering algorithm that considers the temporal proximity of observations as well as their spatial proximity to identify anomalies in real time. It identifies the evolution of clusters in noisy streams, incrementally updates the model and calculates the minimum window length over the evolving data stream without jeopardizing performance. To the best of our knowledge, this is the first online clustering algorithm that identifies anomalies in real time and discovers the temporal evolution of clusters. Our contributions are supported by experiments on synthetic as well as real-world data.
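The abstract describes the approach only at a high level. As a rough illustration of how spatial and temporal proximity can be combined in an online clusterer, the following is a minimal sketch under stated assumptions, not the authors' algorithm. All names (`OnlineClusterer`, `Cluster`, `threshold`, `window`, `min_points`, `radius`, `init_var`) and parameter values are hypothetical. The sketch assumes axis-aligned ellipsoidal clusters (diagonal covariance) so that it needs only the standard library, updates each cluster incrementally with Welford's method, and treats a run of anomaly candidates that are close both in time (within `window` steps) and in space (within `radius` of their centroid) as an emerging cluster, while isolated outliers stay flagged as anomalies. It does not compute the minimum window length described in the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Cluster:
    """Hypothetical axis-aligned ellipsoidal cluster kept as running statistics."""
    n: int
    mean: List[float]
    m2: List[float]  # per-dimension sum of squared deviations from the mean

    def variance(self, init_var: float) -> List[float]:
        # Fall back to an assumed prior variance until two points have been seen.
        if self.n < 2:
            return [init_var] * len(self.mean)
        return [max(s / (self.n - 1), 1e-9) for s in self.m2]

    def distance2(self, x: Point, init_var: float) -> float:
        # Squared Mahalanobis distance under a diagonal covariance (an assumption).
        var = self.variance(init_var)
        return sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, self.mean, var))

    def update(self, x: Point) -> None:
        # Welford's incremental update of mean and M2: O(d) work per observation.
        self.n += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (xi - self.mean[i])


class OnlineClusterer:
    """Hypothetical sketch: spatial gate plus a temporal buffer of anomaly candidates."""

    def __init__(self, threshold: float = 9.21, window: int = 5,
                 min_points: int = 3, radius: float = 1.0, init_var: float = 0.25):
        self.threshold = threshold    # assumed gate: ~99% chi-square quantile for 2 dims
        self.window = window          # max age (in time steps) of a buffered candidate
        self.min_points = min_points  # candidates needed before a new cluster may form
        self.radius = radius          # max Euclidean spread of candidates around centroid
        self.init_var = init_var
        self.clusters: List[Cluster] = []
        self.buffer: List[Tuple[int, Point]] = []

    def process(self, t: int, x: Point) -> Tuple[str, Optional[int]]:
        # 1. Spatial proximity: absorb x into the nearest cluster if it passes the gate.
        if self.clusters:
            dists = [c.distance2(x, self.init_var) for c in self.clusters]
            best = min(range(len(dists)), key=dists.__getitem__)
            if dists[best] <= self.threshold:
                self.clusters[best].update(x)
                return "cluster", best
        # 2. Temporal proximity: keep only candidates seen within the last `window` steps.
        self.buffer = [(s, p) for s, p in self.buffer if t - s <= self.window]
        self.buffer.append((t, x))
        if len(self.buffer) >= self.min_points:
            pts = [p for _, p in self.buffer]
            d = len(x)
            centre = [sum(p[i] for p in pts) / len(pts) for i in range(d)]
            spread = max(sum((p[i] - centre[i]) ** 2 for i in range(d)) ** 0.5 for p in pts)
            # A compact burst of recent candidates is promoted to a new cluster.
            if spread <= self.radius:
                c = Cluster(0, [0.0] * d, [0.0] * d)
                for p in pts:
                    c.update(p)
                self.clusters.append(c)
                self.buffer.clear()
                return "new_cluster", len(self.clusters) - 1
        return "anomaly", None
```

For example, feeding `(0.0, 0.0)`, `(0.1, 0.0)` and `(0.0, 0.1)` at times 0, 1 and 2 flags the first two points as anomalies and seeds a new cluster on the third, whereas the same point arriving at times 0, 10 and 20 is reported as an anomaly each time, because each buffered candidate expires before enough of them accumulate.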
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Chenaghlou, M., Moshtaghi, M., Leckie, C., Salehi, M. (2018). Online Clustering for Evolving Data Streams with Online Anomaly Detection. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol 10938. Springer, Cham. https://doi.org/10.1007/978-3-319-93037-4_40
DOI: https://doi.org/10.1007/978-3-319-93037-4_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93036-7
Online ISBN: 978-3-319-93037-4
eBook Packages: Computer Science, Computer Science (R0)