Abstract
Clustering has become an increasingly important task in analysing huge amounts of data. Traditional applications require that all data be located at the site where it is analysed. Nowadays, large amounts of heterogeneous, complex data reside on different, independently working computers that are connected to each other via local or wide area networks. In this paper, we propose a scalable density-based distributed clustering algorithm that allows a user-defined trade-off between clustering quality and the number of objects transmitted from the local sites to a global server site. Our approach consists of the following steps: First, we order all objects located at a local site according to a quality criterion reflecting their suitability to serve as local representatives. Then we send the best of these representatives to a server site, where they are clustered with a slightly enhanced density-based clustering algorithm. This approach is very efficient because each site can determine its suitable representatives quickly and independently of the other sites. Furthermore, because the number of transmitted representatives is scalable and only the most suitable ones are sent, the global clustering can be done effectively and efficiently. Our experimental evaluation shows that the new scalable density-based distributed clustering approach yields high-quality clusterings with scalable transmission cost.
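The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the quality criterion, so the sketch assumes a simple one (the size of an object's eps-neighbourhood at its local site), and it uses plain DBSCAN on the server in place of the "slightly enhanced" variant. The names `rank_local_representatives`, `distributed_clustering`, and the parameters `eps`, `min_pts`, and `k` are hypothetical.

```python
import math
from collections import deque

NOISE = -1

def neighbors(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def rank_local_representatives(points, eps):
    """Order a site's objects by an ASSUMED quality score:
    the number of local objects in the eps-neighbourhood (denser is better)."""
    scores = [len(neighbors(points, i, eps)) for i in range(len(points))]
    order = sorted(range(len(points)), key=lambda i: (-scores[i], i))
    return [points[i] for i in order]

def dbscan(points, eps, min_pts):
    """Plain DBSCAN (not the enhanced variant mentioned in the abstract)."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(points, i, eps)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbors(points, j, eps)
            if len(reach) >= min_pts:  # j is a core point, so expand from it
                queue.extend(reach)
        cluster += 1
    return labels

def distributed_clustering(sites, eps, min_pts, k):
    """Each site sends its k best-ranked objects; the server clusters them."""
    representatives = []
    for local_points in sites:
        representatives.extend(rank_local_representatives(local_points, eps)[:k])
    return representatives, dbscan(representatives, eps, min_pts)

site_a = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
site_b = [(10, 10), (10, 11), (11, 10), (11, 11), (0.5, 0.5)]
reps, labels = distributed_clustering([site_a, site_b], eps=1.5, min_pts=2, k=3)
print(list(zip(reps, labels)))
```

In this sketch, `k` plays the role of the user-defined trade-off: a larger `k` transmits more objects per site and gives the server more detail to cluster. A pure top-`k` ranking by density can pick several representatives from the same dense region, so a realistic criterion would probably also account for redundancy among the chosen objects.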
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Januzaj, E., Kriegel, H.-P., Pfeifle, M. (2004). Scalable Density-Based Distributed Clustering. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds) Knowledge Discovery in Databases: PKDD 2004. Lecture Notes in Computer Science, vol 3202. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30116-5_23
DOI: https://doi.org/10.1007/978-3-540-30116-5_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23108-0
Online ISBN: 978-3-540-30116-5
eBook Packages: Springer Book Archive