SAHN Clustering in Arbitrary Metric Spaces Using Heuristic Nearest Neighbor Search

  • Conference paper
Algorithms and Computation (WALCOM 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8344)

Abstract

Sequential agglomerative hierarchical non-overlapping (SAHN) clustering techniques are classical clustering methods that are widely used in many application domains, e.g., in cheminformatics. Asymptotically optimal SAHN clustering algorithms are known for arbitrary dissimilarity measures, but their quadratic time and space complexity, even in the best case, limits their applicability to small data sets. We present a new pivot-based heuristic SAHN clustering algorithm that exploits the properties of metric distance measures to obtain a best-case running time of \(\mathcal{O}(n\log n)\) for input size \(n\). Our approach requires only linear space and supports median and centroid linkage. It is especially suitable for expensive distance measures, as it needs only a linear number of exact distance computations. In extensive experimental evaluations on real-world and synthetic data sets, we compare our approach to exact state-of-the-art SAHN algorithms in terms of quality and running time. The evaluations show a subquadratic running time in practice and a very low memory footprint.
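The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea that pivot-based methods in metric spaces rely on, not the authors' method. For any pivot \(p\), the triangle inequality gives \(|d(x,p) - d(y,p)| \le d(x,y)\), so distances to a few pivots, computed once, yield cheap lower bounds that can rule out candidates before an exact distance is evaluated. All names here (`build_pivot_table`, `lower_bound`, `nearest_neighbor`) and the choice of Euclidean distance are assumptions made for illustration.

```python
import math


def euclidean(a, b):
    # Stand-in for an expensive metric; any metric distance would work here.
    return math.dist(a, b)


def build_pivot_table(points, pivots, dist):
    # table[i][k] holds the exact distance from points[i] to pivots[k].
    return [[dist(x, p) for p in pivots] for x in points]


def lower_bound(row_x, row_y):
    # Triangle inequality: |d(x,p) - d(y,p)| <= d(x,y) for every pivot p,
    # so the maximum over all pivots is still a valid lower bound.
    return max(abs(a - b) for a, b in zip(row_x, row_y))


def nearest_neighbor(i, points, table, dist):
    # Visit candidates in order of increasing lower bound and stop as soon
    # as the bound can no longer beat the best exact distance found so far.
    candidates = sorted(
        (j for j in range(len(points)) if j != i),
        key=lambda j: lower_bound(table[i], table[j]),
    )
    best_j, best_d, exact_calls = None, math.inf, 0
    for j in candidates:
        if lower_bound(table[i], table[j]) >= best_d:
            break  # every remaining candidate is at least this far away
        d = dist(points[i], points[j])
        exact_calls += 1
        if d < best_d:
            best_j, best_d = j, d
    return best_j, best_d, exact_calls
```

This sketch still returns an exact nearest neighbor; a heuristic variant, as the abstract describes, would trade that guarantee for fewer exact distance computations, for example by trusting the bounds as estimates.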

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Kriege, N., Mutzel, P., Schäfer, T. (2014). SAHN Clustering in Arbitrary Metric Spaces Using Heuristic Nearest Neighbor Search. In: Pal, S.P., Sadakane, K. (eds) Algorithms and Computation. WALCOM 2014. Lecture Notes in Computer Science, vol 8344. Springer, Cham. https://doi.org/10.1007/978-3-319-04657-0_11

  • DOI: https://doi.org/10.1007/978-3-319-04657-0_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-04656-3

  • Online ISBN: 978-3-319-04657-0

  • eBook Packages: Computer Science, Computer Science (R0)
