
Generating multiple alternative clusterings via globally optimal subspaces

Data Mining and Knowledge Discovery

Abstract

Clustering analysis is important for exploring complex datasets. Alternative clustering analysis is an emerging subfield concerned with generating multiple, mutually different clusterings, allowing the data to be viewed from different perspectives. We present two new algorithms for generating alternative clusterings. A distinctive feature of our algorithms is their principled formulation of an objective function, which facilitates the discovery of a subspace satisfying natural quality and orthogonality criteria. The first algorithm is a regularization of Principal Component Analysis (PCA), whereas the second is a regularization of graph-based dimension reduction. In both cases, we show that a globally optimal subspace solution can be computed. Experimental evaluation shows that our techniques equal or outperform a range of existing methods.
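The claim that a regularized PCA objective admits a globally optimal subspace can be illustrated with a small sketch. The assumptions here are ours, not the paper's. The reference clustering is penalized through its between-cluster scatter matrix \(S_b\), the trade-off is controlled by a hypothetical weight `lam`, and the subspace basis \(W\) is constrained to be orthonormal. For an objective of the form \(\max_{W^\top W = I} \operatorname{tr}\big(W^\top (S_t - \lambda S_b) W\big)\), where \(S_t\) is the data covariance, the Ky Fan theorem gives the global optimum as the \(k\) leading eigenvectors of the symmetric matrix \(S_t - \lambda S_b\). The function name `alternative_subspace` is hypothetical, and this is not the authors' exact formulation.

```python
import numpy as np


def alternative_subspace(X, ref_labels, k, lam=1.0):
    """Hypothetical regularized-PCA sketch (not the paper's exact objective).

    Maximizes tr(W^T (S_t - lam * S_b) W) subject to W^T W = I, where S_t is
    the data covariance and S_b is the between-cluster scatter of an assumed
    reference clustering. By the Ky Fan theorem, the global optimum is spanned
    by the k leading eigenvectors of the symmetric matrix S_t - lam * S_b.
    """
    X = np.asarray(X, dtype=float)
    ref_labels = np.asarray(ref_labels)
    Xc = X - X.mean(axis=0)                 # center the data
    n, d = Xc.shape
    S_t = Xc.T @ Xc / n                     # total scatter (covariance)

    # Between-cluster scatter of the reference clustering (assumed penalty).
    S_b = np.zeros((d, d))
    for c in np.unique(ref_labels):
        mask = ref_labels == c
        mu = Xc[mask].mean(axis=0)          # cluster mean of the centered data
        S_b += (mask.sum() / n) * np.outer(mu, mu)

    M = S_t - lam * S_b                     # symmetric regularized matrix
    evals, evecs = np.linalg.eigh(M)        # eigenvalues in ascending order
    W = evecs[:, ::-1][:, :k]               # k leading eigenvectors (orthonormal)
    return W, Xc @ W                        # basis and projected data
```

With `lam=1.0` the regularized matrix equals the within-cluster scatter of the reference clustering, because \(S_t = S_b + S_w\). Directions that separate the reference clusters therefore lose their between-cluster variance, and the leading eigenvectors favor structure that the reference does not explain. A standard clustering algorithm (e.g. k-means) can then be run on the projected data to obtain an alternative clustering.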




Notes

  1. To keep the values in \(L\) from growing in proportion to the number of reference clusterings, we normalize them to the range \([0, 1]\).
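The note does not say which normalization scheme is used. The following is a minimal sketch, assuming \(L\) is a numeric array accumulated (for example, summed) over the reference clusterings and that min-max scaling is applied. The function name `normalize_unit_range` is hypothetical. Min-max scaling is invariant to multiplying \(L\) by a positive constant, which is the property the note is after.

```python
import numpy as np


def normalize_unit_range(L):
    """Min-max scale L into [0, 1] (assumed scheme; the note does not specify one)."""
    L = np.asarray(L, dtype=float)
    lo, hi = L.min(), L.max()
    if hi == lo:                  # constant array: avoid division by zero
        return np.zeros_like(L)
    return (L - lo) / (hi - lo)
```

Because the scaling divides by the observed range, an \(L\) built from three times as many reference clusterings with proportionally larger entries normalizes to the same values.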


Author information

Correspondence to Xuan Hong Dang.

Additional information

Responsible editor: Charu Aggarwal.

The majority of this work was done while the first author was at The University of Melbourne.


About this article

Cite this article

Dang, X.H., Bailey, J. Generating multiple alternative clusterings via globally optimal subspaces. Data Min Knowl Disc 28, 569–592 (2014). https://doi.org/10.1007/s10618-013-0314-1


  • DOI: https://doi.org/10.1007/s10618-013-0314-1
