Research Article · DOI: 10.1145/1835804.1835848

Unsupervised feature selection for multi-cluster data

Published: 25 July 2010

ABSTRACT

In many data analysis tasks one is confronted with very high-dimensional data. Feature selection techniques are designed to find the relevant subset of the original features, which can facilitate clustering, classification, and retrieval. In this paper we consider the feature selection problem in the unsupervised learning scenario, which is particularly difficult because there are no class labels to guide the search for relevant information. Feature selection is essentially a combinatorial optimization problem and is therefore computationally expensive. Traditional unsupervised feature selection methods address this issue by selecting the top-ranked features based on scores computed independently for each feature. These approaches neglect possible correlations between features and thus cannot produce an optimal feature subset. Inspired by recent developments in manifold learning and L1-regularized models for subset selection, we propose a new approach, called Multi-Cluster Feature Selection (MCFS), for unsupervised feature selection. Specifically, we select those features such that the multi-cluster structure of the data is best preserved. The corresponding optimization problem can be solved efficiently, since it involves only a sparse eigen-problem and an L1-regularized least-squares problem. Extensive experiments on various real-life data sets demonstrate the superiority of the proposed algorithm.
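The abstract compresses the method into two computational steps: solve a sparse eigen-problem over a nearest-neighbor graph to recover the multi-cluster structure, then solve an L1-regularized least-squares problem per eigenvector so each cluster selects a sparse set of responsible features. A minimal sketch of that pipeline follows, written in Python with SciPy and scikit-learn; the function name mcfs_scores, the parameter defaults, and the use of LARS with a nonzero-coefficient cap as the sparse solver are illustrative assumptions, not code from the paper.

```python
import numpy as np
from scipy.sparse import diags, identity
from scipy.sparse.linalg import eigsh
from sklearn.neighbors import kneighbors_graph
from sklearn.linear_model import Lars

def mcfs_scores(X, n_clusters, n_neighbors=5, n_nonzero=20):
    """Illustrative sketch of the MCFS idea (not the authors' reference
    code). Scores each column (feature) of X by how well it preserves
    the multi-cluster structure of the rows (samples)."""
    # Step 1: k-nearest-neighbor affinity graph over the samples,
    # symmetrized so the Laplacian below is well defined.
    W = kneighbors_graph(X, n_neighbors, mode='connectivity',
                         include_self=False)
    W = 0.5 * (W + W.T)

    # Step 2: bottom eigenvectors of the normalized graph Laplacian
    # (the sparse eigen-problem); these embed the cluster structure.
    d = np.asarray(W.sum(axis=1)).ravel()
    d_inv_sqrt = diags(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = identity(W.shape[0]) - d_inv_sqrt @ W @ d_inv_sqrt
    _, vecs = eigsh(L, k=n_clusters + 1, which='SM')
    Y = vecs[:, 1:]  # drop the trivial constant eigenvector

    # Step 3: for each embedding dimension, fit a sparse regression of
    # the eigenvector on the features; capping the nonzero LARS
    # coefficients plays the role of the L1 penalty here.
    scores = np.zeros(X.shape[1])
    for k in range(Y.shape[1]):
        coef = Lars(n_nonzero_coefs=n_nonzero).fit(X, Y[:, k]).coef_
        scores = np.maximum(scores, np.abs(coef))

    # Step 4: a feature's score is its largest weight across all
    # eigenvectors; higher scores mean more relevant features.
    return scores
```

Given the returned scores, one would keep the d highest-scoring features, e.g. np.argsort(scores)[::-1][:d]. Capping the number of nonzero LARS coefficients rather than tuning an explicit regularization weight keeps the sketch simple; the abstract frames this same step as an L1-regularized least-squares problem.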


Supplemental Material

kdd2010_cai_ufsm_01.mov (MOV, 150.2 MB)


Published in

KDD '10: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
July 2010, 1240 pages
ISBN: 9781450300551
DOI: 10.1145/1835804

Copyright © 2010 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States




Acceptance Rates

Overall acceptance rate: 1,133 of 8,635 submissions (13%)
