ABSTRACT
We combine linear discriminant analysis (LDA) and K-means clustering into a coherent framework that adaptively selects the most discriminative subspace. K-means clustering generates class labels, and LDA performs subspace selection on those labels. The clustering process is thus integrated with subspace selection, so the data are clustered while the discriminative feature subspace is simultaneously selected. We show the rich structure of the general LDA-Km framework by examining its variants and their relationships to earlier approaches. Relations among PCA, LDA, and K-means are clarified. Extensive experimental results on real-world datasets show the effectiveness of our approach.
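The alternation described above can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: it uses scikit-learn's `KMeans` and `LinearDiscriminantAnalysis` as stand-ins for the K-means and LDA steps, with the function name `lda_km` and the convergence test (unchanged cluster labels) chosen here for clarity.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def lda_km(X, k, n_iter=10, seed=0):
    """Alternate K-means labeling and LDA subspace selection (sketch)."""
    # Initialize with K-means labels in the full feature space.
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    Z = X
    for _ in range(n_iter):
        # LDA subspace selection: project onto at most k-1 discriminative axes
        # using the current cluster labels as class labels.
        Z = LinearDiscriminantAnalysis(n_components=k - 1).fit_transform(X, labels)
        # Re-cluster in the selected subspace.
        new_labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(Z)
        if np.array_equal(new_labels, labels):
            break  # labels stabilized: clustering and subspace agree
        labels = new_labels
    return labels, Z


# Toy data: 3 well-separated clusters embedded in 20 dimensions.
X, _ = make_blobs(n_samples=300, centers=3, n_features=20, random_state=0)
labels, Z = lda_km(X, k=3)
```

On this toy data the loop typically stabilizes within a couple of iterations; the projected representation `Z` has at most k-1 = 2 columns, reflecting LDA's rank limit.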
Adaptive dimension reduction using discriminant analysis and K-means clustering