ABSTRACT
In many data analysis tasks, one is often confronted with very high-dimensional data. Feature selection techniques are designed to find a relevant subset of the original features that can facilitate clustering, classification, and retrieval. In this paper, we consider the feature selection problem in the unsupervised learning scenario, which is particularly difficult due to the absence of class labels that would guide the search for relevant information. Feature selection is essentially a combinatorial optimization problem and is therefore computationally expensive. Traditional unsupervised feature selection methods sidestep this issue by selecting the top-ranked features based on scores computed independently for each feature. These approaches neglect possible correlations between features and thus cannot produce an optimal feature subset. Inspired by recent developments in manifold learning and L1-regularized models for subset selection, we propose in this paper a new approach, called Multi-Cluster Feature Selection (MCFS), for unsupervised feature selection. Specifically, we select those features such that the multi-cluster structure of the data can be best preserved. The corresponding optimization problem can be solved efficiently, since it involves only a sparse eigen-problem and an L1-regularized least squares problem. Extensive experimental results on various real-life data sets demonstrate the superiority of the proposed algorithm.
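To make the two-stage pipeline described above concrete, here is a minimal sketch in Python using SciPy and scikit-learn. It assumes a k-nearest-neighbor affinity graph with binary weights, the symmetric normalized Laplacian, and scikit-learn's LARS solver for the L1-regularized least squares step; the function name `mcfs`, the neighborhood size, and the other default parameters are illustrative assumptions, not the paper's exact settings.

```python
# Sketch of the MCFS pipeline: spectral embedding + L1-regularized
# regression to score features. Parameters below are illustrative.
import numpy as np
from scipy.sparse import csgraph
from scipy.sparse.linalg import eigsh
from sklearn.linear_model import Lars
from sklearn.neighbors import kneighbors_graph

def mcfs(X, n_selected, n_clusters=5, n_neighbors=5):
    # Step 1: k-nearest-neighbor affinity graph (binary weights here;
    # the paper could equally use heat-kernel weights) and its
    # symmetric normalized Laplacian.
    W = kneighbors_graph(X, n_neighbors, mode='connectivity',
                         include_self=False)
    W = 0.5 * (W + W.T)                    # symmetrize the adjacency matrix
    L = csgraph.laplacian(W, normed=True)

    # Step 2: the sparse eigen-problem -- the bottom eigenvectors of L
    # form a spectral embedding that encodes the multi-cluster structure.
    vals, vecs = eigsh(L, k=n_clusters + 1, which='SM')
    Y = vecs[:, np.argsort(vals)][:, 1:]   # drop the trivial eigenvector

    # Step 3: regress each embedding dimension on the original features
    # with an L1-regularized least squares fit (solved by LARS); each fit
    # yields a sparse coefficient vector over the features.
    scores = np.zeros(X.shape[1])
    for k in range(Y.shape[1]):
        coef = Lars(n_nonzero_coefs=n_selected).fit(X, Y[:, k]).coef_
        scores = np.maximum(scores, np.abs(coef))  # score = max_k |coef|

    # Step 4: keep the indices of the top-scoring features.
    return np.argsort(scores)[::-1][:n_selected]
```

Because LARS computes the whole regularization path, constraining the number of nonzero coefficients (here via `n_nonzero_coefs`) plays the role of the cardinality constraint in the L1-regularized subproblem, which is why the overall procedure stays efficient.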