Spectral feature selection for supervised and unsupervised learning

ABSTRACT
Feature selection aims to reduce dimensionality for building comprehensible learning models with good generalization performance. Feature selection algorithms are largely studied separately according to the type of learning: supervised or unsupervised. This work exploits intrinsic properties underlying supervised and unsupervised feature selection algorithms, and proposes a unified framework for feature selection based on spectral graph theory. The proposed framework is able to generate families of algorithms for both supervised and unsupervised feature selection. We further show that existing powerful algorithms such as ReliefF (supervised) and Laplacian Score (unsupervised) are special cases of the proposed framework. To the best of our knowledge, this work is the first attempt to unify supervised and unsupervised feature selection and to enable their joint study under a general framework. Experiments demonstrate the efficacy of the novel algorithms derived from the framework.
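To make the spectral view concrete, the sketch below computes the Laplacian Score (He et al., 2005), which the abstract identifies as a special case of the framework: features are ranked by how well they respect the structure of a similarity graph built over the samples. This is a minimal illustrative implementation, not the authors' code; the RBF bandwidth `sigma` and the use of the unnormalized Laplacian are assumptions for the example.

```python
import numpy as np

def laplacian_score(X, sigma=1.0):
    """Rank features by the Laplacian Score.

    X : (n_samples, n_features) data matrix.
    Returns one score per feature; a SMALLER score means the feature
    varies smoothly over the sample-similarity graph, i.e. it better
    preserves the graph (cluster) structure.
    """
    # RBF affinity graph over the samples.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    S = np.exp(-sq_dists / (2.0 * sigma ** 2))
    d = S.sum(axis=1)        # node degrees
    D = np.diag(d)           # degree matrix
    L = D - S                # unnormalized graph Laplacian

    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        f = X[:, j]
        # Center the feature with respect to the degree weighting,
        # removing the trivial constant eigenvector of L.
        f_tilde = f - (f @ d) / d.sum()
        num = f_tilde @ L @ f_tilde   # smoothness on the graph
        den = f_tilde @ D @ f_tilde   # degree-weighted variance
        scores[j] = num / den if den > 0 else np.inf
    return scores
```

For example, on data whose first feature separates two clusters and whose second feature is cluster-irrelevant noise, the first feature receives the smaller (better) score. Supervised variants in the same framework differ only in how the graph `S` is built (e.g. from class labels, recovering ReliefF-like behavior).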
REFERENCES
- Chapelle, O., Schölkopf, B., & Zien, A. (Eds.) (2006). Semi-supervised learning, chapter Graph-Based Methods. The MIT Press.
- Chung, F. (1997). Spectral graph theory. AMS.
- Dy, J., & Brodley, C. E. (2004). Feature selection for unsupervised learning. JMLR, 5, 845--889.
- Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. JMLR, 3, 1157--1182.
- He, X., Cai, D., & Niyogi, P. (2005). Laplacian score for feature selection. NIPS. MIT Press.
- Kondor, R. I., & Lafferty, J. (2002). Diffusion kernels on graphs and other discrete structures. ICML.
- Lanckriet, G. R. G., Cristianini, N., Bartlett, P., Ghaoui, L. E., & Jordan, M. I. (2004). Learning the kernel matrix with semidefinite programming. JMLR, 5, 27--72.
- Liu, H., & Yu, L. (2005). Toward integrating feature selection algorithms for classification and clustering. IEEE TKDE, 17, 491--502.
- Ng, A., Jordan, M., & Weiss, Y. (2001). On spectral clustering: Analysis and an algorithm. NIPS.
- Lehoucq, R. B. (2001). Implicitly restarted Arnoldi methods and subspace iteration. SIAM J. Matrix Anal. Appl., 23, 551--562.
- Robnik-Sikonja, M., & Kononenko, I. (2003). Theoretical and empirical analysis of Relief and ReliefF. Machine Learning, 53, 23--69.
- Shi, J., & Malik, J. (1997). Normalized cuts and image segmentation. CVPR.
- Smola, A., & Kondor, R. (2003). Kernels and regularization on graphs. COLT.
- Wolf, L., & Shashua, A. (2005). Feature selection for unsupervised and supervised inference: The emergence of sparsity in a weight-based approach. JMLR, 6, 1855--1887.
- Zhang, T., & Ando, R. (2006). Analysis of spectral kernel design based semi-supervised learning. NIPS.
- Zhao, Z., & Liu, H. (2007). Semi-supervised feature selection via spectral analysis. SDM.