Abstract
We perform a systematic evaluation of feature selection (FS) methods for support vector machines (SVMs) using simulated high-dimensional data (up to 5000 dimensions). Several findings previously reported at low dimensions do not hold in high dimensions. For example, none of the FS methods investigated improved SVM accuracy, indicating that the SVM's built-in regularization is sufficient. These results were also validated on microarray data. Moreover, all FS methods tended to discard many relevant features. This is a problem for applications such as microarray data analysis, where identifying all biologically important features is a major objective.
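The experimental setup described above can be sketched in a few lines. The following is a hypothetical illustration, not the authors' code: it simulates high-dimensional two-class data in which only a small block of features is relevant, then compares a regularized linear SVM trained on all features against one trained after a univariate filter (an F-test, standing in for the FS methods evaluated in the paper). All parameter choices (dimensions, signal strength, number of selected features) are arbitrary assumptions for illustration.

```python
# Hypothetical sketch (not the authors' code): compare a linear SVM with and
# without univariate feature selection on simulated high-dimensional data.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d, d_rel = 200, 1000, 20          # samples, total features, relevant features
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, d))
X[:, :d_rel] += y[:, None] * 1.0     # only the first d_rel features carry signal

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.5, random_state=0)

# Full SVM: rely on the built-in regularization alone.
svm_full = LinearSVC(C=1.0, dual=False).fit(Xtr, ytr)
acc_full = svm_full.score(Xte, yte)

# Filter + SVM: keep the 50 top-ranked features by an F-test, then retrain.
sel = SelectKBest(f_classif, k=50).fit(Xtr, ytr)
svm_fs = LinearSVC(C=1.0, dual=False).fit(sel.transform(Xtr), ytr)
acc_fs = svm_fs.score(sel.transform(Xte), yte)

print(f"full SVM accuracy: {acc_full:.2f}  filtered SVM accuracy: {acc_fs:.2f}")
```

Comparing `acc_full` and `acc_fs` across repeated draws, and inspecting how many of the `d_rel` truly relevant features survive the filter, reproduces the two phenomena the abstract reports: little accuracy gain from FS, and relevant features being discarded.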
© 2006 Springer-Verlag Berlin Heidelberg
Nilsson, R., Peña, J.M., Björkegren, J., Tegnér, J. (2006). Evaluating Feature Selection for SVMs in High Dimensions. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds) Machine Learning: ECML 2006. ECML 2006. Lecture Notes in Computer Science(), vol 4212. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11871842_72
DOI: https://doi.org/10.1007/11871842_72
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45375-8
Online ISBN: 978-3-540-46056-5