Abstract
This paper uses a noise-aware variant of support vector machines (SVM) for feature selection. Combining this variant with sequential backward search (SBS) yields a new algorithm for removing irrelevant features. Although existing SVM-based feature selection methods have produced acceptable results, noisy samples and outliers can degrade the performance of the SVM and, consequently, of the feature selection method built on it. We recently proposed the relaxed constraints SVM (RSVM), which handles noisy data and outliers. RSVM associates each training sample with a degree of importance computed with the fuzzy c-means clustering method, so noisy samples and outliers receive a lower degree of importance. Moreover, the constraints of RSVM are more relaxed, which can reduce the effect of noisy samples. Feature selection can increase the accuracy of machine learning applications by eliminating noisy and irrelevant features. In the proposed RSVM-SBS feature selection algorithm, noisy data have little effect on the elimination of irrelevant features. Experimental results on real-world data show that RSVM-SBS outperforms other SVM-based feature selection approaches.
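The combination of per-sample importance degrees with sequential backward search can be illustrated with a minimal sketch. It is not the RSVM-SBS algorithm itself: a weighted nearest-centroid classifier stands in for RSVM, the importance degrees are passed in directly instead of being computed with fuzzy c-means, and the function names `weighted_centroid_accuracy` and `sequential_backward_search` are hypothetical.

```python
from typing import Callable, List, Sequence


def weighted_centroid_accuracy(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    w: Sequence[float],
    features: List[int],
) -> float:
    """Stand-in scorer (NOT RSVM): importance-weighted training accuracy
    of a nearest-centroid classifier restricted to `features`."""
    # Weighted class centroids: low-importance samples pull them less.
    centroids = {}
    for label in set(y):
        total = sum(wi for wi, yi in zip(w, y) if yi == label)
        centroids[label] = [
            sum(wi * xi[f] for xi, yi, wi in zip(X, y, w) if yi == label) / total
            for f in features
        ]

    def squared_distance(xi: Sequence[float], centre: List[float]) -> float:
        return sum((xi[f] - m) ** 2 for f, m in zip(features, centre))

    # Each correctly classified sample counts with its importance degree.
    correct = 0.0
    for xi, yi, wi in zip(X, y, w):
        predicted = min(centroids, key=lambda c: squared_distance(xi, centroids[c]))
        if predicted == yi:
            correct += wi
    return correct / sum(w)


def sequential_backward_search(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    w: Sequence[float],
    n_keep: int,
    score: Callable[..., float] = weighted_centroid_accuracy,
) -> List[int]:
    """Start from all features and repeatedly drop the one whose removal
    leaves the highest score, until `n_keep` features remain."""
    selected = list(range(len(X[0])))
    while len(selected) > n_keep:
        to_drop = max(
            selected,
            key=lambda f: score(X, y, w, [g for g in selected if g != f]),
        )
        selected.remove(to_drop)
    return selected


# Toy data (assumed for illustration): feature 0 separates the classes,
# feature 1 is large-amplitude noise.
X = [[0.0, 5.0], [0.2, -4.0], [0.1, 3.0], [1.0, -5.0], [0.9, 4.0], [1.1, -3.0]]
y = [0, 0, 0, 1, 1, 1]
w = [1.0] * len(X)  # equal importance; a noise-aware method would lower some
kept = sequential_backward_search(X, y, w, n_keep=1)
print(kept)
```

On this toy data the search discards the noisy feature and keeps feature 0. Passing the scorer as a parameter keeps the search loop independent of the classifier, so a noise-aware model could be substituted without changing the elimination logic.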
Ethics declarations
Conflict of interest
The authors declared no conflicts of interest with respect to the authorship and/or publication of this article.
Cite this article
Sabzekar, M., Aydin, Z. A noise-aware feature selection approach for classification. Soft Comput 25, 6391–6400 (2021). https://doi.org/10.1007/s00500-021-05630-7
DOI: https://doi.org/10.1007/s00500-021-05630-7