Abstract
Feature selection is an important task in high-dimensional text classification. Most current feature selection methods employ an optimization algorithm to select an optimal feature subset from the high-dimensional feature space. An optimal feature subset reduces the computation cost and increases the accuracy of the text classifier. In this paper, we propose a new hybrid feature selection method based on the normalized difference measure and a binary Jaya optimization algorithm (NDM-BJO) to obtain an appropriate subset of optimal features from a text corpus. We use the classification error rate as the objective function to be minimized when measuring the fitness of a solution. The nominated optimal feature subsets are evaluated using Naive Bayes and Support Vector Machine classifiers on several popular benchmark text corpora. The results confirm that the proposed NDM-BJO method shows promising improvements over existing work.
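To illustrate the optimization loop the abstract describes, below is a minimal Python sketch of binary Jaya feature selection with the error rate as the fitness to be minimized. This is an assumption-laden sketch, not the paper's implementation: the function names (`binary_jaya`, `error_rate`), the sigmoid transfer step that binarizes the continuous Jaya update, and the 3-fold Naive Bayes fitness estimate are our choices, and the normalized-difference-measure pre-filtering stage of NDM-BJO is omitted.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB

def error_rate(solution, X, y):
    """Fitness: cross-validated Naive Bayes error rate on the selected features."""
    mask = solution.astype(bool)
    if not mask.any():                        # empty subset: worst possible fitness
        return 1.0
    acc = cross_val_score(MultinomialNB(), X[:, mask], y, cv=3).mean()
    return 1.0 - acc                          # the search minimizes this value

def binary_jaya(X, y, pop_size=20, t_max=50, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    pop = (rng.random((pop_size, n)) < 0.5).astype(float)   # random 0/1 solutions
    fit = np.array([error_rate(s, X, y) for s in pop])
    for _ in range(t_max):
        best = pop[fit.argmin()].copy()       # copy so in-place updates don't alias
        worst = pop[fit.argmax()].copy()
        for i in range(pop_size):
            alpha, beta = rng.random(n), rng.random(n)
            # Jaya update: move toward the best solution, away from the worst
            v = pop[i] + alpha * (best - np.abs(pop[i])) \
                       - beta * (worst - np.abs(pop[i]))
            # sigmoid transfer function maps the continuous update back to {0, 1}
            cand = (rng.random(n) < 1.0 / (1.0 + np.exp(-v))).astype(float)
            f = error_rate(cand, X, y)
            if f < fit[i]:                    # greedy replacement
                pop[i], fit[i] = cand, f
    return pop[fit.argmin()].astype(bool), fit.min()
```

Here `X` is assumed to be a nonnegative document-term matrix (e.g., term counts) and `y` the class labels; the returned boolean mask marks the selected feature subset.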
Acknowledgements
We would like to thank the anonymous reviewers for their helpful comments and advice in improving this work. We also thank the Management and Principal of Mepco Schlenk Engineering College (Autonomous), Sivakasi, for providing the state-of-the-art facilities to carry out this research in the Mepco Research Centre in collaboration with Anna University, Chennai, Tamil Nadu, India.
Nomenclature
- \(i\): solution index
- \(j\): position index
- \(t\): iteration/generation index
- \(f\): number of features to be selected
- \(T_{max}\): maximum number of iterations/generations
- \(\Psi_{i}\): \(i^{\mathrm{th}}\) solution
- \(\Psi_{i,j}\): \(j^{\mathrm{th}}\) position of solution \(\Psi_{i}\)
- \(\Psi_{i}^{(t)}\): solution \(\Psi_{i}\) at iteration/generation \(t\)
- \(\Psi_{i}^{fitness}\): fitness value of solution \(\Psi_{i}\)
- \(\Psi_{best}\): best solution
- \(\Psi_{best}^{fitness}\): fitness value of the best solution
- \(\Psi_{worst}\): worst solution
- \(\Psi_{worst}^{fitness}\): fitness value of the worst solution
- \(\alpha, \beta\): random numbers in \([0, 1]\)
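For orientation, the symbols above combine in the canonical Jaya update rule of Rao (2016), sketched here under the assumption that the paper follows the standard form; the binary variant then maps the updated position back to \(\{0,1\}\) via a transfer function:

\[
\Psi_{i,j}^{(t+1)} = \Psi_{i,j}^{(t)} + \alpha\left(\Psi_{best,j}^{(t)} - \left|\Psi_{i,j}^{(t)}\right|\right) - \beta\left(\Psi_{worst,j}^{(t)} - \left|\Psi_{i,j}^{(t)}\right|\right)
\]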
Cite this article
Thirumoorthy, K., Muneeswaran, K. Optimal feature subset selection using hybrid binary Jaya optimization algorithm for text classification. Sādhanā 45, 201 (2020). https://doi.org/10.1007/s12046-020-01443-w