Application of Rough Set-Based Characterisation of Attributes in Feature Selection and Reduction

Stańczyk, Urszula

doi:10.1007/978-3-030-93052-3_3

Part of the book series: Learning and Analytics in Intelligent Systems (LAIS, volume 24)

Abstract

The quality of predictions depends heavily on the features chosen for a classification system to rely on. This is one of the reasons why approaches focused on feature selection and reduction play a significant role in data mining. Among all available attributes, those of the highest relevance and importance for a given task should be identified. This objective can be achieved by applying one of the feature ranking algorithms. Some data exploration methods have their own inherent mechanisms dedicated to feature reduction; decision reducts, defined within rough set theory, offer such an option. The chapter presents research on the application of reduct-based characterisation of features, employed to support classification by selected inducers working outside the rough set domain. The problem to be solved comes from stylometry, the study of writing styles, whose main task is authorship attribution based on characteristic features of a quantitative rather than qualitative type.
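
To make the idea of reduct-based characterisation more concrete, the following is a minimal sketch, not the procedure used in the chapter (which this preview does not describe). It assumes a small, already discretised and consistent decision table, finds all decision reducts by brute force (minimal attribute subsets that still separate objects from different classes), and then ranks attributes by how many reducts contain them. The attribute names, the data and the frequency-based score are all hypothetical, chosen only for illustration.

```python
from itertools import combinations

# Hypothetical function-word attributes and a toy, already discretised
# decision table: (attribute values, author label). Not data from the chapter.
ATTRS = ["but", "and", "not", "that"]
ROWS = [
    ((0, 0, 0, 0), "A"),
    ((0, 1, 1, 0), "B"),
    ((1, 0, 0, 0), "B"),
    ((1, 1, 1, 1), "A"),
    ((0, 0, 2, 1), "A"),
]


def is_consistent(rows, subset):
    """True if objects that agree on `subset` always share one decision."""
    seen = {}
    for values, decision in rows:
        key = tuple(values[i] for i in subset)
        if seen.setdefault(key, decision) != decision:
            return False
    return True


def decision_reducts(rows, n_attrs):
    """Brute-force search for all minimal consistent attribute subsets.

    Assumes the full table is consistent; exponential in n_attrs, so it is
    only suitable for very small examples.
    """
    reducts = []
    for size in range(1, n_attrs + 1):
        for subset in combinations(range(n_attrs), size):
            # A superset of an already found reduct cannot be minimal.
            if any(set(r) <= set(subset) for r in reducts):
                continue
            if is_consistent(rows, subset):
                reducts.append(subset)
    return reducts


def rank_by_reduct_frequency(reducts, names):
    """Order attributes by the number of reducts in which they occur."""
    counts = {name: 0 for name in names}
    for reduct in reducts:
        for i in reduct:
            counts[names[i]] += 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


reducts = decision_reducts(ROWS, len(ATTRS))
ranking = rank_by_reduct_frequency(reducts, ATTRS)
print(reducts)   # [(0, 1), (0, 2)]
print(ranking)   # [('but', 2), ('and', 1), ('not', 1), ('that', 0)]
```

In this toy table the attribute "but" occurs in both reducts and is ranked first, while "that" occurs in none and is ranked last. A ranking of this kind could then guide feature selection for a classifier outside the rough set domain, for example by keeping only the top-ranked attributes.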

Notes

1. https://www.gutenberg.org/

Acknowledgements

The research presented in the chapter was performed within the statutory project of the Department of Graphics, Computer Vision and Digital Systems (RAU-6, 2021), at the Silesian University of Technology, Gliwice, Poland.

Author information

Authors and Affiliations

Faculty of Automatic Control, Electronics and Computer Science, Department of Graphics, Computer Vision and Digital Systems, Silesian University of Technology, Akademicka 16, 44-100, Gliwice, Poland
Urszula Stańczyk

Corresponding author

Correspondence to Urszula Stańczyk.

Editor information

Editors and Affiliations

Department of Informatics, University of Piraeus, Piraeus, Greece
Maria Virvou
Department of Informatics, University of Piraeus, Piraeus, Greece
George A. Tsihrintzis
KES International, Shoreham-By-Sea, UK
Lakhmi C. Jain

About this chapter

Cite this chapter

Stańczyk, U. (2022). Application of Rough Set-Based Characterisation of Attributes in Feature Selection and Reduction. In: Virvou, M., Tsihrintzis, G.A., Jain, L.C. (eds) Advances in Selected Artificial Intelligence Areas. Learning and Analytics in Intelligent Systems, vol 24. Springer, Cham. https://doi.org/10.1007/978-3-030-93052-3_3

DOI: https://doi.org/10.1007/978-3-030-93052-3_3
Published: 27 February 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-93051-6
Online ISBN: 978-3-030-93052-3
eBook Packages: Intelligent Technologies and Robotics (R0)
