Application of Rough Set-Based Characterisation of Attributes in Feature Selection and Reduction

Advances in Selected Artificial Intelligence Areas

Part of the book series: Learning and Analytics in Intelligent Systems (LAIS, volume 24)

Abstract

The quality of predictions depends heavily on the features that a classification system relies on. This is one of the reasons why approaches focused on feature selection and reduction play a significant role in data mining. Among all available attributes, those of the highest relevance and importance for a given task should be identified. This objective can be achieved by applying a feature ranking algorithm. Some data exploration methods have their own inherent mechanisms dedicated to feature reduction; decision reducts, defined within rough set theory, offer one such option. The chapter presents research on the application of reduct-based characterisation of features, employed to support classification by selected inducers working outside the rough set domain. The problem to be solved comes from the field of stylometry, the study of writing styles, whose main task is authorship attribution based on characteristic features of a quantitative rather than qualitative type.
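To make the notion of a decision reduct more concrete, the following is a minimal sketch rather than the chapter's actual procedure. It assumes a small, consistent decision table with discrete values (the data, the attribute names `a`, `b`, `c`, and the helper functions are all hypothetical), finds reducts by brute force, and scores each attribute by summing 1/|reduct| over the reducts that contain it. That weighting is an illustrative assumption, not a formula taken from the chapter.

```python
from itertools import combinations

# Hypothetical toy decision table: (condition attribute values, decision class).
ATTRIBUTES = ["a", "b", "c"]
ROWS = [
    ((0, 0, 0), "X"),
    ((0, 1, 1), "Y"),
    ((1, 0, 1), "Y"),
    ((1, 1, 0), "X"),
]


def is_consistent(subset, rows):
    """Return True if rows that agree on `subset` always share one decision."""
    seen = {}
    for values, decision in rows:
        key = tuple(values[i] for i in subset)
        if seen.setdefault(key, decision) != decision:
            return False
    return True


def decision_reducts(rows, n_attrs):
    """Brute-force search for minimal attribute subsets preserving consistency.

    Assumes the full table is consistent; exponential in n_attrs, so only
    suitable for tiny illustrative tables.
    """
    reducts = []
    for size in range(1, n_attrs + 1):
        for subset in combinations(range(n_attrs), size):
            # Skip supersets of an already found reduct: they are not minimal.
            if any(set(r) <= set(subset) for r in reducts):
                continue
            if is_consistent(subset, rows):
                reducts.append(subset)
    return reducts


def rank_attributes(reducts, names):
    """Rank attributes by a hypothetical score: sum of 1/len(reduct)."""
    scores = dict.fromkeys(names, 0.0)
    for reduct in reducts:
        for i in reduct:
            scores[names[i]] += 1.0 / len(reduct)
    # Stable sort: ties keep the original attribute order.
    return sorted(names, key=lambda name: -scores[name])


reducts = decision_reducts(ROWS, len(ATTRIBUTES))
ranking = rank_attributes(reducts, ATTRIBUTES)
```

In this toy table, attribute `c` alone determines the decision, and so does the pair `a`, `b`; under the assumed weighting, `c` is ranked first. Such a ranking could then guide which features are passed to a classifier outside the rough set domain.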


Notes

  1. https://www.gutenberg.org/

Acknowledgements

The research presented in the chapter was performed within the statutory project of the Department of Graphics, Computer Vision and Digital Systems (RAU-6, 2021), at the Silesian University of Technology, Gliwice, Poland.

Author information

Corresponding author

Correspondence to Urszula Stańczyk.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Stańczyk, U. (2022). Application of Rough Set-Based Characterisation of Attributes in Feature Selection and Reduction. In: Virvou, M., Tsihrintzis, G.A., Jain, L.C. (eds) Advances in Selected Artificial Intelligence Areas. Learning and Analytics in Intelligent Systems, vol 24. Springer, Cham. https://doi.org/10.1007/978-3-030-93052-3_3
