A Novel Categorical Data Attribute Split Technique in Decision Tree Learning
D. Mabuni

D. Mabuni*, Department of Computer Science, Dravidian University, Kuppam, India. 

Manuscript received on April 30, 2020. | Revised Manuscript received on May 06, 2020. | Manuscript published on May 30, 2020. | PP: 1607-1612 | Volume-9 Issue-1, May 2020. | Retrieval Number: A2568059120/2020©BEIESP | DOI: 10.35940/ijrte.A2568.059120
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: A new technique is proposed for splitting categorical data during decision tree learning. The technique is based on class probability representations and manipulations of the class labels corresponding to the distinct values of each categorical attribute. For each categorical attribute, an aggregate similarity over these class probabilities is computed; the attribute with the highest aggregated similarity measure is selected as the split attribute, and the data in the current node of the decision tree is then divided into as many subsets as there are distinct values of that attribute. Experiments conducted with the proposed method show that it compares favourably with many competing methods in terms of efficiency, ease of use, interpretability, and output results, and that it is applicable in many modern applications.
Keywords: Aggregated similarity measure and class probability representations, categorical split attribute, decision tree learning, splitting categorical data.
Scope of the Article: Machine Learning
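
The abstract outlines the selection-and-split loop but does not specify the exact aggregate similarity measure, so the following is only a minimal illustrative sketch: each distinct value of a categorical attribute is represented by its class-probability vector, an attribute is scored by one plausible aggregate similarity (here the mean pairwise cosine similarity of those vectors, which is an assumption rather than the paper's measure), the highest-scoring attribute is selected, and the node's rows are partitioned by its distinct values. All function and variable names are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math

def class_probability_vectors(rows, attr, class_index, classes):
    """For each distinct value of `attr`, build the vector of class probabilities
    over the rows that take that value."""
    by_value = {}
    for row in rows:
        by_value.setdefault(row[attr], []).append(row[class_index])
    vectors = {}
    for value, labels in by_value.items():
        counts = Counter(labels)
        n = len(labels)
        vectors[value] = [counts.get(c, 0) / n for c in classes]
    return vectors

def aggregate_similarity(vectors):
    """One plausible aggregate: mean pairwise cosine similarity of the
    class-probability vectors (an assumption, not the paper's exact measure)."""
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv) if nu and nv else 0.0
    vecs = list(vectors.values())
    if len(vecs) < 2:
        return 0.0
    pairs = list(combinations(vecs, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

def split_node(rows, categorical_attrs, class_index):
    """Select the categorical attribute with the highest aggregated similarity and
    partition the node's rows into one subset per distinct value of that attribute."""
    classes = sorted({row[class_index] for row in rows})
    best_attr, best_score = None, float("-inf")
    for attr in categorical_attrs:
        vectors = class_probability_vectors(rows, attr, class_index, classes)
        score = aggregate_similarity(vectors)
        if score > best_score:
            best_attr, best_score = attr, score
    subsets = {}
    for row in rows:
        subsets.setdefault(row[best_attr], []).append(row)
    return best_attr, subsets

# Example usage on a toy dataset (attribute indices 0 and 1 are categorical,
# index 2 is the class label):
rows = [
    ("sunny", "high",   "no"),
    ("sunny", "normal", "yes"),
    ("rain",  "high",   "no"),
    ("rain",  "normal", "yes"),
]
attr, subsets = split_node(rows, categorical_attrs=[0, 1], class_index=2)
```

In a full decision tree learner, split_node would be applied recursively to each returned subset until a stopping criterion (for example, class purity or minimum node size) is met; those details, like the similarity measure itself, are left to the paper's own formulation.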