Prediction of Lung Cancer Risk using Random Forest Algorithm Based on Kaggle Data Set
Gururaj T.1, Vishrutha Y. M.2, Uma M.3, Rajeshwari D.4, Ramya B. K.5

1Gururaj T., Associate Professor, Department of CSE, S J M Institute of Technology, Chitradurga, India. Affiliated to Visvesvaraya Technological University, Belagavi, Karnataka, India.
2Vishrutha Y. M., Undergraduate Student, B.E., Department of CSE, S J M Institute of Technology, Chitradurga, India. Affiliated to Visvesvaraya Technological University, Belagavi, Karnataka, India.
3Uma M., Undergraduate Student, B.E., Department of CSE, S J M Institute of Technology, Chitradurga, India. Affiliated to Visvesvaraya Technological University, Belagavi, Karnataka, India.
4Rajeshwari D., Undergraduate Student, B.E., Department of CSE, S J M Institute of Technology, Chitradurga, India. Affiliated to Visvesvaraya Technological University, Belagavi, Karnataka, India.
5Ramya B. K., Undergraduate Student, B.E., Department of CSE, S J M Institute of Technology, Chitradurga, India. Affiliated to Visvesvaraya Technological University, Belagavi, Karnataka, India.
Manuscript received on February 10, 2020. | Revised Manuscript received on February 20, 2020. | Manuscript published on March 30, 2020. | PP: 1623-1630 | Volume-8 Issue-6, March 2020. | Retrieval Number: F7879038620/2020©BEIESP | DOI: 10.35940/ijrte.F7879.038620

© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: As huge amounts of data accumulate, extracting the required information from what is available has become a challenge. Machine learning contributes to various fields. The fast-growing population has led to a wide range of diseases, which in turn has created a need for machine learning models that use patient datasets. Analysis of datasets from different sources indicates that cancer is among the most hazardous diseases and may cause the death of the patient. Surveys report that cancer can be nearly cured in its initial stages, whereas in later stages it may cause the death of the affected person. One of the major types of cancer is lung cancer. Predicting it depends heavily on past data, and it requires detection in the early stages. The proposed work uses a machine learning algorithm to group individual details into categories in order to predict, at an early stage, whether a person is likely to develop cancer. A random forest algorithm is implemented; it achieves a higher efficiency, 97%, than KNN and Naive Bayes. Further, the KNN algorithm does not learn anything from the training data but uses it directly for classification, and Naive Bayes yields inaccurate predictions. The proposed system predicts the chance of lung cancer at three levels: low, medium, and high. Such early prediction can help reduce mortality rates significantly.
Keywords: Cancer Prediction, Decision Rules, Data Encoding, Random Forest Algorithm, Lung Cancer, Supervised Learning.
Scope of the Article: Machine Learning.