Heart Disease Prediction Integrating UMAP and XGBoost
Ayushi1, Shilpa Sethi2, Jyoti3

1Ayushi*, Department of Computer Science, J. C. Bose University of Science and Technology, YMCA, Faridabad, India.
2Shilpa Sethi, Department of Computer Applications, J. C. Bose University of Science and Technology, YMCA, Faridabad, India.
3Jyoti, Department of Computer Science, J. C. Bose University of Science and Technology, YMCA, Faridabad, India. 

Manuscript received on April 30, 2020. | Revised Manuscript received on May 06, 2020. | Manuscript published on May 30, 2020. | PP: 2449-2457 | Volume-9 Issue-1, May 2020. | Retrieval Number: A2961059120/2020©BEIESP | DOI: 10.35940/ijrte.A2961.059120
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: The healthcare industry is flooded with patient data, which grows each day in the form of medical records. Researchers have made considerable efforts to use this data for predicting various diseases, and heart disease prediction is one such area. Data mining algorithms have been central to improving the accuracy of heart disease prediction. However, it has been found that these algorithms do not always use an adequate set of attributes, which can sometimes lead to wrong predictions. The aim of this paper is to deploy the right set of algorithms to predict heart disease accurately and thereby help both the patient and the doctor. The paper strives to combine the UMAP and XGBoost techniques and exploit the advantages of both. UMAP performs dimensionality reduction without loss of useful data, while XGBoost parallelizes tree construction, reducing the time required to obtain results. The experiment is carried out on real data taken from Fortis Escorts, Faridabad, India. The results are compared with existing techniques, namely Naïve Bayes, Decision Tree, Logistic Regression and Support Vector Machine (SVM) models, on accuracy, recall and precision. The proposed approach achieves an accuracy of 94.59%, a recall of 87.87% and a precision of 100%.
Keywords: Classification algorithms, Ensemble Techniques, PCA, UMAP, XGBoost.
Scope of the Article: Classification
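
The abstract compares models on accuracy, recall and precision. As a reminder of how these three metrics relate to a binary confusion matrix, here is a minimal Python sketch. The class and function names are illustrative, and the counts are hypothetical: the abstract does not give the confusion matrix or the test-set size, so the values below were chosen only because they are arithmetically consistent with the reported figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (positive class = heart disease)."""
    tp: int  # diseased patients predicted as diseased
    fp: int  # healthy patients predicted as diseased
    tn: int  # healthy patients predicted as healthy
    fn: int  # diseased patients predicted as healthy


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all predictions that are correct."""
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


def precision(c: ConfusionCounts) -> float:
    """Fraction of positive predictions that are truly positive."""
    predicted_pos = c.tp + c.fp
    return c.tp / predicted_pos if predicted_pos else 0.0


def recall(c: ConfusionCounts) -> float:
    """Fraction of truly positive cases that the model finds."""
    actual_pos = c.tp + c.fn
    return c.tp / actual_pos if actual_pos else 0.0


# Hypothetical counts, NOT taken from the paper.
counts = ConfusionCounts(tp=29, fp=0, tn=41, fn=4)

print(f"accuracy  = {accuracy(counts):.4f}")
print(f"precision = {precision(counts):.4f}")
print(f"recall    = {recall(counts):.4f}")
```

With no false positives, precision is exactly 1.0 whatever the recall, which shows how a model can reach 100% precision while still missing some positive cases.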