Validation of Machine Learning Models for Health Insurance Risks Assessment
Amrik Singh1, K R Ramkumar2

1Amrik Singh*, Department of Computer Science Engineering, Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India.
2K R Ramkumar, Department of Computer Science Engineering, Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India.
Manuscript received on September 22, 2019. | Revised Manuscript received on October 20, 2019. | Manuscript published on October 30, 2019. | PP: 4247-4256 | Volume-9 Issue-1, October 2019 | Retrieval Number: A1670109119/2019©BEIESP | DOI: 10.35940/ijeat.A1670.109119
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: The success of a universal healthcare policy is impossible without the use of insurance instruments. The healthcare and insurance industries are on the verge of integrating seamlessly with the help of sensors and algorithms. This research work focuses on validating an algorithm that can help to model and classify health insurance risk data. Six algorithms, including Logistic Regression (LR), K-Nearest Neighbors (KNN), Decision Tree (DT), Naive Bayes (NB), and Support Vector Machine (SVM), were evaluated, and objective validation of these algorithms has been demonstrated. To maintain the replicability of the study, the data and code are available in a public repository. From the study, it is clear that the KNN algorithm is best suited as a risk classifier. This is evident from the values of R², error metrics, completeness score, explained variance, normalized mutual score, V-measure score, precision, recall, F1 score, and accuracy metrics. Second, the algorithms were validated using 10-fold cross-validation with five types of performance metrics. In almost all cases, the KNN algorithm performed consistently and was numerically the most suitable. This can be attributed to the tight standard deviation of its performance metrics across the evaluation. From all the validation tests, it can be claimed that, on the current dataset, the KNN algorithm with the Accuracy, Homogeneity Score, Explained Variance, and Normalized Mutual Score hyper-parameter configuration is the best performer.
Keywords: Subjective Validation, Objective Validation, Health Insurance Risks.
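
The 10-fold validation described in the abstract can be illustrated with a minimal sketch. This is not the authors' code or data: the synthetic two-class dataset, the pure-Python KNN, and the function names (make_synthetic_data, knn_predict, k_fold_accuracy) are all assumptions made for illustration. It shows only the general idea of scoring a classifier on each of 10 folds and then reading the mean and standard deviation of the scores, where a small standard deviation suggests consistent performance.

```python
import random
import statistics
from collections import Counter


def make_synthetic_data(n_per_class=100, seed=0):
    """Hypothetical stand-in data: two Gaussian blobs labelled 0 and 1."""
    rng = random.Random(seed)
    data = []
    for label, center in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            features = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
            data.append((features, label))
    rng.shuffle(data)
    return data


def knn_predict(train, point, k=5):
    """Majority vote among the k training rows closest to `point`."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], point)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def k_fold_accuracy(data, n_folds=10, k=5):
    """Return one accuracy score per fold under n_folds-fold cross-validation."""
    # Every row lands in exactly one fold; each fold is the test set once.
    folds = [data[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, test_fold in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        correct = sum(knn_predict(train, x, k) == y for x, y in test_fold)
        scores.append(correct / len(test_fold))
    return scores


scores = k_fold_accuracy(make_synthetic_data())
print(f"mean accuracy over 10 folds: {statistics.mean(scores):.3f}")
print(f"standard deviation:          {statistics.stdev(scores):.3f}")
```

The same loop could be repeated for each candidate classifier and each metric, comparing both the mean and the spread of the per-fold scores.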