Abstract
Disease prediction is sensitive to error: a wrong prediction can lead to treating a person who is not ill or to leaving a real patient untreated. Moreover, some features distinguish a curable disease from a chronic or fatal one. Data mining techniques have been widely used in health-related research, and researchers have so far attained around 97 percent accuracy using several methods. Some researchers have demonstrated that selecting the correct features increases prediction accuracy. This work proposes a method to distinguish chronic from non-chronic kidney disease, identifies the crucial features of the disease without reducing prediction accuracy, and presents a prediction algorithm intended to eliminate the possibility of underfitting or overfitting. The study uses recursive feature elimination (RFE) to select an optimal subset of features and an ensemble algorithm, the enhanced decision tree (EDT), to predict the disease. The results show that the accuracy of EDT does not change when less significant features are removed, which lets decision-makers concentrate on a few features and so reduce the time and error of treatment. EDT predicts the disease with substantially high consistency, with or without feature selection.
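The abstract names RFE and a tree ensemble but gives no implementation details, so the following is only a minimal sketch of how recursive feature elimination can wrap a tree ensemble. It rests on loud assumptions: bagged one-split decision trees (stumps) stand in for EDT and are not the authors' algorithm, the data are synthetic rather than the chronic kidney disease dataset, and every function name (`best_stump`, `fit_ensemble`, `rfe`, `make_data`) is hypothetical.

```python
import random


def gini(pos, total):
    """Gini impurity of a binary node holding `pos` positives out of `total`."""
    if total == 0:
        return 0.0
    p = pos / total
    return 2.0 * p * (1.0 - p)


def best_stump(X, y, rows, features):
    """Return the one-split tree with the largest Gini gain, or None."""
    n = len(rows)
    pos_total = sum(y[r] for r in rows)
    parent = gini(pos_total, n)
    best = None  # (gain, feature, threshold, left_label, right_label)
    for f in features:
        ordered = sorted(rows, key=lambda r: X[r][f])
        left_pos = 0
        for i in range(1, n):
            left_pos += y[ordered[i - 1]]
            lo, hi = X[ordered[i - 1]][f], X[ordered[i]][f]
            if lo == hi:
                continue  # no threshold fits between equal values
            right_pos = pos_total - left_pos
            child = (i * gini(left_pos, i) + (n - i) * gini(right_pos, n - i)) / n
            gain = parent - child
            if best is None or gain > best[0]:
                left_label = int(2 * left_pos >= i)
                right_label = int(2 * right_pos >= n - i)
                best = (gain, f, (lo + hi) / 2.0, left_label, right_label)
    return best


def fit_ensemble(X, y, features, n_trees, rng):
    """Bagging: each stump sees a bootstrap sample and a random feature subset."""
    n = len(X)
    k = max(1, round(len(features) ** 0.5))
    stumps = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]
        stump = best_stump(X, y, rows, rng.sample(features, k))
        if stump is not None:
            stumps.append(stump)
    return stumps


def predict(stumps, row):
    """Majority vote over all stumps (ties go to class 1)."""
    votes = sum(left if row[f] <= t else right for _, f, t, left, right in stumps)
    return int(2 * votes >= len(stumps))


def accuracy(stumps, X, y):
    return sum(predict(stumps, row) == label for row, label in zip(X, y)) / len(X)


def rfe(X, y, n_keep, n_trees=25, seed=0):
    """Refit the ensemble and drop the least important feature until n_keep remain."""
    rng = random.Random(seed)
    features = list(range(len(X[0])))
    while len(features) > n_keep:
        score = {f: 0.0 for f in features}
        for gain, f, *_ in fit_ensemble(X, y, features, n_trees, rng):
            score[f] += gain  # importance = total Gini gain credited to f
        features.remove(min(features, key=lambda f: score[f]))
    return features


def make_data(n_rows=200, n_features=6, seed=1):
    """Synthetic data (assumption): only features 0 and 1 drive the label."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(n_features)] for _ in range(n_rows)]
    y = [int(row[0] + 0.5 * row[1] > 0.75) for row in X]
    return X, y


X, y = make_data()
train_X, train_y, test_X, test_y = X[:150], y[:150], X[150:], y[150:]
kept = rfe(train_X, train_y, n_keep=3)

rng = random.Random(42)
all_features = list(range(len(X[0])))
acc_all = accuracy(fit_ensemble(train_X, train_y, all_features, 25, rng), test_X, test_y)
acc_kept = accuracy(fit_ensemble(train_X, train_y, kept, 25, rng), test_X, test_y)
print("kept features:", kept)
print(f"held-out accuracy, all features:  {acc_all:.2f}")
print(f"held-out accuracy, kept features: {acc_kept:.2f}")
```

Comparing the two held-out accuracies mirrors the kind of check the abstract describes, namely whether prediction quality survives the removal of less significant features. The numbers this toy prints say nothing about the paper's reported results.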
Acknowledgements
The authors thank the anonymous referees and the editor for their valuable feedback, which significantly improved the positioning and presentation of this paper.
Cite this article
Chaudhuri, A.K., Sinha, D., Banerjee, D.K. et al. A novel enhanced decision tree model for detecting chronic kidney disease. Netw Model Anal Health Inform Bioinforma 10, 29 (2021). https://doi.org/10.1007/s13721-021-00302-w
DOI: https://doi.org/10.1007/s13721-021-00302-w