Machine learning algorithms for diabetes detection: a comparative evaluation of performance of algorithms

Saxena, Surabhi; Mohapatra, Debashish; Padhee, Subhransu; Sahoo, Goutam Kumar

doi:10.1007/s12065-021-00685-9

Research Paper
Published: 24 November 2021

Volume 16, pages 587–603, (2023)

Evolutionary Intelligence

Surabhi Saxena¹,
Debashish Mohapatra²,
Subhransu Padhee³ (ORCID: 0000-0001-9946-4662) &
Goutam Kumar Sahoo⁴

Abstract

Machine learning algorithms are now widely used for prediction across a variety of domains. Health care is one of the core research areas, where machine learning models are applied to medical datasets to predict clinical attributes. This work provides a comparative evaluation of classical and ensemble machine learning models used to predict the risk of diabetes on two datasets: the PIMA Indian diabetes dataset and the early-stage diabetes risk prediction dataset. The comparative analysis finds that the superlearner model provides the best accuracy: 86% on the PIMA Indian diabetes dataset and 97% on the early-stage diabetes risk prediction dataset.

Author information

Authors and Affiliations

Department of BCA, Koneru Lakshmaiah Education Foundation, Guntur, Andhra Pradesh, India
Surabhi Saxena
Department of Electrical Engineering, National Institute of Technology Rourkela, Rourkela, Odisha, India
Debashish Mohapatra
Department of Electrical and Electronics Engineering, Sambalpur University Institute of Information Technology, Burla, Odisha, India
Subhransu Padhee
Department of Electronics and Communication Engineering, National Institute of Technology Rourkela, Rourkela, Odisha, India
Goutam Kumar Sahoo

Corresponding author

Correspondence to Subhransu Padhee.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Saxena, S., Mohapatra, D., Padhee, S. et al. Machine learning algorithms for diabetes detection: a comparative evaluation of performance of algorithms. Evol. Intel. 16, 587–603 (2023). https://doi.org/10.1007/s12065-021-00685-9

Received: 09 November 2020
Revised: 12 September 2021
Accepted: 09 November 2021
Published: 24 November 2021
Issue Date: April 2023
DOI: https://doi.org/10.1007/s12065-021-00685-9