Comparative machine-learning approach: A follow-up study on type 2 diabetes predictions by cross-validation methods

Gopi Battineni, Getu Gamo Sagaro, Chintalapudi Nalini, Francesco Amenta, Seyed Khosrow Tayebati

Research output: Contribution to journal › Article › peer-review

Abstract

(1) Background: Diabetes is a common chronic disease and a leading cause of death. Early diagnosis gives patients with diabetes the opportunity to improve their dietary habits and lifestyle and to manage the disease successfully. Several studies have explored the use of machine learning (ML) techniques to predict and diagnose this disease. In this study, we conducted experiments to predict diabetes in Pima Indian females with selected ML classifiers. (2) Method: A Pima Indian diabetes dataset (PIDD) with 768 female patients was considered for this study. Different data mining operations were performed to conduct a comparative analysis of four ML classifiers: Naïve Bayes (NB), J48, Logistic Regression (LR), and Random Forest (RF). These models were evaluated with k-fold cross-validation at different values of k (k = 5, 10, 15, and 20), and the performance measures of accuracy, precision, F-score, recall, and AUC were calculated for each model. (3) Results: LR was found to have the highest accuracy (0.77) for all values of k. When k = 5, the accuracy of J48, NB, and RF was found to be 0.71, 0.76, and 0.75, respectively. For k = 10, the accuracy of J48, NB, and RF was found to be 0.73, 0.76, and 0.74, respectively, while for k = 15 and k = 20, the accuracy of NB was found to be 0.76. The accuracy of J48 and RF was found to be 0.76 when k = 15, and 0.75 when k = 20. Other measures, such as precision, F-score, recall, and AUC, were also considered in the evaluations to rank the algorithms. (4) Conclusion: The present study on PIDD sought to identify an optimized ML model using cross-validation methods. The AUC was 0.83 for LR, 0.82 for RF, and 0.81 for NB. These three were ranked as the best models for predicting whether a patient is diabetic or not.
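
The evaluation protocol described in the abstract (k-fold cross-validation at several values of k, scored by accuracy, precision, recall, and F-score) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the toy data, and the majority-class stand-in classifier are all hypothetical, and per-fold metrics are simply averaged, which is one common convention and may differ from how the study aggregated its results. AUC is omitted because it needs predicted scores rather than hard labels.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F-score for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


def cross_validate(X, y, fit, predict, k, seed=0):
    """Train on k-1 folds, test on the held-out fold, and average the metrics."""
    scores = []
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_pred = predict(model, [X[i] for i in test_idx])
        scores.append(classification_metrics([y[i] for i in test_idx], y_pred))
    return {name: mean(s[name] for s in scores) for name in scores[0]}


# Hypothetical stand-in classifier: always predicts the majority training label.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def predict_majority(model, X_test):
    return [model for _ in X_test]


# Toy data (assumption, not the PIDD): 100 one-feature rows, 25% positive.
X_toy = [[float(i)] for i in range(100)]
y_toy = [0, 0, 0, 1] * 25

# Same k values as in the abstract.
results = {k: cross_validate(X_toy, y_toy, fit_majority, predict_majority, k)
           for k in (5, 10, 15, 20)}
```

Replacing `fit_majority` and `predict_majority` with a real classifier's training and prediction functions would yield the kind of per-k comparison the abstract reports.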

Original language: English
Journal: Machines
Volume: 7
Issue number: 4
DOIs
Publication status: Published - 2019
Externally published: Yes

Keywords

  • Accuracy
  • Diabetes
  • Machine learning (ML)
  • Model validation
  • PIDD
