A robust machine learning framework for diabetes prediction

Olisah, Chollette; Adeleye, Oluwaseun; Smith, Lyndon; Smith, Melvyn

Authors

Dr. Chollette Olisah Chollette.Olisah@uwe.ac.uk
Research Fellow in Computer Vision and Machine Learning

Oluwaseun Adeleye

Lyndon Smith Lyndon.Smith@uwe.ac.uk
Professor in Computer Simulation and Machine

Melvyn Smith Melvyn.Smith@uwe.ac.uk
Research Centre Director Vision Lab/Prof



Abstract

Diabetes mellitus is a metabolic disorder characterized by hyperglycemia, which results from the body's inability to secrete and respond to insulin. If not properly managed or diagnosed in time, diabetes can damage vital organs such as the eyes, kidneys, nerves, heart, and blood vessels, and can be life-threatening. Over many years of research in the computational diagnosis of diabetes, machine learning has proven to be a viable approach to diabetes prediction. However, the accuracy rates reported to date suggest that there is still much room for improvement. In this paper, we propose a machine learning framework to improve the performance of diabetes prediction on the PIMA Indian dataset. Through analysis, we observe that the main challenges of the dataset, which impair learning, are feature selection and missing values. For each of these challenges, we propose a working solution that incorporates Spearman correlation and polynomial regression from a new perspective. Further, we optimize the random forest classifier by tuning its hyperparameters using grid search and repeated stratified k-fold cross-validation to build a robust random forest model that scales to the prediction problem. Finally, through exhaustive experiments, we demonstrate that our proposed data preparation approaches lead to a robust machine learning framework for the diagnosis of diabetes mellitus, with train accuracy and test accuracy values that range from 98.96% to 100% and from 97.92% to 100%, respectively, which outperform all the state-of-the-art results. The source code for the proposed machine learning framework is made publicly available.
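The three steps named in the abstract (Spearman correlation for feature selection, polynomial regression for missing values, and grid search with repeated stratified k-fold cross-validation for the random forest) can be sketched roughly as follows. This is not the authors' released code: the function names, the polynomial degree, the choice of a single predictor column for imputation, and the hyperparameter grid are all assumptions made for illustration, and the abstract does not say how the "new perspective" differs from this plain usage.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold


def spearman_rank_features(X, y):
    """Rank features by |Spearman rho| against the label (hypothetical helper).

    Rows where the feature is NaN are skipped for that feature only.
    Returns (indices sorted strongest first, rho per feature).
    """
    rhos = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        mask = ~np.isnan(X[:, j])
        rho, _ = spearmanr(X[mask, j], y[mask])
        rhos[j] = rho
    order = np.argsort(-np.abs(rhos))
    return order, rhos


def impute_with_polynomial(X, target_col, predictor_col, degree=2):
    """Fill NaNs in target_col from a polynomial fit on predictor_col.

    Assumption: one predictor column and a fixed degree; the paper's
    actual imputation setup is not given in the abstract.
    """
    X = X.copy()
    predictor_ok = ~np.isnan(X[:, predictor_col])
    missing = np.isnan(X[:, target_col])
    known = ~missing & predictor_ok
    coeffs = np.polyfit(X[known, predictor_col], X[known, target_col], degree)
    fillable = missing & predictor_ok
    X[fillable, target_col] = np.polyval(coeffs, X[fillable, predictor_col])
    return X


def tune_random_forest(X, y, seed=0):
    """Grid-search a random forest under repeated stratified k-fold CV.

    The grid below is an illustrative assumption, not the paper's grid.
    """
    cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=seed)
    param_grid = {"n_estimators": [25, 50], "max_depth": [None, 5]}
    search = GridSearchCV(
        RandomForestClassifier(random_state=seed),
        param_grid,
        cv=cv,
        scoring="accuracy",
    )
    search.fit(X, y)
    return search
```

Under these assumptions, a typical call order would be imputation first, then feature ranking on the completed matrix, then `tune_random_forest` on the selected columns; `search.best_params_` and `search.best_score_` then hold the chosen settings and the mean cross-validated accuracy.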

Presentation Conference Type: Conference Paper (published)
Conference Name: Future Technologies Conference (FTC) 2021
Start Date: Oct 28, 2021
End Date: Oct 29, 2021
Online Publication Date: Nov 4, 2021
Publication Date: Jan 1, 2022
Deposit Date: Mar 2, 2022
Publisher: Springer
Pages: 775-792
Series Title: Lecture Notes in Networks and Systems
Series Number: 359
Series ISSN: 2367-3370
Book Title: Proceedings of the Future Technologies Conference (FTC) 2021
ISBN: 9783030898793
DOI: https://doi.org/10.1007/978-3-030-89880-9_58
Public URL: https://uwe-repository.worktribe.com/output/9088917