Diabetes mellitus prediction and diagnosis from a data preprocessing and machine learning perspective

Olisah, Chollette C.; Smith, Lyndon; Smith, Melvyn

Dr. Chollette Olisah
Research Fellow in Computer Vision and Machine Learning

Lyndon Smith
Professor in Computer Simulation and Machine

Melvyn Smith
Research Centre Director Vision Lab/Prof


Background and Objective: Diabetes mellitus is a metabolic disorder characterized by hyperglycemia, which results from the body's inability to secrete or respond to insulin adequately. If not diagnosed in time or properly managed, diabetes can damage vital organs such as the eyes, kidneys, nerves, heart, and blood vessels, and so can be life-threatening. Many years of research in the computational diagnosis of diabetes have pointed to machine learning as a viable solution for diabetes prediction. However, the accuracy achieved to date suggests that there is still much room for improvement. In this paper, we propose a machine learning framework for diabetes prediction and diagnosis using the PIMA Indian dataset and the Laboratory of the Medical City Hospital (LMCH) diabetes dataset. We hypothesize that adopting feature selection and missing value imputation methods can improve the performance of classification models in diabetes prediction and diagnosis.

Methods: The proposed framework builds a diabetes prediction model to aid in the clinical diagnosis of diabetes. It adopts Spearman correlation for feature selection and polynomial regression for missing value imputation, each applied from a perspective that strengthens its performance. For classification, three supervised machine learning models are used: the random forest (RF) model, the support vector machine (SVM) model, and our designed twice-growth deep neural network (2GDNN) model. The models are optimized by tuning their hyperparameters with grid search and repeated stratified k-fold cross-validation, and they are evaluated for their ability to scale to the prediction problem.
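
The two preprocessing steps named in the Methods can be illustrated with a minimal sketch. It is not the authors' released code: the correlation threshold, the polynomial degree, the helper names, and the toy values are all assumptions made for illustration, since the abstract does not state them. Spearman's rho is computed as the Pearson correlation of average ranks, and missing values (marked `None`) are filled from a least-squares polynomial fitted on the observed rows.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of sorted positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)  # raises ZeroDivisionError for a constant column


def select_features(columns, target, threshold=0.2):
    """Keep columns whose |rho| with the target reaches an assumed threshold."""
    return [name for name, col in columns.items()
            if abs(spearman(col, target)) >= threshold]


def fit_polynomial(x, y, degree=2):
    """Least-squares coefficients (lowest power first) via normal equations."""
    n = degree + 1
    # Augmented matrix [A^T A | A^T y] for the Vandermonde matrix A.
    m = [[sum(xi ** (r + c) for xi in x) for c in range(n)]
         + [sum(yi * xi ** r for xi, yi in zip(x, y))]
         for r in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[r][n] / m[r][r] for r in range(n)]


def impute(x, y, degree=2):
    """Fill None entries of y with predictions from a polynomial in x."""
    known = [(a, b) for a, b in zip(x, y) if b is not None]
    coeffs = fit_polynomial([a for a, _ in known], [b for _, b in known], degree)
    return [b if b is not None else sum(c * a ** p for p, c in enumerate(coeffs))
            for a, b in zip(x, y)]


# Hypothetical toy data; not taken from the PIMA Indian or LMCH datasets.
columns = {"glucose": [85, 90, 110, 130, 150, 170], "noise": [1, 2, 3, 3, 2, 1]}
outcome = [0, 0, 0, 1, 1, 1]
print(select_features(columns, outcome))
print(impute([1, 2, 3, 4, 5], [2, 5, None, 17, 26]))
```

Solving the normal equations directly keeps the sketch dependency-free, but it is numerically fragile for high degrees or large feature values; a library least-squares routine would be the more robust choice in practice.
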
Results: In experiments on the PIMA Indian and LMCH diabetes datasets, the proposed 2GDNN model achieves precision, sensitivity, F1-score, train-accuracy, and test-accuracy scores of 97.34%, 97.24%, 97.26%, 99.01%, and 97.25% on the former and 97.28%, 97.33%, 97.27%, 99.57%, and 97.33% on the latter.

Conclusion: The data preprocessing approaches and the classifiers with hyperparameter optimization proposed within the machine learning framework yield a robust machine learning model that outperforms state-of-the-art results in diabetes mellitus prediction and diagnosis. The source code for the models of the proposed machine learning framework has been made publicly available.
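
For readers unfamiliar with the evaluation terms used in the abstract, the following is a minimal sketch of repeated stratified k-fold splitting and of the reported metric types (precision, sensitivity, F1-score, accuracy). It is an illustration under assumptions, not the authors' pipeline: the function names, the fold and repeat counts, the binary positive-class convention, and the toy labels are hypothetical, and the abstract does not say how the published scores were averaged.

```python
import random


def repeated_stratified_kfold(labels, k=5, repeats=2, seed=0):
    """Yield (train_idx, test_idx) pairs whose folds preserve class balance."""
    rng = random.Random(seed)
    for _ in range(repeats):
        folds = [[] for _ in range(k)]
        for cls in sorted(set(labels)):
            members = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(members)  # a fresh shuffle per repeat
            for pos, idx in enumerate(members):
                folds[pos % k].append(idx)  # deal each class round-robin
        for f in range(k):
            test = sorted(folds[f])
            train = sorted(i for g in range(k) if g != f for i in folds[g])
            yield train, test


def binary_metrics(y_true, y_pred, positive=1):
    """Precision, sensitivity (recall), F1, and accuracy for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"precision": precision, "sensitivity": sensitivity,
            "f1": f1, "accuracy": accuracy}


# Hypothetical labels and predictions, for illustration only.
labels = [0] * 6 + [1] * 4
for train_idx, test_idx in repeated_stratified_kfold(labels, k=2, repeats=1):
    print(len(train_idx), len(test_idx))
print(binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0]))
```

In a grid search, each candidate hyperparameter setting would be scored by averaging such metrics over all folds and repeats, and the best-scoring setting retained.
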


Olisah, C. C., Smith, L., & Smith, M. (2022). Diabetes mellitus prediction and diagnosis from a data preprocessing and machine learning perspective. Computer Methods and Programs in Biomedicine, 220, Article 106773.

Journal Article Type Article
Acceptance Date Mar 22, 2022
Online Publication Date Mar 31, 2022
Publication Date 2022-06
Deposit Date Apr 14, 2022
Publicly Available Date Apr 20, 2022
Journal Computer Methods and Programs in Biomedicine
Print ISSN 0169-2607
Electronic ISSN 1872-7565
Publisher Elsevier
Peer Reviewed Peer Reviewed
Volume 220
Article Number 106773
Keywords Health Informatics; Computer Science Applications; Software
Public URL
Additional Information This article is maintained by: Elsevier; Article Title: Diabetes mellitus prediction and diagnosis from a data preprocessing and machine learning perspective; Journal Title: Computer Methods and Programs in Biomedicine; CrossRef DOI link to publisher maintained version:; Content Type: article; Copyright: Crown Copyright © 2022 Published by Elsevier B.V.

