Diabetes mellitus prediction and diagnosis from a data preprocessing and machine learning perspective
Olisah, Chollette C.; Smith, Lyndon; Smith, Melvyn
Authors
Dr. Chollette Olisah Chollette.Olisah@uwe.ac.uk
Research Fellow in Computer Vision and Machine Learning
Lyndon Smith Lyndon.Smith@uwe.ac.uk
Professor in Computer Simulation and Machine
Melvyn Smith Melvyn.Smith@uwe.ac.uk
Research Centre Director Vision Lab/Prof
Abstract
Background and Objective: Diabetes mellitus is a metabolic disorder characterized by hyperglycemia, which results from the body's inability to secrete or respond adequately to insulin. If not diagnosed in time or properly managed, diabetes can damage vital organs such as the eyes, kidneys, nerves, heart, and blood vessels, and so can be life-threatening. Many years of research in the computational diagnosis of diabetes have pointed to machine learning as a viable solution for the prediction of diabetes. However, the accuracy rates reported to date suggest that there is still much room for improvement. In this paper, we propose a machine learning framework for diabetes prediction and diagnosis using the PIMA Indian dataset and the laboratory of the Medical City Hospital (LMCH) diabetes dataset. We hypothesize that adopting feature selection and missing value imputation methods can improve the performance of classification models in diabetes prediction and diagnosis.
Methods: A robust framework for building a diabetes prediction model to aid in the clinical diagnosis of diabetes is proposed. The framework adopts Spearman correlation for feature selection and polynomial regression for missing value imputation, from a perspective that strengthens their performance. Further, three supervised machine learning models are proposed for classification: the random forest (RF) model, the support vector machine (SVM) model, and our designed twice-growth deep neural network (2GDNN) model. The models are optimized by tuning their hyperparameters using grid search and repeated stratified k-fold cross-validation, and are evaluated for their ability to scale to the prediction problem.
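The two preprocessing steps named in the Methods can be illustrated with a minimal, standard-library-only sketch. This is not the authors' released code: the function names, the correlation threshold of 0.3, the polynomial degree of 2, and the toy values are all assumptions made for illustration, and the exact way the paper applies Spearman correlation and polynomial regression may differ.

```python
import math

def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean rank of the tied group
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def select_features(features, target, threshold=0.3):
    """Keep features whose |Spearman rho| with the target meets the threshold.
    The threshold is an illustrative assumption, not a value from the paper."""
    return [name for name, col in features.items()
            if abs(spearman(col, target)) >= threshold]

def polyfit(x, y, degree):
    """Least-squares coefficients c[0] + c[1]*x + ... via the normal equations."""
    n = degree + 1
    a = [[sum(xi ** (r + c) for xi in x) for c in range(n)]
         + [sum(yi * xi ** r for xi, yi in zip(x, y))] for r in range(n)]
    for col in range(n):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [u - f * v for u, v in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]

def impute(x, y, degree=2):
    """Fill None entries of y using a polynomial in x fitted on complete pairs."""
    known = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    coef = polyfit([p[0] for p in known], [p[1] for p in known], degree)
    return [yi if yi is not None else sum(c * xi ** k for k, c in enumerate(coef))
            for xi, yi in zip(x, y)]

# Toy, made-up values (not taken from the PIMA Indian or LMCH datasets).
toy_features = {"glucose": [85, 90, 120, 150, 170, 200],
                "noise": [3, 1, 2, 3, 1, 2]}
toy_target = [0, 0, 0, 1, 1, 1]
selected = select_features(toy_features, toy_target)
filled = impute([0, 1, 2, 3, 4], [1, 2, None, 10, 17])
```

In practice, `scipy.stats.spearmanr` and `numpy.polyfit` would normally replace the hand-written helpers; they are spelled out here only so the sketch has no dependencies.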
Results: In experiments on the PIMA Indian and LMCH diabetes datasets, the proposed 2GDNN model achieves precision, sensitivity, F1-score, train-accuracy, and test-accuracy scores of 97.34%, 97.24%, 97.26%, 99.01%, and 97.25%, and of 97.28%, 97.33%, 97.27%, 99.57%, and 97.33%, respectively.
Conclusion: The data preprocessing approaches and the classifiers with hyperparameter optimization proposed within the machine learning framework yield a robust machine learning model that outperforms state-of-the-art results in diabetes mellitus prediction and diagnosis. The source code for the models of the proposed machine learning framework has been made publicly available.
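For readers unfamiliar with the evaluation terms used in the abstract, the following is a minimal sketch of repeated stratified k-fold splitting and of the reported metrics for the binary case. It is an illustration only and not the authors' implementation: the function names, the fold and repeat counts, and the toy labels are assumptions, and for a multi-class dataset the paper may average the metrics differently.

```python
import random
from collections import defaultdict

def repeated_stratified_kfold(labels, k=5, repeats=3, seed=0):
    """Yield (train_idx, test_idx) pairs; each fold keeps roughly the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    for _ in range(repeats):
        folds = [[] for _ in range(k)]
        for idx in by_class.values():
            shuffled = idx[:]
            rng.shuffle(shuffled)
            # Deal each class round-robin so every fold gets its share.
            for pos, i in enumerate(shuffled):
                folds[pos % k].append(i)
        for f in range(k):
            test = sorted(folds[f])
            train = sorted(i for g in range(k) if g != f for i in folds[g])
            yield train, test

def binary_metrics(y_true, y_pred):
    """Precision, sensitivity (recall), F1-score, and accuracy for labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "sensitivity": sensitivity,
            "f1": f1, "accuracy": accuracy}

# Toy, made-up labels and predictions (not results from the paper).
toy_labels = [0] * 6 + [1] * 4
splits = list(repeated_stratified_kfold(toy_labels, k=2, repeats=3))
scores = binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0])
```

In a grid search, each candidate hyperparameter setting would be trained on every `train` split and scored on the matching `test` split, and the setting with the best mean score would be kept. scikit-learn provides this combination through `GridSearchCV` with a `RepeatedStratifiedKFold` splitter.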
Journal Article Type | Article |
---|---|
Acceptance Date | Mar 22, 2022 |
Online Publication Date | Mar 31, 2022 |
Publication Date | 2022-06 |
Deposit Date | Apr 14, 2022 |
Publicly Available Date | Apr 20, 2022 |
Journal | Computer Methods and Programs in Biomedicine |
Print ISSN | 0169-2607 |
Publisher | Elsevier |
Peer Reviewed | Peer Reviewed |
Volume | 220 |
Article Number | 106773 |
DOI | https://doi.org/10.1016/j.cmpb.2022.106773 |
Keywords | Health Informatics; Computer Science Applications; Software |
Public URL | https://uwe-repository.worktribe.com/output/9327462 |
Additional Information | This article is maintained by: Elsevier; Article Title: Diabetes mellitus prediction and diagnosis from a data preprocessing and machine learning perspective; Journal Title: Computer Methods and Programs in Biomedicine; CrossRef DOI link to publisher maintained version: https://doi.org/10.1016/j.cmpb.2022.106773; Content Type: article; Copyright: Crown Copyright © 2022 Published by Elsevier B.V. |
Files
Diabetes mellitus prediction and diagnosis from a data preprocessing and machine learning perspective (PDF, 1.5 Mb)
Licence
http://creativecommons.org/licenses/by/4.0/
Publisher Licence URL
http://creativecommons.org/licenses/by/4.0/