
Screening for Type 2 Diabetes Mellitus Based on Machine Learning Methods and Non-invasive Features

Posted on: 2021-05-30; Degree: Master; Type: Thesis
Country: China; Candidate: T Z Yang; Full Text: PDF
GTID: 2404330611452935; Subject: Biomedical statistics
Abstract/Summary:
Chronic diseases pose a persistent threat to human life. Among them, diabetes mellitus not only damages population health but also imposes a serious economic and social burden. Early diabetes screening can effectively improve quality of life and reduce the burden of disease and years lived with disability. However, population-based screening projects require substantial resources. With the development of machine learning, more and more researchers have begun to use machine learning methods as screening and prediction tools. To further improve population health, body measurements, questionnaire data, and intestinal microflora were used to build diabetes screening and diagnosis models. NHANES (National Health and Nutrition Examination Survey) data from the U.S. CDC (Centers for Disease Control and Prevention) and type 2 diabetes mellitus (T2DM) data from iHMP (Integrative Human Microbiome Project) served as the datasets of this study.

First, after data cleaning and feature selection, the NHANES dataset was split into a training set (80%, 2011-2014), a test set (20%, 2011-2014), and a validation set (2015-2016). Three simple machine learning methods (linear discriminant analysis, LDA; support vector machine, SVM; and random forest, RF) and the easy ensemble method were used to build diabetes prediction models. Model performance was evaluated through 5-fold cross-validation and external validation. In 5-fold cross-validation, the three simple methods yielded models with high predictive performance, with areas under the curve (AUCs) over 0.800, and the ensemble models significantly outperformed the simple models. The same trends were observed in the test set and the validation set. The LDA ensemble model yielded the best performance in the validation set, with an AUC of 0.849, an accuracy of 0.730, a sensitivity of 0.819, and a specificity of 0.709.

After the same processing, the iHMP type 2 diabetes dataset was split into an 80% training set and a 20% test set in two ways: by follow-up time ID and by sample ID. LDA, SVM, and RF were used to build models, which were likewise evaluated by 5-fold cross-validation and on the test set. With the split by follow-up time, RF yielded the best performance in 5-fold cross-validation (AUC 0.760, sensitivity 0.601, specificity 0.756, accuracy 0.693), and SVM yielded the best performance in the test set (AUC 0.750, sensitivity 0.368, specificity 0.882, accuracy 0.676). With the split by ID, RF yielded the best performance both in 5-fold cross-validation (AUC 0.783, sensitivity 0.640, specificity 0.770, accuracy 0.716) and in the test set (AUC 0.645, sensitivity 0.564, specificity 0.634, accuracy 0.609).

In conclusion, this study indicates that efficient screening using machine learning methods with non-invasive tests could be applied to a large population and achieve the objectives of secondary prevention and diagnosis. The highlights of this study are as follows: (1) Non-invasive features, including body measurements, questionnaire data, and intestinal microflora, were selected to construct the screening and diagnosis models. (2) Machine learning models with good performance were constructed to screen for and predict type 2 diabetes.
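The abstract reports each model by AUC, sensitivity, specificity, and accuracy. As a reminder of how these four quantities relate, the following is a minimal sketch in plain Python. The function names (`screening_metrics`, `auc`), the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration; none of them come from the thesis or its data.

```python
from typing import Dict, Sequence


def screening_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy for binary labels (1 = diabetes)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 cases and 5 non-cases with made-up model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = screening_metrics(y_true, y_pred)
auc_value = auc(y_true, scores)

# Accuracy is a prevalence-weighted mix of sensitivity and specificity.
prevalence = sum(y_true) / len(y_true)
weighted = (metrics["sensitivity"] * prevalence
            + metrics["specificity"] * (1 - prevalence))

print(metrics, auc_value, weighted)
```

Because accuracy is a prevalence-weighted average of sensitivity and specificity, a model on an imbalanced screening population can show moderate accuracy alongside high sensitivity, as in the reported validation-set figures. AUC, by contrast, does not depend on the chosen threshold.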
Keywords/Search Tags:Type 2 diabetes, machine learning, non-invasive attributes, screening, diagnosis