Applications Of Resampling Classification Models In Diabetes Diagnosis And Blood Glucose Control Prediction For Middle-Aged And Elderly Population

Posted on:2024-06-02

Degree:Master

Type:Thesis

Country:China

Candidate:L Zhang

GTID:2544307079499034

Subject:Public health

Abstract/Summary:

Objectives: This thesis studied the class-imbalance problem in predicting diabetes diagnosis and blood glucose control in the middle-aged and elderly population, and used resampling algorithms to improve the predictive performance of classification models. The aim was to provide theoretical and data support for clinicians carrying out diabetes diagnosis and blood glucose control in this population.

Methods: Based on cohort data from the China Health and Retirement Longitudinal Study (CHARLS), 5261 and 155 cases were selected according to the inclusion and exclusion criteria for diabetes diagnosis and glycemic control, respectively. Data on socio-demographics, lifestyle, physical examination, and blood tests were collected. Missing values of continuous and categorical independent variables were imputed with the mean and with a clustering algorithm, respectively. The chi-square test, t test, and rank sum test were used for univariate screening of factors that may affect diabetes diagnosis and blood glucose control, and LASSO-logistic (the Least Absolute Shrinkage and Selection Operator for logistic regression) was used for multivariable screening. The statistically significant variables from LASSO-logistic were used as predictors, with diabetes onset and blood glucose control status as the respective outcome variables. Random undersampling (RUS), oversampling (SMOTE, ADASYN), and mixed sampling (SMOTEENN, SMOTETomek) were used to balance the training data set, and three classification models (logistic regression, SVM, and RF) were used to predict diabetes onset and glycemic control. The optimal parameters were determined on the training set with stratified 5-fold cross-validation and AUC. To analyze the influence of the resampling algorithms on classifier performance, accuracy, sensitivity, specificity, precision, G-mean, F-measure (F1 score), and AUC were used to compare the models trained on the original data with those trained on the resampled data.

Results: 1. Risk prediction of diabetes onset in middle-aged and elderly people: (1) There were 11 possible influencing factors. The risk factors were smoking, alcohol consumption (more than once a month), and high levels of systolic pressure (mm Hg), BMI (kg/m²), TG (mg/dl), glucose (mg/dl), uric acid (mg/dl), C-reactive protein (mg/L), and glycated hemoglobin (%); the protective factors were adequate sleep (h) and high levels of HDL-C (mg/dl). (2) On the imbalanced diabetes dataset, the logistic, SVM, and RF classification models had accuracies of 95.50%, 96.33%, and 96.33%, sensitivities of 5.17%, 0, and 0, specificities of 98.95%, 100%, and 100%, G-means of 0.2262, 0, and 0, and AUCs of 0.7235, 0.7196, and 0.6990, respectively. (3) Most resampling algorithms improved the sensitivity, G-mean, and F1 score of the logistic, SVM, and RF classification models. SMOTE, SMOTEENN, and SMOTETomek improved the AUC of the three classification models to different degrees (P<0.05). Compared with the logistic, SVM, and RF models trained on the imbalanced data, SMOTE at any sampling rate improved the AUC of the logistic and SVM models, SMOTEENN increased the AUC of the logistic and SVM models by 1.32% and 2.63%, respectively, and SMOTETomek increased the AUC of the RF model by 4.94%. RUS and ADASYN did not significantly improve the AUC of the classification models.

2. Prediction of glycemic control in middle-aged and elderly diabetic patients: (1) There were 9 possible influencing factors. The risk factors were advanced age, disease course ≥2 years, hypertension, overweight and obesity, elevated TG, and reduced HDL-C; the protective factors were urban residence, exercise, and having received a physician's advice. (2) On the imbalanced glycemic control dataset, the logistic, SVM, and RF classification models had accuracies of 83.67%, 83.67%, and 73.67%, sensitivities of 12.50%, 0, and 0, specificities of 97.56%, 100%, and 100%, G-means of 0.3493, 0, and 0, and AUCs of 0.7226, 0.7012, and 0.6662, respectively. (3) Several resampling algorithms improved the sensitivity, G-mean, and F1 score of the logistic, SVM, and RF classification models. ADASYN, SMOTEENN, and SMOTETomek improved the AUC of the three classification models to different degrees (P<0.05). Compared with the logistic, SVM, and RF models trained on the imbalanced data, ADASYN increased the AUC of the logistic model by 2.13%, SMOTEENN increased the AUC of the logistic model by 3.05%, and SMOTETomek increased the AUC of the RF model by 2.13%. RUS and SMOTE did not significantly increase the AUC of the classification models.

Conclusions: 1. The imbalance in the diabetes onset data had an important impact on the classification models, and the three classifiers built on the original data could not identify diabetic patients well. The SMOTE, SMOTEENN, and SMOTETomek algorithms handled the imbalanced diabetes data better and improved the predictive performance of the diabetes classification models. 2. The imbalance in the glycemic control data of diabetic patients had an important impact on the classification models, and the three classifiers built on the original data could not identify the population with poorly controlled glucose well. The ADASYN, SMOTEENN, and SMOTETomek algorithms handled the imbalanced glycemic control data better and improved the predictive performance of the glycemic control classification models.
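
The pattern in the results above (accuracy above 95% alongside a sensitivity and G-mean of 0) follows directly from how these metrics are defined. The short Python sketch below is only an illustration, not the thesis code: the function name `confusion_metrics` and the counts in the first example are hypothetical, and G-mean is assumed to be the usual geometric mean of sensitivity and specificity. Under that assumption, the sensitivity and specificity reported for the logistic model on the imbalanced diabetes data reproduce the reported G-mean of 0.2262.

```python
import math


def confusion_metrics(tp, fn, tn, fp):
    """Compute common binary-classification metrics from confusion-matrix counts."""
    def ratio(num, den):
        # Guard against empty denominators (e.g. no predicted positives).
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)  # recall on the positive (minority) class
    specificity = ratio(tn, tn + fp)  # recall on the negative (majority) class
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, tp + fn + tn + fp),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
        "g_mean": math.sqrt(sensitivity * specificity),
    }


# Hypothetical counts (not from the thesis): 1000 subjects, 37 with diabetes,
# and a classifier that labels every subject as non-diabetic.
all_negative = confusion_metrics(tp=0, fn=37, tn=963, fp=0)
print(all_negative["accuracy"], all_negative["sensitivity"], all_negative["g_mean"])

# Recompute G-mean from the rounded sensitivity (5.17%) and specificity (98.95%)
# reported for the logistic model on the imbalanced diabetes data.
g_mean_logistic = math.sqrt(0.0517 * 0.9895)
print(round(g_mean_logistic, 4))
```

In the hypothetical all-negative case, accuracy stays high because the majority class dominates the count, while G-mean collapses to 0 as soon as either class is missed entirely. That is why the comparison of resampling algorithms relies on sensitivity, G-mean, F1 score, and AUC rather than on accuracy alone.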

Keywords/Search Tags:

Middle-aged and elderly diabetes, Resampling algorithm, Imbalanced classification, Disease diagnosis, Blood glucose control

Related items

1	Correlation Analysis Of Triglycerides And Blood Glucose In Middle-aged And Elderly Hypertensive Patients In Rural Areas Of Yuexi County
2	The Status Of Vitamin D And Its Relationship With Glucose Tolerance Among Middle-aged And Elderly Individuals
3	Statistical Analysis Of The Safety And Effectiveness Of A Chinese Medicine Injection Based On Machine Learning
4	Relationship Between Weight Change And The Changes In Cardiovascular Risk Factors In Middle-aged And Elderly Chinese People
5	Cohort Study On Influence Of Behavior Change On Blood Glucose Among Middle-aged And Elderly People In Rural Areas Of Changchun
6	The Effects Of Stratification Vitamin D Supplementation On Vitamin D Nutritional Status In Middle-aged And Elderly People In Beijing Area And Its Effects And Mechanisms On Metabolic Related Indexes
7	Relationship Between Different Glucose Metabolism And Chronic Kidney Disease Among Middle-aged And Elderly Individuals In Lanzhou
8	Effects Of 12 Weeks Of Tai Chi Exercise On Inhibitory Control And Conversion Function In Prediabetic Middle-aged And Older Adults
9	Correlation Analysis Between Baseline Serum Uric Acid Level With Impaired Fasting Glucose And Diabetes In Middle-aged And Elderly People
10	The Study Of The Diagnostic Values Of Glycosylated Hemoglobin On Abnormal Glucose Metabolism In The Middle Aged And Elderly Population