
The Study of a Classification Algorithm for Type I and Type II Diabetes Based on SMOTE and XGBoost

Posted on: 2021-04-19
Degree: Master
Type: Thesis
Country: China
Candidate: Q Y Wang
GTID: 2404330602971523
Subject: Engineering
Abstract/Summary:
Diabetes mellitus is one of the most serious diseases endangering human life and health. There are four types: type I diabetes mellitus, type II diabetes mellitus, specific types of diabetes mellitus, and gestational diabetes mellitus. Doctors diagnose patients according to their clinical manifestations and medical test results. However, type I and type II diabetes share some clinical symptoms, which makes them difficult to tell apart, and different doctors may reach different conclusions. To classify type I and type II diabetes, this paper proposes using a continuous glucose monitoring system (CGMS) to collect time-series blood glucose concentration data from diabetic patients, then extracting features from the patient data and classifying them, which provides a new model for the classification of diabetes. However, because the patient bases differ, far fewer patients are diagnosed with type I diabetes than with type II diabetes, so the sample data are class-imbalanced. This paper therefore uses SMOTE and its improved variants to process the class-imbalanced data, and then trains and tests the classification model on the processed data.

The main work of this paper is as follows:

1. Preprocess the patients' original data from the CGMS instrument and extract features of different dimensions with PCA. The PCA threshold is set to 85%, 90%, 95%, 99% and MLE, which yields five different feature groups.
2. Construct a SMOTE + XGBoost classification model for the severely class-imbalanced data of type I and type II diabetes. For each of the five feature groups obtained by PCA, SMOTE and its two improved variants, Borderline-SMOTE1 and Borderline-SMOTE2, are used to handle the class imbalance of the feature set, and an XGBoost model is then trained and tested. The CV (cross-validation) function is used to optimize the model parameters in order to improve the classification performance.
3. Verify the proposed classification model. Experiments show that the more features PCA retains, the higher the accuracy of the classification model. The final classification results of SMOTE + XGBoost and its two improved variants agree with the clinical diagnosis at rates in the range [0.9238, 0.9619], and a comparison with the GBDT and LightGBM algorithms shows that the classification model proposed in this paper achieves higher accuracy.
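
The SMOTE step named in the abstract can be illustrated with a minimal sketch. It is a simplified, standard-library-only version of the core interpolation idea (a synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority neighbours), and it is not the thesis's implementation. The function name `smote`, the toy data, and the parameters `k` and `seed` are assumptions made for illustration. The Borderline-SMOTE variants and the XGBoost training stage are not shown.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest minority neighbours (Euclidean), excluding itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Interpolate at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: 4 minority samples versus 10 majority samples.
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
majority_count = 10

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_new=majority_count - len(minority))
balanced_minority = minority + new_points
print(len(balanced_minority))
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region already occupied by the minority class. This is the property that distinguishes SMOTE from simply duplicating minority samples.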
Keywords/Search Tags: classification of diabetes, CGMS, feature extraction, PCA, category imbalance, SMOTE+XGBoost