
Prediction Of Diabetes Based On GA-LightGBM’s Stacking Model Fusion

Posted on: 2024-06-26 | Degree: Master | Type: Thesis
Country: China | Candidate: H K Yang | Full Text: PDF
GTID: 2544307163462994 | Subject: Electronic information
Abstract/Summary:
In recent years, the number of patients with diabetes has been increasing, and patients are tending to be younger. Although diabetes can be quickly diagnosed and treated through a hospital physical examination, patients have no obvious symptoms in the early stage and may not go for an examination in time, so the disease often cannot be identified early. With the rapid development of artificial intelligence, predicting the risk of diabetes through machine learning can help detect it in time and remind patients to go to the hospital as soon as possible. This is an urgent problem in the early diagnosis of diabetes.

This paper takes the data set from the Xun Fei competition as its research object, preprocesses it, and uses the Filter method and PCA for feature selection. It then proposes a Stacking ensemble model based on GA-LightGBM to train on and predict the data set, compares it with other common algorithms, and uses ensemble learning models to optimize the fit in order to improve the prediction accuracy on the samples. The main problem addressed is how to use machine learning methods to predict the risk of diabetes in the data set, so as to improve the ability to diagnose diabetes early.

First, the diabetes data set is preprocessed and its missing values are filled in; the Pearson correlation coefficient method is used to screen the features, and PCA is then used to calculate the contribution rate of each feature and eliminate the features unrelated to the label. Second, the commonly used single-model algorithms (logistic regression, KNN, decision tree, SVM, and ANN) are compared experimentally, and the decision tree model is found to have the better fitting effect. Third, the Stacking ensemble model based on GA-LightGBM is constructed and used for experimental prediction. Fourth, the ensemble learning algorithms XGBoost, Random Forest, LightGBM with default parameters, and GA-LightGBM are trained on the preprocessed diabetes data set, and their predictions are compared experimentally with those of the decision tree, the best-performing single model. Finally, it is concluded that GA-LightGBM and Random Forest perform well on the five evaluation indicators.

The Stacking model fusion algorithm based on GA-LightGBM adopted in this paper performs well in predicting disease risk on the diabetes data set, obtaining an AUC of 0.9940, which further verifies the fitting effect of the model and demonstrates the effectiveness of the algorithm on this data set.
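The Pearson correlation screening step mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's code: the column names, the toy values, and the 0.1 cutoff are all assumptions made for illustration, since the abstract does not state the data set's columns or the threshold actually used.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def pearson_filter(features, label, threshold=0.1):
    """Return {name: r} for features whose |r| against the label is >= threshold.

    The threshold is an assumed value, not one taken from the thesis.
    """
    scores = {name: pearson_r(col, label) for name, col in features.items()}
    return {name: r for name, r in scores.items() if abs(r) >= threshold}


# Toy data invented for illustration only (not from the thesis data set).
label = [0, 0, 0, 1, 1, 1]
features = {
    "bmi": [20, 22, 21, 30, 31, 29],  # hypothetical column that tracks the label
    "noise": [5, 1, 3, 3, 1, 5],      # hypothetical column, uncorrelated with it
}

selected = pearson_filter(features, label)
print(selected)
```

On this toy input only the hypothetical "bmi" column survives the filter. In a pipeline like the one the abstract outlines, the surviving columns would then be passed on to PCA and to the classifiers.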
Keywords/Search Tags: binary classification, filter method, Pearson correlation coefficient, SVM, GA-LightGBM, Stacking