
Research On Prediction Of Diabetes Based On Improved Neighbourhood Rough Set And Random Forest Algorithm

Posted on: 2019-05-17; Degree: Master; Type: Thesis
Country: China; Candidate: W Hu; Full Text: PDF
GTID: 2394330545968064; Subject: Management Science and Engineering
Abstract/Summary:
Diabetes has become one of the most harmful chronic diseases in the world, and China has the largest diabetic population of any country. Its prevalence has risen year by year and seriously affects human health. As medical services continue to improve, people also expect greater efficiency and accuracy in medical diagnosis. At present, diagnosis relies mainly on laboratory tests ordered by a doctor. However, diabetes has a long latent period, and medical resources are unevenly distributed across regions.

In response to these problems, this thesis uses a hospital diabetes dataset obtained from the national population and health science data sharing service platform. To address the shortcomings of the single-attribute importance calculation in neighborhood rough sets, it proposes an improved neighborhood rough set (INRS) attribute reduction algorithm. The algorithm is then combined with a random forest classifier and applied to the diabetes data to build a high-precision diabetes prediction model, which aims to support doctors' clinical diagnosis and disease research and to improve clinical diagnosis and treatment. MATLAB and WEKA are used to implement the INRS attribute reduction and the random forest classification of the diabetes dataset.

To assess the effectiveness of the combined model, it is evaluated in terms of attribute reduction and classifier selection. First, the effect of attribute reduction is analyzed. Feature sets are built from the unreduced data and from the data reduced by the rough set, neighborhood rough set, and INRS algorithms, and each is evaluated with a random forest classifier. The reduced data performed clearly better than the unreduced data and gave more accurate classification. Although INRS did not reduce the number of attributes further, it achieved the best classification accuracy, indicating that the improved algorithm is effective. Second, the effect of classifier selection is analyzed by modeling the INRS-reduced data with random forest, BP neural network, C4.5 decision tree, and Naïve Bayes classifiers. A comparison of modeling time, error, classification accuracy, and ROC area shows that the random forest classifier gives the best overall performance.

In summary, the diabetes prediction model that combines the improved neighborhood rough set with random forest reaches a classification accuracy of 92.05%, and its overall performance is very good. It is hoped that a diabetes prediction function of this kind can be added to hospital diagnosis and treatment systems to assist physicians in making well-founded diagnoses and decisions.
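The attribute reduction step described above can be illustrated with a minimal Python sketch. It shows only the standard neighborhood rough set dependency measure with forward greedy selection, not the improved (INRS) variant, whose details the abstract does not give. The function names `dependency` and `greedy_reduct`, the radius `delta`, the stopping threshold `min_gain`, and the toy data are all assumptions made for illustration. Features are assumed to be scaled to [0, 1].

```python
import math


def dependency(X, y, attrs, delta):
    """Fraction of samples whose delta-neighborhood (measured on attrs only)
    contains no sample of a different class, i.e. |positive region| / n."""
    if not attrs:
        return 0.0
    n = len(X)
    consistent = 0
    for i in range(n):
        pure = True
        for j in range(n):
            # Euclidean distance restricted to the chosen attributes.
            dist = math.sqrt(sum((X[i][a] - X[j][a]) ** 2 for a in attrs))
            if dist <= delta and y[i] != y[j]:
                pure = False
                break
        consistent += pure
    return consistent / n


def greedy_reduct(X, y, delta=0.2, min_gain=1e-9):
    """Forward greedy reduction: repeatedly add the attribute that raises
    the dependency most, and stop when no attribute adds a real gain."""
    remaining = list(range(len(X[0])))
    selected = []
    best = 0.0
    while remaining:
        scored = [(dependency(X, y, selected + [a], delta), a) for a in remaining]
        score, attr = max(scored, key=lambda t: t[0])
        if score - best <= min_gain:
            break
        selected.append(attr)
        remaining.remove(attr)
        best = score
    return selected, best


# Hypothetical toy data: attribute 0 separates the classes,
# attribute 1 is noise, attribute 2 is constant.
X = [
    [0.05, 0.5, 0.3],
    [0.10, 0.9, 0.3],
    [0.15, 0.1, 0.3],
    [0.85, 0.5, 0.3],
    [0.90, 0.1, 0.3],
    [0.95, 0.9, 0.3],
]
y = [0, 0, 0, 1, 1, 1]

selected, gamma = greedy_reduct(X, y, delta=0.2)
print(selected, gamma)
```

In a pipeline like the one the abstract describes, the selected attribute indices would then be used to subset the data before training the random forest classifier. That step is omitted here.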
Keywords/Search Tags: improved neighborhood rough set, random forest, diabetes