Research On Risk Prediction Of Diabetes Based On Random Forest And Support Vector

Posted on:2020-04-05

Degree:Master

Type:Thesis

Country:China

Candidate:Q Miao

Full Text:PDF

GTID:2404330596496911

Subject:Control Science and Engineering

Abstract/Summary:

Diabetes requires long-term treatment, and no therapy takes effect immediately. As diabetes worsens, serious complications can follow, such as retinal dysfunction, an increased risk of cerebral infarction, and coronary artery disease. Timely detection of prediabetes is therefore extremely important for controlling the progression of diabetes, but its symptoms are not obvious and it is difficult to identify prediabetes from a single indicator. Adding multiple indicators to a routine physical examination, however, costs considerable time and money. An effective mathematical model can instead be built to help doctors make reliable judgments about prediabetes and thereby raise its diagnosis rate. Many studies have shown that the support vector machine (SVM) can effectively classify nonlinear diabetes data, and the random forest algorithm can help an SVM model identify the most relevant features in feature sets with small marginal effects and complex interactions. In this thesis, an SVM is trained on diabetes data to obtain a classification model, and a more adaptive prediction model is obtained by combining it with the random forest's ability to estimate feature importance. The main work of this thesis is as follows:

(1) Diabetes data involve multiple factors, and these factors are correlated with the prediction target to different degrees. Because irrelevant features adversely affect the prediction model, an improved random forest feature screening method is proposed. The method uses the random forest algorithm to compute the average permutation importance of each feature and ranks the features after weighting. A wrapper evaluation method combined with backward elimination then selects the optimal feature subset. Experimental results show that the method effectively identifies and eliminates redundant or irrelevant features.

(2) The traditional SVM algorithm uses a single kernel function, and each single kernel has its own limitations in data analysis. A multi-kernel function that combines the advantages of single kernel functions is therefore proposed for building the prediction model. An appropriate combination of kernel functions is selected based on an analysis of the diabetes data, and the particle swarm optimization (PSO) algorithm is used to find the optimal parameter values. Because the sample data are imbalanced, the experimental data set is also resampled before modeling.

(3) Because large differences in the importance of features lower the reliability of the model, a diabetes prediction model based on random forest and a multi-kernel SVM is proposed. First, a suitable feature subset is selected using the random forest algorithm together with the classification results of the multi-kernel SVM; the multi-kernel SVM model is then trained on the diabetes data and used for prediction. Feature weighting strengthens the influence of diabetes-related features on the classification result. Experimental results show that the method improves the overall classification accuracy and reliability of the model.
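
Point (1) relies on permutation importance. The thesis gives no code, so the following is a minimal, model-agnostic sketch, assuming a held-out data set and a `predict` callable that stands in for a trained random forest (the names `predict`, `permutation_importance`, and `accuracy` are hypothetical). Breiman's original random forest measure instead permutes each feature on the out-of-bag samples of every tree and averages the drop across trees; this simplified version averages over repeated shuffles of a single evaluation set.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean accuracy drop when each feature column is shuffled.

    predict: callable mapping a list of rows to a list of labels
    (hypothetical stand-in for a trained random forest).
    """
    rng = random.Random(seed)
    X = [list(row) for row in X]
    baseline = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)  # average over repeats
    return importances


if __name__ == "__main__":
    # Toy model that only looks at feature 0.
    X = [[i / 20, (i * 7 % 20) / 20] for i in range(20)]
    y = [1 if row[0] > 0.5 else 0 for row in X]
    model = lambda rows: [1 if r[0] > 0.5 else 0 for r in rows]
    print(permutation_importance(model, X, y))  # second value is exactly 0.0
```

Because the toy model ignores the second feature, shuffling that column never changes its predictions, so its importance is exactly zero; an irrelevant feature of this kind is what the subsequent backward elimination is meant to remove.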
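
The wrapper-based backward elimination in point (1) can be sketched as follows, assuming features are removed in ascending order of importance and that `evaluate(subset)` (hypothetical) returns a validation score, such as the cross-validated accuracy of an SVM trained on that subset. This ranking-guided variant evaluates one subset per step; a full sequential backward search would instead try removing every remaining feature at each step, at a much higher cost.

```python
def backward_elimination(importances, evaluate, min_features=1):
    """Ranking-guided backward elimination with a wrapper score.

    importances: one score per feature (higher = more important).
    evaluate: callable(list_of_feature_indices) -> score, higher is better
              (hypothetical stand-in for cross-validated SVM accuracy).
    Returns the best-scoring subset (sorted indices) and its score.
    """
    removal_order = sorted(range(len(importances)), key=lambda j: importances[j])
    current = list(range(len(importances)))
    best_subset, best_score = list(current), evaluate(current)
    for j in removal_order:
        if len(current) <= min_features:
            break
        current = [f for f in current if f != j]  # drop the weakest remaining feature
        score = evaluate(current)
        if score > best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score


if __name__ == "__main__":
    useful = {0, 2}  # toy ground truth: only features 0 and 2 matter

    def toy_score(subset):
        s = set(subset)
        return len(s & useful) - 0.1 * len(s - useful)

    print(backward_elimination([0.9, 0.01, 0.5, 0.02], toy_score))  # ([0, 2], 2.0)
```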
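
For point (2), one common way to build a multi-kernel function is a convex combination of single kernels: a non-negative weighted sum of positive semidefinite kernels is itself positive semidefinite, so the result is still a valid SVM kernel. The sketch below assumes an RBF plus polynomial pair with a single mixing weight; the thesis only says that an appropriate combination is chosen from the data, so the kernel types and the parameters `weight`, `gamma`, `coef0`, and `degree` are illustrative.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def poly_kernel(x, z, coef0=1.0, degree=2):
    """Polynomial kernel (x . z + coef0) ** degree."""
    return (sum(a * b for a, b in zip(x, z)) + coef0) ** degree


def combined_kernel(x, z, weight=0.5, gamma=0.5, coef0=1.0, degree=2):
    """Convex combination weight * RBF + (1 - weight) * polynomial."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * rbf_kernel(x, z, gamma) + (1.0 - weight) * poly_kernel(x, z, coef0, degree)


def gram_matrix(X, Z, **params):
    """Kernel matrix with entry [i][j] = combined_kernel(X[i], Z[j])."""
    return [[combined_kernel(x, z, **params) for z in Z] for x in X]


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(combined_kernel([1.0, 0.0], [1.0, 0.0]))  # 0.5 * 1 + 0.5 * 4 = 2.5
    print(gram_matrix(X, X, weight=0.3))
```

A Gram matrix built this way can be passed to scikit-learn's `SVC(kernel="precomputed")`, using the training matrix of shape (n_train, n_train) for `fit` and the test-versus-training matrix of shape (n_test, n_train) for `predict`.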
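
The particle swarm optimization step in point (2) can be illustrated with a basic global-best PSO. This is a sketch under assumptions: the inertia weight `w = 0.7` and acceleration coefficients `c1 = c2 = 1.5` are common textbook choices rather than values from the thesis, and in practice `objective` would be a hypothetical function returning the cross-validation error of the multi-kernel SVM for a candidate parameter vector (for example, log C, log gamma, and the kernel weight). A simple quadratic stands in for it here.

```python
import random


def pso_minimize(objective, bounds, n_particles=30, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimization over a box given by bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])  # pull toward own best
                             + c2 * r2 * (gbest[d] - pos[i][d]))    # pull toward swarm best
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay inside the box
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


if __name__ == "__main__":
    # Stand-in objective with its minimum at (1, -2).
    sphere = lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2
    best, value = pso_minimize(sphere, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
    print(best, value)
```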
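
Point (2) states that the imbalanced data set is sampled before modeling but does not name the method. Random oversampling of the minority class is one simple option and is sketched below as an assumption; undersampling the majority class or synthetic methods such as SMOTE are alternatives.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of each minority class until every
    class has as many samples as the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        members = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            j = rng.choice(members)  # sample with replacement from this class
            X_out.append(X[j])
            y_out.append(label)
    return X_out, y_out


if __name__ == "__main__":
    X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
    y = [0, 0, 0, 0, 1]
    X_bal, y_bal = random_oversample(X, y)
    print(Counter(y_bal))  # Counter({0: 4, 1: 4})
```

To avoid optimistic estimates, resampling of this kind is usually applied only to the training split, after the train/test split, so that duplicated minority samples never appear in the evaluation data.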
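
Point (3) uses feature weighting to emphasize diabetes-related features. One way to realize this, assumed here rather than taken from the thesis, is to scale each squared feature difference inside an RBF kernel by a non-negative weight derived from the permutation importances; this is equivalent to multiplying feature j by the square root of w_j before applying an ordinary RBF kernel. The helper `importances_to_weights` is hypothetical: it clips negative importances to zero and divides by the largest one.

```python
import math


def importances_to_weights(importances):
    """Hypothetical helper: clip negative importances to 0 and scale so the
    most important feature gets weight 1."""
    clipped = [max(v, 0.0) for v in importances]
    top = max(clipped)
    if top == 0.0:
        return [1.0] * len(importances)  # no signal: fall back to equal weights
    return [v / top for v in clipped]


def weighted_rbf_kernel(x, z, feature_weights, gamma=1.0):
    """RBF kernel exp(-gamma * sum_j w_j * (x_j - z_j)^2) with w_j >= 0."""
    sq = sum(w * (a - b) ** 2 for w, a, b in zip(feature_weights, x, z))
    return math.exp(-gamma * sq)


if __name__ == "__main__":
    w = importances_to_weights([0.4, 0.1, -0.05])
    print(w)  # [1.0, 0.25, 0.0]
    # The third feature has weight 0, so differences in it are ignored.
    print(weighted_rbf_kernel([0.0, 0.0, 0.0], [0.0, 0.0, 9.0], w))  # 1.0
```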

Keywords/Search Tags:

support vector machine, feature weighting, multi-kernel function, random forest algorithm

Related items

1	Research And Application Of Brain Image Analysis Algorithm Based On Multi-modal Data
2	Research On Medical Image Mining Based On Improved Multi Kernel Support Vector Machine
3	Analysis Of Cancer Gene Data Base On Random Forest And Support Vector Machine
4	Selection Of Tb Susceptible Genes Based On Improved Random Forest Algorithm
5	The Application Of Random Forest And Support Vector Machine In High Dimensional Transcriptome Data Of Breast Cancer
6	Application Of Support Vector Machine In Prediction Of Diabetes Genetic Risk
7	Research On ECG Signal Processing Method Based On Machine Learning
8	Research And Application Of FCM Based Multi-Kernel Support Vector Machine
9	Data Feature Awareness Of FECG Extraction Algorithm
10	Research On The Image Classification Of Brain Glioma Based On Machine Learning