| Diabetes requires long-term treatment, and no treatment takes effect immediately. As the disease progresses, serious complications can develop, such as retinal dysfunction, an increased risk of cerebral infarction, and coronary artery disease. Timely detection of pre-diabetes is therefore extremely important for controlling the development of diabetes, but the symptoms of pre-diabetes are not obvious, and it is difficult to identify pre-diabetes from a single indicator. Adding multiple indicators to a general physical examination, however, costs a lot of time and money. An effective mathematical model can therefore be established to assist doctors in making judgments about pre-diabetes, thereby increasing its diagnosis rate. Many studies have shown that the support vector machine (SVM) can effectively classify nonlinear diabetes data. Random forest algorithms can help SVM models identify the main relevant features in feature sets with small marginal effects and complex interactions. In this thesis, an SVM is trained on diabetes data to obtain a classification model, and a more adaptive prediction model is obtained by combining it with the random forest's ability to estimate feature importance. The main work of this thesis is as follows: (1) Diabetes data contain multiple factors, and the correlation between these factors and the objective function often differs. To address the adverse effect of irrelevant features on the prediction model, an improved random forest feature screening method is proposed. The method uses the random forest algorithm to calculate the average permutation importance of the features and ranks the features after weighting. The wrapper evaluation method and backward elimination are then used to screen out the optimal feature subset. The experimental results show that the method effectively identifies and eliminates redundant or irrelevant features. (2) The traditional
SVM algorithm uses a single kernel function, which has inherent limitations in data analysis. A multi-kernel function that combines the advantages of individual kernel functions is proposed for building the prediction model. Based on an analysis of the diabetes data, an appropriate combination of kernel functions is selected, and the particle swarm optimization algorithm is used to find the optimal parameter values. In addition, to address the imbalance in the sample data, the experimental data set is resampled before modeling. (3) Because significant differences in feature importance lower the reliability of the model, a diabetes prediction model based on random forest and the multi-kernel support vector machine is proposed. First, an appropriate feature subset is selected using the random forest algorithm together with the classification results of the multi-kernel support vector machine; the multi-kernel support vector machine model is then used to train on and predict the diabetes data. Feature weighting is used to strengthen the influence of diabetes-related features on the results. Experimental results show that the method improves the overall classification accuracy and reliability of the model. |
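
The multi-kernel idea in item (2) can be illustrated with a minimal sketch. The abstract does not state which kernels are combined or how they are weighted, so the choices here are assumptions made only for illustration: a Gaussian (RBF) kernel and a polynomial kernel, mixed by a convex weight. The function names (`rbf_kernel`, `poly_kernel`, `combined_kernel`, `gram_matrix`) and all parameter values are hypothetical. A convex combination of valid kernels is itself a valid kernel, which is what makes this construction usable inside an SVM.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: dominated by samples that lie close together."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def poly_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel: also responds to samples that lie far apart."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


def combined_kernel(x, y, weight=0.7, gamma=0.5, degree=2, coef0=1.0):
    """Convex combination of the two kernels (assumed form, not from the source).

    weight is the share given to the RBF kernel and must lie in [0, 1].
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return (weight * rbf_kernel(x, y, gamma)
            + (1.0 - weight) * poly_kernel(x, y, degree, coef0))


def gram_matrix(samples, **params):
    """Pairwise kernel values for a list of feature vectors."""
    return [[combined_kernel(a, b, **params) for b in samples] for a in samples]


# Toy feature vectors, not real diabetes data.
samples = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
K = gram_matrix(samples, weight=0.7)
```

In a setup like the one the abstract describes, `weight`, `gamma`, and `degree` would be the kind of parameters a particle swarm search could tune, and a Gram matrix such as `K` could be passed to an SVM implementation that accepts precomputed kernels.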