Study on Disease Diagnosis Based on Relief Feature Selection and Mixed Kernel SVM

Posted on: 2018-12-15
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Ma
Full Text: PDF
GTID: 2334330536965897
Subject: Computer Science and Technology
Abstract/Summary:
Medical diagnosis is the process by which a doctor examines a patient, classifies the cause of the patient's disease, and uses that classification to develop a treatment plan. In essence, this is a classification problem, also known as pattern recognition. Existing classification methods include the Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Neural Network (NN), and Decision Tree algorithms. For pattern recognition problems involving small samples, nonlinearity, and high-dimensional data, SVM is highly robust and has good recognition and adaptation ability. When a classification model is constructed, its ability to learn from the training samples and to generalize to the test data is determined by three factors: the preprocessing of the original data set, the choice of kernel function, and the kernel parameters.

The main problems in SVM classification at present are as follows. (1) SVM typically adopts a single kernel function, either a global kernel or a local kernel. A global kernel generalizes well but has weak learning ability. A local kernel is the opposite: it has strong learning ability but poor generalization. As a result, an SVM with a single kernel cannot achieve strong learning ability and strong generalization ability at the same time. (2) There are two main approaches to choosing SVM parameters: the traditional grid search method and heuristic algorithms. Grid search reliably finds the best parameters on its grid, but it is time-consuming and inefficient. Heuristic algorithms are efficient, but the parameters they find are less accurate than those from grid search; the genetic algorithm, for example, cannot guarantee the optimal solution.

To improve the classification performance of SVM, this thesis focuses on the following aspects. (1) Feature selection with the Relief algorithm. In disease diagnosis, a patient's clinical features differ in how strongly they correlate with the disease, and a physician cannot quantify the degree of association between each feature and the disease. Therefore, to improve diagnostic accuracy, a feature selection algorithm is used to calculate the weight of each feature, that is, the degree of correlation between each clinical symptom and the disease. (2) Construction of a mixed kernel function that combines a global kernel and a local kernel and so offers better learning ability and generalization ability. (3) Combinatorial optimization of the kernel function parameters. The genetic algorithm is first used to find the approximate range of the optimal solution, and grid search then performs a second, more accurate search within that small range. This approach greatly reduces the computing time and also finds a better solution than the genetic algorithm alone.

The experiments use MATLAB R2015b and the LIBSVM toolkit developed by Professor Chih-Jen Lin in Taiwan for modeling. The thesis analyzes the MATLAB development environment, the interface configuration of the LIBSVM toolkit, the choice of kernel function and its parameters, the construction of the mixed kernel function, and the combinatorial optimization of the parameters. The Heart Disease and Breast Cancer data sets from the UCI repository are used to construct and verify the disease diagnosis model.
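As an illustration of the first two ideas in the abstract, the following is a minimal Python sketch, not the thesis's MATLAB/LIBSVM implementation. It assumes the basic two-class Relief variant with L1 distance, and it assumes a polynomial kernel as the global kernel and an RBF kernel as the local kernel, blended by a convex weight `lam`; the abstract does not name the specific kernels or the blending rule. The function names `relief_weights` and `mixed_kernel` and all default parameter values are hypothetical.

```python
import numpy as np


def relief_weights(X, y, n_iter=100, seed=0):
    """Basic two-class Relief (assumed variant): a feature gains weight when it
    differs on the nearest sample of the other class (miss) and loses weight
    when it differs on the nearest sample of the same class (hit).
    Assumes every class contains at least two samples."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Scale each feature to [0, 1] so per-feature differences are comparable.
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0
    Xs = (X - lo) / span
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        i = rng.integers(len(Xs))
        dist = np.abs(Xs - Xs[i]).sum(axis=1)  # L1 distance to every sample
        dist[i] = np.inf                       # never pick the sample itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        w += (np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])) / n_iter
    return w


def mixed_kernel(A, B, lam=0.5, degree=2, gamma=0.5):
    """Convex combination of a polynomial (global) and an RBF (local) kernel.
    The kernel pair and the weight lam in [0, 1] are illustrative assumptions."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    dot = A @ B.T
    poly = (dot + 1.0) ** degree
    # Squared Euclidean distances, clipped at zero to absorb rounding error.
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * dot
    rbf = np.exp(-gamma * np.maximum(sq, 0.0))
    return lam * poly + (1.0 - lam) * rbf
```

With `lam` between 0 and 1 the result is a weighted sum of two valid kernels, so the Gram matrix stays positive semidefinite and can be handed to an SVM solver that accepts precomputed kernels. In a pipeline like the one the abstract describes, features with low Relief weights would be dropped first, and `lam`, `degree`, and `gamma` would then be tuned together with the SVM penalty parameter.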
Keywords/Search Tags: Relief feature selection, SVM, combinatorial optimization, mixed kernel function