
Research Of Support Vector Machine For The Analysis Of Gene Expression Data

Posted on: 2014-02-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
Full Text: PDF
GTID: 2248330395491984
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
With the rapid development of DNA microarray technology, simultaneously measuring the expression levels of thousands of genes has become a reality. How to extract useful biological information from gene expression data quickly and accurately using data mining techniques has become a research focus in recent years. Clustering analysis has been widely applied to gene expression data because of its characteristics, but every clustering algorithm has its own shortcomings, so a new analysis method is needed.

The support vector machine (SVM), a supervised learning algorithm, performs well on the classification of high-dimensional data with small sample sizes and has been favored by many researchers. Tumor gene data has exactly these characteristics. This thesis therefore studies tumor gene classification based on SVM. The main contents and contributions are as follows:

(1) Although SVM is suitable for high-dimensional data analysis, gene expression data often has thousands of dimensions, so the time cost of the algorithm is quite high. Dimension reduction based on principal component analysis (PCA) and kernel principal component analysis (KPCA) not only shortens the running time but also consolidates the useful feature information. Using three groups of experimental data, this thesis compares the classification accuracy of PCA-SVM and KPCA-SVM over different parameter search ranges when the cumulative contribution rate reaches 100%, 95%, and 90%.
The experimental results show that the classification accuracy of PCA-SVM is unaffected by changes in the cumulative contribution rate, whereas the classification accuracy of KPCA-SVM decreases or remains unchanged as the cumulative contribution rate drops.

(2) In parameter optimization based on grid search, finding the globally optimal parameters usually requires a large parameter range and a small search step, so good classification accuracy comes at the expense of time efficiency. This thesis proposes an improved grid search method that narrows the search range through a halving search. Experiments on three groups of data sets show that, compared with the traditional grid search algorithm, the proposed algorithm greatly reduces the search time while classification accuracy improves or stays at the same level.

(3) Through a performance analysis of the standard support vector machine (C-SVM) algorithm, we show theoretically that SVM does not classify well when the number of samples per class is unbalanced. When C-SVM is applied to data whose class sizes differ widely, the classification accuracy on the training samples is very high but the accuracy on the prediction data is low, and the class with more samples is classified more accurately than the class with fewer samples. Building on the distance between a sample and its class center presented in this thesis, we introduce the distance relationships between a sample and the other samples, and propose a penalty-weighted support vector machine algorithm (WC-SVM). The algorithm takes into account the density distribution of each class of samples and assigns different penalty weights to different samples, which compensates for the low contribution of the smaller class to the hyperplane.
Experiments show that WC-SVM improves classification accuracy on the classes with fewer samples and generally reduces misclassification.
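
The cumulative contribution rate in item (1) is the share of total variance retained by the leading principal components. The following is a minimal sketch of how a target rate translates into a number of retained components, assuming the eigenvalues of the covariance (or kernel) matrix are already available. The function name and the sample eigenvalues are hypothetical and are not taken from the thesis.

```python
def components_for_rate(eigenvalues, target_rate):
    """Smallest number of leading components whose cumulative
    contribution rate (share of total variance) reaches target_rate."""
    if not 0.0 < target_rate <= 1.0:
        raise ValueError("target_rate must be in (0, 1]")
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(ordered)
    if total <= 0.0:
        raise ValueError("eigenvalues must have a positive sum")
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        # Small tolerance guards against floating-point round-off.
        if running / total >= target_rate - 1e-12:
            return k
    return len(ordered)


# Illustrative eigenvalues only (assumption, not thesis data).
eigenvalues = [5.0, 3.0, 1.0, 0.5, 0.5]
for rate in (1.00, 0.95, 0.90):
    print(rate, components_for_rate(eigenvalues, rate))
```

Lowering the target rate discards the trailing components first, which is why the SVM input dimension shrinks as the rate goes from 100% to 90%.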
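
The abstract does not spell out the improved grid search of item (2) beyond saying that it halves the search range. The sketch below is one possible reading, not the thesis's algorithm: a coarse grid is evaluated, then each parameter range is halved around the best point and the step is halved, and this repeats until the step reaches a minimum. The names `halving_grid_search`, `frange`, and `toy_score`, the starting ranges, and the step sizes are all assumptions. In practice `score` would be something like the cross-validated accuracy of an SVM at the given (log2 C, log2 gamma).

```python
def frange(lo, hi, step):
    """Evenly spaced grid points from lo up to at most hi."""
    count = int((hi - lo) / step + 1e-9)
    return [lo + i * step for i in range(count + 1)]


def halving_grid_search(score, c_range, g_range, step=4.0, min_step=0.5):
    """Coarse-to-fine search over (log2 C, log2 gamma).

    Each round evaluates a grid, then halves both range widths around
    the best point found so far and halves the step.
    Returns ((best_score, log2_c, log2_g), number_of_evaluations).
    """
    (c_lo, c_hi), (g_lo, g_hi) = c_range, g_range
    best = None
    evaluations = 0
    while step >= min_step:
        for log2_c in frange(c_lo, c_hi, step):
            for log2_g in frange(g_lo, g_hi, step):
                value = score(log2_c, log2_g)
                evaluations += 1
                if best is None or value > best[0]:
                    best = (value, log2_c, log2_g)
        # New range: half the old width, centred on the best point,
        # clipped so it never leaves the previous range.
        _, best_c, best_g = best
        c_quarter = (c_hi - c_lo) / 4.0
        g_quarter = (g_hi - g_lo) / 4.0
        c_lo, c_hi = max(c_lo, best_c - c_quarter), min(c_hi, best_c + c_quarter)
        g_lo, g_hi = max(g_lo, best_g - g_quarter), min(g_hi, best_g + g_quarter)
        step /= 2.0
    return best, evaluations


def toy_score(log2_c, log2_g):
    """Hypothetical stand-in for cross-validated accuracy, peaked at (3, -5)."""
    return -((log2_c - 3.0) ** 2 + (log2_g + 5.0) ** 2)


best, evaluations = halving_grid_search(toy_score, (-5.0, 15.0), (-15.0, 3.0))
print(best, evaluations)
```

A scheme like this evaluates far fewer points than a single fine grid over the full range, but it can settle on a local optimum if the accuracy surface has several peaks, so the coarse first pass should not be too sparse.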
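
The abstract does not give the WC-SVM formulation of item (3). For orientation, a soft-margin SVM with per-sample penalty weights is commonly written as shown below. The symbol s_i (a positive weight for sample i) and the feature map \phi are notation assumed here for illustration; how the thesis derives its weights from class-center distances and sample density is not stated in the abstract.

```latex
\min_{w,\,b,\,\xi}\quad \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n} s_i\,\xi_i
\qquad\text{subject to}\qquad
y_i\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i,\quad
\xi_i \ge 0,\quad i = 1,\dots,n
```

Setting every s_i = 1 recovers the standard C-SVM. Giving larger s_i to samples of the smaller class makes their margin violations more costly, which is the general mechanism by which a weighted penalty can offset class imbalance.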
Keywords/Search Tags: Gene expression data, grid search, principal component analysis, kernel principal component analysis, support vector machine, penalty weighted