Study On SVMs-based Classification Of Gene Expression Data

Posted on: 2007-07-15
Degree: Master
Type: Thesis
Country: China
Candidate: C Zhan
Full Text: PDF
GTID: 2178360182480913
Subject: Computer application technology
Abstract/Summary:
Gene microarray technology is a major advance in molecular biology with far-reaching influence on bioinformatics, for which it provides an important research method. Microarrays make it feasible to obtain large amounts of gene expression data. These data capture gene expression patterns under given conditions and support deeper study of the underlying biological processes.

Support vector machines (SVMs) are a machine learning method based on statistical learning theory. SVMs address small-sample problems by replacing empirical risk minimization (ERM) with structural risk minimization (SRM). Moreover, nonlinear problems are turned into linear ones by mapping the low-dimensional original space to a high-dimensional feature space, and the use of a kernel function makes the algorithm easy to implement. Because of these advantages, SVMs have become an active topic in machine learning theory and have been applied successfully in many areas.

Gene microarray expression data are high-dimensional, have few samples, and are nonlinear, which poses a new challenge for some traditional machine learning methods; their analysis has become a focus of bioinformatics research. SVM-based classification provides an effective way to analyze gene expression data. This thesis focuses on SVM classification algorithms for gene expression data and proposes improvements that address existing problems in those algorithms and models. It improves classification of gene expression data in two respects: feature selection and the SVM classification algorithm.

Gene expression data sets are typically "few samples, high dimensionality". To address this problem, this thesis improves classification accuracy through feature selection. We propose a new recursive feature elimination method: correlativity-based RFE. By calculating the correlativity between genes, the new method seeks minimum redundancy while avoiding the deletion of the genes that most dominate the target phenotypes. The new feature selection approach achieves higher classification accuracy, and the feature selection process takes less time.

Based on an analysis of the traditional algorithm, we also make improvements to the sequential minimal optimization (SMO) algorithm to improve classification accuracy and training speed. The algorithm uses a radial basis function kernel and optimizes SVM classification performance by adjusting its parameters. Experimental results show that the new algorithm achieves higher classification accuracy than the traditional algorithm.
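
As a rough illustration of the radial basis function kernel mentioned above, the following minimal sketch computes the standard form K(x, z) = exp(-gamma * ||x - z||^2) and shows how the width parameter changes the similarity between samples. It is not the thesis's implementation: the function names `rbf_kernel` and `gram_matrix`, the parameter name `gamma`, and the toy expression values are all assumptions made for illustration.

```python
import math


def rbf_kernel(x, z, gamma):
    """Standard RBF kernel: exp(-gamma * squared Euclidean distance)."""
    if len(x) != len(z):
        raise ValueError("vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def gram_matrix(samples, gamma):
    """Kernel (Gram) matrix over all pairs of samples."""
    return [[rbf_kernel(a, b, gamma) for b in samples] for a in samples]


# Hypothetical toy expression profiles (3 samples x 4 genes), not real data.
samples = [
    [0.1, 1.2, 0.4, 2.0],
    [0.2, 1.0, 0.5, 1.8],
    [2.5, 0.1, 1.9, 0.3],
]

# Print the first row of the Gram matrix for several kernel widths
# (the gamma values are arbitrary illustrative choices).
for gamma in (0.01, 0.1, 1.0):
    K = gram_matrix(samples, gamma)
    print(gamma, [round(v, 4) for v in K[0]])
```

With a small `gamma` all samples look similar to one another, while a large `gamma` makes the kernel value fall off quickly with distance. This is one reason the kernel width is usually tuned together with the SVM penalty parameter when adjusting parameters for classification performance.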
Keywords/Search Tags: Bioinformatics, Gene Expression Data, Statistical Learning Theory, Support Vector Machine, Feature Selection