The Application of Support Vector Machine (SVM) in DNA Data Analysis Research

Posted on: 2016-08-23 | Degree: Master | Type: Thesis
Country: China | Candidate: X F Liu | Full Text: PDF
GTID: 2180330470968924 | Subject: Probability theory and mathematical statistics
Abstract/Summary:
Statistical learning theory matured during the 1990s into a comparatively complete theory of machine learning. Compared with earlier machine learning methods, the support vector machine (SVM), which is built on this theory, handles small-sample learning problems better, is robust, and has a low computational cost. As an implementation of the theory, the SVM algorithm has become an important tool for machine learning and knowledge mining.
Bioinformatics is an interdisciplinary field combining life science, mathematics, computer science, and other disciplines, and DNA sequences are a typical type of bioinformatics data. The launch and successful completion of the Human Genome Project strongly promoted the development of DNA sequence analysis. Studying the information contained in DNA sequence data is one of the most important topics of the post-genome era, and finding the patterns of characteristic fragments in DNA sequences is of great significance for life science and human genetics.
This thesis applies the SVM algorithm to DNA sequence classification experiments. First, features are extracted from DNA sequences of known class with a sliding-window method, and the resulting feature vectors, assembled into a feature matrix, serve as the input vectors. The SVM-based classification procedure is then implemented in R. The required R package is loaded first. Grid search with 10-fold cross-validation is then used to find the optimal parameters, and if an optimal parameter falls on the boundary of the search range during optimization, the range is widened. The optimal parameters are used to construct the SVM model. Classification experiments are run with several kernel functions, and the best kernel function is selected by statistical analysis.
The SVM classifier used in this thesis classifies well, can be applied to real DNA data classification, and shows a certain degree of generalization performance. The algorithm can also be extended to multi-class classification problems.
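As an illustration of the sliding-window feature extraction mentioned in the abstract, the following is a minimal sketch only. The abstract does not state the window width or how the features are defined, so the use of k-mer relative frequencies, the default `k=3`, and the function names `kmer_features` and `feature_matrix` are all assumptions made for this example. The thesis itself works in R; Python is used here purely for illustration.

```python
from itertools import product

ALPHABET = "ACGT"  # assumption: only the four standard bases are counted


def kmer_features(sequence, k=3):
    """Slide a window of width k along the sequence (step 1) and return
    the relative frequency of each of the 4**k possible k-mers."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        # No valid window (sequence too short or fully ambiguous).
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


def feature_matrix(sequences, k=3):
    """Stack one feature vector per sequence into a list-of-lists matrix."""
    return [kmer_features(s, k) for s in sequences]
```

Each row of the matrix returned by `feature_matrix` could then be paired with its known class label and passed to an SVM implementation as an input vector.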
Keywords/Search Tags: support vector machine (SVM) classification, DNA sequence, feature vector, kernel function, structural risk minimization