The Classification Of DNA Sequence Based On Support Vector Machine

Posted on: 2011-03-29
Degree: Master
Type: Thesis
Country: China
Candidate: D Zhao
GTID: 2178330332974116
Subject: Computer application technology
Abstract/Summary:
With the completion of the Human Genome Project and the start of post-genome projects, a great deal of biological data has become available, and scientists must analyze it in large volumes. How to use these data to reveal their meaning is a tough challenge. The usual approach to processing a DNA sequence is first to find a mathematical model that represents the DNA and then to analyze it with other techniques.

Support Vector Machine (SVM) is a new machine learning method based on Statistical Learning Theory (SLT). It is a pattern recognition technique that acts as a pattern classifier, and its training algorithm is essentially the solution of a quadratic programming problem. SVM has distinctive advantages in small-sample, nonlinear, and high-dimensional pattern recognition, and it has been applied successfully in many fields, including text classification, image recognition, biological information processing, speech recognition, remote sensing image analysis, fault identification and prediction, time series forecasting, and information security.

In this thesis, an SVM algorithm is used to classify DNA sequences. To give the SVM algorithm data in a usable format, each DNA sequence is first expressed as a mathematical model in the form of a feature vector. The features are the content of each single base, the length of the sequence, and the frequency of occurrence of frequent patterns, extracted with a sliding window method. The SVM classifier is trained on DNA samples of known type to obtain a hyperplane, which is then used to classify DNA sequences. The classification results show that the method achieves good classification accuracy.

The SVM algorithm is implemented in MATLAB. The two-class data are taken from the ninth reference of the thesis. DNA sequences of known class are used to train the classifier. Whether to normalize the data, reduce their dimension, or apply other operations is decided according to the resulting classification accuracy, and the best hyperplane is formed in this way. The remaining 20 human DNA sequences and 182 natural DNA sequences are then classified.

Finally, multi-class classification of DNA sequences is carried out using multi-class classification theory. The DNA data come from the UCI database and are already represented as feature vectors; both the training set and the test set contain class labels. This algorithm is also implemented in MATLAB. The two sets of results show that the SVM classification algorithm is simple to apply and achieves relatively high classification accuracy. They also show that the extracted attributes of DNA sequences are very effective, and that the method can be applied to classify DNA sequences in practice.
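The feature extraction described in the abstract (single-base content, sequence length, and pattern frequencies from a sliding window) might look roughly like the sketch below. It is an illustration only, not the thesis's MATLAB code: it is written in Python, the function names `base_content`, `kmer_frequencies`, and `feature_vector` are hypothetical, and it assumes that the "frequent patterns" can be approximated by all fixed-length patterns of width `k`, which the abstract does not specify.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def base_content(seq):
    """Return the fraction of A, C, G and T in the sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    n = len(seq)
    return [counts[b] / n if n else 0.0 for b in BASES]


def kmer_frequencies(seq, k=3):
    """Slide a window of width k one base at a time and return the
    relative frequency of every possible length-k pattern.

    Assumption: all 4**k patterns are used as features; the thesis may
    instead keep only a selected set of frequent patterns.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(windows)
    total = len(windows)
    return [counts[m] / total if total else 0.0 for m in kmers]


def feature_vector(seq, k=3):
    """Concatenate base content, sequence length and pattern frequencies."""
    return base_content(seq) + [float(len(seq))] + kmer_frequencies(seq, k)


if __name__ == "__main__":
    vec = feature_vector("ACGTACGGTA", k=2)
    print(len(vec))  # 4 base fractions + 1 length + 16 two-base patterns
```

Vectors of this kind can be passed to any SVM implementation for training. The length feature is on a different scale from the frequency features, which is one reason normalization, as mentioned in the abstract, may affect classification accuracy.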
Keywords/Search Tags: SVM, Classification of DNA Sequence, feature vector, Classification Hyperplane