The Application Of Graphical Representation In Gene Recognition Of DNA Sequences

Posted on: 2011-03-09; Degree: Master; Type: Thesis
Country: China; Candidate: Y Li
GTID: 2178360308469344; Subject: Computer Science and Technology
Abstract/Summary:
The number of DNA sequences and bases in nucleotide sequence databases has been increasing exponentially with the progress of the Human Genome Project (HGP), so methods for analyzing and computing on genome data are needed. Gene identification is an important and fundamental part of bioinformatics. Recognizing coding and non-coding sequences is its first major task and a decisive step in genomic research. Splicing is an important process in gene expression, and errors in splicing can cause many diseases, so recognizing splice sites is another important problem. This thesis focuses on these two problems and proposes algorithms for recognizing coding/non-coding sequences and acceptor sites.
We present a novel feature representation of DNA sequences based on a graphical representation called the PZ-curve. A support vector machine (SVM) is applied to classify coding and non-coding sequences in short human genes. During identification, we propose an improved self-similar map method to address the shortage of negative sample sequences. We divide the dataset into several groups according to GC content and classify the sequences in each group separately. The results show that the proposed method achieves higher accuracy with fewer parameters.
Methods for recognizing gene splice sites are usually based on statistics. In this thesis, graphical representation is applied to identify acceptor sites in human genes. For each sequence to be recognized, we use the PZ-curve to extract features from the whole sequence and from the subsequences around acceptor sites, and calculate the differences between exons and introns. These features take into account the phase-independent frequencies of mononucleotides, dinucleotides and trinucleotides. An SVM is used as the classifier. The results show that our method is feasible: it achieves an accuracy almost equal to that of existing splice site recognition methods, while being easy to understand and compute.
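Two ingredients named in the abstract, grouping sequences by GC content and using phase-independent mono-, di- and trinucleotide frequencies as features, can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the thesis's implementation: the PZ-curve construction and the improved self-similar map are not defined in the abstract and are not reproduced here, and the function names and the GC thresholds (0.4, 0.5, 0.6) are hypothetical choices made only for this example.

```python
from itertools import product

BASES = "ACGT"


def gc_content(seq):
    """Fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_frequencies(seq, k):
    """Phase-independent k-mer frequencies.

    Every overlapping window of length k is counted, regardless of its
    position relative to any codon frame.
    """
    seq = seq.upper()
    counts = {"".join(p): 0 for p in product(BASES, repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
            total += 1
    return {kmer: (c / total if total else 0.0) for kmer, c in counts.items()}


def feature_vector(seq):
    """Concatenate mono-, di- and trinucleotide frequencies (4 + 16 + 64 = 84 values)."""
    features = []
    for k in (1, 2, 3):
        freqs = kmer_frequencies(seq, k)
        features.extend(freqs[kmer] for kmer in sorted(freqs))
    return features


def gc_group(seq, thresholds=(0.4, 0.5, 0.6)):
    """Index of the GC-content group; the thresholds are hypothetical, not from the thesis."""
    gc = gc_content(seq)
    for index, upper in enumerate(thresholds):
        if gc < upper:
            return index
    return len(thresholds)
```

In a pipeline of the kind the abstract describes, vectors like these would be computed per GC group and passed to an SVM classifier; which SVM library and kernel the thesis used is not stated in the abstract.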
Keywords/Search Tags:DNA, Graphical representation, Gene recognition, coding/non-coding sequence, splice site, acceptor site, support vector machine