Research On The 4D Representation Of DNA Sequence And Gene Identification Algorithm

Posted on: 2008-02-11  Degree: Master  Type: Thesis
Country: China  Candidate: L Yang
GTID: 2178360215979827  Subject: Computer software and theory
Abstract/Summary:
The completion of the human genome sequencing projects has created a need for automatic genome annotation. One of the most important annotation tasks is recognizing genes in DNA. This thesis describes several new approaches to gene recognition and proposes a new representation of DNA sequences.

We propose a dynamic feature selection algorithm to determine the major features. Each nucleotide sequence is described by a feature vector, and discriminant analysis is applied to these vectors to classify the sequence as coding or non-coding.

After analyzing the common DNA representations, we propose a new 4D representation of DNA sequences that retains all the biological meaning of the sequence, with no loss of information in the transfer from a DNA sequence to its mathematical representation. We apply the Fourier transform to the 4D representation and analyze the spectra of coding and non-coding sequences to identify genes.

It is well known that coding sequences exhibit 3-base periodicity, which is rarely observed in non-coding sequences. Based on this difference, we propose a common frequency coefficient, which describes the correlation of signal sequences, and use it to identify genes.

Additionally, we construct a database for training and testing that contains all the currently available essential genes in the S. cerevisiae genome. The results suggest that gene identification based on the dynamic feature algorithm achieves an accuracy above 98%, and that identification using the common frequency coefficient based on the 4D representation shows superior performance.
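
The 3-base periodicity mentioned in the abstract can be illustrated with a short sketch. The abstract does not define the thesis's 4D representation or its common frequency coefficient, so the code below is only a stand-in under stated assumptions: it uses the widely known four-channel binary indicator encoding (one channel per base) and a simple normalized power ratio at frequency N/3. The names `indicator_channels`, `power_at`, and `period3_score` are hypothetical and do not come from the thesis.

```python
import cmath

BASES = "ACGT"


def indicator_channels(seq):
    """Map a DNA string to four binary channels, one per base.

    Assumption: this indicator encoding is a common 4-channel mapping,
    not necessarily the 4D representation proposed in the thesis.
    """
    return {b: [1.0 if ch == b else 0.0 for ch in seq] for b in BASES}


def power_at(signal, k):
    """Squared magnitude of the k-th DFT coefficient of a real signal."""
    n = len(signal)
    coeff = sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, x in enumerate(signal))
    return abs(coeff) ** 2


def period3_score(seq):
    """Power at frequency N/3, summed over channels, divided by the
    mean summed power over all nonzero frequencies.

    Assumption: this ratio is an illustrative statistic, not the
    thesis's common frequency coefficient.
    """
    seq = seq.upper()
    n = len(seq) - len(seq) % 3          # trim so that N/3 is an integer bin
    if n == 0:
        raise ValueError("sequence must contain at least 3 bases")
    seq = seq[:n]
    channels = indicator_channels(seq)

    peak = sum(power_at(sig, n // 3) for sig in channels.values())

    # Parseval: the sum of |X[k]|^2 over all k equals N * sum(x^2).
    # For a binary channel, sum(x^2) is the base count, and the k = 0
    # term equals count^2, so the nonzero-frequency total is:
    counts = [sum(sig) for sig in channels.values()]
    total = n * sum(counts) - sum(c * c for c in counts)
    if total == 0:                       # e.g. a single repeated base
        return 0.0
    return peak / (total / (n - 1))


if __name__ == "__main__":
    print(period3_score("ATG" * 50))     # strongly period-3 toy sequence
    print(period3_score("ACGT" * 75))    # period-4 toy sequence
```

On real data, a score like this would typically be computed over a sliding window along the genome, with high values flagging candidate coding regions. The toy inputs above only show that a period-3 signal stands out from one without it.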
Keywords/Search Tags: Gene identification, DNA, Coding, Non-coding, Accuracy