The Word Frequency And Palindromic Of Nucleotide Sequence In Some Genomes

Posted on: 2007-01-28
Degree: Master
Type: Thesis
Country: China
Candidate: P F Li
Full Text: PDF
GTID: 2120360185482063
Subject: Theoretical Physics
Abstract/Summary:
With the implementation of large-scale genome sequencing projects, a great many genome sequences of prokaryotic and eukaryotic organisms have been completed. Analyzing these sequences at the gene and genome level is now the most pressing task. The genetic information in DNA can be expressed as a sequence over the four letters A, C, G, and T, which represent the four types of nucleotides: adenine, cytosine, guanine, and thymine. This thesis studies distribution characteristics of k-word frequencies in several genome sequences and the distribution of palindromes in the Bacillus subtilis genome sequence.

The first part describes the distribution features of k-word frequencies in some prokaryotic and eukaryotic genomes. Taking into account the difference between the word domain and the frequency domain, five types of functionals of the k-word frequency are defined using Shannon information and Fisher information. Good linear relations are found within each species for the four functionals derived from Shannon information. Moreover, these linear relations are basically universal among the species studied.

In the second part, an important word, the palindrome, in the Bacillus subtilis genome...
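
As a rough illustration of the quantities the abstract refers to, the following Python sketch counts overlapping k-words in a nucleotide string, computes a plain Shannon entropy of their frequency distribution, and lists palindromic words. It is not the thesis's method: the five functionals are not defined in this abstract, so the entropy here is only the textbook form, and "palindrome" is assumed to mean a word equal to its own reverse complement (the usual biological sense). All function names are hypothetical.

```python
from collections import Counter
from math import log2

# Complement table for the four nucleotides (assumption: upper-case ACGT only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kword_frequencies(seq, k):
    """Return the relative frequency of each overlapping k-word in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def shannon_entropy(freqs):
    """Textbook Shannon entropy (in bits) of a frequency distribution."""
    return -sum(p * log2(p) for p in freqs.values() if p > 0)


def is_palindrome(word):
    """True if word equals its reverse complement (assumed definition)."""
    word = word.upper()
    if not word or not set(word) <= set("ACGT"):
        return False  # ignore empty words and ambiguous bases such as N
    return word == word.translate(_COMPLEMENT)[::-1]


def find_palindromes(seq, k):
    """Return (position, word) for every palindromic k-word in seq."""
    seq = seq.upper()
    return [
        (i, seq[i:i + k])
        for i in range(len(seq) - k + 1)
        if is_palindrome(seq[i:i + k])
    ]


if __name__ == "__main__":
    toy = "AAGAATTCAA"  # toy sequence, not real genome data
    print(kword_frequencies(toy, 2))
    print(shannon_entropy(kword_frequencies(toy, 2)))
    print(find_palindromes(toy, 6))
```

For a real genome, the sequence would be read from a file and k varied over a range of word lengths; this sketch keeps everything in memory and makes no attempt at efficiency.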
Keywords/Search Tags: genome, k-word, frequency, information, palindrome, AT content