
Relationships of 8-mer Usage Separation in Genomic Sequences with Different Sequence Construction and Species Evolution

Posted on: 2016-11-25
Degree: Master
Type: Thesis
Country: China
Candidate: P Ma
Full Text: PDF
GTID: 2180330461982231
Subject: Physics
Abstract/Summary:
The distribution of k-mers in genomic sequences has attracted much attention. Researchers have proposed various probability models and parameters to describe k-mer distributions, focusing mainly on the biological functions of extremely rare or strongly preferred k-mers. Some studies have examined k-mer distributions across whole genome sequences, from a bacteriophage species and lower bacteria to higher organisms such as humans. They found that k-mers (k > 6) follow a unimodal distribution in lower organisms and a trimodal distribution in higher organisms (four-legged mammals). The underlying cause of the trimodal or unimodal pattern, however, remains unclear. Based on the 8-mer separation in five eukaryotes, we study the 8-mer characteristics of different sequence types and the relationships among their 8-mer separations, in order to reveal how sequence structure and composition relate to species evolution.

We selected the genome sequences of human, mouse, zebrafish, C. elegans and yeast as five biological samples and extracted the intergenic sequences, introns and coding sequences of each genome. For each sequence type we counted the frequency of every 8-mer to obtain the 8-mer relative frequency distribution. We found that in zebrafish, C. elegans and yeast the 8-mer distribution is unimodal for all sequence types. In human and mouse, the distribution is trimodal in intergenic sequences and introns but unimodal in coding sequences.

This shows that 8-mer frequencies in the intergenic and intron sequences of human and mouse are clearly separated. We speculate that the sequences of zebrafish, C. elegans and yeast, like the coding sequences of human and mouse, also exhibit 8-mer separation, only to a different degree.

To explore 8-mer separation in the various sequence types and to find the root cause of the trimodal and unimodal distributions, we divided the complete set of 8-mers into three motif subsets, 0CG, 1CG and 2CG, according to the number of CG dinucleotides each 8-mer contains (the CG classification), and computed the relative frequency distribution of the 8-mers in each subset. The 0CG, 1CG and 2CG subsets each show an independent unimodal distribution, and in the intergenic and intron sequences of human and mouse the three peaks of the overall distribution correspond exactly to the three subsets.

A sequence whose overall distribution appears unimodal is in fact a superposition of the three unimodal distributions of the CG classification: the three peaks lie so close together that they merge into a single peak. In the intergenic and intron sequences of human and mouse, the three peaks lie farther apart, which produces the trimodal distribution. This is the essence of the trimodal and unimodal phenomena. For the other 15 XY dinucleotide classifications, the 0XY, 1XY and 2XY subsets do not show fully independent unimodal distributions. The motif subsets of the CG classification therefore reflect the structure and evolutionary patterns of the various types of genomic sequences.

After normalizing for sequence length, we found that the center of the 0CG distribution coincides with the center of the distribution for a random sequence, while the 2CG and 1CG frequencies are much lower than the 0CG frequency. This indicates that the 2CG and 1CG motifs evolve in a directed manner, whereas the 0CG motifs evolve randomly.
Across species and sequence types, the most probable value of the 1CG motif distribution is significantly greater than that of the 0CG distribution, which suggests that directed evolution starts with the CG dinucleotide as its center. The more advanced the species, the greater the distances between the centers of the 0CG, 1CG and 2CG distributions. Within a species, these distances do not differ significantly between intergenic and intron sequences. In intergenic and intron sequences the distances between the three motif distribution centers increase significantly with species evolution, whereas in coding sequences they increase only slowly. This shows that coding sequences evolve conservatively and that species evolution takes place mainly in non-coding sequences.

In summary, DNA sequences are composed of three classes of 8-mer motifs: 0CG, 1CG and 2CG. The 0CG motifs follow a law of random evolution, while the 1CG and 2CG motifs follow a law of directed evolution that is centered on the CG dinucleotide. Intergenic sequences, introns and coding sequences differ in the composition of the three motif classes. Differences in genome sequence evolution are reflected mainly in non-coding sequences and result from the directed evolution of 1CG and 2CG motifs together with the random evolution of 0CG motifs. The distances between the frequency distributions of the three motif classes are the root cause of the unimodal or trimodal 8-mer frequency distribution. This study helps clarify the laws governing the structure and evolution of all types of genomic sequences.
Keywords/Search Tags: intergenic sequence, intron, coding sequence, 8-mer frequency, unimodal distribution, trimodal distribution
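
The 8-mer counting and CG classification described in the abstract could be sketched roughly as follows. This is a minimal illustration and not the thesis's actual code. The function names (`count_kmers`, `cg_class`, `relative_frequencies_by_class`) are hypothetical, and several choices are assumptions the abstract does not specify: overlapping windows, skipping windows that contain symbols other than A, C, G and T, and defining relative frequency as count divided by total count. The abstract names only the 0CG, 1CG and 2CG subsets; the sketch simply keys each group by the number of CG dinucleotides found.

```python
from collections import Counter, defaultdict

K = 8  # motif length used in the abstract


def count_kmers(sequence, k=K):
    """Count overlapping k-mers (assumption: windows with non-ACGT symbols are skipped)."""
    seq = sequence.upper()
    alphabet = set("ACGT")
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= alphabet:
            counts[kmer] += 1
    return counts


def cg_class(kmer, dinucleotide="CG"):
    """Return how many times the dinucleotide occurs in the k-mer (0 -> 0CG, 1 -> 1CG, ...)."""
    return sum(1 for i in range(len(kmer) - 1) if kmer[i:i + 2] == dinucleotide)


def relative_frequencies_by_class(counts, dinucleotide="CG"):
    """Group k-mers by dinucleotide count.

    Relative frequency here is count / total count, which is an assumed
    definition; the abstract does not state the normalization it uses.
    """
    total = sum(counts.values())
    groups = defaultdict(dict)
    for kmer, n in counts.items():
        groups[cg_class(kmer, dinucleotide)][kmer] = n / total
    return dict(groups)


# Toy usage on a made-up sequence (not data from the thesis).
toy_sequence = "ACGTACGTACGCGTTTTTTTTTACGATTTTAAAACCCC"
toy_groups = relative_frequencies_by_class(count_kmers(toy_sequence))
for n_cg in sorted(toy_groups):
    print(f"{n_cg}CG: {len(toy_groups[n_cg])} distinct 8-mers")
```

Passing a different `dinucleotide` argument gives the analogous XY classification. For real data, a histogram of the relative frequencies within each group would show whether the subsets form separate peaks.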