Analysis Of K-mer Frequencies Of Genome Sequences And Theoretical Prediction And Validation Of Nucleosome Binding Motifs

Posted on: 2017-01-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M D W Ni
Full Text: PDF
GTID: 1220330485966596
Subject: Physics
Abstract/Summary:
The frequencies with which different 8-mers occur in genome sequences are known to differ. Building on this observation, we analyzed the differences in k-mer (k≤8) usage between nucleosome core sequences and linker sequences in the yeast genome. In addition, a three-peak distribution of 8-mer frequency (the number of 8-mers as a function of their frequency of occurrence) was obtained for the intergenic sequences of human DNA. Based on the features of the XY dinucleotide classifications of the 8-mer set, we give a theoretical prediction of the nucleosome binding motif set and compare this motif set with experimentally measured nucleosome occupancy. The specific content of the study is as follows.

Based on the single-base-precision nucleosome positioning map of the yeast genome proposed by Brogaard, nucleosome core sequences and nucleosome linker sequences were extracted from the 16 yeast chromosomes. The relative frequencies (RF) of k-mers (k=4, 5, 6, and 8) were calculated, and the differences in k-mer usage between the two kinds of sequences were analyzed. We found that only a few of the optimal k-mers show obvious differences in usage. The logarithmic ratio of relative frequency (LRF) of k-mers between the two kinds of sequences was then introduced. When the k-mer set is reordered by increasing LRF value, the results show that the difference in RF becomes more pronounced as k increases. When k≥8, the LRF distributions stabilize and approach a symmetrical distribution. When the k-mer set is reordered by RF value, the obvious RF differences among 8-mers occur in the region RF<0.5. Moreover, the G+C content and dinucleotide content of the 8-mers favored by the core sequences and by the linker sequences were calculated in seven sample ranges. The results show that the G+C content for both kinds of sequences increases as RF decreases, that the core sequences prefer CG and GC dinucleotides, and that the linker sequences prefer GG and CC dinucleotides.
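
The RF and LRF quantities described above can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions, so two assumptions are made here: RF is taken as the observed count of a k-mer divided by the count expected under uniform usage (so the mean RF is 1), and LRF is taken as the natural logarithm of RF(core)/RF(linker). The function names and the pseudocount are hypothetical and not taken from the dissertation.

```python
from collections import Counter
from itertools import product
from math import log


def kmer_counts(sequences, k):
    """Count overlapping k-mers over A/C/G/T in a list of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip windows containing N etc.
                counts[kmer] += 1
    return counts


def relative_frequency(counts, k, pseudocount=1.0):
    """ASSUMED definition: (count + pseudocount) / mean over all 4**k k-mers.

    The pseudocount (an illustration choice) keeps unseen k-mers from
    producing a zero RF, which would make the log ratio undefined.
    """
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] + pseudocount for m in all_kmers)
    expected = total / len(all_kmers)
    return {m: (counts[m] + pseudocount) / expected for m in all_kmers}


def log_ratio(rf_core, rf_linker):
    """ASSUMED definition of LRF: ln(RF_core / RF_linker) per k-mer."""
    return {m: log(rf_core[m] / rf_linker[m]) for m in rf_core}


# Toy usage: real input would be the extracted core and linker sequences.
core_rf = relative_frequency(kmer_counts(["ACGTACGT"], 2), 2)
linker_rf = relative_frequency(kmer_counts(["AAAAAAAA"], 2), 2)
lrf = log_ratio(core_rf, linker_rf)
ordered = sorted(lrf, key=lrf.get)  # k-mers in increasing LRF order
```

Sorting by LRF, as in the last line, corresponds to the reordering step mentioned in the text; k-mers at the low end are linker-favored and those at the high end are core-favored under these assumed definitions.
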
In summary, apart from a few optimal k-mers, the major differences in k-mer usage between the two kinds of sequences appear in k-mers with lower RF, and these k-mers have higher G+C content.

Theoretical prediction of nucleosome binding motifs is important for studying nucleosome positioning and remodeling and for a deeper understanding of the constitution and evolution of genome sequences. Based on the intergenic sequences of human chromosome 1, the distribution of 8-mer frequency (the number of 8-mers as a function of their frequency of occurrence) was obtained; the distribution has three peaks. According to the number of CG dinucleotides contained in each 8-mer, the 8-mer motifs were classified into three subsets. We found that the three 8-mer subsets (0CG, 1CG and 2CG) each form an independent, single-peak distribution, whereas the other 15 dinucleotide classifications do not show this phenomenon. The three-peak distribution of the 8-mer set is simply the superposition of the three CG subsets. Considering the distinctive properties of 8-mer usage in DNA sequences together with the experimental conclusions on nucleosome positioning, we proposed the hypothesis that the 1CG motif set constitutes the nucleosome binding motifs. To test this hypothesis, the relative frequencies of optimal and rare trinucleotides were extracted from the 1CG 8-mer set and used to construct two features describing the nucleosome binding signal, called Ktri(O) and Ktri(R). The distributions of the two features over transcriptional start site (TSS) sequences were obtained, and linear regressions were performed between the Ktri(O) or Ktri(R) values and experimentally measured nucleosome occupancy. For the 1177 TSS sequences analyzed, the statistical results show that 89.2% of the sequences reach a confidence level above 95% and 81.6% reach a confidence level above 99%. These results verify our theoretical hypothesis that the 1CG motif set constitutes the nucleosome binding motifs.
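
The CG-based classification of 8-mers and the per-subset frequency distributions can be sketched as follows. This is an illustration only, under stated assumptions: the 2CG class is assumed to include every 8-mer with two or more CG dinucleotides (the abstract does not say how 8-mers with three or four CGs are handled), and the distribution is taken as the number of 8-mers observed at each occurrence count. The function names and the `bin_width` parameter are hypothetical.

```python
from collections import Counter
from itertools import product


def cg_class(kmer):
    """Label a k-mer by its CG dinucleotide count; '2CG' is ASSUMED to mean >= 2."""
    n = kmer.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    return "2CG" if n >= 2 else f"{n}CG"


def count_8mers(sequence):
    """Count overlapping 8-mers over A/C/G/T in one sequence."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - 7):
        kmer = seq[i:i + 8]
        if set(kmer) <= set("ACGT"):  # skip windows containing N etc.
            counts[kmer] += 1
    return counts


def spectrum_by_class(counts, bin_width=1):
    """For each CG class, map binned occurrence count -> number of 8-mers."""
    spectra = {"0CG": Counter(), "1CG": Counter(), "2CG": Counter()}
    for p in product("ACGT", repeat=8):  # all 4**8 = 65536 possible 8-mers
        kmer = "".join(p)
        spectra[cg_class(kmer)][counts.get(kmer, 0) // bin_width] += 1
    return spectra


# Toy usage: real input would be the intergenic sequence of a chromosome.
spectra = spectrum_by_class(count_8mers("ACGT" * 5))
```

Summing the three per-class spectra bin by bin reproduces the spectrum of the full 8-mer set, which is the sense in which the overall distribution is a superposition of the three CG subsets.
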
Keywords/Search Tags: human and yeast genomes, nucleosome core sequence and linker sequence, distribution of 8-mer frequency, CG dinucleotide classification, nucleosome characteristic quantity, nucleosome binding motifs, theoretical prediction