Predicting Nucleosome Binding Motifs And Analyzing Their Distributions Around Functional Sites Of Human Genes

Posted on: 2014-02-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T L G Bao
GTID: 1220330398496413
Subject: Theoretical Physics
Abstract/Summary:
The complex pattern of dispersed regulation and pervasive transcription of the human genome uncovered by the ENCODE project, together with the abundance of noncoding genes, has broadened our understanding of gene definition and of transcriptional regulation mechanisms. Recently, much work has been done to explore putative functional elements in noncoding sequences and their evolution. Between 75% and 90% of eukaryotic genomic DNA is wrapped in nucleosomes, the basic repeating unit of eukaryotic chromatin. The precise positioning of nucleosomes along the DNA plays important roles in gene transcription, mRNA splicing, DNA replication and DNA repair. Although nucleosome positioning can be influenced by several factors, current nucleosome positioning data show that nucleosomes have a higher affinity for particular DNA sequences and that intrinsic DNA sequence preference plays an important role in nucleosome positioning in vivo. Noncoding sequences occupy most of the human genome, so studying their characteristics and preferred elements is important for understanding noncoding sequence function and nucleosome positioning. Many experimental and theoretical analyses have pointed out that nucleosomes are not distributed uniformly along the genome and that their distributions show stereotypical patterns at particular functional sites. Furthermore, experimental data have revealed three regions of strong interaction between core histones and DNA within a nucleosome. This evidence implies that a set of motifs that interact with histone octamers exists on DNA sequences; we term them "nucleosome binding motifs". We believe these motifs are the main determinants of nucleosome positioning, distribution and remodeling. Transcriptional regulation is a complicated process that involves the interplay of multiple components, such as particular transcription factors, noncoding regulatory elements, and nucleosome-depleted regions that are accessible to binding proteins.
Nucleosome positioning and dynamic nucleosome remodeling modulate the accessibility of DNA to proteins, suggesting that the distribution of nucleosomes across the genome is a global determinant of gene transcription. The functions of nucleosomes in transcriptional regulation and in other cellular processes have made the interaction between core histones and genomic DNA a hot topic in current molecular biology. As far as we have found, previous studies by other researchers are limited to the assembly pattern of transcription-related nucleosomes in regions adjacent to genes, and only a few studies address histone-DNA interacting motifs at the whole-genome level. The current experimental data on nucleosome positioning and distribution are scattered and incomplete. Advancing nucleosome positioning research by exhaustive methods alone is unrealistic; it has to rely on systematic theoretical analysis. It has been confirmed that the k-mer frequency distribution of the human genome has a multimodal spectrum. Although several theoretical investigations of genomic k-mer distributions have been carried out, they focus on probabilistic models, rare and over-represented k-mers, and the regulatory role of k-mers. Few studies have examined the role of k-mers in interacting with histones and in nucleosome formation. It is therefore important to investigate the relationship between the k-mer composition of DNA sequences and nucleosomes. On this basis, the dissertation mainly studies the following aspects:

1. Using statistical theory, we analyzed the k-mer composition and distribution of human intergenic sequences and predicted all possible DNA motifs that interact with histones. This motif set contains 23,880 8-mers. It can be classified into two groups, named P1-mers and P2-mers and collectively referred to as the "nucleosome binding motif set". P1-mers comprise 2,632 8-mers and P2-mers comprise 21,248 8-mers. The analysis revealed that these motifs have higher GC content and flexibility, and that their frequencies are significantly high in experimental nucleosome positioning sequences. Combining relevant experimental studies with theoretical analysis, we showed that the predicted motifs are nucleosome binding motifs or are closely related to the nucleosome.

2. We explored the distribution of the predicted motifs around the functional sites of coding and noncoding genes in the human genome. The functional sites analyzed include transcription start sites, transcription termination sites, start codons, stop codons and intron-exon junction sites. Nucleosome binding motifs show specific distributions in the vicinity of these functional sites; the distribution patterns differ from one another and are consistent with others' experimental results. This result may provide a new idea for predicting different functional sites. We also analyzed the distribution of distances between nucleosome binding motifs in three types of sequences: intergenic sequences, introns and coding sequences. The average distance between nucleosome binding motifs decreases in turn across these three types of sequences, indicating that the average density of nucleosomes on these three types of sequences is successively reduced, which is consistent with the results of previous studies. These results further support the conclusion that the predicted motifs are nucleosome binding motifs, and hint that nucleosome binding motifs are not only involved in the regulation of gene transcription but that their distribution may also participate in distinguishing and recognizing different types of sequences and different functional sites. The features of nucleosome binding motifs and the strength of their preference in the vicinity of different functional sites indicate that P1-mers probably participate in nucleosome positioning and that P2-mers may be closely related to nucleosome remodeling.

3. We analyzed the distribution of nucleosome binding motifs near the functional sites of human housekeeping genes and compared it with that of the Ensembl genes. The distribution patterns of nucleosome binding motifs around the functional sites of housekeeping genes and Ensembl genes are similar, but their frequency on housekeeping gene sequences is significantly higher than on Ensembl genes. This result indicates that the nucleosome positioning and remodeling signal near the functional sites of housekeeping genes is stronger than that of Ensembl genes. Among the transcription and translation boundary regions, nucleosome binding motifs show the highest frequency in the vicinity of the transcription start site. Frequency statistics show that the reliance on nucleosome binding motifs varies among functional sites. The distribution of nucleosome binding motifs on single housekeeping genes indicates that they are scattered along the DNA sequence, which is convenient for governing nucleosome positioning and remodeling; the density of nucleosome binding motifs varies between nucleosome units.

4. Nucleosome binding motifs show bias around different functional sites. A comparison of the relative preference of nucleosome binding motifs shows that the mode number of preferred nucleosome binding motifs in the ±500 bp regions around transcription start sites, transcription termination sites, start codons and stop codons differs. The transcription start site region contains the most preferred nucleosome binding motifs. Comparing transcription start sites with start codons, the mode number of their common preferred motifs is 2,489, and the mode numbers of their site-specific preferred motifs are 257 and 847. Comparing transcription termination sites with stop codons, the mode number of their common preferred motifs is 1,071, and the mode numbers of their site-specific preferred motifs are 52 and 1,371. Not only do the mode numbers of preferred nucleosome binding motifs differ between these functional sites, but the specific patterns of these motifs also differ. The differences in relative preference indicate that nucleosome positioning and remodeling patterns differ for each functional site. Correlation analysis shows that the frequencies of P1-mers and P2-mers around the functional sites are significantly positively correlated, suggesting that more nucleosome remodeling signals may also exist near strongly positioned nucleosomes.
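
The two basic computations the abstract relies on, counting 8-mers in a sequence and profiling motif occurrences around functional sites, can be illustrated with a minimal Python sketch. This is not the dissertation's actual pipeline: the function names `count_kmers` and `motif_profile`, the single-strand treatment, and the handling of ambiguous bases are all assumptions made for illustration, and the statistical step that selects P1-mers and P2-mers is not reproduced.

```python
from collections import Counter


def count_kmers(seq, k=8):
    """Count overlapping k-mers on one strand (hypothetical helper).

    Windows containing characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def motif_profile(seq, sites, motifs, k=8, flank=500):
    """Fraction of sites with a motif starting at each offset (hypothetical helper).

    `sites` are 0-based positions of a functional site (for example, a
    transcription start site); `motifs` is a set of k-mers. The returned
    list has 2 * flank + 1 entries, for offsets -flank .. +flank.
    """
    seq = seq.upper()
    hits = [0] * (2 * flank + 1)
    for site in sites:
        for offset in range(-flank, flank + 1):
            start = site + offset
            # Ignore windows that would run off either end of the sequence.
            if 0 <= start <= len(seq) - k and seq[start:start + k] in motifs:
                hits[offset + flank] += 1
    n = len(sites)
    return [h / n for h in hits] if n else hits
```

With `flank=500` the profile covers the ±500 bp region mentioned in point 4. A real analysis would also need to account for gene strand and for the reverse complement of each motif, which this sketch leaves out.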
Keywords/Search Tags: 8-mer distribution, Theoretical prediction, Nucleosome binding motif, Motif feature, Functional site, Preferred motif