Integrative Identification And Annotation Of DNA Elements In The Human Genome

Posted on: 2016-11-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H B Chen
Full Text: PDF
GTID: 1220330461496600
Subject: Biochemistry and Molecular Biology
Abstract/Summary:
With the completion of the Human Genome Project in 2003, the human genome sequence became available, but many problems remain unsolved. One important open question is how complex regulatory networks in vivo are encoded on a one-dimensional genome. To understand the molecular mechanisms that underlie global transcriptional regulation, it is essential to identify all the transcriptional regulatory elements in the genome. The key to solving this problem is therefore to accurately identify and annotate the important functional DNA elements in the human genome. Unfortunately, the traditional procedure, chromatin immunoprecipitation (ChIP), is applicable only to known (previously characterized) trans-acting factors and is limited by its requirement for a high-quality ChIP-grade antibody to isolate the transcription factor (TF) being analyzed.

In recent years, large scientific projects such as ENCODE, modENCODE and Roadmap Epigenomics have provided nearly 700 TB of public data related to the identification and annotation of DNA elements. Together with the boom in next-generation sequencing and the development of bioinformatics, these data make a comprehensive analysis of functional DNA elements in the human genome possible. Based on these data resources, we carried out research on the identification and annotation of functional elements in the human genome.

First, our research started from a single representative class of DNA elements. Chromatin insulators are DNA elements that regulate the level of gene expression either by preventing gene silencing through the maintenance of heterochromatin boundaries or by preventing gene activation by blocking interactions between enhancers and promoters. CCCTC-binding factor (CTCF), a ubiquitously expressed 11-zinc-finger DNA-binding protein, is the only protein implicated in the establishment of insulators in vertebrates. Although CTCF has been implicated in diverse regulatory functions, its genome-wide binding has been studied in only a limited number of human cell types.
Thus, it is not clear whether the identified cell type-specific differences in CTCF-binding sites are functionally significant. Here, we identify and characterize cell type-specific and ubiquitous CTCF-binding sites in the human genome across 38 cell types designated by the Encyclopedia of DNA Elements (ENCODE) consortium. These cell type-specific and ubiquitous CTCF-binding sites show uniquely versatile transcriptional functions and characteristic chromatin features. In addition, we confirm the insulator barrier function of CTCF binding and explore a novel function of CTCF in DNA replication. These results represent a critical step toward a comprehensive and systematic understanding of CTCF-dependent insulators and their versatile roles in the human genome.

Next, we studied open chromatin across the whole genome. Whole-genome mapping of DNaseI hypersensitive sites (DHSs) has provided crucial clues to regions of transcriptional regulation. We performed a genome-wide meta-analysis of DNaseI hypersensitive (HS) sites identified in 29 different cell types. We sought to determine the relationship between DNaseI HS, histone modifications and gene expression. We found that specific correlations exist between DNaseI HS, gene expression and the amounts of active and repressive histone modifications across different cell types. These correlations displayed four distinct modes (repressive, active, bivalent and primed), reflecting different functions of the chromatin domains. Furthermore, CTCF-binding sites were newly identified based on these integrative data. Our findings reveal complex regulation of gene expression mediated by DNaseI hypersensitive chromatin regions and their histone modifications.

Third, we expanded the analysis to identify a large number of DNA elements. It is very difficult to obtain numerous transcription factor binding sites, and almost impossible to identify all DNA elements, using traditional methods.
Fortunately, transcription factor binding sites have sequence specificities that can be used to identify them. Transcription factor databases such as TRANSFAC, JASPAR, TRRD, TRED and PAZAR provide a wealth of information on transcription factor motifs. Based on these resources, we developed iFORM. Compared with FIMO, CONSENSUS, HOMER, RSAT and STORM, iFORM identified the exact binding sites whether or not the other tools could. Moreover, iFORM performed much better than the other tools in receiver operating characteristic (ROC) analysis. iFORM lays a solid foundation for the comprehensive analysis of DNA elements in the human genome.

Fourth, we carried out an integrated analysis of many DNA elements in different cell lines. DHSs define the accessible chromatin landscape and have revolutionised the discovery of distinct cis-regulatory elements in diverse organisms. Here, we report the first comprehensive map of human transcription factor binding site (TFBS)-clustered regions using Gaussian kernel density estimation based on genome-wide mapping of the TFBSs in 133 human cell and tissue types. Approximately 1.6 million distinct TFBS-clustered regions, collectively spanning 27.7% of the human genome, were discovered. The TFBS complexity assigned to each TFBS-clustered region was highly correlated with genomic location, cell selectivity, evolutionary conservation, sequence features, and functional roles. An integrative analysis of these regions using ENCODE data revealed transcription factor occupancy, transcriptional activity, histone modification, DNA methylation, and chromatin structures that varied based on TFBS complexity. Furthermore, we found that we could recreate lineage-branching relationships by simple clustering of the TFBS-clustered regions from terminally differentiated cells. Based on these findings, a model of transcriptional regulation determined by TFBS complexity is proposed.

Finally, we constructed and analysed the transcription factor regulatory networks.
Transcription factors regulate gene expression by binding upstream of genes; when the products of those genes are themselves transcription factors, they can in turn regulate other genes, and together these interactions constitute a transcription factor regulatory network. We produced high-quality genome-wide maps of the binding sites for 542 transcription factors in 133 human cell and tissue types using iFORM and, by adding gene locations from GENCODE, constructed transcription factor regulatory networks for the 133 cell lines. When comparing the networks of different cell lines, we found that all the networks have the same motif pattern. Furthermore, we identified motif instances that can represent lineages: the feed-forward loop (FFL) motif formed by POU5F1, SOX2 and NANOG is unique to embryonic stem cells.
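
To make the notion of an FFL motif concrete, the following minimal Python sketch enumerates feed-forward loops (X regulates Y, Y regulates Z, and X also regulates Z) in a small directed network. It is an illustration only and is not the method used in the dissertation: the function name `find_feed_forward_loops`, the placeholder target `GENE_A`, and every edge in the toy network are hypothetical and are not results from this work.

```python
def find_feed_forward_loops(edges):
    """Return all (x, y, z) triples with directed edges x->y, y->z and x->z."""
    # Build an adjacency map: regulator -> set of direct targets.
    targets = {}
    for src, dst in edges:
        targets.setdefault(src, set()).add(dst)

    loops = []
    for x in sorted(targets):
        for y in sorted(targets[x]):
            for z in sorted(targets.get(y, ())):
                # Require three distinct nodes and the direct "shortcut" edge x->z.
                if len({x, y, z}) == 3 and z in targets[x]:
                    loops.append((x, y, z))
    return loops


# Hypothetical toy network: these edges are illustrative assumptions,
# not regulatory interactions reported in the dissertation.
toy_edges = [
    ("POU5F1", "SOX2"),
    ("POU5F1", "NANOG"),
    ("SOX2", "NANOG"),
    ("NANOG", "GENE_A"),
]

ffls = find_feed_forward_loops(toy_edges)
print(ffls)
```

In this toy network only one triple satisfies all three edge conditions. Comparing such motif instances across cell-type networks would, under these assumptions, be one way to look for instances restricted to a single lineage.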
Keywords/Search Tags: CTCF, DHSs, iFORM, TFBS-clustered regions, TFBS network