
Genome-wide Identification of Cell-Specific TCF7L2 Target Genes and Regulatory Network

Posted on: 2014-02-28  Degree: Doctor  Type: Dissertation
Country: China  Candidate: R Wang  Full Text: PDF
GTID: 1220330398469650  Subject: Chemical informatics
Abstract/Summary:
Bioinformatics is an interdisciplinary field that combines biology, computer science, and mathematics. As organizations around the world devote more attention to biology, more and more biological information is stored as electronic data, including relational databases, text documents, and other formats. Using these rich biological information resources effectively is both a challenge and a matter of great importance.

In Chapter 1, we first review six areas of research progress related to the regulation of gene transcription. We then introduce ChIP-seq, which combines chromatin immunoprecipitation (ChIP) with second-generation sequencing technology. We summarize several common repetitive DNA sequences in the genome and discuss their possible mechanisms, distribution, and biological functions. Finally, we introduce background on TCF7L2 and its participation in the Wnt signaling pathway, and summarize recent findings demonstrating the association between TCF7L2 and the risk of type 2 diabetes and cancer.

The TCF7L2 transcription factor is linked to a variety of human diseases such as type 2 diabetes and cancer. In Chapter 2, we performed ChIP-seq for TCF7L2 in 6 human cell lines. We identified 116,000 non-redundant TCF7L2 binding sites, with only 1,864 sites common to the 6 cell lines. Using ChIP-seq, we showed that many genomic regions that are marked by both H3K4me1 and H3K27Ac are also bound by TCF7L2, suggesting that TCF7L2 plays a critical role in enhancer activity. Bioinformatic analysis of the cell type-specific TCF7L2 binding sites revealed enrichment for multiple transcription factor motifs, including the HNF4α and FOXA2 motifs in HepG2 cells and the GATA3 motif in MCF7 cells. ChIP-seq analysis revealed that TCF7L2 co-localizes with HNF4α and FOXA2 in HepG2 cells and with GATA3 in MCF7 cells. Interestingly, in MCF7 cells the TCF7L2 motif is enriched in most TCF7L2 sites but is not enriched in the sites bound by both GATA3 and TCF7L2.
This analysis suggested that GATA3 might tether TCF7L2 to the genome at these sites. To test this hypothesis, we depleted GATA3 in MCF7 cells using siRNAs and showed that TCF7L2 binding was lost at a subset of sites. RNA-seq analysis suggests that TCF7L2 represses transcription when tethered to the genome via GATA3. Our studies demonstrate a novel relationship between GATA3 and TCF7L2 and reveal important insights into TCF7L2-mediated gene regulation.

Identifying gene regulatory networks from ChIP-seq data has attracted increasing attention. Although genome-wide studies have identified thousands of TCF7L2 binding sites and have revealed associated transcription factor (TF) partners, such as GATA3 in MCF7 cells, little is known about the hierarchical transcriptional regulatory networks associated with TCF7L2. We identified 30,119 TCF7L2 binding sites in MCF7 cells. In Chapter 3, we applied computational approaches to analyze these ChIP-seq data and to investigate the hierarchical regulatory network formed by TCF7L2 and its partner TFs in MCF7 breast cancer cells. The regulatory networks were constructed by scanning the ChIP-seq peak regions with TF-specific position weight matrices (PWMs). We found that FOXO1, CAD, and GATA3 were involved in regulating the up-regulated genes, whereas AP2α, PBF, GATA3, and AP1 were associated with the down-regulated genes. Our study uncovers new TCF7L2-associated regulatory networks by mining three ChIP-seq datasets from MCF7 cells. Our computational approach may guide biologists in further studying the underlying mechanisms in breast cancer cells or in other human diseases.

Despite the large number of computational tools that have been developed to analyze ChIP-seq data, one major limitation is that most existing tools ignore non-unique matched tags (NUTs), which include multiple matched tags (MMTs) and no matched tags (NMTs), and focus only on unique matched tags (UMTs). However, NUTs comprise up to 60% of all raw tags in ChIP-seq data.
Effectively utilizing these NUTs would increase the sequencing depth of each sample and allow more accurate detection of enriched binding sites and target genes, which in turn could lead to more precise and significant biological interpretations. In Chapter 4, we developed a computational tool, LOcating Non-Unique matched Tags (LONUT), which improves the detection of enriched regions from ChIP-seq data. The LONUT algorithm applies linear and polynomial regression models to establish an empirical score (ES) formula that considers two influential factors: the distance of NUTs to peaks identified using UMTs, and the enrichment score of those peaks. Using this analysis, each NUT is assigned to a unique location on the reference genome. The newly located tags from the set of NUTs are then combined with the original UMTs to produce a final set of combined matched tags (CMTs). LONUT was tested on 17 different datasets representing three different characteristics of biological data types. The detected enriched regions were validated using de novo motif discovery and ChIP-qPCR. We demonstrate the sufficiency, specificity, and accuracy of LONUT and show that our program not only improves the detection of enriched regions (binding sites for ChIP-seq) but also identifies additional enriched regions from the sequencing data.
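
The tag-assignment idea described for Chapter 4 can be illustrated with a small Python sketch. It is not the LONUT implementation: the abstract does not give the fitted ES formula, so `empirical_score` below is a placeholder assumption (enrichment damped by distance), and the names `Peak`, `best_location`, `combine_tags`, `scale`, and `max_distance` are hypothetical. Only multiple matched tags are handled, since tags with no candidate location would need a separate realignment step that is not sketched here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Location = Tuple[str, int]  # (chromosome, position) of a candidate alignment


@dataclass(frozen=True)
class Peak:
    """A peak called from unique matched tags (UMTs) only."""
    chrom: str
    summit: int
    enrichment: float


def empirical_score(distance: int, enrichment: float, scale: float = 200.0) -> float:
    # ASSUMPTION: placeholder formula, not the regression-derived ES from the thesis.
    # It only encodes the two stated factors: closer and more enriched peaks score higher.
    return enrichment / (1.0 + distance / scale)


def best_location(candidates: List[Location], peaks: List[Peak],
                  max_distance: int = 1000) -> Optional[Location]:
    """Pick the candidate alignment with the highest score, or None if no peak is near."""
    best: Optional[Location] = None
    best_score = 0.0
    for chrom, pos in candidates:
        for peak in peaks:
            if peak.chrom != chrom:
                continue
            distance = abs(pos - peak.summit)
            if distance > max_distance:
                continue
            score = empirical_score(distance, peak.enrichment)
            if score > best_score:
                best, best_score = (chrom, pos), score
    return best


def combine_tags(umts: List[Location], mmts: List[List[Location]],
                 peaks: List[Peak]) -> List[Location]:
    """Return combined matched tags (CMTs): the UMTs plus every MMT that could be placed."""
    located = []
    for candidates in mmts:
        loc = best_location(candidates, peaks)
        if loc is not None:
            located.append(loc)
    return list(umts) + located


# Toy data: two UMT-derived peaks, one UMT, and two multi-mapped tags.
peaks = [Peak("chr1", 1000, 20.0), Peak("chr2", 5000, 5.0)]
umts = [("chr1", 990)]
mmts = [
    [("chr1", 1100), ("chr2", 5010)],  # both candidates lie near a peak
    [("chr3", 42), ("chr3", 900)],     # no peak on chr3, so the tag stays unplaced
]
cmts = combine_tags(umts, mmts, peaks)
```

With this toy input, the first multi-mapped tag is placed at the chr1 candidate because the stronger peak outweighs the slightly larger distance, and the second tag is left out of the combined set. A real pipeline would call peaks again on the combined tags.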
Keywords/Search Tags: transcription factor, ChIP-seq, transcriptional regulation, TCF7L2, regulatory network, unique matched tags, non-unique matched tags, repeat regions