Analysis of 4C-Seq Data to Identify Chromatin Interactions from Regulatory Elements and Transposon

Posted on: 2018-11-23
Degree: Ph.D.
Type: Dissertation
University: New York University
Candidate: Raviram, Ramya
Full Text: PDF
GTID: 1470390020956724
Subject: Bioinformatics
Abstract/Summary:
4C-Seq has proven to be a powerful technique for identifying genome-wide chromatin interactions with a single locus of interest (or "bait"), such as those between enhancers and promoters that can be important for gene regulation. However, analysis of 4C-Seq data is complicated by the many biases inherent to the technique. An important consideration when dealing with 4C-Seq data is the variation in signal resolution across the genome: resolution is highest near the bait and lower in far-cis and trans. Resolution of 4C-Seq data is also influenced by the frequency at which the primary restriction enzyme can cut. Current methods of 4C-Seq analysis do not comprehensively analyze 4C signals at different length scales, and some fail to analyze data generated using a more frequent cutter. To address these issues, we developed 4C-ker, a Hidden Markov Model-based pipeline that identifies regions that interact with the 4C bait locus throughout the genome and performs differential analysis across conditions. Using several datasets, we demonstrate that 4C-ker outperforms all existing 4C-Seq pipelines in its ability to reproducibly identify interaction domains at all genomic ranges with enzymes of different resolution. As an extension of this work, we adapted 4C-Seq to identify interactions from transposable elements (TEs), which comprise almost 50% of mammalian genomes. TEs contain regulatory elements that can be bound by transcription factors, and recent studies have suggested that they can influence the expression of nearby genes. However, it is difficult to identify their target genes without knowing which genes they are in contact with. Moreover, the repetitive nature of these elements has made them difficult to analyze with high-throughput sequencing data, since the majority of reads cannot be uniquely mapped to a particular integration site.
Here we have exploited the repetitive nature of transposons and designed 4C "baits" on the consensus sequence of a particular transposon to capture uniquely mapped interactions that occur with each integration site in the genome. Our approach, which we call 4Tran, also enables us to identify new sites of transposition, and we have used it to identify differences in transposon integration events between mouse strains using baits on ETnERV and MuLV repeats across the genome. In addition, our approach allows for the identification of target genes that could potentially be controlled by a TE. Thus, 4Tran provides a tool for probing the potential role of transposons as regulatory elements that impact gene expression in healthy and diseased states.
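
The abstract describes 4C-ker as a Hidden Markov Model-based pipeline but gives no implementation details. As a rough illustration of the general idea only, the sketch below decodes windowed read counts into "background" and "interacting" states using Viterbi decoding. Everything in it is an assumption made for illustration: the two-state design, the Poisson emission model, the `rates` and `stay` parameters, and the function names are hypothetical and are not taken from 4C-ker.

```python
import math


def poisson_logpmf(k, lam):
    """Log-probability of observing count k under a Poisson with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_two_state(counts, rates=(2.0, 10.0), stay=0.9):
    """Most likely state path for windowed counts (0 = background, 1 = interacting).

    Hypothetical toy model, not the 4C-ker implementation:
    `rates` are assumed Poisson means per state and `stay` is the assumed
    probability of remaining in the same state between adjacent windows.
    """
    if not counts:
        return []
    n_states = 2
    log_stay = math.log(stay)
    log_switch = math.log(1.0 - stay)
    log_start = math.log(1.0 / n_states)

    # score[s]: best log-probability of any path ending in state s.
    score = [log_start + poisson_logpmf(counts[0], rates[s]) for s in range(n_states)]
    back = []  # back[t][s]: best predecessor of state s at window t + 1
    for k in counts[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            candidates = [
                score[p] + (log_stay if p == s else log_switch)
                for p in range(n_states)
            ]
            best_prev = max(range(n_states), key=lambda p: candidates[p])
            pointers.append(best_prev)
            new_score.append(candidates[best_prev] + poisson_logpmf(k, rates[s]))
        score = new_score
        back.append(pointers)

    # Trace back from the best final state to recover the full path.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    window_counts = [1, 2, 1, 12, 9, 11, 2, 1]  # made-up counts per window
    print(viterbi_two_state(window_counts))
```

In a sketch like this, the transition penalty is what favors contiguous domains over isolated high-count windows, which is one reason an HMM can suit signals whose resolution varies with distance from the bait.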
Keywords/Search Tags:4c-seq, Regulatory elements, Identify, Interactions, Genome