Integrative Approaches for Mining High-Throughput Genomic Data

Posted on: 2012-09-04
Degree: Ph.D.
Type: Dissertation
University: University of California, Irvine
Candidate: Daily, Kenneth
Full Text: PDF
GTID: 1450390011956054
Subject: Biology
Abstract/Summary:
The study of transcriptional regulation spans many fields in molecular, cell, evolutionary, and computational biology. For any given genome, only a small fraction of the regulatory elements embedded in the DNA sequence have been characterized, and there is great interest in developing computational methods to systematically discover and map all of these elements. High-throughput techniques have made genome-wide assays standard for analyzing mechanisms of regulation, and the amount of data available for analysis is increasing exponentially. Computational techniques have been developed in tandem to process, synthesize, index, and store these datasets. We describe here results from various levels of the study of transcriptional regulation and the methods developed to facilitate analysis. First, we develop and improve a pipeline (termed MotifMap) for the search, storage, and integration of transcription factor binding sites in the genomes of multiple model organisms. We employ a phylogenetic footprinting approach to reduce the number of false positive sites reported, and evaluate its performance using high-throughput sequencing datasets for a number of transcription factors. Next, we employ this pipeline in conjunction with high-throughput sequencing data in a study to annotate retrotransposon insertion sites across the yeast genome. Specific elements are observed proximal to these insertion sites, and their identification is aided by the MotifMap pipeline. Lastly, we describe techniques to compress and store high-throughput sequencing data. Our algorithm performs comparably to standard compression techniques while preserving the ability to use the data for analysis.
Keywords/Search Tags: Data, High-throughput, Techniques