
Computational identification of discriminative sequence motifs with dynamic search spaces

Posted on: 2013-02-04
Degree: Ph.D.
Type: Dissertation
University: The Johns Hopkins University
Candidate: Karnik, Rahul
Full Text: PDF
GTID: 1452390008473835
Subject: Biology
Abstract/Summary:
Regulatory regions in mammalian genomes play important roles both in development and in the maintenance of cellular homeostasis. Mutations in these regulatory regions are implicated in several disease phenotypes. Understanding the precise role of these regions requires detailed maps of where regulatory proteins bind to DNA. Experimentally determined genome-wide maps of protein binding are available at fairly coarse resolution, but they cannot pinpoint the exact locations in the DNA where the proteins bind. Computational methods can identify the specific putative binding locations within the broader loci and build a model of the DNA sequences to which the protein binds. Yet state-of-the-art computational approaches to identifying specific DNA binding motifs often yield motifs of weak predictive power. Here we present a novel computational algorithm called INSPECTOR, designed to find specific or predictive motifs, in contrast to over-represented sequence elements. Key distinguishing features of this algorithm are that it uses a dynamic search space to find discriminative motifs and that it models binding motifs using a full position weight matrix (PWM) rather than k-mers or regular expressions. We demonstrate that INSPECTOR finds motifs corresponding to known binding specificities in several mammalian ChIP-seq datasets, and that the motifs found by INSPECTOR classify the ChIP-seq signals better than motifs from existing algorithms. We also show that INSPECTOR outperforms a technology-specific algorithm in finding predictive motifs from protein-binding microarray (PBM) datasets. Finally, we apply this algorithm to detect motifs from expression datasets in C. elegans, using a dynamic expression similarity metric rather than fixed expression clusters, and find novel predictive motifs.
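To make the PWM terminology in the abstract concrete, here is a minimal sketch of how a position weight matrix can score a DNA sequence. It is not INSPECTOR itself, and it does not reproduce the dynamic search space described above. The function names, the toy count matrix for a hypothetical 4-bp motif, the pseudocount, and the uniform background are all assumptions made for illustration.

```python
import math

BASES = "ACGT"

def make_pwm(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2-odds weights.

    Assumes a uniform background and a simple additive pseudocount.
    """
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def best_window_score(pwm, sequence):
    """Return the highest PWM score over all windows of the sequence.

    Only the forward strand is scanned, to keep the sketch short.
    Returns -inf if the sequence is shorter than the motif.
    """
    width = len(pwm)
    best = float("-inf")
    for start in range(len(sequence) - width + 1):
        score = sum(pwm[i][sequence[start + i]] for i in range(width))
        best = max(best, score)
    return best

# Hypothetical count matrix for an illustrative motif "TGAC".
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 8},
    {"A": 0, "C": 0, "G": 8, "T": 0},
    {"A": 8, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 8, "G": 0, "T": 0},
]
toy_pwm = make_pwm(toy_counts)

bound_like = best_window_score(toy_pwm, "AATGACAA")    # contains the motif
unbound_like = best_window_score(toy_pwm, "AAAAAAAA")  # does not
```

In a discriminative setting, a motif would be judged by how well scores such as these separate bound from unbound sequences, rather than by how often the motif occurs. Unlike a k-mer or regular expression, the matrix gives a graded score to near matches.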
Keywords/Search Tags:Motifs, Dynamic, Computational, DNA