Computational detection of gene regulatory signals in nucleotide sequences

Posted on: 2005-06-16
Degree: Ph.D.
Type: Thesis
University: Boston University
Candidate: Frith, Martin Cornelius
Full Text: PDF
GTID: 2458390008997949
Subject: Biology
Abstract/Summary:
Genomes are the most ancient and profound of texts, containing the information necessary to build living creatures, including ourselves. This thesis describes computational tools developed for deciphering genomic regulatory signals, focusing primarily on regulation of gene transcription, and secondarily on localization of messenger RNA.

Regulatory signals can be uncovered by gathering nucleotide sequences that share some biological property of interest, and identifying subsequence motifs common to them. Two approaches to discovering motifs in this way are described: an a priori method that searches for an optimal alignment of subsequences, and a technique that compares the sequences to a library of previously identified motifs and assesses which, if any, of the motifs are statistically overrepresented in the sequences.

Transcription is typically regulated by clusters of signals that are individually weak but collectively strong. Three statistical methods for identifying clusters of predefined motifs in large genomic sequences are developed, which consider both the quality of the motif matches and the tightness of their clustering. In addition, an algorithm for discovering unknown signals that occur multiple times in repeat clusters in one sequence is described.

These methods successfully identify signals previously known to regulate transcription and localization of messenger RNA, and also predict novel regulatory elements, some of which have since been confirmed experimentally. Finally, the challenge of reading the regulatory information in the mammalian genome is met in a concerted fashion: several types of promoter sequence are analyzed for statistically overrepresented motifs, and these motifs are then fed into a cluster-finding algorithm to detect other instances of these promoter types in the genome. This thesis demonstrates that computational methods can greatly accelerate the rate of discovery of regulatory elements in nucleotide sequences. All of the methods described herein have been carefully packaged and placed on the World Wide Web so that they are available to all researchers.
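The idea of testing whether a known motif is statistically overrepresented in a set of sequences can be illustrated with a minimal sketch. This is not the method developed in the thesis: it assumes exact string matching on one strand in place of scored motif matches, and it uses a generic one-sided hypergeometric test against a background set. The function names (`contains_motif`, `hypergeom_tail`, `overrepresentation`) and the toy sequences are hypothetical, chosen only for illustration.

```python
from math import comb


def contains_motif(sequence, motif):
    """Return True if the motif occurs in the sequence (exact match, one strand).

    Assumption: a real tool would score matches, e.g. with a weight matrix.
    """
    return motif.upper() in sequence.upper()


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n items are drawn without replacement from N items,
    of which K are hits (one-sided hypergeometric test)."""
    upper = min(n, K)
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return total / comb(N, n)


def overrepresentation(foreground, background, motif):
    """Return (foreground hits, background hits, p-value) for one motif."""
    fg_hits = sum(contains_motif(s, motif) for s in foreground)
    bg_hits = sum(contains_motif(s, motif) for s in background)
    N = len(foreground) + len(background)  # all sequences
    K = fg_hits + bg_hits                  # all sequences containing the motif
    p_value = hypergeom_tail(fg_hits, len(foreground), K, N)
    return fg_hits, bg_hits, p_value


# Toy data (hypothetical): sequences of interest versus a background set.
foreground = ["ACGTATAAAGG", "TTTATAAACC", "GGTATAAATT"]
background = ["ACGTACGTAC", "GGGCCCGGGC", "TTTATAAAGG",
              "CACACACACA", "GTGTGTGTGT"]

fg_hits, bg_hits, p_value = overrepresentation(foreground, background, "TATAAA")
print(fg_hits, bg_hits, p_value)
```

A small p-value suggests the motif appears in the sequences of interest more often than chance would explain. With a library of many motifs, the p-values would also need correction for multiple testing.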
Keywords/Search Tags:Regulatory signals, Sequences, Computational, Nucleotide