
Identification of gene regulatory networks with computational comparative genomics

Posted on: 2007-12-07
Degree: Ph.D
Type: Thesis
University: Washington University in St. Louis
Candidate: Wang, Ting
Full Text: PDF
GTID: 2440390005963963
Subject: Biology
Abstract/Summary:
Regulation of gene transcription involves a complex molecular network. Expression of genes is controlled primarily by transcription factors (TFs) binding to specific short DNA sequence motifs in the cis-regulatory regions of these genes, leading to activation or repression of transcription in response to changes in the environment, as well as during development. Discovery of regulatory motifs is one of the fundamental problems in computational biology. Identification of all TF binding sites will provide the information necessary to eventually construct models for global networks of transcriptional regulation. In recent years, the rapid accumulation of complete genome sequences and the advance of high-throughput expression profiling technology have been changing the ways that we look at genomic sequences and redefining the types of problems a motif discovery algorithm can tackle. This thesis describes the development of a new generation of motif finding algorithms and their applications in deciphering regulatory networks.

PhyloCon (Phylogenetic-Consensus) was developed to take into account both conservation among orthologous genes and co-regulation of genes within a species. It aligns orthologous promoters and builds profiles of conserved regions, then compares profiles of different orthologous groups to identify common motifs. An ALLR (Average Log Likelihood Ratio) statistic was developed for profile comparison. PhyloNet (Phylogenetic-Network) was developed for identifying conserved regulatory motifs of an organism directly from genome sequences of several related species without reliance on additional information. It first constructs phylogenetic profiles for each promoter, then uses a BLAST-like algorithm to search through the profile space of all promoters in the genome to identify conserved motifs and the promoters that contain them. A modified Karlin-Altschul statistic was developed to estimate the statistical significance of a profile alignment. As a complementary approach, a scoring system was developed to examine the significance of motifs or motif clusters for selected genes.

These algorithms were applied to a multitude of datasets ranging from bacterial and yeast to mammalian systems. Binding specificities of many transcription factors of different organisms were discovered or refined, and an improved regulatory map of Saccharomyces cerevisiae was generated. A special class of sequence elements, the Multispecies Conserved Element Cluster, was identified in the human genome. These results demonstrate that the algorithms described in this thesis are powerful tools for studying gene regulation and for comparative genomic research.
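
The abstract names the ALLR statistic for comparing profile columns but does not give its formula. The following is a minimal sketch, assuming the commonly cited form in which each column's base counts are scored against the other column's log-likelihood ratios and the sum is divided by the total count. The function names, the pseudocount scheme, and the uniform background in the usage lines are illustrative assumptions, not details taken from the thesis.

```python
import math

BASES = "ACGT"


def column_frequencies(counts, background, pseudocount=1.0):
    """Estimate base frequencies for one profile column.

    Assumption: a background-weighted pseudocount is added so that no
    frequency is zero; the thesis may use a different scheme.
    """
    total = sum(counts[b] for b in BASES) + pseudocount
    return {b: (counts[b] + pseudocount * background[b]) / total for b in BASES}


def allr(counts_i, counts_j, background, pseudocount=1.0):
    """Average log likelihood ratio between two profile columns (assumed form).

    ALLR = [sum_b n_bj * ln(f_bi / p_b) + sum_b n_bi * ln(f_bj / p_b)]
           / sum_b (n_bi + n_bj)
    """
    denominator = sum(counts_i[b] + counts_j[b] for b in BASES)
    if denominator == 0:
        raise ValueError("both columns are empty")
    f_i = column_frequencies(counts_i, background, pseudocount)
    f_j = column_frequencies(counts_j, background, pseudocount)
    numerator = 0.0
    for b in BASES:
        # Counts of column j scored with column i's log-likelihood ratios,
        # and counts of column i scored with column j's.
        numerator += counts_j[b] * math.log(f_i[b] / background[b])
        numerator += counts_i[b] * math.log(f_j[b] / background[b])
    return numerator / denominator


# Hypothetical usage with a uniform background.
uniform = {b: 0.25 for b in BASES}
col_a = {"A": 4, "C": 0, "G": 0, "T": 0}
col_c = {"A": 0, "C": 4, "G": 0, "T": 0}
same = allr(col_a, col_a, uniform)   # positive: the columns agree
diff = allr(col_a, col_c, uniform)   # negative: the columns disagree
```

Under this form, two columns that favor the same base score above zero and columns that favor different bases score below zero, which is what makes the statistic usable for finding locally similar stretches between two profiles.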
Keywords/Search Tags:Gene, Regulatory, Networks, Transcription