Enhancing comparative genomics of transcriptional regulatory networks through data collection, transfer and integration

Posted on: 2017-12-11
Degree: Ph.D
Type: Dissertation
University: University of Maryland, Baltimore County
Candidate: Kilic, Sefa
Full Text: PDF
GTID: 1460390014970880
Subject: Bioinformatics
Abstract/Summary:
Comparative genomics has proven to be an invaluable approach for the characterization of transcriptional regulatory networks in Bacteria and the evolutionary analysis of transcriptional regulatory elements: transcription factors, their binding motifs and the regulons they control. The growing influx of high-throughput experimental data, however, introduces challenges for each step of the comparative genomics pipeline: the collection of transcription factor binding site data, the transfer of available information on the regulatory network to the species under analysis, and the integration of binding site search results from multiple genomes across orthologs. This dissertation addresses issues at each step of the workflow and describes a platform for the analysis of transcriptional regulatory networks in the Bacteria domain. First, CollecTF, a transcription factor binding site database across the Bacteria domain, was developed to compile experimentally validated transcription factor binding sites through manual curation. CollecTF provides fully customizable access to high-quality curated data and integrates it with major biological resources such as RefSeq, UniProtKB and the Gene Ontology Consortium. Second, different methods for transferring known information from reference to target species were systematically evaluated for the first time using a large catalog of known binding sites in Bacteria. Methods assuming conservation of the binding were shown to outperform those assuming conservation of regulon composition. Lastly, a complete comparative genomics platform (CGB) was built for the analysis of transcriptional regulation on any annotated bacterial genome. CGB combines binding evidence from multiple sources using phylogeny, reports the probability of TF-regulation for each gene through a Bayesian framework, and performs formal ancestral state reconstruction for each group of orthologous genes across the species under analysis to reconstruct the evolutionary history of TF-regulation of the gene. CGB was benchmarked by replicating a comparative genomics analysis of LexA regulation in Gram-positive Bacteria, and was later used to characterize the LexA regulon in Verrucomicrobia, a recently established Gram-negative phylum predominant in many soil bacterial communities.
Keywords/Search Tags:Transcriptional regulatory networks, Comparative genomics, Bacteria, Data