Computational analyses of microbial genomes: Operons, protein families and lateral gene transfer

Posted on: 2006-05-25
Degree: Ph.D.
Type: Dissertation
University: University of Maryland, College Park
Candidate: Yan, Yongpan
Full Text: PDF
GTID: 1450390008961598
Subject: Biology
Abstract/Summary:
As a result of recent successes in genome-scale studies, especially genome sequencing, large amounts of new biological data are now available. This naturally challenges the computational world to develop more powerful and precise analysis tools. In this work, three computational studies have been conducted using complete microbial genome sequences: the detection of operons, the composition of protein families, and the detection of lateral gene transfer events.

In the first study, two computational methods, termed the Gene Neighbor Method (GNM) and the Gene Gap Method (GGM), were developed for the detection of operons in microbial genomes. GNM exploits the relatively high conservation of gene order within operons, compared with genes in general. GGM exploits the relatively short gap between genes within operons, compared with the gap otherwise found between adjacent genes. The two methods were benchmarked using biological pathway data and documented operon data. Operons were predicted for 42 microbial genomes. The predictions are used to infer possible functions for some hypothetical genes in prokaryotic genomes and have proven a useful adjunct to structure information in deriving protein function in our structural genomics project.

In the second study, we developed an automated clustering procedure to classify the protein sequences in a set of microbial genomes into protein families. Benchmarking shows that the clustering method is sensitive at detecting remote family members and has a low level of false positives. The aim of constructing this comprehensive protein family set is to address several questions key to structural genomics. First, our study indicates that approximately 20% of known families with three or more members currently have a representative structure. Second, the number of apparent protein families will be considerably larger than previously thought: we estimate that, by the criteria of this work, there will be about 250,000 protein families when 1000 microbial genomes are sequenced. However, the vast majority of these families will be small. Third, it will be possible to obtain structural templates for 70-80% of protein domains with an achievable number of representative structures, by systematically sampling the larger families.

The third study is the detection of lateral gene transfer events in microbial genomes. Two new high-throughput methods have been developed and applied to a set of 66 fully sequenced genomes. (Abstract shortened by UMI.)...
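
The gap-based idea behind GGM, as described in the abstract, can be illustrated with a minimal sketch. This is not the dissertation's implementation: the `Gene` record, the `group_by_gap` function, the same-strand requirement, and the 50 bp `max_gap` threshold are all assumptions introduced here for illustration only. The sketch groups adjacent genes into a candidate operon when they lie on the same strand and the intergenic gap is short.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    # Hypothetical record; coordinates are 1-based and inclusive.
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def group_by_gap(genes: List[Gene], max_gap: int = 50) -> List[List[Gene]]:
    """Group adjacent same-strand genes whose intergenic gap is <= max_gap.

    max_gap is an illustrative threshold, not a value from the dissertation.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    groups: List[List[Gene]] = []
    for gene in ordered:
        if groups:
            prev = groups[-1][-1]
            # Number of bases strictly between the two genes
            # (negative when the genes overlap).
            gap = gene.start - prev.end - 1
            if gene.strand == prev.strand and gap <= max_gap:
                groups[-1].append(gene)
                continue
        # Start a new candidate group.
        groups.append([gene])
    return groups


# Toy coordinates, invented for this example.
genes = [
    Gene("a", 1, 300, "+"),
    Gene("b", 320, 900, "+"),
    Gene("c", 1500, 2100, "+"),
    Gene("d", 2110, 2600, "-"),
]
candidate_operons = [[g.name for g in grp] for grp in group_by_gap(genes)]
```

With these toy coordinates, genes `a` and `b` fall into one candidate group, while `c` (large gap) and `d` (opposite strand) each stand alone. A fixed threshold is the simplest possible choice; the abstract does not say how GGM sets or scores gap lengths.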
Keywords/Search Tags:Genomes, Protein families, Lateral gene, Operons, Computational