
Computational methods for improving genome annotation

Posted on: 2006-01-20
Degree: Ph.D.
Type: Dissertation
University: The University of North Carolina at Chapel Hill
Candidate: Powell, Bradford Cole
Full Text: PDF
GTID: 1450390008957726
Subject: Biology
Abstract/Summary:
The growing availability of genome sequence information has transformed biology. The analysis and annotation of this information is an ongoing process. Because such a large amount of information is involved, computers have been an important part of this process. This dissertation describes three studies in which computational methods are used to interpret different levels of annotation of data derived from genome sequences.

In the first study, a graph-based clustering technique is applied to nucleotide sequence similarity data. Open reading frames from different organisms are grouped according to similarity using a modified analysis of Clusters of Orthologous Groups (COGs). Sequences that are preserved across multiple organisms may have a functional role, and in particular may encode proteins. Examining inconsistent gene predictions within these groups reveals target sequences that may be genes missed by current prediction methods. Simultaneous comparison among several genomes thus informs a fundamental level of genome annotation: where protein-coding genes are located.

The second study involves individual exons of Dscam. In insects, this gene is capable of generating enormous transcript diversity through alternative splicing. Iterated searches were used to find homologous exons, and the evolutionary history of these exons was studied using Bayesian phylogenetic inference. Our results indicate that the different alternatively spliced exon groups differ in their rates of fixed mutations and in the turnover of the variants available for alternative splicing. The functional role of insect Dscam and of its human orthologs remains unclear, but any explanation of this role must take into account the transcript diversity that is so prominent in insects but has not been seen outside of this lineage.

The third study details several sources of variability in the quantitation of gene transcripts by real-time monitoring of the polymerase chain reaction. This study indicates that these measurements can be misleadingly precise: systematic bias introduced in the analysis may add more variability than is seen when examining a standard curve. Understanding the causes of this variability is important for interpreting annotations of gene transcript levels.
Keywords/Search Tags:Annotation, Genome, Methods