
An assessment of genome annotation tools and an approach to solving a set of problems from a genome sequencing project

Posted on: 2002-01-25
Degree: Ph.D.
Type: Thesis
University: University of California, Berkeley
Candidate: Hartzell, George W., III
GTID: 2460390011997164
Subject: Computer Science
Abstract/Summary:
This thesis discusses my work on two projects at the Berkeley Drosophila Genome Project (BDGP). Both projects contributed to one of the BDGP's principal goals: determining and annotating the sequence of the Drosophila melanogaster genome.
The first part describes the Genome Annotation and Assessment Project (GASP1). GASP1 was a collaborative effort by members of the BDGP and the computational biology community whose principal goal was to evaluate the state of the art in computational genome annotation tools and techniques. GASP1 highlighted the strengths and weaknesses of current methods for automatically annotating large quantities of genomic sequence data; extended the state of the art for comparing gene predictions with measures that quantified inter-gene assembly mistakes; and produced a unique data set and performance standard that a variety of groups have used to improve their annotation tools. My work, described in Part I, played a key role in all three of these contributions.
The second part describes a flexible approach to solving a class of problems that arise in the final stages of large-scale sequencing projects. The approach is illustrated with three problems from the Drosophila melanogaster sequencing project. The solutions to all three problems exploit the particular error characteristics of nearly complete genomic sequence to efficiently produce high-quality alignments between portions of large DNA sequences. The first section reviews several pairwise alignment techniques, ending with an algorithm for a bounded global alignment problem that has very small memory requirements and reasonable running time. The second section reviews the technology surrounding a particular kind of local alignment, the maximal segment pair (MSP). The final three sections combine these techniques in a novel way to produce useful solutions to several real problems.
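A common way to search for an MSP, the one BLAST popularized, is to find short exact word matches (seeds) shared by the two sequences and extend each seed without gaps, stopping once the running score falls too far below the best score seen (an "X-drop" cutoff). The sketch below assumes that approach; the function names, the +1/-3 match/mismatch scores, the word size k=11 and the X-drop of 10 are illustrative choices, not the thesis's. Because seeds are extended heuristically, the result is a high-scoring segment pair that is usually, but not provably, the true MSP.

```python
from typing import Dict, List, Optional, Tuple

# Illustrative scoring scheme (an assumption, not taken from the thesis).
MATCH, MISMATCH = 1, -3

# (score, start in a, start in b, length) of an ungapped segment pair.
Segment = Tuple[int, int, int, int]


def index_words(seq: str, k: int) -> Dict[str, List[int]]:
    """Map every length-k word of seq to the positions where it starts."""
    index: Dict[str, List[int]] = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], []).append(i)
    return index


def extend_one_way(a: str, b: str, i: int, j: int, step: int, xdrop: int) -> Tuple[int, int]:
    """Walk from a[i], b[j] in direction step (+1 or -1) without gaps.

    Returns (best gain, number of positions consumed to reach it). Stops at a
    sequence end or once the running score drops more than xdrop below the best.
    """
    best, gain, best_len, n = 0, 0, 0, 0
    while 0 <= i < len(a) and 0 <= j < len(b):
        gain += MATCH if a[i] == b[j] else MISMATCH
        n += 1
        if gain > best:
            best, best_len = gain, n
        elif best - gain > xdrop:
            break
        i += step
        j += step
    return best, best_len


def find_msp(a: str, b: str, k: int = 11, xdrop: int = 10) -> Optional[Segment]:
    """Heuristic MSP search: seed on shared k-words, extend each seed without gaps."""
    index = index_words(a, k)
    covered: Dict[int, int] = {}  # diagonal (i - j) -> end in b of the last extension
    best: Optional[Segment] = None
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], []):
            diag = i - j
            if j < covered.get(diag, -1):
                continue  # seed lies inside a segment already extended on this diagonal
            right, r_len = extend_one_way(a, b, i + k, j + k, +1, xdrop)
            left, l_len = extend_one_way(a, b, i - 1, j - 1, -1, xdrop)
            hit = (k * MATCH + left + right, i - l_len, j - l_len, k + l_len + r_len)
            covered[diag] = hit[2] + hit[3]
            if best is None or hit[0] > best[0]:
                best = hit
    return best
```

The per-diagonal `covered` map skips seeds that fall inside a segment already extended on the same diagonal, so a long shared region is extended once rather than once per word it contains. On two sequences that share a 20-base core flanked by mismatching bases, `find_msp` returns that core as the segment pair.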
The solutions share a common strategy: use an MSP search heuristic to quickly locate an interesting feature (e.g., the region of overlap between a pair of large contigs), then use that information to drive the bounded global alignment algorithm, rapidly producing a high-quality result.
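The second step of this strategy can be sketched as a banded dynamic-programming global alignment: once an MSP fixes the diagonal offset between two sequences (its start in b minus its start in a), only cells within w of that diagonal are filled. Reading "bounded global alignment" as banded alignment is an assumption, as are the function name, the +1/-3/-5 scores and the band half-width; the thesis's own algorithm may differ. This version keeps only two band-sized rows, so memory is O(w) and time O(len(a) * w), but it returns the score only. Recovering the alignment itself in similar memory would need a divide-and-conquer traceback such as Hirschberg's method, which is omitted here.

```python
NEG_INF = float("-inf")


def banded_global_score(a: str, b: str, offset: int, w: int,
                        match: int = 1, mismatch: int = -3, gap: int = -5) -> float:
    """Global alignment score of a and b using only cells with |(j - i) - offset| <= w.

    offset is the diagonal (j - i) suggested by an MSP hit; w is the band half-width.
    Only two rows of 2*w + 1 cells are kept, so memory is O(w).
    """
    n, m = len(a), len(b)
    if abs(offset) > w or abs((m - n) - offset) > w:
        raise ValueError("band must contain both corners of the DP matrix")
    width = 2 * w + 1
    # In row i, band slot t holds column j = i + offset - w + t.
    prev = [NEG_INF] * width
    for t in range(width):
        j = offset - w + t
        if 0 <= j <= m:
            prev[t] = j * gap  # row 0: a prefix of b aligned to gaps
    for i in range(1, n + 1):
        cur = [NEG_INF] * width
        for t in range(width):
            j = i + offset - w + t
            if j < 0 or j > m:
                continue  # slot falls outside the matrix
            if j == 0:
                cur[t] = i * gap  # column 0: a prefix of a aligned to gaps
                continue
            # (i-1, j-1) sits in the same slot t of the previous row.
            best = prev[t] + (match if a[i - 1] == b[j - 1] else mismatch)
            if t + 1 < width:
                best = max(best, prev[t + 1] + gap)  # from (i-1, j): gap in b
            if t > 0:
                best = max(best, cur[t - 1] + gap)  # from (i, j-1): gap in a
            cur[t] = best
        prev = cur
    return prev[(m - n) - offset + w]
```

For nearly finished sequence, where differences are sparse, a small w keeps the work close to linear in sequence length. If an indel pushes the true optimal path outside the band, the banded score can come out lower than the unbanded optimum, so a practical version would detect that case and widen the band.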
Keywords/Search Tags: Genome, Annotation tools, Project, Approach, Sequencing, Alignment