
Gene annotation using ab initio protein structure prediction: Method, development and application to major protein families

Posted on: 2002-12-21
Degree: Ph.D
Type: Dissertation
University: University of Washington
Candidate: Bonneau, Richard Author
Full Text: PDF
GTID: 1460390011995347
Subject: Chemistry
Abstract/Summary:
This work describes the emergence of a new technique in genome annotation: using ab initio protein structure prediction to glean functional information about open reading frames that have no links to proteins of known structure and/or function. Ab initio structure predictions are made for a large number of proteins, and the predictions can then be used in several ways to infer function by finding relationships to previously characterized proteins.

The first part of this work is devoted to improvements made to the structure prediction method, Rosetta. Improvements include, but are not limited to: better and more integrated use of multiple sequence alignment information when predicting whole gene families, better sampling of complex topologies, improvements in our ability to recognize and remove systematic errors associated with Rosetta, and a better understanding of the clustering procedure. These improvements allowed us to make good predictions for 16 of 21 domains less than 300 residues in length at the fourth Critical Assessment of Structure Prediction (CASP4), outperforming the next best method for ab initio structure prediction by a significant margin.

Chapter 5 describes our pilot genomics project: predicting all Pfam-A families within our size range that have no links to known structure. Pfam is a collection of sequences clustered by homology into ∼2800 domains representing 65–70% of all sequence space. Each of these alignments contains on average 200 members, so these alignments span major protein families. We generate predictions for the 510 families within our size range with no link to known structure. These models, and the fold links they produce when the models are searched against the Protein Data Bank (PDB), represent possible functional inferences or templates for the interpretation of previously known functional information. Highlights of our blind predictions are given in Chapter 5.
Keywords/Search Tags: Ab initio protein structure, Structure prediction, Families, Method