Toward an automated strategy for superfamily analysis and characterization of proteins

Posted on: 2006-10-19
Degree: Ph.D.
Type: Thesis
University: University of California, San Francisco
Candidate: Brown, Shoshana
Full Text: PDF
GTID: 2458390008468654
Subject: Chemistry
Abstract/Summary:
Computational methods for protein function prediction are required to bridge the gap between the number of sequenced genes and the number of experimentally characterized proteins. In this thesis I present a new framework for organizing protein sequence, structure, and functional information that facilitates computational function prediction.

In Chapter I, I present a semi-automated method for the collection and functional annotation of human transporter genes. These genes are annotated by placing them within the context of a characterized family and using existing information about family-specific structure-function relationships to infer their function. This is a preliminary application of the superfamily analysis method that is explored further in subsequent chapters.

Streamlining the superfamily analysis strategy requires a platform that presents superfamily structure-function data in an easily accessible format. Chapter II describes the development of the Structure-Function Linkage Database (SFLD) to fulfill this purpose. It discusses the issues involved in database design, gives an overview of the database functionality, and shows how the database can be used to address several types of real scientific problems.

Without data, the SFLD schema is of limited use to the research community. Chapter III describes the development of a set of gold-standard superfamilies that provide a preliminary dataset for the SFLD and a test set for automated methods that aim to cluster proteins based on sequence, structure, and function. It discusses the properties that differentiate the gold-standard set from existing datasets, as well as the difficulties involved in clustering enzymes in mechanistically diverse superfamilies.

The SFLD may be used for several different purposes. Chapter IV presents two detailed scenarios for using the SFLD: the functional annotation of an uncharacterized protein, and the analysis of a previously annotated protein to detect misannotation.

Chapter V presents some additional contributions that I have made to the SFLD. Development of the database schema is discussed in terms of the requirements for representing structure-function relationships in mechanistically diverse enzyme superfamilies. The automated update protocol for the SFLD is also presented.
Keywords/Search Tags:Protein, SFLD, Superfamily analysis, Automated, Function