A protein structure alignment method and application to the discovery of recurrent protein structure motifs

Posted on: 2004-06-05
Degree: Ph.D.
Type: Dissertation
University: Boston University
Candidate: Szustakowski, Joseph Daniel
Full Text: PDF
GTID: 1460390011974833
Subject: Engineering
Abstract/Summary:
Proteins exhibit regularity across several levels of structure, from primary amino acid sequences, to secondary structure elements, to fully folded three-dimensional domains. The identification of such regularities has led to key advances in the understanding of the mechanisms that guide protein evolution, folding, and function.

There exists an intermediate level of regularity between secondary structure elements and domains, commonly referred to as super-secondary structures. To date, most characterized super-secondary structures have been identified and curated manually. The main goal of this dissertation is to identify and characterize super-secondary structures in a directed, objective fashion using automated techniques. In essence, the aim is to construct a dictionary of super-secondary structures or ‘protein parts’ that can be used to describe full-sized protein domains.

First, I describe a computational method (K2) for aligning three-dimensional protein structures. This method employs a hierarchical approach to the alignment problem. Initially, K2 performs a rapid alignment of the proteins' secondary-structure elements. This coarse-grained alignment is then refined at the amino-acid position level using a stochastic search based on a genetic algorithm. Finally, the detailed alignments are refined through a series of three-dimensional superpositions of the protein backbones that are accompanied by recruitment and pruning of the aligned regions. This method has been tested on a number of well-studied protein families, and its results are in excellent agreement with manually curated alignments.

The basic strategy used to identify super-secondary structures was to first align the structures of many unique, representative protein domains. These alignments were then clustered using a simple graph-theoretic approach based on the detection of maximal cliques in a graph.
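The clique-based clustering step can be illustrated with a minimal sketch. The abstract does not say which clique algorithm or graph construction was used, so everything here is an assumption: the basic Bron-Kerbosch recursion stands in for the clique detection, each node is a hypothetical aligned fragment (labelled `domain:fragment`), and an edge means that two fragments were aligned to each other.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from a list of node pairs."""
    adj: Dict[str, Set[str]] = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def maximal_cliques(adj: Dict[str, Set[str]]) -> List[Set[str]]:
    """Enumerate all maximal cliques with basic Bron-Kerbosch (no pivoting)."""
    cliques: List[Set[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        # r: current clique, p: candidates that can extend it,
        # x: nodes already explored (used to reject non-maximal cliques).
        if not p and not x:
            cliques.append(set(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)

    expand(set(), set(adj), set())
    return cliques


# Hypothetical alignment graph: fragment labels are illustrative only.
edges = [
    ("d1:a", "d2:a"), ("d1:a", "d3:a"), ("d2:a", "d3:a"),  # mutually aligned trio
    ("d3:a", "d4:b"), ("d4:b", "d5:b"),                     # chain, no triangle
]
adj = build_graph(edges)
clusters = sorted(sorted(c) for c in maximal_cliques(adj))
print(clusters)
```

In this toy graph the three mutually aligned fragments form one cluster, while the chain contributes only pairwise cliques. Requiring a clique means every member of a cluster was aligned to every other member, which is a stricter criterion than connected components.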
Protein parts dictionaries were constructed by selecting a subset of these clusters with a greedy algorithm and objective function based on the Minimum Description Length principle.

The parts dictionaries created by this method were both compact and descriptive. Dictionaries typically consisted of fewer than 200 super-secondary structures. The dictionaries were capable of describing not only the structures used in their creation, but other ‘unseen’ structures as well. Secondary-structure coverage for the representative proteins and proteins from the same folds was typically 80%–95%. The dictionaries were effective in describing proteins from other folds as well, with secondary-structure coverage typically between 80% and 90%.
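The greedy, Minimum Description Length style selection can be sketched as follows. The abstract does not give the actual objective function, so this uses an assumed two-part cost: a fixed cost for every part stored in the dictionary plus a cost for every secondary-structure element left undescribed. The part names, element labels, and the `part_cost` and `raw_cost` values are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def description_length(selected: List[str], candidates: Dict[str, Set[str]],
                       universe: Set[str], part_cost: float,
                       raw_cost: float) -> float:
    """Assumed two-part cost: dictionary size plus elements left uncovered."""
    covered: Set[str] = set()
    for name in selected:
        covered |= candidates[name]
    return part_cost * len(selected) + raw_cost * len(universe - covered)


def greedy_select(candidates: Dict[str, Set[str]], universe: Set[str],
                  part_cost: float = 3.0,
                  raw_cost: float = 1.0) -> Tuple[List[str], float]:
    """Repeatedly add the part that lowers the total cost most; stop when none does."""
    selected: List[str] = []
    best = description_length(selected, candidates, universe, part_cost, raw_cost)
    while True:
        choice = None
        for name in sorted(candidates):  # sorted for deterministic tie-breaking
            if name in selected:
                continue
            dl = description_length(selected + [name], candidates, universe,
                                    part_cost, raw_cost)
            if dl < best:
                best, choice = dl, name
        if choice is None:
            return selected, best
        selected.append(choice)


# Hypothetical data: ten secondary-structure elements and four candidate parts.
universe = {f"e{i}" for i in range(1, 11)}
candidates = {
    "hairpin": {"e1", "e2", "e3", "e4", "e5"},
    "helix_pair": {"e6", "e7", "e8", "e9"},
    "single": {"e10"},
    "redundant": {"e1", "e2"},
}
selected, total = greedy_select(candidates, universe)
covered = set().union(*(candidates[name] for name in selected))
coverage = len(covered) / len(universe)
print(selected, total, coverage)
```

With these toy costs the procedure keeps the two parts that each describe several elements and rejects the part that would cost more to store than the single element it explains. That trade-off between dictionary size and coverage is the one the abstract describes as compact yet descriptive.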
Keywords/Search Tags: Protein, Structure, Method, Alignment, Dictionaries