A geometric framework for robust nearest neighbor analysis of protein structure and function

Posted on: 2007-04-08
Degree: Ph.D
Type: Dissertation
University: The University of North Carolina at Chapel Hill
Candidate: Bandyopadhyay, Deepak
Full Text: PDF
GTID: 1448390005969652
Subject: Biology
Abstract/Summary:
A protein is a long chain of amino acids, also called residues. In solution the protein chain folds into a compact three-dimensional shape that determines the protein's function. Nearest-neighbor analysis of protein structures, represented as one point per residue, identifies pairs, triples and quadruples of residues that interact or pack together. Such analysis has been used to score protein packing and interactions; to detect repeating elements of protein structure; to compare two protein structures; and to find packing patterns in families of proteins that are related to their function. However, point coordinates of protein structures are determined experimentally and are thus imprecise. I explore whether nearest-neighbor analysis done on precise points still applies for imprecise points.

My dissertation introduces two new geometric techniques for robust neighbor analysis, almost-Delaunay simplices and Delaunay probability, that capture imprecision in the input points, and demonstrates several applications in the analysis of protein structure. The almost-Delaunay simplices quantify possible changes in the nearest neighbors, given the maximum motion allowed for any point. For 3D points, they define new sets of neighboring pairs, triples and quadruples that may arise, called the almost-Delaunay edges, triangles and tetrahedra. The Delaunay probability estimates the probability that a set of points really are nearest neighbors, given the expected amplitude of random motion for all points. These techniques establish a framework in which existing applications of nearest-neighbor analysis can often be adapted to make them more robust for imprecise points, and entirely new applications can be designed that were not possible previously.

Using the almost-Delaunay tetrahedra, I observe that the nearest neighbors in a protein structure are more stable than in other protein-like structures, such as artificially folded decoys and structures predicted from protein sequences. I adapt a statistical score for protein packing to use my geometric framework, and show that the score is robust when used to distinguish well-packed proteins from decoys, and that the framework may make it more robust when analyzing the packing at each residue. I identify packing signatures for repeating elements of protein structure, particularly for alpha-helices, and detect these elements with high accuracy. Finally, I use changes in the neighboring residues between two or more snapshots of a protein undergoing motion to identify flexible residues.

Using the almost-Delaunay edges, I derive a sparse and robust graph representation of protein structure to support mining frequent substructures from protein families. From these I identify fingerprints, specific substructures that characterize protein families, and use them to infer the function of protein structures with unknown function.
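
The idea behind the Delaunay probability can be illustrated with a small sketch. The code below is not the dissertation's algorithm: it is a simplified 2D analogue (triangles and circumcircles instead of tetrahedra and circumspheres) that assumes independent Gaussian noise on every coordinate and estimates the probability by plain Monte Carlo sampling. The function names `in_circumcircle`, `is_delaunay` and `delaunay_probability`, and the parameters `sigma`, `trials` and `seed`, are hypothetical names chosen for illustration.

```python
import random


def in_circumcircle(a, b, c, p):
    """Return True if p lies strictly inside the circle through a, b, c (2D)."""
    # Orientation of (a, b, c): positive when counter-clockwise.
    orient = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    # Standard in-circle determinant, with coordinates translated so p is the origin.
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    # Multiplying by the orientation makes the test independent of vertex order.
    return det * orient > 0


def is_delaunay(points, triple):
    """A triple is Delaunay if no other point lies inside its circumcircle."""
    a, b, c = (points[i] for i in triple)
    return not any(
        in_circumcircle(a, b, c, p)
        for i, p in enumerate(points)
        if i not in triple
    )


def delaunay_probability(points, triple, sigma, trials=1000, seed=0):
    """Monte Carlo estimate of the chance that `triple` stays Delaunay.

    Assumption (not from the source): each coordinate is perturbed by
    independent Gaussian noise with standard deviation `sigma`.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        noisy = [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
                 for x, y in points]
        if is_delaunay(noisy, triple):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Four corners of a unit square: the fourth point sits exactly on the
    # circumcircle of the first three, so small noise can flip the answer.
    square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    print(delaunay_probability(square, (0, 1, 2), sigma=0.05))
```

In this toy setting, a triple whose circumcircle is far from every other point keeps a probability near 1 under small noise, while a nearly cocircular configuration such as the square above gives an estimate near one half, which is the kind of instability in neighbor relationships that a robust analysis has to account for.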
Keywords/Search Tags: Protein, Function, Robust, Framework, Nearest, Geometric, Residues