Analysis and prediction of contacting residues in protein-protein interfaces

Posted on: 2005-12-01
Degree: Ph.D
Type: Dissertation
University: Columbia University
Candidate: Ofran, Yanay
Full Text: PDF
GTID: 1450390008980496
Subject: Biophysics
Abstract/Summary:
Biological processes are realized through the interactions of proteins. Therefore, to fully understand the function of genes, biochemical pathways, or pathological processes, one needs to explore the networks of protein-protein interactions that underlie them. Most traditional research methods are designed to study only a small number of proteins at a time, so there is a pressing need for high-throughput tools, both experimental and computational, for the study of protein-protein interactions. The rapid growth of sequence databases opens the door to large-scale analysis of protein interactions. A first step in this direction is to characterize the biophysical nature of protein-protein interfaces. However, owing to technical difficulties and conceptual ambiguities, many studies that attempted to do this used small datasets and reached contradictory conclusions. In the first part of this work, we present novel methods for the analysis of protein interfaces that cope with these difficulties and ambiguities. These methods enabled us to compile a very large dataset of protein interfaces. Analysis of these interfaces showed that there are at least six different types of interactions, each characterized by different biophysical features. Based on these findings, we present, in the second part of this work, a method for predicting interaction sites from sequence. Incorporating sequence information, predicted structural features, and evolutionary information, this method correctly identified interacting residues in more than 95% of the chains in our dataset. Per-residue analysis of the accuracy of the method shows that it performs as well as or better than existing methods, which rely on structure rather than sequence. At an accuracy of 62%, we were able to find 21% of the residues that are observed in protein interfaces; the values expected at random are 34% and 9%, respectively. Analysis of mutation studies suggests that if we focus on the energetically important residues, that is, the residues that drive the interaction, the coverage and accuracy of the method are even higher. These results demonstrate that sequence alone, without any structural or experimental information, suffices to identify interaction sites in proteins. The method has a wide range of possible applications, among them high-throughput analysis of whole genomes, functional studies, and docking. It can also help identify candidate residues for site-directed mutagenesis in experimental research.
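
The per-residue figures above pair an accuracy with a coverage. The abstract does not define either term, so the following is only a minimal sketch under an assumed reading: accuracy as the fraction of predicted residues that are observed in an interface, and coverage as the fraction of observed interface residues that are predicted. The function name `accuracy_and_coverage` and the residue positions are hypothetical and purely illustrative; they are not taken from the dissertation.

```python
def accuracy_and_coverage(predicted, observed):
    """Return (accuracy, coverage) for two collections of residue positions.

    Assumed definitions (not stated in the abstract):
      accuracy = correctly predicted residues / all predicted residues
      coverage = correctly predicted residues / all observed interface residues
    """
    predicted, observed = set(predicted), set(observed)
    correct = predicted & observed
    # Guard against empty inputs to avoid division by zero.
    accuracy = len(correct) / len(predicted) if predicted else 0.0
    coverage = len(correct) / len(observed) if observed else 0.0
    return accuracy, coverage


# Made-up toy chain: 10 observed interface residues, 4 predicted ones.
observed = {3, 4, 5, 10, 11, 12, 20, 21, 30, 31}
predicted = {4, 5, 6, 10}

acc, cov = accuracy_and_coverage(predicted, observed)
print(f"accuracy={acc:.2f} coverage={cov:.2f}")  # accuracy=0.75 coverage=0.30
```

Under this reading, a predictor that flags only a few residues can reach high accuracy at low coverage, which is why the two numbers are reported together.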
Keywords/Search Tags:Residues, Protein, Method, Interaction