
Research On Improved Method For Protein-Protein Interaction Sites

Posted on: 2007-03-01
Degree: Master
Type: Thesis
Country: China
Candidate: S J An
Full Text: PDF
GTID: 2120360185985732
Subject: Computer Science and Technology
Abstract/Summary:
Prediction of protein-protein interaction sites is attracting growing interest. The purpose of this research is to identify the amino acid residues that participate in protein-protein interactions. This work is key to deciphering the mechanisms of organisms and predicting protein function, and is crucial for disease diagnosis and drug design.

At present, the common features used in prediction are profiles of sequentially or spatially neighboring residues, solvent-accessible surface area, hydrophobicity, conservation, etc. The classification algorithms are usually support vector machines or neural networks.

In this thesis, the data were first prepared: selecting the protein chains, determining the positive and negative examples (i.e. interaction sites and non-interaction sites), and extracting the features. Then, based on previous research, three methods were proposed to improve prediction performance.

First, considering the difference in secondary structure between interaction sites and non-interaction sites, secondary structural information was introduced as a new feature in addition to the features used in the previous method. When secondary structural information was combined with profiles and accessible surface area, performance improved. However, when residue conservation or hydrophobicity was added after the secondary structural information had been introduced, prediction performance decreased slightly.

Second, interaction sites are far fewer than non-interaction sites, so the numbers of positive and negative examples in the training set are unbalanced. To reduce the influence of this imbalance, a higher weight was assigned to the positive data. With this weight, performance improved on the same feature set.

Third, a support vector machine uses only one sample to represent each class during classification, but in fact the selected sample sometimes cannot represent the class. Therefore, the k-nearest neighbor algorithm was combined with support vector machines. In the new algorithm, if a sample was far from the hyperplane, it was classified using the support vector machine; otherwise, it was...
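
The routing rule described in the last paragraph can be illustrated with a minimal sketch. The abstract is cut off before it says what happens to samples near the hyperplane; the code below assumes they are handed to a k-nearest neighbor vote, which is a guess based on the stated combination of the two algorithms. Everything else here is hypothetical and chosen only for illustration: the toy 2-D data, the fixed linear decision function (`W`, `B`) standing in for an already trained support vector machine, the distance threshold, and the value of k. The thesis may well use a kernel machine and a different reference set for the neighbor search.

```python
import math
from collections import Counter

# Hypothetical toy data: 2-D feature vectors,
# +1 = interaction site, -1 = non-interaction site.
TRAIN_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
           (3.0, 3.0), (4.0, 3.0), (3.0, 4.0),
           (1.8, 1.8), (2.0, 1.7)]
TRAIN_Y = [-1, -1, -1, 1, 1, 1, 1, 1]

# Hypothetical parameters of an already trained linear SVM: f(x) = W . x + B.
W = (1.0, 1.0)
B = -4.0


def distance_to_hyperplane(x, w=W, b=B):
    """Signed geometric distance of x from the hyperplane w . x + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return score / math.sqrt(sum(wi * wi for wi in w))


def knn_label(x, k=3):
    """Majority label among the k training points nearest to x (Euclidean)."""
    ranked = sorted(zip(TRAIN_X, TRAIN_Y), key=lambda pair: math.dist(pair[0], x))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def hybrid_predict(x, threshold=0.5, k=3):
    """Use the SVM sign far from the hyperplane; otherwise fall back to kNN.

    The kNN fallback is an assumption: the source text is truncated
    before it states how near-boundary samples are classified.
    """
    d = distance_to_hyperplane(x)
    if abs(d) >= threshold:
        return 1 if d > 0 else -1
    return knn_label(x, k)


if __name__ == "__main__":
    for query in [(5.0, 5.0), (0.2, 0.1), (1.9, 1.9)]:
        print(query, hybrid_predict(query))
```

In this toy setup the query (1.9, 1.9) lies slightly on the negative side of the hyperplane, so the decision function alone would label it a non-interaction site, while its three nearest training points are all positive. This is the kind of near-boundary case the combined algorithm is meant to handle differently from a plain support vector machine.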
Keywords/Search Tags: protein-protein interaction sites prediction, support vector machines, k-nearest neighbors algorithm, secondary structural information