
Identification Of Protein-protein Interaction Based On Relational Similarity Of The Text

Posted on: 2016-07-11 | Degree: Master | Type: Thesis
Country: China | Candidate: Y W Wang
GTID: 2180330479476583 | Subject: Computer Science and Technology
Abstract/Summary:
Information on protein-protein interactions (PPIs) is an important subject of biological research. Currently, PPIs discovered by biomedical experiments are stored mainly in the literature as unstructured text. Biologists identify the PPIs reported in these documents manually and enter them into relational databases, which can then be used to build knowledge networks. However, with the rapid growth of the biological scientific literature, collecting PPIs by hand is clearly time-consuming and cannot keep up with actual demand. Studying how to identify PPIs from the biological literature automatically is therefore of great significance for the development of biological research.

Current machine-learning-based PPI identification systems make decisions solely on evidence within a single sentence. Such an approach struggles to capture the characteristics of an interaction comprehensively and often suffers from small training sets. To address these problems, this paper proposes a relational similarity (RS) method that identifies protein-protein interactions automatically by searching large-scale text. A basic RS model is first established to make initial predictions. Several distance measures and weight representations are applied in the basic model, and their results are compared experimentally. The results show that the approach achieves high and well-balanced precision and recall when cosine is used as the similarity measure together with binary weights. Next, word similarity matrices that are sensitive to the PPI identification task are constructed using a corpus-based approach. Finally, a clustering algorithm is applied to group words according to their similarities, and a mixture model integrates the word similarity model with the basic RS model by introducing the resulting clusters and adjusting the weights.
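The basic RS model summarized above can be illustrated with a minimal sketch. It assumes that each candidate protein pair is represented by the set of context patterns (for example, words linking the two proteins) collected from retrieved sentences; the pattern extraction, the toy training pairs, `k=3`, and the helper names `binary_vector`, `cosine`, and `knn_predict` are hypothetical and not taken from the thesis. It shows binary weighting, cosine similarity, and a k-nearest-neighbor vote, the combination the abstract reports as giving well-balanced precision and recall.

```python
import math
from collections import Counter

# Hypothetical toy data: each training item is (set of context patterns, label).
# Real patterns would come from large-scale text retrieval, which is not modeled here.
TRAINING = [
    ({"binds", "to"}, "interaction"),
    ({"interacts", "with"}, "interaction"),
    ({"binds", "complex"}, "interaction"),
    ({"and"}, "non-interaction"),
    ({"unlike", "and"}, "non-interaction"),
]


def binary_vector(patterns):
    """Binary weighting: every observed pattern gets weight 1, regardless of frequency."""
    return {p: 1.0 for p in patterns}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(p, 0.0) for p, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_predict(query_patterns, training, k=3):
    """Label a protein pair by majority vote among its k most relationally similar training pairs."""
    q = binary_vector(query_patterns)
    scored = sorted(
        ((cosine(q, binary_vector(pats)), label) for pats, label in training),
        key=lambda item: item[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(knn_predict({"binds", "with"}, TRAINING))  # interaction
    print(knn_predict({"and", "with"}, TRAINING))    # non-interaction
```

Binary weights make the cosine score depend only on which patterns two pairs share, not on how often each pattern was seen, which is one plausible reason such a representation stays robust when counts from retrieved text are noisy.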
The experimental results show that introducing the word similarity model further improves the F-score by about 2.03%, 1.59%, and 2.47% on interactions and by about 2.96%, 1.73%, and 2.94% on non-interactions for our three solutions, respectively.
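The word-similarity extension can be sketched in the same hedged way. This sketch assumes average-linkage agglomerative (hierarchical) clustering with a stopping threshold of 0.5 over a made-up similarity table; the abstract does not state the linkage criterion, and the names `pair_sim`, `agglomerative_clusters`, and `cluster_features` are hypothetical. Mapping each clustered word to a cluster ID lets patterns such as "binds with" and "interacts with" share a feature; the mixture model's weight adjustment is not shown.

```python
# Hypothetical word list and similarity scores; the thesis builds its
# task-sensitive similarity matrices from a corpus, which is not reproduced here.
WORDS = ["binds", "interacts", "associates", "and", "or"]
SIM = {
    ("binds", "interacts"): 0.8,
    ("binds", "associates"): 0.7,
    ("interacts", "associates"): 0.75,
    ("and", "or"): 0.6,
    ("binds", "and"): 0.1,
}


def pair_sim(x, y, sim):
    """Symmetric lookup; unlisted pairs are treated as similarity 0."""
    return sim.get((x, y), sim.get((y, x), 0.0))


def agglomerative_clusters(words, sim, threshold):
    """Average-linkage agglomerative clustering: keep merging the two most
    similar clusters until no pair reaches the threshold."""
    clusters = [[w] for w in words]

    def avg_sim(a, b):
        return sum(pair_sim(x, y, sim) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = avg_sim(clusters[i], clusters[j])
                if s > best:
                    best, pair = s, (i, j)
        if best < threshold:
            break
        i, j = pair  # j > i, so deleting j leaves index i valid
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def cluster_features(patterns, clusters):
    """Replace each clustered word with its cluster ID so similar words share one feature."""
    cluster_of = {w: f"C{idx}" for idx, members in enumerate(clusters) for w in members}
    return {cluster_of.get(p, p) for p in patterns}


if __name__ == "__main__":
    groups = agglomerative_clusters(WORDS, SIM, threshold=0.5)
    print(groups)  # [['binds', 'interacts', 'associates'], ['and', 'or']]
    print(cluster_features({"binds", "with"}, groups) == cluster_features({"interacts", "with"}, groups))  # True
```

Once words are replaced by cluster IDs, the cosine comparison from the basic model can match pairs whose context words differ but mean roughly the same thing, which is one way to read the abstract's claim that the clusters are "introduced" into the RS model.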
Keywords/Search Tags: Protein-protein Interaction, Relational Similarity, Word Similarity, Vector Space Model, K-Nearest Neighbor Classification, Hierarchical Clustering