Identification of Protein-Protein Interaction Based on the Constraint of Semantic Similarity on Context

Posted on: 2017-06-23
Degree: Master
Type: Thesis
Country: China
Candidate: H M Wu
Full Text: PDF
GTID: 2310330503495770
Subject: Computer Science and Technology
Abstract/Summary:
Protein-protein interaction (PPI) is an important topic in biological research, and PPI information is significant for both biology and medicine. The results of PPI experiments are mainly recorded in the biomedical literature. To build a PPI network, biomedical experts manually collect information from the literature, identify PPIs, and store them in a unified format. However, as the volume of biomedical literature grows, manual collection can no longer meet practical needs, so automatic PPI identification has become an urgent problem.

The most commonly used machine learning methods for PPI identification work on single sentences; they rely on manual annotation and ignore the context information of a protein pair. To avoid these problems, we take a large-scale corpus as the research basis and identify PPIs from rich context information. We collect the context information of protein pairs from a database and carry out the research in the following three aspects.

First, we analyze the context features of protein pairs and weight the feature vector based on word similarity and part of speech (POS). Compared with the unweighted method, the F-score improves by 2.51% for interacting protein pairs and by 1.85% for non-interacting protein pairs.

Second, based on the similarity between the texts that describe the relationship between proteins, we construct a classifier for PPI identification and compare four methods of weight calculation.

Third, to effectively combine the context features with the similarity between texts, we use the Minimum Cuts algorithm to constrain the classification results by the similarity between contexts. In the experiments, we construct classifiers with different proportions of training data. With 80% of the data used for training, the Minimum Cuts results improve on those of the SVM by 3%-4%. The accuracy of the Minimum Cuts classifier that uses 20% of the data is comparable to that of the SVM classifier trained on 80% of the data.
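The abstract does not spell out how the graph for the Minimum Cuts step is built, so the following is only a minimal sketch of one common formulation, not the thesis's actual implementation. It assumes each protein pair has an individual score in [0, 1] (for example, a probability-like SVM output), that the context similarity between two protein pairs is a non-negative number, and that a hypothetical parameter `lam` balances the two. The function name `min_cut_labels` and all inputs are illustrative.

```python
from collections import deque


def min_cut_labels(scores, similarities, lam=1.0):
    """Label protein pairs as interacting (True) or not (False) via an s-t minimum cut.

    scores:       list of individual scores in [0, 1], one per protein pair
                  (assumed to come from a base classifier such as an SVM).
    similarities: dict {(i, j): sim} of context similarity between pairs i and j.
    lam:          hypothetical weight on the similarity constraint.
    """
    n = len(scores)
    s, t = n, n + 1  # source = "interacting", sink = "non-interacting"
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i, p in enumerate(scores):
        cap[s][i] = p        # cost paid if pair i is labeled non-interacting
        cap[i][t] = 1.0 - p  # cost paid if pair i is labeled interacting
    for (i, j), sim in similarities.items():
        cap[i][j] += lam * sim  # cost paid if similar pairs get different labels
        cap[j][i] += lam * sim

    def bfs_parents():
        # Breadth-first search over edges with remaining residual capacity.
        parent = [-1] * (n + 2)
        parent[s] = s
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n + 2):
                if parent[v] == -1 and cap[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        return parent

    # Edmonds-Karp: push flow along shortest augmenting paths until none remain.
    while True:
        parent = bfs_parents()
        if parent[t] == -1:
            break
        flow, v = float("inf"), t
        while v != s:  # find the bottleneck capacity on the path
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:  # update residual capacities along the path
            u = parent[v]
            cap[u][v] -= flow
            cap[v][u] += flow
            v = u

    # Nodes still reachable from the source lie on the "interacting" side of the cut.
    return [parent[i] != -1 for i in range(n)]
```

For example, with scores `[0.9, 0.8, 0.45]` and similarities `{(0, 2): 0.5, (1, 2): 0.5}`, the third pair would be labeled non-interacting on its score alone (`lam=0.0`), but the similarity constraint pulls it to the interacting side under the default `lam=1.0`. This illustrates how context similarity can correct borderline individual decisions.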
Keywords/Search Tags: PPI, Word Similarity, Relation Similarity, Minimum Cuts, SVM