Identification Of Protein-protein Interaction Based On The Constraint Of Semantic Similarity On Context

Posted on:2017-06-23

Degree:Master

Type:Thesis

Country:China

Candidate:H M Wu

Full Text:PDF

GTID:2310330503495770

Subject:Computer Science and Technology

Abstract/Summary:

Protein-protein interaction (PPI) is an important topic in biological research, and PPI information is significant for both biology and medicine. The results of PPI experiments are mainly recorded in the biomedical literature. To build a PPI network, biomedical experts manually collect information from the literature, identify PPIs, and store the information in a unified format. However, as the volume of biomedical literature grows, manual collection can no longer meet actual needs, so automatic PPI identification has become an urgent problem.

The most commonly used machine learning methods for PPI identification work on single sentences; they rely on manual annotation and ignore the context information of a protein pair. To avoid these problems, we take a large-scale corpus as the basis of our research and identify PPIs from the rich context information. We collect the context information of protein pairs from a database and carry out the research in the following three aspects.

First, we analyze the context features of protein pairs and weight the feature vector based on word similarity and part of speech (POS). Compared with the unweighted method, the F-score increases by 2.51% for interacting protein pairs and by 1.85% for non-interacting protein pairs.

Second, based on the similarity between the texts that describe the relationship between proteins, we construct a classifier for PPI identification, focusing on a comparison of four weight calculation methods.

Third, to effectively combine the context features with the similarity between texts, we use the Minimum Cuts algorithm to constrain the classification results by the similarity between contexts. In the experiments, we construct classifiers using different proportions of the training data. With 80% of the training data, the recognition results of Minimum Cuts are 3%-4% higher than those of the SVM. The accuracy of the Minimum Cuts classifier that uses 20% of the data is comparable to that of the SVM classifier trained on 80% of the data.
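
The abstract does not describe how the graph for the Minimum Cuts step is built, so the following is only a minimal sketch of one common formulation of minimum-cut classification, not the thesis's implementation. It assumes each protein pair has a per-item score in [0, 1] (for example, a calibrated SVM output) and that context similarity between two protein pairs is given as a non-negative weight. The function name `min_cut_labels`, the parameter `assoc_weight`, and the example numbers are all hypothetical.

```python
from collections import deque


def min_cut_labels(pos_scores, similarities, assoc_weight=1.0):
    """Label items with an s-t minimum cut (illustrative sketch).

    pos_scores[i]  : assumed probability that protein pair i interacts
    similarities   : dict {(i, j): sim}, assumed context similarity of pairs i, j
    assoc_weight   : hypothetical scaling factor for the similarity edges
    Returns a list of booleans, True meaning "interactive".
    """
    n = len(pos_scores)
    source, sink = n, n + 1
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i, p in enumerate(pos_scores):
        cap[source][i] = p        # paid if i ends up on the sink side
        cap[i][sink] = 1.0 - p    # paid if i ends up on the source side
    for (i, j), sim in similarities.items():
        cap[i][j] += assoc_weight * sim  # paid if i and j are separated
        cap[j][i] += assoc_weight * sim

    def bfs_parents():
        # Breadth-first search over edges with remaining capacity.
        parent = [-1] * (n + 2)
        parent[source] = source
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in range(n + 2):
                if parent[v] == -1 and cap[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        return parent

    # Edmonds-Karp maximum flow: augment along shortest paths until none remain.
    while True:
        parent = bfs_parents()
        if parent[sink] == -1:
            break
        bottleneck = float("inf")
        v = sink
        while v != source:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while v != source:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u

    # Nodes still reachable from the source form the "interactive" side of the cut.
    return [parent[i] != -1 for i in range(n)]


scores = [0.9, 0.8, 0.45]  # made-up scores; item 2 is a borderline case
print(min_cut_labels(scores, {}))                          # [True, True, False]
print(min_cut_labels(scores, {(0, 2): 0.6, (1, 2): 0.5}))  # [True, True, True]
```

In this toy example, the borderline third item is labelled non-interactive when each item is judged alone, but strong similarity to two confidently interactive items pulls it to the interactive side. This illustrates the general idea of constraining per-item decisions by similarity between contexts.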

Related items

1	Identification Of Protein-protein Interaction Based On Relational Similarity Of The Text
2	Research On Spatial Similarity Calculating Method Between GML Documents
3	Analysis Of Biological Sequences Similarity And Research On κ-Word Model
4	Study On Word Similarity Based On Quantum Theory
5	A Study On Spatial Similarity Theory And Calculation Model
6	Research On Geometric Mathematical Problems Similarity Measurement Method Based On Logical Relation Modeling
7	Regular Similarity Relation On F(S) In Propositional Logic And An New Triple-I Method Of Fuzzy Reasoning
8	Research On Theory, Methods And Applications Of Geometry Similarity Measurement For Spatial Data
9	Research On Smilarity Degrees Between Two Individual Building In Multi-scale Map Space
10	The Research On Disease-related MiRNAs Prediction Methods And Its Applications Based On Similarity Network