
Protein-protein Interaction Identification Based On Local And Global Context

Posted on: 2020-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: S C Cai
Full Text: PDF
GTID: 2370330590972685
Subject: Software engineering
Abstract/Summary:
Protein-protein interaction (PPI) is of vital importance in biomedical research and has significant application value for discovering and exploring the laws of life. With the rapid development of the Internet, the biomedical literature is growing exponentially, and manual reading can no longer meet the practical need to obtain useful PPI information from massive amounts of unstructured text. An algorithm that accurately and automatically extracts PPI information from large volumes of literature is therefore needed.

At present, PPI identification mainly follows two approaches, one based on local context and one based on global context. The local-context approach relies on a single sentence and therefore struggles to capture comprehensive information about the target protein pair. The global-context approach obtains more comprehensive information about the target pair, but it treats all sentences equally. For interacting protein pairs that have some sentences which do not express PPI information, it extracts invalid features, which reduces identification accuracy.

To address the shortcomings of both approaches, this thesis first establishes a novel basic model built on a two-layer logistic regression classifier framework. The basic model represents target protein pairs within a multi-instance multi-label learning framework and combines the local and global context information of each pair. Tests of the trained classifier on the experimental data set show that the proposed model achieves good recognition performance.

The thesis then designs an improved PPI recognition model based on multi-level clues, which considers information at both the single-sentence level and the protein-pair level. For protein pairs containing core words, the improved model uses clues such as sentence importance, sentence similarity, and keyword sets to extract additional features from protein pairs and thereby improve the tuple-level features. Compared with the basic model, the F1 measures for interacting and non-interacting protein pairs increase by 2.8% and 1.9%, respectively. For protein pairs without core words, we observe that significantly fewer words separate interacting proteins than non-interacting proteins. We therefore model each protein pair as a graph, constructing edges from the similarities between words in different sentences and between words and keywords. Relevant attributes of the graph are used as clues to enrich the sentence-level classifier features, and the experimental results show that the F1 measures for interacting and non-interacting pairs increase by 2.9% and 2.5%, respectively, compared with the basic model. Overall, the improved model based on multi-level clues achieves better PPI recognition performance and more stable extraction results.
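The abstract does not give the details of the two-layer logistic regression framework, so the following is only a minimal sketch of the general idea, under stated assumptions: a first layer scores each sentence that mentions a protein pair (local context), the scores are aggregated into pair-level features (global context), and a second layer classifies the pair. The feature vectors, the weights `w1`, `b1`, `w2`, `b2`, and the aggregation by max, mean, and count are all hypothetical choices for illustration, and the multi-label aspect is simplified to a single binary label.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic function used by both layers."""
    return 1.0 / (1.0 + math.exp(-z))


def sentence_scores(sentences: Sequence[Sequence[float]],
                    w: Sequence[float], b: float) -> List[float]:
    """Layer 1 (local context): score each sentence mentioning the pair."""
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            for x in sentences]


def pair_features(scores: Sequence[float]) -> List[float]:
    """Aggregate sentence scores into pair-level (global context) features.

    Max, mean, and sentence count are assumed aggregates, not the thesis's.
    """
    return [max(scores), sum(scores) / len(scores), float(len(scores))]


def pair_probability(sentences: Sequence[Sequence[float]],
                     w1: Sequence[float], b1: float,
                     w2: Sequence[float], b2: float) -> float:
    """Layer 2: classify the protein pair from the aggregated features."""
    feats = pair_features(sentence_scores(sentences, w1, b1))
    return sigmoid(sum(wi * fi for wi, fi in zip(w2, feats)) + b2)


# Hypothetical, hand-picked weights (a real model would learn these).
w1, b1 = [2.0, -1.0], -0.5
w2, b2 = [3.0, 1.0, 0.1], -2.0

# Each protein pair is a "bag" of sentence feature vectors (toy values).
bag_a = [[1.0, 0.0], [0.2, 0.8]]   # one sentence looks strongly positive
bag_b = [[0.0, 1.0]]               # the only sentence looks negative

p_a = pair_probability(bag_a, w1, b1, w2, b2)
p_b = pair_probability(bag_b, w1, b1, w2, b2)
print(f"pair A: {p_a:.3f}, pair B: {p_b:.3f}")
```

Using the maximum sentence score as a pair-level feature means that one informative sentence can outweigh several uninformative ones, which is one simple way to avoid treating all sentences equally.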
Keywords/Search Tags:Protein-Protein Interaction, Large-scale text, Multi-instance multi-label, Core words, Graph model
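The graph model mentioned in the abstract can be illustrated with a small sketch. The abstract states only that edges are built from similarities between words in different sentences and between words and keywords. The similarity measure used here (Jaccard overlap of character bigrams), the threshold of 0.5, and the graph attributes computed (node count, edge count, density) are assumptions made for illustration, and the protein names and sentences are toy placeholders.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Node = Tuple[object, str]  # (sentence index or "kw", word)


def bigrams(word: str) -> Set[str]:
    """Character bigrams of a lower-cased word (the word itself if too short)."""
    w = word.lower()
    return {w[i:i + 2] for i in range(len(w) - 1)} or {w}


def similarity(a: str, b: str) -> float:
    """Stand-in word similarity: Jaccard overlap of character bigrams."""
    x, y = bigrams(a), bigrams(b)
    return len(x & y) / len(x | y)


def build_graph(sentences: List[List[str]], keywords: List[str],
                threshold: float = 0.5):
    """Connect similar words across sentences, and words to keywords."""
    nodes: List[Node] = [(i, w) for i, s in enumerate(sentences) for w in s]
    nodes += [("kw", k) for k in keywords]
    nodes = list(dict.fromkeys(nodes))  # drop duplicates, keep order
    edges = set()
    for (sa, wa), (sb, wb) in combinations(nodes, 2):
        if sa == sb:
            continue  # skip same-sentence and keyword-keyword pairs
        if similarity(wa, wb) >= threshold:
            edges.add(((sa, wa), (sb, wb)))
    return nodes, edges


def graph_attributes(nodes, edges) -> Dict[str, float]:
    """Simple graph statistics that could feed a sentence-level classifier."""
    n = len(nodes)
    possible = n * (n - 1) / 2
    return {"nodes": n, "edges": len(edges),
            "density": len(edges) / possible if possible else 0.0}


# Toy input: two tokenized sentences about a hypothetical pair ProtA/ProtB.
sentences = [["ProtA", "binds", "ProtB"],
             ["ProtA", "interacts", "with", "ProtB"]]
keywords = ["bind", "interact"]

nodes, edges = build_graph(sentences, keywords)
attrs = graph_attributes(nodes, edges)
print(attrs)
```

A learned or embedding-based word similarity could replace `similarity` without changing the rest of the sketch.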