Protein-Protein Interaction Extraction Based On Ensemble Kernel

Posted on: 2014-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
GTID: 2248330398450782
Subject: Computer technology
Abstract/Summary:
Protein-protein interaction (PPI) extraction is an important task in biomedical information extraction, with high practical value. Whether a sentence describes a PPI depends on many kinds of linguistic information in that sentence, and the focus of this research is how to exploit as much of that information as possible to improve the accuracy of PPI extraction. This thesis extracts PPIs with a support vector machine (SVM) whose ensemble kernel combines several kinds of linguistic information.

The convolution tree kernel computes the similarity between two input trees by counting the number of sub-trees they have in common. Because a complete syntactic parse tree contains too much noise, it should be pruned to improve PPI extraction performance. We first compare the effect of different pruning strategies on the experimental results, using the complete tree, the minimum complete tree, the minimum tree and the shortest path enclosed tree, and find that the shortest path enclosed tree performs best. Building on the shortest path enclosed tree, we propose a dynamic extended tree, which achieves better results than the other parse tree representations. Finally, we use the ensemble kernel to extract PPIs on the AIMed corpus with 10-fold cross-validation; precision, recall and F-score reach 82.40%, 51.30% and 63.23% respectively.

We then apply a semantic kernel to the PPI extraction task. The semantic kernel consists of two parts: 1) protein pair similarity; 2) contextual semantic similarity. Protein pair similarity is based on how close the two concepts are in the taxonomy and how much information they share in MeSH. Contextual semantic similarity measures the similarity of two sentences using WordNet. The experiments show that the semantic kernel performs well in the PPI extraction task.

The final ensemble kernel combines the polynomial kernel, the convolution tree kernel and the semantic kernel. It therefore draws on rich lexical information, precise syntactic information and comprehensive semantic information. Using the final ensemble kernel, the F-score on the AIMed corpus reaches 69.46%. Comparison of the experimental results shows that our approach outperforms other state-of-the-art PPI extraction systems.
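The idea of a convolution tree kernel that counts common sub-trees can be illustrated with a minimal sketch. The abstract does not say which formulation the thesis uses, so the code below assumes the widely used recursive scheme in which two nodes contribute only when their grammar productions match. The nested-tuple tree encoding, the `decay` parameter and all function names are hypothetical choices made for this illustration; pruning strategies such as the shortest path enclosed tree are not modeled.

```python
from functools import lru_cache
from math import sqrt

# Assumed encoding: a tree is a nested tuple (label, child, child, ...);
# a child is either another tuple (a subtree) or a str (a word).

def production(node):
    """Grammar production at a node; subtree children are wrapped in a
    1-tuple so that they never compare equal to a plain word."""
    label, *children = node
    return (label,) + tuple(c if isinstance(c, str) else (c[0],) for c in children)

def is_preterminal(node):
    """True when every child of the node is a word."""
    return all(isinstance(c, str) for c in node[1:])

def internal_nodes(tree):
    """Yield every non-leaf node of the tree, root first."""
    if isinstance(tree, str):
        return
    yield tree
    for child in tree[1:]:
        yield from internal_nodes(child)

def tree_kernel(t1, t2, decay=1.0):
    """Sum, over all node pairs, the (decayed) number of common
    sub-tree fragments rooted at that pair."""
    @lru_cache(maxsize=None)
    def delta(n1, n2):
        if production(n1) != production(n2):
            return 0.0
        if is_preterminal(n1):
            return decay
        score = decay
        for c1, c2 in zip(n1[1:], n2[1:]):
            # Matching productions guarantee c2 is a subtree whenever c1 is.
            if not isinstance(c1, str):
                score *= 1.0 + delta(c1, c2)
        return score

    return sum(delta(a, b) for a in internal_nodes(t1) for b in internal_nodes(t2))

def normalized_tree_kernel(t1, t2, decay=1.0):
    """Scale the kernel to [0, 1] so that tree size does not dominate."""
    denom = sqrt(tree_kernel(t1, t1, decay) * tree_kernel(t2, t2, decay))
    return tree_kernel(t1, t2, decay) / denom if denom else 0.0

# Toy trees, invented for illustration only.
the_protein = ("NP", ("D", "the"), ("N", "protein"))
a_protein = ("NP", ("D", "a"), ("N", "protein"))

print(tree_kernel(the_protein, the_protein))  # fragments shared with itself
print(tree_kernel(the_protein, a_protein))    # fragments shared across trees
print(normalized_tree_kernel(the_protein, a_protein))
```

With `decay=1.0` the kernel value is exactly the number of shared fragments; a value below 1 would down-weight large fragments. A kernel of this kind could be supplied to an SVM as a precomputed Gram matrix, optionally summed with other kernels, although the abstract does not describe how the thesis combines them.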
Keywords/Search Tags: PPI, SVM, Convolution Tree Kernel, Semantic Kernel, Ensemble Kernel