Template And Expand Features For Extracting Protein-Protein Interaction

Posted on: 2012-08-04
Degree: Master
Type: Thesis
Country: China
Candidate: H Liu
GTID: 2120330335454429
Subject: Computer system architecture
Abstract/Summary:
Previous protein relation extraction systems mainly extract typical features that indicate the existence of a relation, while also introducing classic, previously categorized features to build their own protein-protein interaction (PPI) systems. Although this improves classification results, such a system spends a great deal of time on feature extraction, analysis, and integration, and in practical applications the gain in classification is not obvious. It is therefore necessary to consider how to build the feature set on top of the original system without sacrificing accuracy, so as to improve the system's extraction efficiency.

In this paper, a feature optimization technique is applied to protein-protein interaction extraction. The main idea is motivated by the following disadvantages of current methods:
(1) Redundant features are extracted by the various methods, and there is no specific guidance on how to combine the most appropriate methods; sometimes fusing two strong methods unexpectedly reduces overall extraction accuracy.
(2) Effective combinations of features greatly reduce system efficiency, and the details of adjusting and analyzing system efficiency and accuracy are not described.
(3) The test set may contain features that are absent from the training set, so the training-set features are not comprehensive. Generalized features should be extracted from a wider-ranging corpus.

In response to these problems, the paper attempts to build a protein relation extraction system by sifting and optimizing the mainstream features. Drawing on previous studies, it introduces a new template feature method that imitates the principle of manual annotation to extract word sequence templates, and improves classification results through multiple kernel fusion. Building on the original feature extraction, it also designs a feature expansion method for extracting concise, optimized feature vectors.
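As a rough illustration of what a word sequence template feature could look like, the Python sketch below masks a candidate protein pair and checks the sentence against a few templates. The abstract does not specify the template format, so the `PROT1`/`PROT2` placeholders, the `*` wildcard, the example templates, and all function names are assumptions made only for this example.

```python
from typing import List

WILDCARD = "*"  # hypothetical marker: matches any run of zero or more tokens


def mask_proteins(tokens: List[str], prot_a: str, prot_b: str) -> List[str]:
    """Replace the candidate protein pair with placeholders, lowercase the rest."""
    masked = []
    for tok in tokens:
        if tok == prot_a:
            masked.append("PROT1")
        elif tok == prot_b:
            masked.append("PROT2")
        else:
            masked.append(tok.lower())
    return masked


def matches(template: List[str], tokens: List[str]) -> bool:
    """Return True if the whole token sequence fits the template."""
    if not template:
        return not tokens
    head, rest = template[0], template[1:]
    if head == WILDCARD:
        # Let the wildcard absorb 0..len(tokens) tokens.
        return any(matches(rest, tokens[i:]) for i in range(len(tokens) + 1))
    return bool(tokens) and tokens[0] == head and matches(rest, tokens[1:])


def template_features(tokens: List[str], templates: List[List[str]]) -> List[int]:
    """One binary feature per template: 1 if the sentence matches, else 0."""
    return [1 if matches(t, tokens) else 0 for t in templates]


# Hypothetical templates, written by hand for this example only.
TEMPLATES = [
    ["*", "PROT1", "*", "interacts", "with", "*", "PROT2", "*"],
    ["*", "PROT1", "*", "binds", "*", "PROT2", "*"],
]

sentence = "We show that RAD51 directly interacts with BRCA2 in vivo".split()
tokens = mask_proteins(sentence, "RAD51", "BRCA2")
features = template_features(tokens, TEMPLATES)
```

The resulting binary vector could then be appended to other features or used to build a separate kernel; how the thesis actually encodes matched templates is not stated in the abstract.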
The feature expansion method, designed on top of the original feature extraction to produce concise, optimized feature vectors, automatically finds the feature group that best conforms to the standard of protein relation extraction. By standardizing this feature group while maintaining the original experimental precision, it greatly improves the efficiency of relation extraction. Using the classification results of matched templates, the method addresses the weakness in handling complex long sentences. Meanwhile, through semantic analysis and standardized keywords, it reduces redundant noise in sentences and improves the effectiveness of the experiment. With the graph kernel, template features, and expanded features working together, the F-score reaches 63.1% on the AIMed corpus.
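One common way to let several kernels work together is a weighted sum of normalized kernel matrices, since a non-negative weighted sum of valid kernels is itself a valid kernel. The sketch below shows that generic scheme in plain Python. It is not taken from the thesis: the abstract does not say how the kernels are fused, and the weights, the toy matrices, and the function names here are assumptions.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def normalize_kernel(k: Matrix) -> Matrix:
    """Cosine-normalize a kernel: K'[i][j] = K[i][j] / sqrt(K[i][i] * K[j][j])."""
    n = len(k)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = math.sqrt(k[i][i] * k[j][j])
            out[i][j] = k[i][j] / denom if denom > 0 else 0.0
    return out


def fuse_kernels(kernels: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Weighted sum of normalized kernel matrices (weights assumed non-negative)."""
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    n = len(kernels[0])
    fused = [[0.0] * n for _ in range(n)]
    for k, w in zip(kernels, weights):
        nk = normalize_kernel(k)
        for i in range(n):
            for j in range(n):
                fused[i][j] += w * nk[i][j]
    return fused


# Toy 2x2 kernel matrices over two candidate protein pairs (made-up values).
graph_kernel = [[4.0, 2.0], [2.0, 4.0]]
template_kernel = [[1.0, 0.0], [0.0, 1.0]]

fused = fuse_kernels([graph_kernel, template_kernel], weights=[0.5, 0.5])
```

A fused matrix of this kind can be passed to any classifier that accepts a precomputed kernel; in practice the weights would be tuned on held-out data.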
Keywords/Search Tags: protein-protein interaction extraction, template, graph kernel, syntactic parser, additional features