The Protein-protein Interaction Extraction Based On Full Texts

Posted on: 2015-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: P P Zhang
Full Text: PDF
GTID: 2180330467485417
Subject: Computer technology
Abstract/Summary:
The main purpose of text mining is to automatically extract useful information from the literature. Biomedical text mining can help domain experts find significant information and curate databases at lower cost. The volume of biological literature on protein-protein interactions (PPIs) is increasing rapidly. However, existing PPI extraction studies have concentrated on abstracts and neglected the PPIs contained in other parts of a full article, such as figures and tables. In addition, standard data sets for evaluating PPI extraction from full texts are relatively scarce. This thesis uses the training set provided by BioCreative II.5 and the articles published in FEBS Letters. Building on methods for PPI extraction from abstracts, attributes unique to full texts are introduced, and feature selection is then used to refine the original feature set.

Firstly, a new method is used to extract PPIs from full texts. It is based on the methods used for PPI extraction from abstracts and combines basic word features with syntactic pattern features. The location information (Part) and the frequency (Coo) are added to the word features. "Part" describes the position at which the protein pair appears in the article, such as TITLE, ABSTRACT, FIGURE and TABLE. "Coo" is the number of times the protein pair appears in the full text. In addition, syntactic patterns are used as a feature for a support vector machine. Integrating the two kinds of features yields an F-score of 72.57% and an AUC of 77.90%.

Secondly, different features contribute differently. When features are combined in different ways, their positive and negative effects on performance may not be in proportion. Feature selection is therefore used to improve performance or to reduce the dimensionality of the feature set.

Finally, the selected features are combined with a tree kernel. Experimental results show that the presented approach achieves an F-score of 74.46% and an AUC of 78.50%. In addition, the dynamic extended tree (DET) is extended with a secondary expansion.
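The "Part" and "Coo" features described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `pair_features`, the input layout (a list of labelled sections, each holding sentences), the simple token matching, and the choice to count co-occurrence at sentence level are all assumptions made for illustration.

```python
import re
from collections import Counter

def pair_features(sections, protein_a, protein_b):
    """Build illustrative Part/Coo features for one protein pair.

    sections: list of (label, sentences) tuples, where label is a section
    name such as "TITLE", "ABSTRACT", "FIGURE" or "TABLE" (hypothetical
    input layout, not taken from the thesis).
    """
    a, b = protein_a.lower(), protein_b.lower()
    hits = Counter()
    for label, sentences in sections:
        for sentence in sentences:
            # Crude tokenizer: letters, digits and hyphens only (assumption).
            tokens = set(re.findall(r"[a-z0-9-]+", sentence.lower()))
            if a in tokens and b in tokens:
                hits[label] += 1
    # "Part": one binary indicator per section in which the pair co-occurs.
    features = {f"Part={label}": 1 for label in hits}
    # "Coo": total number of sentences in which the pair co-occurs
    # (sentence-level counting is an assumption).
    features["Coo"] = sum(hits.values())
    return features

example_sections = [
    ("TITLE", ["RAD51 interacts with BRCA2"]),
    ("ABSTRACT", ["We show that BRCA2 binds RAD51.", "RAD51 forms filaments."]),
    ("FIGURE", ["Co-immunoprecipitation of BRCA2 and RAD51."]),
]
print(pair_features(example_sections, "RAD51", "BRCA2"))
```

A dictionary of this shape could then be vectorized and passed to a support vector machine alongside the word and syntactic pattern features; how the thesis encodes these features in practice is not stated in the abstract.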
Keywords/Search Tags: Full-text Relation Extraction, Syntactic Patterns, Feature Selection, Tree Kernel