The Protein-protein Interaction Extraction Based On Full Texts

Posted on:2015-02-02

Degree:Master

Type:Thesis

Country:China

Candidate:P P Zhang

Full Text:PDF

GTID:2180330467485417

Subject:Computer technology

Abstract/Summary:

The main purpose of text mining is to automatically extract useful information from the literature. Biomedical text mining helps domain experts find significant information and curate databases at lower cost. The volume of biological literature on protein-protein interactions (PPIs) is increasing rapidly. However, existing PPI extraction studies have concentrated on abstracts and neglected the PPIs contained in other parts of a full article, such as figures and tables. In addition, standard data sets for evaluating PPI extraction from full texts are relatively scarce. This thesis uses the training set provided by BioCreative II.5 and articles published in FEBS Letters. Building on methods for PPI extraction from abstracts, it introduces attributes unique to full texts, and then applies feature selection to refine the original feature set.

Firstly, a new method is used to extract PPIs from full texts. It is based on the methods used for PPI extraction from abstracts and combines basic word features with syntactic pattern features. Two attributes are added to the word features: location information (Part) and frequency (Coo). "Part" describes where in the article the protein pair appears, such as TITLE, ABSTRACT, FIGURE and TABLE. "Coo" is the number of times the protein pair appears in the full text. In addition, syntactic patterns are treated as a feature for a support vector machine. Integrating the two kinds of features yields an F-score of 72.57% and an AUC of 77.90%.

Secondly, different features contribute differently, and when they are combined in different ways their positive and negative effects vary in proportion. Feature selection is therefore used to improve performance or to reduce the dimensionality of the feature set.

Finally, the selected features are combined with a tree kernel. Experimental results show that the presented approach achieves an F-score of 74.46% and an AUC of 78.50%. In addition, the dynamic extended tree (DET) is extended with a secondary expansion.
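
The "Part" and "Coo" attributes described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function name `pair_features`, the input layout, whitespace tokenization, and the choice to count co-occurrence per sentence are all assumptions made for illustration, and the protein names in the example are placeholders.

```python
from collections import Counter
from itertools import combinations


def pair_features(sections, proteins):
    """Compute hypothetical Part/Coo attributes for protein pairs.

    sections: list of (part_label, list_of_sentences), where part_label
              is a label such as "TITLE", "ABSTRACT", "FIGURE" or "TABLE".
    proteins: set of protein names to look for (exact token match).
    Returns {(protein_a, protein_b): {"Coo": int, "Part": set_of_labels}}.
    """
    coo = Counter()   # assumption: one count per sentence containing both proteins
    parts = {}        # labels of the article parts in which the pair was seen
    for label, sentences in sections:
        for sentence in sentences:
            tokens = set(sentence.split())  # naive whitespace tokenization
            present = sorted(p for p in proteins if p in tokens)
            for pair in combinations(present, 2):
                coo[pair] += 1
                parts.setdefault(pair, set()).add(label)
    return {pair: {"Coo": coo[pair], "Part": parts[pair]} for pair in coo}


# Toy full-text article split into labelled parts (illustrative data only).
sections = [
    ("TITLE", ["RAD51 binds BRCA2 in vivo"]),
    ("ABSTRACT", ["We show that BRCA2 recruits RAD51 .", "TP53 is unaffected ."]),
    ("FIGURE", ["Co-immunoprecipitation of RAD51 and BRCA2"]),
]
features = pair_features(sections, {"RAD51", "BRCA2", "TP53"})
print(features)
```

In a pipeline like the one the abstract outlines, such attributes would be encoded alongside the word features before being passed to the classifier. How the thesis encodes them is not stated here.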

Keywords/Search Tags:

Full-text Relation Extraction, Syntactic Patterns, Feature Selection, Tree Kernel

Related items

1	Research On Chemical-Protein Relation Extraction For Biomedical Literature
2	Research And Application On Relation Extraction Of Biomedical Text
3	Research And Implementation Of Entity Relation Extraction Method In Oil And Gas Exploration Field
4	Research On Literature Based Entity Recognition And Relationship Extraction Of Drug Phenotype
5	Research On The Key Technologies Of Biomedical Text Representation And Mining
6	Ontology-based Protein-protein Interaction Information Text Mining Method
7	Deep Learning-Based Methods For Biomedical Text Filtering And Information Extraction
8	Meteorological Text Categorization Feature Selection Method And Its Implementation On MapReduce
9	Research On Entity Relation Extraction Technology Of Geographic Text Big Data Based On Distant Supervision
10	The Extraction Of Geospatial Information for Traffic Radio Text