Research On Flat Features And Structural Information For Protein-Protein Interaction Extraction

Posted on: 2012-03-22
Degree: Master
Type: Thesis
Country: China
Candidate: B Liu
Full Text: PDF
GTID: 2210330368992441
Subject: Computer application technology
Abstract/Summary:
Protein-protein interactions (PPIs) play an important role in understanding biological processes. Automatically extracting PPIs from biomedical text can greatly increase extraction efficiency, but it also poses new challenges for biological text mining. This paper focuses on exploring more effective flat features and a more suitable representation of structural information, and on applying statistical machine learning methods to PPI extraction. The contributions are as follows:

1. Investigating the contributions of various flat features to PPI extraction. The effect of words, chunks, constituent and dependency parse trees, and semantic information on PPI extraction is explored. Furthermore, combining these features effectively improves overall performance.

2. Investigating the effectiveness of structural information for PPI extraction. To address the representation of structural information, this paper proposes a dependency-directed constituent parse tree pruning strategy. It aims to generate a syntactic tree representation that covers the critical structural information and eliminates noise effectively, thus capturing the structural characteristics between protein entities.

Experimental results on several popular PPI benchmark corpora show that lexical and dependency information contribute most to PPI extraction, and that effective combinations of various kinds of flat features further boost performance. In addition, the dependency-directed parse tree significantly improves PPI extraction performance and achieves the best performance among state-of-the-art PPI methods based on constituent syntactic trees.
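The abstract does not spell out the pruning rule, so the following is only a minimal sketch of one plausible reading of "dependency-directed" pruning: keep the constituent-tree leaves that lie on the shortest dependency path between the two protein mentions, together with their ancestors, and drop everything else. The tree encoding, the function names `shortest_dependency_path` and `prune`, and the example sentence are all illustrative assumptions, not the thesis's actual algorithm or data.

```python
from collections import deque

def shortest_dependency_path(edges, start, end):
    """Breadth-first search over the dependency graph, treated as undirected.

    `edges` is a list of (head_index, dependent_index) pairs (assumed format).
    Returns the token indices on the path from `start` to `end`, or [].
    """
    graph = {}
    for head, dep in edges:
        graph.setdefault(head, set()).add(dep)
        graph.setdefault(dep, set()).add(head)
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return []

def prune(tree, keep):
    """Keep only leaves whose token index is in `keep`, plus their ancestors.

    A leaf is (pos_tag, token_index); an internal node is (label, [children]).
    Returns the pruned tree, or None if nothing under this node survives.
    """
    label, body = tree
    if isinstance(body, int):
        return tree if body in keep else None
    children = [p for p in (prune(child, keep) for child in body) if p is not None]
    return (label, children) if children else None

# Hypothetical sentence: "PROT1 strongly binds to PROT2 in yeast"
# Token indices:            0      1       2    3    4    5    6
edges = [(2, 0), (2, 1), (2, 3), (3, 4), (2, 5), (5, 6)]  # assumed parse
tree = ("S", [
    ("NP", [("NN", 0)]),
    ("VP", [
        ("ADVP", [("RB", 1)]),
        ("VBZ", 2),
        ("PP", [("TO", 3), ("NP", [("NN", 4)])]),
        ("PP", [("IN", 5), ("NP", [("NN", 6)])]),
    ]),
])

path = shortest_dependency_path(edges, 0, 4)
pruned = prune(tree, set(path))
```

In this toy case the adverb "strongly" and the trailing phrase "in yeast" are removed, while the verb and the preposition linking the two proteins are retained. A real system would likely keep additional context (for example, modifiers of path tokens), which this sketch does not attempt.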
Keywords/Search Tags: Protein-Protein Interaction Extraction, Flat Features, Structural Information, Statistical Machine Learning