The Research Of Protein-Protein Extraction In Biomedical Literature

Posted on: 2008-04-30
Degree: Master
Type: Thesis
Country: China
Candidate: Y Hu
Full Text: PDF
GTID: 2178360245997766
Subject: Computer Science and Technology
Abstract/Summary:
With the explosive growth of the biomedical literature, extracting information from it has become an active research topic. Because protein-protein interactions (PPIs) are of particular importance to the life sciences, PPI extraction has become a main direction of text mining in biomedicine.
Information extraction (IE) consists of relation extraction and event extraction. This thesis addresses PPI extraction using information extraction patterns, and applies to it a bootstrapping method that is normally used in event extraction.
Bootstrapping pattern extraction needs only a small number of seed PPI relations, from which it automatically extracts patterns and, in turn, more PPIs. It is therefore well suited to PPI extraction, where annotated corpora are scarce and the task is complex.
However, bootstrapping-based relation extraction also runs into difficulties when applied to PPI extraction. On the one hand, it requires typical seed relations; if the seeds are not representative, the generated pattern set may be seriously wrong. On the other hand, the method is usually applied to event extraction and adapts poorly to free text.
To solve these problems, we define a new representation of IE patterns that is better suited to processing free text. We also improve the algorithm by dynamically evaluating and selecting IE patterns and PPIs: each time new PPIs or IE patterns are added, all IE patterns or PPIs are re-evaluated, and those with low confidence are rejected. This keeps errors from accumulating.
Finally, a system for extracting protein-protein interactions from plain text has been implemented. The system can extract PPIs from the Medline abstract database in PubMed.
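
The bootstrapping loop with dynamic confidence re-estimation can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the `<P>...</P>` mention tagging, the pattern form (the text between two protein mentions), both confidence formulas, the thresholds, the rule that seeds are always kept, and the toy corpus are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical toy corpus; protein mentions are assumed to be pre-tagged.
CORPUS = [
    "<P>RAD51</P> interacts with <P>BRCA2</P> in vivo.",
    "<P>MDM2</P> interacts with <P>TP53</P> directly.",
    "<P>MDM2</P> binds to <P>TP53</P> and inhibits it.",
    "<P>CDK2</P> binds to <P>CCNE1</P> in G1 phase.",
    "<P>RAD51</P> and <P>TP53</P> were both measured.",
    "<P>ACTB</P> and <P>GAPDH</P> were used as controls.",
]

MENTION_PAIR = re.compile(r"<P>(.+?)</P>(.+?)<P>(.+?)</P>")


def occurrences(corpus):
    """Yield (pattern, pair); the pattern is the text between the first two mentions."""
    for sentence in corpus:
        m = MENTION_PAIR.search(sentence)
        if m:
            yield m.group(2).strip(), (m.group(1), m.group(3))


def bootstrap(corpus, seeds, min_pattern_conf=0.5, min_pair_conf=0.5, max_iter=10):
    """Grow patterns and pairs from seeds, re-scoring all candidates each round."""
    pairs_by_pattern = defaultdict(set)
    for pattern, pair in occurrences(corpus):
        pairs_by_pattern[pattern].add(pair)

    accepted_pairs = set(seeds)
    accepted_patterns = {}
    for _ in range(max_iter):
        # Re-estimate every pattern: share of its extractions already accepted.
        pattern_conf = {}
        for pattern, pairs in pairs_by_pattern.items():
            conf = len(pairs & accepted_pairs) / len(pairs)
            if conf >= min_pattern_conf:  # low-confidence patterns are rejected
                pattern_conf[pattern] = conf

        # Re-estimate every pair: 1 - product of (1 - confidence) over the
        # surviving patterns that extract it.
        miss = defaultdict(lambda: 1.0)
        for pattern, conf in pattern_conf.items():
            for pair in pairs_by_pattern[pattern]:
                miss[pair] *= 1.0 - conf
        # Assumption: seed pairs are never rejected.
        new_pairs = set(seeds) | {
            pair for pair, m in miss.items() if 1.0 - m >= min_pair_conf
        }

        if new_pairs == accepted_pairs and pattern_conf == accepted_patterns:
            break  # nothing changed, so the loop has converged
        accepted_pairs, accepted_patterns = new_pairs, pattern_conf
    return accepted_patterns, accepted_pairs


patterns, pairs = bootstrap(CORPUS, {("RAD51", "BRCA2")})
print(sorted(patterns))
print(sorted(pairs))
```

Starting from the single seed pair, the loop on this toy corpus accepts the patterns "interacts with" and "binds to" and the three pairs they extract. The co-mention pattern "and" never reaches the threshold, so its pairs are not added. Re-scoring every pattern and pair in each round, rather than only the newly added ones, is what allows an early mistake to be rejected later instead of being carried forward.
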
Keywords/Search Tags: protein-protein interaction, bootstrapping, IE patterns