
Text Mining Based Extraction Method Of Protein Interaction

Posted on: 2016-01-19    Degree: Master    Type: Thesis
Country: China    Candidate: M H Dong
GTID: 2308330479991064    Subject: Computer technology
Abstract/Summary:
In bioinformatics, protein interaction is one of the most important research topics: it plays a major role in understanding biological processes and is of great significance for the diagnosis and treatment of disease. In biomedical text mining, extracting protein-protein interactions (PPIs) from MEDLINE abstracts or full-text research articles is an urgent and challenging problem. Because a large body of literature related to protein interaction is stored in NCBI's PubMed database, an automatic method for extracting protein interactions is important.

First, based on the principle that relation keywords co-occur with interacting protein pairs, we propose an automatic method for extracting PPIs from biomedical literature that consists of three phases. In the first phase, we combine a conditional random field (CRF) model with several rules to recognize protein entities. In the second phase, we build a protein-gene standard dataset and define nine rules, together with supporting algorithms, to replace each recognized entity with its corresponding official gene name. In the third phase, we use the Stanford Parser to parse each normalized sentence into a syntax tree, then combine relation keywords with Tregex and a set of PPI recognition rules to extract candidate PPI pairs; true PPIs are finally selected from the candidates using negation words and the syntactic properties of English. With these steps we achieve automatic PPI extraction. We then analyzed the latest download of 20 million PubMed abstracts from NCBI and built a protein interaction network.

Second, because our project focuses mainly on automatic PPI extraction, we verified its accuracy using the test data from the international BioCreative II evaluation. Our method reaches an accuracy of 87.18% in protein entity recognition, 65.84% in normalization, and 68.78% in automatic PPI extraction.
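The extraction pipeline described above can be illustrated with a much-simplified sketch. It is not the thesis's implementation: a small hypothetical synonym dictionary (`GENE_SYNONYMS`) stands in for both the CRF-based recognizer and the protein-gene standard dataset, a plain token window stands in for the Stanford Parser syntax tree and Tregex patterns, and the keyword and negation lists (`RELATION_KEYWORDS`, `NEGATION_WORDS`) are illustrative assumptions. It shows only the core idea: normalize protein mentions to official gene symbols, keep pairs whose span contains a relation keyword, and discard pairs whose span contains a negation word.

```python
import re
from itertools import combinations
from typing import Optional

# Hypothetical stand-in for the protein-gene standard dataset:
# lower-cased mention -> official gene symbol.
GENE_SYNONYMS = {
    "p53": "TP53",
    "tp53": "TP53",
    "mdm2": "MDM2",
    "hdm2": "MDM2",
    "brca1": "BRCA1",
    "bard1": "BARD1",
}

# Illustrative relation keywords and negation cues (assumed, not the thesis's lists).
RELATION_KEYWORDS = {"bind", "binds", "interact", "interacts",
                     "activates", "inhibits", "phosphorylates"}
NEGATION_WORDS = {"not", "no", "neither", "nor", "without"}


def tokenize(sentence: str) -> list:
    """Split a sentence into word tokens (a crude substitute for a syntax tree)."""
    return re.findall(r"[A-Za-z0-9-]+", sentence)


def normalize(token: str) -> Optional[str]:
    """Return the official gene symbol for a token, or None if it is not a known protein."""
    return GENE_SYNONYMS.get(token.lower())


def extract_ppis(sentence: str) -> set:
    """Return normalized protein pairs linked by a relation keyword and not negated."""
    tokens = tokenize(sentence)
    # Dictionary lookup replaces CRF-based entity recognition in this sketch.
    mentions = [(i, normalize(tok)) for i, tok in enumerate(tokens) if normalize(tok)]
    pairs = set()
    for (i, gene_a), (j, gene_b) in combinations(mentions, 2):
        if gene_a == gene_b:
            continue
        # Words strictly between the two mentions (i < j because mentions are in order).
        between = {tok.lower() for tok in tokens[i + 1:j]}
        if between & RELATION_KEYWORDS and not between & NEGATION_WORDS:
            pairs.add(tuple(sorted((gene_a, gene_b))))
    return pairs
```

For example, `extract_ppis("MDM2 binds p53 in the nucleus.")` returns `{("MDM2", "TP53")}`, while `extract_ppis("BRCA1 does not interact with BARD1.")` returns an empty set because of the negation cue. A real system would replace the token window with parse-tree patterns so that keywords and negations are matched against the correct clause.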
We also compared these results with those of the latest similar software, and all of the results show that our extraction method is effective. Finally, we built a website based on the analysis of all PubMed abstracts and the resulting PPI network. It presents the protein interaction network from three aspects: analysis of the abstract for a single PubMed ID, discovery of the interaction network between two genes, and construction of protein interaction networks for four PPI types.
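Once PPIs have been extracted, the network and the two-gene query described above can be sketched with an adjacency map. This is an illustrative assumption, not the website's actual code: the thesis does not say how the two-gene network is computed, so the sketch interprets it as a breadth-first search for a shortest chain of interactions, with each edge keeping the PubMed IDs that support it. The function names `build_network` and `shortest_path` are hypothetical.

```python
from collections import defaultdict, deque


def build_network(ppis):
    """Build an undirected adjacency map from (gene_a, gene_b, pmid) records.

    Each edge stores the set of PubMed IDs supporting it, so a single-abstract
    view and a network view can share the same data.
    """
    graph = defaultdict(dict)
    for gene_a, gene_b, pmid in ppis:
        graph[gene_a].setdefault(gene_b, set()).add(pmid)
        graph[gene_b].setdefault(gene_a, set()).add(pmid)
    return graph


def shortest_path(graph, source, target):
    """Breadth-first search for a shortest chain of interactions; None if unreachable."""
    if source not in graph or target not in graph:
        return None
    previous = {source: None}  # node -> predecessor on the BFS tree
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk predecessors back to the source, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None
```

With placeholder records such as `[("GENE_A", "GENE_B", "pmid-1"), ("GENE_B", "GENE_C", "pmid-2"), ("GENE_C", "GENE_D", "pmid-3")]`, `shortest_path(build_network(records), "GENE_A", "GENE_D")` returns `["GENE_A", "GENE_B", "GENE_C", "GENE_D"]`. Keeping PubMed IDs on the edges lets each link in the path be traced back to its supporting abstracts.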
Keywords/Search Tags: Conditional Random Fields, Protein Interaction Network, Entity Recognition, Protein Normalization