Ontology-based Protein-protein Interaction Information Text Mining Method

Posted on: 2011-11-03
Degree: Master
Type: Thesis
Country: China
Candidate: M S Li
Full Text: PDF
GTID: 2190360308474906
Subject: Biochemistry and Molecular Biology
Abstract/Summary:
Protein-protein interactions (PPIs) play important roles in most cellular functions. Analysis of PPIs contributes to understanding the underlying biological mechanisms; it is of theoretical significance and also of strong value in practical applications. With the development of experimental techniques, a great amount of PPI information has been deposited in the biological literature, and extracting it presents a significant challenge. To address this challenge, many bioinformatics methods have been developed. In this work, we first introduce the workflow for extracting PPI information from the biomedical literature. We then review the state of development and the problems of the methods and tools for recognizing gene/protein names and extracting their relations from the literature, together with the extraction of PPI annotation information. Most research on PPI extraction has not considered the annotation information of PPIs. We propose a new method for extracting PPI information from the literature, aiming to obtain protein-protein interactions with higher reliability and richer annotations. In this way, static PPI networks can be transformed into more realistic dynamic ones.

We first construct a PPI ontology (PPIO) to define the scope of PPI information and to support the extraction of more PPI annotation information. According to the event model, information about a protein-protein interaction event should include when, where and how the interaction occurred and, in addition, the evidence supporting it. We adopted the strategy of reusing existing ontologies and rebuilt the protein-protein interaction type ontology.
The resulting PPIO covers protein state, interaction type, biological process, sub-cellular localization, biological function and detection methods.

By treating protein-protein interaction extraction as a classification problem, we can apply an SVM-based machine learning method to predict interactions. We focus mainly on selecting context features from a sentence that contains a pair of proteins. The key features of the sentence, including words, parts of speech, logic features and syntactic features, were used in the SVM-based method, which obtained an F-score of 77.8%. The SVM-based method was then applied to extract PPIs from the literature related to mouse liver proteins.

After extracting the PPIs, we go on to mine their annotation information, which is defined beforehand in the controlled vocabularies of the PPIO. We use dictionary-based methods and the co-occurrence principle for this task, and about 49.1% of the PPIs obtain annotation information. Two web services, ProteinCorral and EBIMed, were also used for the corresponding tasks, and we evaluate the methods by comparing their results.

Finally, we construct the protein-protein interaction information database (PPII DB) to store and display the mined information. The PPI annotation information is displayed according to the hierarchy of the PPIO. Three ways of searching for PPIs of interest are implemented: by protein name, by ontology term and by pair of proteins.

In summary, this work makes the following contributions:

(1) Construction of the Protein-Protein Interaction Ontology. Based on the model of molecular events, we design a framework for the PPI ontology that is suited to describing protein-protein interaction information and to text mining tasks.
It consists of six major parts: biological process, sub-cellular localization, biological function, interaction type, detection method and the protein state of the interactions.

(2) Development of an SVM-based method for mining PPIs from the biological literature, which achieved an F-score of 77.8% and outperformed RelEX, a state-of-the-art system.

(3) Mining of PPI annotation information from the literature based on a group of controlled vocabularies provided by the PPI ontology. This provides more details about each PPI, such as when and where it occurred. Such information is very useful for constructing a dynamic and more reliable PPI network.

(4) Display of protein-protein interaction information based on the ontology. Because the ontology has a hierarchical structure, it is well suited to classifying PPIs and makes it convenient for researchers to search and group protein-protein interactions.

Overall, we present a novel method for extracting PPIs and their annotations from the literature related to "mouse liver protein"; in addition, we display the PPI information based on the PPI ontology and have constructed an online database, which offers a good way to view and use the PPI data.
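
The dictionary-based, co-occurrence annotation step described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: the vocabulary below is a hypothetical miniature stand-in for the PPIO controlled vocabularies (whose actual term lists are not given here), the function names `annotate_pair` and `annotation_coverage` are invented for illustration, and co-occurrence is assumed to mean that a term and both protein names appear in the same sentence.

```python
import re

# Hypothetical miniature controlled vocabulary, for illustration only;
# the real PPIO term lists are not given in the abstract.
VOCABULARY = {
    "sub-cellular localization": ["nucleus", "cytoplasm", "mitochondrion"],
    "detection method": ["co-immunoprecipitation", "yeast two-hybrid"],
    "interaction type": ["phosphorylation", "binding"],
}


def annotate_pair(sentence, protein_a, protein_b, vocabulary=VOCABULARY):
    """Return {category: [terms]} for terms co-occurring with both proteins.

    Assumption: co-occurrence means the term and both protein names are
    found in the same sentence (case-insensitive, whole-token match).
    """
    def mentions(text):
        # Lookarounds act as token boundaries that also work for
        # hyphenated terms and names containing digits.
        pattern = r"(?<!\w)" + re.escape(text) + r"(?!\w)"
        return re.search(pattern, sentence, flags=re.IGNORECASE) is not None

    # Only annotate sentences that mention both members of the pair.
    if not (mentions(protein_a) and mentions(protein_b)):
        return {}

    found = {}
    for category, terms in vocabulary.items():
        hits = [term for term in terms if mentions(term)]
        if hits:
            found[category] = hits
    return found


def annotation_coverage(annotations):
    """Fraction of PPIs that received at least one annotation term."""
    if not annotations:
        return 0.0
    return sum(1 for a in annotations if a) / len(annotations)
```

Sentence-level co-occurrence is cheap and needs no training data, but it cannot tell whether a matched term actually describes the interaction or merely appears nearby, and exact dictionary matching misses inflected forms (for example, "binds" does not match the term "binding"). A coverage figure such as the 49.1% reported above corresponds to what `annotation_coverage` computes over all extracted pairs.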
Keywords/Search Tags: PPI, PPIO, Text Mining, Named Entity Recognition, Relation Extraction, Annotation Extraction