
Research On Protein-protein Interaction Extraction Method, Annotation System And Mining Platform

Posted on: 2017-03-30 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: M S Li | Full Text: PDF
GTID: 1220330488455775 | Subject: Biochemistry and Molecular Biology
Abstract/Summary:
Protein-protein interaction (PPI) is a molecular event that plays a very important role in life activities and is involved in all processes of the cell cycle. PPI research not only gives us keen insight into life activities, but is also of great significance for disease diagnosis and treatment. As the life sciences develop, a growing body of literature reports large amounts of PPI information on new discoveries, which poses a great challenge to manual extraction and collection of that information. Automatic extraction and integration of PPIs from the biological literature, one of the most important tasks in biological literature mining, can contribute to molecular biology research and is regarded as an efficient way to address this challenge. Several important issues in this field remain open. The performance of PPI extraction methods leaves much to be desired. For lack of an appropriate ontology to represent PPIs, little research has focused on extracting PPI functional annotations. There is no efficient system to assist PPI extraction, and there is no database for storing and exploring literature-derived PPI data. The goal of this research is therefore to improve the performance of PPI extraction methods and to construct a comprehensive ontology that better represents and captures the functional annotation of PPIs.

First, we proposed a support vector machine (SVM)-based PPI extraction method that uses four types of sentence features to train the classification models: keyword features, part-of-speech (POS) features, logic features and dependency syntax features. The model trained on the combination of all four feature types achieved the best performance, with 81.8% precision, 96.4% recall and 88.5% F-score on the LLL05 test dataset, which is state-of-the-art performance on that dataset.

Second, we constructed a novel PPI functional annotation system, the PPI Ontology (PPIO), to define the scope of PPI information and to extract PPI annotation information. The design of PPIO was inspired by the event model and the distinct characteristics of an event. It consists of the six core aspects of information required to report a protein interaction event: the interactor (who), the biological process (when), the subcellular location (where), the interaction type (how), the biological function (what) and the detection method (which). PPIO was implemented by integrating appropriate terms from the corresponding vocabularies and ontologies, and it was assessed by using it to capture PPI biological annotations from literature related to "human liver protein". A PPIO-based extraction approach was established and achieved satisfactory performance (precision of 68.5%, recall of 76.1% and F-score of 71.1%) on the test corpus.

Third, we developed an auxiliary system (PPICurator) for extracting protein-protein interactions, based on the proposed SVM-based classifier. This complete system was designed to meet the batch-processing requirements of large-scale data. Users can retrieve literature in a PubMed-like way and can easily filter both the literature and the PPI extraction results. Users can also export and visualize selected PPI information. These functions make the system a useful and efficient tool for biologists who need to extract PPI information.

Finally, we designed the protein-protein interaction database (dbPPII) to store and display information derived from literature mining. dbPPII organizes and displays PPI information according to the novel PPI information schema (the PPI Ontology). It provides multiple query methods and rich literature information about PPIs.

In summary, this study is innovative in the following areas:
(1) Based on the SVM model, we explored the effects of different learning features and their combinations on protein-protein interaction extraction. We demonstrated that keyword features, POS features, logic features and dependency syntax features all contribute to improved PPI extraction performance.
(2) A novel protein-protein interaction annotation system was constructed. This ontology is comprehensive enough to better represent and capture the biological context of PPIs from the literature, and it achieves satisfactory performance on PPI annotation extraction tasks.
(3) A novel protein-protein interaction extraction platform was built on the well-performing SVM classifier. It can handle large-scale data, which distinguishes it from other systems in use, and it is better suited to PPI extraction.
(4) A database system for storing and managing PPI annotation information from the literature was developed, in which PPIO is used to navigate richly annotated PPI data. It provides a convenient way to query and group PPIs.
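The F-scores quoted above can be related to the reported precision and recall through the usual balanced F-measure (the harmonic mean of the two). The abstract does not state which F-measure or averaging scheme was used, so the sketch below assumes the balanced F1 definition, and the function name `f_score` is purely illustrative.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumption: the abstract does not say which F-measure was used;
    F1 is the common default for relation-extraction evaluations.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted in the abstract, written as fractions.
svm_lll05 = f_score(0.818, 0.964)        # SVM classifier on LLL05
ppio_annotation = f_score(0.685, 0.761)  # PPIO-based annotation extraction

print(f"SVM on LLL05: {svm_lll05:.1%}")
print(f"PPIO annotation extraction: {ppio_annotation:.1%}")
```

Under this assumption the first pair gives about 88.5%, in line with the reported figure. The second pair gives about 72.1% rather than the reported 71.1%, which may reflect a different averaging scheme (for example, per annotation aspect) or rounding; the abstract does not say.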
Keywords/Search Tags: Protein-Protein Interaction, PPI Extraction, PPI Ontology, PPI Annotation Extraction, PPI Information Database