Design And Implementation Of A Protein-protein Interactions Extraction System Based On PubMed Abstracts

Posted on: 2015-06-17
Degree: Master
Type: Thesis
Country: China
Candidate: Wang
GTID: 2308330452456897
Subject: Software engineering
Abstract/Summary:
Public databases give easy access to a large body of biomedical information from the scientific literature. Relationships between biomedical entities are an important part of biological knowledge. Such structured information can be extracted from unstructured literature by human annotation, but this is time- and resource-consuming. As the literature continues to grow rapidly, text mining is becoming an increasingly popular and important way to obtain information from unstructured text.

The goal of this system is to extract information about biological entities from unstructured literature. To do this, the system represents each interaction as a triplet containing two proteins and one word describing the relationship between them. We implemented a Bayesian network (BN) for extracting protein-protein interaction triplets from unstructured literature. The system accepts input in several ways: text input, file upload, PubMed ID, and PubMed keyword search. After parsing the user input, the system passes the text to a text-processing module that splits articles into sentences, splits sentences into words, recognizes entities, and generates candidate triplets. A classification module then labels each candidate triplet as true or false, and the system outputs only the true triplets.

With this system, users can easily extract protein-protein interaction information from unstructured literature. Tested by cross-validation on a manually annotated data set, the system achieved an overall accuracy of 87%, indicating that it can meet the needs of practical use.
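The abstract describes a two-stage pipeline (text processing into candidate triplets, then binary classification) but does not give its lexicons, features, or network structure. The sketch below illustrates the idea under stated assumptions: the protein and relation-word dictionaries (PROTEINS, RELATION_WORDS), the three features, and the training labels are all hypothetical toy values, and the classifier is a naive Bayes model, the simplest Bayesian network (the class node is the sole parent of every feature), used as a stand-in for the thesis's actual BN.

```python
import math
import re
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy lexicons; the thesis does not list its dictionaries.
PROTEINS = {"RAD51", "BRCA2", "TP53", "MDM2"}
RELATION_WORDS = {"binds", "interacts", "activates", "inhibits"}
NEGATIONS = {"not", "no"}


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    # Alphanumeric runs, so names such as "TP53" stay whole.
    return re.findall(r"[A-Za-z0-9]+", sentence)


def features(tokens, i, k, j):
    # Hypothetical discrete features for proteins at positions i < j and a relation word at k.
    return (
        "between" if i < k < j else "outside",
        "near" if j - i <= 5 else "far",
        "neg" if any(t.lower() in NEGATIONS for t in tokens[i:j + 1]) else "pos",
    )


def candidates(sentence):
    # Dictionary-based entity recognition, then every (protein, word, protein) combination.
    tokens = tokenize(sentence)
    prots = [i for i, t in enumerate(tokens) if t in PROTEINS]
    rels = [k for k, t in enumerate(tokens) if t.lower() in RELATION_WORDS]
    for i, j in combinations(prots, 2):
        for k in rels:
            yield (tokens[i], tokens[k].lower(), tokens[j]), features(tokens, i, k, j)


class NaiveBayes:
    """Naive Bayes: a Bayesian network whose class node is the only parent of each feature."""

    def fit(self, X, y):
        self.n = len(y)
        self.prior = Counter(y)
        self.classes = sorted(self.prior)
        self.counts = defaultdict(Counter)  # (class, feature index) -> value counts
        self.values = defaultdict(set)      # feature index -> values seen in training
        for x, c in zip(X, y):
            for f, v in enumerate(x):
                self.counts[(c, f)][v] += 1
                self.values[f].add(v)
        return self

    def predict(self, x):
        # Pick the class maximizing log P(c) + sum_f log P(x_f | c), with add-one smoothing.
        def score(c):
            s = math.log(self.prior[c] / self.n)
            for f, v in enumerate(x):
                s += math.log((self.counts[(c, f)][v] + 1) / (self.prior[c] + len(self.values[f])))
            return s
        return max(self.classes, key=score)


def extract(text, model):
    # Keep only the candidate triplets the classifier labels True.
    return [t for s in split_sentences(text) for t, x in candidates(s) if model.predict(x)]


# Hypothetical hand-labelled training vectors (not the thesis's annotated corpus).
train_X = [("between", "near", "pos"), ("between", "near", "pos"), ("between", "far", "pos"),
           ("outside", "near", "pos"), ("outside", "far", "pos"), ("between", "near", "neg")]
train_y = [True, True, True, False, False, False]
model = NaiveBayes().fit(train_X, train_y)
print(extract("RAD51 binds BRCA2 in vitro. TP53 was studied, and MDM2 inhibits it.", model))
# → [('RAD51', 'binds', 'BRCA2')]
```

Naive Bayes is used here only because its structure is fixed and it needs nothing beyond the standard library; a general Bayesian network could also model dependencies among the features themselves.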
Keywords: Text mining, Bayesian network, Protein-protein interactions