Biomedical Event Extraction Based On Deep Parsing And Domain Knowledge

Posted on: 2016-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wu
GTID: 2308330461477991
Subject: Computer application technology
Abstract/Summary:
With the rapid development of information technology, biomedical literature is accumulating quickly online and holds a great deal of potentially valuable knowledge. How to extract useful knowledge from massive amounts of biomedical literature automatically and efficiently, and thereby reveal more unknown information about human diseases, has become an urgent problem. Against this background, information extraction technology in the biomedical domain has developed and evolved. As biomedical research becomes increasingly fine-grained, relatively simple relation extraction can no longer meet the needs of biological scientists. Biomedical event extraction, which can reveal multiple relationships between biological molecules, has emerged in response and quickly become a research hotspot. This thesis focuses on the key issues of biomedical event extraction. Following previous work, the whole event extraction system is divided into four stages: corpus preprocessing, event trigger recognition, event argument detection, and post-processing. The main work consists of the following two parts.

For the event trigger recognition phase, traditional biomedical event extraction systems apply syntactic information in a simplistic and shallow way, so it cannot play an effective role in trigger recognition. We propose a method that exploits the similarity between the trigger-argument structure of events and the predicate-argument structures produced by deep parsing to extract event trigger-protein pairs separately, and then merges them with the results of dictionary-based event trigger recognition. Designed for the particular characteristics of the trigger recognition task, the proposed method makes better use of the sentence structure information contained in the deep syntactic analysis. It also compensates for a weakness of the dictionary-based method, which misses triggers that are not in the dictionary (out-of-vocabulary triggers).
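The merging step described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not state how the two sources are combined, so the set union used here is an assumption, and the names `PredArg`, `triggers_from_pas`, `dictionary_triggers`, and `merge_triggers`, along with the toy sentence and its predicate-argument relations, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredArg:
    """One predicate-argument relation from a deep parser (hypothetical format)."""
    predicate_idx: int  # token index of the predicate
    argument_idx: int   # token index of one of its arguments


def triggers_from_pas(pas_relations, protein_indices):
    """Keep (predicate, argument) pairs whose argument is an annotated protein."""
    return {(r.predicate_idx, r.argument_idx)
            for r in pas_relations
            if r.argument_idx in protein_indices}


def dictionary_triggers(tokens, trigger_dict):
    """Token indices whose lowercased form appears in the trigger dictionary."""
    return {i for i, tok in enumerate(tokens) if tok.lower() in trigger_dict}


def merge_triggers(tokens, pas_relations, protein_indices, trigger_dict):
    """Union of dictionary triggers and PAS-derived triggers (assumed merge rule)."""
    pairs = triggers_from_pas(pas_relations, protein_indices)
    from_dict = dictionary_triggers(tokens, trigger_dict)
    from_pas = {trigger for trigger, _ in pairs}
    return sorted(from_dict | from_pas), sorted(pairs)


# Toy example: "upregulates" is missing from the dictionary but is still found,
# because the parser links it to the protein "IL-2".
tokens = ["IL-2", "upregulates", "the", "expression", "of", "STAT5"]
protein_indices = {0, 5}
trigger_dict = {"expression"}
pas_relations = [PredArg(1, 0), PredArg(1, 3), PredArg(3, 5)]

merged, pairs = merge_triggers(tokens, pas_relations, protein_indices, trigger_dict)
```

In this toy case the dictionary alone finds only "expression", while the predicate-argument relations also contribute "upregulates", which is the out-of-vocabulary situation the abstract mentions.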
The proposed method is tested on both the BioNLP2009 and BioNLP2011 GE corpora. Experiments show that it significantly improves the various performance measures and generalizes well.

For the event argument detection phase, this thesis addresses the data sparseness problem caused by the lack of annotated corpora and proposes a method that incorporates domain knowledge. First, we extract word representation features, which contain rich domain knowledge and semantic information, from unannotated corpora. At the same time, we use a topic model to learn topic features of the sentence and of the key words in the sentence. These features are then used together to detect event arguments. Features from these two sources effectively capture the global semantic information and sentence topic information needed in the event argument detection phase, and avoid the sparse-feature problem that arises when features are manually designed and extracted from the annotated corpus alone. In addition, using a language model and a topic model to learn features automatically can improve generalization ability. Experiments show that the event argument detection method incorporating domain knowledge ultimately achieves good event extraction results.
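As a rough illustration of using word representation features and topic features together, the Python sketch below builds one feature vector for a trigger-argument candidate. The abstract does not say how the features are combined or which classifier consumes them, so plain concatenation is an assumption; the function name `candidate_features` and all numeric values are made up for the example.

```python
def candidate_features(trigger, argument, sentence_topics,
                       word_vectors, word_topics, dim, n_topics):
    """Concatenate word vectors and topic distributions for one candidate pair.

    Words missing from either lookup table fall back to all-zero vectors.
    """
    zero_vec = [0.0] * dim
    zero_topic = [0.0] * n_topics
    features = []
    for word in (trigger, argument):
        features += word_vectors.get(word, zero_vec)   # word representation
        features += word_topics.get(word, zero_topic)  # key-word topic feature
    features += sentence_topics                        # sentence topic feature
    return features


# Toy lookup tables; in practice these would be learned from unannotated text.
word_vectors = {"expression": [0.1, 0.2, 0.3], "STAT5": [0.4, 0.5, 0.6]}
word_topics = {"expression": [0.9, 0.1]}
sentence_topics = [0.7, 0.3]

features = candidate_features("expression", "STAT5", sentence_topics,
                              word_vectors, word_topics, dim=3, n_topics=2)
```

The resulting vector has a fixed length of 2 * (dim + n_topics) + n_topics, so it could be passed to any standard classifier that decides whether the candidate is an event argument.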
Keywords/Search Tags: Biomedical Event, Event Extraction, Predicate Argument Structure, Word Representation, Topic Feature