
Detecting Hedges And Their Linguistic Scope In Biomedical Literatures

Posted on: 2012-05-24
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Li
Full Text: PDF
GTID: 2218330368987995
Subject: Computer application technology
Abstract/Summary:
Speculative language, also known as hedging, is common in scientific text, especially in the biomedical domain. Hedging indicates that authors do not or cannot back up their opinions or statements with facts. Information extraction systems should therefore identify sentences that contain hedges. Within a speculative sentence, a hedge cue typically has a scope that covers only part of the sentence rather than all of it. Detecting both hedge cues and their linguistic scope is therefore important in biomedical text mining. This thesis focuses on detecting hedge cues and their scope in biomedical literature.

For hedge cue identification, this thesis presents a machine learning system that detects hedge cues in biomedical texts. The approach applies a Conditional Random Fields (CRFs) model combined with a diverse set of linguistic features. In addition, hedge cues that appear in the training dataset, together with their synonyms in WordNet, are treated as keywords and used as an important feature in the hedge cue identification system. Experiments on test data from the CoNLL-2010 shared task show that the proposed method is robust. Recall reaches 85.44%, and the F1-score on the biological hedge detection task reaches 86.32%.

For hedge scope detection, this thesis combines manual rules with statistical learning. First, rules for scope detection are developed based on the syntax tree of the sentence and the part-of-speech of the hedge cue. Then a statistical CRFs classifier refines these predictions. Finally, scopes are constructed from the classifier output using a small set of post-processing rules. Experimental results on the CoNLL-2010 shared task dataset show that the approach achieves an F1-score of 57.47%. They also show that combining manual rules with machine learning improves scope detection performance.

Our exploratory research can be applied to many Natural Language Processing tasks, such as gene named entity extraction, question answering systems, and biomedical information extraction.
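The cue identification setup described in the abstract (a CRF over tokens, with a keyword-lexicon feature) can be illustrated with a minimal sketch of the input side of such a system. Everything named here is an assumption for illustration only: the `CUE_KEYWORDS` set is a tiny hypothetical stand-in for the lexicon the thesis builds from training cues and WordNet synonyms, the feature names are invented, and the B-CUE/I-CUE/O labelling scheme is a common sequence-labelling convention rather than something the abstract confirms.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical cue lexicon. The thesis derives its keywords from cues seen in
# the training data plus their WordNet synonyms; this small set is a stand-in.
CUE_KEYWORDS: Set[str] = {
    "may", "might", "suggest", "suggests", "possible",
    "possibly", "likely", "whether", "indicate",
}


def token_features(tokens: List[str], pos_tags: List[str], i: int) -> Dict[str, object]:
    """Build a feature dictionary for token i (assumed feature set)."""
    word = tokens[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "suffix3": word[-3:].lower(),
        "pos": pos_tags[i],
        # Keyword feature: is this token in the cue lexicon?
        "is_keyword": word.lower() in CUE_KEYWORDS,
    }
    if i > 0:
        feats["prev.lower"] = tokens[i - 1].lower()
        feats["prev.pos"] = pos_tags[i - 1]
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        feats["next.lower"] = tokens[i + 1].lower()
        feats["next.pos"] = pos_tags[i + 1]
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(tokens: List[str], pos_tags: List[str]) -> List[Dict[str, object]]:
    """Feature dictionaries for every token in one sentence."""
    return [token_features(tokens, pos_tags, i) for i in range(len(tokens))]


def bio_labels(n_tokens: int, cue_spans: List[Tuple[int, int]]) -> List[str]:
    """Encode cue spans (start inclusive, end exclusive) as B-CUE/I-CUE/O labels."""
    labels = ["O"] * n_tokens
    for start, end in cue_spans:
        labels[start] = "B-CUE"
        for j in range(start + 1, end):
            labels[j] = "I-CUE"
    return labels


if __name__ == "__main__":
    tokens = ["These", "results", "suggest", "that", "X", "may", "bind", "Y"]
    pos_tags = ["DT", "NNS", "VBP", "IN", "NN", "MD", "VB", "NN"]
    for tok, feats in zip(tokens, sentence_features(tokens, pos_tags)):
        print(tok, feats["is_keyword"])
    print(bio_labels(len(tokens), [(2, 3), (5, 6)]))
```

Per-sentence lists of feature dictionaries paired with label sequences are the kind of input that CRF toolkits generally consume; which toolkit and which exact features the thesis used is not stated in the abstract.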
Keywords/Search Tags: Hedges Identification, Hedges Scope Detection, Conditional Random Fields, Syntactic Structures