Hedge Scope Detection Based On Structure And Semantic Information

Posted on: 2017-12-11
Degree: Master
Type: Thesis
Country: China
Candidate: H J Deng
GTID: 2348330488958695
Subject: Computer application technology
Abstract/Summary:
Hedge cues are defined as words "whose job is to make things fuzzier or less fuzzy". Hedge information, the content governed by a hedge cue, conveys uncertain or tentative views. It is common in scientific texts, especially in the biomedical domain, where it expresses impressions or hypothesized explanations of experimental results. Because factual information must be distinguished from uncertain information, hedge information detection is an important task in biomedical information extraction. It comprises two subtasks: Task 1 identifies hedge cues, and Task 2 detects the in-sentence scope of a given cue. Research on hedge cue identification has progressed rapidly, but the results of hedge scope detection remain unsatisfactory. Hedge scope detection is difficult because it requires semantic analysis of sentences that exploits syntactic patterns, and it needs further improvement. This paper focuses on hedge scope detection, making full use of syntactic and semantic information. The main contents cover three aspects:
(1) Dependency-based candidate boundary selection algorithm
Traditional hedge scope detection approaches usually treat scope boundary tokens as positives and all other tokens as negatives when training scope boundary classifiers. For example, the F-scope classifier takes the F-scope tokens as positives and the other tokens to the left of a given cue (including the cue itself) as negatives. This inevitably generates a large number of negatives. Excessive negatives mislead the classifier and degrade classification performance. Moreover, because adjacent tokens have similar structural and contextual information, classifiers find it extremely difficult to distinguish the boundaries from their neighbors.
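The traditional instance generation described above can be illustrated with a minimal sketch. It assumes that the F-scope is the first token of the scope; the function names, the example sentence, and its cue and scope positions are hypothetical and are not taken from the thesis or the corpus.

```python
from typing import List, Tuple


def f_scope_instances(tokens: List[str], cue_index: int,
                      f_scope_index: int) -> List[Tuple[str, int]]:
    """Build training instances for an F-scope boundary classifier.

    Every token from the sentence start up to and including the cue is a
    candidate. The F-scope token is labeled 1 (positive); all other
    candidates are labeled 0 (negative).
    """
    if not 0 <= f_scope_index <= cue_index < len(tokens):
        raise ValueError("expected 0 <= f_scope_index <= cue_index < len(tokens)")
    return [(token, 1 if i == f_scope_index else 0)
            for i, token in enumerate(tokens[:cue_index + 1])]


def negative_to_positive_ratio(instances: List[Tuple[str, int]]) -> float:
    """Return the number of negatives per positive in a set of instances."""
    positives = sum(label for _, label in instances)
    negatives = len(instances) - positives
    return negatives / positives


# Hypothetical sentence: the cue "suggest" is at index 5, and the scope is
# assumed to start at the cue itself.
tokens = ["In", "contrast", ",", "our", "data", "suggest",
          "that", "IL-2", "regulates", "growth"]
instances = f_scope_instances(tokens, cue_index=5, f_scope_index=5)
ratio = negative_to_positive_ratio(instances)
```

Even in this short sentence a single positive is accompanied by five negatives, and the imbalance grows with the distance between the sentence start and the cue.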
This paper proposes a dependency-based candidate boundary selection (DCBS) algorithm, which uses the dependency tree to select the most likely tokens as candidate boundaries and to discard tokens that have little potential to improve performance. Experiments on the CoNLL-2010 biomedical corpus show that DCBS reduces the number of negatives from about ten times the number of positives to about three times. Hedge scope detection based on lexical features obtains 68.19% F1, which is 2.76% higher than the same system without DCBS. This indicates that DCBS effectively reduces the number of candidate instances and makes them easier for classifiers to discriminate.
(2) Hedge scope detection based on syntactic structure information
Hedge scope detection depends on syntactic structure information, since hedge information is a phrase or clause related to the hedge cue. This paper studies structural representations of hedge scope based on dependency trees and phrase structure trees. A convolution tree kernel is applied to capture structured syntactic information. Experiments on the CoNLL-2010 biomedical corpus show that the system based on dependency structure achieves 64.57% F1 and the system based on phrase structure achieves 63.51% F1. This indicates that both dependency and phrase structures are effective for hedge scope detection. Combining the two further improves performance to 66.67% F1, which shows that dependency and phrase structures complement each other for hedge scope detection.
(3) Hedge scope detection based on semantic information
Hedge scope detection also depends on semantic information, since hedge information is a complete semantic fragment. This paper explores a semantic representation of hedge information based on LSTM and develops a hedge scope detection system based on semantic information.
Furthermore, to make full use of lexical, syntactic structure, and semantic information, this paper integrates the three systems above into a unified framework that exploits deep syntactic and semantic information for hedge scope detection. Experiments on the CoNLL-2010 biomedical corpus show that the system based on semantic information obtains 65.23% F1, which indicates that the semantic representation is effective for hedge scope detection. The hybrid system achieves the best performance, 70.49% F1, exceeding every individual system. This suggests that the hybrid system combines the complementary advantages of the three systems.
These methods significantly improve hedge scope detection performance and can be extended to other tasks that rely on structural and semantic information, such as relation extraction.
Keywords/Search Tags: Hedge Information Scope, Candidate Boundary, Syntactic Structure Information, Semantic Information