
Research On Negation And Uncertainty Identification On Natural Language Text

Posted on: 2016-10-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B W Zou
Full Text: PDF
GTID: 1108330482963927
Subject: Computer application technology
Abstract/Summary:
Negation and uncertainty are very common phenomena in natural language. They reflect either the attitude of a speaker expressing a view or the credibility of linguistic information. Negation is a grammatical category comprising various devices that reverse the truth value of a proposition, while uncertainty is a grammatical category that qualifies a statement in terms of modality, evidentiality, probability, and subjectivity. In recent years, negation and uncertainty identification has come to play a critical role in deep natural language understanding and has drawn increasing attention, with growing applications in related areas such as Information Extraction (IE), Sentiment Analysis (SA), Information Retrieval (IR), and Machine Translation (MT).

Research on negation and uncertainty identification in natural language text comprises three main sub-tasks. The first is cue detection, which aims to detect whether a given text contains a negative or uncertain keyword. The second is scope resolution, which aims to determine the linguistic scope of a given cue within a sentence. The third is focus identification, which aims to identify the most prominent or explicit part of the scope that is negated by a negative cue.

In this dissertation, we first propose a tree kernel-based model for scope resolution, which makes effective use of structured syntactic features and improves scope resolution performance. We then propose a “word-topic” graph model that identifies the focus using context. To advance related research on Chinese, we construct a Chinese negation and uncertainty corpus. Finally, we propose effective methods tailored to the characteristics of the Chinese language. The main contents of our research can be summarized as follows:

1. Proposing a tree kernel-based negation and uncertainty scope resolution model. Because the scope is defined as the semantic scope of a cue, syntactic features are routinely used as important evidence for determining it. However, related work uses only flat syntactic features, generally represented as a feature vector, which cannot adequately and fully reflect the characteristics of syntactic structure. We therefore propose two kinds of cue-related sub-trees and measure the similarity of these structures with a convolution tree kernel. We also fuse the flat features and the structured features through a composite kernel, which improves the performance of scope resolution.

2. Proposing a focus resolution method based on a “word-topic” structured bilayer graph model. Unlike focus identification on speech corpora, which carry additional stress and intonation information, research on text corpora uses only morphological or syntactic features to identify the negation focus. We find that contextual discourse information plays a critical role in focus identification: the focus is determined by the semantic relatedness between the negation expression and what the author emphasizes in context. On this basis, we propose a “word-topic” structured bilayer graph model to evaluate the effect of contextual discourse information on the negation focus. Moreover, as an unsupervised method, it reduces the need for time-consuming manual annotation of negation focus. The experimental results show that this method is effective for negation focus identification and outperforms state-of-the-art negation focus identification systems.

3. Constructing the Chinese negation and uncertainty corpus. At present, the scarcity of linguistic resources seriously limits research on negation and uncertainty identification in Chinese. We therefore construct the Chinese Negation and Uncertainty Corpus (CNeUn), which is, as far as we know, the first and only Chinese corpus for this research. To account for the heterogeneity and characteristics of language use across domains and literary styles, the CNeUn corpus draws on three different sources and text types: scientific literature, product reviews, and financial articles. It contains 16,841 sentences and 6,429 instances, which is similar in scale to BioScope, the most frequently used corpus in English. The statistics and the experimental results show that the CNeUn corpus adequately reflects the linguistic characteristics of negation and uncertainty in Chinese and can support related research.

4. Proposing negation and uncertainty identification methods for Chinese. Because English and Chinese differ greatly in grammatical structure and semantic expression, state-of-the-art methods developed for English perform poorly when applied directly to a Chinese corpus. For cue detection, we therefore propose a feature-based sequence labeling model with a series of features, most notably the morpheme feature. In addition, we propose a cross-lingual cue expansion strategy to increase coverage. For scope resolution, we propose a meta-decision tree model that integrates serialized features and structured features. As far as we know, this work is the first to explore Chinese negation and uncertainty identification systematically.

In conclusion, this dissertation focuses on negation and uncertainty identification in natural language text. On the one hand, we propose effective methods to improve the performance of negation and uncertainty identification. On the other hand, we also try to advance research on the Chinese language. This research has achieved some preliminary results, which we hope will not only be helpful to other researchers in this area but also promote the development of deep natural language understanding.
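
As a rough illustration of the composite kernel mentioned in the first contribution, the following Python sketch combines a convolution tree kernel over syntactic sub-trees with a linear kernel over flat feature vectors. The abstract does not state which tree kernel or which combination scheme is used, so everything specific here is an assumption made for illustration: the subset-tree kernel of Collins and Duffy, the weighted-sum combination, the decay factor `lam`, the weight `alpha`, the function names, and the toy trees. It is not the author's implementation.

```python
# Minimal sketch (assumed design, not the dissertation's code).
# A tree is a nested tuple: (label, child, child, ...); a leaf word is a string.


def nodes(tree):
    """Yield every internal node of a tree; leaves (words) are plain strings."""
    if isinstance(tree, tuple):
        yield tree
        for child in tree[1:]:
            yield from nodes(child)


def production(node):
    """Return the grammar production at a node, e.g. VP -> RB VB."""
    children = tuple(
        (c[0], "node") if isinstance(c, tuple) else (c, "word") for c in node[1:]
    )
    return (node[0], children)


def delta(n1, n2, lam):
    """Decay-weighted count of common tree fragments rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        # Equal productions guarantee that c2 is a node whenever c1 is.
        if isinstance(c1, tuple):
            score *= 1.0 + delta(c1, c2, lam)
    return score


def tree_kernel(t1, t2, lam=0.5):
    """Convolution (subset-tree) kernel: sum delta over all node pairs."""
    return sum(delta(a, b, lam) for a in nodes(t1) for b in nodes(t2))


def flat_kernel(v1, v2):
    """Linear kernel over flat feature vectors of equal length."""
    return float(sum(a * b for a, b in zip(v1, v2)))


def composite_kernel(x1, x2, alpha=0.5, lam=0.5):
    """Weighted sum of the structured and flat kernels (assumed combination)."""
    (t1, v1), (t2, v2) = x1, x2
    return alpha * tree_kernel(t1, t2, lam) + (1.0 - alpha) * flat_kernel(v1, v2)


# Hypothetical cue-related sub-trees around the negation cue "not".
tree_a = ("VP", ("RB", "not"), ("VB", "work"))
tree_b = ("VP", ("RB", "not"), ("VB", "run"))

print(tree_kernel(tree_a, tree_b, lam=1.0))  # prints 3.0
```

With `lam=1.0` the tree kernel simply counts shared fragments: the two toy trees share `RB -> not`, `VP -> RB VB`, and `VP -> RB VB` with `RB` expanded, giving 3. In practice a kernel of this shape would be evaluated pairwise over the training instances and handed to a kernel-based classifier, with `alpha` and `lam` tuned on held-out data.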
Keywords/Search Tags: Negation, Uncertainty, Cue detection, Scope resolution, Focus identification