
The Study Of Recognizing Implicit Discourse Relations Using Semi-supervised Methods

Posted on: 2015-01-31    Degree: Master    Type: Thesis
Country: China    Candidate: C Liu    Full Text: PDF
GTID: 2268330428960095    Subject: Computer system architecture
Abstract/Summary:
In the area of Natural Language Processing (NLP), discourse relation identification is a difficult task. It aims to identify and label the relations that hold between arbitrary spans of text (clauses, sentences, or paragraphs). The task is crucial for understanding a given text and is helpful for numerous NLP applications, e.g., text summarization, question answering, and textual entailment.

Generally, discourse relations marked by explicit connectives in text are defined as explicit discourse relations; when such connectives are absent, they are defined as implicit discourse relations. The presence of a discourse connective between textual units largely expresses their relation sense, so explicit discourse relations have been shown to be easy to identify. Implicit discourse relations, on the other hand, are quite hard to identify because no connective between the textual units provides semantic information. Since the release of PDTB 2.0, many researchers have employed fully supervised machine learning approaches to detect them. Although supervised approaches to discourse relation recognition achieve good results on frequent relations, their performance on infrequent relations is poor. In addition, supervised methods require a sufficient amount of labeled training data to make the model reliable and robust, yet labeled data are time-consuming and labor-intensive to obtain, whereas unlabeled data can be found everywhere. To minimize the corpus annotation requirement, this thesis studies semi-supervised methods that recognize implicit discourse relations with a small amount of labeled data and a large amount of unlabeled data. In addition, we represent candidate instances with several lexical, syntactic, and semantic features to further optimize the model. The main work can be summarized as follows.

(1) To construct the knowledge representation of candidate discourse relation instances, we extract 9 kinds of lexical, syntactic, and semantic features from the PDTB for our models: First-Last-First3, Inquirer Tags, Production rules, Dependency rules, Polarity, Verbs, Modality, NER, and Unigram.

(2) We first build two supervised learning models for implicit discourse relation identification. The results show that Production Rules and Verbs achieve good performance across all relations, and their combination performs even better.

(3) We then build two novel semi-supervised learning models for implicit discourse relation identification, i.e., self-training and co-training. By using a small amount of labeled data together with a large amount of unlabeled data, the proposed models achieve satisfactory performance compared with the supervised methods (a minimal self-training loop is sketched below).
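The following is a minimal sketch of the self-training loop referred to in (3), assuming the PDTB argument pairs have already been converted to feature vectors (e.g., the Production Rule and Verb features from (1)). The names X_labeled, y_labeled, and X_unlabeled are placeholders, and the choice of LogisticRegression as the base classifier is an assumption for illustration, not the classifier used in the thesis.

```python
# Sketch of self-training for implicit discourse relation recognition.
# Assumes feature matrices (numpy arrays) are already built; the base
# classifier here is an illustrative choice, not the thesis's own model.
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_labeled, y_labeled, X_unlabeled,
               confidence=0.9, max_iterations=10):
    """Iteratively move high-confidence unlabeled instances into the
    labeled pool and retrain the base classifier."""
    X_l, y_l = X_labeled.copy(), y_labeled.copy()
    X_u = X_unlabeled.copy()

    for _ in range(max_iterations):
        if len(X_u) == 0:
            break
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X_l, y_l)

        # Probability of the most likely relation sense for each instance.
        probs = clf.predict_proba(X_u)
        best = probs.max(axis=1)
        picked = best >= confidence
        if not picked.any():
            break  # nothing confident enough to pseudo-label

        # Pseudo-label the confident instances and grow the labeled pool.
        pseudo = clf.classes_[probs.argmax(axis=1)]
        X_l = np.vstack([X_l, X_u[picked]])
        y_l = np.concatenate([y_l, pseudo[picked]])
        X_u = X_u[~picked]

    final = LogisticRegression(max_iter=1000)
    final.fit(X_l, y_l)
    return final
```

Co-training follows the same idea but splits the features into two views (for example, lexical versus syntactic features), trains one classifier per view, and lets each classifier supply confident pseudo-labels to the other's training pool.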
Keywords/Search Tags: Implicit discourse relation recognition, semi-supervised, PDTB