
The Study Of Recognizing Implicit Discourse Relations Using Semi-supervised Methods

Posted on: 2015-01-31    Degree: Master    Type: Thesis
Country: China    Candidate: C Liu    Full Text: PDF
GTID: 2268330428960095    Subject: Computer system architecture
Abstract/Summary:
In the area of Natural Language Processing (NLP), discourse relation identification is a difficult task. It aims to identify and label the relations that hold between arbitrary spans of text (clauses, sentences, or paragraphs). The task is crucial for understanding a given text and is helpful for numerous NLP applications, e.g., text summarization, question answering, and textual entailment.

Generally, discourse relations marked by explicit connectives in text are defined as explicit discourse relations; when such connectives are absent, they are defined as implicit discourse relations. The presence of a discourse connective between textual units largely expresses their relation sense, so explicit discourse relations have been shown to be easy to identify. Implicit discourse relations, on the other hand, are quite hard to identify because no connective between the textual units provides semantic information. Since the release of PDTB 2.0, many researchers have employed fully supervised machine learning approaches to detect them. Although supervised approaches to discourse relation recognition achieve good results on frequent relations, their performance on infrequent relations is poor. In addition, supervised methods require a sufficient amount of labeled training data to make the model reliable and robust, yet labeled data are time-consuming and labor-intensive to obtain, whereas unlabeled data can be found everywhere. To minimize the corpus annotation requirement, this thesis studies semi-supervised methods that recognize implicit discourse relations with a small amount of labeled data and a large amount of unlabeled data. In addition, we represent candidate instances with several lexical, syntactic, and semantic features to further optimize the model. The main work can be summarized as follows.

(1) To construct the knowledge representation of candidate discourse relation instances, we extract 9 kinds of lexical, syntactic, and semantic features from the PDTB for our models: First-Last-First3, Inquirer Tags, Production rules, Dependency rules, Polarity, Verbs, Modality, NER, and Unigram.

(2) We first build two supervised learning models for implicit discourse relation identification. The results show that Production Rules and Verbs achieve good performance across all relations, and their combination performs even better.

(3) We then build two novel semi-supervised learning models for implicit discourse relation identification, i.e., self-training and co-training. By using a small amount of labeled data together with a large amount of unlabeled data, the proposed models achieve satisfactory performance compared with the supervised methods (a minimal self-training loop is sketched below).
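The following is a minimal sketch of the self-training loop referred to in (3), assuming the PDTB argument pairs have already been converted to feature vectors (e.g., the Production Rule and Verb features from (1)). The names X_labeled, y_labeled, and X_unlabeled are placeholders, and the choice of LogisticRegression as the base classifier is an assumption for illustration, not the classifier used in the thesis.

```python
# Sketch of self-training for implicit discourse relation recognition.
# Assumes feature matrices (numpy arrays) are already built; the base
# classifier here is an illustrative choice, not the thesis's own model.
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_labeled, y_labeled, X_unlabeled,
               confidence=0.9, max_iterations=10):
    """Iteratively move high-confidence unlabeled instances into the
    labeled pool and retrain the base classifier."""
    X_l, y_l = X_labeled.copy(), y_labeled.copy()
    X_u = X_unlabeled.copy()

    for _ in range(max_iterations):
        if len(X_u) == 0:
            break
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X_l, y_l)

        # Probability of the most likely relation sense for each instance.
        probs = clf.predict_proba(X_u)
        best = probs.max(axis=1)
        picked = best >= confidence
        if not picked.any():
            break  # nothing confident enough to pseudo-label

        # Pseudo-label the confident instances and grow the labeled pool.
        pseudo = clf.classes_[probs.argmax(axis=1)]
        X_l = np.vstack([X_l, X_u[picked]])
        y_l = np.concatenate([y_l, pseudo[picked]])
        X_u = X_u[~picked]

    final = LogisticRegression(max_iter=1000)
    final.fit(X_l, y_l)
    return final
```

Co-training follows the same idea but splits the features into two views (for example, lexical versus syntactic features), trains one classifier per view, and lets each classifier supply confident pseudo-labels to the other's training pool.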
Keywords/Search Tags: Implicit discourse relation recognition, semi-supervised, PDTB