Semi-supervised Implicit Discourse Relation Recognition

Posted on: 2018-06-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C X Wu
GTID: 1368330515453677
Subject: Artificial Intelligence
Abstract/Summary:
Implicit discourse relation recognition aims to identify the semantic relationship (for example, Contrast or Cause) between two arguments (sentences or clauses in most cases) when no discourse connective is present. Accurately identifying these relationships benefits many natural language processing applications, such as machine translation, sentiment classification, and question answering. The task remains challenging for two reasons: 1) without discourse connectives, recognizing implicit discourse relations requires understanding the semantics of the two arguments; 2) mainstream models are data-driven, while manually labeled discourse data are limited. Most current research therefore focuses on: 1) developing neural network models to capture the semantics of arguments; 2) using semi-supervised methods to address the shortage of labeled data. In this thesis, we follow the second line and focus on using explicit discourse instances (or synthetic implicit discourse instances constructed from them) to improve performance. The main contributions of our work are summarized as follows:

1) Co-training for implicit discourse relation recognition. Previous work has shown that synthetic implicit discourse instances suffer from the domain problem and the meaning shift problem, and that using them indiscriminately as additional training data degrades performance. We therefore first propose to learn distributed features for discourse data based on recursive autoencoders; experiments show that these features are complementary to manual features. We then propose a co-training approach based on the manual and distributed features, which leverages the complementarity of the two kinds of features to select useful synthetic instances as additional training data. Experimental results show that the proposed approach is effective for both English and Chinese implicit discourse relation recognition.

2) Bilingually-constrained synthetic data for implicit discourse relation recognition. Based on the implicit/explicit mismatch phenomenon between Chinese and English, we propose, for the first time, to construct bilingually-constrained synthetic implicit discourse instances from Chinese/English sentence pairs. These synthetic instances partly avoid the domain problem and the meaning shift problem, and are thus more suitable as additional training data. We then design a simple and effective multi-task neural network model to incorporate these synthetic instances. Experimental results indicate that our approach significantly outperforms baselines that incorporate other additional training data.

3) Implicit discourse relation recognition based on connective-sensitive word vector representations. In explicit discourse instances, synonym (antonym) word pairs tend to appear around the discourse connective "and" ("but"), and word pairs around other connectives also show some regularity. We therefore propose to learn connective-sensitive word vector representations from large numbers of explicit discourse instances. The learned word vector representations capture discourse relationships between words. Using them as features in place of common word vector representations significantly improves the performance of implicit discourse relation recognition. Experimental results also indicate that the proposed approach makes effective use of massive explicit discourse data.
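
As a rough illustration of what a synthetic implicit discourse instance constructed from an explicit one might look like, the sketch below splits a sentence at an explicit connective, drops the connective, and labels the resulting argument pair with a relation. This is only a minimal sketch, not the thesis's actual construction procedure: the abstract does not describe that procedure, so connective removal is itself an assumption here. The two-entry `CONNECTIVE_TO_RELATION` table, the `SyntheticInstance` type, and the `make_synthetic_instance` function are hypothetical names introduced for illustration.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical connective-to-relation table (an assumption for illustration);
# a real system would use a much larger, sense-annotated connective inventory.
CONNECTIVE_TO_RELATION = {
    "but": "Contrast",
    "because": "Cause",
}


class SyntheticInstance(NamedTuple):
    arg1: str
    arg2: str
    relation: str


def make_synthetic_instance(sentence: str) -> Optional[SyntheticInstance]:
    """Turn an explicit instance into a synthetic implicit one.

    Tries each known connective in table order. If the connective occurs as a
    whole word between two non-empty spans, the connective is dropped and the
    two spans become the arguments, labeled with the connective's relation.
    Returns None when no known connective is found.
    """
    text = sentence.strip()
    for connective, relation in CONNECTIVE_TO_RELATION.items():
        # arg1, optional comma/semicolon, connective, arg2, optional final punctuation
        pattern = rf"^(.+?)[,;]?\s+{re.escape(connective)}\s+(.+?)[.!?]?$"
        match = re.match(pattern, text, flags=re.IGNORECASE)
        if match:
            return SyntheticInstance(
                match.group(1).strip(), match.group(2).strip(), relation
            )
    return None


if __name__ == "__main__":
    print(make_synthetic_instance("The food was good, but the service was slow."))
    print(make_synthetic_instance("He stayed home because he was sick."))
```

Dropping the connective can change how the remaining argument pair reads, which is one way to picture the meaning shift problem mentioned in the abstract and why such instances are selected or constrained rather than used indiscriminately.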
Keywords/Search Tags: Implicit discourse relation recognition, Synthetic implicit discourse data, Co-training, Bilingually-constrained synthetic implicit discourse data, Multi-task neural network model, Connective-sensitive word vector representations