Research On Data Augmentation And Argument Representation For Implicit Discourse Relation Recognition

Posted on: 2021-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: H B Ruan
Full Text: PDF
GTID: 2428330605974899
Subject: Computer technology
Abstract/Summary:
Discourse relation recognition is an extremely challenging subtask of shallow discourse parsing. It aims to identify the semantic relation between two arguments in the same discourse. As a basic task of natural language processing, discourse relation recognition benefits upper-layer applications. So far, the Penn Discourse Treebank (PDTB) is the largest authoritative dataset for English discourse relation recognition. The corpus defines a three-level semantic relation hierarchy for discourse relations. According to whether a connective is present between the two arguments, the corpus divides discourse relation recognition into two subtasks: explicit discourse relation recognition (EDRR) and implicit discourse relation recognition (IDRR). EDRR already reaches more than 93% accuracy using connectives alone, which makes it practical to some extent. The performance of IDRR, however, is still relatively low, because it relies on a deep understanding of the semantic information in the arguments. This thesis studies IDRR and proposes methods based on data augmentation and argument representation. The research consists of the following three parts:

(1) Data Augmentation based Implicit Causal Relation Recognition
Existing methods usually use neural networks for IDRR, which rely on large amounts of high-quality training data. However, the number of implicit discourse relation samples in the PDTB is relatively small, which leads to poor IDRR performance. To address this, previous work usually uses connectives to construct templates and mines explicit discourse relation samples from external corpora, then removes the connectives from these samples to generate synthetic implicit ones. However, removing the connectives directly may cause meaning shifts, so adding this kind of data to the training set may harm the classifier. To obtain high-quality implicit discourse relation samples, we exploit the inherent relation between a question and its answer and use WHY-type question-answer pairs to construct synthetic implicit causally related argument pairs. In addition, active learning is adopted to select highly informative samples from the synthetic samples, and the selected samples are used to expand the implicit causal training set. Experimental results on the PDTB show that the method outperforms the state-of-the-art (SOTA) data augmentation method, reaching an F1 score of 52.19%.

(2) Graph Convolutional Network based Implicit Discourse Relation Recognition
Previous work constructs complex neural network models to improve IDRR performance. However, researchers usually use only the interactive information between the arguments and neglect the key information within each argument itself. To solve this problem, we propose a method based on the graph convolutional network (GCN). We first encode the arguments with the fine-tuned pre-trained language model BERT and concatenate the argument representations to form the feature matrix. We then compute the self-attention and interactive attention scores of the arguments and concatenate them to form the adjacency matrix. On this basis, we construct a two-layer graph convolutional network that updates the argument representations according to both the self and the interactive information, yielding argument representations that benefit IDRR. We conduct experiments on the PDTB. Binary classification results show that the method outperforms the SOTA methods on Contingency and Expansion, reaching F1 scores of 60.70% and 74.49% respectively. The method is also comparable to the SOTA methods on four-way classification.

(3) Implicit Discourse Relation Recognition System
Building on the IDRR methods above, we develop an IDRR system based on the Vue, Bootstrap and Tornado frameworks. The system provides four functional interfaces. When the user inputs two arguments and clicks the IDRR button, the system recognizes the discourse relation between the arguments using the proposed GCN-based IDRR method and returns the result to the user. The system demonstrates this research and can, to some extent, assist work in other areas of natural language processing.

From the perspectives of data augmentation and argument representation, we relieve the shortage of implicit discourse relation data to some extent. We also address the difficulty of obtaining semantically informative argument representations. Finally, we develop an IDRR system to demonstrate our work.
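
The active learning step in part (1) can be illustrated with a minimal sketch. The abstract does not state which criterion is used to judge how informative a sample is, so the example below assumes a common one, predictive entropy (uncertainty sampling). The function names and the probability values are hypothetical and are not taken from the thesis.

```python
import math

def entropy(probs):
    """Shannon entropy of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_informative(predictions, k):
    """Return indices of the k most uncertain samples, most uncertain first.

    Assumption: "high information" is approximated by high predictive entropy.
    """
    ranked = sorted(
        range(len(predictions)),
        key=lambda i: entropy(predictions[i]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical classifier outputs [P(causal), P(non-causal)]
# for four synthetic argument pairs built from WHY-type QA pairs.
predictions = [
    [0.98, 0.02],
    [0.55, 0.45],
    [0.80, 0.20],
    [0.50, 0.50],
]

selected = select_informative(predictions, k=2)
print(selected)
```

Under this assumption, the pairs on which the current classifier is least certain are the ones added to the implicit causal training set. Other selection criteria would fit the same interface.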
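
The pipeline in part (2) can be sketched roughly as follows, using NumPy. This is only an illustration under several assumptions: random vectors stand in for the BERT token representations, graph nodes are taken to be tokens, the attention is plain scaled dot-product attention, and the self and interactive attention scores are arranged as a block matrix that is then row-normalized. The abstract does not specify any of these details, and all names and dimensions are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k):
    """Scaled dot-product attention scores between rows of q and rows of k."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)

def build_adjacency(arg1, arg2):
    """Assumed layout: self attention on the diagonal blocks,
    interactive attention on the off-diagonal blocks."""
    top = np.concatenate([attention(arg1, arg1), attention(arg1, arg2)], axis=1)
    bottom = np.concatenate([attention(arg2, arg1), attention(arg2, arg2)], axis=1)
    adj = np.concatenate([top, bottom], axis=0)
    return adj / adj.sum(axis=1, keepdims=True)  # row-normalize

def gcn_layer(adj, h, w):
    """One graph convolution: ReLU(A H W)."""
    return np.maximum(adj @ h @ w, 0.0)

rng = np.random.default_rng(0)
n1, n2, d, hidden = 4, 5, 8, 6           # hypothetical sizes
arg1 = rng.normal(size=(n1, d))          # stand-in for BERT output of argument 1
arg2 = rng.normal(size=(n2, d))          # stand-in for BERT output of argument 2

features = np.concatenate([arg1, arg2], axis=0)  # feature matrix
adj = build_adjacency(arg1, arg2)                # adjacency matrix

w1 = rng.normal(size=(d, hidden))        # untrained weights, for shape only
w2 = rng.normal(size=(hidden, hidden))
h1 = gcn_layer(adj, features, w1)        # first GCN layer
h2 = gcn_layer(adj, h1, w2)              # second GCN layer: updated representations
```

In a trained model, the weights would be learned and the updated representations in `h2` would be pooled and passed to a classifier. The sketch only shows how self and interactive information can both enter the update through a single adjacency matrix.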
Keywords/Search Tags:Implicit Discourse Relation Recognition, Data Augmentation, Argument Representation, Active Learning, Attention Mechanism