
Research On PDTB-based End-to-end English Discourse Parser

Posted on: 2016-09-13  Degree: Master  Type: Thesis
Country: China  Candidate: S Li
GTID: 2308330464453249  Subject: Computer Science and Technology
Abstract/Summary:
Discourse parsing is a challenging task in Natural Language Processing (NLP) and a foundation of natural language understanding. It is important to many downstream NLP tasks, such as question answering, automatic summarization, and discourse generation. With the recent development of discourse theory and the construction of large-scale discourse corpora, automatic discourse parsers are increasingly needed. In this thesis, we aim to build an end-to-end PDTB-style discourse parser with a data-driven approach, using the large-scale PDTB corpus.

1. We use a three-stage method for explicit discourse parsing. First, we propose an explicit connective identification model based on Conditional Random Fields (CRFs), which removes the dependence on a predefined list of candidate connectives. Second, we build an explicit sense classification model with a maximum entropy model. Third, we treat argument extraction as a sequence labeling problem as well. Because the two arguments differ in nature, we model Arg1 and Arg2 separately.

2. A pipelined system can suffer from error propagation. We instead propose a joint learning approach based on the structured perceptron. To do so, we decompose the explicit discourse parser into two components: a connective labeler, which identifies connectives in a text and determines their senses when classifying discourse relations, and an argument labeler, which extracts the corresponding arguments for a given connective. Evaluation on the PDTB corpus shows the appropriateness of our framework and the effectiveness of our joint learning approach.

3. Because no connective is present, implicit discourse parsing is a challenging task. We first propose a baseline system for implicit discourse parsing. Then, to address the imbalanced distribution of relations, we use all labeled data to build multiple binary classifiers for each classification task and use the adding rule to determine the final classification result for each instance. We also use forward feature selection to choose an optimal feature subset for each classification task. Experimental results on the PDTB corpus show that the proposed method significantly improves on state-of-the-art performance in recognizing implicit discourse relations.

4. End-to-end discourse parsing must parse all types of relations and their arguments. Previous studies divide discourse relations into explicit and implicit (or non-explicit) relations and parse each type independently. The problem with this approach is that the boundary between the two categories is vague: some implicit relations can be expressed explicitly by inserting potential connectives, while many explicit connectives are removable. To overcome this problem, we instead distinguish intra- and inter-sentential relations and propose an end-to-end discourse parser based on intra- and inter-sentential discourse parsing models. In addition, alongside the traditional exact matching and partial matching metrics for argument labeling, we propose a new metric, main predicate matching, to better evaluate the performance of shallow discourse parsing.
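The forward feature selection step mentioned in the abstract can be pictured with a generic greedy sketch. This is not the thesis's implementation: the function name `forward_select`, the `min_gain` stopping threshold, the feature names, and the toy scoring function are all hypothetical, and in practice the score would presumably come from evaluating a classifier on held-out data.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def forward_select(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    min_gain: float = 0.0,
) -> Tuple[List[str], float]:
    """Greedy forward selection: repeatedly add the single feature that
    improves the score the most, and stop when no feature helps."""
    selected: List[str] = []
    remaining = list(features)
    best = score(frozenset())  # score of the empty feature set
    while remaining:
        # Score every candidate when added to the current subset.
        scored = [(score(frozenset(selected + [f])), f) for f in remaining]
        cand_score, cand = max(scored, key=lambda pair: pair[0])
        if cand_score - best <= min_gain:
            break  # no remaining feature gives a real improvement
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return selected, best


# Hypothetical feature groups and a toy additive score (illustration only).
TOY_WEIGHTS = {
    "first_last_words": 30,
    "production_rules": 20,
    "verb_classes": 5,
    "noise": -10,
}


def toy_score(subset: FrozenSet[str]) -> float:
    return sum(TOY_WEIGHTS[f] for f in subset)


chosen, final_score = forward_select(list(TOY_WEIGHTS), toy_score)
print(chosen, final_score)
```

With this toy score, the harmful `noise` feature is never added, because including it would lower the score. A greedy search of this kind does not guarantee a globally optimal subset; it only finds one that no single addition improves.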
Keywords/Search Tags: Discourse Parsing, PDTB, Joint Model, Implicit Discourse Parsing