
Research On Representation Schema, Resource Construction And Computational Modeling Of Macro Discourse Structure

Posted on: 2019-11-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X M Chu
Full Text: PDF
GTID: 1368330578479841
Subject: Computer Science and Technology
Abstract/Summary:
Discourse parsing is the task of identifying how the discourse units of a text are related and which discourse relations hold among them, where a text consists of contiguous segments or sentences. Discourse parsing has recently drawn increasing attention because of its importance to natural language processing (NLP) applications such as machine translation, question answering, and text summarization, and it has become one of the active research topics in NLP. Depending on the focus, discourse structure can be divided into two types: micro structure and macro structure. Micro structure refers to the structure and discourse relations among the discourse units within a sentence, or among consecutive sentences in one paragraph; here a discourse unit is a clause or a sentence. Macro structure refers to the structure and discourse relations among paragraphs, chapters, or even documents; here a discourse unit is a paragraph or a chapter. While there has been substantial work on discourse parsing for micro discourse structure, studies of macro discourse structure are few, most likely because no corpus has been available. For these reasons, this dissertation aims to explore a representation schema, build a corpus, and develop automatic discourse parsing models for macro discourse structure. The main research contents are as follows:

1. This dissertation proposes a unified macro-micro discourse structure framework with the primary-secondary relation as the carrier. It further explores a representation schema for macro-level discourse structure and constructs the logical semantic structure and the functional pragmatic structure respectively. In this representation schema, each discourse is represented as a hierarchical discourse tree. In the macro discourse structure tree, leaf nodes represent paragraphs and non-leaf nodes represent discourse relations. The edges connect the discourse units, with the arrows pointing to the "Primary" units.

2. Guided by the macro discourse structure framework, this dissertation annotates the macro discourse structure of Chinese texts, producing a corpus called the Macro Chinese Discourse Treebank (MCDTB). During annotation, the structure definitions and annotation criteria were revised iteratively. After nearly a year of annotation, 720 newswire articles were annotated.

3. On this basis, this dissertation concentrates on two tasks of macro discourse structure analysis: structure identification and primary-secondary recognition. To reduce error propagation between the two associated tasks, it adopts a joint model of the two tasks and proposes an Integer Linear Programming approach that achieves global optimization under various kinds of constraints. Finally, it presents an end-to-end macro discourse structure parser based on MCDTB. The parser uses a linear-chain conditional random field as the basic classifier for discourse structure identification and a support vector machine as the basic classifier for discourse relation classification. By combining labeling with a bottom-up tree-building approach, the parser is able to create complete discourse structure trees.

At present, macro discourse structure analysis is still at an early stage, and this study is exploratory. It contributes innovations in the representation schema, corpus resources, and computational models for macro discourse structure, and it offers reference value for related research in this field.
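The tree representation and the bottom-up tree-building idea described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Node` class, the `build_tree` function, the toy scorer, and the placeholder relation name "Elaboration" are hypothetical and are not taken from MCDTB or from the dissertation's parser, which uses a conditional random field and a support vector machine rather than the stand-in callables below.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    """A node in a macro discourse tree (illustrative, not the MCDTB format)."""
    start: int                      # index of the first paragraph covered
    end: int                        # index of the last paragraph covered (inclusive)
    relation: Optional[str] = None  # discourse relation; None for leaves (paragraphs)
    primary: Optional[int] = None   # index in `children` of the "Primary" unit
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def build_tree(
    n_paragraphs: int,
    merge_score: Callable[[Node, Node], float],
    label: Callable[[Node, Node], Tuple[str, int]],
) -> Node:
    """Greedy bottom-up construction: repeatedly merge the adjacent pair
    of units with the highest score until a single root remains."""
    if n_paragraphs < 1:
        raise ValueError("need at least one paragraph")
    nodes = [Node(i, i) for i in range(n_paragraphs)]  # leaves are paragraphs
    while len(nodes) > 1:
        # Pick the best adjacent pair (ties resolve to the leftmost pair).
        best = max(range(len(nodes) - 1),
                   key=lambda i: merge_score(nodes[i], nodes[i + 1]))
        left, right = nodes[best], nodes[best + 1]
        relation, primary = label(left, right)
        parent = Node(left.start, right.end, relation, primary, [left, right])
        nodes[best:best + 2] = [parent]  # replace the pair with its parent
    return nodes[0]


def toy_score(left: Node, right: Node) -> float:
    # Hypothetical stand-in for a trained structure classifier:
    # prefer merging pairs that cover fewer paragraphs.
    return -float(right.end - left.start)


def toy_label(left: Node, right: Node) -> Tuple[str, int]:
    # Hypothetical stand-in for relation and primary-secondary classifiers:
    # always returns a placeholder relation with the left unit as "Primary".
    return "Elaboration", 0


tree = build_tree(4, toy_score, toy_label)
print([(c.start, c.end) for c in tree.children])
```

In a real system the two callables would be replaced by trained models, and the greedy merge could be replaced by a globally optimized decision such as the Integer Linear Programming approach mentioned above.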
Keywords/Search Tags: Discourse Analysis, Macro Discourse Structure, Representation Schema, Corpus, Primary-Secondary Relation, Discourse Relation