Research on Chinese Discourse Structure Representation and Resource Construction

Posted on: 2016-04-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y C Li
GTID: 1108330482963893
Subject: Computer application technology
Abstract/Summary:
It is well known that interpreting a discourse requires understanding its hierarchy of rhetorical relations, since discourse units rarely exist in isolation. Discourse parsing aims to reveal this relation hierarchy, which benefits many downstream applications, including summarization, information retrieval, and question answering. As lexical and syntactic parsing technologies have matured, discourse parsing has attracted increasing attention in recent years.

In comparison with English, however, studies on Chinese discourse parsing are rare. The main reasons are: 1) there is no complete theory of Chinese discourse parsing; 2) corpora that match the characteristics of Chinese discourse structure are lacking; 3) because of the complexity of Chinese discourse structure, existing methods cannot be applied directly to Chinese discourse parsing.

This dissertation focuses on a representation theory for Chinese discourse structure. We first draw on several theories and representation schemes: the tree structure and nuclearity of Rhetorical Structure Theory (RST), the treatment of relations and discourse structure in Chinese compound-sentence and sentence-group theory, and the treatment of connectives in the Penn Discourse Treebank (PDTB). We then propose a discourse representation scheme for Chinese, called the Connective-driven Dependency Tree (CDT), and define its clauses, connectives, relations, and nuclearity. In CDT, clauses are leaf nodes and connectives are non-leaf nodes. In particular, connectives directly represent the hierarchy of the tree structure and the rhetorical relations of a discourse, while the nucleus of each discourse unit is determined globally according to dependency theory. Compared with RST and PDTB, the CDT representation has certain advantages in accommodating the special characteristics of Chinese discourse structure.
The CDT representation therefore serves as the basis for discourse corpus construction.

Guided by the CDT scheme, we construct a Chinese discourse structure corpus, the Chinese Discourse Treebank (CDTB), by manually annotating 500 documents from the Chinese Treebank (CTB). We use a top-down segmentation strategy and combine manual and automatic annotation, in keeping with the cognitive habits of native Chinese speakers. We then present detailed statistics and analysis of CDTB. Consistency tests indicate that the annotation quality is good, and the statistics show that the corpus has reached a usable size. CDTB can therefore serve as a resource for Chinese discourse parsing.

Finally, this dissertation presents a Chinese discourse parser built on the CDTB corpus. Its input is raw text and its output is a discourse tree comprising clauses, the hierarchical discourse structure, discourse relations (4 classes), and discourse nuclearity. Experimental results show the appropriateness of the CDT scheme for Chinese discourse analysis and the effectiveness of the CDTB corpus.

Research on Chinese discourse structure is still at an early stage. This work contributes to Chinese discourse parsing in theory, resources, and computation, and offers a useful reference for future research in the area.
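
To make the CDT description above concrete, the following is a minimal Python sketch of a tree in which clauses are leaves and connectives are internal nodes that carry a relation label and mark their nuclear children. It is an illustration only, not the dissertation's implementation: the class names, the `nucleus` index list, the helper functions, the English example clauses, and the relation labels are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Clause:
    """Leaf node: a single clause (hypothetical representation)."""
    text: str


@dataclass
class Connective:
    """Non-leaf node: a connective joining two or more discourse units."""
    word: str              # surface form of the connective (assumed field)
    relation: str          # placeholder label, not the dissertation's inventory
    children: List["Node"]
    nucleus: List[int]     # indices of the nuclear children (assumed encoding)


Node = Union[Clause, Connective]


def clauses(node: Node) -> List[str]:
    """Collect the clause texts in left-to-right order."""
    if isinstance(node, Clause):
        return [node.text]
    collected: List[str] = []
    for child in node.children:
        collected.extend(clauses(child))
    return collected


def head_clauses(node: Node) -> List[str]:
    """Follow nuclear children from the root down to the leaves."""
    if isinstance(node, Clause):
        return [node.text]
    heads: List[str] = []
    for index in node.nucleus:
        heads.extend(head_clauses(node.children[index]))
    return heads


# Illustrative three-clause discourse (invented example).
inner = Connective(
    word="but",
    relation="transition",
    children=[Clause("prices rose"), Clause("demand stayed strong")],
    nucleus=[1],
)
tree = Connective(
    word="so",
    relation="causality",
    children=[inner, Clause("the company expanded production")],
    nucleus=[1],
)
```

Here `clauses(tree)` recovers the leaves in order, while `head_clauses(tree)` walks only the nuclear branches, which is one simple way to read "globally determined" nuclearity off such a tree.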
Keywords: Discourse parsing, Corpus, Clause, Connective, Discourse relation