
Coherence-based Research Of Connectives In English And Chinese Text

Posted on: 2016-06-18
Degree: Master
Type: Thesis
Country: China
Candidate: B Ding
Full Text: PDF
GTID: 2308330464452862
Subject: Computer Science and Technology
Abstract/Summary:
Discourse coherence usually depends on the internal structure and semantic relations of a text. A discourse is a series of consecutive language units, such as clauses, sentences, or groups of sentences. A discourse relation (such as comparison or contingency) is the logical semantic relation between discourse units. A discourse connective explicitly expresses the semantic relation between discourse units. Depending on whether an explicit discourse connective (such as "because" or "but") is present, discourse relations are classified as explicit or implicit. This thesis focuses on explicit discourse relations in Chinese and English discourse corpora. The main work includes:
(1) We construct a connective analyzer for Chinese and one for English. Each analyzer contains two components: connective identification and sense classification. Using the maximum entropy model and Conditional Random Fields (CRFs), we build the connective analysis platform on the Chinese Discourse Treebank (CDTB) and the Penn Discourse Treebank (PDTB). With gold parse trees, the F-value of connective identification is about 66.79% for Chinese and 95.72% for English. For sense classification, we run experiments on both fully correct connectives and automatically identified connectives. With automatically identified connectives, the overall performance of top-four sense classification is 57.58% for Chinese and 90.14% for English.
(2) Considering the potential contribution of bilingual information to the study of explicit discourse connectives, we annotate a portion of parallel corpora following the CDTB schema. Using sentence and word alignment tools, combined with a small amount of manual annotation, we build an annotated parallel discourse corpus. The annotated information mainly includes the connective, the relation type (explicit / implicit), and the sense, among others. Additionally, we compare the distribution and conversion of explicit / implicit discourse relations, and the relation types, in Chinese and English.
(3) To improve the performance of Chinese connective analysis, we introduce bilingual information from the annotated parallel discourse corpus into the Chinese connective analysis platform. Experimental results show that this method increases the performance of Chinese connective identification by 1.7%.
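The F-values reported for connective identification can be illustrated with a minimal sketch. The abstract does not state the matching criterion, so this example assumes exact-match scoring over connective spans; the function name `f_value`, the `(sentence_id, start, end)` span format, and the toy data are all hypothetical and are not taken from the thesis.

```python
def f_value(gold, predicted):
    """Return (precision, recall, F1) over connective spans.

    Assumption: a predicted connective counts as correct only if its
    (sentence_id, start, end) span exactly matches a gold span.
    """
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data (hypothetical): three gold connectives, four predictions,
# two of which match a gold span exactly.
gold = {(0, 3, 4), (1, 0, 1), (2, 5, 7)}
predicted = {(0, 3, 4), (2, 5, 7), (3, 1, 2), (4, 0, 1)}
precision, recall, f1 = f_value(gold, predicted)
print(f"P={precision:.4f} R={recall:.4f} F={f1:.4f}")
```

Under this assumption, a word such as "but" that is predicted as a connective where the gold annotation has none lowers precision, while a gold connective that the analyzer misses lowers recall.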
Keywords/Search Tags: Discourse Parsing, Connective Identification, Sense Classification, Parallel Corpus