
Research On Chinese Discourse Topic Structure: Representation, Resource Construction And Its Analysis

Posted on: 2018-10-27   Degree: Doctor   Type: Dissertation
Country: China   Candidate: X F Xi
GTID: 1318330542959101   Subject: Computer application technology
Abstract/Summary:
The analysis of discourse topic structure focuses on discourse intention, which plays a fundamental role in discourse-level semantic analysis. Most current NLP research concentrates on the morphological and syntactic levels, and there has been little research on the inherent regularities of discourse. As a result, theoretical and computational methods for effective discourse topic analysis are lacking, which severely limits wider applications. This thesis addresses Chinese discourse topic structure analysis from the following four aspects.

First, the thesis develops a representation system for Chinese discourse topic structure. Drawing on Theme-Rheme Theory, Rhetorical Structure Theory (RST), and the Penn Discourse Treebank (PDTB) for English, as well as research on Chinese compound sentences and sentence-group theory, we propose the Micro-Topic Scheme (MTS), a representation scheme based on theme-rheme theory that exploits the characteristics of Chinese itself. The representation architecture of Chinese discourse topic structure is built on MTS. Its key elements are defined according to the characteristics of Chinese discourse: the elemental discourse topic unit (EDTU), the micro-topic scheme (MTS), the theme and rheme of an EDTU, the micro-topic link, and the micro-topic chain. A discourse micro-topic scheme is formalized as a quadruple with a chain structure. Each node in the chain is an EDTU (a clause), and the theme or rheme inside an EDTU serves as an endpoint of a connection. The end-to-end connections formed by micro-topic links capture semantic relations between discourse units. Compared with both the OntoNotes corpus scheme and generalized topic structure theory, the proposed representation system is better suited to Chinese and offers theoretical advantages. It provides a theoretical foundation for constructing textual topic structures.

Second, building on this representation system, the thesis constructs a Chinese Discourse Topic Corpus (CDTC) based on micro-topic chains. The annotation method combines top-down and chain-backtracking strategies and mixes manual and automatic annotation, so that the annotation stays consistent with the cognitive habits of native Chinese speakers. CDTC currently contains 500 documents. The thesis presents a detailed statistical analysis of the corpus and illustrates its annotation. A consistency test shows that CDTC fully reflects the difficulty of Chinese discourse topic analysis and can serve as a corpus resource for related research.

Third, the thesis studies how discourse topic structure forms dynamically, and designs and implements an automatic analysis platform for Chinese discourse topic structure based on thematic progression theory. The platform performs EDTU identification, theme-rheme identification within the micro-topic scheme, micro-topic link identification, and discourse topic chain construction. Experimental results confirm the soundness of the representation system based on micro-topic chains and the usability of the annotated CDTC corpus.

Finally, the thesis evaluates the usefulness of the micro-topic structure of Chinese discourse by applying it to a core NLP task, coreference resolution. Using discourse micro-topic structures built from the proposed representation and the CDTC corpus, we combine heuristic filtering rules with machine learning methods, extract discourse topic structure features together with top-down and bottom-up shallow semantic features, and implement a prototype system for Chinese coreference resolution. Experiments on CDTC and on the Chinese data of the CoNLL-2012 Shared Task demonstrate the effectiveness of the proposed method.

In conclusion, research on Chinese discourse structure is still at an early stage, and this thesis is exploratory work. It offers a degree of innovation in the theory, resources, and computational analysis of Chinese discourse structure. We hope it will help other researchers in this area and promote the development of deep natural language understanding.
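The micro-topic chain described above (EDTU nodes whose theme or rheme serves as the endpoint of a link) can be pictured with a small data model. The Python sketch below is illustrative only. The abstract does not list the four components of the MTS quadruple, so the class names (`EDTU`, `MicroTopicLink`, `MicroTopicChain`), the rule that links only point forward, and the mapping of theme-to-theme and rheme-to-theme links onto the classic "constant theme" and "linear" thematic-progression patterns are all hypothetical. None of them are the thesis's own definitions.

```python
from dataclasses import dataclass, field
from typing import List, Literal, Tuple

# Which end of an EDTU a micro-topic link attaches to.
End = Literal["theme", "rheme"]


@dataclass
class EDTU:
    """An elemental discourse topic unit (a clause) split into theme and rheme."""
    index: int
    theme: str
    rheme: str


@dataclass
class MicroTopicLink:
    """Hypothetical link from one end of an earlier EDTU to one end of a later EDTU."""
    src: int
    src_end: End
    dst: int
    dst_end: End

    def pattern(self) -> str:
        # Hypothetical mapping onto classic thematic-progression patterns.
        if self.src_end == "theme" and self.dst_end == "theme":
            return "constant theme"  # the same theme is carried forward
        if self.src_end == "rheme" and self.dst_end == "theme":
            return "linear"  # an earlier rheme becomes the new theme
        return "other"


@dataclass
class MicroTopicChain:
    """A chain of EDTU nodes plus the links between their theme/rheme ends."""
    units: List[EDTU] = field(default_factory=list)
    links: List[MicroTopicLink] = field(default_factory=list)

    def add_unit(self, theme: str, rheme: str) -> int:
        """Append an EDTU to the chain and return its index."""
        idx = len(self.units)
        self.units.append(EDTU(idx, theme, rheme))
        return idx

    def link(self, src: int, src_end: End, dst: int, dst_end: End) -> None:
        """Add a link. In this sketch (an assumption) links must point forward."""
        if not 0 <= src < dst < len(self.units):
            raise ValueError("a link must go from an earlier EDTU to a later one")
        self.links.append(MicroTopicLink(src, src_end, dst, dst_end))

    def patterns(self) -> List[Tuple[int, int, str]]:
        """Return (source index, target index, progression pattern) for every link."""
        return [(lk.src, lk.dst, lk.pattern()) for lk in self.links]


# Toy three-clause example, with English glosses standing in for Chinese clauses.
chain = MicroTopicChain()
chain.add_unit("Xiao Wang", "bought a book")        # EDTU 0
chain.add_unit("The book", "is about linguistics")  # EDTU 1
chain.add_unit("He", "finished it overnight")       # EDTU 2
chain.link(0, "rheme", 1, "theme")  # "a book" -> "The book"
chain.link(0, "theme", 2, "theme")  # "Xiao Wang" -> "He"
print(chain.patterns())  # [(0, 1, 'linear'), (0, 2, 'constant theme')]
```

Keeping links as a separate list, rather than as pointers stored inside each EDTU, makes it easy to attach more than one link to the same theme or rheme. That matters when several later clauses pick up the same topic.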
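The abstract reports a consistency test on CDTC but does not name the metric used. A common way to quantify agreement between two annotators is Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e). Here p_o is the observed agreement and p_e is the agreement expected by chance from each annotator's label distribution. The sketch below assumes this metric purely for illustration. Whether the thesis used it is not stated, and the link-type labels and annotations are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(a)
    # Observed agreement: fraction of items given the same label by both annotators.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the two annotators' marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical link-type labels from two annotators on ten candidate links.
ann1 = ["constant", "constant", "linear", "linear", "constant",
        "none", "constant", "linear", "none", "constant"]
ann2 = ["constant", "constant", "linear", "constant", "constant",
        "none", "constant", "linear", "linear", "constant"]
print(round(cohen_kappa(ann1, ann2), 3))  # 0.661
```

In this toy case the annotators agree on 8 of 10 links (p_o = 0.8), but chance agreement is already 0.41. Kappa therefore lands well below the raw agreement rate, which is why chance-corrected measures are usually preferred for annotation consistency.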
Keywords/Search Tags: Chinese Discourse Topic Structure, Discourse Topic Structure Theory, Corpus Annotation, Computational Modeling, Theme-Rheme Theory, Thematic Progression