
Research On The Method Of Chinese Macro Discourse Resources Construction And Structure Analysis

Posted on: 2020-05-21
Degree: Master
Type: Thesis
Country: China
Candidate: F Jiang
GTID: 2428330578479392
Subject: Computer Science and Technology
Abstract/Summary:
In the field of Natural Language Processing (NLP), discourse analysis is becoming increasingly important as the research focus shifts from words and sentences to higher semantic levels such as sentence groups, paragraphs, and chapters. Discourse analysis is fundamental to understanding the overall semantics of a text and has been widely applied to deep NLP applications such as sentiment analysis, question answering, and summarization. Compared with the success of discourse analysis at the micro level, macro discourse analysis still faces challenges. Macro discourse analysis comprises three subtasks: discourse structure analysis, discourse nuclearity identification, and discourse relation classification. Under the guidance of the Chinese macro discourse structure representation framework, this thesis focuses on methods for constructing Chinese macro discourse resources and analyzing macro discourse structure. The main research contents cover the following three aspects:
(1) To address the lack of a macro discourse corpus in Chinese, this thesis constructs a Chinese macro discourse treebank (MCDTB). First, under the guidance of the Chinese macro discourse structure representation framework, it annotates macro discourse structure trees as well as higher-level macro discourse information, such as the topic sentences of paragraphs and the abstracts of documents. Second, after defining a detailed annotation process and criteria, it develops annotation tools and proposes quality assurance strategies to ensure the speed and quality of annotation. Finally, 720 Chinese news articles are annotated, with an agreement rate greater than 80% and Kappa values greater than 0.6. In addition, a preliminary experiment on macro-level discourse nuclearity identification is conducted on MCDTB to verify the usability of the corpus.
(2) To address the over-fitting caused by having too few samples at high levels, this thesis proposes a label degradation combination model to recognize macro discourse structure. First, it combines structure features with semantic and macro information features to form a combination model. Then, it performs discourse nuclearity identification jointly with discourse structure recognition and uses the label degradation method to degrade the predicted labels of nuclearity identification into the predicted labels of structure recognition, which captures more detailed feature representations. The experimental results show that, compared with the benchmark system, the label degradation combination model significantly improves performance on MCDTB.
(3) To address the information imbalance and weak coherence that arise when constructing discourse trees for long documents, this thesis proposes a reverse reading method for constructing macro discourse trees. First, inspired by the way Bidirectional Long Short-Term Memory (Bi-LSTM) processes text streams, it proposes global reverse reading and local reverse reading methods for macro-level discourse structure tree construction to alleviate information imbalance and weak coherence. Then, in constructing a macro-level discourse tree with a transition-based method, it builds a unified neural model to decide the next action. Finally, the experimental results on the annotated corpus MCDTB and the English corpus RST-DT show the validity of the proposed model.
This thesis constructs a Chinese macro discourse treebank (MCDTB) and proposes effective methods for discourse structure recognition and discourse tree construction, laying a foundation for further research on macro discourse analysis.
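The label degradation step in aspect (2) can be pictured as collapsing fine-grained nuclearity predictions into coarse structure predictions. The sketch below is only an illustration under stated assumptions: the label names (`NS`, `SN`, `NN`, `NONE`, `merge`, `no_merge`), the mapping table, and the choice to sum scores are hypothetical and are not taken from the thesis.

```python
# Hypothetical label inventory (an assumption, not the thesis's actual label set):
# nuclearity labels for a pair of adjacent discourse units, plus "NONE" for a pair
# that does not form a subtree. Each degrades to a coarse structure label.
NUCLEARITY_TO_STRUCTURE = {
    "NS": "merge",      # nucleus followed by satellite
    "SN": "merge",      # satellite followed by nucleus
    "NN": "merge",      # two nuclei
    "NONE": "no_merge", # units are not joined
}


def degrade_label(nuclearity_label: str) -> str:
    """Map one fine-grained nuclearity label to its coarse structure label."""
    return NUCLEARITY_TO_STRUCTURE[nuclearity_label]


def degrade_scores(nuclearity_scores: dict[str, float]) -> dict[str, float]:
    """Collapse a score per nuclearity label into a score per structure label.

    Scores of all fine-grained labels that share a coarse label are summed
    (summing is an illustrative assumption).
    """
    structure_scores: dict[str, float] = {}
    for label, score in nuclearity_scores.items():
        coarse = degrade_label(label)
        structure_scores[coarse] = structure_scores.get(coarse, 0.0) + score
    return structure_scores


if __name__ == "__main__":
    scores = {"NS": 0.5, "SN": 0.25, "NN": 0.125, "NONE": 0.125}
    print(degrade_scores(scores))
```

The point of such a design is that a model trained on the finer label set sees more distinctions during training, while the final decision is still made on the coarser structure labels.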
Keywords/Search Tags:Macro Discourse Analysis, Discourse Structure, Label Degradation, Reverse Reading