Research On The Method Of Chinese Macro Discourse Tree Auto Construction

Posted on: 2021-02-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhou
Full Text: PDF
GTID: 2428330605974888
Subject: Computer science and technology
Abstract/Summary:
In natural language processing, the object of research has gradually shifted from fine-grained semantic units such as characters and words to coarser-grained units such as sentence groups and whole discourses, so discourse analysis is becoming increasingly important. Discourse analysis aims to understand a text as a whole and to clarify its outline, and it can therefore support other NLP tasks such as sentiment analysis, question answering, and text summarization. It is divided into micro discourse analysis, which studies the internal structure of paragraphs, and macro discourse analysis, which studies the relationships between paragraphs and paragraph groups. Compared with micro discourse analysis, macro discourse analysis is still in its infancy. It comprises three sub-tasks: macro discourse structure analysis, macro discourse nuclearity identification, and macro discourse relationship classification. This thesis explores the three sub-tasks separately and then builds an automatic Chinese macro discourse tree constructor that generates a complete macro discourse tree from raw text. The main research contents cover the following four aspects:

(1) To address the underutilization of text semantics and the heavy dependence on manually extracted features, this thesis proposes a method for macro discourse tree construction based on multiple views and word-pair similarity. An LSTM model serves as the base model, and a word-pair similarity unit captures the relationship between a pair of discourse units. Topic information is then introduced, and the relationship between each discourse unit and the topic is captured to enhance the representation of the discourse unit. Experiments on the MCDTB corpus verify the method's effectiveness: performance improves by 4.68% over the baseline.

(2) This thesis conducts a preliminary study of the macro discourse relationship classification task and proposes a method based on macro discourse semantic representation. First, features from previous work on micro discourse analysis that transfer to the macro task are selected. Then, a new structural feature and a macro discourse representation method based on multiple word vectors are proposed. Compared with the baseline features, model performance improves by 4.08 and 5.97 percent on the MCDTB corpus and the macro RST corpus, respectively.

(3) To address the problem that semantic and structural information cannot be combined well, this thesis proposes a method that improves nuclearity identification by enhancing macro-structure information. The original problem is converted into a graph node classification problem. Taking a whole discourse tree as one sample strengthens the structural information of the entire tree, and in particular the structural relations implied in the semantic representation. A model behavior analysis is also performed to explain why structural and semantic information could not be combined well in previous research. Based on this analysis, a two-step training method is proposed, which retains the characteristics of weak structural features. Results on MCDTB show that the proposed method improves performance by 2.48 percent over the BERT baseline.

(4) Because the annotation process is not standardized and the visualization system for macro discourse trees is incomplete, it is difficult to conduct research on Chinese macro discourse tasks or to enlarge the corpus. To solve this problem, this thesis builds a macro text analysis platform that provides toolchain support for MCDTB, from corpus annotation for annotators to discourse tree visualization and result analysis for researchers. The platform should help future corpus expansion and further research in macro discourse analysis. The research results above are also combined to implement a Chinese macro discourse tree auto constructor on the platform, which provides a service that converts raw text into a macro discourse tree.

This thesis studies the three major issues in macro discourse analysis and proposes effective solutions that improve performance over existing research and provide a reference for future work.
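
As an illustration of the word-pair similarity idea in aspect (1), the following is a minimal sketch, not the thesis's implementation. The abstract does not state which similarity function is used or how the result enters the LSTM model, so cosine similarity, the toy two-dimensional word vectors, and the function names `cosine`, `word_pair_similarity`, and `max_pool_rows` are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def word_pair_similarity(unit_a, unit_b):
    """Return a matrix whose entry [i][j] compares word i of unit_a with word j of unit_b.

    Each unit is a list of word vectors; a real system would take these
    from pretrained embeddings rather than the toy values used below.
    """
    return [[cosine(wa, wb) for wb in unit_b] for wa in unit_a]


def max_pool_rows(matrix):
    """Hypothetical pooling step: best match in unit_b for each word of unit_a."""
    return [max(row) for row in matrix]


# Toy discourse units with two words each (illustrative vectors only).
unit_a = [[1.0, 0.0], [0.0, 1.0]]
unit_b = [[1.0, 0.0], [1.0, 1.0]]

sim = word_pair_similarity(unit_a, unit_b)
pooled = max_pool_rows(sim)
```

The pooled vector gives one fixed-length summary of how closely the two units match word by word; a downstream classifier could consume it alongside the sequence representation, though how the thesis combines the two is not specified in the abstract.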
Keywords/Search Tags: Macro Discourse Analysis, Discourse Tree Construction, Discourse Semantic Representation, Feature Combination