Discourse Rhetorical Structure Analysis With The Integration Of Multi-Level Knowledge

Posted on: 2023-10-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Y Zhang
GTID: 1528307370467834
Subject: Computer Science and Technology
Abstract/Summary:
Rhetorical Structure Theory (RST) regards an article as composed of elementary discourse units (EDUs), which are organized and connected through specific rhetorical relations to form a discourse tree. Discourse rhetorical structure plays an important role in natural language understanding because it describes the positional distribution of the EDUs within an article and the rhetorical relations between adjacent discourse units. In recent years, discourse rhetorical structure information has been successfully applied to various natural language processing (NLP) tasks, such as text summarization, machine translation evaluation, and sentiment analysis. Based on RST, discourse parsing aims to automatically determine the structure, nuclearity, and rhetorical relations among the EDUs of an article.

Discourse parsing currently faces two main challenges. First, news articles usually exhibit long-distance dependencies in content, and the organization of the whole article is relatively complex, so the task is inherently difficult to model. Second, RST data resources are scarce, and building new RST data requires skilled annotators and considerable time and money, which makes it hard to carry out. Limited by these problems, RST discourse parsing has progressed slowly, and existing discourse parsers lack the accuracy and generalization needed to effectively support downstream NLP applications. The main research contents of this dissertation are summarized as follows.

RST parsing incorporating global context information. News articles tend to have long-distance dependencies in content and deep hierarchies in structure. Given this, the dissertation first constructs a transition-based bottom-up parsing system and demonstrates its serious error propagation problem and its excessive dependence on local context information. It then proposes to treat discourse parsing as a top-down text segmentation process, which makes better use of global context information and so adapts to the long-distance dependencies of discourse. In addition, to address the deep hierarchies of discourse structure, the dissertation analyzes the advantages of the bottom-up and top-down methods and explores combining the bottom-up micro-perspective with the top-down macro-perspective to construct a discourse parser with a bidirectional representation. By flexibly using both global and local context information, this method alleviates the modeling difficulty caused by the deep hierarchical structure of the RST tree.

RST parsing incorporating prior knowledge of discourse structure. The scarcity of discourse structure resources is a prominent problem in RST analysis and seriously restricts data-driven discourse parsing methods. The keys to solving it are mining more usable information from limited resources and exploiting large-scale unlabeled discourse data. On this basis, the study poses two more specific scientific questions and addresses them in a problem-oriented manner.
(1) How to construct a "machine brain" with knowledge of rhetorical structure that can both create rhetorical structure data and judge the quality of a rhetorical structure. Specifically, the dissertation proposes to represent discourse trees graphically and to use adversarial learning to achieve globally optimized decision-making in RST parsing. This research aims to enable machines to capture higher-quality rhetorical knowledge and to evaluate discourse trees automatically.
(2) How to construct a virtuous circle in which, on the one hand, the machine brain is used to generate high-quality external RST data and, on the other hand, the rhetorical structure knowledge memorized in the machine brain is continuously refined. Specifically, the dissertation proposes using the high-quality prior rhetorical structure knowledge obtained by pre-training the above RST parser to automatically label and evaluate unlabeled text data. The parsing model is then trained further on this large-scale data to capture better rhetorical structure knowledge for a new round of data labeling, thus forming a virtuous circle and effectively alleviating the problem of RST data scarcity.

RST parsing incorporating nuclearity and topic knowledge. The rhetorical knowledge contained in the field's small-scale labeled data is seriously insufficient, and exploring the correlation between RST knowledge and other discourse knowledge is the key to improving RST parsing performance in the long term. The dissertation therefore aims to move RST parsing from the study of limited rhetorical structure knowledge and data toward the perception of more general discourse knowledge. Considering the close connection between discourse topic structure, which expresses the cohesion of a text, and rhetorical structure, which expresses discourse coherence, the dissertation studies RST discourse parsing from the perspective of topic knowledge. On the one hand, it analyzes the topical manifestation of rhetorical structure from a theoretical point of view and establishes the implicit boundary of the "nuclearity topic" within the rhetorical structure, in order to explore how this boundary knowledge helps in understanding rhetorical structure. On the other hand, it manually annotates the topic chain structure of the news articles in RST-DT. Topic boundaries are extracted from the annotated structure and used to explore how topic knowledge helps in understanding rhetorical structure. By analyzing the correlation between these two kinds of discourse knowledge, and drawing on specific experimental results, the dissertation examines the RST parsing results obtained with nuclearity boundary knowledge and discourse topic knowledge from the perspective of interpretability.

Overall, the dissertation investigates document-level RST analysis from three perspectives: discourse dependence, discourse sparsity, and discourse topicality. The study of discourse dependence examines the distribution of discourse structures. The study of discourse sparsity infers external discourse structure data based on the learned RST data distribution. The study of discourse topicality explores topic phenomena within the RST structure and seeks to utilize external topic knowledge for RST parsing. These perspectives leverage different levels of discourse knowledge: discourse dependence uses artificially induced knowledge of the rhetorical structure distribution, discourse sparsity uses implicit rhetorical structure knowledge, and discourse topicality uses explicit discourse topic knowledge.
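
The top-down view of parsing mentioned in the abstract, in which a discourse tree is obtained by repeatedly segmenting a span of EDUs, can be illustrated with a minimal sketch. This is not the dissertation's model: the names `Node`, `parse_top_down`, `toy_score`, and `bracket` are hypothetical, and the scoring function is a toy stand-in (it simply prefers balanced splits) for whatever learned split scorer a real parser would use. Nuclearity and relation labels are omitted.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    """A span of EDUs [start, end] (inclusive) in a binary discourse tree."""
    start: int
    end: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def parse_top_down(start: int, end: int,
                   score_split: Callable[[int, int, int], float]) -> Node:
    """Recursively split the span start..end at the highest-scoring point.

    A split at k yields a left child start..k and a right child k+1..end.
    """
    node = Node(start, end)
    if start == end:  # a single EDU is a leaf
        return node
    best = max(range(start, end), key=lambda k: score_split(start, k, end))
    node.left = parse_top_down(start, best, score_split)
    node.right = parse_top_down(best + 1, end, score_split)
    return node


def toy_score(start: int, k: int, end: int) -> float:
    """Hypothetical scorer (assumption): prefer splits with equal-sized halves."""
    left_size = k - start + 1
    right_size = end - k
    return -abs(left_size - right_size)


def bracket(node: Node) -> str:
    """Render the tree as a bracketed string over EDU indices."""
    if node.left is None or node.right is None:
        return f"e{node.start}"
    return f"({bracket(node.left)} {bracket(node.right)})"


tree = parse_top_down(0, 3, toy_score)
print(bracket(tree))  # ((e0 e1) (e2 e3))
```

Because each split is chosen while looking at the whole span, the scorer can in principle draw on global context, which is the property the abstract contrasts with the local decisions of a transition-based bottom-up parser.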
Keywords/Search Tags: Rhetorical Structure Parsing, Discourse Analysis, Discourse Knowledge Mining, Natural Language Processing