
Research Of Single Document Automatic Summarization Based On Discourse Structure Theory

Posted on: 2019-02-06  Degree: Master  Type: Thesis
Country: China  Candidate: K Liu  Full Text: PDF
GTID: 2428330545451219  Subject: Computer Science and Technology
Abstract/Summary:
Automatic extractive single-document summarization is one of the core tasks in automatic text summarization: it extracts the sentences that represent the core content of a document directly from the document to serve as its summary. Extractive methods often rely on surface features, such as statistical and positional information, and neglect deeper information such as discourse and semantic information, which results in summaries of uneven quality. As Chinese natural language processing moves to the discourse level, this thesis applies discourse structure information, namely the discourse rhetorical structure and the discourse topic structure, to extract, optimize, and evaluate summaries on a Chinese news corpus. The research covers the following three aspects:

1. Given the guiding role of discourse rhetorical structure in summarization, this thesis uses discourse rhetorical structure information: based on the primary-secondary relationship between elementary discourse units, it extracts the primary part of each unit and uses it as the summary of the document.

2. To improve the coherence of the summary, this thesis uses the discourse topic structure to build a discourse topic chain. With the help of theme-rheme theory, we develop corresponding extraction rules to optimize the summary extracted above, including extending sentences with their missing themes and removing redundant parts, to obtain a concise, coherent summary. We then apply a coherence evaluation method to the result.

3. To evaluate text coherence, this thesis uses an entity grid model and a neural network model, the latter with two network structures, LSTM and GRU, to evaluate the coherence of the summaries. Because few human-written summaries are available, we use results on a sentence ordering task to evaluate these two models. Experimental results show that GRU is not only effective in classification but also fast to converge.

The innovations of this thesis are: (1) research on extractive automatic summarization of Chinese articles using discourse rhetorical structure information; (2) use of discourse topic structure information to improve the coherence of extractive summaries; (3) when analyzing summary quality, the view that text coherence should be added to the evaluation criteria for summaries, with the coherence of summaries evaluated by a coherence model. Finally, the automatic summarization system can be used to extract summaries of Chinese texts that have been annotated with discourse rhetorical and discourse topic information.
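As an illustration of the first aspect only, the following minimal Python sketch shows one way the primary parts of a rhetorical-structure tree could be collected as a summary. It is not the thesis's implementation: the `Node` class, the `"N"`/`"S"` role labels (assumed here to correspond to the primary and secondary parts), and the toy tree are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A node in a toy rhetorical-structure tree (hypothetical format)."""
    role: str                     # "N" = primary part, "S" = secondary part
    text: Optional[str] = None    # set only on leaves (elementary discourse units)
    children: List["Node"] = field(default_factory=list)


def extract_primary(node: Node) -> List[str]:
    """Collect leaf units reachable from the root through primary parts only."""
    if node.role != "N":
        return []                 # a secondary part and everything under it is skipped
    if not node.children:
        return [node.text] if node.text else []
    units: List[str] = []
    for child in node.children:
        units.extend(extract_primary(child))
    return units


# Toy tree: the root is treated as primary by convention (an assumption).
tree = Node("N", children=[
    Node("N", children=[
        Node("N", text="The city opened a new subway line."),
        Node("S", text="The project had been delayed twice."),
    ]),
    Node("S", text="Officials attended the ceremony."),
])

summary = extract_primary(tree)
```

Here `summary` keeps only the unit that is primary at every level of the tree; a real system would work on trees produced by discourse annotation or parsing rather than a hand-built example.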
Keywords/Search Tags: discourse rhetorical structure, discourse topic structure, neural network, coherence, single-document automatic summarization