
Research On Chinese Abstractive Summarization Via Fusing Topic Information

Posted on: 2021-04-17
Degree: Master
Type: Thesis
Country: China
Candidate: X P Jiang
Full Text: PDF
GTID: 2428330605461326
Subject: Computer application technology

Abstract/Summary:
The Internet contains billions of online documents and grows exponentially every day, making it difficult for people to identify valuable information quickly and accurately. It is therefore necessary to provide tools that help people access and digest information in time and alleviate information overload. Automatic summarization condenses one or more long texts into a short text containing the most important information, which can effectively reduce reading burden. In recent years, with the rapid development of deep learning algorithms, big data, and hardware computing power, attention-based sequence-to-sequence models have achieved success on the task of abstractive summarization and have been widely studied in academia. However, existing summarization models still suffer from shortcomings such as inaccurate and insufficient content in the generated summaries. To alleviate this problem, this paper treats keywords as the topic information of the original text and integrates them into the pointer-generator network, so as to improve the quality of the summaries the model generates. The main research work of this paper is as follows.

First, the attention mechanism is usually used to obtain alignment information between the target word and the original text, but it has difficulty identifying the topic content of the original text, which limits the context vector's ability to capture the document's topic information. When writing a summary, people often refer to the topic information in the document, which is usually expressed in the form of keywords or central sentences. In view of this, this paper extracts keywords to mine the topic information of the document and integrates them explicitly into the attention mechanism, so that the model can generate a topic-oriented summary in a context-aware way under the guidance of global topic information. Specifically, the TextRank algorithm is first used to extract the keywords of the original text. Then the sum of all keyword embeddings is taken as the global topic representation. Finally, the global topic representation is used to guide the attention mechanism to attend to topic information. This method took third place in the NLPCC 2018 Chinese single-document summarization evaluation competition, which demonstrates its feasibility and effectiveness.

Secondly, the topic of a document is usually composed of multiple sub-topics, and each sub-topic differs in location and probability, so distinguishing among sub-topics is necessary to improve the quality of the generated summary. This paper introduces a topic keyword attention mechanism that computes a probability distribution over the keywords and then integrates the weighted topic keyword context vector into the attention mechanism. Compared with the global topic representation, the topic keyword attention mechanism captures the local topic information at each decoding time step. Experimental results on the NLPCC 2018 Chinese single-document summarization evaluation dataset show the effectiveness of this method.
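The two mechanisms described above — a global topic vector built by summing keyword embeddings, and a per-step topic keyword attention — can be sketched in toy form as follows. This is a minimal numpy illustration, not the thesis's implementation: the matrices, dimensions, and the additive scoring form `v·tanh(W_h h_i + W_s s_t + W_t z)` are assumptions standing in for the pointer-generator details, and TextRank extraction itself is omitted (`keyword_embs` plays the role of the extracted keywords' embeddings).

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# --- Global topic representation: sum of keyword embeddings ---
d = 8                                    # hidden/embedding size (toy value)
keyword_embs = rng.normal(size=(3, d))   # embeddings of 3 extracted keywords
global_topic = keyword_embs.sum(axis=0)  # z = sum of keyword embeddings

# --- Topic-guided attention over encoder states ---
# score_i = v . tanh(W_h h_i + W_s s_t + W_t z); the extra W_t z term is how
# the global topic vector steers alignment toward topical source content.
def topic_guided_attention(enc_states, dec_state, topic_vec, W_h, W_s, W_t, v):
    scores = np.array([v @ np.tanh(W_h @ h + W_s @ dec_state + W_t @ topic_vec)
                       for h in enc_states])
    attn = softmax(scores)
    context = attn @ enc_states          # attention-weighted sum of states
    return attn, context

# --- Topic keyword attention: per-step distribution over keywords ---
# yields a *local* topic context vector at each decoding step.
def keyword_attention(keyword_embs, dec_state, W_k, W_q, u):
    scores = np.array([u @ np.tanh(W_k @ k + W_q @ dec_state)
                       for k in keyword_embs])
    attn = softmax(scores)
    return attn, attn @ keyword_embs

enc_states = rng.normal(size=(5, d))     # 5 source-token encoder states
dec_state = rng.normal(size=d)           # decoder state at one time step
W = lambda: rng.normal(size=(d, d)) * 0.1
attn, ctx = topic_guided_attention(enc_states, dec_state, global_topic,
                                   W(), W(), W(), rng.normal(size=d))
kw_attn, kw_ctx = keyword_attention(keyword_embs, dec_state,
                                    W(), W(), rng.normal(size=d))
```

Both attention distributions are proper probability vectors, and the keyword context vector `kw_ctx` is what would be fed back into the main attention computation at each step.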
Keywords/Search Tags:Automatic summarization, Topic information, Sequence to sequence model, Attention mechanism