
Research On Topic-controlled Multi-document Summarization Generation

Posted on: 2022-07-06    Degree: Master    Type: Thesis
Country: China    Candidate: S B He    Full Text: PDF
GTID: 2518306572497804    Subject: Computer technology
Abstract/Summary:
With the rapid development of Internet technology, the explosive growth of information has led to the birth of automatic text summarization, which helps people analyze and understand textual information. In reality, however, there is usually more than one document describing the same event, and these documents are usually written by different authors at different times, expressing different topics. Traditional automatic multi-document summarization aims to summarize all the information in the source documents; the lack of topic information makes the generated summaries inaccurate and still far from actual needs. In addition, traditional multi-document summarization does not accurately capture inter-document relationships, which easily leads to contradictory and redundant information in the generated summaries.

In this paper, we propose a novel topic-controlled automatic multi-document summarization model that enhances the understanding of semantic information and inter-document relationships over traditional multi-document summarization. Our model incorporates user-specified topic information into the summary generation process, so that readers can choose the topic of the summary themselves. The generated summaries are topic-salient and better aligned with human language habits and preferences.

To acquire paragraph and summary topics, we use the ProdLDA topic model, which is based on autoencoding variational inference. ProdLDA has strong generalization ability and adapts well to new data.

For topic-controlled summary generation, the proposed model is based on an encoder-decoder architecture. The encoder uses a self-attention mechanism and an explicit graph attention mechanism to obtain word-level and paragraph-level representations of multiple documents, respectively. In the decoder, a masked self-attention mechanism allows each position of the decoder input sequence to learn from previous positions, and a hierarchical topic attention uses the topic to influence paragraph-level learning and then uses the paragraphs to influence word-level learning. In the output module of the decoder, the model integrates topic, paragraph, and word information to generate topic-controlled multi-document summaries by considering all three levels at the same time.

Finally, we compare the proposed model with other models on the Multi-News dataset to demonstrate its superiority on the multi-document summarization task. We also build a test platform to verify the topic controllability of our model.
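The decoder self-attention described above, in which each position only learns from earlier positions, can be illustrated with a minimal causal (masked) self-attention sketch. This is a simplified illustration in plain Python, not the thesis implementation: it uses a single head with no learned query/key/value projections (Q = K = V = the input), which the full model would include.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def masked_self_attention(X):
    """Causal self-attention: position i attends only to positions <= i.

    X is a list of vectors (one per sequence position). For brevity,
    queries, keys, and values are all X itself; a real model would
    apply learned projections and multiple heads.
    """
    d = len(X[0])
    out = []
    for i, q in enumerate(X):
        # Scaled dot-product scores over the causal prefix X[0..i] only.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in X[: i + 1]]
        w = softmax(scores)
        # Output is the attention-weighted sum of the visible positions.
        row = [sum(wj * X[j][t] for j, wj in enumerate(w)) for t in range(d)]
        out.append(row)
    return out
```

Because the first position can only attend to itself, its output is always its own input vector, which makes the masking easy to verify.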
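The hierarchical topic attention (topic influences paragraph-level weights, paragraphs in turn influence word-level weights) can be sketched as a two-stage attention cascade. The following is an assumed, minimal reading of that mechanism using dot-product scoring, not the thesis's exact formulation: paragraph weights come from topic-paragraph affinity, word weights from paragraph-word affinity, and the product of the two gives each word's final contribution to the context vector.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def hierarchical_topic_attention(topic, paragraphs, words_per_paragraph):
    """Two-level attention: topic -> paragraphs, then paragraph -> words.

    topic: topic embedding (list of floats)
    paragraphs: one embedding per paragraph
    words_per_paragraph: list of lists of word embeddings, grouped by paragraph
    Returns (context_vector, per-paragraph word weights).
    """
    # Stage 1: weight paragraphs by their affinity to the topic.
    para_weights = softmax([dot(topic, p) for p in paragraphs])

    dim = len(topic)
    context = [0.0] * dim
    word_weights = []
    for a_j, p_j, words in zip(para_weights, paragraphs, words_per_paragraph):
        # Stage 2: word weights within a paragraph, scaled by its weight,
        # so all word weights across the document still sum to 1.
        w_weights = [a_j * b for b in softmax([dot(p_j, w) for w in words])]
        word_weights.append(w_weights)
        for wt, w in zip(w_weights, words):
            for t in range(dim):
                context[t] += wt * w[t]
    return context, word_weights
```

A useful sanity check of the cascade is that the word-level weights across all paragraphs form a single distribution summing to one, since each paragraph's word distribution is scaled by that paragraph's topic weight.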
Keywords/Search Tags: Multi-document summarization, Topic model, Seq2Seq model, Self-attention, Hierarchical topic attention