
Research On News Automatic Summarization Based On Improved Transformer

Posted on: 2024-03-08    Degree: Master    Type: Thesis
Country: China    Candidate: J J Peng    Full Text: PDF
GTID: 2568307061969609    Subject: Electronic information
Abstract/Summary:
With the help of big data analysis, intelligence analysts can extract potential intelligence from seemingly unrelated news events. However, comprehensively and promptly screening and analyzing the many kinds of news requires substantial time and effort, so obtaining a clear, understandable summary that conveys the main theme of a news article in a short time, as an aid to reading, has become an urgent problem.

Transformer models effectively solve the sequence-parallelism problem in training automatic summarization models, greatly improving the speed of summary generation. However, they are poor at encoding topic information and fail to capture the main theme of the news, and during generation the model repeatedly attends to the same words, producing duplicated content. In addition, because the Transformer limits the length of its input sequence, longer news articles are usually truncated, which can lose source information and harm the comprehensiveness of the generated summary. To solve these problems, this study carries out the following work:

(1) To improve the Transformer's ability to encode topic information, this thesis adds topic information to the conventional attention mechanism. First, the LDA topic model is used to obtain topic-word distributions and construct a topic similarity matrix. The attention weight matrix is then adjusted with this matrix so that the model learns the topic relevance between words in the news and generates summaries with clear topic information. To reduce duplicated output, coverage vectors are initialized during decoding to adjust the attention scores, and a coverage loss is added to the loss function to lower the chance of attending to already-covered positions. Experiments verify that the proposed model performs well on Chinese news summarization datasets of different lengths and effectively improves the topic consistency and readability of the generated summaries: compared with the conventional Transformer, its scores rise by 2.34, 2.95, and 3.69 on the long-news datasets and by 3.12, 3.86, and 6.27 on the short-news datasets.
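A minimal NumPy sketch of the ideas in contribution (1), assuming an additive bias on the attention logits and a See-et-al.-style coverage term; the bias weight `lam`, the gensim-style `get_topics()` matrix, and all function names are illustrative assumptions, not the thesis's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def topic_similarity_matrix(topic_word_dist, token_ids):
    """Cosine similarity between the LDA topic vectors of each token pair.

    topic_word_dist: (num_topics, vocab_size) per-topic word distribution,
                     e.g. the output of gensim LdaModel.get_topics().
    token_ids:       (seq_len,) vocabulary ids of the input tokens.
    """
    vecs = topic_word_dist[:, token_ids].T            # (seq_len, num_topics)
    vecs = vecs / (np.linalg.norm(vecs, axis=1, keepdims=True) + 1e-9)
    return vecs @ vecs.T                              # (seq_len, seq_len)

def topic_biased_attention(q, k, v, topic_sim, lam=0.1):
    """Scaled dot-product attention with an additive topic-relevance bias."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                     # standard attention logits
    scores = scores + lam * topic_sim                 # inject topic relevance (assumed additive)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

def coverage_step(attn_t, coverage):
    """One decoding step of a coverage mechanism: penalize attending
    again to source positions that are already covered."""
    cov_loss = np.minimum(attn_t, coverage).sum()     # coverage loss term
    coverage = coverage + attn_t                      # accumulate past attention
    return coverage, cov_loss

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq, d, topics, vocab = 6, 8, 4, 100
    ids = rng.integers(0, vocab, size=seq)
    lda = rng.random((topics, vocab))                 # stand-in for a trained LDA model
    sim = topic_similarity_matrix(lda, ids)
    q = k = v = rng.standard_normal((seq, d))
    out, attn = topic_biased_attention(q, k, v, sim)
    cov, loss = coverage_step(attn[0], np.zeros(seq))
```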
(2) To address the loss of source information when the Transformer summarizes long news, this thesis proposes a two-stage extractive-generative summarization model. First, a key-sentence extraction algorithm based on TextRank is proposed. Exploiting the particular structure of news articles, more comprehensive text features are gathered at the sentence and word levels, including sentence and paragraph position, sentence-title similarity, key sentences, sentence length, cue and transition words, and key and proper nouns. These multi-dimensional text features are used to modify the probability transition matrix of the TextRank algorithm and obtain more accurate sentence weights. The MMR algorithm then updates the sentence weights, beam search produces a set of candidate summaries, and the candidate with the highest coherence is selected by its MMR score. Finally, the candidate summary is fed into a generative summarization model to produce the final summary. Experimental results on the NLPCC 2017 long-news summarization dataset show that the improved TextRank algorithm yields candidate summaries with lower redundancy and higher coverage than mainstream extractive algorithms. For long news, introducing an extractive summarization stage effectively retains the key information of the original text and improves the ROUGE and semantic-similarity scores of the generated summaries.
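The extractive stage might look like the following NumPy sketch: TextRank power iteration over a sentence-similarity matrix whose transitions are rescaled by per-sentence feature weights, followed by MMR selection to suppress redundancy. The damping factor, the trade-off `lam`, and the greedy MMR variant shown here are standard defaults assumed for illustration; the thesis's actual feature set and its beam-search candidate generation are not reproduced.

```python
import numpy as np

def textrank(sim, feature_weights, d=0.85, iters=50):
    """TextRank over a sentence-similarity matrix, with the probability
    transition matrix rescaled by per-sentence feature weights (position,
    title similarity, length, cue words, ...) and then row-normalized."""
    trans = sim * feature_weights[np.newaxis, :]      # bias jumps toward feature-rich sentences
    trans = trans / (trans.sum(axis=1, keepdims=True) + 1e-9)
    n = len(sim)
    scores = np.ones(n) / n
    for _ in range(iters):                            # PageRank-style power iteration
        scores = (1 - d) / n + d * trans.T @ scores
    return scores

def mmr_select(scores, sim, k=3, lam=0.7):
    """Maximal Marginal Relevance: greedily pick sentences that score high
    under TextRank but overlap little with sentences already picked."""
    selected, candidates = [], list(range(len(scores)))
    while candidates and len(selected) < k:
        def mmr(i):
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return lam * scores[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return sorted(selected)                           # keep original sentence order

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5
    sim = rng.random((n, n)); np.fill_diagonal(sim, 0.0)
    feats = rng.random(n) + 0.5                       # stand-in for combined feature weights
    picked = mmr_select(textrank(sim, feats), sim, k=2)
```

Greedy MMR is shown for simplicity; the thesis instead uses beam search over the MMR-updated weights to build a candidate summary set and then selects the most coherent candidate by its MMR score.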
Keywords/Search Tags: News summarization, Transformer, TextRank, LDA topic model