
Research And Application Of Automatic Text Summarization Technology Based On Deep Learning

Posted on: 2022-11-27    Degree: Master    Type: Thesis
Country: China    Candidate: X L Wang    Full Text: PDF
GTID: 2518306761959739    Subject: Automation Technology
Abstract/Summary:
With the rapid development of the mobile Internet, self-media apps have proliferated, greatly enriching people's lives but also accumulating massive amounts of text information. Automatic text summarization can extract key information from complex texts, filter out irrelevant content, and improve people's work efficiency. At present, the sequence-to-sequence (Seq2Seq) model has become one of the mainstream research directions in automatic text summarization. A Seq2Seq model consists of an encoder and a decoder; it handles input and output data flexibly and is widely used in many fields. Although Seq2Seq models have achieved breakthroughs in text summarization, several issues still deserve attention and research, such as incomplete keyword representation, repeated words or sentences, unknown (out-of-vocabulary) words, and exposure bias. To address these problems, this thesis proposes a Seq2Seq model based on a multi-head attention mechanism and keyword replacement, and a Seq2Seq generation method based on an improved attention mechanism and reinforcement learning. The main work is divided into the following two parts:

(1) A Seq2Seq model based on the multi-head attention mechanism and keyword replacement. Multi-head attention: the model connects the encoder and the decoder through multi-head attention, learns the feature information of the text in different representation subspaces, and alleviates the long-distance dependence problem. Keyword replacement: a word-vector-based TextRank algorithm extracts keywords from the source text to form a keyword set, which is then used to replace the corresponding keywords in the summary generated by the model, yielding the final abstract and addressing incomplete keyword representation. Experiments on the Wanfang scientific research dataset verify the effectiveness of the model.

(2) A Seq2Seq model based on an improved attention mechanism and reinforcement learning. Intra-attention: the encoder penalizes input positions that have received high attention in the past, and the decoder penalizes previously predicted words, so that the same word is not generated again at the current step; this addresses repeated word generation. Pointer-generator mechanism: a pointer generator computes a generation probability and, according to it, decides whether to copy an existing word from the input text or to generate a word from the vocabulary; this addresses unknown words. Reinforcement learning: the summarization model is treated as an agent, the generated summary as an action, and the ROUGE score of the generated summary against the reference summary as a reward; the self-critical policy gradient method repeatedly trains the summarization model, which alleviates the exposure bias problem. Experiments on the LCSTS dataset verify the effectiveness of the model.
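The keyword-replacement step in part (1) relies on TextRank ranking over the source text. A minimal word-level sketch, assuming a plain co-occurrence window as the edge criterion (the thesis instead weights edges by word-vector similarity, which is not reproduced here):

```python
from collections import defaultdict

def textrank_keywords(tokens, window=2, d=0.85, iters=50, top_k=3):
    """Rank words by TextRank over a co-occurrence graph.

    Simplified sketch: edges come from a sliding co-occurrence
    window; the thesis builds edge weights from word-vector
    similarity instead.
    """
    # Build an undirected co-occurrence graph.
    neighbors = defaultdict(set)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != w:
                neighbors[w].add(tokens[j])
                neighbors[tokens[j]].add(w)
    # Power iteration of the PageRank-style update with damping d.
    score = {w: 1.0 for w in neighbors}
    for _ in range(iters):
        score = {
            w: (1 - d) + d * sum(score[u] / len(neighbors[u]) for u in neighbors[w])
            for w in neighbors
        }
    return [w for w, _ in sorted(score.items(), key=lambda kv: -kv[1])[:top_k]]
```

The extracted `top_k` words would form the keyword set used to overwrite the corresponding words in the model's draft summary.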
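The pointer-generator step in part (2) mixes vocabulary generation with copying from the source. A tensor-free illustrative sketch following the common pointer-generator formulation (the names `p_gen`, `p_vocab`, and the extended-vocabulary id scheme are assumptions, not code from the thesis):

```python
def final_distribution(p_gen, p_vocab, attention, src_ids, vocab_size):
    """Mix vocabulary generation with copying from the source.

    p_gen     : probability of generating from the vocabulary (scalar in [0, 1])
    p_vocab   : generation distribution over the fixed vocabulary
    attention : attention weights over source positions (the copy distribution)
    src_ids   : extended-vocabulary id of each source token (OOV ids >= vocab_size)
    """
    # Extend the distribution to cover OOV ids that appear in the source.
    n_extra = max([i - vocab_size + 1 for i in src_ids if i >= vocab_size] or [0])
    p_final = [p_gen * p for p in p_vocab] + [0.0] * n_extra
    # Scatter-add the copy probability onto each source token's id,
    # including ids beyond the fixed vocabulary (unknown words).
    for a, idx in zip(attention, src_ids):
        p_final[idx] += (1 - p_gen) * a
    return p_final
```

Because out-of-vocabulary source tokens receive ids beyond `vocab_size`, the decoder can still emit them by copying, which is how the mechanism handles unknown words.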
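The self-critical training signal in part (2) is the gap between the sampled summary's ROUGE reward and the greedy baseline's reward. A tensor-free sketch using ROUGE-1 F1 as the reward (which exact ROUGE variant the thesis uses is an assumption here):

```python
from collections import Counter

def rouge1_f(candidate, reference):
    """Unigram-overlap ROUGE-1 F1 between two token lists."""
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    p = overlap / len(candidate)
    r = overlap / len(reference)
    return 2 * p * r / (p + r)

def scst_weight(sampled, greedy, reference):
    """Self-critical advantage: reward of the sampled summary minus
    the reward of the greedy (baseline) summary.  The training loss
    is -(r_sample - r_greedy) * log p(sampled), so samples that beat
    the greedy baseline are reinforced and worse ones are suppressed.
    """
    return rouge1_f(sampled, reference) - rouge1_f(greedy, reference)
```

Because the reward is computed on the full decoded sequence rather than per-token teacher forcing, training with this signal exposes the model to its own outputs, which is what mitigates exposure bias.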
Keywords/Search Tags: Deep Learning, Text Summarization, Seq2Seq Model, Attention Mechanism, Reinforcement Learning