
Research Of Summarization Method Based On Abstractive Type

Posted on: 2023-06-30  Degree: Master  Type: Thesis
Country: China  Candidate: Z Wang  Full Text: PDF
GTID: 2568306914964849  Subject: Computer Science and Technology
Abstract/Summary:
With the rapid growth of text on the Internet, information overload has become an urgent problem. Automatic text summarization is an effective way to address it: it maps the original text to a shorter text while retaining the important content. Summarization methods are mainly divided into extractive and abstractive types. Abstractive methods use a language model to generate summaries in a human writing style, based on an understanding of the semantics of the original text; they generalize well and produce fluent text. Abstractive summarization has become an established research topic in natural language processing. This thesis identifies four shortcomings of current abstractive methods based on the Seq2Seq structure: first, the traditional attention mechanism leaves the encoder's representation ability insufficient; second, out-of-vocabulary words; third, repeated text generation; and fourth, exposure bias. For the first problem, this thesis proposes two abstractive methods that improve the existing attention mechanism, one from the perspective of explicit hierarchical features and the other from that of implicit local semantic features. The former, based on hierarchical hybrid attention, introduces a sentence-level attention mechanism to adjust the word-level attention distribution. The latter, based on local semantic-aware attention, uses local semantic unit features to guide the attention distribution produced by the encoder. For the second problem, this thesis introduces a pointer mechanism that gives the model the ability to copy information from the original text. For the third problem, this thesis introduces a coverage mechanism that keeps the model from focusing on the same region continuously. For the fourth problem, this thesis uses reinforcement learning: an optimization objective based on a ROUGE reward is added to the loss function and trained jointly with it. This also optimizes the model directly on the evaluation metric, which alleviates the inconsistency between the training and testing evaluation standards. Experiments on the LCSTS and CNN/Daily Mail datasets verify the effectiveness of the proposed models, and a case analysis of the generated summaries further verifies their performance.
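The abstract names a pointer mechanism and a coverage mechanism but does not give their equations. As a rough illustration only, the sketch below assumes the commonly used pointer-generator formulation: the final word distribution mixes a vocabulary distribution with the attention weights over source tokens, and the coverage penalty sums the overlap between the current attention and the attention accumulated so far. The function names, the `p_gen` mixing weight, and the toy inputs are all hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Tuple


def final_distribution(
    p_gen: float,
    vocab_dist: Dict[str, float],
    attention: List[float],
    source_tokens: List[str],
) -> Dict[str, float]:
    """Mix generation and copying (assumed pointer-generator form).

    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention weights
    on the source positions whose token equals w.
    """
    mixed = {word: p_gen * prob for word, prob in vocab_dist.items()}
    for token, weight in zip(source_tokens, attention):
        # Out-of-vocabulary source tokens receive probability only via copying.
        mixed[token] = mixed.get(token, 0.0) + (1.0 - p_gen) * weight
    return mixed


def coverage_step(
    coverage: List[float], attention: List[float]
) -> Tuple[float, List[float]]:
    """Return (penalty, updated coverage) for one decoding step.

    penalty = sum_i min(attention_i, coverage_i), which grows when the
    decoder attends again to positions it has already attended to.
    """
    penalty = sum(min(a, c) for a, c in zip(attention, coverage))
    updated = [c + a for c, a in zip(coverage, attention)]
    return penalty, updated


# Toy example: "zhang" and "paris" are not in the (hypothetical) vocabulary.
source = ["zhang", "visits", "paris"]
attn = [0.5, 0.25, 0.25]
dist = final_distribution(0.5, {"visits": 0.6, "the": 0.4}, attn, source)
print(dist)

# Attending to the same positions twice yields a non-zero penalty.
penalty_1, cov = coverage_step([0.0, 0.0, 0.0], attn)
penalty_2, cov = coverage_step(cov, attn)
print(penalty_1, penalty_2)
```

In this toy case the out-of-vocabulary word "zhang" still receives probability mass through copying, and the second decoding step is penalized because it repeats the first step's attention pattern.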
Keywords/Search Tags: Natural Language Processing, Abstractive Summarization, Sequence-to-Sequence, Attention Mechanism