
Research On Text Generative Summarization Method Based On Attention Mechanism

Posted on: 2020-09-29  Degree: Master  Type: Thesis
Country: China  Candidate: X Q Huang  Full Text: PDF
GTID: 2438330596497339  Subject: Instrumentation engineering
Abstract/Summary:
With the rapid development of the Internet and information technology, the volume of data and documents online has exploded, and the problem of information overload has grown increasingly serious. How to obtain useful information quickly and accurately from massive data has therefore become an important question. Automatic text summarization is a technique for distilling concise, salient information from large text collections such as documents, articles, or blogs, and it has become a research hotspot both in China and abroad. Traditional research has focused on extractive summarization, which selects sentences from the original text to serve as the summary; the results are often insufficiently refined and unsatisfactory. Abstractive (generative) summarization instead produces new sentences by understanding the text content. Compared with extractive summarization, it allows more flexible vocabulary combination and expression, and its way of working is closer to that of a human. On this basis, this thesis analyzes the underlying encoding features of text, vector representations of words, the attention mechanism, and recurrent neural networks, and explores abstractive summarization from the underlying features up to the model structure. The main work is as follows:

(1) A document word vector representation method based on knowledge transfer and multiple features is proposed. Word vectors are the foundation of text features, and their quality directly affects every model built on top of them, especially in abstractive summarization. When word embedding techniques are used to train word vectors, the more text data is used for training, the higher the quality of the resulting vectors. This thesis therefore pre-trains on the external Wikipedia corpus and uses knowledge transfer to perform incremental training on the task training set, improving word vector quality. In addition, in earlier research on text classification and summarization, other word-level features such as term frequency-inverse document frequency (TF-IDF) have been widely used and achieved good results. To further improve the quality of word representations, this thesis proposes concatenating TF-IDF and part-of-speech features with the knowledge-transferred word vectors to form a new word vector, and designs experiments to verify the effectiveness of the proposed method.

(2) An attention-based text summarization method with a pointer mechanism is proposed. The method uses two bidirectional long short-term memory networks (Bi-LSTMs) to capture two important levels of document information, one at the word level and one at the sentence level. Attention mechanisms are introduced at both levels so that the model focuses on key words. Finally, the decoding stage introduces a hybrid pointer-generator network that produces the final summary by combining the probability of generating a word from the vocabulary with the probability of copying a word from the source text. In this way, the interference of low-frequency words is eliminated, sentence structure is captured, the choice between reusing original words and generating new ones is well balanced, and summarization performance is improved.

(3) A prototype system for attention-based automatic summarization is designed and implemented.
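The mixing step of the pointer-generator decoder described in contribution (2) can be sketched numerically as follows. This is an illustrative NumPy sketch, not the thesis's actual implementation: it assumes a final distribution of the form p_gen * P_vocab for in-vocabulary generation plus (1 - p_gen) * attention mass copied onto source-token ids (with out-of-vocabulary source words given extended ids beyond the fixed vocabulary).

```python
import numpy as np

def pointer_generator_dist(p_vocab, attention, src_ids, p_gen, extended_size):
    """Mix the generation and copy distributions of a pointer-generator decoder.

    p_vocab       : (V,) softmax over the fixed vocabulary
    attention     : (T,) attention weights over the source tokens (sum to 1)
    src_ids       : (T,) ids of the source tokens; OOV words use extended ids >= V
    p_gen         : scalar in [0, 1], probability of generating vs. copying
    extended_size : fixed vocabulary size V plus the number of source OOV words
    """
    final = np.zeros(extended_size)
    # Generation path: scale the vocabulary distribution by p_gen.
    final[: len(p_vocab)] = p_gen * p_vocab
    # Copy path: scatter-add attention mass onto the source-token ids.
    # np.add.at accumulates correctly when a token id appears more than once.
    np.add.at(final, src_ids, (1.0 - p_gen) * attention)
    return final
```

Because both input distributions sum to one, the mixed output is itself a valid probability distribution, and an out-of-vocabulary source word can still receive probability through the copy path even though the generator alone could never emit it.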
Keywords/Search Tags: document word vector, text summarization, neural network, attention mechanism