
Research on Text Summarization Generation Technology Based on Topic Models and Variational Autoencoders

Posted on: 2021-01-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Li
Full Text: PDF
GTID: 2518306122464174
Subject: Computer technology
Abstract/Summary:
With the rise of social media such as Sina Weibo and WeChat, Internet media and electronic publications have replaced traditional newspapers and other print publications as the main channels through which people release and obtain information. The rapid development of the Internet and the popularity of mobile terminal devices have led to an explosive growth of electronic text, and quickly extracting useful information from this massive volume of text has become a hot research topic. Automatic text summarization is the core technology for solving this problem: it not only improves the efficiency of obtaining information, but also supports upper-level applications such as intelligent response systems and network public-opinion analysis.

At present, the main problems of this technology are as follows: (1) machine-generated summaries contain repeated words and disordered word order, out-of-vocabulary (OOV) words cannot be generated, and the model degenerates over the course of training; (2) the content of the summary is not related to the topic information of the original text; (3) the words and sentences generated by the machine lack diversity, richness, and fluency.

The main work of this paper is as follows:

1) To generate summaries that are more relevant to both the topic information and the content of the original text, this paper combines topic information with the source text in a seq2seq model and constructs a TA-TSG model based on a dual attention mechanism. The model first encodes the input with an encoder based on a bidirectional long short-term memory (BiLSTM) network, and then uses a Twitter-LDA model to obtain topic words from the input as additional model input. Integrating topic information lets the input and output share the same topics, ensuring the relevance of the generated summary content, and the dual attention mechanism allows the model to attend simultaneously to the input text and the additional topic words. Experiments on the CNN/Daily Mail dataset show that TA-TSG achieves some improvement over other methods on various automatic evaluation metrics.

2) Although the TA-TSG model integrates topic information and improves sentence quality, its summaries still lack diversity. To alleviate this problem, this paper proposes an automatic text summarization model (VAE-TSG) based on the variational autoencoder, which exploits the VAE's ability to model the latent semantics of text. The model is divided into three modules: a variational encoder that encodes the input and output sequences; a variational inference module that models the approximate posterior distribution of the latent variables; and a variational decoder integrated with CopyNet that uses the context semantic vector, the latent variables, and the copy mechanism to decode the generated summary. The results show that, compared with other baseline models, VAE-TSG improves diversity metrics based on unigrams and bigrams and performs well on the OOV problem.
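The dual attention step in TA-TSG can be sketched as follows. This is a minimal NumPy illustration, not the thesis's actual implementation: it assumes dot-product attention scores and simple concatenation of the two context vectors (the source context and the topic-word context); the real model would use learned attention parameters inside the BiLSTM-based seq2seq network.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def dual_attention(dec_state, enc_states, topic_embs):
    """Combine attention over source encoder states and over topic words.

    dec_state:  (d,)   current decoder hidden state
    enc_states: (T, d) encoder hidden states for T source tokens
    topic_embs: (K, d) embeddings of K topic words from Twitter-LDA
    returns:    (2d,)  concatenated source and topic context vectors
    """
    # attention over the source text (dot-product scores, hypothetical choice)
    src_weights = softmax(enc_states @ dec_state)
    src_ctx = src_weights @ enc_states
    # attention over the additional topic words
    top_weights = softmax(topic_embs @ dec_state)
    top_ctx = top_weights @ topic_embs
    # the decoder conditions on both contexts at each step
    return np.concatenate([src_ctx, top_ctx])
```

Because both context vectors are recomputed at every decoding step, the generated summary is steered simultaneously by the source content and by the shared topic words.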
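The variational inference module of VAE-TSG relies on the standard VAE machinery: sampling the latent variable via the reparameterization trick and regularizing its approximate posterior toward a standard normal prior with a KL term. The sketch below shows only these two generic building blocks (function names are illustrative, not from the thesis):

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    # z = mu + sigma * eps, with eps ~ N(0, I); keeps sampling differentiable
    std = np.exp(0.5 * logvar)
    eps = rng.standard_normal(mu.shape)
    return mu + eps * std

def kl_divergence(mu, logvar):
    # KL( N(mu, diag(sigma^2)) || N(0, I) ), the VAE regularization term
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar))
```

During training, the KL term is added to the reconstruction loss of the decoder; the stochastic latent variable z is what gives the VAE-based model its extra diversity in the generated summaries.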
Keywords/Search Tags: Automatic Text Summarization Generation, Topic Model, Deep Learning, Variational Autoencoder, seq2seq Model