
Research on Text Summarization Generation Technology Based on Topic Models and Variational Autoencoders

Posted on: 2021-01-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Li
Full Text: PDF
GTID: 2518306122464174
Subject: Computer technology
Abstract/Summary:
With the rise of social media such as Sina Weibo and WeChat, Internet media and electronic publications have replaced traditional newspapers and other print publications as the main channels through which people release and obtain information. The rapid development of the Internet and the popularity of mobile terminal devices have led to an explosive growth of electronic text, and quickly extracting useful information from this massive volume of text has become a hot research topic. Automatic text summarization is the core technology for solving this problem: it not only improves the efficiency of obtaining information, but also supports upper-level applications such as intelligent response systems and network public-opinion analysis.

At present, the main problems of this technology are as follows: (1) machine-generated summaries contain repeated words and disordered word order, out-of-vocabulary (OOV) words cannot be generated, and the model degenerates over the course of training; (2) the content of the summary is not related to the topic information of the original text; (3) the words and sentences generated by the machine lack diversity, richness, and fluency.

The main work of this paper is as follows:

1) To generate summaries that are more relevant to both the topic information and the content of the original text, this paper combines topic information with the source text in a seq2seq model and constructs a TA-TSG model based on a dual attention mechanism. The model first encodes the input with an encoder based on a bidirectional long short-term memory (BiLSTM) network, and then uses a Twitter-LDA model to obtain topic words from the input as additional model input. Integrating topic information lets the input and output share the same topics, ensuring the relevance of the generated summary content, and the dual attention mechanism allows the model to attend simultaneously to the input text and the additional topic words. Experiments on the CNN/Daily Mail dataset show that TA-TSG achieves some improvement over other methods on various automatic evaluation metrics.

2) Although the TA-TSG model integrates topic information and improves sentence quality, its summaries still lack diversity. To alleviate this problem, this paper proposes an automatic text summarization model (VAE-TSG) based on the variational autoencoder, which exploits the VAE's ability to model the latent semantics of text. The model is divided into three modules: a variational encoder that encodes the input and output sequences; a variational inference module that models the approximate posterior distribution of the latent variables; and a variational decoder integrated with CopyNet that uses the context semantic vector, the latent variables, and the copy mechanism to decode the generated summary. The results show that, compared with other baseline models, VAE-TSG improves diversity metrics based on unigrams and bigrams and performs well on the OOV problem.
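The dual attention step in TA-TSG can be sketched as follows. This is a minimal NumPy illustration, not the thesis's actual implementation: it assumes dot-product attention scores and simple concatenation of the two context vectors (the source context and the topic-word context); the real model would use learned attention parameters inside the BiLSTM-based seq2seq network.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def dual_attention(dec_state, enc_states, topic_embs):
    """Combine attention over source encoder states and over topic words.

    dec_state:  (d,)   current decoder hidden state
    enc_states: (T, d) encoder hidden states for T source tokens
    topic_embs: (K, d) embeddings of K topic words from Twitter-LDA
    returns:    (2d,)  concatenated source and topic context vectors
    """
    # attention over the source text (dot-product scores, hypothetical choice)
    src_weights = softmax(enc_states @ dec_state)
    src_ctx = src_weights @ enc_states
    # attention over the additional topic words
    top_weights = softmax(topic_embs @ dec_state)
    top_ctx = top_weights @ topic_embs
    # the decoder conditions on both contexts at each step
    return np.concatenate([src_ctx, top_ctx])
```

Because both context vectors are recomputed at every decoding step, the generated summary is steered simultaneously by the source content and by the shared topic words.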
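The variational inference module of VAE-TSG relies on the standard VAE machinery: sampling the latent variable via the reparameterization trick and regularizing its approximate posterior toward a standard normal prior with a KL term. The sketch below shows only these two generic building blocks (function names are illustrative, not from the thesis):

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    # z = mu + sigma * eps, with eps ~ N(0, I); keeps sampling differentiable
    std = np.exp(0.5 * logvar)
    eps = rng.standard_normal(mu.shape)
    return mu + eps * std

def kl_divergence(mu, logvar):
    # KL( N(mu, diag(sigma^2)) || N(0, I) ), the VAE regularization term
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar))
```

During training, the KL term is added to the reconstruction loss of the decoder; the stochastic latent variable z is what gives the VAE-based model its extra diversity in the generated summaries.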
Keywords/Search Tags: Automatic Text Summarization Generation, Topic Model, Deep Learning, Variational Autoencoder, seq2seq Model