
Seq2seq Attention: Super-Long Chinese Text Summarization Model

Posted on: 2021-02-12
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Yao
Full Text: PDF
GTID: 2518306131992869
Subject: Statistics

Abstract/Summary:
Text summarization is a hot frontier problem in natural language processing, and existing methods suffer from low accuracy when summarizing long texts. Seq2seq (sequence-to-sequence) deep learning based on long short-term memory (LSTM) networks has achieved remarkable results in natural language processing. In this thesis, we build an improved Seq2seq network model for automatic summarization of long Chinese texts. The main work of this thesis consists of the following parts:

Firstly, an attention-based Seq2seq network model for super-long Chinese text summarization is presented and implemented.

Secondly, because research on long text summarization in China is relatively scarce and no suitable Chinese long-text summarization data sets are currently available, this thesis selects the news data sets of the Sogou Laboratory, preprocesses the data to obtain super-long text data sets suitable for network training, and constructs a vocabulary.

Thirdly, a comparison experiment on model index values: we randomly selected 50 samples as the test set, used ROUGE-1, ROUGE-2, and ROUGE-L as evaluation indices, used the improved Seq2seq model, the TextRank model, and the TF-IDF model to predict the test samples respectively, calculated the corresponding ROUGE values, and compared the ROUGE values across models. The experimental results showed that compared with the TextRank model, the ROUGE-1 value of the model in this thesis increased by about 0.64, the ROUGE-2 value by about 0.59, and the ROUGE-L value by about 0.66. Compared with the TF-IDF model, the ROUGE-1 value increased by about 0.67, the ROUGE-2 value by about 0.61, and the ROUGE-L value by about 0.69.

Fourthly, a comparison experiment on summary generation: we used the improved Seq2seq model, the TextRank model, and the TF-IDF model to generate summaries for the same super-long text and compared the outputs. Both the improved Seq2seq model and the TextRank model perform well, both can extract the semantic features indicating the central theme, and both outperform the TF-IDF model. In addition, the improved Seq2seq model generates a better summary, and its predicted summary is more concise.
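To make the architecture of the first part concrete, the following is a minimal sketch, assuming a PyTorch implementation, of an LSTM encoder-decoder with additive attention of the kind the thesis describes. The vocabulary size, embedding dimension, and hidden dimension are illustrative placeholders, not the configuration actually used in the thesis.

```python
# Minimal sketch (assumed PyTorch implementation) of an attention
# Seq2seq model: LSTM encoder, LSTM decoder, additive attention.
# All sizes are illustrative placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionSeq2Seq(nn.Module):
    def __init__(self, vocab_size=50000, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        # Decoder input = previous target embedding + attention context.
        self.decoder = nn.LSTM(emb_dim + hid_dim, hid_dim, batch_first=True)
        # Additive (Bahdanau-style) attention parameters.
        self.attn_W = nn.Linear(hid_dim * 2, hid_dim)
        self.attn_v = nn.Linear(hid_dim, 1, bias=False)
        self.out = nn.Linear(hid_dim, vocab_size)

    def attention(self, dec_h, enc_outs):
        # dec_h: (batch, hid); enc_outs: (batch, src_len, hid)
        dec_h = dec_h.unsqueeze(1).expand(-1, enc_outs.size(1), -1)
        scores = self.attn_v(torch.tanh(
            self.attn_W(torch.cat([dec_h, enc_outs], dim=-1))))
        weights = F.softmax(scores, dim=1)       # (batch, src_len, 1)
        return (weights * enc_outs).sum(dim=1)   # context: (batch, hid)

    def forward(self, src, tgt):
        enc_outs, (h, c) = self.encoder(self.embed(src))
        logits = []
        for t in range(tgt.size(1)):             # teacher forcing
            context = self.attention(h[-1], enc_outs)
            step_in = torch.cat([self.embed(tgt[:, t]), context],
                                dim=-1).unsqueeze(1)
            dec_out, (h, c) = self.decoder(step_in, (h, c))
            logits.append(self.out(dec_out.squeeze(1)))
        return torch.stack(logits, dim=1)        # (batch, tgt_len, vocab)

# Example: 2 source texts of 400 tokens, target summaries of 30 tokens.
model = AttentionSeq2Seq()
src = torch.randint(0, 50000, (2, 400))
tgt = torch.randint(0, 50000, (2, 30))
print(model(src, tgt).shape)                     # torch.Size([2, 30, 50000])
```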
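The ROUGE indices used in the third part can likewise be sketched from scratch, as below. This sketch assumes pre-segmented, whitespace-separated tokens and reports plain recall against the reference; a real Chinese evaluation would first segment the text with a word-segmentation tool and typically reports F1 scores.

```python
# Minimal from-scratch sketch of ROUGE-N and ROUGE-L recall on
# pre-segmented tokens; the example strings below are illustrative.
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n):
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())         # clipped n-gram matches
    return overlap / max(sum(ref.values()), 1)   # recall against reference

def rouge_l(candidate, reference):
    # Longest-common-subsequence recall, the basis of ROUGE-L.
    m, n = len(candidate), len(reference)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = dp[i][j] + 1 if candidate[i] == reference[j] \
                else max(dp[i][j + 1], dp[i + 1][j])
    return dp[m][n] / max(n, 1)

ref = "模型 生成 的 摘要 更加 简洁".split()
cand = "生成 摘要 更加 简洁".split()
print(rouge_n(cand, ref, 1), rouge_n(cand, ref, 2), rouge_l(cand, ref))
```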
Keywords/Search Tags: Text summarization, Super-long text, Seq2seq, LSTM, Attention mechanism