TMSA: A Two-Stage Automatic Summary Generation Model

Posted on: 2021-05-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Fan
GTID: 2428330647959594
Subject: Applied statistics
Abstract/Summary:
Automatic summary generation has long been an important and challenging problem in natural language processing. Approaches fall into two main types: extractive summarization and abstractive summarization. This thesis surveys the main current models for automatic text summarization and compares their advantages and disadvantages. On that basis, it proposes TMSA (TextRank-MMR-Seq2Seq-Attention), a new two-stage summarization model. The framework is built as follows. First, two models, Word2Vec and BERT, are used in preprocessing to obtain sentence features. Second, these sentence features are fed into a combined TextRank and MMR model, which extracts the important sentences from the raw text. Finally, a generator based on the Seq2Seq-Attention model compresses the extracted sentences, so that the generated abstract is both concise and general. To test the quality of the generated abstracts, the thesis also proposes a new abstract scoring model, which combines the ROUGE scores with three other indexes (redundancy, relevance, and importance) into a comprehensive summary scoring indicator. The experimental results show that TMSA outperforms both the standalone TextRank and MMR models on ROUGE-1, ROUGE-2 and ROUGE-L, and outperforms the Seq2Seq-Attention model on ROUGE-L. In terms of redundancy, relevance and importance, the overall performance of TMSA is also better than that of the previous models. Read directly, the summaries generated by the model are both fluent and concise, which suggests that the model is usable in practice.
Keywords/Search Tags:automatic summary, TextRank, MMR, BERT, Seq2Seq-Attention
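
The extractive stage described in the abstract (sentence features, then TextRank and MMR) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the thesis's implementation: bag-of-words cosine similarity stands in for the Word2Vec/BERT sentence features, the way TextRank scores are fed into MMR as relevance is one plausible combination, the damping factor 0.85 and trade-off weight 0.5 are arbitrary defaults, and all function names are hypothetical. The Seq2Seq-Attention compression stage is not shown.

```python
import math
from collections import Counter


def bow_vector(sentence):
    # Assumption: bag-of-words counts replace the Word2Vec/BERT features.
    return Counter(sentence.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def textrank(sim, damping=0.85, iterations=50):
    # Weighted PageRank over the sentence similarity graph (no self loops).
    n = len(sim)
    scores = [1.0 / n] * n
    out = [sum(sim[j][k] for k in range(n) if k != j) for j in range(n)]
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping
            * sum(
                sim[j][i] / out[j] * scores[j]
                for j in range(n)
                if j != i and out[j] > 0
            )
            for i in range(n)
        ]
    return scores


def mmr_select(sim, relevance, k, lam=0.5):
    # Greedy Maximal Marginal Relevance: trade relevance against the
    # highest similarity to any sentence that was already selected.
    selected = []
    candidates = list(range(len(relevance)))
    while candidates and len(selected) < k:
        def mmr(i):
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return sorted(selected)  # restore original sentence order


def extract(sentences, k=2):
    # Hypothetical glue: TextRank scores (rescaled to at most 1) act as
    # the relevance term of MMR.
    if not sentences:
        return []
    vecs = [bow_vector(s) for s in sentences]
    sim = [[cosine(a, b) for b in vecs] for a in vecs]
    scores = textrank(sim)
    top = max(scores)
    relevance = [s / top for s in scores]
    return [sentences[i] for i in mmr_select(sim, relevance, k)]
```

Calling `extract(sentences, k=2)` on a list of sentence strings returns two sentences in their original order. The MMR penalty is what keeps a near-duplicate of an already selected sentence out of the result, which corresponds to the redundancy concern the abstract raises.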