
Chinese Automatic Summary Model And Its Applications

Posted on: 2020-12-26
Degree: Master
Type: Thesis
Country: China
Candidate: P X Zhang
Full Text: PDF
GTID: 2428330575952033
Subject: Applied statistics

Abstract/Summary:
In the era of the mobile Internet, with massive volumes of data constantly emerging, quickly and accurately locating key information has become an urgent problem. Automatic summarization, which condenses a text into a short summary that accurately captures its key information, has therefore become a hot research topic. This thesis studies Chinese automatic summarization models and their applications.

First, after surveying the literature at home and abroad, the research is divided into two directions, extractive summarization and abstractive (generative) summarization, and each direction is studied in turn. For extractive summarization, BERT (Bidirectional Encoder Representations from Transformers) sentence vectors are used to improve on the representational capacity of traditional word vectors and are combined with the Maximal Marginal Relevance (MMR) algorithm of Carbonell and Goldstein, yielding the BE-MMR extractive summarization model. For abstractive summarization, the traditional approach uses a sequence-to-sequence (Seq2Seq) model that encodes the entire input into a single fixed-dimensional intermediate vector; in real scenarios this causes substantial information loss during decoding. To address this loss, the thesis integrates an attention mechanism into the Seq2Seq model and uses Bi-LSTMs to build the encoder and decoder, producing a generative summarization model based on Seq2Seq-Attention.

Both directions are then evaluated experimentally. The extractive model is compared against TextRank and TF-IDF on the same test set to demonstrate its effectiveness. The generative Seq2Seq-Attention model is trained on the Tsinghua University Chinese text classification corpus together with manually crawled web news, and its ROUGE-N scores are compared with plain Seq2Seq and other methods on the NLPCC 2017 generative summarization test set. In these experiments the Seq2Seq-Attention model achieves a ROUGE-1 score of 0.26 against 0.19 for the baseline model, and the extractive model's ROUGE-1 score is likewise well above that of its baselines.

Finally, both models are validated manually in practical application scenarios. Articles from the sports and economics news domains were randomly selected, experiments were conducted, and human evaluation found that the summary sentences extracted by the BE-MMR model are semantically richer and flow better from sentence to sentence, while the summaries produced by Seq2Seq-Attention are concise and more logically coherent.

The innovation of this thesis is the fusion of the pre-trained BERT model with the MMR algorithm to obtain the BE-MMR extractive summarization model. Whereas traditional TextRank and TF-IDF methods tend to extract sentences with poor continuity and high mutual similarity, the BE-MMR model uses BERT sentence vectors to mine deeper semantic information and capture paragraph-level context, and uses MMR to favor semantically rich sentences, so that the extracted summary sentences better match human linguistic logic.
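As a concrete illustration of the sentence-vector step, the following is a minimal sketch of extracting BERT sentence vectors with the Hugging Face transformers library and mean pooling over the final hidden layer. The thesis does not name its toolkit, checkpoint, or pooling strategy, so all three are assumptions here:

```python
import torch
from transformers import BertTokenizer, BertModel

# Assumed checkpoint: the standard Chinese BERT released by Google.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")
model.eval()

def sentence_vector(sentence: str) -> torch.Tensor:
    # Tokenize and run the sentence through BERT without gradient tracking.
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    # Mean-pool the token states into one fixed-size sentence vector.
    return hidden.mean(dim=1).squeeze(0)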
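Given such sentence vectors, the selection step of BE-MMR can be sketched as standard Maximal Marginal Relevance. The cosine similarity, the trade-off weight lam, and the summary length k below are illustrative assumptions rather than settings reported in the thesis:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def mmr_select(sent_vecs, doc_vec, k=3, lam=0.7):
    """Select k sentence indices, trading relevance to the whole document
    against redundancy with sentences already chosen (Maximal Marginal
    Relevance)."""
    selected, candidates = [], list(range(len(sent_vecs)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            relevance = cosine(sent_vecs[i], doc_vec)
            redundancy = max((cosine(sent_vecs[i], sent_vecs[j]) for j in selected),
                             default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected  # summary sentence indices, in selection order
```

Here doc_vec can simply be the mean of the sentence vectors; a larger lam favors relevance to the document, a smaller one favors diversity among the selected sentences.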
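For the generative direction, this is a minimal PyTorch sketch in the spirit of the Seq2Seq-Attention model described above: a Bi-LSTM encoder whose states the decoder attends over at every step, so decoding is not limited to one fixed-size intermediate vector. Dot-product attention, a unidirectional LSTM decoder, and the layer sizes are assumptions; training details such as teacher forcing and initializing the decoder from the encoder state are omitted:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Seq2SeqAttention(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Bi-LSTM encoder: each output state has size 2*hid_dim (forward + backward).
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True, bidirectional=True)
        self.decoder = nn.LSTM(emb_dim, 2 * hid_dim, batch_first=True)
        self.out = nn.Linear(4 * hid_dim, vocab_size)  # [decoder state; context] -> vocab

    def forward(self, src, tgt):
        enc_out, _ = self.encoder(self.embed(src))  # (B, S, 2H)
        dec_out, _ = self.decoder(self.embed(tgt))  # (B, T, 2H)
        # Dot-product attention: every decoder state scores all encoder states,
        # and the softmax-weighted sum becomes its context vector.
        scores = torch.bmm(dec_out, enc_out.transpose(1, 2))     # (B, T, S)
        context = torch.bmm(F.softmax(scores, dim=-1), enc_out)  # (B, T, 2H)
        return self.out(torch.cat([dec_out, context], dim=-1))   # (B, T, vocab)
```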
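Finally, the ROUGE-1 figures quoted above measure unigram overlap with a reference summary; a minimal sketch of ROUGE-N recall on pre-tokenized text:

```python
from collections import Counter

def rouge_n_recall(candidate_tokens, reference_tokens, n=1):
    """ROUGE-N recall: the fraction of reference n-grams that also appear
    in the candidate summary, with clipped counts."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate_tokens), ngrams(reference_tokens)
    overlap = sum(min(count, ref[g]) for g, count in cand.items())
    return overlap / max(sum(ref.values()), 1)
```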
Keywords/Search Tags: automatic summarization, deep learning, BERT, attention mechanism