Research On Generative News Summarization Based On Denoising Autoencoder

Posted on:2024-09-29

Degree:Master

Type:Thesis

Country:China

Candidate:J C Chang

Full Text:PDF

GTID:2568307106984059

Subject:Electronic information

Abstract/Summary:

Automatic news summarization is an important natural language processing task that aims to extract the key information from large volumes of news text and condense it into summaries, so that readers can quickly grasp the content. Among attention-based sequence-to-sequence models for this task, summarization models built on long short-term memory (LSTM) networks outperform those built on plain recurrent neural networks (RNNs), because LSTM alleviates the RNN's long-term dependency problem. However, the generated summaries are still not accurate enough: their semantics often diverge from those of the original news text, and repeated words make the sentences incoherent and hard to read. In addition, because the vocabulary size is limited, the model cannot generate words outside it, which causes the out-of-vocabulary (OOV) problem in the generated summary, that is, words missing from the vocabulary, including proper nouns such as names of people and places. To address these problems, this thesis proposes a hybrid pointer-generator network model based on a denoising autoencoder. The main work is as follows:

1) A pointer-generator network model with an improved coverage mechanism is constructed. The encoder is a bidirectional LSTM (Bi-LSTM) and the decoder is a unidirectional LSTM. Bidirectional encoding lets the model better learn the contextual semantics of the original news text. The pointer mechanism addresses the model's inability to generate new words, and with it the OOV problem in the generated summary. The improved coverage mechanism penalizes the generation probability of words the model has already generated more effectively, which lowers the chance that they are generated again and helps avoid repeated words. Experimental results on the NLPCC2017 dataset show the effectiveness of the pointer and coverage mechanisms for automatic news summarization.

2) A hybrid pointer-generator network model based on a denoising autoencoder is constructed. First, the BART pre-trained language model, which is based on a denoising autoencoder, is used to extract deep semantic features from the original news. These features are fused with the original news text and fed into the pointer-generator network with the coverage mechanism, improving the model's ability to capture the key information in the original news. The model's attention mechanism and the bidirectional structure of the encoder help it learn contextual semantics better, and beam search is used to improve the accuracy of the generated summaries.

The model is trained and tested on the NLPCC2017 Chinese news dataset, and the generated summaries are closer to the original semantics, more coherent, and more readable. Performance is evaluated with the ROUGE tool. Compared with the pointer-generator network model, the proposed BART-PGN summarization model improves ROUGE-1, ROUGE-2, and ROUGE-L by 1.81%, 1.86%, and 2.15%, respectively, which verifies its effectiveness for automatic Chinese news summarization.
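
The pointer and coverage mechanisms described in the abstract can be illustrated with a minimal sketch. It follows the commonly used pointer-generator formulation, not the thesis's improved coverage variant, whose details the abstract does not give. The function names, the generation probability `p_gen`, and the toy inputs are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Sequence


def final_distribution(
    p_gen: float,
    vocab_dist: Dict[str, float],
    attention: Sequence[float],
    source_tokens: Sequence[str],
) -> Dict[str, float]:
    """Mix the vocabulary distribution with the copy (attention) distribution."""
    if len(attention) != len(source_tokens):
        raise ValueError("attention and source_tokens must have equal length")
    # Generation part: scale every in-vocabulary probability by p_gen.
    mixed = {word: p_gen * prob for word, prob in vocab_dist.items()}
    # Copy part: add (1 - p_gen) * attention weight to each source token.
    # This is how a source word outside the vocabulary gets non-zero probability.
    for token, weight in zip(source_tokens, attention):
        mixed[token] = mixed.get(token, 0.0) + (1.0 - p_gen) * weight
    return mixed


def coverage_loss(attention_steps: Sequence[Sequence[float]]) -> float:
    """Standard coverage penalty: sum over steps t and positions i of min(a_i^t, c_i^t).

    The coverage vector c^t is the sum of attention distributions from all
    earlier decoder steps, so attending again to an already covered source
    position increases the loss.
    """
    if not attention_steps:
        return 0.0
    coverage = [0.0] * len(attention_steps[0])
    loss = 0.0
    for attention in attention_steps:
        loss += sum(min(a, c) for a, c in zip(attention, coverage))
        coverage = [c + a for c, a in zip(coverage, attention)]
    return loss


if __name__ == "__main__":
    # Toy example (assumed values): "Chang" and "Nanjing" are not in the vocabulary.
    dist = final_distribution(
        p_gen=0.5,
        vocab_dist={"the": 0.5, "visited": 0.5},
        attention=[0.25, 0.25, 0.5],
        source_tokens=["Chang", "visited", "Nanjing"],
    )
    print(dist)
    # Repeating the same attention on two steps is penalized; disjoint attention is not.
    print(coverage_loss([[0.5, 0.5], [0.5, 0.5]]))
    print(coverage_loss([[1.0, 0.0], [0.0, 1.0]]))
```

In this sketch the OOV source words receive probability only through the copy term, and the coverage penalty is zero when successive decoder steps attend to different source positions.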

Keywords/Search Tags:

Abstractive summarization, Sequence-to-sequence model, Pointer-generator network, BART pre-trained language model

Related items

1	Research On Chinese Abstractive Text Summarization Based On Sequence To Sequence Model
2	Abstractive Document Summarization Based On Deep Sequence To Sequence Model
3	Research On Abstractive Text Summarization Model Based On Transformer
4	Research And Implementation Of Short Text Sentimental Abstractive Summarization
5	Research On Abstractive Summarization Based On Sequence-to-Sequence Neural Network Model
6	Research And Application Of Related Techniques For Text Summarization Based On Deep Learning
7	Research Of Summarization Method Based On Abstractive Type
8	Entity-biased Multi-Source Pointer Generator For Article Event Summarization
9	Research On Text Summarization For Chinese News
10	Abstractive Text Summarization Generation Method Based On Adaptive Resilient Loss