Research On Generating News Summary Based On Pointer Network

Posted on: 2020-05-02 | Degree: Master | Type: Thesis
Country: China | Candidate: Q Guo
GTID: 2438330572999663 | Subject: Software engineering
Abstract/Summary:
In the era of Internet big data and information explosion, expressing the main content of a text in a short summary undoubtedly helps to alleviate information overload. Using a computer to summarize text automatically is of great significance for saving human effort and easing that overload. Current text summarization methods fall into two categories: extractive and abstractive. Extractive summarization takes sentences directly from the source text. It is essentially a ranking problem: each sentence is scored, the highest-scoring sentences are selected, and redundancy is removed to produce the summary. Abstractive summarization first tries to understand the meaning of the original text and then expresses that meaning in newly generated text. This is the approach closest to how people write summaries, which is why abstractive summarization is worth studying.

Abstractive summarization is an important branch of natural language processing research. With the progress of deep learning and the maturing of deep learning frameworks in recent years, generating abstractive summaries with deep learning has become a task of considerable research value. Most previous work relies on traditional statistical methods for extractive summarization, which extract and rearrange sentences from the original text. This approach leads to two major problems. First, the resulting summary ignores many details of the text. Second, the resulting sentences are repetitive and fail to generalize the meaning of the original text. To address these two problems, this thesis proposes generating text summaries with a hybrid pointer network and a coverage model. The work falls under the use of deep learning methods for abstractive summarization within natural language processing.

The Chinese Sohu News corpus is used as training data, and the hybrid pointer network model is built on it. The model combines a basic sequence-to-sequence (seq2seq) model with an attention mechanism and a pointer network, and is designed to reduce errors in the generated text. It introduces a trade-off probability that balances generating new words against copying words from the original text. The attention-based seq2seq component generates new words from the vocabulary, which is what makes the summary abstractive. The pointer network, on the other hand, copies words from the original text by pointing to them, so that the resulting summary stays closer to the meaning of the original. In the later stage of model training, a coverage mechanism is introduced to reduce the repetition rate of the generated summaries.

The model has two advantages. First, the new words generated by a traditional attention-based encoder-decoder model are often inaccurate and drift away from the original meaning, and even introducing a self-attention mechanism does not achieve good results. Adding a pointer network to the basic model corrects this inaccuracy in the basic seq2seq model. Second, the coverage mechanism reduces the repetition rate of the generated summary.

The experiments use the Sohu News corpus, and the results are evaluated with the ROUGE metric. The scores are at least 2 points higher than those of the basic seq2seq model. The experimental data show that the generated summaries are more accurate than those of the basic seq2seq model, and that adding the coverage model significantly reduces the repetition rate. The resulting summaries also provide a better user experience.
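The abstract does not give the model's equations, so the following is only a minimal sketch of how a trade-off probability and a coverage penalty are commonly formulated in pointer-generator style models. The names `p_gen`, `final_distribution` and `coverage_loss` are illustrative assumptions, not identifiers from the thesis, and the formulas are the widely used ones rather than confirmed details of this work.

```python
from typing import Dict, List


def final_distribution(p_gen: float,
                       vocab_dist: Dict[str, float],
                       attention: List[float],
                       source_tokens: List[str]) -> Dict[str, float]:
    """Mix generating from the vocabulary with copying from the source.

    Assumed formulation: P(w) = p_gen * P_vocab(w)
                                + (1 - p_gen) * sum of attention on positions holding w.
    """
    if len(attention) != len(source_tokens):
        raise ValueError("attention and source_tokens must have the same length")
    # Generation branch: scale the vocabulary distribution by p_gen.
    mixed = {word: p_gen * prob for word, prob in vocab_dist.items()}
    # Copy branch: add the remaining probability mass to the source words,
    # including words that are missing from the vocabulary.
    for token, weight in zip(source_tokens, attention):
        mixed[token] = mixed.get(token, 0.0) + (1.0 - p_gen) * weight
    return mixed


def coverage_loss(attention_steps: List[List[float]]) -> float:
    """Penalize attending again to source positions that were already covered.

    Assumed formulation: at each decoder step, add sum_i min(attention_i, coverage_i),
    where coverage is the running sum of attention from all earlier steps.
    """
    if not attention_steps:
        return 0.0
    coverage = [0.0] * len(attention_steps[0])
    total = 0.0
    for attention in attention_steps:
        total += sum(min(a, c) for a, c in zip(attention, coverage))
        coverage = [c + a for c, a in zip(coverage, attention)]
    return total
```

In this sketch a source word that is absent from the vocabulary can still receive probability through the copy branch, and a decoder that attends to the same source position twice pays a penalty, which is the intuition behind using coverage to reduce repetition.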
Keywords/Search Tags: Abstractive summarization, pointer network, seq2seq model, self-attention model, coverage model