Research On Chinese Single Document Automatic Summarization Based On Deep Learning

Posted on: 2019-10-27
Degree: Master
Type: Thesis
Country: China
Candidate: W Wang
Full Text: PDF
GTID: 2428330548469563
Subject: Computer technology
Abstract/Summary:
Automatic summarization uses computer programs to extract a summary from an original document automatically. The summary must be concise and coherent, and it must reflect the content of the document fully and accurately. Abstractive summarization based on neural networks attempts to "understand" the main content of the original article. Compared with extractive summarization, it can summarize text concisely, with simple grammar and good readability. In practical applications, however, technical limitations mean that abstractive summaries generated by neural networks often suffer from the OOV (out-of-vocabulary) problem, and some important semantic units of the original article are repeated over and over in the final summary. There are two main reasons for this. First, the original text contains low-frequency but extremely important words that can hardly be captured and output as part of the summary. Second, owing to the drawbacks of artificial neural networks, it is difficult to generate fluent sentences. This thesis aims to improve the generation quality of Chinese single-document summarization. To address the problems above, it studies the following two aspects:

1. A strategy that fuses word extraction is proposed to handle low-frequency words that are extremely important in the original text but are not generated well in the final summary. The traditional attention mechanism can only indicate which inputs have a greater influence on the output. The proposed strategy adds a supplementary word list to the initial vocabulary built from the corpus. This list contains every word of the original text that is not in the initial vocabulary, so that these words can be considered when a word is generated. A probability distribution over the low-frequency words in the original text is computed, and these words can then be generated as part of the final summary. Experimental results on the LCSTS and NLPCC2017 datasets show that the strategy achieves better results than the traditional extractive method and the baseline end-to-end neural network model.

2. A weight-elimination strategy is proposed to reduce the repetition of individual words in the summary. Each time the current word is generated, the previously generated summary word is used as input, so during decoding the model can pay excessive attention to one part of the encoder output, which causes errors and then endless phrase repetition. To address this problem, a new fusion mechanism is added. Each time a word is generated, the word that was attended to is given a certain "punishment" in the current round, so that it does not receive high attention again merely because it has already been generated. Experiments show that the strategy effectively avoids the duplication of important words in the generated summary and makes the generated sentences more readable.
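The two strategies described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract gives no formulas, so the mixing weight `p_gen`, the subtractive `penalty`, and the function names `extended_distribution` and `penalized_attention` are all assumptions made for illustration. The first function follows the general idea of a copy mechanism over an extended vocabulary, and the second lowers the attention score of source positions that were already attended to.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def extended_distribution(vocab, vocab_probs, source_tokens, attention, p_gen):
    """Mix a fixed-vocabulary distribution with attention over source tokens.

    Assumption: p_gen in [0, 1] is a hypothetical mixing weight; the abstract
    does not say how the two distributions are combined.
    """
    # Start with the generation probabilities of the initial vocabulary.
    probs = {word: p_gen * p for word, p in zip(vocab, vocab_probs)}
    # Add the attention mass of each source position to its token, so that
    # source-only (out-of-vocabulary) words get a non-zero probability.
    for token, weight in zip(source_tokens, attention):
        probs[token] = probs.get(token, 0.0) + (1.0 - p_gen) * weight
    return probs


def penalized_attention(scores, attended_so_far, penalty=1.0):
    """Lower the raw score of positions that already received attention.

    Assumption: a simple subtractive penalty stands in for the "punishment"
    mentioned in the abstract.
    """
    adjusted = [s - penalty * a for s, a in zip(scores, attended_so_far)]
    return softmax(adjusted)


if __name__ == "__main__":
    vocab = ["the", "company", "<unk>"]
    source = ["Huawei", "company", "releases", "Huawei"]
    probs = extended_distribution(
        vocab, [0.5, 0.3, 0.2], source, [0.4, 0.1, 0.1, 0.4], p_gen=0.6
    )
    print(probs)  # "Huawei" is absent from vocab but still gets probability

    first = penalized_attention([2.0, 1.0, 0.5], [0.0, 0.0, 0.0])
    second = penalized_attention([2.0, 1.0, 0.5], first, penalty=5.0)
    print(first, second)  # attention on position 0 drops in the second step
```

In this sketch a source-only word such as "Huawei" can be emitted even though it is missing from the initial vocabulary, and the position that dominated attention in one step is pushed down in the next, which is the behavior the two strategies aim for.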
Keywords/Search Tags: Automatic summary, Word extraction, Neural networks, Attention, Decoding