
Research On Chinese Text Summarization Algorithm Based On Deep Learning

Posted on: 2021-01-13  Degree: Master  Type: Thesis
Country: China  Candidate: F X Ma
GTID: 2428330647461934  Subject: Computer Science and Technology
Abstract/Summary:
With the exponential growth of textual information available on the Internet, information overload has become a serious problem. Reducing the user's information load, in effect performing "dimensionality reduction" on text, is therefore necessary, and automatic text summarization is an important way to achieve it. With the development of deep learning, more and more researchers use deep learning techniques to generate text summaries automatically. This thesis studies summary generation methods based on deep learning. The main work is as follows.

First, to address the unknown words and incomplete content that often appear in generated summaries, a Chinese text summarization algorithm based on keyword information and adversarial learning is proposed. The algorithm has two stages: keyword extraction and summary generation. Keywords are first extracted with an attention-based Seq2Seq model. Adversarial learning is then used to dynamically shorten the semantic distance between the source text and the summary, and on this basis the extracted keyword information is added to the attention mechanism, so that the model attends more closely to the key information in the source text and generates a more comprehensive summary. Experimental results on the LCSTS dataset show that the proposed algorithm effectively improves summary accuracy and reduces the number of unknown words. Compared with the Seq2Seq baseline, the ROUGE-1, ROUGE-2 and ROUGE-L scores improve by 6.1%, 4.8% and 6.2%, respectively.

Second, because long-term dependencies cause generative summarization algorithms to lose accuracy on long texts, this thesis proposes a new long-text summarization algorithm consisting of two phases: topic sentence extraction and summary generation. In the topic sentence extraction phase, doc2vec is introduced into TextRank to improve its text similarity calculation and thereby the accuracy of key sentence extraction. In the summary generation phase, the key sentences obtained in the previous phase serve as the input, and a gated unit combining a CNN and a self-attention mechanism is added between the encoder and decoder of Seq2Seq. This unit extracts n-gram information, controls the information flow through the model, and reduces word repetition in the generated summaries. Experimental results on a crawled Sina Finance news dataset show that, on long texts, this method is more accurate than either a purely extractive or a purely generative method.

This work offers a new research direction for automatic text summarization. The proposed methods significantly improve ROUGE scores and are practical for alleviating information overload.
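The abstract does not say exactly how keyword information enters the attention mechanism. One plausible reading, sketched below in plain Python as an assumption rather than the thesis's actual formulation, adds a fixed bonus to the attention score of every source position that holds an extracted keyword before the softmax, so those positions receive more weight in the context vector. The function names, the dot-product scoring and the `bonus` parameter are all hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def keyword_biased_attention(
    decoder_state: Sequence[float],
    encoder_states: Sequence[Sequence[float]],
    keyword_positions: Set[int],
    bonus: float = 1.0,
) -> Tuple[List[float], List[float]]:
    """Hypothetical keyword-aware attention (an assumption, not the thesis's formula).

    Each source position gets a dot-product score against the decoder state;
    positions holding an extracted keyword receive an additive `bonus`
    before normalization.
    """
    scores = []
    for i, h in enumerate(encoder_states):
        score = sum(d * x for d, x in zip(decoder_state, h))
        if i in keyword_positions:
            score += bonus  # push attention toward keyword tokens
        scores.append(score)
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the encoder states.
    dim = len(encoder_states[0])
    context = [sum(w * h[k] for w, h in zip(weights, encoder_states)) for k in range(dim)]
    return weights, context


if __name__ == "__main__":
    enc = [[0.2, 0.1], [0.3, 0.0], [0.1, 0.4]]
    weights, context = keyword_biased_attention([1.0, 1.0], enc, keyword_positions={2})
    print([round(w, 3) for w in weights], [round(c, 3) for c in context])
```

Setting `bonus` to 0 recovers plain dot-product attention, which makes the effect of the keyword signal easy to isolate in experiments.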
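The topic sentence extraction phase can be illustrated with a small TextRank sketch in which the similarity between two sentences is the cosine of their sentence vectors, standing in for the doc2vec-based similarity the thesis describes. The sketch assumes the vectors have already been computed (for example by a trained doc2vec model); the helper names, the damping factor of 0.85 and the iteration limit are illustrative choices, not details taken from the thesis.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def textrank_scores(vectors: Sequence[Sequence[float]], damping: float = 0.85,
                    iterations: int = 50, tol: float = 1e-6) -> List[float]:
    """Weighted TextRank over a sentence-similarity graph (vectors assumed precomputed)."""
    n = len(vectors)
    # Edge weights: cosine similarity, with negative values clipped to 0.
    w = [[max(0.0, cosine(vectors[i], vectors[j])) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iterations):
        new = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to edge weight.
            rank = sum(w[j][i] / out_sum[j] * scores[j]
                       for j in range(n) if out_sum[j] > 0)
            new.append((1 - damping) + damping * rank)
        converged = max(abs(a - b) for a, b in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores


def extract_key_sentences(sentences: List[str], vectors: Sequence[Sequence[float]],
                          k: int) -> List[str]:
    """Return the k highest-scoring sentences, kept in their original order."""
    scores = textrank_scores(vectors)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    sents = ["s0", "s1", "s2", "s3"]
    vecs = [[1.0, 0.0, 0.0], [1.0, 0.1, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
    print(extract_key_sentences(sents, vecs, k=2))
```

Sentences are returned in their original order so that the extracted key sentences can be passed to the summary generator as a coherent input.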
Keywords/Search Tags: Text summarization, adversarial learning, Seq2Seq, TextRank, attention