
Research On Automatic Text Summarization Algorithm For Chinese And English Long Text

Posted on: 2021-05-17
Degree: Master
Type: Thesis
Country: China
Candidate: H Pan
Full Text: PDF
GTID: 2428330623468511
Subject: Engineering
Abstract/Summary:
The rapid development of the Internet and big data technologies has brought an era of information explosion, and with it a growing problem of text information overload. The Internet makes large amounts of information quickly available, but online text contains a great deal of redundant data. Automatic text summarization aims to extract the key content of a text and generate a short summary, which can effectively improve the user experience, so it has considerable research significance. Automatic text summarization based on deep learning has made good progress, but because of software and hardware requirements and model complexity, the existing algorithms still have many shortcomings on long texts, and the generated summaries rarely cover all the key information in the source text. This thesis uses deep learning to design a suitable model architecture and training strategies that improve generative (abstractive) text summarization of long input text in the single-document setting. The main work and research results of this thesis are as follows:

First, this thesis designs a generative automatic text summarization model based on the sequence-to-sequence architecture. With the help of transfer learning, it proposes a generative summarization algorithm built on a pre-trained model, which strengthens the text representation and feature extraction capabilities of the summarization model. Multi-task learning is also introduced, together with a specially designed three-stage training strategy: the first stage uses the extractive summarization task to fine-tune the encoder part of the model; the second stage uses the generative summarization task to train the entire model; the third stage uses multi-task learning to jointly model and train the extractive and generative summarization tasks. This approach achieves good results on a real dataset.

Second, for long text input, this thesis abandons the truncation strategy. Instead, an unsupervised key sentence extraction algorithm selects the key sentences and compresses the long text into a short text, and the model then generates the summary from that short text, which reduces the loss of key information from the long text.

Third, to further strengthen the model's ability to extract key information, this thesis constructs a keyword extraction dataset. The keyword extraction task is converted into a classification task, and a model built on a convolutional neural network is trained as the classifier. The extracted keywords are then used as additional input to the summarization model to optimize the calculation of the attention weight distribution and the pointer network probability.

Finally, this thesis uses open Chinese and English datasets to test and verify the effectiveness of the proposed algorithm and the improved strategies. The experimental results show that, compared with some baseline models, the proposed algorithm improves in several respects and produces better summaries.
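The abstract's idea of compressing a long text by selecting key sentences instead of truncating it can be illustrated with a minimal sketch. The thesis does not name its unsupervised key sentence extraction algorithm, so the scoring below (average word-overlap centrality, in the spirit of graph-based ranking) is only a stand-in, and the function names `split_sentences`, `tokenize`, and `compress` are hypothetical.

```python
import re


def split_sentences(text):
    """Split on Chinese and English sentence-ending punctuation (naive)."""
    parts = re.findall(r"[^.!?。！？]+[.!?。！？]?", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    """English words as tokens; each CJK character as its own token."""
    return re.findall(r"[a-z0-9]+|[\u4e00-\u9fff]", sentence.lower())


def compress(text, max_sentences=3):
    """Keep the most central sentences, in their original order.

    Assumption: centrality is the sum of Jaccard similarities to all
    other sentences. This is an illustrative choice, not the thesis's
    actual algorithm.
    """
    sentences = split_sentences(text)
    token_sets = [set(tokenize(s)) for s in sentences]
    scores = []
    for i, a in enumerate(token_sets):
        total = 0.0
        for j, b in enumerate(token_sets):
            if i != j and a and b:
                total += len(a & b) / len(a | b)
        scores.append(total)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore document order
    return " ".join(sentences[i] for i in keep)
```

In a pipeline like the one described, the compressed text returned by `compress` would be passed to the summarization model in place of a truncated prefix of the source; the model itself is outside the scope of this sketch.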
Keywords/Search Tags: deep learning, transfer learning, automatic text summarization, natural language processing