Analysis And Implementation Of Text Summarization Based On Deep Learning

Posted on: 2021-03-23
Degree: Master
Type: Thesis
Country: China
Candidate: X Liu
GTID: 2428330611480563
Subject: Electronic and communications engineering
Abstract/Summary:
With the rapid development of Internet technology and the growing popularity of social media, the volume of information of all kinds, such as news, public opinion, and trending topics, is growing rapidly, which creates a problem of information overload. As the pace of work and life accelerates, people do not have enough time to read everything their searches return. A high-quality summary is an effective way to make information acquisition more efficient. With the rise of deep learning and improvements in computer hardware, more and more researchers use deep learning to generate article summaries automatically. This thesis optimizes and improves the traditional Chinese word segmentation algorithm, proposes a Chinese summarization algorithm based on a double attention mechanism, and finally applies the improved word segmentation algorithm to the summarization algorithm.

The traditional Chinese word segmentation algorithm cannot extract local features efficiently and cannot be computed in parallel. To address these two points, this thesis proposes a Chinese word segmentation algorithm that combines a simple CNN with a Bi-LSTM. The combination addresses both the CNN's inability to extract sequence features and the Bi-LSTM's inability to extract local features and compute in parallel. In addition, a fully connected operation with shared weights ensures that classification results are obtained from context information and produces multiple outputs from a single input. Experimental results show that the algorithm is feasible and that the model's output accuracy is 98%.

Traditional Seq2Seq summary generation models mostly use an LSTM network and a single-layer attention mechanism, which makes the model slow to compute and leaves the attention matrix short of information. To address these problems, this thesis proposes the following improvements: (1) when constructing the input matrix, the improved Chinese word segmentation algorithm is used to raise segmentation accuracy and thereby the accuracy of the generated word vectors; (2) in feature extraction, even convolution kernels are used to reduce the number of parameters, speeding up model training without reducing model accuracy; (3) the attention mechanism is improved by establishing a double-layer attention mechanism, in which two different attention matrices are generated to capture global and local information and are then merged into the final attention matrix. The second-level attention matrix strengthens the weights of the words adjacent to the target word, thereby improving the accuracy of the summary.

The improved Chinese word segmentation algorithm is applied to the double-layer attention Chinese summarization algorithm, which is verified experimentally on the LCSTS dataset, with the generated summaries evaluated using the ROUGE metrics. The ROUGE-1, ROUGE-2, and ROUGE-L scores of the proposed summarization model based on the double-layer attention mechanism reach 37.8, 25.3, and 34.9, respectively, which verifies the feasibility and accuracy of the model.
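
As an illustration of the ROUGE evaluation mentioned in the abstract, the following is a minimal sketch of how ROUGE-1, ROUGE-2, and ROUGE-L can be computed for one candidate summary against one reference. It is not the thesis's evaluation code: the function names are hypothetical, the balanced F1 variant is an assumption (ROUGE implementations differ in how they weight precision and recall), and the inputs are assumed to be already tokenized, for example into characters, as is common for Chinese text. Scores here fall in the range 0 to 1, whereas reported results such as 37.8 are conventionally scaled by 100.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Balanced F1 from an overlap count (assumed variant, see lead-in)."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N: F1 over clipped n-gram matches."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # each n-gram counted at most min(cand, ref) times
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L: F1 over the longest common subsequence."""
    return f1(lcs_length(candidate, reference), len(candidate), len(reference))


if __name__ == "__main__":
    # Toy character-level example (hypothetical data, not from LCSTS).
    candidate = list("abcd")
    reference = list("abed")
    print(rouge_n(candidate, reference, 1))
    print(rouge_n(candidate, reference, 2))
    print(rouge_l(candidate, reference))
```

In the toy example, three of four unigrams match, one of three bigrams matches, and the longest common subsequence has length three, so ROUGE-2 is the lowest of the three scores, the same ordering as in the results reported above.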
Keywords/Search Tags: AI, Automatic article summary, Seq2Seq model framework, Attention, Chinese word segmentation