
Research On Automatic Text Summarization Algorithm For Chinese Long Text

Posted on: 2023-12-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Li
Full Text: PDF
GTID: 2568306836464424
Subject: Engineering
Abstract/Summary:
With Internet data growing exponentially in the era of big data, information overload has become difficult to address, and quickly mining the main information from massive data has become crucial. Automatic text summarization, an important technology in natural language processing, compresses and refines the input content by machine and outputs a short summary containing the key information. In recent years, automatic text summarization has achieved excellent results on short text. However, owing to limits on model complexity and on hardware and software resources, summaries of long text still suffer from long-distance dependency problems and semantic inaccuracy. Therefore, this paper improves the automatic text summarization model according to the characteristics of Chinese long text. The main work is as follows:

Firstly, to address the insufficient ability of traditional deep neural networks to extract text features, a generative text summarization algorithm based on CNN and Transformer is proposed. The multi-head attention mechanism in Transformer is improved: a dynamic convolutional neural network is introduced to capture the local features and latent word information of the text, and its output is combined with the original self-attention, which captures the global semantics of the text. Because the text is modeled from both local and global perspectives, the model obtains richer semantic features. To reduce out-of-vocabulary words, a pointer mechanism is added: each output word is either generated from the fixed vocabulary or copied from the original text, with the choice made by probability. This effectively improves the summary generation performance of the model.

Secondly, a three-stage text summarization algorithm for Chinese long text is proposed. To address the information loss that occurs when summarizing long text, summary generation is divided into three stages that combine the respective advantages of the extractive and generative methods. In the first stage, an improved graph model algorithm compresses the long text within a certain range and removes superfluous information unrelated to the theme. In the second stage, to learn the deep semantic features of sentences, the pre-trained model BERT is used to further extract the key sentences that are rich in topic information. In the third stage, the generative summarization model rewrites the key sentences to produce a short summary with strong semantic coherence and readability.

Thirdly, a reinforcement learning mechanism is used to improve the automatic generation of Chinese long text summaries. Building on the staged summarization method, the Actor-Critic algorithm connects the extraction model and the generation model, so that the two form an end-to-end system and are trained jointly, sharing information and advantages. The extraction model can be optimized and adjusted according to the evaluation of the generated summary, which helps the generation model output higher-quality summaries.

A series of experiments on a Chinese long text data set shows the feasibility of the proposed algorithms and optimization strategy, which increase the ROUGE score. In addition, this paper provides some new research ideas for automatic text summarization.
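
The pointer mechanism described in the first contribution can be illustrated with a small sketch. The abstract does not give the thesis's exact formulation, so the code below assumes the common pointer-generator form, in which a gate `p_gen` mixes the distribution over the fixed vocabulary with the attention weights over the source tokens. The function name `final_distribution`, the parameter names, and all numbers are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def final_distribution(p_gen, vocab_probs, attention, source_tokens):
    """Mix a generation distribution with a copy distribution.

    Assumed (not taken from the thesis) formulation:
        P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention on w

    p_gen         : probability of generating from the fixed vocabulary (0..1)
    vocab_probs   : dict mapping token -> probability over the fixed vocabulary
    attention     : attention weights, one per source token (should sum to 1)
    source_tokens : source tokens aligned with `attention`
    """
    if not 0.0 <= p_gen <= 1.0:
        raise ValueError("p_gen must lie in [0, 1]")
    if len(attention) != len(source_tokens):
        raise ValueError("attention and source_tokens must have equal length")

    mixed = defaultdict(float)
    # Generation path: scale the fixed-vocabulary distribution by p_gen.
    for token, prob in vocab_probs.items():
        mixed[token] += p_gen * prob
    # Copy path: add attention mass to each source token, so that words
    # outside the fixed vocabulary can still receive probability.
    for token, weight in zip(source_tokens, attention):
        mixed[token] += (1.0 - p_gen) * weight
    return dict(mixed)


# Toy example: "algorithm" is not in the fixed vocabulary (hypothetical data).
vocab_probs = {"summary": 0.6, "text": 0.3, "<unk>": 0.1}
source_tokens = ["long", "text", "summary", "algorithm"]
attention = [0.05, 0.05, 0.1, 0.8]

dist = final_distribution(0.3, vocab_probs, attention, source_tokens)
best = max(dist, key=dist.get)
print(best)
```

In this toy example the decoder attends strongly to the source word "algorithm", which is missing from the fixed vocabulary, and the low `p_gen` lets the copy path dominate, so that word receives the highest probability. This is the behavior the abstract attributes to the pointer mechanism for out-of-vocabulary words. How the thesis computes the gate and the attention weights is not stated here.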
Keywords/Search Tags: Text summarization, Transformer, attention mechanism, BERT, reinforcement learning