Research On Key Technologies Of Automatic Summarization Of Chinese News Documents

Posted on: 2020-07-08
Degree: Master
Type: Thesis
Country: China
Candidate: C B Li
GTID: 2428330623958503
Subject: Engineering
Abstract/Summary:
With the rapid development of the Internet, people are surrounded by a huge volume of news and suffer from information overload. How to present large amounts of lengthy news concisely has become an urgent problem. Automatic summarization is one of the core techniques for addressing it: it condenses long news texts so that readers can obtain the important information quickly and accurately, which speeds up news reading and reduces the effort spent browsing. This thesis studies single-document and multi-document automatic summarization in depth. The main work is as follows:

(1) For single-document summarization of Chinese news, the thesis improves the word representation. During data processing, additional features, namely each word's part of speech and TF-IDF value, are integrated into the word embedding, so that the vector representation of each word carries several kinds of information. This makes fuller use of the linguistic features of the text and improves the coherence of the generated news summaries.

(2) An improved attention-based sequence-to-sequence model is proposed for single-document summarization of Chinese news. The encoder uses a Bi-LSTM, the decoder uses an improved LSTM structure, and a Decoder/Pointer mechanism is added to handle out-of-vocabulary words. Experimental results show that the proposed model outperforms the comparison models on the News2016zh dataset and addresses the exploding and vanishing gradient problems of traditional recurrent neural networks. The Decoder/Pointer mechanism also alleviates the out-of-vocabulary problem during summary generation and improves the readability of the summary.

(3) For multi-document summarization of Chinese news, the thesis proposes a method based on semantic clustering and local topic matching. The method uses word vectors that carry semantic context to cluster news documents with k-means, then extracts the sentences with maximum information entropy from each local topic, producing an extractive multi-document summary. Its effectiveness is demonstrated by comparison with a baseline that extracts the first sentence of each news document and with multi-document summarization based on the LDA topic model.
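The feature-enriched word representation in item (1) can be illustrated with a minimal sketch. The abstract does not say how the part-of-speech and TF-IDF features are combined with the embedding, so plain concatenation is assumed here. The tag inventory `POS_TAGS`, the function names, and the toy inputs are all hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter

# Assumed, simplified tag inventory; a real Chinese POS tagger has many more tags.
POS_TAGS = ["n", "v", "a", "other"]


def tf_idf(docs):
    """Return one {word: tf-idf} dict per document.

    `docs` is a list of non-empty, pre-segmented documents (lists of words).
    Uses tf = count / doc_length and idf = log(N / document_frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


def pos_one_hot(tag):
    """One-hot encode a POS tag; tags outside POS_TAGS map to 'other'."""
    tag = tag if tag in POS_TAGS else "other"
    return [1.0 if tag == t else 0.0 for t in POS_TAGS]


def enrich(embedding, tag, tfidf_value):
    """Concatenate [word embedding | POS one-hot | TF-IDF scalar] (assumed scheme)."""
    return list(embedding) + pos_one_hot(tag) + [tfidf_value]
```

For example, `enrich([0.1, -0.3, 0.5], "n", 0.35)` yields an 8-dimensional vector: 3 embedding values, 4 POS slots, and 1 TF-IDF value. In a sequence-to-sequence model, such enriched vectors would replace the plain embeddings as encoder input.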
Keywords/Search Tags: automatic summarization, news summaries, sequence-to-sequence model, linguistic features, clustering