
Research On Document Update Summarization Based On Density Peaks And Latent Semantic Analysis

Posted on: 2018-10-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Guo
Full Text: PDF
GTID: 2348330512476961
Subject: Computer Science and Technology
Abstract/Summary:
Dynamic (update) summarization aims to capture the evolving content of a multi-document collection. It faces several difficult problems, such as information redundancy and the easy loss of novel information. To address these problems, this dissertation reviews state-of-the-art approaches to extractive update multi-document summarization and proposes two algorithms for update summary extraction: dynamic multi-document summarization based on density peaks, and an enhanced LSA-based approach for dynamic summarization.

The enhanced LSA-based approach incorporates bigrams when constructing the document representation matrix, which helps reduce both the sparsity of the matrix and the running time of latent semantic analysis. To estimate the redundancy and novelty of a topic more accurately, a thresholding function is constructed to filter out terms that are only weakly related to the topic, and bigram information is also used in this determination, further improving precision.

The density-peaks-based method calculates a representativeness score and a diversity score for each sentence from the similarities between sentences. To mine the update information in an event, the algorithm introduces the topic signature model to determine the novelty of sentences. Two update summarization strategies, one based on ILP and one based on a composite sentence score, are also devised so that the algorithm can generate high-quality summaries in a short time for different summary length requirements.

Experimental results show that, compared with the traditional LSA-based approach for dynamic summarization, our method reduces the time complexity of semantic data mining and improves the precision of topic novelty evaluation. The dynamic multi-document summarization method based on density peaks is a simple and effective approach that can be reproduced and deployed in real environments.
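
The abstract does not give the scoring formulas, so the following is only a minimal sketch of how representativeness and diversity scores could be derived from sentence similarities in the spirit of density peaks clustering (local density, plus distance to the nearest higher-density point). The function names `cosine` and `density_peak_scores`, the bag-of-words cosine similarity, and the `sim_threshold` parameter are all assumptions made for illustration and are not taken from the thesis.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def density_peak_scores(sentences, sim_threshold=0.2):
    """Return (representativeness, diversity), one value per sentence.

    Assumed definitions (hypothetical, not from the thesis):
    - representativeness: fraction of other sentences whose similarity
      to this sentence exceeds sim_threshold (a local-density analogue);
    - diversity: 1 minus the highest similarity to any sentence that is
      strictly more representative (a distance-to-denser-point analogue).
    """
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(bags)
    sim = [[cosine(bags[i], bags[j]) for j in range(n)] for i in range(n)]

    rep = [
        sum(1 for j in range(n) if j != i and sim[i][j] > sim_threshold)
        / max(n - 1, 1)
        for i in range(n)
    ]

    div = []
    for i in range(n):
        denser = [sim[i][j] for j in range(n) if rep[j] > rep[i]]
        if denser:
            div.append(1.0 - max(denser))
        else:
            # No sentence is more representative: use the largest
            # dissimilarity to any other sentence, as density peaks
            # clustering does for its highest-density point.
            others = [sim[i][j] for j in range(n) if j != i]
            div.append(1.0 - min(others) if others else 1.0)
    return rep, div


sentences = [
    "the flood hit the city",
    "the flood hit the town",
    "rescue teams reached the city",
    "stock markets rallied today",
]
rep, div = density_peak_scores(sentences, sim_threshold=0.4)
```

Under these assumptions, a sentence that scores high on both measures is central to one group of sentences while being dissimilar to the more central sentences elsewhere, which is the kind of candidate a composite-score selection strategy would favor. Novelty scoring with topic signatures and the ILP-based selection are not covered by this sketch.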
Keywords/Search Tags:Dynamic Summarization, LSA, Topic Signature, Density Peak, Clustering Model