
The Application Research Of Incremental Clustering For Document Update Summarization

Posted on: 2016-03-09; Degree: Master; Type: Thesis
Country: China; Candidate: H R Guo; Full Text: PDF
GTID: 2308330461459378; Subject: Computer application technology
Abstract/Summary:
Forums, blogs, news sites, online reviews, and other new network media that emerged in the Web 2.0 era produce massive amounts of document information every day, so an efficient way is needed to extract the important content, remove redundant information, and present concise, refined content to readers. Dynamic summarization technology, which updates document summaries in real time to help readers follow the latest events they are interested in, has become a new research hotspot. Extracting a dynamic summary must ensure both quality and efficiency. Most current research is based on a batch mode, in which a collection of documents is processed as a unit. In practical applications such as news updates, disaster reports, and public opinion analysis systems, the document data stream is unstable, so a dynamic and efficient way to extract summaries under stream-based processing needs to be studied.

To solve these problems, this paper proposes two incremental clustering algorithms for dynamic multi-document summarization: one based on an improved K-means algorithm, and one based on a KNN incremental graph clustering algorithm.

In dynamic multi-document summarization, document clustering divides the documents into sub-themes, and these sub-themes are latent. To address the shortcoming that traditional K-means requires the number of clusters to be set manually, the first algorithm implements an adaptive initial center selection method and scores sentence nodes by usefulness to decide which sentences to delete, thereby achieving incremental clustering over the stream. The second algorithm builds a sentence graph model based on KNN, uses density-based clustering to partition the sentences into classes, combines a time factor with graph node weights to select summary sentences, and truncates the result to the required length to produce the final summary.
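The abstract does not give the details of the KNN sentence graph, so the following is only a minimal sketch of how such a graph might be maintained incrementally as sentences arrive. The class name `KnnSentenceGraph`, the bag-of-words cosine similarity, and the whitespace tokenization are all assumptions made for illustration and are not taken from the thesis.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class KnnSentenceGraph:
    """Hypothetical incremental KNN graph over sentences (illustration only)."""

    def __init__(self, k: int = 2):
        self.k = k
        self.vectors = []    # one Counter per sentence, indexed by sentence id
        self.neighbors = []  # neighbors[i] = list of (similarity, j), best first

    def add(self, sentence: str) -> int:
        """Insert one sentence from the stream and update affected neighbor lists."""
        vec = Counter(sentence.lower().split())  # assumed: whitespace tokens
        new_id = len(self.vectors)
        sims = [(cosine(vec, other), j) for j, other in enumerate(self.vectors)]
        self.vectors.append(vec)
        # Neighbors of the new node: its k most similar existing sentences.
        linked = sorted((s for s in sims if s[0] > 0), reverse=True)
        self.neighbors.append(linked[: self.k])
        # Existing nodes may adopt the new sentence as a closer neighbor.
        for sim, j in sims:
            if sim > 0:
                merged = sorted(self.neighbors[j] + [(sim, new_id)], reverse=True)
                self.neighbors[j] = merged[: self.k]
        return new_id
```

Each call to `add` compares the new sentence only against sentences already stored, so the graph grows with the stream instead of being rebuilt per batch. A density-based clustering step could then run over the `neighbors` lists; that step is not shown here.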
Finally, this paper implements a prototype update summarization system for Chinese public opinion based on KNN incremental graph clustering.

The main contributions of this paper are as follows. First, it proposes two new update summarization algorithms based on incremental clustering. Second, it achieves update summarization over a data stream: as documents arrive, the summary content is updated in real time. Third, it proposes a new sentence weight computation method based on the four characteristics of dynamic summarization: importance, topic relevance, low redundancy, and novelty. Experiments on the TAC data sets and on Chinese public opinion data demonstrate the validity of the two algorithms; of the two, update summarization based on the KNN incremental graph clustering algorithm yields better summary quality.
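The abstract names the ingredients of sentence selection (graph node weight, a time factor, low redundancy, and a length limit) but not the formula. The sketch below shows one plausible way to combine them. The exponential time decay, the Jaccard redundancy check, the thresholds, and every name in it (`Candidate`, `score`, `select_summary`) are assumptions for illustration rather than the thesis's method. Topic relevance is not modeled here.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    node_weight: float  # assumed: importance taken from the sentence graph
    age_hours: float    # assumed: time since the sentence arrived in the stream


def score(c: Candidate, half_life_hours: float = 24.0) -> float:
    """Assumed scoring: graph weight discounted by exponential time decay."""
    return c.node_weight * math.pow(0.5, c.age_hours / half_life_hours)


def jaccard(a: str, b: str) -> float:
    """Word-set overlap, used here as a simple redundancy measure."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def select_summary(cands, max_words=100, max_overlap=0.5):
    """Greedily pick high-scoring, non-redundant sentences within a word budget."""
    summary, used = [], 0
    for c in sorted(cands, key=score, reverse=True):
        n_words = len(c.text.split())
        if used + n_words > max_words:
            continue  # would exceed the length limit
        if any(jaccard(c.text, s.text) > max_overlap for s in summary):
            continue  # too similar to a sentence already selected
        summary.append(c)
        used += n_words
    return summary
```

Newer sentences keep more of their graph weight, which favors novelty, while the overlap check keeps near-duplicates out of the final summary.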
Keywords/Search Tags: K-means, incremental clustering, document update summarization, k-nearest neighbor, sentence graph model