Design And Implementation Of Multi-Documents Clustering And Summarization On Single-Event News

Posted on:2015-02-11

Degree:Master

Type:Thesis

Country:China

Candidate:D J Zhang

Full Text:PDF

GTID:2268330428961662

Subject:Computer technology

Abstract/Summary:

With more and more online news sites emerging, people are flooded with large volumes of unsorted news and find it increasingly hard to keep up with how fast information is updated. There is therefore an urgent need for a news-browsing system that not only gathers articles from the major news sites but also classifies and summarizes them. Such an aggregation tool would save readers considerable time, letting them focus quickly on the topics they care about and read a short list of refined articles.

Based on a study of related techniques in topic detection and multi-document summarization, this thesis builds a prototype system that integrates single-event clustering and summarization. The system focuses on three parts: news classification, single-event clustering, and multi-document summarization of single events. The main work of this thesis covers two aspects.

First, this thesis implements the main algorithms of the single-event clustering module. After an in-depth study of the theory of LDA (Latent Dirichlet Allocation), it combines the VSM (vector space model) with the LDA model to compute the similarity between two news articles. Using this combined similarity, a KNN classifier with similarity-weighted voting is implemented to classify the news collection. The combined similarity is also used by Single-Pass, which clusters the classified news into single events. Experiments were conducted to evaluate the improved KNN and Single-Pass algorithms.

Second, this thesis builds a multi-document summarization system, described below. In the text representation stage, HowNet is introduced into the traditional VSM: it is used to compute the semantic similarity between words, and words whose similarity exceeds a fixed threshold are merged as synonyms. The result is an improved VSM on which all subsequent computation is based.
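The clustering stage from the first aspect (one combined VSM/LDA similarity feeding both KNN and Single-Pass) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not say how the two similarities are combined, so the linear weight `lam`, the use of 1 minus the Hellinger distance for topic similarity, the centroid-based Single-Pass, and the threshold `0.6` are all hypothetical choices. The TF-IDF dicts and topic distributions are written by hand; in practice they would come from a TF-IDF model and LDA inference.

```python
import math
from collections import defaultdict

# Each document is a pair (tfidf, topics): a sparse TF-IDF dict for the VSM
# and an LDA topic distribution (a list summing to 1). Both are assumed to be
# precomputed; here they are written by hand for illustration.

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def topic_sim(p, q):
    """1 - Hellinger distance between two topic distributions; lies in [0, 1]."""
    h = math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2)
    return 1.0 - h

def combined_sim(d1, d2, lam=0.5):
    """Assumed linear mix of VSM cosine similarity and LDA topic similarity."""
    return lam * cosine(d1[0], d2[0]) + (1 - lam) * topic_sim(d1[1], d2[1])

def knn_classify(doc, train, k=3):
    """Similarity-weighted KNN: each of the k nearest labelled documents votes with its similarity."""
    scored = sorted(((combined_sim(doc, d), label) for d, label in train), reverse=True)[:k]
    votes = defaultdict(float)
    for s, label in scored:
        votes[label] += s
    return max(votes, key=votes.get)

def centroid(members):
    """Mean TF-IDF vector and mean topic distribution of a cluster."""
    n = len(members)
    tfidf = defaultdict(float)
    for vec, _ in members:
        for t, w in vec.items():
            tfidf[t] += w / n
    topics = [sum(m[1][i] for m in members) / n for i in range(len(members[0][1]))]
    return dict(tfidf), topics

def single_pass(docs, threshold=0.6):
    """Single-Pass: add each document to the most similar cluster centroid, or open a new cluster."""
    clusters = []  # each cluster is a list of indices into docs
    for i, doc in enumerate(docs):
        best, best_sim = None, -1.0
        for cluster in clusters:
            s = combined_sim(doc, centroid([docs[j] for j in cluster]))
            if s > best_sim:
                best, best_sim = cluster, s
        if best is not None and best_sim >= threshold:
            best.append(i)
        else:
            clusters.append([i])
    return clusters

# Hypothetical toy data: two reports of the same event and one unrelated story.
docs = [
    ({"earthquake": 1.0, "rescue": 0.5}, [0.9, 0.1]),
    ({"earthquake": 0.8, "rescue": 0.6}, [0.85, 0.15]),
    ({"election": 1.0, "vote": 0.7}, [0.1, 0.9]),
]
train = [(docs[0], "disaster"), (docs[2], "politics")]
query = ({"earthquake": 1.0}, [0.8, 0.2])
label = knn_classify(query, train, k=2)
clusters = single_pass(docs)
print(label, clusters)
```

Routing both algorithms through the same `combined_sim` function keeps classification and event clustering consistent, which reflects the abstract's point that one similarity measure drives both stages. Swapping in a different topic-similarity measure or weighting only requires changing that one function.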
In the sentence weighting stage, several sentence features are combined with LexRank to compute a weight for each sentence, and the sentences are ranked by weight. In the sentence extraction stage, MMR (Maximal Marginal Relevance) is used to produce a non-redundant summary, and a set of simple ordering rules determines the order in which the selected sentences are output.
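The summarization pipeline from the second aspect (synonym merging on the VSM, feature-weighted LexRank, then MMR extraction) might look roughly like the sketch below. The abstract does not name the sentence features, weights, or ordering rules, so everything specific here is an assumption: `toy_word_sim` is a hypothetical stand-in for HowNet word similarity, the only sentence feature is a position bonus `1/(1 + pos)`, the weights `alpha=0.5` and `lam=0.5` are arbitrary, and the ordering rule simply keeps the original sentence order. Tokenization is plain whitespace splitting; Chinese text would first need word segmentation.

```python
import math
from collections import Counter

# Hypothetical stand-in for HowNet-based word similarity (NOT a HowNet API).
TOY_SIMILARITY = {frozenset(("earthquake", "quake")): 0.9}

def toy_word_sim(a, b):
    return 1.0 if a == b else TOY_SIMILARITY.get(frozenset((a, b)), 0.0)

def build_synonym_map(vocab, word_sim, threshold=0.8):
    """Map each word to the first earlier representative whose similarity reaches the threshold."""
    reps, mapping = [], {}
    for w in vocab:
        for r in reps:
            if word_sim(w, r) >= threshold:
                mapping[w] = r
                break
        else:
            reps.append(w)
            mapping[w] = w
    return mapping

def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def lexrank(vectors, damping=0.85, iters=50):
    """Continuous LexRank: power iteration over the row-normalised cosine-similarity graph."""
    n = len(vectors)
    sim = [[cosine(u, v) for v in vectors] for u in vectors]
    row_sums = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[j] * sim[j][i] / row_sums[j] for j in range(n) if row_sums[j] > 0)
            for i in range(n)
        ]
    return scores, sim

def sentence_weights(lex_scores, positions, alpha=0.5):
    """Assumed feature mix: max-normalised LexRank score plus a position bonus 1/(1 + pos)."""
    top = max(lex_scores)
    return [alpha * s / top + (1 - alpha) / (1 + p) for s, p in zip(lex_scores, positions)]

def mmr_select(weights, sim, budget, lam=0.5):
    """Greedy MMR: trade sentence weight off against similarity to sentences already chosen."""
    selected, candidates = [], list(range(len(weights)))
    while candidates and len(selected) < budget:
        def mmr_score(i):
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return lam * weights[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)  # ties go to the earliest sentence
        selected.append(best)
        candidates.remove(best)
    return sorted(selected)  # assumed ordering rule: keep original sentence order

# Hypothetical sentences from two reports: (text, position within its own document).
sentences = [
    ("earthquake hits city", 0),
    ("quake hits city center", 0),
    ("rescue teams arrive at city", 1),
    ("officials report damage", 2),
]
vocab = list(dict.fromkeys(t for text, _ in sentences for t in text.split()))
synonyms = build_synonym_map(vocab, toy_word_sim)
vectors = [Counter(synonyms[t] for t in text.split()) for text, _ in sentences]
lex, sim = lexrank(vectors)
weights = sentence_weights(lex, [pos for _, pos in sentences])
summary = mmr_select(weights, sim, budget=2)
for i in summary:
    print(sentences[i][0])
```

Here "quake" is merged into "earthquake", so the first two sentences become near-duplicates in the VSM. MMR then penalizes the second one for redundancy and picks the unrelated damage report instead, which is the non-redundancy behaviour the abstract attributes to MMR.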

Keywords/Search Tags:

multi-document summarization, KNN, Single-Pass, Latent Dirichlet Allocation, LexRank

Related items

1	Study On The Text Representation Of Extraction-based Multi-documents Summarization
2	Research On The Topic-oriented Summarization For Web Documents
3	Research On Key Techniques Of Multiple Documents Automatic Summarization
4	Research And Apply On Patient Record Text Mining Based On Latent Semantic Analysis
5	The Research Of Automatic Single Text Summarization Based On Latent Semantic Analysis
6	Research On The Algorithms For Automatic Summarization Of Single Text Documents In Uyghur
7	Sentence Extraction For Multi-Document Summarization Based On Topic Model And Semantics
8	Study On Multi-Document Summarization Algorithm Based On Fusing Topic Sentences Semantic
9	Research On News Summarization Based On Multi-granularity Latent Semantic Model And Submodular Maximization
10	Chinese News Text Opinion Summarization Based On Integrating Sentences Opinion And Topic Similarity