Comparative Analysis Of Document Clustering Algorithms For Large Data Sets And Their Application In Sense Induction

Posted on:2011-09-18

Degree:Master

Type:Thesis

Country:China

Candidate:M Xi

Full Text:PDF

GTID:2248330395457678

Subject:Computer software and theory

Abstract/Summary:

Document clustering is a hot topic in data mining. With the growth of the Internet, people face an ever-increasing amount of poorly structured information. The motivation for document clustering is to organize this information so that users can find what they need.

Automatically organizing documents into clusters requires preprocessing the documents, performing cluster analysis, and evaluating the results. The key techniques involved include the vector space model (VSM), cluster analysis, and cluster evaluation.

At present, much of the research on document clustering only compares and analyzes a single clustering algorithm and its improvements on document data sets. There is no report on the relative performance of different clustering algorithms. This thesis compares and analyzes partition-based, hierarchy-based, and density-based algorithms and gives a detailed report.

Many clustering algorithms are not suited to large data sets, and some, such as agglomerative clustering, cannot be run on them at all. Yet the growth of the Internet confronts document clustering with large data sets. In the final part of the thesis, we propose an algorithm that can be used on large data sets. The algorithm improves clustering speed without degrading clustering performance.

Document clustering algorithms can be used in many tasks, for example word sense induction, which automatically clusters the senses of an ambiguous word. At the end of the thesis, we show the application of document clustering algorithms to the sense induction task.
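The pipeline named in the abstract (vector space model, cluster analysis, cluster evaluation) can be illustrated with a small sketch. This is not the algorithm proposed in the thesis, whose details are not given here. It assumes TF-IDF weighting with smoothed IDF, cosine similarity, a spherical variant of K-means with hand-picked seed documents, and purity as the evaluation measure. The function names, the toy contexts for the ambiguous word "bank", and the gold sense labels are all hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map tokenized documents to unit-length sparse TF-IDF vectors (a simple VSM)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumption; the thesis does not specify its weighting).
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (their dot product)."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def centroid(members):
    """Unit-length mean direction of a non-empty list of sparse vectors."""
    total = Counter()
    for vec in members:
        for t, w in vec.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values()))
    return {t: w / norm for t, w in total.items()}

def kmeans(vectors, k, seeds, max_iter=20):
    """Spherical K-means: assign each vector to its most similar centroid."""
    centroids = [vectors[i] for i in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = centroid(members)
    return labels

def purity(labels, gold):
    """Fraction of documents that carry the majority gold label of their cluster."""
    clusters = {}
    for l, g in zip(labels, gold):
        clusters.setdefault(l, Counter())[g] += 1
    return sum(max(c.values()) for c in clusters.values()) / len(labels)

# Hypothetical contexts of the ambiguous word "bank" (toy sense-induction data).
docs = [
    "the bank approved the loan and the mortgage".split(),
    "the bank raised the interest rate on the loan".split(),
    "we walked along the river bank near the water".split(),
    "the river bank was muddy after the water rose".split(),
]
gold = ["finance", "finance", "river", "river"]

labels = kmeans(tfidf_vectors(docs), k=2, seeds=[0, 2])
print(labels, purity(labels, gold))
```

Each iteration of this sketch compares every document with every centroid, so the cost per pass grows linearly with the number of documents. Agglomerative clustering, by contrast, typically starts from all pairwise similarities, which is one reason it becomes impractical on large data sets.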

Keywords/Search Tags:

Vector Space Model, K-means, Agglomerative Hierarchical Clustering, Cluster Evaluation

Related items

1	Efficient Algorithms for Hierarchical Agglomerative Clustering
2	A Document Clustering Method Based On Affinity Propagation And Agglomerative Hierarchical Clustering
3	Study Of Chinese Text Clustering On Improved K-means Algorithm
4	The Network Public Opinion Monitoring System Research And Exploitation
5	Application and evaluation of Hierarchical Agglomerative Clustering in Wireless Sensor Networks
6	Micro-blog Hot Topics Detection Method Based On Hybrid Clustering
7	The Research On Bag-of-Words Based On Improved K-means And Hierarchical Cluster Algorithms
8	Research On Gauss Mixture Clustering Algorithms In Image Retrieval
9	Research And Implementation On The E-mail Classification System Based On Text Clustering Technology
10	The Study And Development Of Hierarchical-K-means-Based Clustering Algorithm