Comparative Analysis of Document Clustering Algorithms for Large Data Sets and Their Application in Sense Induction

Posted on: 2011-09-18  Degree: Master  Type: Thesis
Country: China  Candidate: M Xi  Full Text: PDF
GTID: 2248330395457678  Subject: Computer software and theory
Abstract/Summary:
Document clustering is an active topic in data mining. With the growth of the Internet, people face ever more poorly structured information. The motivation for document clustering is to organize this information so that users can find what they need.

Automatically organizing documents into clusters requires preprocessing the documents, performing cluster analysis, and evaluating the results. The key techniques involved include the vector space model (VSM), cluster analysis, and cluster evaluation.

At present, much of the research on document clustering only compares and analyzes a single clustering algorithm and its improvements on a document dataset. There has been no report on the relative performance of different clustering algorithms. This thesis compares and analyzes partition-based, hierarchy-based, and density-based algorithms and gives a detailed report.

Many clustering algorithms are not suited to large data sets, and some, such as agglomerative clustering, cannot be run on them at all. Yet the growth of the Internet confronts document clustering with large data sets. In the final part of the thesis, we propose an algorithm that can be used on large data sets. The algorithm improves clustering speed without degrading clustering performance.

Document clustering algorithms can be used in many tasks, such as sense induction, which automatically clusters the senses of an ambiguous word. In the final part of the thesis, we also show the application of document clustering to the sense induction task.
Keywords/Search Tags: Vector Space Model, K-means, Agglomerative Hierarchical Clustering, Cluster Evaluation
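
The pipeline named in the abstract (vector space model, cluster analysis, cluster evaluation) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the toy corpus, the fixed seed documents, and the choice of cosine-based k-means and purity as the evaluation measure are all assumptions made here for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each tokenized document to a unit-length sparse TF-IDF vector."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: c * math.log(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def cosine(a, b):
    """Sparse dot product; equals cosine similarity for unit-length vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def kmeans(vectors, seeds, iterations=10):
    """k-means with cosine similarity; `seeds` are indices of initial centroids."""
    centroids = [dict(vectors[i]) for i in seeds]
    labels = None
    for _ in range(iterations):
        new_labels = [
            max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for k in range(len(centroids)):
            members = [v for v, l in zip(vectors, labels) if l == k]
            if not members:
                continue  # keep the previous centroid for an empty cluster
            total = {}
            for v in members:
                for t, w in v.items():
                    total[t] = total.get(t, 0.0) + w
            norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
            centroids[k] = {t: w / norm for t, w in total.items()}
    return labels

def purity(labels, gold):
    """Fraction of documents that fall in the majority gold class of their cluster."""
    by_cluster = {}
    for l, g in zip(labels, gold):
        by_cluster.setdefault(l, Counter())[g] += 1
    return sum(max(c.values()) for c in by_cluster.values()) / len(labels)

# Toy corpus (invented for illustration): three "fruit" and three "tech" documents.
docs = [
    "apple banana fruit juice".split(),
    "banana fruit salad apple".split(),
    "fruit juice apple".split(),
    "cpu memory disk computer".split(),
    "computer memory network".split(),
    "disk cpu computer network".split(),
]
gold = ["fruit"] * 3 + ["tech"] * 3

labels = kmeans(tfidf_vectors(docs), seeds=[0, 3])
print(labels, purity(labels, gold))
```

Each k-means iteration costs time roughly proportional to the number of documents times the number of clusters, whereas a naive agglomerative approach needs all pairwise similarities, which is one reason the latter becomes impractical on large collections.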