A Document Clustering Method Based On Affinity Propagation And Agglomerative Hierarchical Clustering

Posted on: 2011-11-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y C He
Full Text: PDF
GTID: 2178330338981048
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, the volume of online data is growing explosively, and obtaining useful information from it quickly and accurately has become a major concern. As a key technology for processing and organizing large amounts of text data, document clustering can help address the problems caused by this information explosion. It also has broad application prospects in information filtering, information retrieval, text databases, and digital libraries.

Affinity propagation is a recently proposed clustering method known for its fast convergence and good clustering results. It has performed well in applications such as image categorization, gene expression analysis, and speaker clustering, but it has rarely been applied to Chinese document clustering. This thesis applies it to Chinese documents and proposes a novel document clustering method based on affinity propagation and agglomerative hierarchical clustering (APAHC). The study covers the following:

(1) The thesis analyzes the problems that arise when affinity propagation is used to cluster documents and proposes a new document clustering method. The method first selects features from the clusters produced by affinity propagation and then applies agglomerative hierarchical clustering to refine the affinity propagation result.

(2) The study introduces an incremental clustering method based on data division, built on the APAHC algorithm. To cluster large-scale data, the method partitions a large data set into several small data sets, clusters each small data set, and finally merges the clustering results of the small data sets.

(3) The study designs and implements an online news clustering system based on the APAHC algorithm to detect hot events and important news each day.

Document clustering experiments were performed on four data sets. Compared with K-Means clustering, agglomerative hierarchical clustering, and affinity propagation clustering, APAHC obtained the best results on all data sets.
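
The two-stage idea in item (1) can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not specify the similarity measure, the preference value, the linkage criterion, or the feature selection step, so the sketch assumes cosine similarity, a median preference, and average linkage, and it omits feature selection entirely. All function names (`cosine`, `affinity_propagation`, `agglomerate`, `apahc`) and the toy term-count vectors are hypothetical.

```python
import math
from statistics import median


def cosine(u, v):
    """Cosine similarity between two term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def affinity_propagation(S, damping=0.5, iters=200):
    """Responsibility/availability message passing; S[k][k] is the preference.
    Returns clusters as lists of document indices."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iters):
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            second = max(v for k, v in enumerate(vals) if k != best)
            for k in range(n):
                new = S[i][k] - (second if k == best else vals[best])
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            others = sum(pos) - pos[k]  # positive support from points other than k
            for i in range(n):
                new = others if i == k else min(0.0, R[k][k] + others - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    score = [R[k][k] + A[k][k] for k in range(n)]
    exemplars = [k for k in range(n) if score[k] > 0]
    if not exemplars:  # fallback (assumption): keep the single best candidate
        exemplars = [max(range(n), key=score.__getitem__)]
    groups = {e: [] for e in exemplars}
    for i in range(n):
        e = i if i in groups else max(exemplars, key=lambda x: S[i][x])
        groups[e].append(i)
    return list(groups.values())


def agglomerate(clusters, S, k):
    """Merge the most similar pair of clusters (average linkage) until k remain."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > k:
        best, pair = -math.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(S[i][j] for i in clusters[a] for j in clusters[b])
                link = total / (len(clusters[a]) * len(clusters[b]))
                if link > best:
                    best, pair = link, (a, b)
        a, b = pair
        clusters[a].extend(clusters.pop(b))
    return clusters


def apahc(vectors, k):
    """Affinity propagation first, then agglomerative refinement down to k clusters."""
    n = len(vectors)
    S = [[cosine(u, v) for v in vectors] for u in vectors]
    pref = median(S[i][j] for i in range(n) for j in range(n) if i != j)
    for i in range(n):
        S[i][i] = pref  # assumed preference: median pairwise similarity
    return agglomerate(affinity_propagation(S), S, k)


# Toy term-count vectors for six documents (made-up data for illustration).
docs = [
    [3, 1, 0, 0], [2, 2, 0, 0], [4, 1, 1, 0],
    [0, 0, 2, 3], [0, 1, 3, 2], [0, 0, 1, 4],
]
for cluster in apahc(docs, k=2):
    print(sorted(cluster))
```

In this sketch the agglomerative step only merges clusters that affinity propagation has already formed, so it can reduce an over-fragmented result to the requested number of clusters but cannot split a cluster. If affinity propagation returns fewer than `k` clusters, the result is returned unchanged.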
Keywords/Search Tags: affinity propagation clustering, agglomerative hierarchical clustering, document clustering