A Clustering Algorithm Based On Dynamic Cluster Center Shifting

Posted on: 2006-08-23
Degree: Master
Type: Thesis
Country: China
Candidate: L S Li
GTID: 2168360152466594
Subject: Computer application technology
Abstract/Summary:
With the rise of the Internet, information has accumulated rapidly, and extracting useful knowledge from this mass of information is a real challenge for researchers. Because most information takes the form of text, a number of methods have been developed to classify text. Among the common text mining techniques, text clustering is the most frequently used because of its simplicity and efficiency. Focusing on text information processing, the thesis studies text clustering at two levels, theory and application.

First, the thesis introduces the background of text clustering: the common approaches to text mining, the role of text clustering within text mining, the data structures and measures used in text clustering, and a brief overview of recent developments in the field. Second, after analyzing the high-dimensional and sparse nature of text data, the thesis discusses two feature description methods, CF and SFV, and the algorithms that apply them, CFK-means and CABOSFV.

However, the k-means algorithm is sensitive to the initial partition: without a suitable initial partition, it is easily trapped in a locally optimal solution. To overcome this weakness, the thesis develops a novel algorithm based on the idea of actively shifting cluster centers. By repeatedly measuring and comparing the similarity among the different cluster centers and the documents to be classified, the algorithm applies a coalition strategy to obtain a better initial partition. The algorithm is then applied to high-dimensional text clustering.

The core of the thesis is the improvement of text clustering. Numerical experiments show that the new algorithm provides the k-means method with a suitable initial partition and thus yields a desirable clustering result.
Finally, by analyzing the high-dimensional and sparse nature of text, the thesis improves the clustering feature (CF), applies the new algorithm to high-dimensional text clustering, and obtains a better result.
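
The abstract does not spell out the center-shifting procedure, so the following is only a minimal sketch of the general idea it describes: compare cluster centers by similarity, merge the closest ones to obtain an initial partition, and then hand that partition to k-means. The function names (`cosine`, `merged_initial_centers`, `kmeans`), the use of cosine similarity, the greedy pairwise merge rule, and the toy term-count vectors are all assumptions made for illustration and are not taken from the thesis.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length term vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def merged_initial_centers(docs, k):
    """Assumed initializer (not the thesis's exact procedure).

    Start with one center per document and repeatedly merge the two most
    similar centers until k remain (k >= 1). Each cluster is stored as a
    (sum vector, member count) pair; cosine similarity is scale-invariant,
    so comparing sum vectors is equivalent to comparing mean vectors.
    """
    clusters = [(list(d), 1) for d in docs]
    while len(clusters) > k:
        best_pair, best_sim = None, -1.0
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(clusters[i][0], clusters[j][0])
                if sim > best_sim:
                    best_pair, best_sim = (i, j), sim
        i, j = best_pair
        merged_sum = [a + b for a, b in zip(clusters[i][0], clusters[j][0])]
        clusters[i] = (merged_sum, clusters[i][1] + clusters[j][1])
        del clusters[j]
    # Return the mean vector of each remaining cluster as an initial center.
    return [[x / n for x in total] for total, n in clusters]


def kmeans(docs, centers, iters=20):
    """Plain k-means with cosine similarity, started from the given centers."""
    labels = []
    for _ in range(iters):
        # Assign each document to its most similar center.
        labels = [
            max(range(len(centers)), key=lambda c: cosine(d, centers[c]))
            for d in docs
        ]
        # Recompute each center as the mean of its members.
        new_centers = []
        for c in range(len(centers)):
            members = [d for d, lab in zip(docs, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Toy term-count vectors (hypothetical data): two topics, three documents each.
docs = [
    [3, 2, 0, 0], [2, 3, 0, 0], [4, 1, 0, 1],
    [0, 0, 3, 2], [0, 1, 2, 3], [0, 0, 4, 1],
]
initial = merged_initial_centers(docs, k=2)
labels, centers = kmeans(docs, initial)
```

The pairwise merge shown here costs quadratic time per step in the number of centers, so it only suits small examples; it is meant to illustrate why a similarity-driven initial partition can keep k-means away from a poor local optimum, not to reproduce the thesis's results.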
Keywords/Search Tags: Text Clustering, high dimension and sparse, Clustering Feature (CF), cluster-center-shifting