A Clustering Algorithm Based On Dynamic Cluster Center Shifting

Posted on:2006-08-23

Degree:Master

Type:Thesis

Country:China

Candidate:L S Li

GTID:2168360152466594

Subject:Computer application technology

Abstract/Summary:

With the arrival of the Internet, information has accumulated rapidly. Extracting useful knowledge from this mass of information is a real challenge for researchers. Because most of this information is in text form, a number of methods have been developed to classify text. Among the common text mining techniques, text clustering is the most frequently used because of its simplicity and efficiency. Focusing on text information processing, the thesis studies text clustering at two levels, theory and application.

First, the thesis introduces the background of text clustering: the common approaches to text mining, the role of text clustering within text mining, the data structures and similarity measures used in text clustering, and a brief survey of recent developments in the field.

Second, after analyzing the high dimensionality and sparsity of text data, the thesis discusses two feature description methods, CF and SFV, and the algorithms built on them, CFK-means and CABOSFV. However, the k-means algorithm is sensitive to the initial partition: without a suitable initial partition, it is easily trapped in a local optimum. To overcome this weakness, the thesis develops a novel algorithm based on the idea of actively shifting cluster centers. By repeatedly measuring and comparing the similarity between the different cluster centers and the documents to be classified, the algorithm applies a merging (coalition) strategy to obtain a better initial partition. The algorithm is then applied to high-dimensional text clustering.

The core of the thesis is this improvement to text clustering. Numerical experiments show that the new algorithm works well in providing the k-means method with a suitable initial partition, and thus yields a desirable clustering result.

Finally, by analyzing the high dimensionality and sparsity of text, the thesis improves the clustering feature (CF), applies the new algorithm to high-dimensional text clustering, and obtains a better result.
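
The abstract does not give the details of the center-shifting algorithm, so the following Python sketch is only an illustration of the general idea of obtaining an initial partition by merging similar cluster centers. Everything in it is an assumption made for the example and not the thesis's actual method: documents are sparse {term: weight} dictionaries, similarity is cosine similarity, each document starts as its own center, and the two most similar centers are merged into their size-weighted mean until k centers remain. The function names (cosine, merge, initial_centers, assign) are hypothetical.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def merge(c1, n1, c2, n2):
    """Size-weighted mean of two centers (assumed merge rule).

    The merged center shifts toward whichever center represents more documents.
    """
    total = n1 + n2
    terms = set(c1) | set(c2)
    center = {t: (c1.get(t, 0.0) * n1 + c2.get(t, 0.0) * n2) / total for t in terms}
    return center, total


def initial_centers(docs, k):
    """Hypothetical merge-based seeding: one center per document,
    then merge the most similar pair of centers until k remain."""
    centers = [(dict(d), 1) for d in docs]
    while len(centers) > k:
        i, j = max(
            combinations(range(len(centers)), 2),
            key=lambda p: cosine(centers[p[0]][0], centers[p[1]][0]),
        )
        merged = merge(*centers[i], *centers[j])
        centers = [c for idx, c in enumerate(centers) if idx not in (i, j)]
        centers.append(merged)
    return [c for c, _ in centers]


def assign(docs, centers):
    """Assign each document to its most similar center (the initial partition)."""
    return [max(range(len(centers)), key=lambda c: cosine(d, centers[c])) for d in docs]


if __name__ == "__main__":
    docs = [
        {"apple": 1, "fruit": 1},
        {"apple": 1, "pie": 1},
        {"apple": 2, "fruit": 1},
        {"gpu": 1, "kernel": 1},
        {"gpu": 1, "cuda": 1},
    ]
    seeds = initial_centers(docs, k=2)
    print(assign(docs, seeds))
```

The resulting centers would then serve as the starting point for ordinary k-means iterations in place of randomly chosen seeds. The pairwise search makes this sketch cubic in the number of documents, so it is suitable only for illustrating the idea on small inputs.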

Keywords/Search Tags:

Text clustering, high-dimensional and sparse data, clustering feature (CF), cluster-center shifting

Related items

1	Precise Clustering Algorithm For Chinese Text Based On K-means
2	Knn Text Classification Algorithm Based On The Semantics Of The Center
3	A Research Of Developed Algorithms About Text Cluster Center Choose
4	Research On Clustering Algorithm Of K-medoids And Its Application In Text Clustering
5	Study On Several Issues Of Text Clustering
6	Research And Implementation Of Text Clustering Based On Dk-means
7	Research And Implementation Of Text Clustering Based On DK-Means
8	Research On The Key Technology Of Text Clustering
9	Research On Mixed Attribute Clustering Technology Based On Cluster Center Selection Strategy
10	Research On Dimension Reduction Algorithms For Preserving Clustering Structures