
Research And Implementation Of Text Clustering Based On Fuzzy C-Means Clustering Algorithm

Posted on: 2014-02-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Shang
Full Text: PDF
GTID: 2248330395480920
Subject: Computer software and theory
Abstract/Summary:
Text on the Internet, typified by news, is growing rapidly, and effective text clustering has become a research focus in data mining. Fuzzy clustering describes the membership of each sample in each class as a matter of degree rather than a hard assignment, so it reflects the real world more objectively, and it has gradually become the mainstream of cluster analysis. The fuzzy c-means clustering algorithm (FCM) is the most popular of the objective-function-based algorithms. However, FCM requires the number of clusters to be known in advance and is sensitive to the initial cluster centers.

To overcome these shortcomings of FCM, this thesis does the following work:
1) We propose the concept of an adjacent group, which effectively reduces the size of large data sets.
2) We introduce an evaluation model of clustering validity to compute a comprehensive index value for a clustering, and then optimize the cluster centers with a genetic algorithm, iteratively obtaining the number of clusters and the optimal initial cluster centers.
3) We use the concepts of adjacent group and adhesion degree to modify the membership function of FCM, which accelerates the convergence of the algorithm.

Finally, we design and implement a text clustering system based on the improved FCM algorithm. The experimental results show that the improved FCM algorithm has a clear advantage when applied to Chinese text clustering.
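For orientation, the baseline FCM iteration that the abstract takes as its starting point can be sketched as follows. This is the textbook algorithm only, not the improved variant proposed in the thesis: it contains no adjacent groups, adhesion degree, validity index, or genetic algorithm. The function name `fcm`, the fuzzifier default `m = 2.0`, and the stopping tolerance are assumptions chosen for illustration. Note that the caller must supply the initial centers, and therefore the number of clusters, which are exactly the two weaknesses the abstract describes.

```python
import math


def fcm(points, init_centers, m=2.0, max_iter=100, tol=1e-6):
    """Textbook fuzzy c-means (illustrative sketch, not the thesis's method).

    points:       list of equal-length numeric tuples
    init_centers: initial cluster centers; their count fixes the number of clusters
    Returns (centers, u), where u[i][j] is the membership of points[i] in
    cluster j, computed against the centers of the last iteration.
    """
    centers = [list(v) for v in init_centers]
    c = len(centers)
    dim = len(points[0])
    power = 2.0 / (m - 1.0)
    u = []
    for _ in range(max_iter):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        u = []
        for p in points:
            dists = [math.dist(p, v) for v in centers]
            if min(dists) == 0.0:
                # The point sits exactly on a center: full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                u.append([h / sum(hits) for h in hits])
            else:
                u.append([1.0 / sum((dj / dk) ** power for dk in dists)
                          for dj in dists])
        # Center update: v_j = sum_i u_ij^m * x_i / sum_i u_ij^m.
        new_centers = []
        for j in range(c):
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            if total == 0.0:
                # No point has any membership in this cluster: keep its center.
                new_centers.append(centers[j])
                continue
            new_centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])
        # Stop once no center moves by more than tol.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u
```

Calling `fcm(points, [(1.0, 1.0), (4.0, 4.0)])` on two well-separated groups of 2-D points returns two centers and one membership row per point, each row summing to 1. Starting from different initial centers can yield a different result.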
Keywords/Search Tags: Text clustering, Fuzzy c-means clustering algorithm, Adjacent group, Adhesion degree, Genetic algorithm