
Application Of Sub-fuzzy C-means Algorithm In Document Clustering

Posted on: 2010-12-03  Degree: Master  Type: Thesis
Country: China  Candidate: Y Wang  Full Text: PDF
GTID: 2178330332962342  Subject: Computer software and theory
Abstract/Summary:
This paper presents the complete process of Chinese text clustering, covering both pre-processing and the clustering itself. It first introduces the current state of the key technologies, to give the reader a better understanding of the main steps of text clustering. It then proposes a new method of rare-word filtering that addresses a misunderstanding in the data cleaning process; because the method keeps the text feature items complete and refined, it can improve the clustering result. Next, the dimensionality of the text vectors is reduced by word-frequency counting, so that the feature items that best reflect the characteristics of the text categories can be selected, and the text collection is then transformed into the vector space model (VSM). The fuzzy C-means (FCM) algorithm is sensitive to its initial values and easily falls into local optima; the paper addresses this by applying an improved fuzzy C-means algorithm (SUB-FCM) to text clustering. Finally, the traditional FCM algorithm and the SUB-FCM algorithm are each applied to text clustering in a comparative experiment run on the text clustering system designed in this paper. The experiment shows that SUB-FCM needs fewer iterations than traditional FCM, runs faster, and obtains better initial cluster centers, and that the SUB-FCM text clustering designed in this paper gives better results on Chinese text.
Keywords/Search Tags: text clustering, fuzzy clustering, subtractive clustering, VSM
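
The abstract does not give the details of SUB-FCM. The sketch below is therefore an illustration, not the author's implementation. It assumes, based on the keywords, that SUB-FCM means seeding fuzzy C-means with initial centers chosen by subtractive clustering. The function names, the radius `ra`, the ratio `rb = 1.5 * ra`, the fuzzifier `m = 2`, and the toy 2-D points (stand-ins for VSM document vectors) are all assumptions made for this example.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def subtractive_centers(points, k, ra=1.0):
    """Pick k initial centers by subtractive clustering (density potentials).

    ra is an assumed neighborhood radius; rb = 1.5 * ra is a common choice.
    """
    rb = 1.5 * ra
    alpha = 4.0 / ra ** 2
    beta = 4.0 / rb ** 2
    # Potential of each point: high when many other points lie close to it.
    potential = [
        sum(math.exp(-alpha * sq_dist(p, q)) for q in points) for p in points
    ]
    centers = []
    for _ in range(k):
        best = max(range(len(points)), key=lambda i: potential[i])
        best_pot = potential[best]
        best_pt = points[best]
        centers.append(list(best_pt))
        # Lower the potential around the chosen center so that the next
        # center comes from a different dense region.
        potential = [
            pot - best_pot * math.exp(-beta * sq_dist(p, best_pt))
            for p, pot in zip(points, potential)
        ]
    return centers


def fcm(points, centers, m=2.0, tol=1e-5, max_iter=100):
    """Standard fuzzy C-means iterations starting from the given centers.

    Returns (centers, memberships, iterations used). The memberships are
    those computed in the last iteration, before the final center update.
    """
    centers = [list(c) for c in centers]
    dim = len(points[0])
    u = []
    iterations = 0
    for iterations in range(1, max_iter + 1):
        # Membership update: u_ij is proportional to d_ij^(-2/(m-1)).
        u = []
        for p in points:
            d = [max(sq_dist(p, c), 1e-12) for c in centers]
            inv = [dj ** (-1.0 / (m - 1.0)) for dj in d]
            total = sum(inv)
            u.append([v / total for v in inv])
        # Center update: mean of the points weighted by u_ij^m.
        new_centers = []
        for j in range(len(centers)):
            w = [row[j] ** m for row in u]
            w_sum = sum(w)
            new_centers.append(
                [sum(wi * p[t] for wi, p in zip(w, points)) / w_sum
                 for t in range(dim)]
            )
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u, iterations


if __name__ == "__main__":
    # Toy data: two well-separated groups standing in for document vectors.
    docs = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    init = subtractive_centers(docs, k=2)
    final_centers, memberships, n_iter = fcm(docs, init)
    print("initial centers:", init)
    print("final centers:", final_centers)
    print("iterations:", n_iter)
```

Because the initial centers are taken from dense regions of the data rather than drawn at random, FCM starts close to a plausible solution. That is consistent with the abstract's claim of fewer iterations, although this sketch does not reproduce the reported experiment.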