
Chinese Text Clustering Based on the SOM Algorithm

Posted on: 2009-11-01
Degree: Master
Type: Thesis
Country: China
Candidate: X L Chen
GTID: 2208360245479097
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Text mining is an active research branch of data mining. Text clustering helps to reduce the search space and improve query accuracy. As an unsupervised machine learning method, text clustering has become an important way to organize and summarize text information, and a growing number of researchers have turned to it because of its value in both theory and application.

The Self-Organizing Map (SOM) neural network offers advantages in self-organization, cluster visualization, computational efficiency and clustering precision. We therefore apply the SOM network to Chinese text clustering and analyze its characteristics in that setting.

We first introduce the key techniques of text preprocessing: word segmentation, word filtering, feature word extraction and text representation. We then build a text preprocessing model. First, we derive a stop-word list from an existing vocabulary according to part of speech. Next, we remove the stop words from the texts and select a number of feature words according to a chosen measure. Finally, we use the Vector Space Model (VSM) to represent each text as a real-valued vector.

After preprocessing, we cluster the texts with the SOM network and a class-label method, under the assumption that the class information of the texts is known. We use the kernel-SOM to improve the clustering result, and run several experiments comparing the clustering precision and robustness of the original SOM and the kernel-SOM.

However, if the class information is not known beforehand, this approach cannot cluster the texts automatically. We therefore combine the SOM network with the k-means clustering method into a two-phase text clustering model. This two-phase model has advantages over plain k-means in cluster visualization and computational efficiency, but its final clustering result depends on how well the SOM network has been trained.
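
The two-phase idea described above (train a SOM on the text vectors, then run k-means on the SOM prototypes instead of on the raw texts) can be sketched as follows. This is only an illustrative sketch, not the thesis's implementation: the function names, grid size, learning-rate and neighbourhood schedules, and the toy three-dimensional "document vectors" are all assumptions made for the example, and it uses only the Python standard library.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Phase 1: train a rows x cols SOM; returns one prototype per grid unit."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Assumed initialisation: copy randomly chosen input vectors.
    weights = [list(rng.choice(data)) for _ in grid]
    sigma0 = max(rows, cols) / 2.0
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)                    # linearly decaying rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)    # shrinking neighbourhood
            bmu = min(range(len(grid)), key=lambda i: sq_dist(weights[i], x))
            for i, pos in enumerate(grid):
                # Gaussian neighbourhood measured on the grid, not in input space.
                h = math.exp(-sq_dist(pos, grid[bmu]) / (2.0 * sigma ** 2))
                w = weights[i]
                for d in range(len(x)):
                    w[d] += lr * h * (x[d] - w[d])
            step += 1
    return weights


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; returns a cluster label for every point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]

    def assign():
        return [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                for p in points]

    for _ in range(iters):
        labels = assign()
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign()


def two_phase_cluster(data, rows, cols, k, seed=0):
    """Phase 2: cluster the SOM prototypes, then label each text via its BMU."""
    weights = train_som(data, rows, cols, seed=seed)
    unit_labels = kmeans(weights, k, seed=seed)
    result = []
    for x in data:
        bmu = min(range(len(weights)), key=lambda i: sq_dist(weights[i], x))
        result.append(unit_labels[bmu])
    return result


# Toy VSM-style vectors (made-up term weights, purely illustrative).
docs = [
    [1.0, 0.0, 0.1], [0.9, 0.1, 0.0], [1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0], [0.1, 0.0, 0.9], [0.0, 0.0, 1.0],
]
labels = two_phase_cluster(docs, rows=2, cols=2, k=2)
print(labels)
```

Because k-means in this sketch only sees the `rows * cols` prototypes rather than every text, its cost is independent of the corpus size once the SOM is trained. It also shows why the final result depends on the SOM: a text inherits whatever cluster its best-matching unit was assigned to.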
Keywords/Search Tags: clustering, text clustering, SOM, kernel-SOM, k-means