Research On Text Clustering Based On Self-Organizing Maps

Posted on: 2011-10-26
Degree: Master
Type: Thesis
Country: China
Candidate: Z M Hou
Full Text: PDF
GTID: 2218330338466963
Subject: Computer application technology
Abstract/Summary:
With the development of information technology, the network has become an important source of information. The main problem people face is no longer a lack of information, but how to find and access it efficiently. Data mining and knowledge discovery address this problem. Text mining is one of the most important areas of data mining, and text clustering is one of the core technologies of text mining. In recent years, research on text clustering has made great progress.

The self-organizing map (SOM) is an artificial neural network developed to simulate the signal-processing characteristics of the human brain. The basic idea of SOM is that, through network training, similar inputs are mapped to the same output node. Based on the self-organizing map network, this thesis studies the advantages and disadvantages of related algorithms, their main problems, and the corresponding solutions. The main purpose of this thesis is to study a high-performance text clustering algorithm. The main research work is as follows:

Firstly, text preprocessing is an important basis of text clustering. Because many factors may directly affect the final clustering result, this thesis studies segmentation, feature extraction and text quantification, which lay a solid foundation for the later research.

Secondly, the traditional SOM clustering algorithm requires the number of cluster categories to be fixed in advance, and a preset number may not suit the data in practice. This thesis proposes an improved algorithm that determines the number of cluster categories with a min-max k-means text clustering algorithm.

Thirdly, the thesis proposes a new SOM-based text clustering algorithm. The algorithm first determines the number of cluster categories by the improved method, uses this value as the number of neurons in the SOM output layer, and then clusters with the SOM algorithm.

Finally, the new text clustering algorithm is tested and its performance is analyzed.
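
The second stage described above, clustering with a SOM whose output layer has as many neurons as there are cluster categories, can be illustrated with a minimal sketch. This is not the thesis's implementation. It assumes that the number of categories `k` has already been chosen (the min-max k-means step is not reproduced here) and that the documents are already quantified as numeric feature vectors. The one-dimensional output layer, Gaussian neighbourhood, linear decay schedules and all function and parameter names are assumptions made for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the neuron whose weight vector is closest to x (squared Euclidean)."""
    return min(
        range(len(weights)),
        key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)),
    )


def train_som(vectors, k, epochs=50, lr0=0.5, seed=0):
    """Train a 1-D SOM with k output neurons and return its weight vectors.

    Assumption: k is supplied by the caller (for example from a separate
    cluster-number estimation step).
    """
    rng = random.Random(seed)
    n, dim = len(vectors), len(vectors[0])
    # Deterministic initialisation: copy evenly spaced input vectors.
    weights = [list(vectors[(j * n) // k]) for j in range(k)]
    sigma0 = max(k / 2.0, 1.0)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                   # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.01)  # neighbourhood radius shrinks
        order = list(range(n))
        rng.shuffle(order)
        for i in order:
            x = vectors[i]
            bmu = best_matching_unit(weights, x)
            for j in range(k):
                # Gaussian neighbourhood on the 1-D neuron index.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
                for d in range(dim):
                    weights[j][d] += lr * h * (x[d] - weights[j][d])
    return weights


def assign_clusters(weights, vectors):
    """Map every input vector to the index of its best-matching neuron."""
    return [best_matching_unit(weights, x) for x in vectors]


# Toy "document vectors": two documents weighted on the first two terms,
# two on the last two terms (made-up values, not data from the thesis).
docs = [
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.1],
    [0.0, 0.1, 1.0, 0.9],
    [0.0, 0.0, 0.9, 1.0],
]
som_weights = train_som(docs, k=2)
labels = assign_clusters(som_weights, docs)
```

Each entry of `labels` is the index of the output neuron that a document maps to, so documents sharing a label form one cluster. In practice the input vectors would come from the preprocessing steps the abstract lists (segmentation, feature extraction and text quantification).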
Keywords/Search Tags: Data mining, Text clustering, k-means, Cluster categories, Self-organizing maps