Optimize SOM Algorithm To Apply In Text Clustering

Posted on:2009-11-17

Degree:Master

Type:Thesis

Country:China

Candidate:A X Sun

Full Text:PDF

GTID:2178360272463229

Subject:Computer application technology

Abstract/Summary:

With the rapid development of network technology and the rapid growth of available information, data mining and knowledge discovery technologies have emerged to extract useful information from this sea of data. Because text is the most important form in which information exists, text mining is correspondingly one of the most important fields of data mining. Clustering is one of the fundamental techniques in text mining, and research on text clustering has developed considerably in recent years.

Text is unstructured data, so before it can be clustered, preprocessing must be applied to transform it into a structured form. This thesis therefore first gives a systematic introduction to text preprocessing techniques such as word segmentation, stemming, and dimensionality reduction.

Clustering is the key technology in text clustering. Since the 1950s a variety of clustering algorithms have been invented, of which the SOM algorithm is one of the best known. The thesis then focuses on the SOM algorithm and makes two important improvements to it. The SOM neural network is a kind of artificial neural network that simulates the signal-processing characteristics of the human brain. The basic idea of SOM clustering is to train the network so that similar input vectors are mapped to the same output node, which clusters the input vectors.

The first improvement takes the goal of text clustering into account, namely minimizing the average deviation (also called the average similarity within a cluster), and proposes an improved learning strategy. The improved algorithm introduces equal-deviation error theory into the learning process of the neural network: it guides learning by adjusting the cluster deviation so that the clustering result has the smallest average deviation. This improved algorithm not only solves the problem of under-used and over-used neurons but also greatly enhances the quality of the text clustering results.

The second improvement addresses the long training time caused by random weight initialization. A hierarchical clustering method is used to detect data-dense regions, and the centers of the K detected regions are used to initialize the connection weights. Experiments show that the improved SOM reduces the training time of the network and is less prone to converging to a local optimum.

Finally, to make the clustering results easy to present, several important keywords are selected as labels for each cluster, so that the content of the clusters can be understood correctly and the performance and efficiency of information processing can be improved.
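
The basic SOM idea and the cluster-center weight initialization described in the abstract can be illustrated with a small sketch. It is not the thesis's implementation: the function names, the 1-D output layer, the Gaussian neighborhood, the linear decay schedule, and the centroid-merging procedure (standing in for the hierarchical method, which the abstract does not specify) are all assumptions made for illustration, and the equal-error learning strategy is not modeled.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def agglomerative_centers(data, k):
    """Repeatedly merge the two closest centroids until k remain.

    Hypothetical stand-in for the hierarchical clustering step;
    the abstract does not say which linkage the thesis uses.
    """
    clusters = [(list(v), 1) for v in data]  # (centroid, member count)
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sq_dist(clusters[i][0], clusters[j][0])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        (ci, ni), (cj, nj) = clusters[i], clusters[j]
        merged = [(a * ni + b * nj) / (ni + nj) for a, b in zip(ci, cj)]
        clusters.pop(j)  # j > i, so remove j first
        clusters.pop(i)
        clusters.append((merged, ni + nj))
    return [centroid for centroid, _ in clusters]


def winner(weights, x):
    """Index of the output node whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda n: sq_dist(weights[n], x))


def train_som(data, weights, epochs=50, lr0=0.5, radius0=0.5, seed=0):
    """Train a 1-D SOM in place, starting from the given weights."""
    rng = random.Random(seed)
    samples = list(data)
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # linear decay (assumed schedule)
        lr = lr0 * frac
        radius = max(radius0 * frac, 1e-3)
        rng.shuffle(samples)
        for x in samples:
            w = winner(weights, x)
            for n in range(len(weights)):
                # Gaussian neighborhood on the 1-D grid of output nodes
                h = math.exp(-((n - w) ** 2) / (2 * radius ** 2))
                weights[n] = [wi + lr * h * (xi - wi)
                              for wi, xi in zip(weights[n], x)]
    return weights


# Toy "document vectors": two groups dominated by different terms.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
weights = train_som(docs, agglomerative_centers(docs, k=2))
labels = [winner(weights, d) for d in docs]
```

Here `labels` assigns each toy vector to its winning output node, so similar vectors end up sharing a node. Starting the weights at cluster centers rather than random values means the nodes begin close to the dense regions of the data, which is the intuition behind the reduced training time reported in the abstract.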

Keywords/Search Tags:

Text Clustering, Self-Organizing Feature Map, Equal Error, Weight Initialization, Label

Related items

1	Knn Text Classification Algorithm Based On The Semantics Of The Center
2	Research Of Text Clustering Based On Self-Organizing Maps
3	Study On Two-stage Chinese Text Clustering Based On Self-organizing Of Map
4	Research And Design Of Classification Algorithm Based On Massive Multi-label Text
5	Research On Clustering Algorithm Of K-medoids And Its Application In Text Clustering
6	Research On Text Clustering Based On Self-Organizing Maps
7	Optimization Of Som Algorithm And Application In Chinese Text Clustering
8	Clustering Algorithm Research Based On Self-Organizing Feature Map Network
9	Research On Eigenvector Mapping Algorithm Based On Multi-label
10	Research On The Extraction Of Voice Pitch Frequency Modes Based On Self-organizing Feature Map