Optimization of the SOM Algorithm and Its Application in Chinese Text Clustering

Posted on: 2011-03-27  Degree: Master  Type: Thesis
Country: China  Candidate: F R Liu  Full Text: PDF
GTID: 2198330332974064  Subject: Computer application technology
Abstract/Summary:
In recent years, scholars abroad have done extensive research on English text clustering and achieved outstanding results. By comparison, research on and application of Chinese text clustering started later, and its clustering results are generally not very satisfactory. To address this, this thesis studies Chinese text clustering techniques in depth, focusing on improving the SOM clustering algorithm and applying it to Chinese text clustering. The work covers the following aspects:
(1) It surveys Chinese text clustering technologies, including Chinese text pretreatment (Chinese word segmentation, stop-word filtering, feature selection, and so on) and a variety of commonly used clustering algorithms.
(2) The curse of dimensionality in the feature items causes a heavy computational load, so synonym merging is introduced into the pretreatment stage to reduce the dimension of the feature space and improve the speed and accuracy of clustering.
(3) It examines SOM neural network theory and analyzes its defects. Because the number of clusters must be specified in advance, the network structure is fixed, and the initialization results are unsatisfactory, the thesis proposes an improved, self-growing SOM algorithm to solve these problems.
(4) A "Chinese Text Clustering System" platform with both research and application value is designed and implemented using C#.NET. Systematic testing and evaluation were then conducted; the evaluation results show that the improved SOM algorithm can optimize the clustering results.
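The two ideas named in the abstract, synonym merging to shrink the feature space and SOM-based clustering, can be illustrated with a small Python sketch. This is not the thesis's implementation: the thesis used C#.NET and a self-growing SOM whose details are not given here. The synonym table, the helper names (`merge_synonyms`, `vectorize`, `train_som`), the fixed one-dimensional chain of neurons, and the linear decay schedules are all assumptions chosen for illustration, and the documents are assumed to be already segmented into words.

```python
import math
import random
from collections import Counter

# Hypothetical synonym table (an assumption): word -> canonical representative.
SYNONYMS = {"电脑": "计算机", "微机": "计算机", "手机": "移动电话"}

def merge_synonyms(tokens, table=SYNONYMS):
    """Replace each token with its canonical synonym, if one is listed."""
    return [table.get(t, t) for t in tokens]

def vectorize(docs, vocab):
    """Build L2-normalized term-frequency vectors over a fixed vocabulary."""
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        vec = [float(counts[w]) for w in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors

def best_matching_unit(weights, x):
    """Return the index of the neuron whose weights are closest to x."""
    def dist2(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    return min(range(len(weights)), key=lambda i: dist2(weights[i]))

def train_som(vectors, n_units, epochs=50, lr0=0.5, radius0=1.0, seed=0):
    """Train a plain fixed-size 1-D SOM (a chain of n_units neurons)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                      # learning rate decays linearly
        radius = max(radius0 * (1.0 - frac), 1e-3)   # neighborhood shrinks over time
        for x in vectors:
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                # Gaussian neighborhood around the best-matching unit on the chain.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return weights

# Toy, pre-segmented documents (illustrative data, not from the thesis).
docs = [["电脑", "软件"], ["微机", "软件"], ["手机", "信号"], ["移动电话", "信号"]]
raw_vocab = sorted({t for d in docs for t in d})
merged = [merge_synonyms(d) for d in docs]
merged_vocab = sorted({t for d in merged for t in d})

vectors = vectorize(merged, merged_vocab)
weights = train_som(vectors, n_units=2)
labels = [best_matching_unit(weights, v) for v in vectors]
print(len(raw_vocab), len(merged_vocab), labels)
```

In this toy data, merging maps synonymous words onto one feature, so the vocabulary shrinks and documents that differ only by a synonym receive identical vectors and therefore the same best-matching neuron. Note that `n_units` must be chosen up front here, which is exactly the limitation of a fixed SOM that the thesis sets out to remove.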
Keywords/Search Tags:text clustering, self-organizing neural network, Chinese word segmentation, synonyms merger, feature selection, pretreatment