Research On SOFM Text Clustering Algorithm

Posted on:2018-07-23

Degree:Master

Type:Thesis

Country:China

Candidate:L J Tan

Full Text:PDF

GTID:2348330542991457

Subject:Applied Mathematics

Abstract/Summary:

In recent years, with the rapid development and spread of network and database technology, people can quickly and easily acquire and store large amounts of data, eighty percent of which is text. How to obtain useful information from large-scale text data accurately and quickly has become an urgent problem. Data mining emerged against this background, and text clustering, one of its important branches, has become a research hotspot in recent years. Because text data is unstructured, it must be converted into a structured form before clustering through a series of preprocessing steps: word segmentation, stop-word removal, feature word selection, weight calculation and representation in a mathematical model.

This thesis applies the traditional self-organizing feature map (SOFM) neural network algorithm to text clustering and proposes two improvements that make it more suitable for large-scale text data. The first addresses the random selection of the network's initial connection weights in the traditional SOFM algorithm, which may lead to a training result in which all samples are grouped together. We propose a SOFM text clustering algorithm based on improved initial connection weights. The proposed selection method places the initial connection weights close to the input patterns of the text data, which improves the accuracy of the clustering results and also speeds up convergence.

The second improvement addresses the data sparsity and the curse of dimensionality caused by the high dimensionality of text data, which is represented with the vector space model. We propose a SOFM text clustering algorithm based on principal component analysis (PCA). Compared with feature selection methods, the proposed algorithm is designed to retain an appropriate number of useful feature words during dimension reduction without losing important information. Comparative simulation experiments show that this algorithm further improves clustering accuracy and speeds up clustering.
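
The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the thesis's implementation: the abstract does not state how the improved initial weights are chosen, so the sketch assumes the simplest data-driven choice (copying randomly picked input vectors), and the function names, grid size, learning-rate schedule and neighbourhood schedule are all hypothetical.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)  # centre each feature
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def train_sofm(X, grid=(2, 2), epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small SOFM; returns weights of shape (n_units, n_features)."""
    rng = np.random.default_rng(seed)
    n_units = grid[0] * grid[1]
    # Assumption: start the weights at randomly chosen input samples so they
    # lie near the input patterns (requires len(X) >= n_units).
    idx = rng.choice(len(X), size=n_units, replace=False)
    W = X[idx].astype(float).copy()
    # Position of every unit on the 2-D map, used by the neighbourhood function.
    coords = np.array(
        [(i, j) for i in range(grid[0]) for j in range(grid[1])], dtype=float
    )
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                 # linearly decaying step size
        sigma = sigma0 * (1.0 - frac) + 1e-3    # shrinking neighbourhood width
        for x in X[rng.permutation(len(X))]:
            bmu = np.argmin(np.linalg.norm(W - x, axis=1))  # best-matching unit
            d2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))            # Gaussian neighbourhood
            W += lr * h[:, None] * (x - W)
    return W


def assign_clusters(X, W):
    """Label each row of X with the index of its nearest unit."""
    dists = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)
    return np.argmin(dists, axis=1)


if __name__ == "__main__":
    # Toy stand-in for a weighted term-document matrix (rows are documents).
    docs = np.array([
        [1.0, 0.0, 0.0, 0.2],
        [0.9, 0.1, 0.0, 0.1],
        [0.0, 0.0, 1.0, 0.9],
        [0.1, 0.0, 0.9, 1.0],
    ])
    reduced = pca_reduce(docs, n_components=2)
    weights = train_sofm(reduced, grid=(1, 2), epochs=20)
    print(assign_clusters(reduced, weights))
```

In this sketch PCA runs before training, so the map works on a few dense components instead of the sparse high-dimensional vector space model, which is the motivation the abstract gives for the second improvement.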

Keywords/Search Tags:

Text clustering, SOFM algorithm, Initial connection weights, Dimension reduction, Principal component analysis method

Related items

1	Dimension Reduction Technology Research Based On Text Features
2	A Dimension Reduction Method For Large-scale Text Categorization
3	A Dimension Reduction Method For Large-scale Text Categorization
4	Clustering Algorithm Research Based On The Bilinear Probabilistic Principal Component Analysis
5	Application Of PCA Dimensionality Reduction Method Based On Latent Variables In Text Classification Problems
6	The Application Of Clustering Analysis Based On Principal Component Analysis And Rough Set In Financial Index Data
7	The Research Of Text Classification Algorithm Based On KPCA And SOFM Neural Network
8	Secure And Efficient Dimension-reducing Ranked Query Method For Encrypted Cloud Data
9	Research On Text Clustering Based On Text Dimension Reduction And Ant Colony Algorithm
10	Research And Application Of Recommendation Algorithm Based On Dimension Reduction And Clustering