Text Classification Research Based On Improved PCA-SOM Neural Network

Posted on: 2014-10-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Xu
GTID: 2348330473953758
Subject: Operational Research and Cybernetics
Abstract/Summary:
In an era of information overload, how to obtain valid information from the network quickly and accurately has become a focus of current research. Text classification is an important means of information retrieval and is widely used in text filtering, information retrieval, natural language processing, and detection. Based on a thorough study of text classification techniques, and in view of the characteristics of text data and the shortcomings of traditional feature dimension reduction and classification algorithms, this thesis proposes a feature dimension reduction algorithm based on whitening principal component analysis and a text classification algorithm based on the self-organizing map neural network. Whitening principal component analysis (White-PCA) is a multivariate statistical analysis technique; it has a great advantage in handling high-dimensional nonlinear problems and, relative to feature selection, can provide more information. The self-organizing map (SOM) neural network can process distributed information in parallel on a large scale; in addition, it offers learning ability, fast convergence, the ability to reach the global optimum, and self-organizing clustering. The SOM neural network nevertheless has limitations, which the covering method, the conscience algorithm, and kernel functions can help to overcome. Combining the advantages of the White-PCA feature dimension reduction algorithm with the SOM neural network algorithm, a text classification model is constructed.
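The whitening step described above can be illustrated with a minimal sketch. It shows standard PCA whitening (projection onto the leading principal components, followed by rescaling each component to unit variance), which may differ from the exact White-PCA variant used in the thesis. The function name `whiten_pca`, its parameters, and the random stand-in data are assumptions made for illustration only.

```python
import numpy as np

def whiten_pca(X, n_components, eps=1e-8):
    """Project rows of X onto the top principal components and
    rescale each component to (approximately) unit variance."""
    mean = X.mean(axis=0)
    Xc = X - mean                                   # center the features
    # SVD of the centered data: Xc = U @ diag(S) @ Vt
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                  # principal directions
    std = S[:n_components] / np.sqrt(X.shape[0] - 1)  # per-component std
    Z = (Xc @ components.T) / (std + eps)           # decorrelate and rescale
    return Z, mean, components, std

# Hypothetical stand-in for a document-term matrix (200 documents, 50 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50)) @ rng.normal(size=(50, 50))

Z, mean, components, std = whiten_pca(X, n_components=5)
cov = np.cov(Z, rowvar=False)   # close to the 5x5 identity matrix
```

After whitening, the retained features are uncorrelated and have equal variance, which corresponds to the dimension reduction and correlation removal that the abstract attributes to White-PCA.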
First, given the nonlinear character of text data, the White-PCA algorithm is used for feature extraction and dimension reduction, which denoises the feature space, reduces its dimension, and removes correlation between features, completing the preparation for classification. Then the SOM neural network is used to classify the text; this algorithm has strong learning, association, fault-tolerance, and robustness capabilities. Finally, the proposed text classification algorithm is compared with the Naive Bayes algorithm, the KNN algorithm, the back propagation (BP) neural network, and the radial basis function (RBF) neural network. Simulation experiments show that the algorithm achieves higher classification accuracy than Naive Bayes and KNN and faster classification speed than the BP and RBF neural networks.
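The classification stage can be sketched in a similar spirit. The code below is a plain SOM with a Gaussian neighborhood, followed by majority-vote labeling of the map nodes; it does not include the covering method, conscience algorithm, or kernel function improvements mentioned in the abstract, whose details are not given here. All function names, the grid size, the learning schedule, and the synthetic two-class data are assumptions; in the thesis setting the input vectors would be the dimension-reduced text features.

```python
import numpy as np

def train_som(X, grid=(4, 4), epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a basic SOM; returns node weights of shape (rows*cols, dim)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Grid coordinates of every node, used by the neighborhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize weights from randomly chosen training samples.
    W = X[rng.choice(len(X), size=rows * cols, replace=True)].astype(float)
    n_steps, step = epochs * len(X), 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                    # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3       # shrinking neighborhood
            bmu = np.argmin(((W - X[i]) ** 2).sum(axis=1))   # best matching unit
            d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)   # grid distance to BMU
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            W += lr * h[:, None] * (X[i] - W)          # pull neighbors toward sample
            step += 1
    return W

def label_nodes(W, X, y):
    """Give each node the majority label of the training samples mapped to it."""
    bmus = np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2), axis=1)
    labels = np.full(len(W), -1)                       # -1 marks nodes with no hits
    for node in range(len(W)):
        hits = y[bmus == node]
        if len(hits):
            labels[node] = np.bincount(hits).argmax()
    return labels

def predict(W, node_labels, X):
    """Classify each sample by the label of its nearest labeled node."""
    labeled = np.flatnonzero(node_labels >= 0)
    d = ((X[:, None, :] - W[labeled][None, :, :]) ** 2).sum(axis=2)
    return node_labels[labeled[np.argmin(d, axis=1)]]

# Synthetic two-class data standing in for reduced text feature vectors.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, size=(100, 5)),
               rng.normal(5.0, 0.5, size=(100, 5))])
y = np.array([0] * 100 + [1] * 100)

W = train_som(X)
node_labels = label_nodes(W, X, y)
acc = (predict(W, node_labels, X) == y).mean()         # training accuracy
```

Labeling nodes by majority vote is one common way to turn an unsupervised SOM into a classifier; the thesis may use a different labeling scheme.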
Keywords/Search Tags: Text Classification, Feature Dimension Reduction, Whitening Principal Component Analysis, Improved SOM Neural Network, Conscience Algorithm, Kernel Function