Clustering And Classification Of Data And Text Using Such Technologies As Neural Network

Posted on: 2006-06-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X D Qian
Full Text: PDF
GTID: 1118360182975484
Subject: Management Science and Engineering
Abstract/Summary:
Clustering and classification are among the most valuable techniques in data mining, and neural networks, a branch of soft computing, are one of the main tools for both. The Adaptive Resonance Theory (ART) neural network draws not only on the physical connection model of neurons in the human brain but also on the brain's learning mechanism, which makes it well suited to data clustering. Research on ART is still at an early stage. In text mining, a set of text vectors is usually represented in a high-dimensional orthogonal space, which creates a computational bottleneck and is inconsistent with the actual application setting. There is therefore considerable room for research on effective dimension-reduction algorithms and on improvements to the current vector space.

This dissertation presents four improved algorithms for data clustering based on the ART2 neural network. All four overcome shortcomings of classical ART2, such as the lack of a hierarchical structure in its output, and produce dynamic hierarchical clustering results while reducing the dependence on subjective configuration of the vigilance parameter.

Improved algorithm 1, based on the integration of modulus, phase and spatial density, also overcomes classical ART2's reliance on a global vigilance parameter and the independence of its clustering from the modulus. It clusters by jointly evaluating modulus and phase, with reference to the previous training results of the neural network.
It adjusts the vigilance parameter according to the number of input vectors in each class generated in the previous cycle, thereby localizing the vigilance parameter on the basis of spatial density.

Improved algorithm 2, based on agglomeration and iteration, reaches a reasonable clustering result through iterative manual interaction and calculates the range of vigilance parameter values required for that result. Throughout the iteration, both the process and the output of the neural network exhibit orderly self-organization, and the network training time decreases rapidly.

Improved algorithm 3, based on the Hebb rule and leaky competition, allows multiple neurons to win and calculates the correlation among the winning neurons.

Improved algorithm 4, based on the Hebb rule and redundant neurons, overcomes shortcomings such as excessive reliance on the winning neuron. It produces a hierarchical clustering result with a single ART neural network by considering the information from both the winning neuron and the other neurons, together with the Hebb rule.

This dissertation also presents a text dimension-reduction algorithm based on random mapping (RM). At a controllable, low cost, and while closely approximating the computation and classification results of the original space, it can greatly reduce the dimension of the text vector space.
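The abstract does not specify how the random mapping is constructed. As a minimal sketch of the general idea, assuming a dense Gaussian random matrix (one common choice, not necessarily the dissertation's), the following projects document vectors to a much lower dimension and compares pairwise distances before and after. The names `random_mapping` and `pairwise_distances` and the synthetic data are illustrative only.

```python
import numpy as np

def random_mapping(X, k, seed=0):
    """Project the rows of X (n_docs x d) down to k dimensions.

    Assumption: entries of the mapping matrix are i.i.d. N(0, 1/k), so
    squared Euclidean distances are preserved in expectation.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(d, k))
    return X @ R

def pairwise_distances(A):
    """Euclidean distances between all pairs of rows of A."""
    sq = np.sum(A * A, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (A @ A.T)
    return np.sqrt(np.maximum(d2, 0.0))

# Synthetic stand-in for a term-document matrix: 20 documents, 5000 terms.
rng = np.random.default_rng(42)
X = rng.random((20, 5000))
Y = random_mapping(X, k=500)

# Ratio of projected to original distance for every distinct pair of documents.
mask = ~np.eye(20, dtype=bool)
ratios = pairwise_distances(Y)[mask] / pairwise_distances(X)[mask]
print(Y.shape, ratios.min(), ratios.max())
```

In this sketch the ratios stay close to 1, which is the sense in which results in the reduced space can approximate those of the original space. The cost is a single matrix product, and the target dimension `k` controls the trade-off between speed and distortion.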
Building on the RM algorithm, this dissertation presents an accelerated latent semantic indexing (LSI) algorithm that combines RM with LSI. The accelerated LSI algorithm efficiently reduces the dimension of the space while emphasizing semantic relationships, so classification algorithms can run in real time and with better accuracy in high-dimensional text environments.

In addition, this dissertation proposes an improved KNN text classification algorithm based on pattern aggregation and per-dimension weights. An improved pattern aggregation method is derived from data analysis, and a neural network is used to calculate the weight of each dimension of the vector space model (VSM), overcoming the VSM's shortcoming that every dimension carries the same weight. The improved KNN algorithm thus increases text classification precision while reducing time and space complexity.
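To illustrate why per-dimension weights matter for KNN, here is a minimal sketch using a weighted Euclidean distance and a majority vote. It makes several assumptions: the weights are set by hand, whereas the dissertation learns them with a neural network; pattern aggregation is omitted; and the function name `weighted_knn_predict` and the toy data are hypothetical.

```python
from collections import Counter

import numpy as np

def weighted_knn_predict(X_train, y_train, x, weights, k=3):
    """Classify x by majority vote among its k nearest training vectors,
    using a Euclidean distance in which each dimension has its own weight."""
    diff = X_train - x
    dist = np.sqrt(np.sum(weights * diff * diff, axis=1))
    nearest = np.argsort(dist, kind="stable")[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data: dimension 0 separates the classes, dimension 1 is mostly noise.
X_train = np.array([
    [0.0, 1.0], [0.1, 9.0], [0.2, 5.0],   # class "a"
    [1.0, 1.1], [0.9, 1.2], [1.1, 8.0],   # class "b"
])
y_train = ["a", "a", "a", "b", "b", "b"]
query = np.array([0.1, 1.5])

uniform = weighted_knn_predict(X_train, y_train, query, np.ones(2))
weighted = weighted_knn_predict(X_train, y_train, query, np.array([1.0, 0.01]))
print(uniform, weighted)
```

With equal weights the noisy dimension dominates the distance and the query is assigned to class "b". Down-weighting that dimension restores the "a" label that the informative dimension suggests.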
Keywords/Search Tags: Clustering, Classification, Adaptive Resonance Theory Neural Network, Text Mining, KNN, Random Mapping