A Comparison And Improvement Of Several Partitional Clustering Algorithms In Data Mining

Posted on: 2009-05-18
Degree: Master
Type: Thesis
Country: China
Candidate: L Peng
Full Text: PDF
GTID: 2178360242484853
Subject: Operational Research and Cybernetics
Abstract/Summary:
Data mining is a relatively new technology at the intersection of several disciplines. It has emerged and developed with the accumulation of data sets and with the demand for information and knowledge driven by market competition, and it is gradually attracting wide attention. Cluster analysis is an active and challenging research direction in the field of data mining. Several partitional clustering algorithms are currently popular and widely used, such as K-means, K-harmonic means, fuzzy C-means and spectral clustering. These algorithms also have disadvantages, however: they are sensitive to the initial points, they are not well suited to large-scale data sets, they converge slowly, and so on.

Principal component analysis (PCA) is an exploratory statistical method that concentrates the information dispersed across a group of variables into a few composite indicators (the principal components). It is also a technique for reducing the dimension of the data. This paper first proposes a new clustering algorithm based on PCA and the largest or second-largest eigenvalue. The algorithm combines the advantages of PCA and of the chosen clustering algorithm, which makes it more practical. In addition, by selecting a new metric, we propose an alternative K-Harmonic Means algorithm, which can obtain better clustering results by adjusting a parameter.

This paper elaborates the theoretical basis of cluster analysis and PCA, and analyzes and compares several popular partitional clustering algorithms. It then proposes the new clustering algorithm based on PCA and the largest or second-largest eigenvalue. Finally, numerical experiments are carried out on three data sets. The numerical results illustrate that the new clustering algorithm has advantages in computation time, number of iterations and clustering results. The alternative K-Harmonic Means algorithm also compares favorably with the usual K-Harmonic Means algorithm in clustering results, computation time and number of iterations.
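
The abstract does not spell out how the largest or second-largest eigenvalue enters the proposed algorithm, so the following is only a minimal sketch of the general idea it describes: reduce the data with PCA, then run a partitional clustering algorithm on the reduced data. Plain K-means is used here as a stand-in for "the clustering algorithm which is chosen"; the function names `pca_project` and `kmeans`, the choice of NumPy, and all parameters are assumptions made for illustration and are not taken from the thesis.

```python
import numpy as np

def pca_project(X, n_components):
    """Project the rows of X onto the leading principal components.

    Returns the projected data and the corresponding eigenvalues
    (largest first). Assumes X has at least two columns.
    """
    Xc = X - X.mean(axis=0)                    # center each variable
    cov = np.cov(Xc, rowvar=False)             # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return Xc @ eigvecs[:, order], eigvals[order]

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd-style K-means (illustrative stand-in, not the thesis method)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    iterations = 0
    for _ in range(n_iter):
        iterations += 1
        # Assign each point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers, iterations

# Hypothetical data: two well-separated groups of 50 points in 5 dimensions.
rng = np.random.default_rng(42)
data = np.vstack([
    rng.normal(0.0, 0.5, size=(50, 5)),
    rng.normal(10.0, 0.5, size=(50, 5)),
])
reduced, top_eigvals = pca_project(data, n_components=1)
labels, centers, iterations = kmeans(reduced, k=2)
```

Clustering in the reduced space means each distance computation involves fewer coordinates, which is one plausible source of the savings in computation time that the abstract reports; the sketch itself makes no claim about the thesis's actual results.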
Keywords/Search Tags:Clustering Algorithm, Partitional Clustering, Principal Component Analysis, K-Harmonic Means