
Analysis And Improvement Of Traditional Clustering Methods

Posted on: 2008-11-07
Degree: Master
Type: Thesis
Country: China
Candidate: F Wang
Full Text: PDF
GTID: 2190360245482384
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Clustering has long been an important research area in data mining. This thesis first introduces the wide range of applications of clustering. Then, following a classification of clustering methods, it focuses on hierarchical clustering and on the traditional partitioning methods: the k-means algorithm, the k-center algorithm, and their variants. The advantages and disadvantages of the two algorithms are analyzed and compared comprehensively.

The k-means algorithm is the most widely used method in cluster analysis because it is simple, fast, and effective. Its main weakness is its sensitivity to the initial values, which makes it prone to converging to a local optimum. In addition, the number of clusters must be specified in advance, and in practice this number is rarely easy to determine.

The main contributions of the thesis fall into two areas.

1. An improved algorithm is proposed that is both more efficient and gives better clustering results. It takes its initial centers from a hierarchical cluster analysis instead of selecting the k initial centers at random. This reduces, to a certain extent, the likelihood of converging to a local minimum.

2. A new algorithm is designed to search for the optimal value of k, addressing the other major shortcoming of classical k-means: the number of clusters must be specified in advance. A simple function that describes clustering quality is introduced and used to identify the optimal number of clusters.
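The two ideas in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not name the linkage used for the hierarchical step or the quality function used to pick k, so centroid linkage and the mean silhouette coefficient are stand-in assumptions here. All function names (`hierarchical_centers`, `kmeans`, `silhouette`, `choose_k`) and the toy data are hypothetical.

```python
import math

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def hierarchical_centers(points, k):
    """Agglomerative clustering (centroid linkage, an assumption):
    merge the two clusters with the closest centroids until k remain,
    then return the k centroids as initial centers."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist2(mean(clusters[i]), mean(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [mean(c) for c in clusters]

def kmeans(points, centers, max_iter=100):
    """Standard Lloyd iterations started from the given centers."""
    centers = list(centers)
    for _ in range(max_iter):
        labels = [min(range(len(centers)), key=lambda c: dist2(p, centers[c]))
                  for p in points]
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == c]
            # Keep the old center if a cluster happens to become empty.
            new_centers.append(mean(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

def silhouette(points, labels):
    """Mean silhouette coefficient, used here as a stand-in quality function."""
    k = max(labels) + 1
    total = 0.0
    for i, p in enumerate(points):
        sums, counts = [0.0] * k, [0] * k
        for j, q in enumerate(points):
            if i != j:
                sums[labels[j]] += math.sqrt(dist2(p, q))
                counts[labels[j]] += 1
        own = labels[i]
        if counts[own] == 0:
            continue  # a singleton cluster contributes 0 by convention
        a = sums[own] / counts[own]
        b = min(sums[c] / counts[c] for c in range(k)
                if c != own and counts[c] > 0)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def choose_k(points, k_range):
    """Run hierarchically seeded k-means for each k and keep the best score."""
    scores = {}
    for k in k_range:
        labels, _ = kmeans(points, hierarchical_centers(points, k))
        scores[k] = silhouette(points, labels)
    return max(scores, key=scores.get), scores

# Toy data (hypothetical): three well-separated groups of four points.
points = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.6), (0.4, 0.5),
          (5.0, 5.0), (5.4, 5.1), (5.2, 5.6), (4.9, 5.3),
          (10.0, 0.0), (10.3, 0.4), (9.8, 0.5), (10.2, 0.1)]
best_k, scores = choose_k(points, range(2, 6))
print(best_k, scores)
```

Because the initial centers come from a deterministic hierarchical pass, repeated runs give the same partition, which is the practical benefit over random initialization. The pairwise search in `hierarchical_centers` is cubic in the number of points, so this sketch is suitable only for small data sets.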
Keywords/Search Tags: hierarchical cluster analysis, partitioning cluster analysis, k-means methods, optimum number of clusters