
Improved K-means Algorithm Based On Optimizing Initial Cluster Centers

Posted on: 2014-02-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Zhang
GTID: 2268330401482091
Subject: Computer application technology
Abstract/Summary:
Data mining is the process of extracting information that is valuable to users from large volumes of raw data. Clustering analysis is an important branch of data mining: it groups data automatically according to their similarity. It can be used on its own as a data mining tool to reveal the underlying distribution of data in a database, and it can also serve as a preprocessing step for other data mining algorithms.

The k-means algorithm can be considered the most important unsupervised machine learning approach to clustering. It is a partitioning algorithm that divides all the data into k clearly distinct sub-classes; through iterative repartitioning, k-means minimizes the sum of squared distances from each data point to the center of its cluster. Because it is easy to implement and efficient, it is popular and widely used in many fields, such as data mining, pattern recognition and knowledge discovery. However, it still has some limitations. For example, the number of clusters must be given in advance, and the algorithm is extremely sensitive to the initial cluster centers: if unsuitable initial centers are selected, it easily falls into a local optimum and cannot guarantee stable results.

To reduce the dependence on initial values and improve the effectiveness of k-means, this thesis explores the choice of optimal initial cluster centers in detail and proposes a novel IU-M k-means algorithm (K-means Clustering Algorithm based on Improved UPGMA and Max-min Distance Algorithm). First, it applies simple random sampling to obtain a smaller, simplified set of candidate clustering seeds; it then combines an improved UPGMA algorithm with the Max-Min distance method to find the optimal initial cluster centers, which are then used to improve the k-means algorithm. On the one hand, this yields favorable initial cluster centers that improve the clustering result; on the other hand, the number K can be determined automatically.
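As a minimal sketch of the standard (Lloyd-style) k-means iteration described above, the following Python alternates an assignment step and a mean-update step, starting from caller-supplied initial centers. It is not the thesis's IU-M algorithm; the function names, the Euclidean distance, and the stopping rule (stop when assignments no longer change, or after `max_iter` rounds) are assumptions chosen for illustration. The usage at the bottom illustrates the sensitivity to initial centers: on four points at the corners of a 4-by-1 rectangle, one start reaches the left/right split, while another converges to a top/bottom split with a much larger error.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(
    points: List[Point], centers: List[Point], max_iter: int = 100
) -> Tuple[List[Point], List[int], float]:
    """Lloyd-style k-means from the given initial centers (illustrative sketch).

    Returns (centers, labels, sse), where sse is the sum of squared
    distances from each point to its assigned center.
    """
    centers = [tuple(c) for c in centers]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers would not move
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    sse = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, sse


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
    # Good start: one center on each side -> left/right split, SSE 1.0
    print(kmeans(pts, [(0.0, 0.0), (4.0, 0.0)])[2])
    # Poor start: both centers in the middle -> top/bottom split, SSE 16.0
    print(kmeans(pts, [(2.0, 0.0), (2.0, 1.0)])[2])
```

Both runs converge, but the second stops at a local optimum whose error is sixteen times larger, which is exactly the failure mode that better initial centers aim to avoid.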
Because the initial centers are derived from the sampled candidate seeds, IU-M k-means avoids selecting the initial values at random.

Comparative experiments between the IU-M k-means algorithm and the k-means algorithm based on the Max-Min distance method were carried out on three benchmark datasets: Balance-Scale, Glass and New-thyroid. K-means based on the Max-Min distance method is itself an improvement on standard k-means, and IU-M k-means further improves the clustering results on top of it. The results show that the IU-M k-means algorithm is efficient and effective.
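The Max-Min distance method, used both as the experimental baseline and as one ingredient of IU-M, can be sketched as follows under stated assumptions: the first data point is taken as the first center, the second center is the point farthest from it, and further centers are added while the largest nearest-center distance exceeds `theta` times the distance between the first two centers. The value `theta = 0.5` and the function name `max_min_centers` are illustrative choices, not taken from the thesis. Because the loop stops on its own, the number of centers (and hence K) comes out of the data rather than being fixed in advance. This shows only the classic max-min step on raw points; the thesis applies it together with an improved UPGMA on a sampled seed set, whose details the abstract does not give.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]


def max_min_centers(points: List[Point], theta: float = 0.5) -> List[Point]:
    """Choose initial centers with the max-min distance rule (illustrative sketch).

    K is not fixed in advance: centers are added until no point lies farther
    than theta * d(first, second) from every chosen center.
    """
    if not points:
        return []
    first = points[0]  # assumption: start from the first point
    centers = [first]
    # Second center: the point farthest from the first one.
    second = max(points, key=lambda p: math.dist(p, first))
    base = math.dist(first, second)
    if base == 0.0:
        return centers  # all points coincide: a single cluster
    centers.append(second)
    while True:
        # Distance from every point to its nearest chosen center.
        nearest = [min(math.dist(p, c) for c in centers) for p in points]
        best = max(range(len(points)), key=lambda i: nearest[i])
        if nearest[best] <= theta * base:
            break  # no point is far enough from all existing centers
        centers.append(points[best])
    return centers
```

The returned list can be handed to any k-means routine as its initial centers, with `len(centers)` serving as K. Smaller values of `theta` admit more centers; larger values merge nearby groups.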
Keywords/Search Tags:Clustering, Initial cluster centers, UPGMA, Max-Min distance