Improved K-means Algorithm Based On Optimizing Initial Cluster Centers

Posted on:2014-02-28

Degree:Master

Type:Thesis

Country:China

Candidate:Y J Zhang

Full Text:PDF

GTID:2268330401482091

Subject:Computer application technology

Abstract/Summary:

Data mining is the process of extracting information that is valuable to users from huge amounts of abstract data. Cluster analysis is an important branch of data mining that classifies data automatically according to similarity. It is used not only as an independent data mining tool for uncovering deeper information about the distribution of data in a database, but also as a preprocessing step for other data mining algorithms.

The k-means algorithm can be considered the most important unsupervised machine learning approach in clustering. It is a partitional clustering algorithm: all the data is divided into k clusters that differ markedly from one another. Through iterative partitioning, the k-means algorithm minimizes the sum of the distances from each data point to the center of its cluster. Because it is easy to implement and efficient, it is popular and widely used in fields such as data mining, pattern recognition, and knowledge discovery. However, some limitations remain. For example, the number of clusters must be given in advance, and the algorithm is extremely sensitive to the initial cluster centers. If the selected initial cluster centers are unsuitable, the algorithm easily falls into a local optimum and cannot guarantee stable results.

In this thesis, to reduce the dependence on initial values and improve the effectiveness of the k-means algorithm, we explore the optimal choice of initial cluster centers in detail and propose the novel IU-M k-means algorithm (K-means Clustering Algorithm based on Improved UPGMA and Max-Min Distance Algorithm). The algorithm first uses simple random sampling to obtain a smaller, simplified set of candidate cluster seeds. It then combines the improved UPGMA algorithm with the Max-Min distance method to find the optimal initial cluster centers. On the one hand, this yields favorable initial cluster centers that improve the clustering result; on the other hand, the number of clusters K can be determined automatically. In this way, random selection of the initial values is avoided.

Comparative experiments between the IU-M k-means algorithm and the k-means algorithm based on the Max-Min distance method were carried out on three benchmark datasets: Balance-Scale, Glass, and New-thyroid. The k-means algorithm based on the Max-Min distance method is itself an improvement on the k-means algorithm, and the IU-M k-means algorithm further enhances the clustering effect on this basis. The results demonstrate that the IU-M k-means algorithm is efficient and effective.
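
To make the role of the initial centers concrete, the following is a minimal Python sketch of one commonly described form of Max-Min distance seeding (farthest-first traversal) followed by the standard k-means iteration. It is not the IU-M k-means algorithm of the thesis: the sampling step, the improved UPGMA step, and the automatic choice of K are not reproduced here. The function names `max_min_seeds` and `kmeans`, the choice of the first point as the first seed, and the fixed `k` are all assumptions made for illustration.

```python
import math


def max_min_seeds(points, k):
    """Pick k seeds by farthest-first (max-min distance) traversal.

    Assumption: the first point is used as the first seed and k is fixed;
    the thesis may choose seeds and K differently.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    seeds = [points[0]]
    # min_dist[i] = distance from points[i] to its nearest chosen seed
    min_dist = [math.dist(p, seeds[0]) for p in points]
    while len(seeds) < k:
        # The next seed is the point whose nearest seed is farthest away.
        idx = max(range(len(points)), key=min_dist.__getitem__)
        seeds.append(points[idx])
        min_dist = [min(d, math.dist(p, points[idx]))
                    for d, p in zip(min_dist, points)]
    return seeds


def kmeans(points, centers, max_iter=100):
    """Standard k-means (Lloyd's iteration) starting from the given centers."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        labels = [min(range(len(centers)),
                      key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(old)  # keep the center of an empty cluster
        if new_centers == centers:
            break  # centers stopped moving, so the partition is stable
        centers = new_centers
    return centers, labels
```

Calling `kmeans(points, max_min_seeds(points, k))` on a list of coordinate tuples gives a deterministic result for a given point order, in contrast to random initialization, where repeated runs can end in different local optima.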

Keywords/Search Tags:

Clustering, Initial cluster centers, UPGMA, Max-Min distance

Related items

1	Research On The Selection Of Initial Cluster Centers In K-means Algorithm
2	Research On Initial Centers Selection Method For K-modes Clustering
3	Research On Initial Cluster Centers Choice Algorithm And Clustering For Imbalanced Data
4	The Selection And Improvement Of K-means’s Initial Clustering Centers
5	Research On Hybrid Algorithm Based On Subtractive Clustering
6	Algorithms Implementation Of Determining The Number Of Clusters And Initial Cluster Centers For Mixed Data
7	Studies On Clustering Algorithms For Categorical Data
8	Study On Problems To Select Initial Cluster Centers Of The K-means Algorithm
9	Research And Implementation Of Fuzzy Clustering Algorithm
10	Research On Advertisement Recommendation System Based On Data Mining