Research On The Selection Of Initial Cluster Centers In K-means Algorithm

Posted on:2019-06-04

Degree:Master

Type:Thesis

Country:China

Candidate:J Yang

Full Text:PDF

GTID:2428330548457776

Subject:Software engineering

Abstract/Summary:

Clustering is one of the most important data mining methods, and K-means is one of the most widely used clustering algorithms because of its simplicity and effectiveness. However, its performance depends heavily on the selection of the initial cluster centers: if they are poorly chosen, the algorithm easily falls into a local optimum and produces poor results. Numerous improved methods have been proposed, but most of them cannot adapt dynamically to data sets with different features. An adaptive method for selecting initial cluster centers is therefore needed. Given the current state of research, this paper makes the following contributions:

(1) Previous initial cluster center selection methods define the density of a point in ways that reduce its discriminative power. To address this issue, this paper redefines the density of a point according to the number of its neighbors and the distances between the point and those neighbors. This definition effectively increases the discriminative power of the density.

(2) The selection of initial cluster centers plays a significant role in the clustering performance of K-means. Numerous methods have been proposed, but they perform well only on some data sets. Because real-world data sets vary widely in their features, it is important to have an initial cluster center selection method that adapts dynamically to them. This paper therefore proposes a new distance metric, the Hybrid Distance, and, based on it, an initial cluster center selection method with a tunable parameter. Experimental results show that the method improves the clustering accuracy of K-means more effectively than previous methods.

(3) Adjusting the parameter yields several different clustering results, so it is worth discussing how to identify the best one when the correct labels are unknown. To this end, this paper proposes a new internal clustering validation measure, the clustering validation index based on the neighbors (CVN), which can be used to select the optimal result among multiple clustering results. It is more effective than other widely used internal clustering evaluation indexes.
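The abstract does not give the exact formulas for the density of a point or the Hybrid Distance, so the following is only a minimal sketch of the general idea of density-aware initialization, not the method of the thesis. The density formula (neighbor count discounted by the mean distance to those neighbors), the `radius` parameter, the density-times-distance score, and the function names are all assumptions made for illustration.

```python
import math


def euclidean(p, q):
    """Plain Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def densities(points, radius):
    """Hypothetical density of each point (an assumption, not the thesis's
    definition): number of neighbors within `radius`, divided by
    1 + mean distance to those neighbors."""
    result = []
    for i, p in enumerate(points):
        near = [
            d
            for j, q in enumerate(points)
            if j != i and (d := euclidean(p, q)) <= radius
        ]
        if not near:
            result.append(0.0)  # isolated point: no neighbors in range
        else:
            mean_dist = sum(near) / len(near)
            result.append(len(near) / (1.0 + mean_dist))
    return result


def initial_centers(points, k, radius):
    """Greedy selection: start from the densest point, then repeatedly add
    the point that maximizes density * distance to the nearest chosen
    center. This score is a stand-in, not the Hybrid Distance."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    dens = densities(points, radius)
    chosen = [max(range(len(points)), key=lambda i: dens[i])]
    while len(chosen) < k:
        candidates = [i for i in range(len(points)) if i not in chosen]

        def score(i):
            nearest = min(euclidean(points[i], points[c]) for c in chosen)
            return dens[i] * nearest

        chosen.append(max(candidates, key=score))
    return [points[i] for i in chosen]


if __name__ == "__main__":
    # Two well-separated groups of four points each (made-up toy data).
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
    print(initial_centers(data, k=2, radius=1.5))
```

The returned points could be passed as starting centers to any K-means implementation. Multiplying density by distance is one simple way to favor points that are both in dense regions and far from centers already chosen; other combinations are equally plausible.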

Keywords/Search Tags:

Clustering, Initial cluster centers, K-means, Hybrid distance, Density, Internal clustering evaluation index

Related items

1	Improved K-means Algorithm Based On Optimizing Initial Cluster Centers
2	Research On Improvement Of K-means Clustering Algorithm
3	Study On Problems To Select Initial Cluster Centers Of The K-means Algorithm
4	Research On Hybrid Algorithm Based On Subtractive Clustering
5	Improvements And Implementation Of K-means Clustering Algorithm
6	The Selection And Improvement Of K-means’s Initial Clustering Centers
7	Research On Clustering Algorithm Based On Minimum Spanning Tree
8	Research On Clustering Algorithm Based On Density Peak And Its Application In Text Clustering
9	Algorithms Implementation Of Determining The Number Of Clusters And Initial Cluster Centers For Mixed Data
10	Research On Initial Cluster Centers Choice Algorithm And Clustering For Imbalanced Data