Research On The Selection Of Initial Cluster Centers In K-means Algorithm

Posted on: 2019-06-04 | Degree: Master | Type: Thesis
Country: China | Candidate: J Yang | Full Text: PDF
GTID: 2428330548457776 | Subject: Software engineering
Abstract/Summary:
Clustering is one of the most important data mining methods, and K-means is one of the most widely used clustering algorithms because of its simplicity and effectiveness. However, its performance depends heavily on the selection of initial cluster centers: if they are poorly chosen, the algorithm easily falls into a local optimum and clusters poorly. Numerous improved methods have been proposed to address this, but most of them cannot dynamically adapt to data sets with different features. An adaptive selection method for initial cluster centers is therefore needed. Based on the current state of research, the main work of this paper is as follows:

(1) Previous initial cluster center selection methods define the density of a point in ways that reduce its discriminative power. To address this issue, this paper redefines the density of a point according to the number of its neighbors as well as the distances between the point and its neighbors, which effectively increases the discriminative power of the density.

(2) The selection of initial cluster centers plays a significant role in the clustering performance of the K-means algorithm. Numerous methods have been proposed, but they perform well only on some data sets. Because real-world data sets have varied features, an initial cluster center selection method that can dynamically adapt to data sets with different features is of great significance. This paper therefore proposes a new distance metric, Hybrid Distance, and, based on it, an initial cluster center selection method with parameter ?. The experimental results show that this method improves the clustering accuracy of K-means more effectively than previous methods.

(3) Adjusting the parameter ? yields several different clustering results, so it is worth asking which result is best when the correct labels are unknown. To this end, this paper proposes a new internal clustering validation measure, the clustering validation index based on the neighbors (CVN), which can be used to select the optimal result among multiple clustering results. It is more effective than other widely used internal clustering evaluation indexes.
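The abstract does not give the exact density definition or the Hybrid Distance formula, so the following is only an illustrative sketch of the general idea of density-aware seeding for K-means, not the thesis's method. The density formula (neighbor count divided by one plus the mean neighbor distance), the `radius` parameter, the scoring rule (density times distance to the nearest already chosen center), and the function names `local_density` and `select_initial_centers` are all assumptions made for this example.

```python
import math


def local_density(points, radius):
    """Hypothetical density: number of neighbors within `radius`,
    divided by (1 + mean distance to those neighbors).
    This is an assumed stand-in, not the definition from the thesis."""
    densities = []
    for i, p in enumerate(points):
        near = [
            d
            for j, q in enumerate(points)
            if j != i and (d := math.dist(p, q)) <= radius
        ]
        if near:
            densities.append(len(near) / (1.0 + sum(near) / len(near)))
        else:
            densities.append(0.0)  # isolated points get zero density
    return densities


def select_initial_centers(points, k, radius):
    """Pick k seeds: the densest point first, then repeatedly the point
    maximizing density * distance to the nearest chosen seed.
    The scoring rule is an assumption, not the thesis's Hybrid Distance."""
    dens = local_density(points, radius)
    chosen = [max(range(len(points)), key=lambda i: dens[i])]
    while len(chosen) < k:
        def score(i):
            gap = min(math.dist(points[i], points[c]) for c in chosen)
            return dens[i] * gap

        candidates = [i for i in range(len(points)) if i not in chosen]
        chosen.append(max(candidates, key=score))
    return [points[i] for i in chosen]


# Two tight groups plus one isolated outlier.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
    (2.5, 9.0),
]
centers = select_initial_centers(points, k=2, radius=0.5)
print(centers)
```

In this toy data set the seeds land in the two dense groups and the isolated point is never chosen, because its density is zero. Seeds chosen this way would then be passed to an ordinary K-means run as its starting centers.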
Keywords/Search Tags: Clustering, Initial cluster centers, K-means, Hybrid distance, Density, Internal clustering evaluation index