
Study On Problems To Select Initial Cluster Centers Of The K-means Algorithm

Posted on: 2013-02-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y B Li
Full Text: PDF
GTID: 2248330374974688
Subject: Computational Mathematics
Abstract/Summary:
Cluster analysis, one of the core functions of data mining, groups data objects so that similarity within a cluster is maximized and similarity between different clusters is minimized, when the training data provides no class labels. A large number of clustering algorithms exist, and the K-means algorithm is one of the most widely used. K-means has several advantages: it is easy to implement, has nearly linear time complexity, scales well, and handles large data sets efficiently. However, it also has weaknesses: it depends on the initial conditions, requires the user to specify the number of clusters in advance, its criterion function is often trapped in local minima, and it is sensitive to outliers.

This research improves the K-means algorithm by focusing on the selection of the initial cluster centers. The main idea is that, in a data set, regions of high-density objects are separated by regions of low-density objects, and objects in low-density regions are generally regarded as noise. First, the data objects are divided into high-density and low-density points according to two parameters, ε and MinPts (ε is the neighborhood radius; MinPts is the minimum number of objects a high-density point must contain within that radius). Then the K points in the high-density set that are farthest from one another are taken as the initial cluster centers. Several comparative experiments presented in this paper show that the improved algorithm yields better and more stable solutions than the standard K-means algorithm.

A method for assigning values to the two parameters is also proposed. For each object, take the median of its distances to the other objects, and let ε be the average of these medians. For MinPts, count the number of objects within the neighborhood radius ε of each object, sum these counts over all objects, and divide by 2n, where n is the number of objects in the data set. Further experiments demonstrate that this method is reasonable.
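To make the procedure concrete, the following is a minimal sketch in Python/NumPy of how the parameter heuristic and the density-based center selection could be implemented. The function names, the use of NumPy, the seeding from the densest point, the greedy farthest-point traversal, and the fallback when too few high-density points exist are illustrative assumptions, not the thesis' exact code.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix between all pairs of objects."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def estimate_parameters(dists):
    """Heuristic for eps and MinPts described in the abstract:
    eps is the mean of each object's median distance to the other objects;
    MinPts is the total number of eps-neighbors over all objects, divided by 2n."""
    n = dists.shape[0]
    # distances to the *other* objects (diagonal of zeros removed)
    off_diag = dists[~np.eye(n, dtype=bool)].reshape(n, n - 1)
    eps = np.median(off_diag, axis=1).mean()
    neighbor_counts = (dists <= eps).sum(axis=1) - 1  # exclude the point itself
    min_pts = neighbor_counts.sum() / (2 * n)
    return eps, min_pts, neighbor_counts

def density_based_initial_centers(X, k):
    """Select k initial centers: keep only high-density points (at least
    MinPts neighbors within radius eps), then greedily pick k points from
    that set that are far apart (a farthest-point heuristic)."""
    dists = pairwise_distances(X)
    eps, min_pts, neighbor_counts = estimate_parameters(dists)
    high = np.where(neighbor_counts >= min_pts)[0]
    if len(high) < k:                  # fall back if too few dense points (assumption)
        high = np.arange(X.shape[0])
    # start from the densest point, then repeatedly add the high-density
    # point farthest from the centers chosen so far
    centers = [high[np.argmax(neighbor_counts[high])]]
    while len(centers) < k:
        d_to_centers = dists[np.ix_(high, centers)].min(axis=1)
        centers.append(high[np.argmax(d_to_centers)])
    return X[np.array(centers)]
```

In use, the returned centers would simply replace the random initialization of K-means, for example by passing them as the initial cluster centers to an existing implementation (e.g. scikit-learn's `KMeans(n_clusters=k, init=centers, n_init=1)`).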
Keywords/Search Tags:K-means Algorithm, Initial Cluster Centers, Density, The Radius of The Neighborhood, High-density point