
Study On Problems To Select Initial Cluster Centers Of The K-means Algorithm

Posted on: 2013-02-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y B Li
Full Text: PDF
GTID: 2248330374974688
Subject: Computational Mathematics
Abstract/Summary:
Cluster analysis, one of the core functions of data mining, groups data objects so that similarity within a cluster is maximized and similarity between different clusters is minimized, when the training data provides no class labels. A large number of clustering algorithms exist, and the K-means algorithm is one of the most widely used. K-means has several advantages: it is easy to implement, has nearly linear time complexity, scales well, and handles large data sets efficiently. However, it also has weaknesses: it depends on the initial conditions, requires the user to specify the number of clusters in advance, its criterion function is often trapped in local minima, and it is sensitive to outliers.

This research improves the K-means algorithm by focusing on the selection of the initial cluster centers. The main idea is that, in a data set, regions of high-density objects are separated by regions of low-density objects, and objects in low-density regions are generally regarded as noise. First, the data objects are divided into high-density and low-density points according to two parameters, ε and MinPts (ε is the neighborhood radius; MinPts is the minimum number of objects a high-density point must contain within that radius). Then the K points in the high-density set that are farthest from one another are taken as the initial cluster centers. Several comparative experiments presented in this paper show that the improved algorithm yields better and more stable solutions than the standard K-means algorithm.

A method for assigning values to the two parameters is also proposed. For each object, take the median of its distances to the other objects, and let ε be the average of these medians. For MinPts, count the number of objects within the neighborhood radius ε of each object, sum these counts over all objects, and divide by 2n, where n is the number of objects in the data set. Further experiments demonstrate that this method is reasonable.
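To make the procedure concrete, the following is a minimal sketch in Python/NumPy of how the parameter heuristic and the density-based center selection could be implemented. The function names, the use of NumPy, the seeding from the densest point, the greedy farthest-point traversal, and the fallback when too few high-density points exist are illustrative assumptions, not the thesis' exact code.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix between all pairs of objects."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def estimate_parameters(dists):
    """Heuristic for eps and MinPts described in the abstract:
    eps is the mean of each object's median distance to the other objects;
    MinPts is the total number of eps-neighbors over all objects, divided by 2n."""
    n = dists.shape[0]
    # distances to the *other* objects (diagonal of zeros removed)
    off_diag = dists[~np.eye(n, dtype=bool)].reshape(n, n - 1)
    eps = np.median(off_diag, axis=1).mean()
    neighbor_counts = (dists <= eps).sum(axis=1) - 1  # exclude the point itself
    min_pts = neighbor_counts.sum() / (2 * n)
    return eps, min_pts, neighbor_counts

def density_based_initial_centers(X, k):
    """Select k initial centers: keep only high-density points (at least
    MinPts neighbors within radius eps), then greedily pick k points from
    that set that are far apart (a farthest-point heuristic)."""
    dists = pairwise_distances(X)
    eps, min_pts, neighbor_counts = estimate_parameters(dists)
    high = np.where(neighbor_counts >= min_pts)[0]
    if len(high) < k:                  # fall back if too few dense points (assumption)
        high = np.arange(X.shape[0])
    # start from the densest point, then repeatedly add the high-density
    # point farthest from the centers chosen so far
    centers = [high[np.argmax(neighbor_counts[high])]]
    while len(centers) < k:
        d_to_centers = dists[np.ix_(high, centers)].min(axis=1)
        centers.append(high[np.argmax(d_to_centers)])
    return X[np.array(centers)]
```

In use, the returned centers would simply replace the random initialization of K-means, for example by passing them as the initial cluster centers to an existing implementation (e.g. scikit-learn's `KMeans(n_clusters=k, init=centers, n_init=1)`).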
Keywords/Search Tags:K-means Algorithm, Initial Cluster Centers, Density, The Radius of The Neighborhood, High-density point