Research On Problems Related To The Initial Center Selection In K-means Clustering Algorithm

Posted on:2009-10-21

Degree:Master

Type:Thesis

Country:China

Candidate:X R Wu

Full Text:PDF

GTID:2178360242490861

Subject:Computer application technology

Abstract/Summary:

Data mining is the process of extracting implicit, previously unknown, and potentially valuable knowledge and rules from databases, and it has been widely applied in many fields in recent years. A large body of theory and methods has been developed, with much of the research concentrating on distance-based clustering, of which K-means is the most classical algorithm. K-means is a typical partitioning method: it is easy to implement, scalable, and efficient on large data sets. However, the algorithm has shortcomings: it requires the user to specify the number of clusters in advance; it is very sensitive to initial conditions and often becomes trapped in a local minimum; and it performs best only on clusters of hyperspherical shape.

This thesis studies the K-means clustering algorithm in depth and summarizes its strengths and weaknesses. It focuses on the dependence of K-means on its initial values and uses a large number of experiments to verify the impact of randomly selected initial values on the clustering results. To address the dependence of K-means on initial center selection, two new initial center selection algorithms are presented. The main work and contributions are as follows:

1. Based on the idea of the Huffman tree structure, a new method of selecting the initial K-means cluster centers is proposed. It reduces the instability of clustering results caused by randomly selected initial centers and, to a certain extent, mitigates the tendency to converge to a local optimum rather than the global optimum.
2. The initial K-means cluster centers are selected with a max-distance algorithm, so that the selected centers represent different clusters. This improves the efficiency of the initial partition of the data set and overcomes the problem of randomly selected centers lying too close together, that is, several initial centers falling in the same cluster while small clusters receive none. In addition, feature weighting is introduced to distinguish the different contributions of different features to the clustering, which increases the effectiveness of clustering.
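
The abstract does not spell out the max-distance selection step, so the following is only a minimal sketch of one common reading of it (farthest-first selection), not the thesis's actual procedure. The names `weighted_distance` and `max_distance_centers`, the choice of the first center (the point farthest from the data mean), and the form of the feature weights are all assumptions made for illustration.

```python
import math
from typing import List, Optional, Sequence

Point = Sequence[float]


def weighted_distance(a: Point, b: Point,
                      weights: Optional[Sequence[float]] = None) -> float:
    """Euclidean distance with optional per-feature weights (assumed form)."""
    if weights is None:
        weights = [1.0] * len(a)
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def max_distance_centers(points: Sequence[Point], k: int,
                         weights: Optional[Sequence[float]] = None
                         ) -> List[List[float]]:
    """Pick k initial centers so that each new center is the point
    farthest from the centers already chosen (farthest-first selection)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    n, dim = len(points), len(points[0])

    # Assumption: start from the point farthest from the overall mean.
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    first = max(range(n),
                key=lambda i: weighted_distance(points[i], mean, weights))
    chosen = [first]

    # nearest[i] holds the distance from point i to its closest chosen center.
    nearest = [weighted_distance(p, points[first], weights) for p in points]

    while len(chosen) < k:
        # The next center is the point with the largest such distance.
        nxt = max(range(n), key=lambda i: nearest[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            d = weighted_distance(p, points[nxt], weights)
            if d < nearest[i]:
                nearest[i] = d

    return [list(points[i]) for i in chosen]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0),
            (10, 10), (10, 11), (11, 10),
            (0, 20), (1, 20), (0, 21)]
    print(max_distance_centers(data, 3))
```

On well-separated data such as the three small groups in the usage example, each chosen center falls in a different group, which is the behavior the abstract attributes to the method. Farthest-first selection tends to pick outliers, so a practical version would likely need some safeguard that this sketch omits.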

Keywords/Search Tags:

Data Mining, Clustering, K-means Algorithm, Initial center, Feature weighting

Related items

1	Research And Implementation On Variable Weighting In K-means Type Clustering
2	Clustering Analysis Based On Improved K-means Algorithm
3	Improvements Of K-means Clustering Algorithm
4	Research On The Improvement Of C-means Clustering Algorithm
5	Research On Feature Weighting And Feature Selection-based Data Mining Algorithms
6	Research And Implementation Of Scenic Area Information Mining System Based On Feature Weighting And Density Clustering
7	The Research Of K-means Clustering Algorithm Improvement
8	Semi-supervised K-means Clustering Algorithm In Data Mining
9	K-means Clustering Algorithm
10	The Research Of The K-means Clustering Algorithm Based On Nearest Neighbors