
Research On Problems Related To The Initial Center Selection In K-means Clustering Algorithm

Posted on: 2009-10-21 | Degree: Master | Type: Thesis
Country: China | Candidate: X R Wu | Full Text: PDF
GTID: 2178360242490861 | Subject: Computer application technology
Abstract/Summary:
Data mining is the process of extracting implicit, previously unknown, and potentially valuable knowledge and rules from databases, and it has been widely applied in many fields in recent years. A large body of theory and methods has been developed, with much of the research concentrating on distance-based clustering, of which K-means is the most classical algorithm. K-means is a typical partitioning method: it is easy to implement, scalable, and highly efficient on large data sets. However, the algorithm has shortcomings: it requires the user to specify the number of clusters beforehand; it is very sensitive to initial conditions and often gets trapped in a local minimum; and it captures well only clusters of hyperspherical shape.

This thesis studies and analyzes the K-means clustering algorithm in depth and summarizes its strengths and weaknesses. It focuses on the dependence of K-means on its initial values and uses a large number of experiments to verify the impact of randomly selected initial values on the clustering results. To address the dependence of K-means on the selection of initial centers, we present two new initial center selection algorithms. The research and contributions are as follows:

1. Based on the idea of Huffman tree construction, a new method of selecting the initial K-means cluster centers is proposed. It reduces the instability of clustering results caused by randomly selected initial centers and, to a certain extent, mitigates the tendency to reach a local optimum rather than the global optimum.
2. The initial K-means cluster centers are selected by a max-distance algorithm, so that the selected centers represent different clusters. This improves the efficiency of the initial partition of the data set and overcomes the problem of randomly selected centers lying too close together, that is, several initial centers falling in the same cluster while small clusters receive none. In addition, a feature weighting method is introduced that distinguishes the different contributions that different features make to clustering, which increases the effectiveness of clustering.
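The max-distance idea in the second contribution can be illustrated with a minimal sketch. The abstract does not give the exact selection rule, so the code below assumes a common farthest-first variant: the first center is the point farthest from the data mean, and each later center is the point whose distance to its nearest already-chosen center is largest. The function name `max_distance_centers` and the optional `weights` argument (a simple per-feature weighted Euclidean distance standing in for the feature weighting mentioned above) are hypothetical and are not taken from the thesis.

```python
import math


def max_distance_centers(points, k, weights=None):
    """Pick k initial centers with a farthest-first (max-min distance) rule.

    Assumed variant, not the thesis's exact procedure.
    points  : list of equal-length numeric tuples
    weights : optional per-feature weights (hypothetical feature weighting)
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    dim = len(points[0])
    w = list(weights) if weights is not None else [1.0] * dim

    def dist(p, q):
        # Weighted Euclidean distance; w[j] scales the influence of feature j.
        return math.sqrt(sum(wj * (a - b) ** 2 for wj, a, b in zip(w, p, q)))

    # Assumption: seed with the point farthest from the overall mean,
    # which makes the selection deterministic.
    mean = [sum(p[j] for p in points) / len(points) for j in range(dim)]
    centers = [max(points, key=lambda p: dist(p, mean))]

    while len(centers) < k:
        # Next center: the point whose nearest chosen center is farthest away.
        nxt = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(nxt)
    return centers
```

Because each new center maximizes its distance to the centers already chosen, two centers are unlikely to fall in the same dense cluster, which is the failure of random initialization described above. The returned points would then be passed to an ordinary K-means run as its starting centers.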
Keywords/Search Tags: Data Mining, Clustering, K-means Algorithm, Initial Center, Feature Weighting