Research on K-Means Initialization Algorithm

Posted on: 2016-03-23  Degree: Master  Type: Thesis
Country: China  Candidate: J D Wei  Full Text: PDF
GTID: 2208330461479312  Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet technology, ever more data is generated in daily production and life. Data mining techniques have emerged in response and have become a core topic of the big data era. This thesis gives an overview of data mining and introduces the definition of cluster analysis and related background. It then examines the K-means algorithm and identifies its advantages and disadvantages. To address two of its weaknesses, the sensitivity to the initialization of the cluster centers and the need to know the number of clusters in advance, we design a new algorithm that automatically determines both the initial cluster centers and the number of clusters in a data set. The specific work of this thesis is as follows. First, we study clustering validity evaluation indices. The commonly used validity criteria VIn and the Davies-Bouldin index (DBI) perform very well at capturing the uniform effect of the K-means algorithm, capturing changes in data membership in the clustering results, and finding the number of classes in the data set. Second, we study an initialization method based on a genetic algorithm (GA), in which GA is used to determine the initial cluster centers, and we present the detailed algorithm flowchart and experimental results. Third, we study a hierarchical initialization method and design a reasonable way to determine the initial centers: sample the data layer by layer, cluster the last (coarsest) sampling layer, and map the resulting cluster centers back to the original data layer, where they serve as the initial cluster centers of the original data set.
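The layer-by-layer idea just described can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the sampling scheme or parameters, so this sketch assumes uniform random subsampling, and the names `hierarchical_init`, `ratio`, `min_size`, and `seed` are hypothetical. The coarsest layer is clustered from randomly chosen centers, and the resulting centers are then refined on each finer sampled layer before being returned as initial centers for the original data.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations from the given centers.

    Returns (centers, number of iterations performed)."""
    centers = [tuple(c) for c in centers]
    for iteration in range(1, max_iter + 1):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous center.
        updated = [tuple(sum(x) / len(g) for x in zip(*g)) if g else c
                   for g, c in zip(groups, centers)]
        if updated == centers:
            return centers, iteration
        centers = updated
    return centers, max_iter

def hierarchical_init(points, k, ratio=0.5, min_size=50, seed=0):
    """Return k initial centers for `points` (assumed sampling scheme).

    ASSUMPTION: each layer is a uniform random subsample of the layer
    below it; the thesis may sample differently."""
    rng = random.Random(seed)
    layers = [list(points)]                  # layers[0] is the original data
    while len(layers[-1]) * ratio >= max(min_size, k):
        prev = layers[-1]
        layers.append(rng.sample(prev, int(len(prev) * ratio)))
    centers = rng.sample(layers[-1], k)      # random start on the coarsest layer
    for layer in reversed(layers[1:]):       # coarsest sampled layer first
        centers, _ = kmeans(layer, centers)
    return centers                           # use as the start on layers[0]
```

A caller would then run `kmeans(points, hierarchical_init(points, k))` on the full data set. Most of the refinement work happens on small samples, which is the motivation for expecting fewer iterations on the original data.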
The experimental results show that the hierarchical initialization method identifies good initial cluster centers, which reduces the number of iterations and improves the convergence speed. Finally, we combine the hierarchical initialization method with the DBI index to design a new algorithm, DHIKM for short, that automatically determines the number of clusters. First, the original data is grid-sampled layer by layer to reduce the amount of data that must be processed. Then the last sampling layer is clustered, and the DBI index is used to determine the best number of clusters. Finally, working top-down, the cluster centers of each sampling layer are mapped to the next layer as its initial cluster centers, and so on until the original data layer is reached. Experiments on simulated data sets and UCI data sets show that the improved DHIKM algorithm is effective.
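The step in which the DBI index picks the number of clusters can be sketched in Python as follows. This is a simplified illustration rather than DHIKM itself: it omits the grid-sampling layers and runs directly on the given points, and it uses a deterministic farthest-point initialization as a stand-in so the example is reproducible. The function names `farthest_point_init`, `davies_bouldin`, and `choose_k` are hypothetical. The index is computed in its standard form: for each cluster, take the worst ratio of summed within-cluster scatter to centroid distance, then average over clusters; lower values indicate compact, well-separated clusters.

```python
import math

def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations; returns (centers, labels)."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        labels = [min(range(len(centers)),
                      key=lambda i: math.dist(p, centers[i]))
                  for p in points]
        updated = []
        for i, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == i]
            # An empty cluster keeps its previous center.
            updated.append(tuple(sum(x) / len(members) for x in zip(*members))
                           if members else c)
        if updated == centers:
            break
        centers = updated
    return centers, labels

def farthest_point_init(points, k):
    """ASSUMPTION: stand-in initialization, not the thesis's method.

    Starts from the first point and repeatedly adds the point farthest
    from all centers chosen so far. Needs at least k distinct points."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def davies_bouldin(points, labels, centers):
    """Davies-Bouldin index over the non-empty clusters (at least two)."""
    ids = sorted(set(labels))
    scatter = {}
    for i in ids:
        members = [p for p, lab in zip(points, labels) if lab == i]
        scatter[i] = sum(math.dist(p, centers[i]) for p in members) / len(members)
    worst = [max((scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
                 for j in ids if j != i)
             for i in ids]
    return sum(worst) / len(worst)

def choose_k(points, k_values):
    """Cluster once per candidate k and return (best k, all scores)."""
    scores = {}
    for k in k_values:
        centers, labels = kmeans(points, farthest_point_init(points, k))
        scores[k] = davies_bouldin(points, labels, centers)
    return min(scores, key=scores.get), scores
```

In a layered scheme like the one described above, `choose_k` would be applied only to the smallest sampling layer, so the repeated clustering for each candidate k stays cheap.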
Keywords/Search Tags: Data Mining, K-means Algorithm, DBI, Hierarchical Initialization