Research On K-means Initialization Algorithm

Posted on:2016-03-23

Degree:Master

Type:Thesis

Country:China

Candidate:J D Wei

Full Text:PDF

GTID:2208330461479312

Subject:Computer application technology

Abstract/Summary:


With the rapid development of Internet technology, more and more data is generated in daily production and life. Data mining techniques have emerged in response and have become a key technology of the big data era. This thesis gives an overview of data mining and introduces the definition of cluster analysis and related background. It then examines the K-means algorithm and its advantages and disadvantages. To address its deficiencies in initializing the cluster centers and in requiring the number of clusters to be known beforehand, we design a new algorithm that automatically determines the cluster centers and the number of clusters of a data set. The specific work of this thesis is as follows. First, we study clustering validity evaluation indices: the commonly used validity criteria VIn and the DBI index perform very well at capturing the uniform effect of the K-means algorithm, capturing changes in data membership in the clustering results, and finding the number of classes in the data set. Next, an initialization method based on the genetic algorithm (GA) is studied, in which GA is used to determine the initial cluster centers; the detailed algorithm flowchart and experimental results are presented. Then a hierarchical initialization method is studied, and a way to determine the initial centers reasonably is designed: sample the data layer by layer, cluster the final sampling layer, and map the resulting cluster centers back to the original data layer as the initial cluster centers of the original data set.
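The hierarchical initialization idea can be illustrated with a minimal sketch. This is not the thesis's implementation: the grid-averaging scheme, the doubling cell size, the random start on the coarsest layer, and every name used here (`kmeans`, `grid_sample`, `hierarchical_init`, `cell`, `layers`) are assumptions chosen for illustration, and the data is synthetic.

```python
import math
import random


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations; returns (centers, labels, iterations used)."""
    centers = [tuple(c) for c in centers]
    labels = []
    for it in range(1, max_iter + 1):
        # Assign each point to its nearest center.
        labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(tuple(sum(x) / len(members) for x in zip(*members))
                               if members else old)
        if new_centers == centers:  # assignments are stable: converged
            return centers, labels, it
        centers = new_centers
    return centers, labels, max_iter


def grid_sample(points, cell):
    """One sampling layer (assumed scheme): replace the points in each
    grid cell of side `cell` by their mean."""
    cells = {}
    for p in points:
        key = tuple(math.floor(x / cell) for x in p)
        cells.setdefault(key, []).append(p)
    return [tuple(sum(x) / len(group) for x in zip(*group))
            for group in cells.values()]


def hierarchical_init(points, k, cell=0.5, layers=3, seed=0):
    """Return initial centers for `points` derived from coarser layers."""
    pyramid = [points]
    for _ in range(layers):
        sampled = grid_sample(pyramid[-1], cell)
        if len(sampled) < k:  # too coarse to hold k centers; stop here
            break
        pyramid.append(sampled)
        cell *= 2
    # Cluster only the coarsest layer from a random start ...
    rng = random.Random(seed)
    centers, _, _ = kmeans(pyramid[-1], rng.sample(pyramid[-1], k))
    # ... then refine the centers on each finer sampled layer in turn.
    for layer in reversed(pyramid[1:-1]):
        centers, _, _ = kmeans(layer, centers)
    return centers


# Synthetic demo data: three Gaussian blobs (not from the thesis).
rng = random.Random(1)
data = [(rng.gauss(cx, 0.3), rng.gauss(cy, 0.3))
        for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(100)]
init = hierarchical_init(data, k=3)
centers, labels, iterations = kmeans(data, init)
print(iterations, centers)
```

In this sketch the expensive clustering from a random start runs only on the smallest layer, and each finer layer starts from centers that are already close to a solution.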
The experimental results show that the hierarchical initialization method can identify the initial cluster centers so as to reduce the number of iterations and improve the convergence speed. Finally, we combine the hierarchical initialization method with the DBI index to design a new algorithm, DHIKM for short, that automatically determines the number of clusters. First, the original data is grid-sampled layer by layer to decrease the amount of data that must be processed; then the final sampling layer is clustered, and the DBI index is used to determine the best number of clusters; finally, working from the top down, the cluster centers of each sampling layer are mapped to the next layer as its initial cluster centers, and so on until the original data layer is reached. Experiments on simulated data sets and UCI data sets show that the improved DHIKM is effective.
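The step of choosing the number of clusters with the DBI index can be sketched as follows, assuming DBI denotes the Davies-Bouldin index (lower values indicate tighter, better separated clusters). The function name `davies_bouldin`, the toy points, and the two hand-made candidate partitions are illustrative assumptions, not data or code from the thesis.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a hard partition (lower is better).

    Assumes at least two clusters with distinct centroids.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    centers, scatter = [], []
    for members in clusters.values():
        c = tuple(sum(x) / len(members) for x in zip(*members))
        centers.append(c)
        # Scatter: mean distance of the cluster's members to its centroid.
        scatter.append(sum(math.dist(p, c) for p in members) / len(members))
    k = len(centers)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max((scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
                     for j in range(k) if j != i)
    return total / k


# Toy data: two well-separated unit squares (hypothetical example).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
# Candidate partitions for k = 2 and k = 3; the second splits one square.
partitions = {
    2: [0, 0, 0, 0, 1, 1, 1, 1],
    3: [0, 0, 1, 1, 2, 2, 2, 2],
}
scores = {k: davies_bouldin(points, labels) for k, labels in partitions.items()}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

In a DHIKM-style pipeline the candidate partitions would come from running K-means on the final sampling layer for each candidate number of clusters; the hand-made labels here only stand in for those results.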

Keywords/Search Tags:

Data Mining, K-means Algorithm, DBI, Hierarchical Initialization


Related items

1	Research On The Automatic Initialization Techniques For YH-SUPE-based Parallel Simulation System
2	Research On Key Technologies Of Media Data Process For Social Network
3	Methods And Applications Study Of Cluster-based Spatial Data Mining
4	Research On Telecom Lte Users Churn Algorithm Based On Data Mining
5	Application Of K-Means Algorithm In Microblogging Data Mining
6	The Research And Application Of Improved K-Means Algorithm In Data Mining
7	The Research Of K-means Clustering Algorithm In Data Mining
8	The Application Of Data Mining In The Customer Classification In The Telecommunication Field
9	The Research Of Clustering Data Mining Based On Swarm Intelligence Algorithm
10	Research And Application Of K-means Algorithm In Data Mining Technology Based On Genetic Algorithm