
The Selection And Improvement Of K-means’s Initial Clustering Centers

Posted on: 2014-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: L Q Wang
GTID: 2308330473950981
Subject: Computer application technology
Abstract/Summary:
In the 21st century, the continuous development of science and technology has drawn growing scholarly attention to data mining. Data mining is the process of extracting novel, potentially valuable knowledge and rules from databases. Also known as Knowledge Discovery in Databases (KDD), it extracts useful information from large volumes of stored data. Data mining is an emerging interdisciplinary field that integrates techniques from pattern recognition, databases, statistics, machine learning, artificial intelligence, and other areas. Classification, clustering, and association rules are its three main research areas, and clustering analysis is one of the most important of these. In-depth research on clustering is therefore valuable both in theory and in applications. Clustering analysis divides data objects into clusters such that objects in the same cluster are highly similar, while objects in different clusters have low similarity. Clustering analysis has already been applied to pattern recognition, data analysis, image processing, and market research.

Clustering algorithms are the most important part of clustering analysis. Current clustering algorithms fall into partitioning methods, grid-based methods, density-based methods, hierarchical methods, and model-based methods. K-means is a typical partitioning algorithm. Its main advantages are simple operation and high scalability and efficiency on large data sets. Its most important defect, however, is that the initial clustering centers are chosen at random, which can cause K-means to converge to a local minimum.
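To make the weakness concrete, the sketch below implements plain Lloyd-style K-means in pure Python and runs it from different starting centers. It assumes that "random selection" means sampling k data points as the initial centers (a common convention; the abstract does not say which variant is meant), and the helper names `sq_dist`, `assign`, `random_init`, and `kmeans` are illustrative, not taken from the thesis.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: List[Point], centers: List[Point]) -> List[int]:
    """Assignment step: index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def random_init(points: List[Point], k: int, seed: int) -> List[Point]:
    """Assumed standard initialization: k distinct data points chosen at random."""
    return random.Random(seed).sample(points, k)


def kmeans(points: List[Point], centers: List[Point], max_iter: int = 100):
    """Lloyd-style K-means from the given initial centers; returns (centers, labels, sse)."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        # Update step: move each center to the mean of the points assigned to it.
        new_centers = []
        for j, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(c)  # keep an empty cluster's center where it was
        if new_centers == centers:  # no center moved: converged
            break
        centers = new_centers
    labels = assign(points, centers)
    # Sum of squared errors: the objective K-means tries to minimize.
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, sse


if __name__ == "__main__":
    # Three well-separated groups on a line.
    pts = [(0.0,), (1.0,), (10.0,), (11.0,), (100.0,), (101.0,)]
    for seed in range(5):
        _, _, sse = kmeans(pts, random_init(pts, 3, seed))
        print(f"seed={seed} SSE={sse:.2f}")
```

On these six points, starting from centers 0, 10, and 100 converges to an SSE of 1.5, while starting from 0, 100, and 101 stops at an SSE of 101: the groups {0, 1} and {10, 11} are merged and {100, 101} is split, and no further Lloyd iteration can escape that local minimum.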
This thesis proposes two improved methods for selecting the initial clustering centers. To overcome this weakness of K-means, and building on a detailed analysis and summary of existing algorithms, it proposes a density-based divided clustering algorithm and a density-based Huffman tree clustering algorithm. Each approach first selects the initial clustering centers and then iterates from those centers. Both algorithms choose the initial centers according to defined principles instead of at random, which removes the main weakness of K-means and helps them avoid falling into a local minimum. Experiments show that the proposed algorithms improve the stability and accuracy of the clustering results. Compared with other clustering algorithms, the Huffman tree clustering algorithm also greatly improves the efficiency of the clustering process.
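The abstract does not state the exact density rule, so the following is only a hypothetical sketch of the general idea of density-based initial-center selection, not the thesis's divided or Huffman tree algorithm. It assumes that a point's density is the number of other points within a chosen `radius`, picks the densest remaining point as a center, and then excludes everything within `2 * radius` of it so the next center comes from a different dense region. The function name and both parameters are illustrative assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]


def density_init(points: List[Point], k: int, radius: float) -> List[Point]:
    """Hypothetical density-based selection of k initial K-means centers.

    density(p) = number of other points within `radius` of p (assumed definition).
    The densest remaining point becomes a center; every point within 2 * radius
    of it is then excluded so the next center is drawn from a different dense region.
    """
    density = [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]
    candidates = list(range(len(points)))
    centers: List[Point] = []
    while candidates and len(centers) < k:
        best = max(candidates, key=lambda i: density[i])  # ties: earliest index wins
        centers.append(points[best])
        # Drop the neighbourhood of the new center (assumed exclusion rule).
        candidates = [i for i in candidates
                      if math.dist(points[i], points[best]) > 2 * radius]
    if len(centers) < k:
        raise ValueError("found fewer than k separated dense regions; try a smaller radius")
    return centers


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (100.0,), (101.0,), (102.0,)]
    print(density_init(pts, 3, radius=1.5))
```

For the sample points this picks 1.0, 101.0, and 10.0, one center per group, and the result is deterministic for a given input, unlike random sampling. The chosen centers would then be handed to the ordinary K-means iterations.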
Keywords/Search Tags: data mining, clustering, analysis, initial centers, K-means