
Improved Parallel K-means Clustering Algorithm Based On Cuckoo Search

Posted on: 2018-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: X J Yu
Full Text: PDF
GTID: 2348330533461385
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of computer and Internet technology and the explosive growth of data, the efficient processing and use of massive data has become one of the most difficult tasks facing society today. A major problem in data mining is how to extract potentially useful knowledge from existing mass data efficiently, accurately, and at low cost. Cluster analysis, represented by the K-means algorithm, is one of the most important research directions in data mining. K-means is a typical partition-based clustering algorithm; it is simple, converges quickly, has near-linear time complexity, and is suitable for clustering massive data. Swarm intelligence optimization algorithms exploit the advantages of a population and parallel search to find the optimum of an optimization problem quickly in a global manner, which makes them one of the most effective ways to solve clustering optimization problems. Many scholars have already optimized the K-means clustering algorithm with various swarm intelligence optimization algorithms. However, two problems remain in these improved K-means algorithms: (1) their global optimization ability during clustering is not outstanding, so they easily fall into local optima; (2) their clustering efficiency is low in large-data scenarios, and the advantages of a server cluster cannot be fully used.
The main work of the author includes: (1) a quantum-based adaptive cuckoo search algorithm, named QACS, is proposed to improve the cuckoo search algorithm, so that the search step size adapts automatically and quantum computing gives the search direction a certain tendency; (2) to address the tendency of K-means to fall into local optima, a new serial K-means clustering algorithm (QACS-KMeans) is proposed, which combines QACS with K-means to improve its global search capability; (3) to address the low efficiency of K-means on large data, QACS-KMeans is parallelized with the MapReduce programming model on the Hadoop distributed platform.
A Hadoop pseudo-distributed cluster was built in a virtual machine, and 10 accuracy experiments and 10 efficiency experiments were performed on different sample data sets. The experimental results show that: (1) compared with the original K-means, PSO-Kmeans, and ACS-KMeans algorithms, QACS-KMeans improves the average clustering accuracy on six UCI standard data sets; (2) on five random incremental data sets with very large amounts of data, the average execution efficiency of parallel QACS-KMeans is significantly better than that of the original K-means algorithm and slightly better than that of the parallel PSO-Kmeans and parallel ACS-KMeans algorithms. It can be concluded that the parallel QACS-KMeans algorithm gives a better clustering effect on large, lower-dimensional data.
Keywords/Search Tags: Clustering analysis, K-means, Cuckoo search, Hadoop, MapReduce
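The abstract describes parallelizing K-means with the MapReduce model. As a rough illustration only, one K-means iteration can be split into a map step (assign each point to its nearest centroid) and a reduce step (average the points in each cluster). The sketch below is plain Python that imitates this split in a single process; it is not the thesis's QACS-KMeans implementation and does not use the Hadoop API. The function names (`map_assign`, `reduce_mean`, `kmeans_iteration`) are hypothetical, and the choice of initial centroids, which the thesis assigns to QACS, is left to the caller.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_assign(point: Point, centroids: List[Point]) -> Tuple[int, Point]:
    """Map step (hypothetical name): emit (index of nearest centroid, point)."""
    nearest = min(
        range(len(centroids)),
        key=lambda i: squared_distance(point, centroids[i]),
    )
    return nearest, point


def reduce_mean(points: List[Point]) -> Point:
    """Reduce step (hypothetical name): average all points sharing one key."""
    count = len(points)
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / count for d in range(dims))


def kmeans_iteration(points: List[Point], centroids: List[Point]) -> List[Point]:
    """Run one map/shuffle/reduce round and return the updated centroids."""
    # Shuffle: group the mapper output by centroid index.
    groups: Dict[int, List[Point]] = defaultdict(list)
    for point in points:
        key, value = map_assign(point, centroids)
        groups[key].append(value)
    # A centroid that received no points is kept unchanged (an assumption;
    # other strategies, such as re-seeding, are equally possible).
    return [
        reduce_mean(groups[i]) if i in groups else centroids[i]
        for i in range(len(centroids))
    ]
```

For example, calling `kmeans_iteration([(0, 0), (0, 2), (10, 10), (10, 12)], [(1, 1), (9, 9)])` moves the two centroids to the means of the two obvious groups. On a real cluster the map calls would run on separate data splits and the grouping would be done by the framework's shuffle phase; iterating until the centroids stop changing is left out here.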