Improved Parallel K-means Clustering Algorithm Based On Cuckoo Search

Posted on:2018-03-27

Degree:Master

Type:Thesis

Country:China

Candidate:X J Yu

Full Text:PDF

GTID:2348330533461385

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development of computer and Internet technology and the explosive growth of data, the efficient processing and utilization of massive data has become one of the most difficult tasks facing society today. A major problem in data mining is how to extract potentially useful knowledge from existing massive data efficiently, accurately, and at low cost. Clustering analysis, represented by the K-means algorithm, is one of the most important research directions in data mining. K-means is a typical partition-based clustering algorithm: it is simple, converges quickly, has approximately linear time complexity, and is suitable for clustering massive data. Swarm intelligence optimization algorithms, which exploit the advantages of a population and search in parallel for a global optimum, can quickly obtain the optimal solution of an optimization problem and are regarded as the most effective approach to clustering optimization problems. Many scholars have already optimized the K-means clustering algorithm with various swarm intelligence optimization algorithms. However, two problems remain in these improved K-means clustering algorithms: (1) their global optimization ability during clustering is not outstanding, so they easily fall into local optima; (2) their clustering efficiency is low in big-data scenarios, and the advantages of a server cluster cannot be fully used.

The main work of the author includes: (1) a quantum-based adaptive cuckoo search algorithm named QACS is proposed to improve the cuckoo search algorithm, so that the search step size becomes adaptive and quantum computing gives the search direction a certain tendency; (2) to address the tendency of the K-means clustering algorithm to fall into local optima, a new serial K-means clustering algorithm (QACS-KMeans) is proposed, which combines QACS with K-means to improve global search capability; (3) to address the low efficiency of the K-means clustering algorithm on large data, the new QACS-KMeans algorithm is parallelized with the MapReduce programming model of the Hadoop distributed platform.

A Hadoop pseudo-distributed cluster was built in a virtual machine, on which 10 accuracy experiments and 10 efficiency experiments were performed on different sample data sets. The experimental results show that: (1) compared with the original K-means algorithm, PSO-Kmeans, and ACS-KMeans, QACS-KMeans achieves a higher average clustering accuracy on six UCI standard data sets; (2) on five random incremental data sets with very large amounts of data, the average execution efficiency of QACS-KMeans is significantly better than that of the original K-means algorithm and slightly better than that of the parallel PSO-Kmeans and parallel ACS-KMeans algorithms. It can be concluded that the parallel QACS-KMeans algorithm shows a better clustering effect on large data and lower-dimensional data.
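
The abstract does not give the details of the MapReduce parallelization, so the following is only a minimal sketch of how one K-means iteration is commonly split into a map phase and a reduce phase. It is written in plain Python rather than against the Hadoop API, and it is not the author's QACS-KMeans implementation: the cuckoo search step is omitted, and all function names (`nearest`, `map_phase`, `reduce_phase`, `kmeans`) are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def nearest(point, centroids):
    """Return the index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(points, centroids):
    """Map step: emit (cluster_index, point) for every input point."""
    for point in points:
        yield nearest(point, centroids), point


def reduce_phase(pairs, centroids):
    """Reduce step: average the points of each cluster to get the new centroids."""
    sums = {}
    counts = defaultdict(int)
    for key, point in pairs:
        if key in sums:
            sums[key] = [s + p for s, p in zip(sums[key], point)]
        else:
            sums[key] = list(point)
        counts[key] += 1
    # A cluster that received no points keeps its previous centroid.
    return [
        tuple(s / counts[i] for s in sums[i]) if i in sums else tuple(centroids[i])
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, max_iter=100, tol=1e-9):
    """Repeat map + reduce until no centroid moves by more than `tol` (squared distance)."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        shift = max(
            sum((a - b) ** 2 for a, b in zip(old, new))
            for old, new in zip(centroids, updated)
        )
        centroids = updated
        if shift <= tol:
            break
    return centroids


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
    print(kmeans(data, [(0.0, 0.0), (10.0, 10.0)]))
```

In a real Hadoop job the map calls would run on separate data splits and the framework would group the emitted pairs by key before the reduce step. In an approach like the one the abstract describes, a global search such as QACS would presumably guide the choice of centroids instead of the fixed starting values assumed here.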

Keywords/Search Tags:

Clustering analysis, K-means, Cuckoo search, Hadoop, MapReduce

Related items

1	Research On K-means Method Based On Cuckoo Algorithm
2	Optimization And Application Of K-means Clustering Algorithm Based On Spark Framework
3	K-medoids Cluster Analysis Based On Improved Cuckoo Algorithm And Its Parallel Implementation
4	The Research And Application Of Security Log Clustering Mining Algorithm Based On Hadoop Platform
5	Research On Clustering Recommendation Algorithm Based On Cuckoo Search
6	Parallel Clustering Algorithm Based On MapReduce
7	Research And Application Of Cuckoo Search Algorithm
8	The Clustering Algorithm Based On Hadoop Parallel Analysis And Applied Research
9	Research On The Application Of User Behavior Analysis Based On Hadoop
10	Research On Parallel Clustering Algorithm For Large-Scale Data Set