Research On Parallel Sampling K-Means Algorithm Based On MapReduce

Posted on:2017-08-11

Degree:Master

Type:Thesis

Country:China

Candidate:P Cui

Full Text:PDF

GTID:2428330548983848

Subject:Computer technology

Abstract/Summary:

The K-means algorithm is widely used in business, academia, and other fields because it is simple, fast, and easy to implement. However, its results depend on the choice of initial values, its clustering accuracy can be poor, and it is prone to storage problems when processing massive data. With the wide adoption of Hadoop, K-means has been parallelized, and the Canopy-kmeans algorithm built on that parallelization better addresses both massive data storage and initial value selection. Because Canopy-kmeans preprocesses the data globally, however, its initial value selection is costly. To address these problems, this paper proposes a parallel sampling K-means algorithm based on MapReduce. The algorithm combines the K selection sort algorithm with the MapReduce programming model to sample in parallel, which improves sampling efficiency. A preprocessing strategy applied to the sample then obtains the initial values quickly. Finally, the mean iteration is replaced with a weight substitution method, which improves clustering accuracy, and cluster optimization further improves the efficiency of the algorithm. Experimental results show that the parallel algorithm achieves better clustering results and speedup, and that its performance improves further in the comparison experiment on the optimized cluster.
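
The overall shape of a sample-initialized K-means written as map and reduce steps can be sketched as follows. This is a minimal single-process illustration, not the thesis's implementation: the abstract does not detail the K selection sort sampling or the weight substitution method, so uniform random sampling and a plain mean update are used as stand-ins, and all function and parameter names (`map_assign`, `reduce_mean`, `kmeans_mapreduce`, `sample_size`) are hypothetical.

```python
import random
from collections import defaultdict


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def map_assign(point, centers):
    """Map step: emit (index of the nearest center, (point, 1))."""
    nearest = min(
        range(len(centers)),
        key=lambda i: squared_distance(point, centers[i]),
    )
    return nearest, (point, 1)


def reduce_mean(values):
    """Reduce step: average all points grouped under one center index.

    Assumption: a plain mean is used here; the thesis replaces it with
    a weight substitution method that the abstract does not specify.
    """
    dim = len(values[0][0])
    sums = [0.0] * dim
    count = 0
    for point, n in values:
        for d in range(dim):
            sums[d] += point[d]
        count += n
    return tuple(s / count for s in sums)


def kmeans_mapreduce(points, k, sample_size, iterations=10, seed=0):
    """Run K-means with initial centers drawn from a sample of the data."""
    rng = random.Random(seed)
    # Assumption: uniform random sampling stands in for the thesis's
    # K selection sort based parallel sampling.
    sample = rng.sample(points, sample_size)
    centers = rng.sample(sample, k)
    for _ in range(iterations):
        groups = defaultdict(list)
        for point in points:                 # map phase
            key, value = map_assign(point, centers)
            groups[key].append(value)        # shuffle: group values by key
        # Reduce phase; a center with no assigned points keeps its position.
        new_centers = [
            reduce_mean(groups[i]) if groups[i] else centers[i]
            for i in range(k)
        ]
        if new_centers == centers:           # converged
            break
        centers = new_centers
    return centers


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(sorted(kmeans_mapreduce(data, k=2, sample_size=6)))
```

Drawing the initial centers from a sample rather than from a pass over the full data set mirrors the motivation stated in the abstract: it avoids the global preprocessing cost attributed to Canopy-kmeans. In a real Hadoop job the map and reduce functions would run on separate nodes and the grouping would be done by the framework's shuffle.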

Keywords/Search Tags:

K-means algorithm, K selection sort, MapReduce, cluster optimization

Related items

1	Research On Parallelization Of Clustering Algorithm Based On MapReduce
2	Optimization Of Network And Scheduling For MapReduce In Heterogeneous Cluster
3	Research On Energy Balanced Routing Algorithm Of WSN Based On Rough C-Means Clustering
4	K-means Cluster Algorithm Based On Improved PSO And Its Application In Recommendation System
5	Research On K-Means Algorithm Based On MapReduce
6	Theoretical And Applied Research On Fuzzy C-means Clustering And Its Cluster Validation
7	The Research And Implementation Of Interdata Storage And Transaction Optimization On Mapreduce Cluster Engine
8	Research On The Selection Of Initial Cluster Centers In K-means Algorithm
9	Research On Accelerating Of K-means Clustering Algorithm Using FPGA Based On MapReduce
10	Improved K-means Clustering Algorithm Based On MapReduce Framework