
Research On Parallelization Of Clustering Algorithm Based On Many Cores

Posted on: 2019-05-14
Degree: Master
Type: Thesis
Country: China
Candidate: F R Ding
Full Text: PDF
GTID: 2438330551960790
Subject: Computer application technology
Abstract/Summary:
We live in the information age, in which large amounts of data are generated in every field each day. Cluster analysis lets people extract valuable information from this massive data. However, as data size grows, clustering becomes time-consuming and fails to meet the required processing speed. This thesis therefore studies the improvement and parallelization of clustering algorithms on many-core GPUs. The specific work is as follows:

1. The K-means clustering result is sensitive to the choice of initial cluster centers. To address this, the thesis presents an improved algorithm for selecting the initial cluster centers. The algorithm first initializes multiple candidate cluster centers, then computes the fitness value of each candidate, and finally selects the candidate with the highest fitness value as the initial cluster center. Repeated clustering runs on clustering datasets show that this method produces more stable results.

2. In the K-means algorithm, computing the distances between the sample points and the cluster centers is both data-parallel and time-consuming. The thesis therefore proposes a GPU-based parallel K-means algorithm that computes these distances in parallel. Experiments show that the parallel K-means algorithm achieves a high speedup.

3. To address the shortcomings of the density peak clustering (DPC) algorithm, the thesis proposes KNN-DPC, an improved algorithm based on k-nearest neighbors (KNN). The algorithm first computes the local density of each sample from its k nearest neighbors, then selects the cluster centers automatically using the linear least squares method. Once the cluster centers are selected, the remaining samples are assigned to them to form the initial clustering result. Finally, clusters are merged according to a density-reachability analysis to form the final clustering result. Experiments on benchmark datasets show that the improved method gives better clustering results.

4. In the DPC algorithm, computing the local density, distance and decision value of the sample points is data-parallel. The thesis therefore proposes a GPU-based parallel DPC algorithm that computes these three quantities in parallel. Experiments show that the parallelized DPC algorithm achieves a high speedup.
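
As a rough illustration of why the steps in items 2 and 4 are data-parallel, the sketch below writes each computation as a function of a single sample index, so every sample can be processed independently. It is a plain-Python CPU sketch, not the thesis's GPU code. The function names are hypothetical, and the DPC part assumes the standard cutoff-distance definitions (local density as the number of neighbors within a cutoff `d_c`, decision value as density times distance) rather than the KNN-based density of KNN-DPC.

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_center(point, centers):
    """K-means assignment for one point: index of the closest center."""
    return min(range(len(centers)), key=lambda k: dist(point, centers[k]))

def local_density(i, points, d_c):
    """DPC cutoff density (assumed form): other points closer than d_c to point i."""
    return sum(1 for j, q in enumerate(points)
               if j != i and dist(points[i], q) < d_c)

def delta_distance(i, points, rho):
    """Distance from point i to the nearest point with strictly higher density.
    If no denser point exists, use the largest distance from point i instead."""
    higher = [dist(points[i], points[j])
              for j in range(len(points)) if rho[j] > rho[i]]
    if higher:
        return min(higher)
    return max(dist(points[i], q) for q in points)

def dpc_quantities(points, d_c):
    """Compute density, distance and decision value for every point.
    Each list entry depends only on its own index i (plus read-only inputs),
    which is what would let a GPU assign one thread per sample."""
    idx = range(len(points))
    rho = [local_density(i, points, d_c) for i in idx]
    delta = [delta_distance(i, points, rho) for i in idx]
    gamma = [r * d for r, d in zip(rho, delta)]  # decision value
    return rho, delta, gamma

if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
    print([nearest_center(p, [(0, 0), (10, 10)]) for p in pts])
    print(dpc_quantities(pts, d_c=1.5))
```

The distance step must wait until all densities are known, so a GPU version would typically run it as a separate pass after the density pass; within each pass the per-sample work is independent.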
Keywords/Search Tags:K-means, density clustering, parallel computing, GPU