| Under the background of big data,mining valuable information quickly poses a great challenge to the speed of computing.With the rapid development of computer hardware and software,high performance computing has been widely used in data mining.CUDA-based GPU heterogeneous computing and Spark distributed computing are two popular directions in the field of high performance computing.Based on CUDA and Spark,this paper does some researches on the parallel computing acceleration of classical clustering and classification algorithms.In this paper,three algorithms,Kmeans,K nearest neighbor and DBSCAN,are selected to study the performance bottleneck of each algorithm in serial computation.According to the hardware and software characteristics of CUDA and Spark,the parallel computing acceleration scheme and optimization strategy are designed and implemented.The main work is as follows:1.The parallelism of Kmeans: The cluster center is saved in the GPU’s constant memory to speed up access.The distance calculation module is expanded in parallel.For centroid update module,atomic add provided by CUDA is implemented in parallel on GPU.The experiment result shows that the parallel algorithm of Kmeans based on CUDA significantly improves the computing efficiency.And the larger the K value is,the better the parallelism effect is.Compared with CPU,the acceleration ratio of serial operation can reach up to 414 times.On the Spark platform,broadcasting centroid variables to worker nodes reduces data requests.Caching data in memory reduces disk IO,speeding up the iteration process.The result shows that the running time of the algorithm decreases and tends to be stable,which is consistent with Amdahl’s law.Compared with the kmeans algorithm in Spark MLlib,the computational efficiency of the parallel design in this paper is improved in the case of a smaller k value.Cublas library is used to accelerate matrix multiplication.2.The parallelism of K nearest neighbor: On the CUDA platform,the distance calculation module is split to reduce the memory consumption and redundant calculation.The insertion sort algorithm is optimized.An ordered array of k size is maintained to reduce unnecessary computational branching.Experiments show that the distance calculation based on Cublas is faster than global memory.The acceleration ratio of serial computation with CPU is up to 237 times.In the case of various parameters,the algorithm still maintains a good acceleration ratio.For the k-nearest neighbor algorithm on Spark,training sets are broadcasted to each node.All kinds of efficient operators are designed to accelerate the prediction.3.The parallelism of DBSCAN: For CUDA platform,adjacency list index is designed in the process of neighborhood construction to reduce the memory resource consumption of adjacency list and increase the memory access efficiency.Width-first search is used in parallel computation of cluster recognition.Although,the data transmission between GPU and CPU is increased,but the poor parallelism problem of DBSCAN is effectively solved.The parallelism of the algorithm is improved and the overall performance is accelerated.The experiment results show that the speed up ratio of DBSCAN on GPU and CPU is up to 222 times,which is significant on large data sets. |