
Research On Parallel Optimization Of Clustering Algorithms In Data Mining

Posted on: 2016-03-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y G Fan
Full Text: PDF
GTID: 2348330488957092
Subject: Engineering
Abstract/Summary:
Data mining is a hot research topic in information processing and database technology, and in the big data era it is one of the most promising technologies in computer science. Clustering analysis is one of the important research fields in data mining. The partition-based K-means algorithm and the density-based DBSCAN algorithm are both widely used. At the same time, GPU high-performance computing has developed rapidly and is moving into the mainstream. With the arrival of big data, more and more developers are choosing NVIDIA's CUDA technology to make better use of the GPU's parallel computing capability, and high-performance parallel computing with CUDA has become a hot research topic.

This thesis analyzes the concept of data mining and the relevant background, then studies clustering analysis and the common clustering algorithms. CUDA-related technology is analyzed, and a new CUDA parallel optimization method is presented. The defects of the K-means and DBSCAN algorithms are analyzed: for K-means, the initial cluster centers are difficult to select and the algorithm is inefficient on large data sets; for DBSCAN, efficiency is low. CUDA-based parallel optimization schemes are proposed for both algorithms, and the optimization results are analyzed.

The second chapter covers data mining, clustering analysis, and the CUDA platform. The third chapter describes the practical application of CUDA through the parallel optimization of the WSM52D module in WRF.

The classical K-means algorithm is optimized in two respects: the selection of the initial cluster centers and the efficiency of the algorithm. In the fourth chapter, a new method for finding the initial cluster centers is proposed.
The method uses the maximum distance between randomly chosen data points to extract a set of candidate initial cluster centers, and then selects the final initial cluster centers with a hybrid clustering method that incorporates the original K-means algorithm. Experiments comparing the new method with the original one show that the clustering results of the new method are more stable and accurate. Then, a new CUDA-based K-means algorithm is proposed; it uses shared memory optimization and a data pre-reading strategy to speed up data reads and memory access. The results show that the improved parallel algorithm achieves a speedup of up to 137× over a CPU implementation across different sample sizes and different numbers of clusters.

The DBSCAN algorithm is optimized through its distance computation function. Combined with CUDA technology, a new DBSCAN algorithm is proposed. In the fifth chapter, the distance computation function is parallelized. Shared memory and data pre-reading are used to merge data reads, which reduces the number of read operations and speeds up the program. Experiments show that the improved algorithm achieves a 23× speedup over the traditional algorithm.

In this thesis, we combine CUDA technology with traditional clustering techniques, and rewrite the K-means and DBSCAN clustering algorithms as parallel algorithms. A new method for selecting the initial cluster centers for K-means is also proposed. Experimental results show that, compared with the traditional algorithms, the efficiency of both the K-means algorithm and the DBSCAN algorithm is improved.
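
As an illustration only, the maximum-distance idea for choosing initial cluster centers mentioned in the abstract can be sketched in Python as a greedy farthest-first selection over a random sample. This is an assumption about the general technique, not the thesis's actual procedure, which also includes a hybrid step with the original K-means algorithm that is not reproduced here. The function name `farthest_first_centers` and its parameters `sample_size` and `seed` are hypothetical.

```python
import math
import random


def farthest_first_centers(points, k, sample_size=None, seed=0):
    """Pick k initial centers by greedy max-min distance (illustrative sketch).

    points: sequence of equal-length numeric tuples.
    sample_size: optional size of the random sample to search (assumption).
    """
    rng = random.Random(seed)
    # Work on a random sample of the data, or on all of it if no size is given.
    if sample_size is None or sample_size >= len(points):
        sample = list(points)
    else:
        sample = rng.sample(list(points), sample_size)
    if k < 1 or k > len(sample):
        raise ValueError("k must be between 1 and the number of sampled points")

    # Start from one randomly chosen point.
    centers = [sample[rng.randrange(len(sample))]]
    # nearest[i] is the distance from sample[i] to its closest chosen center.
    nearest = [math.dist(p, centers[0]) for p in sample]

    while len(centers) < k:
        # The next center is the point farthest from all centers chosen so far.
        idx = max(range(len(sample)), key=nearest.__getitem__)
        new_center = sample[idx]
        centers.append(new_center)
        # Update each point's distance to its closest center.
        for i, p in enumerate(sample):
            d = math.dist(p, new_center)
            if d < nearest[i]:
                nearest[i] = d
    return centers
```

Centers chosen this way tend to be spread across the data, which is one plausible reason such an initialization could give more stable results than purely random seeding; the returned centers would then be passed to an ordinary K-means loop.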
Keywords/Search Tags: data mining, clustering analysis, GPU, CUDA, K-means, DBSCAN