| Data mining is the technology of analyzing vast amounts of data to discover latent, novel, and valuable information, and it has important applications in many areas. For such volumes of data, the first task is to classify the data sensibly. Cluster analysis is the process of dividing data into groups of similar objects, so that the objects within each cluster have much in common. Cluster analysis therefore plays a key role in data mining. The K-means algorithm is the classic partition-based clustering method, characterized by its simplicity and fast clustering. Parallel computing is one of the effective ways to solve massive computational problems. The development of graphics processors and the constant improvement of the CUDA language have given developers a good platform for parallel computing. This paper first analyzes the current state of cluster analysis and the shortcomings of the K-means algorithm. Based on the characteristics of the serial K-means algorithm, we propose a parallel implementation of K-means on the graphics processor. The algorithm is divided into three parts, and we run the second part, which carries the largest computational load, on the graphics processor in order to achieve rapid clustering. Furthermore, following the memory model of the graphics processor and the principles of CUDA code, we optimize the parallel K-means algorithm, focusing on coalesced memory access and shared memory. We also propose a parallel K-means clustering system. First, we use simulated data to test the platform and measure its acceleration performance. Second, we compare the parallel K-means algorithm with the optimized algorithm proposed in this paper on real data; the experimental results show that the optimized algorithm improves performance nearly fourfold. 
Finally, we compare the optimized parallel K-means clustering algorithm with other parallel K-means algorithms; the experimental results show that our algorithm achieves a higher speedup than the others. The results of this study show that the GPU-based parallel K-means algorithm can cluster massive data quickly and is an effective way to improve the computational speed of cluster analysis. |
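The three-part structure mentioned in the abstract can be illustrated with a minimal serial sketch. The abstract does not name the three parts, so the split used here (centroid initialization, point-to-centroid assignment, centroid update) is an assumption, as is the reading that the assignment step is the computation-heavy second part moved to the graphics processor. The function names `assign_points`, `update_centroids`, and `kmeans` are hypothetical, and this is plain Python for reference, not the paper's CUDA code.

```python
import random

def assign_points(points, centroids):
    """Part 2 (assumed to be the GPU-offloaded part): label each point with its nearest centroid."""
    labels = []
    for p in points:
        # Squared Euclidean distance to every centroid. Each point is handled
        # independently, which is why this step maps well to one GPU thread per point.
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels

def update_centroids(points, labels, centroids):
    """Part 3 (assumed): move each centroid to the mean of its assigned points."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p, k in zip(points, labels):
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += p[d]
    # Keep the old centroid if a cluster ends up empty.
    return [
        [s / counts[k] for s in sums[k]] if counts[k] else list(centroids[k])
        for k in range(len(centroids))
    ]

def kmeans(points, k, max_iter=100, seed=0):
    """Serial reference K-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Part 1 (assumed): pick k distinct points as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = assign_points(points, centroids)
        if new_labels == labels:  # converged: no point changed cluster
            break
        labels = new_labels
        centroids = update_centroids(points, labels, centroids)
    return centroids, labels
```

In this sketch the assignment step costs on the order of n × k × d distance terms per iteration for n points, k clusters, and d dimensions, while the update step is a single pass over the points. That imbalance is consistent with the abstract's choice to offload only the heaviest part.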