| With the rapid development of computers and the Internet, the number of Internet users grows every day, and these users produce large amounts of information every second. Meanwhile, the management systems of large companies also generate large amounts of new data. Data mining and machine learning algorithms provide a feasible way to extract valuable information from these data, but their learning process is complex and slow: by the time useful information has been extracted, it is often no longer timely. It is therefore crucial to accelerate the execution of these algorithms. Although high-performance machines or CPU clusters can speed up the algorithms, they require a large financial investment. Multi-core technology is now well developed, and GPU performance on parallel workloads far exceeds that of the CPU. Exploiting the parallelism of algorithms to use the many cores of the GPU has therefore become an active research field. This thesis studies how to parallelize the self-organizing map (SOM) algorithm in a collaborative CPU/GPU environment and how to use the CUDA platform to accelerate text data clustering. We start from the bottleneck of automatic clustering and then focus on the SOM algorithm in the CUDA environment and on CUDA-based acceleration of text clustering. The main contributions of this thesis are as follows. First, we studied the SOM algorithm and the CUDA environment and, to make full use of the GPU, designed and implemented a parallel SOM algorithm based on CUDA. Second, we used the CUDA platform to accelerate the text mining process by computing text feature vectors in parallel, with promising results. Third, we designed a CPU/GPU collaborative framework that allocates the algorithm's tasks appropriately between the two processors, and implemented parallel SOM algorithms to accelerate the text clustering process. 
According to the experiments, the CUDA platform can effectively accelerate text clustering. To sum up, this thesis implements a CUDA-based parallel text clustering system using the SOM algorithm. We used appropriate datasets to compare it against other parallel algorithms and improved algorithms. The experiments show that the parallel SOM algorithm for text clustering takes full advantage of the GPU and significantly reduces clustering time. |