Font Size: a A A

Design And Implementation An Of Document Clustering Algorithm Based On The GPU

Posted on:2014-01-11Degree:MasterType:Thesis
Country:ChinaCandidate:E X LiFull Text:PDF
GTID:2248330398472108Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
With the rapid development of the Internet, mobile Internet and enterprise information technology, there has been an increasing number of information that are stored in the form of text. How to obtain valuable information from these data has become a challenge in computer science and technology fields. Text clustering is a key text mining methods which can find protential relationship between texts and partition the texts into several groups automatically. There are many clustering algorithms that can be used in document clustering. Spectral clustering algorithm is one of them, and it is a clustering algorithm based on spectral graph theory. The greatest advantage of spectral clustering is the ability to handle arbitrary shape sample space and to converge to the global optimal solution.With the the rapid development of multi-core and many-core processors, parallel algorithms have become one of the most important means to improve the efficiency of program. On the other hand, GPU can provide powerful parallel capacity and high bandwidth. And with the help of CUDA platform, programmers can write GPU-based parallel programs much more easily. This paper mainly studies how to use the CUDA to write parallel spectral clustering algorithm, and apply it on text clustering to improve the efficiency of clustering algorithm.On the parallel implementation section of this article, spectral clustering algorithm is firstly divided into CPU-side steps and GPU-side steps with fully consideration for the complexity of each step and the communication between CPU and GPU. There are four steps which are parallel implemented on GPU, including similarity matrix computing, Laplace matrix computing, Lanczos method and K-means clustering algorithm. The main time complexity comes from linear algebra computing in these four steps. We parallel these linear algebra computing with CUDA basic linear algebra subprograms and make reference to the existing matrix-vector operations with CUDA. At last the parallel programs are fully optimized according to the architectural features of the GPU and the features of CUDA.We test our implementation on multiple data sets on the Geforce GTX260, the test results show that parallel text clustering algorithm implemented in this article achieves good acceleration.
Keywords/Search Tags:document clustering, spectral clustering, CUDA, parallel computing
PDF Full Text Request
Related items