
Parallel Design And Implementation Of LDA Algorithm Based On GPU

Posted on: 2014-02-11  Degree: Master  Type: Thesis
Country: China  Candidate: H L Wen  Full Text: PDF
GTID: 2248330398972100  Subject: Computer Science and Technology
Abstract/Summary:
With the development of information technology and the Internet, we face a rapid expansion of network information, so quickly selecting target information from massive text collections has become a research focus in the field of natural language processing. Text clustering is a basic natural language processing technique. Within this field, the LDA (Latent Dirichlet Allocation) algorithm is a topic-model clustering method: it clusters text according to the topics found in it, and can therefore effectively improve the quality of clustering results. In practical applications, however, an LDA program runs very slowly on large-scale data, because all the data is processed repeatedly with the same control logic on the CPU. In this thesis, the LDA program is redesigned in parallel and accelerated with parallel computing technology.

The GPU (Graphics Processing Unit) is developing rapidly. With its strong parallel processing ability and programmable pipeline, it is well suited to high-performance parallel numerical computing, and it provides a good platform not only for graphics processing but also for general computing tasks. GPU-based general computing has become a hot topic in the field of high-performance computing.

CUDA (Compute Unified Device Architecture) is a hardware architecture and programming model for GPU parallel computing developed by NVIDIA. In the CUDA programming model, the GPU is a data-parallel computing device. The CUDA programming language is an extended C, which is used to develop kernel functions. A kernel function invokes the GPU to perform parallel computation; thanks to the multi-level memory of the GPU hardware, data can be read and written very efficiently, so the execution time of a kernel function is often very short.
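As an illustration of the CUDA model described above (not code from the thesis), a minimal kernel shows the core idea: the body of a data-parallel loop becomes a kernel function, and each GPU thread handles one element, identified by its block and thread indices.

```cuda
#include <cuda_runtime.h>

// Each thread scales one element of the array; the loop over elements
// from the serial CPU version disappears into the thread grid.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard the last partial block
        data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    float *d;
    cudaMalloc(&d, n * sizeof(float));
    // ... copy input to d with cudaMemcpy ...
    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);    // 256 threads per block
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}
```

The `<<<blocks, threads>>>` launch configuration is what replaces the serial control logic: the same instruction stream runs over many data elements at once.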
Parallel design and implementation of the LDA algorithm with the CUDA programming model can make full use of the GPU's parallel computing ability and achieve a good acceleration effect. This thesis starts from the LDA program based on the MapReduce model in the Mahout machine learning library; the MapReduce model is designed for distributed computing and can run on Hadoop clusters. We then locate the serial code that carries most of the computation and study how to parallelize it. Finally, we implement the parallel program with the CUDA programming model, so that the LDA program is accelerated by the GPU. Experiments show that, with the powerful parallel computing ability of the GPU, the MapReduce-based LDA text clustering program can be accelerated greatly. This research has a certain reference significance for applying the GPU to other algorithms in data mining.
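The abstract does not show the thesis's kernels. As one hedged sketch of how LDA's computation-heavy inner loop can be mapped onto GPU threads, the kernel below assigns one thread per document and evaluates, for each word, the standard collapsed Gibbs sampling score over all topics. All names here (`n_dk`, `n_wk`, `n_k`, `alpha`, `beta`, `K`, `V`) follow common LDA notation and are assumptions, not identifiers from the thesis; Mahout's own LDA implementation may use a different inference scheme.

```cuda
// Hypothetical sketch, not the thesis's implementation.
// Documents are stored flat: words[doc_offsets[d] .. doc_offsets[d+1]) are
// the word ids of document d. Count tables (document-topic n_dk, word-topic
// n_wk, topic totals n_k) are assumed already resident in device memory.
__global__ void score_topics(const int *words, const int *doc_offsets,
                             const int *n_dk, const int *n_wk, const int *n_k,
                             float *prob, int num_docs, int K, int V,
                             float alpha, float beta) {
    int d = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per document
    if (d >= num_docs) return;
    for (int j = doc_offsets[d]; j < doc_offsets[d + 1]; ++j) {
        int w = words[j];
        for (int k = 0; k < K; ++k)
            // Unnormalized P(topic = k | word w, doc d), from the standard
            // collapsed Gibbs sampling formula for LDA.
            prob[j * K + k] = (n_dk[d * K + k] + alpha) *
                              (n_wk[w * K + k] + beta) /
                              (n_k[k] + V * beta);
    }
}
```

Launched as `score_topics<<<(num_docs + 255) / 256, 256>>>(...)` after copying the corpus and count tables to the device, every document's topic scores are computed concurrently instead of in the serial CPU loop the thesis identifies as the bottleneck.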
Keywords/Search Tags: text clustering, LDA algorithm, GPU parallel computing, CUDA