
Parallel Design And Implementation Of LDA Algorithm Based On GPU

Posted on: 2014-02-11  Degree: Master  Type: Thesis
Country: China  Candidate: H L Wen  Full Text: PDF
GTID: 2248330398972100  Subject: Computer Science and Technology
Abstract/Summary:
With the development of information technology and the Internet, we face a rapid expansion of network information, so quickly selecting target information from massive text collections has become a research focus in the field of natural language processing. Text clustering is a basic natural language processing technique. Within this field, the LDA (Latent Dirichlet Allocation) algorithm is a topic-model clustering method: it clusters text according to the topics found in it, and can therefore effectively improve the quality of clustering results. In practical applications, however, an LDA program runs very slowly on large-scale data, because all the data is processed repeatedly with the same control logic on the CPU. In this thesis, the LDA program is redesigned in parallel and accelerated with parallel computing technology.

The GPU (Graphics Processing Unit) is developing rapidly. With its strong parallel processing ability and programmable pipeline, it is well suited to high-performance parallel numerical computing, and it provides a good platform not only for graphics processing but also for general computing tasks. GPU-based general computing has become a hot topic in the field of high-performance computing.

CUDA (Compute Unified Device Architecture) is a hardware architecture and programming model for GPU parallel computing developed by NVIDIA. In the CUDA programming model, the GPU is a data-parallel computing device. The CUDA programming language is an extended C, which is used to develop kernel functions. A kernel function invokes the GPU to perform parallel computation; thanks to the multi-level memory of the GPU hardware, data can be read and written very efficiently, so the execution time of a kernel function is often very short.
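As an illustration of the CUDA model described above (not code from the thesis), a minimal kernel shows the core idea: the body of a data-parallel loop becomes a kernel function, and each GPU thread handles one element, identified by its block and thread indices.

```cuda
#include <cuda_runtime.h>

// Each thread scales one element of the array; the loop over elements
// from the serial CPU version disappears into the thread grid.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard the last partial block
        data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    float *d;
    cudaMalloc(&d, n * sizeof(float));
    // ... copy input to d with cudaMemcpy ...
    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);    // 256 threads per block
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}
```

The `<<<blocks, threads>>>` launch configuration is what replaces the serial control logic: the same instruction stream runs over many data elements at once.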
Parallel design and implementation of the LDA algorithm with the CUDA programming model can make full use of the GPU's parallel computing ability and achieve a good acceleration effect. This thesis starts from the LDA program based on the MapReduce model in the Mahout machine learning library; the MapReduce model is designed for distributed computing and can run on Hadoop clusters. We then locate the serial code that carries most of the computation and study how to parallelize it. Finally, we implement the parallel program with the CUDA programming model, so that the LDA program is accelerated by the GPU. Experiments show that, with the powerful parallel computing ability of the GPU, the MapReduce-based LDA text clustering program can be accelerated greatly. This research has a certain reference significance for applying the GPU to other algorithms in data mining.
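The abstract does not show the thesis's kernels. As one hedged sketch of how LDA's computation-heavy inner loop can be mapped onto GPU threads, the kernel below assigns one thread per document and evaluates, for each word, the standard collapsed Gibbs sampling score over all topics. All names here (`n_dk`, `n_wk`, `n_k`, `alpha`, `beta`, `K`, `V`) follow common LDA notation and are assumptions, not identifiers from the thesis; Mahout's own LDA implementation may use a different inference scheme.

```cuda
// Hypothetical sketch, not the thesis's implementation.
// Documents are stored flat: words[doc_offsets[d] .. doc_offsets[d+1]) are
// the word ids of document d. Count tables (document-topic n_dk, word-topic
// n_wk, topic totals n_k) are assumed already resident in device memory.
__global__ void score_topics(const int *words, const int *doc_offsets,
                             const int *n_dk, const int *n_wk, const int *n_k,
                             float *prob, int num_docs, int K, int V,
                             float alpha, float beta) {
    int d = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per document
    if (d >= num_docs) return;
    for (int j = doc_offsets[d]; j < doc_offsets[d + 1]; ++j) {
        int w = words[j];
        for (int k = 0; k < K; ++k)
            // Unnormalized P(topic = k | word w, doc d), from the standard
            // collapsed Gibbs sampling formula for LDA.
            prob[j * K + k] = (n_dk[d * K + k] + alpha) *
                              (n_wk[w * K + k] + beta) /
                              (n_k[k] + V * beta);
    }
}
```

Launched as `score_topics<<<(num_docs + 255) / 256, 256>>>(...)` after copying the corpus and count tables to the device, every document's topic scores are computed concurrently instead of in the serial CPU loop the thesis identifies as the bottleneck.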
Keywords/Search Tags: text clustering, LDA algorithm, GPU parallel computing, CUDA