
Design And Implementation Of Word2Vec Based On CUDA

Posted on: 2016-02-03
Degree: Master
Type: Thesis
Country: China
Candidate: C C Tu
Full Text: PDF
GTID: 2348330488457091
Subject: Engineering
Abstract/Summary:
Word2Vec is a practical tool for Natural Language Processing released by Google at the end of 2013. It converts words in natural language into vectors so that they can be processed mathematically. With the popularity of the Internet and the explosive growth of information, Word2Vec faces a problem in practical use: when the training corpus is large, training word vectors on it takes a long time, so Word2Vec's efficiency is low. CUDA is a general-purpose parallel computing architecture released by NVIDIA, and complex computations can be completed in a much shorter time with it. Combining CUDA with Word2Vec is therefore a promising way to reduce Word2Vec's training time.

The thesis studies word vectors and the neural network language model. The four algorithms of the Word2Vec model are analyzed, and a new parallel Word2Vec software tool is designed and implemented based on CUDA.

CUDA program optimization techniques are studied. Four methods of optimizing CUDA programs are summarized: task partitioning optimization, memory access optimization, instruction flow optimization, and balancing of GPU resources. Matrix multiplication is used as an example to show how these methods are applied in a CUDA program. The test results show that the optimized CUDA matrix multiplication runs in significantly less time than the CPU version.

Word2Vec based on CUDA is designed and implemented. The feasibility of parallelizing the Word2Vec algorithms is analyzed, and the Word2Vec workload is partitioned for CUDA. The thesis also adapts the Word2Vec algorithms to CUDA parallel execution. On this basis, Word2Vec based on CUDA is implemented with CUDA programming, and its code is then optimized using the CUDA optimization methods described above.

Experimental tests are presented and the results are analyzed. The correctness of the results of Word2Vec based on CUDA is verified using two classical applications of Word2Vec: computing the distance between vectors and vector addition. Experiments are run on both the serial CPU Word2Vec and Word2Vec based on CUDA with seven groups of corpus files of different sizes. The results show that Word2Vec based on CUDA is 27.09 times faster than the serial CPU Word2Vec.

In summary, this thesis analyzes the network model structure of Word2Vec, summarizes CUDA program optimization techniques, designs and implements Word2Vec based on CUDA, and optimizes it with those techniques. Finally, a comparison experiment is designed to demonstrate the correctness and efficiency of Word2Vec based on CUDA.
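The two correctness checks named in the abstract, vector distance and vector addition, can be illustrated with a minimal Python sketch. It is not the thesis's code: the function names (`cosine_similarity`, `nearest`, `analogy`) and the three-dimensional vectors in `TOY_VECTORS` are hypothetical values chosen by hand for illustration, and cosine similarity is assumed as the distance measure, as in the original Word2Vec distance tool. Real Word2Vec vectors are learned from a corpus and typically have a hundred or more dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(query, vectors, exclude=()):
    """Return (word, similarity) of the vocabulary word closest to `query`."""
    best_word, best_score = None, -2.0  # cosine similarity is always >= -1
    for word, vec in vectors.items():
        if word in exclude:
            continue
        score = cosine_similarity(query, vec)
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score

def analogy(a, b, c, vectors):
    """Vector addition check: find the word closest to vec(b) - vec(a) + vec(c)."""
    query = [y - x + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    return nearest(query, vectors, exclude={a, b, c})

# Hypothetical hand-made vectors, NOT output of a trained model.
TOY_VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.2, 0.5, 0.5],
}

if __name__ == "__main__":
    # Vector distance: which word is most similar to "king"?
    print(nearest(TOY_VECTORS["king"], TOY_VECTORS, exclude={"king"}))
    # Vector addition: king - man + woman; in this toy set the answer is "queen".
    print(analogy("man", "king", "woman", TOY_VECTORS))
```

Running the same two queries against the vectors produced by a serial and a parallel implementation, and comparing the returned neighbours, is one plausible way to carry out the kind of correctness comparison the abstract describes.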
Keywords/Search Tags: Word2Vec, Natural Language Processing, CUDA, CUDA optimization