
Large-Scale RNNLM System Based On Spark

Posted on: 2017-01-17
Degree: Master
Type: Thesis
Country: China
Candidate: K Q Li
Full Text: PDF
GTID: 2308330503964112
Subject: Computer technology
Abstract/Summary:
Natural language processing is an important problem in artificial intelligence and a focus of research and development. The recurrent neural network language model (RNNLM) is powerful and robust, but the limitations of conventional computing technology and computing systems make it difficult to build a large-scale RNNLM system, which restricts the accuracy of the RNNLM. After analyzing existing RNNLM systems, which are serial or GPU-based, we design an RNNLM structure for big data on the Spark platform to address the computational bottlenecks of traditional RNNLM systems.

To improve system performance, we change the way the RNNLM matrix calculations are carried out. Simulating the parallelism of biological neural networks, we design an RNNLM based on parallel neurons, which uses the neuron as the logical unit for distributing the RNNLM: a large distributed matrix operation is decomposed into first-order per-neuron operations, which greatly improves the efficiency of the RNNLM and lays the foundation for building a large-scale RNNLM. Using the Spark computing framework to optimize the RNNLM system, the N x M matrix is split across the computing nodes so that each neuron only needs to compute one row of data; migrating the bulk of the computation to the nodes greatly reduces the time cost. In our tests the performance of the system improves by about 20 times, and the system also scales as the data grow.

We then analyze the factors that restrict the computing performance of the RNNLM system on the distributed Spark platform and, starting from the communication of parameters, design a combined parameter broadcast transmission strategy, a fault-tolerance strategy, and an NVM-based memory optimization mechanism for the distributed RNNLM, which further improve its efficiency and performance by 7-15 times.

Finally, we design a prototype of the large-scale distributed RNNLM on the Spark platform and run performance tests with the Microsoft dataset and the RNNLM Toolkit datasets to compare the traditional system with the Spark-based RNNLM system. The test results show that, after the structural optimization, the Spark-based RNNLM system breaks through the bottlenecks in the number of neurons and in the size of the corpus; the large-scale distributed RNNLM on Spark improves performance by more than 10 times, and its running time no longer grows exponentially as the corpus expands, which greatly improves the availability of the RNNLM system.
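To illustrate the row-wise distribution described above, a minimal Spark sketch in Scala is given below. It is not the thesis's implementation: the matrix sizes, the sigmoid activation, and all identifiers are hypothetical, and the broadcast of the activation vector only loosely mirrors the combined parameter broadcast strategy mentioned in the abstract.

// Minimal sketch (hypothetical code, not the thesis implementation): distribute the
// rows of an N x M weight matrix as a Spark RDD and broadcast the activation vector,
// so that each "neuron" (one matrix row) computes a single dot product locally.
import org.apache.spark.sql.SparkSession

object RowWiseNeuronSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("rnnlm-row-sketch").getOrCreate()
    val sc = spark.sparkContext

    val n = 1000 // hypothetical number of neurons (matrix rows)
    val m = 200  // hypothetical input dimension (matrix columns)

    // Hypothetical random weights standing in for one RNNLM layer; one RDD element per row.
    val weightRows = sc.parallelize(0 until n).map { row =>
      val rng = new scala.util.Random(row)
      (row, Array.fill(m)(rng.nextDouble() * 0.1))
    }

    // Broadcast the (small) activation vector once per executor instead of shipping it
    // with every task; this loosely mirrors the parameter-broadcast idea in the abstract.
    val activation = sc.broadcast(Array.fill(m)(scala.util.Random.nextDouble()))

    // Each row is a first-order neuron operation: a dot product followed by a sigmoid.
    val outputs = weightRows.mapValues { w =>
      var s = 0.0
      var i = 0
      while (i < m) { s += w(i) * activation.value(i); i += 1 }
      1.0 / (1.0 + math.exp(-s))
    }

    println(outputs.take(5).mkString(", "))
    spark.stop()
  }
}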
Keywords/Search Tags:Deep Learning, Recurrent Neural Network Language Modeling, Distributed System, Spark