
Large-Scale RNNLM System Based On Spark

Posted on: 2017-01-17
Degree: Master
Type: Thesis
Country: China
Candidate: K Q Li
Full Text: PDF
GTID: 2308330503964112
Subject: Computer technology
Abstract/Summary:
Natural language processing is an important problem in artificial intelligence and a focus of research and development. The recurrent neural network language model (RNNLM) is powerful and robust, but the limitations of conventional computing technology and computing systems make it difficult to build a large-scale RNNLM system, which restricts the accuracy of the RNNLM. After analyzing existing RNNLM systems, which are serial or GPU-based, we design an RNNLM structure for big data on the Spark platform to address the computational bottlenecks of traditional RNNLM systems.

To improve system performance, we change the way the RNNLM matrix calculations are carried out. Simulating the parallelism of biological neural networks, we design an RNNLM based on parallel neurons, which uses the neuron as the logical unit for distributing the RNNLM: a large distributed matrix operation is decomposed into first-order per-neuron operations, which greatly improves the efficiency of the RNNLM and lays the foundation for building a large-scale RNNLM. Using the Spark computing framework to optimize the RNNLM system, the N x M matrix is split across the computing nodes so that each neuron only needs to compute one row of data; migrating the bulk of the computation to the nodes greatly reduces the time cost. In our tests the performance of the system improves by about 20 times, and the system also scales as the data grow.

We then analyze the factors that restrict the computing performance of the RNNLM system on the distributed Spark platform and, starting from the communication of parameters, design a combined parameter broadcast transmission strategy, a fault-tolerance strategy, and an NVM-based memory optimization mechanism for the distributed RNNLM, which further improve its efficiency and performance by 7-15 times.

Finally, we design a prototype of the large-scale distributed RNNLM on the Spark platform and run performance tests with the Microsoft dataset and the RNNLM Toolkit datasets to compare the traditional system with the Spark-based RNNLM system. The test results show that, after the structural optimization, the Spark-based RNNLM system breaks through the bottlenecks in the number of neurons and in the size of the corpus; the large-scale distributed RNNLM on Spark improves performance by more than 10 times, and its running time no longer grows exponentially as the corpus expands, which greatly improves the availability of the RNNLM system.
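To illustrate the row-wise distribution described above, a minimal Spark sketch in Scala is given below. It is not the thesis's implementation: the matrix sizes, the sigmoid activation, and all identifiers are hypothetical, and the broadcast of the activation vector only loosely mirrors the combined parameter broadcast strategy mentioned in the abstract.

// Minimal sketch (hypothetical code, not the thesis implementation): distribute the
// rows of an N x M weight matrix as a Spark RDD and broadcast the activation vector,
// so that each "neuron" (one matrix row) computes a single dot product locally.
import org.apache.spark.sql.SparkSession

object RowWiseNeuronSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("rnnlm-row-sketch").getOrCreate()
    val sc = spark.sparkContext

    val n = 1000 // hypothetical number of neurons (matrix rows)
    val m = 200  // hypothetical input dimension (matrix columns)

    // Hypothetical random weights standing in for one RNNLM layer; one RDD element per row.
    val weightRows = sc.parallelize(0 until n).map { row =>
      val rng = new scala.util.Random(row)
      (row, Array.fill(m)(rng.nextDouble() * 0.1))
    }

    // Broadcast the (small) activation vector once per executor instead of shipping it
    // with every task; this loosely mirrors the parameter-broadcast idea in the abstract.
    val activation = sc.broadcast(Array.fill(m)(scala.util.Random.nextDouble()))

    // Each row is a first-order neuron operation: a dot product followed by a sigmoid.
    val outputs = weightRows.mapValues { w =>
      var s = 0.0
      var i = 0
      while (i < m) { s += w(i) * activation.value(i); i += 1 }
      1.0 / (1.0 + math.exp(-s))
    }

    println(outputs.take(5).mkString(", "))
    spark.stop()
  }
}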
Keywords/Search Tags:Deep Learning, Recurrent Neural Network Language Modeling, Distributed System, Spark