
Algorithm Design And Optimization Of Recurrent Neural Network Training On GPU Platform

Posted on: 2019-07-13
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Feng
Full Text: PDF
GTID: 2428330545477032
Subject: Computer system architecture
Abstract/Summary:
With the continuous development of science and technology and the maturation of its theoretical foundations, the Deep Neural Network (DNN) has been widely applied in many fields and has brought breakthroughs to each of them. In natural language processing (NLP), compared with traditional machine learning methods and probabilistic models, Recurrent Neural Networks (RNNs) achieve excellent results on sequence training and have been rapidly adopted in areas such as speech recognition and natural language understanding. At the same time, the development of high-performance processors, such as multi-core processors and dedicated accelerator cards for deep learning, has further promoted research and applications based on neural network models. Based on the GPU platform, this thesis studies how to improve both the accuracy and the speed of recurrent neural network training, and designs optimizations for these two aspects to improve the practical training effect of speech recognition and machine translation models. The main contents and achievements of this thesis include:

(1) For speech recognition applications and their models, improve training accuracy and training speed. Reorganizing the training data improves the training speed of the model; the data normalization algorithm is then improved to reduce data drift and distribute the data evenly with respect to its temporal characteristics. During training, this is combined with tuning of the learning rate, the number of hidden-layer neurons, and the parameter-update algorithm, which ultimately improves training accuracy within the same training period.

(2) For machine translation applications and their models, speed up model training. Based on the machine translation prediction model, parallel optimization that makes full use of computing resources groups the original single-sentence training into multi-sentence training without reducing translation quality, thus improving the effective single-sentence training speed.

(3) Accelerate GPU-based neural network training. Writing efficient CUDA kernels improves computational efficiency. Improving the GPU memory reuse rate and increasing the number of samples processed per training step makes full use of computing resources and accelerates training. Using the GPU's hardware computing units, the parameters are quantized and stored in low-bit form to reduce parameter storage, and the model is fine-tuned with mixed-precision training, further accelerating model training.

Based on a deep learning framework and the GPU platform, this thesis designs and implements optimization algorithms for recurrent neural network training, and makes full use of hardware features to effectively improve the accuracy and training speed of the speech recognition model and the machine translation model respectively, laying a foundation for follow-up research. All of the work in this thesis has been applied in actual applications at IFLYTEK, effectively improving the accuracy and training speed of real models and advancing their development.
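The multi-sentence training in contribution (2) rests on a standard idea: variable-length sentences are padded to a common length so the GPU can process a whole batch per recurrent step. The thesis does not give its grouping code; the sketch below is a minimal illustration of the padding-plus-mask pattern, with `pad_batch` and the token IDs being hypothetical names invented here.

```python
import numpy as np

def pad_batch(sentences, pad_id=0):
    """Group variable-length token sequences into one padded batch.

    Returns a padded matrix plus a boolean mask marking real tokens,
    so a recurrent model can advance several sentences per time step
    instead of training one sentence at a time.
    """
    max_len = max(len(s) for s in sentences)
    batch = np.full((len(sentences), max_len), pad_id, dtype=np.int64)
    mask = np.zeros((len(sentences), max_len), dtype=bool)
    for i, s in enumerate(sentences):
        batch[i, :len(s)] = s   # copy real tokens
        mask[i, :len(s)] = True  # mark positions that carry data
    return batch, mask

# Three sentences of different lengths become one (3, 4) batch.
sentences = [[5, 2, 9], [7, 1], [3, 4, 6, 8]]
batch, mask = pad_batch(sentences)
```

The mask is what preserves translation quality: losses and gradients at padded positions are excluded, so batching changes only the schedule of computation, not its result.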
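Contribution (3) mentions storing parameters "in low bit form" to shrink storage. The thesis does not specify its quantization scheme; as one plausible instance, the sketch below shows symmetric linear quantization of float32 weights to int8 (a 4x storage reduction), with `quantize_int8` and `dequantize` being names invented for this illustration.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric linear quantization of float32 weights to int8.

    Each weight is stored in 1 byte instead of 4; a single float
    scale factor recovers an approximation of the original values.
    """
    max_abs = np.abs(w).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float32 weights."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.0, 0.25, 0.0], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
```

The quantization error per weight is bounded by half the scale, which is why, as the abstract notes, a round of mixed-precision fine-tuning is still needed to recover full model quality.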
Keywords/Search Tags:Recurrent Neural Network, GPU, Speech Recognition, Machine Translation, Parallel Computing