
A Study Of Efficient Training Approaches To Deep Learning Models

Posted on: 2017-03-07    Degree: Doctor    Type: Dissertation
Country: China    Candidate: K Chen    Full Text: PDF
GTID: 1108330485451624    Subject: Electronic Science and Technology
Abstract/Summary:
In the past several years, deep learning models have been successfully applied to many areas such as speech recognition, handwriting recognition, computer vision and natural language processing, and have achieved promising results. Nowadays, the structure of deep learning models is becoming more and more complex while the amount of data used to tune them is growing ever larger, so efficient training of these models has become an urgent problem. Fortunately, with the development of computing technology, especially High Performance Computing (HPC) and Graphics Processing Units (GPUs), we now have access to a significant amount of computing resources, which lays a foundation for solving this problem. To address it, this thesis focuses on a new training criterion for Rectified Linear Unit (ReLU) based deep neural networks (DNNs), a fast training algorithm for deep bidirectional long short-term memory (DBLSTM) recurrent neural networks (RNNs), and scalable training of deep learning models.

Firstly, this thesis proposes to train ReLU-DNN classifiers with the Sample Separation Margin (SSM) based Minimum Classification Error (MCE) criterion instead of Cross Entropy (CE). Given a training sample, if all inactivated hidden-layer neurons, whose outputs are 0, are ignored, a ReLU-DNN can be treated as a linear classifier. As a training criterion designed for linear and piecewise-linear classifiers, SSM-MCE is directly related to the classification error rate on the training set, and the introduction of SSM improves the classifier's generalization capacity. Experimental results show that SSM-MCE performs better than CE on small to medium scale ReLU-DNNs.

Secondly, this thesis proposes a Context-Sensitive-Chunk (CSC) approach to DBLSTM training and decoding. With this approach, the DBLSTM models short CSCs instead of long sequences, which results in faster training and lower decoding latency, and lays a foundation for applying DBLSTM to real-time scenarios. Experimental results on a Large Vocabulary Continuous Speech Recognition (LVCSR) task show that the CSC-trained model achieves the same performance as a traditionally trained one, but with a 3.4-times training speedup and lower decoding latency.

Thirdly, this thesis proposes an Incremental Block Training (IBT) framework based on the Alternating Direction Method of Multipliers (ADMM) for data-parallel training of deep learning models. This method formulates the unconstrained distributed optimization problem of deep learning as a global consensus problem and solves it in parallel. The method is implemented on an HPC cluster, and experimental results of DNN training on a 1,860-hour LVCSR task show that it achieves results comparable to Model Averaging (MA) with linear speedup.

Lastly, this thesis proposes a Blockwise Model-Update Filtering (BMUF) algorithm, which treats the global model update in MA as a stochastic optimization procedure, to address the performance degradation that occurs when scaling out. With the introduction of Block Momentum (BM), the algorithm compensates for the side effect caused by the averaging operation in MA and yields better performance. On the 1,860-hour LVCSR task, it achieves linear speedup up to 64 GPUs for DNN CE training and up to 32 GPUs for CE training of DBLSTM with projection layers (DBLSTMP). On a 1M-line Handwriting Recognition (HWR) task, it achieves linear speedup up to 32 GPUs for DBLSTM connectionist temporal classification (CTC) training. Moreover, models trained by this algorithm perform comparably to, or even better than, those trained by conventional mini-batch stochastic gradient descent on a single GPU.
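As a rough illustration of the blockwise model-update filtering idea summarized above, the following is a minimal sketch of one block-level update, assuming the simple (non-Nesterov) variant in which every worker starts each data block from the current global model. All names (bmuf_update, local_models, block_momentum, block_lr) are illustrative placeholders, not identifiers from the thesis.

```python
import numpy as np

def bmuf_update(global_model, prev_delta, local_models,
                block_momentum=0.9, block_lr=1.0):
    """One block-level BMUF step (sketch).

    global_model : current global parameter vector W(t-1)
    prev_delta   : previous block-level update Delta(t-1)
    local_models : parameter vectors produced by the N workers, each
                   obtained by running SGD from the same initial model
                   on its own split of the current data block
    """
    # Model averaging over the N workers, as in plain MA
    averaged = np.mean(local_models, axis=0)

    # Aggregated model update contributed by this data block
    block_update = averaged - global_model

    # Blockwise model-update filtering: blend the previous block-level
    # update with the new one using block momentum (BM)
    delta = block_momentum * prev_delta + block_lr * block_update

    # New global model
    new_global = global_model + delta
    return new_global, delta
```

With block_momentum set to 0 and block_lr set to 1, the update collapses to plain model averaging, which is exactly the baseline whose degradation at large worker counts the block momentum term is meant to compensate.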
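The ADMM-based global consensus formulation behind the incremental block training framework can likewise be sketched with a generic, textbook-style consensus loop; this is not the thesis's exact implementation, and approx_minimize, rho and the other names are assumed placeholders (in practice the local subproblem would be solved approximately with a few SGD steps on each worker's data).

```python
import numpy as np

def admm_consensus_round(local_weights, duals, z, local_data, rho,
                         approx_minimize):
    """One ADMM round for the global consensus problem (sketch).

    Each worker i approximately solves
        min_w  f_i(w) + (rho / 2) * ||w - z + u_i||^2
    on its own data, after which the consensus variable z and the
    scaled dual variables u_i are updated.
    """
    n = len(local_weights)

    # x-update: local subproblems, fully parallel across workers
    for i in range(n):
        local_weights[i] = approx_minimize(local_weights[i],
                                           local_data[i],
                                           z, duals[i], rho)

    # z-update: global consensus step (a simple average, since all
    # workers share the same penalty parameter rho)
    z = np.mean([w + u for w, u in zip(local_weights, duals)], axis=0)

    # u-update: dual ascent on the consensus constraint w_i = z
    for i in range(n):
        duals[i] = duals[i] + local_weights[i] - z

    return local_weights, duals, z
```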
Keywords/Search Tags:Deep Learning, Sample Separation Margin, SSM, Minimum Classifica- tion Error, MCE, Context-Sensitive-Chunk, CSC, Parallel Training, Scalable Training, ADMM, Blockwise Model-Update Filtering, BMUF, DNN, LSTM, CTC