
A Study Of Efficient Training Approaches To Deep Learning Models

Posted on: 2017-03-07    Degree: Doctor    Type: Dissertation
Country: China    Candidate: K Chen    Full Text: PDF
GTID: 1108330485451624    Subject: Electronic Science and Technology
Abstract/Summary:
In the past several years, deep learning models have been successfully applied to many areas such as speech recognition, handwriting recognition, computer vision and natural language processing, and have achieved promising results. Nowadays, the structure of deep learning models is becoming more and more complex while the amount of data used to tune them is growing ever larger, so efficient training of these models has become an urgent problem. Fortunately, with the development of computing technology, especially High Performance Computing (HPC) and Graphics Processing Units (GPUs), we now have access to a significant amount of computing resources, which lays a foundation for solving this problem. To address it, this thesis focuses on a new training criterion for Rectified Linear Unit (ReLU) based deep neural networks (DNNs), a fast training algorithm for deep bidirectional long short-term memory (DBLSTM) recurrent neural networks (RNNs), and scalable training of deep learning models.

Firstly, this thesis proposes to train ReLU-DNN classifiers with the Sample Separation Margin (SSM) based Minimum Classification Error (MCE) criterion instead of Cross Entropy (CE). Given a training sample, if all inactivated hidden-layer neurons, whose outputs are 0, are ignored, a ReLU-DNN can be treated as a linear classifier. As a training criterion designed for linear and piecewise-linear classifiers, SSM-MCE is directly related to the classification error rate on the training set, and the introduction of SSM improves the classifier's generalization capacity. Experimental results show that SSM-MCE performs better than CE on small to medium scale ReLU-DNNs.

Secondly, this thesis proposes a Context-Sensitive-Chunk (CSC) approach to DBLSTM training and decoding. With this approach, the DBLSTM models short CSCs instead of long sequences, which results in faster training and lower decoding latency, and lays a foundation for applying DBLSTM to real-time scenarios. Experimental results on a Large Vocabulary Continuous Speech Recognition (LVCSR) task show that the CSC-trained model achieves the same performance as a traditionally trained one, but with a 3.4-times training speedup and lower decoding latency.

Thirdly, this thesis proposes an Incremental Block Training (IBT) framework based on the Alternating Direction Method of Multipliers (ADMM) for data-parallel training of deep learning models. This method formulates the unconstrained distributed optimization problem of deep learning as a global consensus problem and solves it in parallel. The method is implemented on an HPC cluster, and experimental results of DNN training on a 1,860-hour LVCSR task show that it achieves results comparable to Model Averaging (MA) with linear speedup.

Lastly, this thesis proposes a Blockwise Model-Update Filtering (BMUF) algorithm, which treats the global model update in MA as a stochastic optimization procedure, to address the performance degradation that occurs when scaling out. With the introduction of Block Momentum (BM), the algorithm compensates for the side effect caused by the averaging operation in MA and yields better performance. On the 1,860-hour LVCSR task, it achieves linear speedup up to 64 GPUs for DNN CE training and up to 32 GPUs for CE training of DBLSTM with projection layers (DBLSTMP). On a 1M-line Handwriting Recognition (HWR) task, it achieves linear speedup up to 32 GPUs for DBLSTM connectionist temporal classification (CTC) training. Moreover, models trained by this algorithm perform comparably to, or even better than, those trained by conventional mini-batch stochastic gradient descent on a single GPU.
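As a rough illustration of the blockwise model-update filtering idea summarized above, the following is a minimal sketch of one block-level update, assuming the simple (non-Nesterov) variant in which every worker starts each data block from the current global model. All names (bmuf_update, local_models, block_momentum, block_lr) are illustrative placeholders, not identifiers from the thesis.

```python
import numpy as np

def bmuf_update(global_model, prev_delta, local_models,
                block_momentum=0.9, block_lr=1.0):
    """One block-level BMUF step (sketch).

    global_model : current global parameter vector W(t-1)
    prev_delta   : previous block-level update Delta(t-1)
    local_models : parameter vectors produced by the N workers, each
                   obtained by running SGD from the same initial model
                   on its own split of the current data block
    """
    # Model averaging over the N workers, as in plain MA
    averaged = np.mean(local_models, axis=0)

    # Aggregated model update contributed by this data block
    block_update = averaged - global_model

    # Blockwise model-update filtering: blend the previous block-level
    # update with the new one using block momentum (BM)
    delta = block_momentum * prev_delta + block_lr * block_update

    # New global model
    new_global = global_model + delta
    return new_global, delta
```

With block_momentum set to 0 and block_lr set to 1, the update collapses to plain model averaging, which is exactly the baseline whose degradation at large worker counts the block momentum term is meant to compensate.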
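The ADMM-based global consensus formulation behind the incremental block training framework can likewise be sketched with a generic, textbook-style consensus loop; this is not the thesis's exact implementation, and approx_minimize, rho and the other names are assumed placeholders (in practice the local subproblem would be solved approximately with a few SGD steps on each worker's data).

```python
import numpy as np

def admm_consensus_round(local_weights, duals, z, local_data, rho,
                         approx_minimize):
    """One ADMM round for the global consensus problem (sketch).

    Each worker i approximately solves
        min_w  f_i(w) + (rho / 2) * ||w - z + u_i||^2
    on its own data, after which the consensus variable z and the
    scaled dual variables u_i are updated.
    """
    n = len(local_weights)

    # x-update: local subproblems, fully parallel across workers
    for i in range(n):
        local_weights[i] = approx_minimize(local_weights[i],
                                           local_data[i],
                                           z, duals[i], rho)

    # z-update: global consensus step (a simple average, since all
    # workers share the same penalty parameter rho)
    z = np.mean([w + u for w, u in zip(local_weights, duals)], axis=0)

    # u-update: dual ascent on the consensus constraint w_i = z
    for i in range(n):
        duals[i] = duals[i] + local_weights[i] - z

    return local_weights, duals, z
```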
Keywords/Search Tags:Deep Learning, Sample Separation Margin, SSM, Minimum Classifica- tion Error, MCE, Context-Sensitive-Chunk, CSC, Parallel Training, Scalable Training, ADMM, Blockwise Model-Update Filtering, BMUF, DNN, LSTM, CTC