
The Problem In Training Neural Networks With Limited Resources

Posted on: 2020-08-16  Degree: Master  Type: Thesis
Country: China  Candidate: J M Ye  Full Text: PDF
GTID: 2428330596475100  Subject: Computer Science and Technology
Abstract/Summary:
Going deeper and wider in neural networks is an effective way to achieve higher accuracy, but limited memory resources pose the major bottleneck to further improvement. To alleviate this problem, this thesis provides two approaches. The first is to design compact neural networks that reduce a model's computation overhead. The second is to reduce the scheduling overhead introduced by the training system, leaving more memory available for training.

The inefficiency and redundancy of the fully-connected layer are notorious in Convolutional Neural Networks (CNN), yet it remains one of the basic building blocks of Recurrent Neural Networks (RNN). The fully-connected input layer hinders an RNN from achieving higher accuracy, especially when dealing with high-dimensional input data. We analyzed low-rank matrix decomposition and tensor decomposition, and found that tensor decomposition has advantages in both parameter size and model expressive power. By integrating the Block-Term tensor decomposition into the RNN and substituting low-rank tensor multiplications for the inefficient matrix multiplication, essentially replacing the original fully-connected design with a sparse connection, we propose the new compact BT-RNN model. Compared with existing low-rank approaches such as the Tensor-Train RNN (TT-RNN), BT-RNN is not only more concise, with more flexible hyper-parameter settings, but also able to attain better accuracy with far fewer parameters. In both a video classification task and an image captioning task, BT-RNN outperforms the traditional RNN model and the TT-RNN model in accuracy and convergence speed. Specifically, BT-LSTM achieved an accuracy improvement of over 15.6% with fewer parameters in the video classification task compared with the traditional LSTM.

To train a giant model, the common practice is to employ data parallelism or model parallelism, nontrivially dissecting the data or the model across multiple devices; this also incurs excessive communication that drastically degrades performance. To tackle this issue, we present a novel dynamic memory scheduling runtime that enables network training far beyond the GPU memory capacity, comprising three memory optimizations: 1) Liveness Analysis, which enables different tensors to reuse the same physical memory in different time partitions; 2) Unified Tensor Pool, which integrates asynchronous data transfer with an LRU caching strategy to reserve sufficient memory at little communication overhead; 3) Cost-Aware Recomputation, which greatly reduces memory overhead at the cost of a small amount of extra forward computation. The proposed runtime guarantees sufficient memory while accelerating training, opening new opportunities to explore deeper and wider neural architectures.
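To make the block-term idea concrete, the sketch below shows a minimal PyTorch layer, not the thesis's actual BT-RNN implementation, in which a dense input-to-hidden weight matrix is replaced by a sum of small Tucker ("block") terms, each built from a tiny core tensor and four factor matrices. The class name, shapes, ranks, and block count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BlockTermLinear(nn.Module):
    """Hypothetical sketch: a linear map whose (I1*I2) x (J1*J2) weight matrix is
    stored as a sum of N small Tucker terms instead of a dense matrix."""
    def __init__(self, in_shape=(16, 16), out_shape=(8, 8), rank=4, n_blocks=2):
        super().__init__()
        (J1, J2), (I1, I2), R = in_shape, out_shape, rank
        self.in_shape = in_shape
        # One Tucker core and four factor matrices per block term.
        self.cores = nn.Parameter(torch.randn(n_blocks, R, R, R, R) * 0.1)
        self.A1 = nn.Parameter(torch.randn(n_blocks, I1, R) * 0.1)
        self.A2 = nn.Parameter(torch.randn(n_blocks, I2, R) * 0.1)
        self.B1 = nn.Parameter(torch.randn(n_blocks, J1, R) * 0.1)
        self.B2 = nn.Parameter(torch.randn(n_blocks, J2, R) * 0.1)

    def forward(self, x):                     # x: (batch, J1*J2)
        J1, J2 = self.in_shape
        x = x.view(-1, J1, J2)                # reshape the flat input into a 2-D grid
        # Contract the input against every block term and sum over the blocks (index n):
        # y[b,i1,i2] = sum_n A1[n,i1,p] A2[n,i2,q] G[n,p,q,r,s] B1[n,j1,r] B2[n,j2,s] x[b,j1,j2]
        y = torch.einsum('nip,njq,npqrs,nkr,nls,bkl->bij',
                         self.A1, self.A2, self.cores, self.B1, self.B2, x)
        return y.reshape(x.shape[0], -1)      # (batch, I1*I2)

layer = BlockTermLinear()
out = layer(torch.randn(32, 256))
print(out.shape)                              # torch.Size([32, 64])
```

For these illustrative shapes, a dense layer would store 256 x 64 = 16,384 weights, while the two block terms store only 896 parameters; this is the kind of saving that lets BT-RNN match or exceed the accuracy of a fully-connected input layer with far fewer parameters.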
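The Liveness Analysis optimization rests on the observation that two tensors whose live intervals do not overlap can share the same physical memory. The abstract does not spell out the scheduling algorithm, so the sketch below is only an assumed greedy, interval-based buffer assignment in Python; the tensor names, sizes, and time steps are made up for illustration.

```python
# Each tensor is (name, first_use_step, last_use_step, size_bytes). Two tensors may
# share a buffer if their live intervals do not overlap.
tensors = [("act1", 0, 2, 4096), ("act2", 1, 3, 4096), ("act3", 3, 5, 4096)]

def assign_buffers(tensors):
    buffers = []          # list of (step_after_which_buffer_is_free, size)
    assignment = {}
    for name, start, end, size in sorted(tensors, key=lambda t: t[1]):
        for i, (free_at, buf_size) in enumerate(buffers):
            if free_at < start and buf_size >= size:   # previous occupant is dead: reuse
                buffers[i] = (end, buf_size)
                assignment[name] = i
                break
        else:                                          # no reusable buffer: allocate a new one
            buffers.append((end, size))
            assignment[name] = len(buffers) - 1
    return assignment, buffers

assignment, buffers = assign_buffers(tensors)
print(assignment)   # {'act1': 0, 'act2': 1, 'act3': 0} -- act1 and act3 share buffer 0
```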
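Cost-Aware Recomputation trades a small amount of extra forward computation for memory by discarding selected activations and recomputing them during the backward pass. The abstract does not describe the cost model, so the sketch below only shows the generic recompute-in-backward mechanism such an optimization builds on, using PyTorch's stock checkpoint utility rather than the thesis's runtime.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# A deep stack of blocks whose intermediate activations would normally all be
# kept alive until the backward pass.
blocks = nn.ModuleList([nn.Sequential(nn.Linear(1024, 1024), nn.ReLU())
                        for _ in range(20)])

def forward_with_recompute(x):
    # Drop each block's activations after its forward pass and recompute them
    # on demand during backward, trading extra forward FLOPs for memory.
    for block in blocks:
        x = checkpoint(block, x, use_reentrant=False)
    return x

x = torch.randn(64, 1024, requires_grad=True)
loss = forward_with_recompute(x).sum()
loss.backward()   # each block is re-run here to rebuild its activations
```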
Keywords/Search Tags: Convolutional Neural Networks, Recurrent Neural Networks, Tensor Decomposition, Memory Management