
The Problem In Training Neural Networks With Limited Resources

Posted on: 2020-08-16  Degree: Master  Type: Thesis
Country: China  Candidate: J M Ye  Full Text: PDF
GTID: 2428330596475100  Subject: Computer Science and Technology
Abstract/Summary:
Going deeper and wider in neural networks is an effective way to achieve higher accuracy, but limited memory resources pose the major bottleneck to further improvement. To alleviate this problem, this thesis provides two approaches. The first is to design compact neural networks that reduce a model's computation overhead. The second is to reduce the scheduling overhead introduced by the training system, leaving more memory available for training.

The inefficiency and redundancy of the fully-connected layer are notorious in Convolutional Neural Networks (CNN), yet it remains one of the basic building blocks of Recurrent Neural Networks (RNN). The fully-connected input layer hinders an RNN from achieving higher accuracy, especially when dealing with high-dimensional input data. We analyzed low-rank matrix decomposition and tensor decomposition, and found that tensor decomposition has advantages in both parameter size and model expressive power. By integrating the Block-Term tensor decomposition into the RNN and substituting low-rank tensor multiplications for the inefficient matrix multiplication, essentially replacing the original fully-connected design with a sparse connection, we propose the new compact BT-RNN model. Compared with existing low-rank approaches such as the Tensor-Train RNN (TT-RNN), BT-RNN is not only more concise, with more flexible hyper-parameter settings, but also able to attain better accuracy with far fewer parameters. In both a video classification task and an image captioning task, BT-RNN outperforms the traditional RNN model and the TT-RNN model in accuracy and convergence speed. Specifically, BT-LSTM achieved an accuracy improvement of over 15.6% with fewer parameters in the video classification task compared with the traditional LSTM.

To train a giant model, the common practice is to employ data parallelism or model parallelism, nontrivially dissecting the data or the model across multiple devices; this also incurs excessive communication that drastically degrades performance. To tackle this issue, we present a novel dynamic memory scheduling runtime that enables network training far beyond the GPU memory capacity, comprising three memory optimizations: 1) Liveness Analysis, which enables different tensors to reuse the same physical memory in different time partitions; 2) Unified Tensor Pool, which integrates asynchronous data transfer with an LRU caching strategy to reserve sufficient memory at little communication overhead; 3) Cost-Aware Recomputation, which greatly reduces memory overhead at the cost of a small amount of extra forward computation. The proposed runtime guarantees sufficient memory while accelerating training, opening new opportunities to explore deeper and wider neural architectures.
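To make the block-term idea concrete, the sketch below shows a minimal PyTorch layer, not the thesis's actual BT-RNN implementation, in which a dense input-to-hidden weight matrix is replaced by a sum of small Tucker ("block") terms, each built from a tiny core tensor and four factor matrices. The class name, shapes, ranks, and block count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BlockTermLinear(nn.Module):
    """Hypothetical sketch: a linear map whose (I1*I2) x (J1*J2) weight matrix is
    stored as a sum of N small Tucker terms instead of a dense matrix."""
    def __init__(self, in_shape=(16, 16), out_shape=(8, 8), rank=4, n_blocks=2):
        super().__init__()
        (J1, J2), (I1, I2), R = in_shape, out_shape, rank
        self.in_shape = in_shape
        # One Tucker core and four factor matrices per block term.
        self.cores = nn.Parameter(torch.randn(n_blocks, R, R, R, R) * 0.1)
        self.A1 = nn.Parameter(torch.randn(n_blocks, I1, R) * 0.1)
        self.A2 = nn.Parameter(torch.randn(n_blocks, I2, R) * 0.1)
        self.B1 = nn.Parameter(torch.randn(n_blocks, J1, R) * 0.1)
        self.B2 = nn.Parameter(torch.randn(n_blocks, J2, R) * 0.1)

    def forward(self, x):                     # x: (batch, J1*J2)
        J1, J2 = self.in_shape
        x = x.view(-1, J1, J2)                # reshape the flat input into a 2-D grid
        # Contract the input against every block term and sum over the blocks (index n):
        # y[b,i1,i2] = sum_n A1[n,i1,p] A2[n,i2,q] G[n,p,q,r,s] B1[n,j1,r] B2[n,j2,s] x[b,j1,j2]
        y = torch.einsum('nip,njq,npqrs,nkr,nls,bkl->bij',
                         self.A1, self.A2, self.cores, self.B1, self.B2, x)
        return y.reshape(x.shape[0], -1)      # (batch, I1*I2)

layer = BlockTermLinear()
out = layer(torch.randn(32, 256))
print(out.shape)                              # torch.Size([32, 64])
```

For these illustrative shapes, a dense layer would store 256 x 64 = 16,384 weights, while the two block terms store only 896 parameters; this is the kind of saving that lets BT-RNN match or exceed the accuracy of a fully-connected input layer with far fewer parameters.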
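The Liveness Analysis optimization rests on the observation that two tensors whose live intervals do not overlap can share the same physical memory. The abstract does not spell out the scheduling algorithm, so the sketch below is only an assumed greedy, interval-based buffer assignment in Python; the tensor names, sizes, and time steps are made up for illustration.

```python
# Each tensor is (name, first_use_step, last_use_step, size_bytes). Two tensors may
# share a buffer if their live intervals do not overlap.
tensors = [("act1", 0, 2, 4096), ("act2", 1, 3, 4096), ("act3", 3, 5, 4096)]

def assign_buffers(tensors):
    buffers = []          # list of (step_after_which_buffer_is_free, size)
    assignment = {}
    for name, start, end, size in sorted(tensors, key=lambda t: t[1]):
        for i, (free_at, buf_size) in enumerate(buffers):
            if free_at < start and buf_size >= size:   # previous occupant is dead: reuse
                buffers[i] = (end, buf_size)
                assignment[name] = i
                break
        else:                                          # no reusable buffer: allocate a new one
            buffers.append((end, size))
            assignment[name] = len(buffers) - 1
    return assignment, buffers

assignment, buffers = assign_buffers(tensors)
print(assignment)   # {'act1': 0, 'act2': 1, 'act3': 0} -- act1 and act3 share buffer 0
```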
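Cost-Aware Recomputation trades a small amount of extra forward computation for memory by discarding selected activations and recomputing them during the backward pass. The abstract does not describe the cost model, so the sketch below only shows the generic recompute-in-backward mechanism such an optimization builds on, using PyTorch's stock checkpoint utility rather than the thesis's runtime.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# A deep stack of blocks whose intermediate activations would normally all be
# kept alive until the backward pass.
blocks = nn.ModuleList([nn.Sequential(nn.Linear(1024, 1024), nn.ReLU())
                        for _ in range(20)])

def forward_with_recompute(x):
    # Drop each block's activations after its forward pass and recompute them
    # on demand during backward, trading extra forward FLOPs for memory.
    for block in blocks:
        x = checkpoint(block, x, use_reentrant=False)
    return x

x = torch.randn(64, 1024, requires_grad=True)
loss = forward_with_recompute(x).sum()
loss.backward()   # each block is re-run here to rebuild its activations
```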
Keywords/Search Tags: Convolutional Neural Networks, Recurrent Neural Networks, Tensor Decomposition, Memory Management