
Research On Memory Reuse And Optimization Methods For Deep Learning System

Posted on: 2020-11-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y Ma
Full Text: PDF
GTID: 2428330590458365
Subject: Computer system architecture
Abstract/Summary:
In deep learning, GPUs are commonly used to accelerate the training of deep neural networks. However, the limited physical memory of a GPU makes it difficult to train large-scale deep neural network models. Existing memory optimization methods, including recomputation and CPU-GPU transfer, cannot achieve ideal training performance when a single optimization is applied uniformly to all layers of a neural network, because they ignore an important characteristic: the data transfer cost and the computation cost differ from layer to layer. To overcome these shortcomings, a layer-based memory reuse and optimization method, Layup, is proposed, which consists of two strategies. First, exploiting asynchronous concurrent execution, the layers of the neural network are divided into two categories, compute-sensitive and transfer-sensitive, by analyzing the transfer and recomputation costs of each layer. Different types of layers then use different optimization strategies: the feature maps of compute-sensitive layers are optimized with CPU-GPU transfer, while the feature maps of transfer-sensitive layers are optimized with recomputation. At the same time, a pipelined parallel scheme overlaps data transfer with computation to further reduce the training overhead. Second, by analyzing memory usage during training, a memory reuse strategy for multiple kinds of intermediate data is proposed: the memory of gradient maps is reused in a sliding-window fashion, and, based on the layer-by-layer computation pattern of neural networks, the convolution workspace and cuDNN handle data are reused across layers, further reducing the memory footprint of deep neural networks. The method is implemented on the Caffe framework and evaluated on two different GPUs. Experiments show that Layup significantly reduces the memory usage of deep neural networks while keeping the performance overhead low: memory consumption during training is reduced by up to 92%, while the average performance overhead on the tested networks is only 12%. Layup can even train a 2,500-layer ResNet within 12 GB of GPU memory (batch size = 16), a 30% improvement over SuperNeurons, further increasing the scale of extra-deep network models that can be trained on a single GPU.
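To make the layer-wise policy selection concrete, the following is a minimal C++ sketch of the core idea: compare each layer's measured feature-map transfer cost against its recomputation cost and assign the cheaper strategy. This is not the thesis's actual Caffe implementation; the names LayerCost, MemoryPolicy, and choose_policies are hypothetical, and the per-layer costs in main() are made up for illustration.

```cpp
// Sketch of Layup-style per-layer policy selection (hypothetical names).
// Compute-sensitive layers (recomputation is expensive) have their feature
// maps offloaded via CPU-GPU transfer; transfer-sensitive layers (transfer
// is expensive) are recomputed instead.
#include <iostream>
#include <string>
#include <vector>

enum class MemoryPolicy { TransferToCPU, Recompute };

struct LayerCost {
    std::string name;
    double transfer_ms;   // measured CPU-GPU transfer time of the feature map
    double recompute_ms;  // measured forward recomputation time of the layer
};

std::vector<MemoryPolicy> choose_policies(const std::vector<LayerCost>& layers) {
    std::vector<MemoryPolicy> policies;
    policies.reserve(layers.size());
    for (const auto& l : layers) {
        policies.push_back(l.recompute_ms > l.transfer_ms
                               ? MemoryPolicy::TransferToCPU
                               : MemoryPolicy::Recompute);
    }
    return policies;
}

int main() {
    // Illustrative (made-up) per-layer costs.
    std::vector<LayerCost> layers = {
        {"conv1", 1.8, 4.2},   // convolution: recomputation costly -> offload
        {"relu1", 1.6, 0.1},   // ReLU: recomputation cheap -> recompute
        {"pool1", 0.9, 0.3},
    };
    auto policies = choose_policies(layers);
    for (size_t i = 0; i < layers.size(); ++i) {
        std::cout << layers[i].name << ": "
                  << (policies[i] == MemoryPolicy::TransferToCPU
                          ? "CPU-GPU transfer" : "recompute")
                  << "\n";
    }
    return 0;
}
```

The sketch omits the pipelined overlap of transfers with computation and the sliding-window reuse of gradient-map memory described in the abstract; it only illustrates how the compute-sensitive / transfer-sensitive split might drive the choice of optimization for each layer.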
Keywords/Search Tags: Deep learning, Deep neural network, GPU, Memory management