
Research On GPU Storage Optimization For Deep Learning Applications

Posted on: 2020-01-06 | Degree: Master | Type: Thesis
Country: China | Candidate: X Ge | Full Text: PDF
GTID: 2428330590458376 | Subject: Computer software and theory
Abstract/Summary:
The massive computational parallelism of GPUs makes them indispensable for training deep neural networks (DNNs). As DNNs grow deeper and wider to improve their accuracy, the memory footprint of training grows accordingly, and the limited capacity of GPU DRAM becomes the primary barrier to training deeper and wider networks on a GPU. To mitigate this capacity issue, offload-prefetch and recomputation have been proposed to reduce the memory footprint of training on a single GPU: offloading trades memory consumption for data transfers over the slow PCIe bus, while recomputation trades it for additional computation, so that only part of each layer's output needs to be retained in GPU memory during forward propagation. However, these methods lack an in-depth analysis of the data dependencies among layers during backward computation, and they provide no support for bottleneck layers.

To address this problem, this thesis proposes MEDL, a deep learning system that fully exploits recomputation to save GPU memory and provides memory optimization methods for both non-linear and linear networks. In non-linear networks, the memory overhead of utility layers is reduced through spatial reuse and recomputation, eliminating the need to keep a utility layer's forward output as a checkpoint; the GPU memory occupied by that output can be released directly, greatly reducing the memory required when training neural networks with recomputation. In linear networks, a fine-grained double-buffering technique reduces the memory overhead of bottleneck layers, allowing wider neural networks to be trained within limited GPU memory.

Experimental results show that MEDL can train deeper and wider neural networks within limited GPU memory. Under the same experimental environment, MEDL reduces memory consumption by 27.5% on average compared with an existing deep learning system, while improving performance by 10.9%. When the training batch size is increased to the point where the existing system can no longer run, MEDL still works normally.
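For reference, the sketch below illustrates the recomputation idea the abstract builds on, using layer-wise gradient checkpointing with PyTorch's torch.utils.checkpoint. It is a minimal illustration of the general trade-off (recompute activations during the backward pass instead of storing them), not MEDL's own implementation; the DeepNet class, the layer widths, and the batch size are illustrative assumptions.

    import torch
    import torch.nn as nn
    from torch.utils.checkpoint import checkpoint

    class DeepNet(nn.Module):
        """A plain chain of wide fully connected blocks (hypothetical example)."""
        def __init__(self, width=4096, depth=8):
            super().__init__()
            self.blocks = nn.ModuleList(
                nn.Sequential(nn.Linear(width, width), nn.ReLU())
                for _ in range(depth)
            )

        def forward(self, x):
            for block in self.blocks:
                # checkpoint() drops the block's intermediate activations after
                # the forward pass and recomputes them during backward, trading
                # extra computation for a smaller GPU memory footprint.
                x = checkpoint(block, x)
            return x

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = DeepNet().to(device)
    inp = torch.randn(64, 4096, device=device, requires_grad=True)
    loss = model(inp).sum()
    loss.backward()  # activations are recomputed block by block here

The offload-prefetch side of the trade-off would, in the same spirit, overlap host-device transfers with computation (for example by double buffering across CUDA streams), but that is not shown in this sketch.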
Keywords/Search Tags: Deep neural networks, GPU memory management, Recomputation, Liveness analysis