
Research On GPU Memory Optimization For Deep Learning

Posted on: 2020-03-01
Degree: Master
Type: Thesis
Country: China
Candidate: W W Wu
Full Text: PDF
GTID: 2428330575462063
Subject: Computer Science and Technology

Abstract/Summary:
In the era of rapid development of artificial intelligence, deep learning algorithms are applied ever more widely, and the accuracy and performance demands on deep learning models keep rising. Training deeper neural networks on multi-featured big data forces a trade-off between the accuracy of the algorithm and its computational performance, so training performance has become the main limiting factor. GPUs provide high-speed parallel acceleration, and libraries such as cuDNN optimize deep learning kernels; but with so many optimization methods in play, using the GPU's limited memory space efficiently remains a major challenge. Based on a study of the GPU memory architecture, and taking data-transfer cost into account, this thesis proposes a tensor-exchange algorithm that optimizes GPU memory performance for convolution-based neural network models. The main work of this thesis is as follows:

First, the programming methods and GPU memory parameters needed for memory-access optimization are measured, including cache characteristic parameters, cache policy, global-memory throughput, and memory-access latency at every level of the hierarchy. The thesis combines a fine-grained benchmarking method with assembly-level micro-benchmarking, which exposes the factors limiting memory usage at each level more completely and supplies the characteristic parameters required by the subsequent data-transfer model.

Second, a method for evaluating GPU micro-architecture performance for deep learning is proposed. Based on the measured memory parameters, data-transfer and computation-cost models on the GPU are studied, including a Roofline-based peak-performance estimation model, a LogGP-based model of GPU-CPU data transfer, and pre-calculation of actual performance. The evaluation models are calibrated against the GPU experimental platform. The deep learning process is thus quantified from the perspectives of both theoretical analysis and actual
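The two evaluation models above can be sketched as follows. This is an illustrative sketch only, not the thesis code: the function names and the hardware numbers (peak GFLOPS, bandwidth, PCIe latency) are hypothetical placeholders.

```python
# Sketch of the two analytical models described above (illustrative only).

def roofline_gflops(arith_intensity, peak_gflops, peak_bw_gb_s):
    """Roofline peak estimate: attainable performance is the minimum of the
    compute roof and the memory roof (intensity in FLOP/byte, bw in GB/s)."""
    return min(peak_gflops, arith_intensity * peak_bw_gb_s)

def loggp_transfer_s(n_bytes, L, o, G):
    """LogGP estimate of one n-byte GPU-CPU transfer: latency L, plus
    send/receive overhead o on each side, plus a per-byte gap G."""
    return L + 2 * o + (n_bytes - 1) * G

# A memory-bound layer (low intensity) vs. a compute-bound layer (high
# intensity), with hypothetical 7 TFLOPS / 900 GB/s hardware:
mem_bound = roofline_gflops(2.0, 7000.0, 900.0)    # 1800.0 GFLOPS (memory roof)
cpu_bound = roofline_gflops(10.0, 7000.0, 900.0)   # 7000.0 GFLOPS (compute roof)
```

Calibrating L, o, and G against micro-benchmark measurements, as the thesis does against its experimental platform, is what turns these closed forms into usable predictors.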
evaluation, which provides the basis for the next step, memory optimization in the TensorFlow framework.

Third, a data-stream exchange method based on data-operation cost is proposed. Building on the computational-cost model above, a cost-evaluation module, supported by both the theoretical model and measured runtime performance, is added to the existing data-stream exchange model TFLMS. An operation-fusion strategy is used to optimize the computation graph, and forward and backward search methods are used to implement the exchange strategy, compensating for the performance defects of the TFLMS model.

In this thesis, the data-computation cost model and the data-stream exchange algorithm are verified with the CUDA programming model and the TensorFlow framework. Experiments show that the data-transfer model of this thesis predicts the actual behavior well; that the computation model quantifies deep learning algorithms to obtain both the theoretical peak and the actual run time; and that the cost-based tensor-exchange algorithm not only retains the TFLMS framework's ability to extend GPU memory with system memory, but also preserves memory-access performance, running on average 4.6% faster than the original framework.
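The cost-based exchange decision can be sketched as below. This is a hypothetical illustration of the idea, not the thesis implementation: the class, field names, and the greedy selection rule are assumptions. The rule swaps a tensor to host memory only when its round-trip transfer cost (estimated from the transfer model) can be hidden behind the compute time between its last use in the forward pass and its reuse in the backward pass.

```python
# Hypothetical sketch of a cost-based tensor-swap decision (not the thesis code).
from dataclasses import dataclass

@dataclass
class TensorInfo:
    name: str
    nbytes: int
    idle_compute_s: float  # compute time between last use and next reuse

def transfer_cost_s(n_bytes, bw_bytes_s, latency_s):
    """Round-trip cost of one swap: swap-out plus swap-in."""
    return 2 * (latency_s + n_bytes / bw_bytes_s)

def choose_swaps(tensors, bw_bytes_s, latency_s):
    """Greedy selection: swap only tensors whose round-trip transfer
    fits inside the idle compute window, so the swap is effectively free."""
    return [t.name for t in tensors
            if transfer_cost_s(t.nbytes, bw_bytes_s, latency_s) <= t.idle_compute_s]
```

A blind exchange policy swaps by size alone; gating each swap on this cost comparison is what lets the thesis's algorithm extend usable memory without the access-latency penalty it reports in the unmodified TFLMS.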
Keywords/Search Tags:GPU micro-benchmarking, Memory optimization, Performance evaluation