
Model Sharing For GPU-Accelerated DNN Inference In Big Data Processing System

Posted on: 2022-02-03
Degree: Master
Type: Thesis
Country: China
Candidate: Q H Chen
GTID: 2518306479993879
Subject: Software Engineering
Abstract/Summary:
In recent years, big data processing systems have developed rapidly. Based on MapReduce and its derived programming models, such systems only require users to write data-processing logic inside specific functions of a driver program to obtain a distributed computing program, which greatly lowers the difficulty of writing distributed programs and has led to wide adoption in academia and industry. Meanwhile, Deep Neural Network (DNN) technology has been widely applied to intelligent analysis of video and image data and to speech recognition, thanks to its excellent feature extraction capability. A DNN model generally has a large number of parameters and requires a large amount of computation for a single inference; running it on the CPU alone takes a long time, so the highly concurrent, high-speed floating-point capability of the Graphics Processing Unit (GPU) is usually needed to accelerate inference. For DNN inference over large-scale data, a GPU-extended big data processing system is therefore a reasonable solution.

Under the task-parallel model of big data processing systems, scheduling multiple inference tasks to use GPU resources in parallel is an effective way to improve GPU utilization. However, this inevitably loads multiple copies of read-only DNN model data into GPU memory, imposing a heavy GPU memory burden. Once GPU memory becomes the bottleneck, the number of DNN inference tasks that can run on each GPU is limited, the GPU's computational resources cannot be fully utilized, and the inference performance of the system is capped.

To address this GPU memory overhead, this thesis first proposes, for the single-node setting, a model sharing method for a single GPU card: it allows model data in GPU memory to be shared among the threads of the same worker process of the big data processing system.
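The following is a minimal sketch of the model sharing idea rather than the thesis implementation: a process-level cache hands every task thread in the same worker process the single GPU-resident copy of the model. PyTorch and the helper names (`get_shared_model`, `infer_partition`) are illustrative assumptions; the thesis does not name an inference framework.

```python
# Hypothetical sketch of single-GPU model sharing: all task threads in one
# worker process reuse a single GPU-resident model instead of loading their own.
# PyTorch is assumed for illustration only.
import threading
import torch

_MODEL_CACHE = {}                 # model name -> GPU-resident model (one per process)
_CACHE_LOCK = threading.Lock()    # serialize the first load across task threads

def get_shared_model(name, loader, device="cuda:0"):
    """Return the shared GPU copy of `name`, loading it at most once per process."""
    with _CACHE_LOCK:
        if name not in _MODEL_CACHE:
            model = loader()                       # e.g. a torch.load(...) call
            _MODEL_CACHE[name] = model.to(device).eval()
        return _MODEL_CACHE[name]

def infer_partition(frames, loader, name="detector", device="cuda:0"):
    """Run inference over one partition of frames, reusing the shared model."""
    model = get_shared_model(name, loader, device)
    with torch.no_grad():
        for frame in frames:                       # frames assumed to be tensors
            yield model(frame.to(device))
```

In a Spark job this could be invoked per partition, e.g. `rdd.mapPartitions(lambda frames: infer_partition(frames, my_loader))`, so that every task handled by the same worker process hits the cache instead of reloading the model into GPU memory.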
In the single-node setting, however, model sharing selects the GPU by specifying the device number directly in the code, and under the task-parallel mode of the big data processing system any GPU device not named in the code cannot be used; in a distributed multi-GPU environment this way of assigning device numbers leaves GPU resources idle. To support model sharing in a distributed multi-GPU environment, this thesis designs a GPU allocator that dynamically requests and assigns GPU device numbers, so that the model sharing technique can work on every GPU card in the distributed cluster (a simplified sketch of this assignment scheme is given at the end of this abstract). Building on the Spark big data processing system and a GPU hardware platform, this thesis also implements a distributed prototype system that integrates the above two optimization techniques to perform DNN inference on traffic videos.

The main contributions of this thesis are as follows.

· We propose a model sharing method for a single GPU card. It enables the threads in the same worker process of the big data processing system to share model data in GPU memory, effectively reducing the GPU memory overhead of DNN inference applications.

· We design a GPU allocator for model sharing across multiple GPU cards. It collects and maintains the GPU resource information of the nodes in the cluster and allocates GPU resources evenly among the processes on each node, so that the model sharing technique can be applied effectively to every GPU card in the distributed cluster.

· We implement a distributed prototype system for performing DNN inference on traffic videos. Built on the Spark software platform and a GPU hardware platform, the prototype integrates the single-GPU model sharing technique and the multi-GPU allocator to detect and track vehicles in traffic video data.

In summary, this thesis focuses on the problem of excessive GPU memory overhead in GPU-accelerated DNN inference in big data processing systems. We propose a model sharing method for a single GPU card, design a GPU allocator that supports model sharing across multiple GPU cards, and implement a distributed prototype system for DNN inference on traffic videos based on these two optimization techniques. The work builds on a survey and analysis of existing related work. Theoretical analysis and experimental results show that the model sharing technique effectively reduces GPU memory overhead and improves the performance of the prototype system, increasing system throughput by 136%.
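As an illustration of the GPU allocator described above, the sketch below shows one plausible way a per-node registry could hand out device numbers evenly to worker processes. The file name, the file-lock mechanism, and the function `request_gpu` are assumptions made for illustration, not the thesis design; the actual allocator also collects and maintains GPU information across the cluster.

```python
# Hypothetical sketch of even GPU assignment on one node (Linux assumed).
# A small registry file records how many worker processes each card serves;
# each new process takes the least-loaded card. Not the thesis implementation.
import fcntl
import json
import os

REGISTRY = "/tmp/gpu_registry.json"     # assumed per-node shared state file

def request_gpu(num_gpus):
    """Assign the least-loaded GPU device number to the calling worker process."""
    with open(REGISTRY, "a+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)           # serialize concurrent requests
        f.seek(0)
        text = f.read()
        counts = json.loads(text) if text else {str(i): 0 for i in range(num_gpus)}
        device = min(counts, key=counts.get)    # spread processes evenly over cards
        counts[device] += 1
        f.seek(0)
        f.truncate()
        json.dump(counts, f)
        fcntl.flock(f, fcntl.LOCK_UN)
    os.environ["CUDA_VISIBLE_DEVICES"] = device  # pin this process to that card
    return int(device)
```

A worker process would call `request_gpu` once at start-up and then run the single-GPU model sharing routine on the card it was given, so every GPU in the cluster serves roughly the same number of inference processes.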
Keywords/Search Tags:Big Data Processing System, DNN Inference, GPU, GPU Memory, Model Sharing