
Design and Implementation of a Parallel CNN Framework Based on Heterogeneous Computing

Posted on: 2017-11-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y B Peng
Full Text: PDF
GTID: 2348330485986051
Subject: Computer system architecture
Abstract/Summary:
With the continued development of deep learning and heterogeneous computing, deep learning on heterogeneous hardware has made significant progress in fields such as image recognition and speech recognition. The convolutional neural network (CNN) is among the most important models in deep learning and has great research significance and commercial value. As the computing power of GPUs has been increasingly exploited, GPUs are now widely used to train CNNs.

However, CNN training takes a long time, and a single copy of a model can require tremendous memory: one copy can easily exceed the memory of a single GPU card, or even of a GPU server consisting of multiple cards. It is therefore necessary to train CNNs in parallel on clusters of GPU servers. This thesis describes a method of parallel training on a GPU cluster, focusing on how to determine the model segmentation scheme and the degree of data parallelism.

We first study existing CNN parallelization methods and then propose an optimization scheme combining model parallelism and data parallelism. Based on this scheme, the architecture of a parallel CNN framework for heterogeneous computing is designed. It follows the classical Master/Slave architecture: the Master is the scheduler, mainly responsible for computing the optimization scheme and scheduling computational tasks, while the Slaves comprise W-slaves and P-slaves. A W-slave carries out the actual CNN training tasks, and a P-slave is a parameter server responsible for updating parameters. The design and implementation of the optimization scheme for model parallelism and data parallelism is then introduced.

The main work of this thesis is as follows:
1. Building on a study of current parallelization schemes for deep learning, an optimization scheme is proposed that determines the optimal degree of model segmentation and data parallelism for given hardware facilities.
2. A Master/Slave architecture for a parallel CNN framework based on heterogeneous computing is designed, using asynchronous parameter updating.
3. The optimization scheme is designed and implemented; it determines the number of model segments and the degree of data parallelism for fixed hardware facilities.

Finally, the optimization scheme is evaluated by simulation, covering two situations: a single GPU card can load one copy of the model, and it cannot. The simulation shows that the scheme finds the optimal number of sub-models per model copy and the optimal number of model copies, yielding the maximum estimated training-time speedup for fixed hardware. This parallel scheme can therefore be applied in actual training to reduce training time.
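The asynchronous parameter updating performed by the P-slave can be illustrated with a minimal sketch. The class and method names below are illustrative assumptions, not the thesis's actual implementation: each W-slave pushes gradients as soon as they are computed, and the P-slave applies them immediately rather than waiting for all workers.

```python
import threading

class ParameterServer:
    """Minimal sketch of a P-slave: applies gradients asynchronously."""

    def __init__(self, params, lr=0.01):
        self.params = dict(params)    # parameter name -> current value
        self.lr = lr                  # learning rate
        self.lock = threading.Lock()  # serialize concurrent pushes

    def push_gradients(self, grads):
        """Apply one worker's gradients on arrival (asynchronous SGD)."""
        with self.lock:
            for name, g in grads.items():
                self.params[name] -= self.lr * g

    def pull_params(self):
        """Workers fetch the latest (possibly stale) parameters."""
        with self.lock:
            return dict(self.params)

ps = ParameterServer({"w": 1.0, "b": 0.5}, lr=0.1)
ps.push_gradients({"w": 2.0})   # one W-slave pushes its gradients
print(ps.pull_params()["w"])    # 1.0 - 0.1 * 2.0 = 0.8
```

Because updates are applied without a global barrier, fast workers are never blocked by slow ones; the trade-off, as in any asynchronous scheme, is that workers may compute gradients against slightly stale parameters.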
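The optimization scheme's search for the number of model segments and data-parallel copies can likewise be sketched. The cost model below is a toy assumption for illustration (data parallelism scales throughput; each extra segment adds communication overhead), not the thesis's actual training-time estimate; the function name and parameters are hypothetical.

```python
def choose_parallel_plan(num_gpus, gpu_mem, model_mem, comm_cost=0.1):
    """Enumerate feasible (segments, replicas) pairs for fixed hardware
    and return the pair with the best estimated speedup."""
    best = None
    for segments in range(1, num_gpus + 1):
        # each model copy is split across `segments` cards
        if model_mem / segments > gpu_mem:
            continue  # a segment still does not fit on one card
        replicas = num_gpus // segments  # number of data-parallel copies
        if replicas < 1:
            continue
        # toy estimate: replicas scale throughput, segments add overhead
        speedup = replicas / (1 + comm_cost * (segments - 1))
        if best is None or speedup > best[2]:
            best = (segments, replicas, speedup)
    return best

# e.g. 8 cards of 12 GB each, a 20 GB model: one copy cannot fit on
# a single card, so the model must be segmented
plan = choose_parallel_plan(num_gpus=8, gpu_mem=12, model_mem=20)
print(plan)  # under this toy model: 2 segments, 4 data-parallel copies
```

This mirrors the two simulated situations in the abstract: when the model fits on one card, the search may settle on a single segment with many replicas; when it does not, the memory constraint forces segmentation before replication.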
Keywords/Search Tags:Heterogeneous computing, CNN, Model parallel, Data parallel