
Design and Implementation of a Parallel CNN Framework Based on Heterogeneous Computing

Posted on: 2017-11-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y B Peng
Full Text: PDF
GTID: 2348330485986051
Subject: Computer system architecture
Abstract/Summary:
With the continued development of deep learning and heterogeneous computing, deep learning on heterogeneous hardware has made significant progress in fields such as image recognition and speech recognition. The convolutional neural network (CNN) is among the most important models in deep learning and has great research significance and commercial value. As the computing power of GPUs has been increasingly exploited, GPUs are now widely used to train CNNs.

However, CNN training takes a long time, and a single copy of a model can require tremendous memory: one copy can easily exceed the memory of a single GPU card, or even of a GPU server consisting of multiple cards. It is therefore necessary to train CNNs in parallel on clusters of GPU servers. This thesis describes a method of parallel training on a GPU cluster, focusing on how to determine the model segmentation scheme and the degree of data parallelism.

We first study existing CNN parallelization methods and then propose an optimization scheme combining model parallelism and data parallelism. Based on this scheme, the architecture of a parallel CNN framework for heterogeneous computing is designed. It follows the classical Master/Slave architecture: the Master is the scheduler, mainly responsible for computing the optimization scheme and scheduling computational tasks, while the Slaves comprise W-slaves and P-slaves. A W-slave carries out the actual CNN training tasks, and a P-slave is a parameter server responsible for updating parameters. The design and implementation of the optimization scheme for model parallelism and data parallelism is then introduced.

The main work of this thesis is as follows:
1. Building on a study of current parallelization schemes for deep learning, an optimization scheme is proposed that determines the optimal degree of model segmentation and data parallelism for given hardware facilities.
2. A Master/Slave architecture for a parallel CNN framework based on heterogeneous computing is designed, using asynchronous parameter updating.
3. The optimization scheme is designed and implemented; it determines the number of model segments and the degree of data parallelism for fixed hardware facilities.

Finally, the optimization scheme is evaluated by simulation, covering two situations: a single GPU card can load one copy of the model, and it cannot. The simulation shows that the scheme finds the optimal number of sub-models per model copy and the optimal number of model copies, yielding the maximum estimated training-time speedup for fixed hardware. This parallel scheme can therefore be applied in actual training to reduce training time.
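The asynchronous parameter updating performed by the P-slave can be illustrated with a minimal sketch. The class and method names below are illustrative assumptions, not the thesis's actual implementation: each W-slave pushes gradients as soon as they are computed, and the P-slave applies them immediately rather than waiting for all workers.

```python
import threading

class ParameterServer:
    """Minimal sketch of a P-slave: applies gradients asynchronously."""

    def __init__(self, params, lr=0.01):
        self.params = dict(params)    # parameter name -> current value
        self.lr = lr                  # learning rate
        self.lock = threading.Lock()  # serialize concurrent pushes

    def push_gradients(self, grads):
        """Apply one worker's gradients on arrival (asynchronous SGD)."""
        with self.lock:
            for name, g in grads.items():
                self.params[name] -= self.lr * g

    def pull_params(self):
        """Workers fetch the latest (possibly stale) parameters."""
        with self.lock:
            return dict(self.params)

ps = ParameterServer({"w": 1.0, "b": 0.5}, lr=0.1)
ps.push_gradients({"w": 2.0})   # one W-slave pushes its gradients
print(ps.pull_params()["w"])    # 1.0 - 0.1 * 2.0 = 0.8
```

Because updates are applied without a global barrier, fast workers are never blocked by slow ones; the trade-off, as in any asynchronous scheme, is that workers may compute gradients against slightly stale parameters.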
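The optimization scheme's search for the number of model segments and data-parallel copies can likewise be sketched. The cost model below is a toy assumption for illustration (data parallelism scales throughput; each extra segment adds communication overhead), not the thesis's actual training-time estimate; the function name and parameters are hypothetical.

```python
def choose_parallel_plan(num_gpus, gpu_mem, model_mem, comm_cost=0.1):
    """Enumerate feasible (segments, replicas) pairs for fixed hardware
    and return the pair with the best estimated speedup."""
    best = None
    for segments in range(1, num_gpus + 1):
        # each model copy is split across `segments` cards
        if model_mem / segments > gpu_mem:
            continue  # a segment still does not fit on one card
        replicas = num_gpus // segments  # number of data-parallel copies
        if replicas < 1:
            continue
        # toy estimate: replicas scale throughput, segments add overhead
        speedup = replicas / (1 + comm_cost * (segments - 1))
        if best is None or speedup > best[2]:
            best = (segments, replicas, speedup)
    return best

# e.g. 8 cards of 12 GB each, a 20 GB model: one copy cannot fit on
# a single card, so the model must be segmented
plan = choose_parallel_plan(num_gpus=8, gpu_mem=12, model_mem=20)
print(plan)  # under this toy model: 2 segments, 4 data-parallel copies
```

This mirrors the two simulated situations in the abstract: when the model fits on one card, the search may settle on a single segment with many replicas; when it does not, the memory constraint forces segmentation before replication.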
Keywords/Search Tags:Heterogeneous computing, CNN, Model parallel, Data parallel