
A Study On Parallelization Strategy Of Distributed Deep Learning

Posted on: 2022-04-21    Degree: Master    Type: Thesis
Country: China    Candidate: Z Y Wan    Full Text: PDF
GTID: 2518306602993289    Subject: Communication and Information System
Abstract/Summary:
With the rapid expansion of the Internet in recent years, huge amounts of data are produced online every day. Thanks to this abundance of sample data, deep learning algorithms have developed rapidly, reaching higher accuracy than traditional machine learning algorithms in fields such as image classification, object detection, and speech recognition. As deep learning application scenarios become more specific and customized, the complexity of deep neural network models keeps increasing. Training large-scale, complex networks on large datasets with a single computing device consumes an enormous amount of time. It is therefore necessary to use multiple computing devices for distributed training to improve the efficiency of training deep neural networks.

There are different strategies for the distributed training of deep neural networks, such as model parallelism and data parallelism. The traditional model pipelining strategy assigns network layers to different devices and reduces communication overhead by overlapping computation and communication. However, this strategy ignores the branches and shortcut connections in deep neural network models, so it is difficult to split the model into evenly loaded partitions, which increases iteration time. When applying data parallelism, researchers usually update the model parameters of the computing devices synchronously: at the end of each iteration, the parameter server communicates with every computing device to collect gradients. During this process, the device with the longest iteration time becomes a straggler, which prolongs the overall iteration time.

In this thesis, we first improve the traditional model pipelining strategy. We describe the deep neural network model as a computation graph, then use graph-based local search to partition the model and build a model-parallel pipeline. To minimize communication overhead while keeping the load balanced across model segments, we use a genetic algorithm to search for the best partitioning. Simulation results show that, for specific deep neural networks, the improved strategy shortens the iteration time of the synchronous model pipeline by 5-19%, increases the efficiency of the asynchronous model pipeline by up to 15%, and reduces the communication traffic injected into the network by up to 35%.

We also propose a device-aware hybrid parallel strategy to address the straggler problem. The strategy is based on grouped hybrid parallelism: devices in different groups perform synchronous data parallelism, while devices within the same group perform model pipelining. Within each group, the strategy reduces the impact of slower devices by assigning model segments according to each device's computing capacity. Experimental results show that, on a cluster of devices with heterogeneous computing capacities, the proposed device-aware hybrid parallel strategy reduces the overall iteration time by about 21% on average compared with the plain synchronous data parallel strategy.
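To illustrate the capacity-aware segment assignment used within a group, the following is a minimal Python sketch under assumed inputs; it is not the thesis' actual implementation. The function name assign_segments, the per-layer cost estimates, and the device capacity values are all hypothetical. The sketch simply splits a linear chain of layers into contiguous segments so that each device's workload is roughly proportional to its relative computing capacity.

    def assign_segments(layer_costs, device_capacities):
        """Greedily split a linear chain of layers into contiguous segments,
        one per device, so that each device's share of the total compute cost
        is roughly proportional to its relative capacity (illustrative sketch)."""
        total_cost = float(sum(layer_costs))
        total_cap = float(sum(device_capacities))
        n_layers = len(layer_costs)
        n_devices = len(device_capacities)
        segments = []
        start = 0
        consumed = 0.0   # cost assigned so far
        target = 0.0     # cumulative cost the current device should reach
        for d, cap in enumerate(device_capacities):
            target += total_cost * cap / total_cap
            remaining = n_devices - d - 1          # devices still to be served
            end = start
            # take layers while staying under the cumulative target,
            # always leaving at least one layer per remaining device
            while end < n_layers - remaining and consumed + layer_costs[end] <= target:
                consumed += layer_costs[end]
                end += 1
            # give the device at least one layer when possible
            if end == start and end < n_layers - remaining:
                consumed += layer_costs[end]
                end += 1
            segments.append((start, end))
            start = end
        # any leftover layers go to the last device
        if start < n_layers:
            first, _ = segments[-1]
            segments[-1] = (first, n_layers)
        return segments

    # Illustrative usage with made-up numbers: 8 layers, 3 devices,
    # the middle device twice as fast as the others.
    if __name__ == "__main__":
        costs = [4, 4, 2, 6, 3, 5, 2, 4]      # hypothetical per-layer costs (e.g. FLOPs)
        caps = [1.0, 2.0, 1.0]                # relative computing capacities
        print(assign_segments(costs, caps))   # -> [(0, 1), (1, 5), (5, 8)]

A greedy split like this only conveys the proportional-load idea; the thesis' actual partitioning also accounts for communication overhead, using graph-based local search and a genetic algorithm as described above.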
Keywords/Search Tags: deep learning, parallelization, distributed system