
A Study On Parallelization Strategy Of Distributed Deep Learning

Posted on: 2022-04-21    Degree: Master    Type: Thesis
Country: China    Candidate: Z Y Wan    Full Text: PDF
GTID: 2518306602993289    Subject: Communication and Information System
Abstract/Summary:
With the rapid expansion of the Internet in recent years, huge amounts of data are produced online every day. Thanks to this abundance of sample data, deep learning algorithms have developed rapidly, reaching higher accuracy than traditional machine learning algorithms in fields such as image classification, object detection, and speech recognition. As deep learning application scenarios become more specific and customized, the complexity of deep neural network models keeps increasing. Training large-scale, complex networks on large datasets with a single computing device consumes an enormous amount of time. It is therefore necessary to use multiple computing devices for distributed training to improve the efficiency of training deep neural networks.

There are different strategies for the distributed training of deep neural networks, such as model parallelism and data parallelism. The traditional model pipelining strategy assigns network layers to different devices and reduces communication overhead by overlapping computation and communication. However, this strategy ignores the branches and shortcut connections in deep neural network models, so it is difficult to split the model into evenly loaded partitions, which increases iteration time. When applying data parallelism, researchers usually update the model parameters of the computing devices synchronously: at the end of each iteration, the parameter server communicates with every computing device to collect gradients. During this process, the device with the longest iteration time becomes a straggler, which prolongs the overall iteration time.

In this thesis, we first improve the traditional model pipelining strategy. We describe the deep neural network model as a computation graph, then use graph-based local search to partition the model and build a model-parallel pipeline. To minimize communication overhead while keeping the load balanced across model segments, we use a genetic algorithm to search for the best partitioning. Simulation results show that, for specific deep neural networks, the improved strategy shortens the iteration time of the synchronous model pipeline by 5-19%, increases the efficiency of the asynchronous model pipeline by up to 15%, and reduces the communication traffic injected into the network by up to 35%.

We also propose a device-aware hybrid parallel strategy to address the straggler problem. The strategy is based on grouped hybrid parallelism: devices in different groups perform synchronous data parallelism, while devices within the same group perform model pipelining. Within each group, the strategy reduces the impact of slower devices by assigning model segments according to each device's computing capacity. Experimental results show that, on a cluster of devices with heterogeneous computing capacities, the proposed device-aware hybrid parallel strategy reduces the overall iteration time by about 21% on average compared with the plain synchronous data parallel strategy.
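To illustrate the capacity-aware segment assignment used within a group, the following is a minimal Python sketch under assumed inputs; it is not the thesis' actual implementation. The function name assign_segments, the per-layer cost estimates, and the device capacity values are all hypothetical. The sketch simply splits a linear chain of layers into contiguous segments so that each device's workload is roughly proportional to its relative computing capacity.

    def assign_segments(layer_costs, device_capacities):
        """Greedily split a linear chain of layers into contiguous segments,
        one per device, so that each device's share of the total compute cost
        is roughly proportional to its relative capacity (illustrative sketch)."""
        total_cost = float(sum(layer_costs))
        total_cap = float(sum(device_capacities))
        n_layers = len(layer_costs)
        n_devices = len(device_capacities)
        segments = []
        start = 0
        consumed = 0.0   # cost assigned so far
        target = 0.0     # cumulative cost the current device should reach
        for d, cap in enumerate(device_capacities):
            target += total_cost * cap / total_cap
            remaining = n_devices - d - 1          # devices still to be served
            end = start
            # take layers while staying under the cumulative target,
            # always leaving at least one layer per remaining device
            while end < n_layers - remaining and consumed + layer_costs[end] <= target:
                consumed += layer_costs[end]
                end += 1
            # give the device at least one layer when possible
            if end == start and end < n_layers - remaining:
                consumed += layer_costs[end]
                end += 1
            segments.append((start, end))
            start = end
        # any leftover layers go to the last device
        if start < n_layers:
            first, _ = segments[-1]
            segments[-1] = (first, n_layers)
        return segments

    # Illustrative usage with made-up numbers: 8 layers, 3 devices,
    # the middle device twice as fast as the others.
    if __name__ == "__main__":
        costs = [4, 4, 2, 6, 3, 5, 2, 4]      # hypothetical per-layer costs (e.g. FLOPs)
        caps = [1.0, 2.0, 1.0]                # relative computing capacities
        print(assign_segments(costs, caps))   # -> [(0, 1), (1, 5), (5, 8)]

A greedy split like this only conveys the proportional-load idea; the thesis' actual partitioning also accounts for communication overhead, using graph-based local search and a genetic algorithm as described above.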
Keywords/Search Tags: deep learning, parallelization, distributed system