
Distributed Deep Learning Platform DisPyTorch

Posted on: 2019-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Shi
Full Text: PDF
GTID: 2428330545477791
Subject: Computer technology
Abstract/Summary:
With the development of deep learning and artificial intelligence, deep neural network models and the problems they solve are becoming more and more complex, and the datasets used for training are growing ever larger. To break through the limits of a single machine's computing resources, building an efficient and easy-to-use distributed deep learning platform has become a hot research topic in both academia and industry. Most state-of-the-art distributed deep learning platforms support only the Static Computation Graph (SCG) and are based on the Parameter Server framework. Compared to SCG, the Dynamic Computation Graph (DCG) allows a program to modify or define nodes in the graph at runtime and can express more complicated network models. No single distributed deep learning framework can efficiently handle all application scenarios, which differ in network model size, number of distributed nodes, and whether the nodes' computing resources are even or uneven. To date, no deep learning platform supports both multiple distributed frameworks and DCG. This thesis designs and develops DisPyTorch, a new distributed deep learning platform based on PyTorch. To handle different kinds of application scenarios, this thesis designs and develops three distributed deep learning frameworks on DisPyTorch: MR-DisPyTorch, RA-DisPyTorch, and PS-DisPyTorch. Users can choose a suitable framework according to their specific needs in real applications. The main contributions of this thesis are as follows:

1. This thesis designs and implements MR-DisPyTorch, a distributed deep learning framework based on the MapReduce programming model. MR-DisPyTorch adopts a synchronous update strategy and handles application scenarios where the network model is small, the number of distributed nodes is small, and the nodes' computing resources are even.

2. This thesis designs and implements RA-DisPyTorch, a decentralized distributed deep learning framework based on the Ring Allreduce programming model. RA-DisPyTorch adopts a synchronous update strategy and handles application scenarios where the network model is large, the number of distributed nodes is large, and the nodes' computing resources are even.

3. This thesis designs and implements PS-DisPyTorch, a distributed deep learning framework based on the Parameter Server programming model. PS-DisPyTorch supports synchronous, asynchronous, and semi-synchronous update strategies. The synchronous strategy handles application scenarios where the deep learning model is medium-sized, the number of distributed nodes is medium, and the nodes' computing resources are even; the asynchronous and semi-synchronous strategies handle scenarios where the model and node count are medium but the nodes' computing resources are uneven.
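The synchronous update of the MapReduce-based framework can be pictured as a map phase, where each worker computes a gradient on its local data shard, followed by a reduce phase that averages the gradients before a single parameter update. The sketch below is illustrative Python only, not DisPyTorch's actual API; `compute_grad`, `sync_step`, and the list-based parameters are stand-ins.

```python
def map_phase(params, shards, compute_grad):
    # Map: each worker computes a gradient on its own data shard.
    return [compute_grad(params, shard) for shard in shards]

def reduce_phase(grads):
    # Reduce: average the per-worker gradients element-wise.
    n = len(grads)
    return [sum(g[i] for g in grads) / n for i in range(len(grads[0]))]

def sync_step(params, shards, compute_grad, lr=0.1):
    # One synchronous step: all workers finish before the update is applied.
    avg = reduce_phase(map_phase(params, shards, compute_grad))
    return [p - lr * g for p, g in zip(params, avg)]

# Example: two workers whose gradients (+1 and -1) cancel,
# so the averaged update leaves the parameter unchanged.
new_params = sync_step([1.0], [[0.0], [2.0]],
                       lambda p, s: [pi - sum(s) / len(s) for pi in p])
```

Because the reduce phase waits for every worker, one slow node stalls the whole step, which is why this strategy fits small, evenly provisioned clusters.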
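Ring Allreduce, the model behind the decentralized framework, splits each worker's gradient vector into N chunks and passes them around a ring: N-1 reduce-scatter steps accumulate partial sums, then N-1 all-gather steps distribute the completed chunks, so every worker ends with the full sum while each link carries only one chunk per step. A minimal single-process simulation, assuming the vector length is divisible by the worker count; the function name is illustrative, not RA-DisPyTorch's interface.

```python
def ring_allreduce(worker_data):
    """Simulated ring all-reduce (sum): every worker ends with the
    element-wise sum of all workers' vectors."""
    n = len(worker_data)
    m = len(worker_data[0])
    assert m % n == 0, "vector length assumed divisible by worker count"
    k = m // n
    # Each worker splits its vector into n chunks of length k.
    chunks = [[list(v[i * k:(i + 1) * k]) for i in range(n)]
              for v in worker_data]

    # Reduce-scatter: at step t, worker r sends chunk (r - t) mod n to its
    # right neighbour, which adds it into its own copy. After n-1 steps,
    # worker r holds the fully reduced chunk (r + 1) mod n.
    for t in range(n - 1):
        for r in range(n):
            c = (r - t) % n
            dst = (r + 1) % n
            chunks[dst][c] = [a + b for a, b in zip(chunks[dst][c],
                                                    chunks[r][c])]

    # All-gather: completed chunks circulate the ring, overwriting stale
    # copies, until every worker holds every reduced chunk.
    for t in range(n - 1):
        for r in range(n):
            c = (r + 1 - t) % n
            dst = (r + 1) % n
            chunks[dst][c] = list(chunks[r][c])

    return [[x for chunk in w for x in chunk] for w in chunks]
```

Per step each worker sends and receives one chunk, so the bytes transferred per worker are roughly 2(N-1)/N times the vector size, independent of N, which is why this topology scales to large models and many nodes.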
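The Parameter Server model centralizes parameters on a server process; workers push gradients and pull fresh parameters. Under the asynchronous strategy each push is applied as soon as it arrives, so a slow node never blocks a fast one, at the cost of updates computed from slightly stale parameters. A toy single-process sketch; class and method names are illustrative, not PS-DisPyTorch's real interface.

```python
import threading

class ParameterServer:
    """Toy parameter server: workers push gradients, pull parameters."""
    def __init__(self, params, lr=0.1):
        self.params = list(params)
        self.lr = lr
        self.lock = threading.Lock()

    def push(self, grad):
        # Asynchronous SGD: apply each gradient immediately on arrival,
        # without waiting for the other workers.
        with self.lock:
            self.params = [p - self.lr * g
                           for p, g in zip(self.params, grad)]

    def pull(self):
        # Workers fetch the current (possibly stale by the time they
        # compute) parameters.
        with self.lock:
            return list(self.params)

def worker(ps, grads):
    # Each worker pushes its gradients at its own pace.
    for g in grads:
        ps.push(g)

# Two uneven workers: neither waits for the other.
ps = ParameterServer([0.0], lr=0.1)
threads = [threading.Thread(target=worker, args=(ps, [[1.0], [1.0]])),
           threading.Thread(target=worker, args=(ps, [[2.0]]))]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

A semi-synchronous variant would bound the staleness, e.g. by blocking a worker once it runs more than a fixed number of updates ahead of the slowest one, trading some of the asynchronous throughput for more stable convergence.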
Keywords/Search Tags: Deep learning, Distributed deep learning, PyTorch, Neural network