
Distributed Deep Learning Platform DisPyTorch

Posted on: 2019-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Shi
Full Text: PDF
GTID: 2428330545477791
Subject: Computer technology
Abstract/Summary:
With the development of deep learning and artificial intelligence, deep neural network models and the problems they solve are becoming more and more complex, and the datasets used for training are growing ever larger. To break through the limits of a single machine's computing resources, building an efficient and easy-to-use distributed deep learning platform has become a hot research topic in both academia and industry. Most state-of-the-art distributed deep learning platforms support only the Static Computation Graph (SCG) and are based on the Parameter Server framework. Compared to SCG, the Dynamic Computation Graph (DCG) allows a program to modify or define nodes in the graph at runtime and can express more complicated network models. No single distributed deep learning framework can efficiently handle all application scenarios, which differ in network model size, number of distributed nodes, and whether the nodes' computing resources are even or uneven. To date, no deep learning platform supports both multiple distributed frameworks and DCG. This thesis designs and develops DisPyTorch, a new distributed deep learning platform based on PyTorch. To handle different kinds of application scenarios, this thesis designs and develops three distributed deep learning frameworks on DisPyTorch: MR-DisPyTorch, RA-DisPyTorch, and PS-DisPyTorch. Users can choose a suitable framework according to their specific needs in real applications. The main contributions of this thesis are as follows:

1. This thesis designs and implements MR-DisPyTorch, a distributed deep learning framework based on the MapReduce programming model. MR-DisPyTorch adopts a synchronous update strategy and handles application scenarios where the network model is small, the number of distributed nodes is small, and the nodes' computing resources are even.

2. This thesis designs and implements RA-DisPyTorch, a decentralized distributed deep learning framework based on the Ring Allreduce programming model. RA-DisPyTorch adopts a synchronous update strategy and handles application scenarios where the network model is large, the number of distributed nodes is large, and the nodes' computing resources are even.

3. This thesis designs and implements PS-DisPyTorch, a distributed deep learning framework based on the Parameter Server programming model. PS-DisPyTorch supports synchronous, asynchronous, and semi-synchronous update strategies. The synchronous strategy handles application scenarios where the deep learning model is medium-sized, the number of distributed nodes is medium, and the nodes' computing resources are even; the asynchronous and semi-synchronous strategies handle scenarios where the model and node count are medium but the nodes' computing resources are uneven.
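The synchronous update of the MapReduce-based framework can be pictured as a map phase, where each worker computes a gradient on its local data shard, followed by a reduce phase that averages the gradients before a single parameter update. The sketch below is illustrative Python only, not DisPyTorch's actual API; `compute_grad`, `sync_step`, and the list-based parameters are stand-ins.

```python
def map_phase(params, shards, compute_grad):
    # Map: each worker computes a gradient on its own data shard.
    return [compute_grad(params, shard) for shard in shards]

def reduce_phase(grads):
    # Reduce: average the per-worker gradients element-wise.
    n = len(grads)
    return [sum(g[i] for g in grads) / n for i in range(len(grads[0]))]

def sync_step(params, shards, compute_grad, lr=0.1):
    # One synchronous step: all workers finish before the update is applied.
    avg = reduce_phase(map_phase(params, shards, compute_grad))
    return [p - lr * g for p, g in zip(params, avg)]

# Example: two workers whose gradients (+1 and -1) cancel,
# so the averaged update leaves the parameter unchanged.
new_params = sync_step([1.0], [[0.0], [2.0]],
                       lambda p, s: [pi - sum(s) / len(s) for pi in p])
```

Because the reduce phase waits for every worker, one slow node stalls the whole step, which is why this strategy fits small, evenly provisioned clusters.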
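Ring Allreduce, the model behind the decentralized framework, splits each worker's gradient vector into N chunks and passes them around a ring: N-1 reduce-scatter steps accumulate partial sums, then N-1 all-gather steps distribute the completed chunks, so every worker ends with the full sum while each link carries only one chunk per step. A minimal single-process simulation, assuming the vector length is divisible by the worker count; the function name is illustrative, not RA-DisPyTorch's interface.

```python
def ring_allreduce(worker_data):
    """Simulated ring all-reduce (sum): every worker ends with the
    element-wise sum of all workers' vectors."""
    n = len(worker_data)
    m = len(worker_data[0])
    assert m % n == 0, "vector length assumed divisible by worker count"
    k = m // n
    # Each worker splits its vector into n chunks of length k.
    chunks = [[list(v[i * k:(i + 1) * k]) for i in range(n)]
              for v in worker_data]

    # Reduce-scatter: at step t, worker r sends chunk (r - t) mod n to its
    # right neighbour, which adds it into its own copy. After n-1 steps,
    # worker r holds the fully reduced chunk (r + 1) mod n.
    for t in range(n - 1):
        for r in range(n):
            c = (r - t) % n
            dst = (r + 1) % n
            chunks[dst][c] = [a + b for a, b in zip(chunks[dst][c],
                                                    chunks[r][c])]

    # All-gather: completed chunks circulate the ring, overwriting stale
    # copies, until every worker holds every reduced chunk.
    for t in range(n - 1):
        for r in range(n):
            c = (r + 1 - t) % n
            dst = (r + 1) % n
            chunks[dst][c] = list(chunks[r][c])

    return [[x for chunk in w for x in chunk] for w in chunks]
```

Per step each worker sends and receives one chunk, so the bytes transferred per worker are roughly 2(N-1)/N times the vector size, independent of N, which is why this topology scales to large models and many nodes.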
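The Parameter Server model centralizes parameters on a server process; workers push gradients and pull fresh parameters. Under the asynchronous strategy each push is applied as soon as it arrives, so a slow node never blocks a fast one, at the cost of updates computed from slightly stale parameters. A toy single-process sketch; class and method names are illustrative, not PS-DisPyTorch's real interface.

```python
import threading

class ParameterServer:
    """Toy parameter server: workers push gradients, pull parameters."""
    def __init__(self, params, lr=0.1):
        self.params = list(params)
        self.lr = lr
        self.lock = threading.Lock()

    def push(self, grad):
        # Asynchronous SGD: apply each gradient immediately on arrival,
        # without waiting for the other workers.
        with self.lock:
            self.params = [p - self.lr * g
                           for p, g in zip(self.params, grad)]

    def pull(self):
        # Workers fetch the current (possibly stale by the time they
        # compute) parameters.
        with self.lock:
            return list(self.params)

def worker(ps, grads):
    # Each worker pushes its gradients at its own pace.
    for g in grads:
        ps.push(g)

# Two uneven workers: neither waits for the other.
ps = ParameterServer([0.0], lr=0.1)
threads = [threading.Thread(target=worker, args=(ps, [[1.0], [1.0]])),
           threading.Thread(target=worker, args=(ps, [[2.0]]))]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

A semi-synchronous variant would bound the staleness, e.g. by blocking a worker once it runs more than a fixed number of updates ahead of the slowest one, trading some of the asynchronous throughput for more stable convergence.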
Keywords/Search Tags: Deep learning, Distributed deep learning, PyTorch, Neural network