
Distributed training of very large neural networks

Posted on: 2014-04-18
Degree: M.S
Type: Thesis
University: University of California, Irvine
Candidate: Patel, Vishal R
GTID: 2458390005499731
Subject: Computer Science
Abstract/Summary:
We describe an implementation of deep learning algorithms on shared-nothing machine clusters using a distributed file system and the think-like-a-vertex (Pregel) programming model. While very large neural networks have been successfully distributed across many machines in a cluster, those implementations have been limited in deployability and replicability by proprietary technologies and cluster configurations that are not generally available. Our software, PArallel Neural Distributed Architecture (PANDA), is the first open source implementation that allows training of neural networks with millions or billions of parameters on commodity machine clusters. At its core, forward and backward propagation through the neural network are implemented in a parallel and distributed fashion using neuron-centric views and message-passing algorithms. This flexible and scalable approach allows both the data and the model to be distributed across different machines in a cluster during training and prediction, without requiring a centralized parameter server. The implementation uses the Hadoop Distributed File System and Pregelix, an open source implementation of Pregel.
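To make the neuron-centric formulation concrete, the following is a minimal sketch, not code from PANDA or the Pregelix API; all class and function names here are illustrative. It shows how forward propagation can be expressed in a think-like-a-vertex model: each neuron is a vertex whose incoming weighted activations arrive as messages, and each superstep it sums them, applies an activation function, and sends its own activation along its outgoing edges.

import math

class NeuronVertex:
    """One neuron modeled as a Pregel-style vertex: its state is a bias
    and outgoing edge weights; computation is driven by incoming messages."""
    def __init__(self, vid, bias, out_edges):
        self.vid = vid                  # hypothetical vertex id, e.g. (layer, index)
        self.bias = bias
        self.out_edges = out_edges      # {destination vertex id: weight}
        self.activation = 0.0

    def compute(self, messages, send):
        # Sum the weighted activations delivered as messages from the
        # previous layer, apply a sigmoid, and forward the result.
        z = self.bias + sum(messages)
        self.activation = 1.0 / (1.0 + math.exp(-z))
        for dest, weight in self.out_edges.items():
            send(dest, weight * self.activation)

def superstep(vertices, inbox):
    """Run one synchronous superstep: every vertex with pending messages
    computes, and the messages it sends form the next superstep's inbox."""
    outbox = {}
    def send(dest, value):
        outbox.setdefault(dest, []).append(value)
    for vid, messages in inbox.items():
        vertices[vid].compute(messages, send)
    return outbox

In this scheme the input layer seeds the first inbox with the raw features, and one synchronous superstep advances the signal by one layer, so a forward pass through an L-layer network takes L supersteps; back-propagation can be phrased the same way, with gradient-carrying messages flowing along reversed edges, which is what lets both the data and the model be partitioned across machines without a central parameter server.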
Keywords/Search Tags:Distributed, Implementation, Neural, Training