
Optimization Of Memory And Communication For The Pipeline Parallelism

Posted on: 2023-08-24
Degree: Master
Type: Thesis
Country: China
Candidate: W Jiang
Full Text: PDF
GTID: 2568307169983459
Subject: Engineering
Abstract/Summary:
The deep neural network (DNN) architecture has been successfully applied in many fields such as computer vision, speech recognition, and natural language processing. Neural network models are growing deeper and wider to achieve better accuracy and robustness, so training them has become increasingly difficult: the demands on memory capacity and computing capacity are rising rapidly. The limited physical memory of existing hardware devices cannot satisfy the memory requirements of the training process, and limited computing capacity leads to long training times.

Distributed parallelism is currently an effective scheme for training large-scale neural networks: multiple computing devices are deployed on one machine, or multiple machines are deployed as a cluster. A distributed scheme pools the memory capacity and computing capacity of multiple machines to accommodate large-scale network models and accelerate training. Pipeline parallelism is one such scheme; it has a significant advantage in training speed, but it suffers from high memory load and high communication overhead. This thesis analyzes the memory load and communication overhead of pipeline parallelism and proposes corresponding solutions. The contributions are as follows:

1. A data transfer mechanism. This mechanism can be applied to pipeline parallelism schemes: it temporarily offloads data from computing nodes to other memory devices and retrieves the data when it is needed again, which avoids excessively high memory load. This thesis implements the data transfer mechanism and applies it to a mature pipeline parallelism scheme (PipeDream). In experiments, the data transfer mechanism significantly reduces the peak memory load of the PipeDream scheme.

2. An optimized pipeline parallelism scheme, PipeFB. Applying the data transfer mechanism to PipeDream introduces a large amount of communication, which costs the pipeline much of its training speed. This thesis proposes the PipeFB scheme to solve this problem. PipeFB deploys the forward propagation and backward propagation of the neural network on different computing nodes in the pipeline, which optimizes the pipeline's communication pattern. As a result, applying the data transfer mechanism to PipeFB has only a small impact on training speed: the peak memory load is significantly reduced while the training speed loss stays within a small range.
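The data transfer mechanism described above can be illustrated with a minimal sketch. This is not the thesis's implementation: the class name `ActivationStore` and its methods are hypothetical, and plain dictionaries stand in for device (GPU) and host (CPU) memory. The point is only the offload/retrieve pattern: activations produced in the forward pass are moved out of device memory and fetched back just before the backward pass needs them, bounding peak device-memory load.

```python
import numpy as np

class ActivationStore:
    """Illustrative offload store: 'device' and 'host' dicts stand in
    for GPU memory and host memory respectively."""

    def __init__(self):
        self.device = {}  # stand-in for limited device (GPU) memory
        self.host = {}    # stand-in for larger host (CPU) memory

    def save(self, key, tensor):
        # forward pass produces an activation on the device
        self.device[key] = tensor

    def offload(self, key):
        # temporarily move the activation out of device memory
        self.host[key] = self.device.pop(key)

    def fetch(self, key):
        # retrieve the activation when backward propagation needs it
        self.device[key] = self.host.pop(key)
        return self.device[key]

store = ActivationStore()
act = np.ones((4, 4))
store.save("layer1", act)
store.offload("layer1")       # peak device memory is reduced here
assert "layer1" not in store.device
back = store.fetch("layer1")  # brought back for the backward pass
assert np.array_equal(back, act)
```

In a real pipeline the offload and fetch would be asynchronous device-to-host copies overlapped with computation; that communication cost is exactly what motivates the PipeFB redesign in contribution 2.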
Keywords/Search Tags: neural network training, distributed parallelism, pipeline parallelism, data transfer mechanism