
Optimization Of Memory And Communication For The Pipeline Parallelism

Posted on: 2023-08-24
Degree: Master
Type: Thesis
Country: China
Candidate: W Jiang
Full Text: PDF
GTID: 2568307169983459
Subject: Engineering
Abstract/Summary:
The deep neural network (DNN) architecture has been successfully applied in many fields such as computer vision, speech recognition, and natural language processing. Neural network models are growing deeper and wider to achieve better accuracy and robustness, so training them has become increasingly difficult: the demands on memory capacity and computing capacity are rising rapidly. The limited physical memory of existing hardware devices cannot satisfy the memory requirements of the training process, and limited computing capacity leads to long training times.

Distributed parallelism is currently an effective scheme for training large-scale neural networks: multiple computing devices are deployed on one machine, or multiple machines are deployed as a cluster. A distributed scheme pools the memory capacity and computing capacity of multiple machines to accommodate large-scale network models and accelerate training. Pipeline parallelism is one such scheme; it has a significant advantage in training speed, but it suffers from high memory load and high communication overhead. This thesis analyzes the memory load and communication overhead of pipeline parallelism and proposes corresponding solutions. The contributions are as follows:

1. A data transfer mechanism. This mechanism can be applied to pipeline parallelism schemes: it temporarily offloads data from computing nodes to other memory devices and retrieves the data when it is needed again, which avoids excessively high memory load. This thesis implements the data transfer mechanism and applies it to a mature pipeline parallelism scheme (PipeDream). In experiments, the data transfer mechanism significantly reduces the peak memory load of the PipeDream scheme.

2. An optimized pipeline parallelism scheme, PipeFB. Applying the data transfer mechanism to PipeDream introduces a large amount of communication, which costs the pipeline much of its training speed. This thesis proposes the PipeFB scheme to solve this problem. PipeFB deploys the forward propagation and backward propagation of the neural network on different computing nodes in the pipeline, which optimizes the pipeline's communication pattern. As a result, applying the data transfer mechanism to PipeFB has only a small impact on training speed: the peak memory load is significantly reduced while the training speed loss stays within a small range.
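The data transfer mechanism described above can be illustrated with a minimal sketch. This is not the thesis's implementation: the class name `ActivationStore` and its methods are hypothetical, and plain dictionaries stand in for device (GPU) and host (CPU) memory. The point is only the offload/retrieve pattern: activations produced in the forward pass are moved out of device memory and fetched back just before the backward pass needs them, bounding peak device-memory load.

```python
import numpy as np

class ActivationStore:
    """Illustrative offload store: 'device' and 'host' dicts stand in
    for GPU memory and host memory respectively."""

    def __init__(self):
        self.device = {}  # stand-in for limited device (GPU) memory
        self.host = {}    # stand-in for larger host (CPU) memory

    def save(self, key, tensor):
        # forward pass produces an activation on the device
        self.device[key] = tensor

    def offload(self, key):
        # temporarily move the activation out of device memory
        self.host[key] = self.device.pop(key)

    def fetch(self, key):
        # retrieve the activation when backward propagation needs it
        self.device[key] = self.host.pop(key)
        return self.device[key]

store = ActivationStore()
act = np.ones((4, 4))
store.save("layer1", act)
store.offload("layer1")       # peak device memory is reduced here
assert "layer1" not in store.device
back = store.fetch("layer1")  # brought back for the backward pass
assert np.array_equal(back, act)
```

In a real pipeline the offload and fetch would be asynchronous device-to-host copies overlapped with computation; that communication cost is exactly what motivates the PipeFB redesign in contribution 2.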
Keywords/Search Tags: neural network training, distributed parallelism, pipeline parallelism, data transfer mechanism