
Research on an Efficient Distributed Parallel Algorithm for the Deep Learning Framework TensorFlow

Posted on: 2020-12-15
Degree: Master
Type: Thesis
Country: China
Candidate: M J He
Full Text: PDF
GTID: 2428330596975457
Subject: Software engineering
Abstract/Summary:
With the rapid development of the information age and the ever-growing volume of big data, analyzing and exploiting big data on cloud computing platforms has become a powerful competitive tool for enterprises. One of the most effective ways to exploit big data resources is deep learning, a branch of artificial intelligence, and a good deep learning framework is indispensable for this work. Such a framework must be highly flexible, highly portable, and able to support multiple languages to meet changing application needs. Google's open-source TensorFlow is such a framework: although a young open-source project, it is widely respected for its strengths in deep learning.

However, as deep learning is applied to ever more complex problems, models grow larger and larger, and TensorFlow's shortcomings become apparent. Iteratively training a complex deep learning model often takes hours, days, or even weeks, an unacceptable cost in a rapidly developing information age. Although TensorFlow supports distributed iterative training, which alleviates the long training times, it still falls short of requirements and leaves room for optimization. The main goal of this thesis is to optimize the distributed training algorithms, improve the utilization of computing devices, and reduce model training time.

To achieve these goals, this thesis first examines TensorFlow's underlying implementation, analyzing its system architecture, dataflow graph, session management, distributed execution, and data input. It then designs and implements optimization algorithms for TensorFlow's two distributed modes, data parallelism and model parallelism. For data parallelism, the original linear execution mode is replaced with a pipelined execution mode; for model parallelism, the original random model-partitioning scheme is replaced with a novel greedy partitioning algorithm.

Finally, the effectiveness of the two proposed optimization algorithms is verified through several comparative experiments. The data parallelism optimization increases the utilization of computing devices by 28% on average, and the model parallelism optimization saves 34% of training time on average, achieving the goal of accelerating TensorFlow model training.
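The abstract does not reproduce the thesis's pipeline implementation, but the core idea of replacing linear execution (load a batch, then train on it, then load the next) with pipelined execution can be sketched with TensorFlow's `tf.data` API, whose `prefetch` transformation overlaps input preparation with computation. The synthetic dataset, preprocessing step, and model below are illustrative assumptions, not the thesis's code.

```python
import tensorflow as tf

# Illustrative sketch: pipelined input vs. linear input.
# In the linear mode, each training step waits for its batch to be
# prepared; in the pipelined mode, tf.data prepares the next batch
# on the CPU while the device trains on the current one, which is
# what keeps the computing device busy.

def make_dataset(batch_size=128):
    # Synthetic data stands in for a real input source (assumption).
    xs = tf.random.normal([10000, 784])
    ys = tf.random.uniform([10000], maxval=10, dtype=tf.int32)
    ds = tf.data.Dataset.from_tensor_slices((xs, ys))
    ds = ds.shuffle(10000).batch(batch_size)
    # Placeholder preprocessing; num_parallel_calls and prefetch turn
    # the linear pipeline into an overlapped, pipelined one.
    ds = ds.map(lambda x, y: (x / 255.0, y),
                num_parallel_calls=tf.data.AUTOTUNE)
    return ds.prefetch(tf.data.AUTOTUNE)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(10),
])
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(
                  from_logits=True))
model.fit(make_dataset(), epochs=2)
```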
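Likewise, the greedy partitioning algorithm for model parallelism is not listed in the abstract. The following standalone sketch shows one common greedy heuristic that such a partitioner could use: assign each layer, weighted by an estimated compute cost, to the currently least-loaded device. The per-layer cost model and device names are hypothetical, and a real partitioner would also account for communication between adjacent layers, which this sketch omits.

```python
from typing import Dict, List

def greedy_partition(layer_costs: Dict[str, float],
                     devices: List[str]) -> Dict[str, str]:
    """Assign each layer to the least-loaded device (greedy heuristic).

    layer_costs maps a layer name to an estimated compute cost; this
    stands in for whatever cost model the thesis uses (assumption).
    """
    load = {d: 0.0 for d in devices}
    placement = {}
    # Place expensive layers first so large costs are balanced early.
    for layer, cost in sorted(layer_costs.items(),
                              key=lambda kv: kv[1], reverse=True):
        device = min(load, key=load.get)  # least-loaded device so far
        placement[layer] = device
        load[device] += cost
    return placement

# Hypothetical per-layer costs, e.g. estimated FLOPs per step.
costs = {"conv1": 4.0, "conv2": 6.0, "fc1": 3.0, "fc2": 1.0}
print(greedy_partition(costs, ["/gpu:0", "/gpu:1"]))
```

Unlike random partitioning, this balances the estimated load across devices deterministically, which is the property that lets a greedy scheme shorten the critical path of a model-parallel training step.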
Keywords/Search Tags: Deep learning, TensorFlow, distributed parallelism, data parallelism, model parallelism