
Model Training Performance Analysis Of Typical Deep Learning Frameworks In The Single GPU Environment

Posted on: 2021-08-10
Degree: Master
Type: Thesis
Country: China
Candidate: H L Dai
Full Text: PDF
GTID: 2518306104494634
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of deep learning, a large number of frameworks have emerged to help practitioners write and train models efficiently. By programming paradigm, these frameworks fall into two categories: declarative and imperative. The most popular representatives of the two categories are TensorFlow and PyTorch. TensorFlow represents the computation process as a static computation graph, while PyTorch uses a dynamic computation graph: the former can optimize the graph before execution, while the latter handles variable-length input more naturally. Because of these different graph models, TensorFlow and PyTorch differ substantially in framework design, task scheduling, and computation-graph execution, which makes it difficult to compare and analyze the performance of the corresponding parts of the two frameworks.

To analyze in depth the performance difference between TensorFlow and PyTorch when training deep neural network (DNN) models in a single-GPU environment, and to identify the key factors affecting performance, this thesis defines a performance model for single-GPU DNN training and evaluates both frameworks experimentally against it. The performance model follows the standard DNN training pipeline and accounts for factors such as I/O, memory copying, CPU processing, GPU processing, and computation-graph optimization, so that it reflects the performance of the entire training process. The experiments benchmark the training performance of the two frameworks on 7 popular DNN models covering CNN, RNN, and Transformer network structures, followed by qualitative and quantitative analysis and comparison. The analysis shows that, in a single-GPU environment, factors such as task scheduling, data loading, and memory copying each affect overall performance by less than 3%, while the implementation of the key layers of the deep learning model is critical to training speed. For most models, computation-graph optimization improves training performance by no more than 2.5%, i.e., it has little effect on performance. These results can provide technical guidance for deep learning practitioners in framework selection and performance optimization.
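The static-versus-dynamic distinction described above can be sketched with a toy expression graph. This is a simplified illustration, not the real TensorFlow or PyTorch API: a declarative framework builds the whole graph first and can optimize it (here, by constant folding) before execution, whereas an imperative framework would evaluate each operation immediately.

```python
# Toy illustration of declarative (static-graph) style: build, optimize, run.
# None of this is the actual TensorFlow/PyTorch API; it is a minimal sketch.

class Node:
    def __init__(self, op, args):
        self.op, self.args = op, args

def const(v):
    return Node("const", [v])

def optimize(node):
    """Static-graph advantage: fold constant sub-graphs before any run."""
    if node.op in ("const", "input"):
        return node
    args = [optimize(a) for a in node.args]
    if all(a.op == "const" for a in args):
        v0, v1 = args[0].args[0], args[1].args[0]
        return const(v0 + v1 if node.op == "add" else v0 * v1)
    return Node(node.op, args)

def run(node, env):
    """Execute the graph; an eager framework would do this immediately,
    node by node, with no separate optimization pass."""
    if node.op == "const":
        return node.args[0]
    if node.op == "input":
        return env[node.args[0]]
    a, b = (run(x, env) for x in node.args)
    return a + b if node.op == "add" else a * b

# Declarative style: build 2*3 + x once, optimize, then run with inputs.
graph = Node("add", [Node("mul", [const(2.0), const(3.0)]),
                     Node("input", ["x"])])
graph = optimize(graph)            # folds 2*3 -> 6.0 before execution
print(run(graph, {"x": 1.0}))      # prints 7.0
```

The trade-off the abstract describes follows from this structure: the optimization pass needs the whole graph up front, which is exactly what a dynamic (eager) framework gives up in exchange for handling variable-length input naturally.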
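The per-phase performance model described above can be sketched as a timing decomposition of one training step. This is a minimal illustration with stand-in phase functions: the phase names mirror the factors listed in the abstract (I/O, memory copy, CPU, GPU), but the functions and durations are hypothetical placeholders, not measurements from either framework.

```python
import time

def timed(fn):
    """Return the wall-clock duration of one call to fn."""
    t0 = time.perf_counter()
    fn()
    return time.perf_counter() - t0

# Stand-ins for the phases of one training step (durations are arbitrary).
def load_batch():      time.sleep(0.001)   # data loading (I/O)
def copy_to_device():  time.sleep(0.0005)  # host-to-GPU memory copy
def cpu_preprocess():  time.sleep(0.001)   # CPU-side processing
def gpu_compute():     time.sleep(0.004)   # forward/backward on the GPU

def profile_step():
    """Decompose one step into per-phase times; total is their sum."""
    phases = {}
    for name, fn in [("io", load_batch), ("copy", copy_to_device),
                     ("cpu", cpu_preprocess), ("gpu", gpu_compute)]:
        phases[name] = timed(fn)
    phases["total"] = sum(phases.values())
    return phases

p = profile_step()
# Share of the step spent outside GPU compute -- the kind of quantity the
# thesis reports (e.g. scheduling/loading/copying each under 3%).
overhead_share = 1.0 - p["gpu"] / p["total"]
```

Measuring each phase separately, rather than only the end-to-end step time, is what lets the analysis attribute the overall difference to specific factors such as key-layer implementation rather than data loading or memory copying.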
Keywords/Search Tags:Deep Learning, Comparison, TensorFlow, PyTorch