
Study Of ParaC Compiler Supporting Deep Learning Operator Parallel Algorithm Optimization

Posted on: 2021-03-13  Degree: Master  Type: Thesis
Country: China  Candidate: H Yin  Full Text: PDF
GTID: 2518306032967039  Subject: Computer technology
Abstract/Summary:
In recent years, the wide application of deep learning in many fields has brought tremendous progress to daily life and production. However, deep learning rests on huge training datasets and heavy consumption of computing resources, and since algorithms are the core of deep learning, accelerating deep learning algorithms is particularly important. With the advancement of large-scale parallel GPU architectures by companies such as NVIDIA, GPUs have become the mainstream acceleration platform for deep learning, but because GPU architectures are complex, program optimization on GPU platforms faces the challenges of being difficult, complicated, and inefficient. Academia and industry have developed deep learning compilation and optimization frameworks that support typical operators, target many mainstream hardware platforms, and provide parallel optimization and automatic tuning, but the existing tools still leave many problems unsolved.

Based on the ParaC compiler, this thesis implements a compiler that optimizes deep learning algorithms in parallel and generates high-performance CUDA code for the GPU platform. The ParaC language is extended to support complex nested loop structures and matrix types of more than two dimensions; the compiler's parallel analysis and optimization capabilities are improved, and a CUDA code-generation back end is provided; an open optimization interface (OPI) is designed and implemented, offering OPI guidance methods that support explicit optimization strategies; and OPI guidance and runtime functions for data-flow optimization are provided, supporting data-flow optimization in mixed-language programming.

Two typical algorithms are selected to evaluate the performance of the ParaC compiler. For each algorithm, a ParaC version is written first; a CUDA version is then produced through OPI-guided tuning and compilation; finally, the performance of the generated code is compared with that of hand-optimized versions and high-performance algorithm libraries. Overall, the performance of the ParaC versions matches that of the hand-optimized versions. For bucket sorting, with array sizes from 10,000 to 200,000, performance exceeds that of the Thrust library. For batch normalization, taking the ResNet-50 network as an example at a batch size of 128, performance on large images is higher than that of the cuDNN library, while performance on small images is slightly lower, although better parallel optimization strategies can still be found. In terms of productivity, the ParaC version requires far fewer lines of code than the CUDA version, and because OPI guidance provides a tuning interface, the developers' tuning effort is greatly reduced.
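For reference, the batch normalization operator compared against cuDNN above normalizes each value by the mean and variance computed over the batch. A minimal scalar sketch in Python (the function name and 1-D treatment are illustrative only, independent of the ParaC or CUDA implementations):

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch normalization over a 1-D batch of values:
    y = gamma * (x - mean) / sqrt(var + eps) + beta."""
    n = len(values)
    mean = sum(values) / n
    var = sum((x - mean) ** 2 for x in values) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [gamma * (x - mean) * inv_std + beta for x in values]
```

On a GPU, the mean and variance computations become parallel reductions and the final line becomes an element-wise kernel; these are exactly the loop nests a compiler such as ParaC must map to CUDA threads.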
Keywords/Search Tags: Deep learning operator, compiler, CUDA, OPI guidance, data flow, optimization