
Performance Model For Parallel Convolutional Neural Network Based On OpenCL

Posted on: 2019-09-03
Degree: Master
Type: Thesis
Country: China
Candidate: P Li
Full Text: PDF
GTID: 2428330572958924
Subject: Engineering

Abstract/Summary:
In recent years, the GPU has gradually become a research hotspot in the field of high-performance computing because of its high energy efficiency. Many GPU products are now available, and GPUs from different vendors differ greatly in hardware architecture, power consumption, and application scenarios. OpenCL can run on different heterogeneous devices and achieves code portability, but the development efficiency of OpenCL-based parallel programs and their performance in practical applications remain problematic. How to exploit the architectural features of the GPU to obtain higher performance has therefore received increasing attention from researchers.

In view of the above issues, this thesis starts from the characteristics of GPU architecture and uses four typical applications as benchmark algorithms to analyze the key issues of parallel program optimization, an OpenCL-based auto-tuning performance model, and performance portability on GPU platforms. In addition, taking the convolutional neural network as the application background, an auto-tuning performance model is established for the most time-consuming operator, convolution, and performance portability across different platforms is realized. The main content of this thesis is as follows:

An auto-tuning performance model for OpenCL on GPU architectures is proposed. First, the factors that influence program performance on the GPU are studied and analyzed, including the workgroup size set in the OpenCL kernel code, the amount of work assigned to each thread, and the optimization methods commonly used on GPU platforms. These influencing factors are parameterized and their value ranges are set reasonably. All possible parameter configurations form a search space, and every configuration in the space is tested on different GPU platforms. Taking the four typical applications as benchmarks, we measure the actual kernel execution time of each configuration and select the minimum of all measured times; the parameter configuration corresponding to this minimum is the optimal configuration on the tested platform, which achieves the goal of auto-tuning. At the same time, an auto-tuning performance model based on CUDA programming is built for NVIDIA GPUs, and the experimental results obtained after auto-tuning under the CUDA and OpenCL programming models are compared. Finally, the search space is reduced through search-space minimization and differential evolution in order to further improve search efficiency.

The auto-tuning performance model proposed above is then applied to a practical problem: taking the convolutional neural network as the research background, a performance model is established for the most time-consuming operator, convolution. Taking a single-channel, single-kernel convolution as an example, mathematical formulas are used to prove in theory that the size of the convolution kernel affects the arithmetic intensity on the GPU, and this conclusion is further verified through experiments. The performance improvement is then tested for each parameter configuration in the model. The experimental results show that, compared with the original kernel, performance improves significantly after auto-tuning, by up to 86.7%. For multi-channel, multi-kernel convolution, two methods are mainly used to solve the problem: direct convolution and transforming the convolution operator into matrix multiplication; an auto-tuning performance model is implemented for both. Taking the input data of the convolutional layers of AlexNet as an example, the two methods are tested on different GPU platforms, and the experimental results are compared to verify the performance portability of OpenCL. At the same time, experiments show that the two methods have different scopes of application. Finally, the search space is again optimized.
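The exhaustive auto-tuning described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the parameter names and the cost function are invented stand-ins, since timing a real OpenCL kernel requires a device; in practice `time_kernel` would compile and benchmark the kernel for each configuration.

```python
import itertools

# Hypothetical tuning parameters (illustrative values, not from the thesis):
# candidate workgroup sizes and amounts of work per thread.
WORKGROUP_SIZES = [(8, 8), (16, 16), (32, 8)]
WORK_PER_THREAD = [1, 2, 4]

def time_kernel(wg, wpt):
    """Stand-in for compiling and timing an OpenCL kernel with the given
    parameters on a real device. A synthetic cost function is used here
    purely to make the sketch runnable: it penalizes small workgroups
    (underutilization) and extreme work-per-thread values."""
    threads = wg[0] * wg[1]
    return 1.0 / threads + 0.01 * wpt + 0.05 / wpt

def auto_tune():
    """Test every configuration in the search space and return the one
    with the minimum measured kernel time (the auto-tuned optimum)."""
    space = itertools.product(WORKGROUP_SIZES, WORK_PER_THREAD)
    return min(space, key=lambda cfg: time_kernel(*cfg))

best_wg, best_wpt = auto_tune()
print(best_wg, best_wpt)
```

Because the space is the full Cartesian product of parameter ranges, it grows multiplicatively with each added parameter, which is why the thesis additionally prunes it with search-space minimization and differential evolution.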
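The two convolution methods compared in the abstract can be illustrated with a pure-Python sketch (single channel, single kernel, unit stride, no padding; all names are illustrative). The second function rewrites the convolution as a matrix product in the im2col style: each K×K input patch becomes a row, the kernel becomes a vector, and the output is their product, which is what allows tuned matrix-multiplication kernels to be reused.

```python
def direct_conv(x, k):
    """Direct 2-D valid convolution (cross-correlation, as in CNNs)
    of a single-channel input x with a single kernel k."""
    H, W, K = len(x), len(x[0]), len(k)
    out = [[0.0] * (W - K + 1) for _ in range(H - K + 1)]
    for i in range(H - K + 1):
        for j in range(W - K + 1):
            out[i][j] = sum(x[i + p][j + q] * k[p][q]
                            for p in range(K) for q in range(K))
    return out

def im2col_conv(x, k):
    """The same convolution expressed as a matrix product: each K*K
    input patch is flattened into one row, the kernel into a vector."""
    H, W, K = len(x), len(x[0]), len(k)
    rows = [[x[i + p][j + q] for p in range(K) for q in range(K)]
            for i in range(H - K + 1) for j in range(W - K + 1)]
    kvec = [k[p][q] for p in range(K) for q in range(K)]
    flat = [sum(r * w for r, w in zip(row, kvec)) for row in rows]
    n = W - K + 1
    return [flat[i * n:(i + 1) * n] for i in range(len(flat) // n)]

x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
print(direct_conv(x, k))   # both methods produce the same result
print(im2col_conv(x, k))
```

The trade-off the experiments expose follows from this structure: im2col duplicates overlapping patches into the row matrix (extra memory traffic) in exchange for the regular access pattern of matrix multiplication, so which method wins depends on kernel size and layer shape.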
Keywords/Search Tags:GPU, Performance Portability, OpenCL, Auto-tuning, Performance Model, Convolution