
Research Of Thread Placement Optimization Strategy For CUDA Programs

Posted on: 2020-01-23    Degree: Master    Type: Thesis
Country: China    Candidate: G S Xie    Full Text: PDF
GTID: 2428330590973879    Subject: Computer Science and Technology
Abstract/Summary: PDF Full Text Request
GPUs offer powerful data-parallel processing and floating-point computing capabilities, so they are increasingly used in numerical simulation and scientific computing. However, given the GPU's complex hardware architecture and a multithreaded programming model fundamentally different from the CPU's, improving both development efficiency and program performance on the GPU is especially important. Thread placement strategies are an important, and complex, part of GPU program optimization. Traditional strategies include following reference guidelines and exhaustively searching the parameter space. Based on static and runtime information about the program, this paper uses machine learning algorithms to build a thread placement optimization model for CUDA programs.

First, this paper identifies the strongly representative core characteristics of a program and designs a method, based on nvprof, for collecting the corresponding runtime information. The drawback of this method is that collection requires running the CUDA program repeatedly, which multiplies the time cost. This paper therefore proposes substituting static program information for part of the runtime information. Using the LLVM framework, CUDA programs are translated into intermediate representation, and compiler passes analyze the loop, instruction, and memory information of the source program, fully realizing static information collection for CUDA programs and greatly reducing the elapsed time of the collection process. This paper also proposes a label-setting algorithm that fully reflects changes in program performance. Several machine learning algorithms are screened for model training, and grid search with cross-validation is used for hyperparameter optimization.

In the experimental analysis, programs from three benchmark suites are selected to build the training datasets, and three groups of experiments are designed and carried out. The static-information substitution experiment verifies that static information fits the runtime information well, and that substituting it for runtime information reduces information collection time by 23.2% while preserving the model's training accuracy. The machine learning algorithm comparison experiment finds that the support vector machine trains this model most effectively. Testing against an existing model under identical conditions shows that the proposed model improves accuracy by 3.7% and reduces time consumption by 51.8%, demonstrating a better training effect and an advantage in time.
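The exhaustive parameter search named above as a traditional strategy can be illustrated with a minimal sketch. The helper names and the timing callback are hypothetical (a real search would launch and time an actual CUDA kernel for each candidate configuration); only the cover-all-elements grid arithmetic is standard CUDA practice:

```python
import math

def grid_size(n, block_size):
    # Number of thread blocks needed so block_size * grid covers n elements.
    return math.ceil(n / block_size)

def exhaustive_search(n, time_kernel, candidates=(64, 128, 256, 512, 1024)):
    """Try every candidate block size and keep the fastest configuration.

    `time_kernel(block, grid)` is a hypothetical callback that launches the
    CUDA kernel with the given configuration and returns its elapsed time.
    """
    best = None
    for block in candidates:
        grid = grid_size(n, block)
        elapsed = time_kernel(block, grid)
        if best is None or elapsed < best[2]:
            best = (block, grid, elapsed)
    return best

# Usage with a toy cost model standing in for real kernel timings:
toy_cost = lambda block, grid: grid * 1.0 + block * 0.01
print(exhaustive_search(1_000_000, toy_cost))
```

The cost of this baseline grows linearly with the number of candidates, which is exactly the expense a learned placement model is meant to avoid.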
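The nvprof-based runtime-information collection step could be driven from a script; the sketch below only assembles an nvprof command line. The binary name, metric list, and output path are illustrative assumptions, not the thesis's actual configuration:

```python
def nvprof_command(binary, metrics, log_file="metrics.csv"):
    """Build an nvprof invocation that records the given GPU metrics as CSV.

    Collecting metrics forces nvprof to replay kernels, which is why
    gathering runtime information requires repeated, costly runs.
    """
    return [
        "nvprof",
        "--metrics", ",".join(metrics),
        "--csv",
        "--log-file", log_file,
        binary,
    ]

cmd = nvprof_command("./vector_add", ["achieved_occupancy", "ipc"])
print(" ".join(cmd))
```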
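The hyperparameter-optimization step (grid search with cross-validation over a support vector machine) can be sketched with scikit-learn. The synthetic feature matrix and the parameter grid here are stand-in assumptions, not the thesis's actual features or search space:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for (program characteristics -> placement class) data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))             # six static/runtime features per program
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # two thread-placement classes

# Grid search with 5-fold cross-validation over SVM hyperparameters.
param_grid = {"C": [0.1, 1, 10], "kernel": ["rbf", "linear"]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Cross-validation guards against the small benchmark-derived datasets overfitting to any single train/test split, which matters when only a few hundred labeled programs are available.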
Keywords/Search Tags:CUDA, Thread, Machine Learning, LLVM, Performance optimization