
Research Of Thread Placement Optimization Strategy For CUDA Programs

Posted on: 2020-01-23    Degree: Master    Type: Thesis
Country: China    Candidate: G S Xie    Full Text: PDF
GTID: 2428330590973879    Subject: Computer Science and Technology
Abstract/Summary: PDF Full Text Request
GPUs offer powerful data-parallel processing and floating-point computing capabilities, so they are increasingly used in numerical simulation and scientific computing. However, given the GPU's complex hardware architecture and a multithreaded programming model fundamentally different from the CPU's, improving both development efficiency and program performance on the GPU is especially important. Thread placement strategies are an important, and complex, part of GPU program optimization. Traditional strategies include following reference guidelines and exhaustively searching the parameter space. Based on static and runtime information about the program, this paper uses machine learning algorithms to build a thread placement optimization model for CUDA programs.

First, this paper identifies the strongly representative core characteristics of a program and designs a method, based on nvprof, for collecting the corresponding runtime information. The drawback of this method is that collection requires running the CUDA program repeatedly, which multiplies the time cost. This paper therefore proposes substituting static program information for part of the runtime information. Using the LLVM framework, CUDA programs are translated into intermediate representation, and compiler passes analyze the loop, instruction, and memory information of the source program, fully realizing static information collection for CUDA programs and greatly reducing the elapsed time of the collection process. This paper also proposes a label-setting algorithm that fully reflects changes in program performance. Several machine learning algorithms are screened for model training, and grid search with cross-validation is used for hyperparameter optimization.

In the experimental analysis, programs from three benchmark suites are selected to build the training datasets, and three groups of experiments are designed and carried out. The static-information substitution experiment verifies that static information fits the runtime information well, and that substituting it for runtime information reduces information collection time by 23.2% while preserving the model's training accuracy. The machine learning algorithm comparison experiment finds that the support vector machine trains this model most effectively. Testing against an existing model under identical conditions shows that the proposed model improves accuracy by 3.7% and reduces time consumption by 51.8%, demonstrating a better training effect and an advantage in time.
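The exhaustive parameter search named above as a traditional strategy can be illustrated with a minimal sketch. The helper names and the timing callback are hypothetical (a real search would launch and time an actual CUDA kernel for each candidate configuration); only the cover-all-elements grid arithmetic is standard CUDA practice:

```python
import math

def grid_size(n, block_size):
    # Number of thread blocks needed so block_size * grid covers n elements.
    return math.ceil(n / block_size)

def exhaustive_search(n, time_kernel, candidates=(64, 128, 256, 512, 1024)):
    """Try every candidate block size and keep the fastest configuration.

    `time_kernel(block, grid)` is a hypothetical callback that launches the
    CUDA kernel with the given configuration and returns its elapsed time.
    """
    best = None
    for block in candidates:
        grid = grid_size(n, block)
        elapsed = time_kernel(block, grid)
        if best is None or elapsed < best[2]:
            best = (block, grid, elapsed)
    return best

# Usage with a toy cost model standing in for real kernel timings:
toy_cost = lambda block, grid: grid * 1.0 + block * 0.01
print(exhaustive_search(1_000_000, toy_cost))
```

The cost of this baseline grows linearly with the number of candidates, which is exactly the expense a learned placement model is meant to avoid.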
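The nvprof-based runtime-information collection step could be driven from a script; the sketch below only assembles an nvprof command line. The binary name, metric list, and output path are illustrative assumptions, not the thesis's actual configuration:

```python
def nvprof_command(binary, metrics, log_file="metrics.csv"):
    """Build an nvprof invocation that records the given GPU metrics as CSV.

    Collecting metrics forces nvprof to replay kernels, which is why
    gathering runtime information requires repeated, costly runs.
    """
    return [
        "nvprof",
        "--metrics", ",".join(metrics),
        "--csv",
        "--log-file", log_file,
        binary,
    ]

cmd = nvprof_command("./vector_add", ["achieved_occupancy", "ipc"])
print(" ".join(cmd))
```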
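The hyperparameter-optimization step (grid search with cross-validation over a support vector machine) can be sketched with scikit-learn. The synthetic feature matrix and the parameter grid here are stand-in assumptions, not the thesis's actual features or search space:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for (program characteristics -> placement class) data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))             # six static/runtime features per program
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # two thread-placement classes

# Grid search with 5-fold cross-validation over SVM hyperparameters.
param_grid = {"C": [0.1, 1, 10], "kernel": ["rbf", "linear"]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Cross-validation guards against the small benchmark-derived datasets overfitting to any single train/test split, which matters when only a few hundred labeled programs are available.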
Keywords/Search Tags:CUDA, Thread, Machine Learning, LLVM, Performance optimization