A CUDA Workload Synthesis Method Based On Program Behavior Profiling

Posted on: 2013-05-20; Degree: Master; Type: Thesis
Country: China; Candidate: W Cheng
GTID: 2248330392457841; Subject: Computer software and theory
Abstract/Summary:
As GPU micro-architectures advance and programming models mature, many developers use GPUs to build high-performance applications. Among the GPU programming models, the CUDA programming model introduced by NVIDIA Corporation is embraced by most users. Micro-architecture designers use simulators to validate design alternatives against multiple metrics such as performance and energy consumption. Although a detailed GPU performance simulator produces results close to those of the corresponding real hardware chip, obtaining those results takes too much time.

In this work, we introduce a CUDA workload synthesis method based on program behavior profiling. The method disregards the semantics of the synthesized programs, because semantics are not important for validating micro-architecture performance. The profiled behavior characteristics range from the instruction level to the thread level and serve as the input of a CUDA source code generator, which produces a different CUDA program that we call a synthesized CUDA kernel. Although the synthesized program differs from the original CUDA program in its usage and dynamic instruction count, its performance is similar to that of the original. We identify many factors that are important to CUDA kernel performance using our in-house profiling tool, which is developed by extending GPGPU-Sim. We use several performance metrics, such as throughput, to validate our CUDA synthesis framework.

The experimental results show that the performance of the synthesized CUDA kernels correlates well with that of CUDA programs from the CUDA SDK, Parboil, and Rodinia benchmark suites. The synthesized kernels have the same thread dimensions as the originals, along with similar basic block characteristics and static instruction mix, but for most of the CUDA programs their dynamic instruction count and execution time are greatly reduced. The performance of the synthesized program stays within 10% of that of the original.
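
The profile-to-source step described in the abstract can be pictured with a minimal Python sketch. It is not the thesis's generator: the `KernelProfile` fields, the `STATEMENTS` mapping from instruction classes to CUDA statements, and the function names are all hypothetical, chosen only to illustrate how a profile (thread dimensions, static instruction mix, a reduced loop trip count) might drive emission of a semantically meaningless CUDA kernel, and how a relative error against the original could be computed.

```python
from dataclasses import dataclass


@dataclass
class KernelProfile:
    """Hypothetical profile record; field names are not from the thesis."""
    name: str
    grid_dim: tuple    # (x, y, z) blocks per grid, copied from the original
    block_dim: tuple   # (x, y, z) threads per block, copied from the original
    instr_mix: dict    # instruction class -> static count in the loop body
    iterations: int    # reduced trip count to shrink dynamic instructions


# Assumed, simplified mapping: one CUDA statement per instruction class.
STATEMENTS = {
    "fp_add": "acc = acc + 1.0f;",
    "fp_mul": "acc = acc * 1.0001f;",
    "int_add": "idx = idx + 1;",
    "global_load": "acc += in[(tid + idx) % n];",
}


def synthesize_kernel(profile: KernelProfile) -> str:
    """Emit CUDA source whose loop body mirrors the profiled instruction mix."""
    body = []
    for instr_class, count in profile.instr_mix.items():
        body.extend([STATEMENTS[instr_class]] * count)
    loop_body = "\n".join("        " + stmt for stmt in body)
    return (
        f"__global__ void {profile.name}_synth"
        "(const float *in, float *out, int n)\n"
        "{\n"
        "    int tid = blockIdx.x * blockDim.x + threadIdx.x;\n"
        "    int idx = 0;\n"
        "    float acc = 0.0f;\n"
        f"    for (int i = 0; i < {profile.iterations}; ++i) {{\n"
        f"{loop_body}\n"
        "    }\n"
        "    if (tid < n) out[tid] = acc;\n"
        "}\n"
    )


def launch_line(profile: KernelProfile) -> str:
    """Emit a launch statement that keeps the original thread dimensions."""
    gx, gy, gz = profile.grid_dim
    bx, by, bz = profile.block_dim
    return (
        f"{profile.name}_synth<<<dim3({gx}, {gy}, {gz}), "
        f"dim3({bx}, {by}, {bz})>>>(in, out, n);"
    )


def relative_error(original: float, synthetic: float) -> float:
    """Relative deviation of a synthetic metric from the original's."""
    return abs(synthetic - original) / original


if __name__ == "__main__":
    profile = KernelProfile(
        name="vecadd",
        grid_dim=(64, 1, 1),
        block_dim=(256, 1, 1),
        instr_mix={"fp_add": 2, "global_load": 1},
        iterations=8,
    )
    print(synthesize_kernel(profile))
    print(launch_line(profile))
```

In this sketch the thread dimensions are copied unchanged while the loop trip count is the knob that reduces the dynamic instruction count. A real generator would also need to reproduce basic block structure, branch divergence, and memory access patterns, none of which are modeled here.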
Keywords/Search Tags: behaviour characteristics, program profiling, program synthesis, CUDA