
Performance Analysis And Modeling Of Parallel Thread Execution Program

Posted on: 2014-02-18
Degree: Master
Type: Thesis
Country: China
Candidate: H S Suo
Full Text: PDF
GTID: 2248330395496958
Subject: Software engineering
Abstract/Summary:
With the rapid progress of modern science and technology and of the electronics industry, GPU performance has grown enormously. The GPU has evolved from a pure graphics processor into a massively parallel computing processor, and it now takes over much of the work that formerly belonged to the CPU. Because of the GPU's distinctive hardware architecture, it handles large-scale parallel computations far faster than the CPU. The GPU has clear advantages in general-purpose computing: its parallelism is strong, it can sustain high-density arithmetic, and it can effectively reduce the frequent communication between the GPU and the CPU while a program runs. As a result, the computing power of the GPU has attracted more and more attention. The CUDA platform provides a programming model that does not require learning a new language: it merely extends an existing programming language, which greatly lowers the barrier to entry for CUDA. With the release of Fermi, NVIDIA's new generation of graphics architecture, CUDA C programming has become familiar to more and more developers. CUDA C is an extension of the C programming language, and PTX is an instruction set the GPU uses for scalable parallel computing. Driven by the market demand for real-time, high-definition 3D graphics, the GPU has evolved into a highly parallel, multithreaded processor with enormous computational power and high memory bandwidth. The GPU is particularly well suited to problems that can be expressed as data-parallel computations, in which the same program executes in parallel, with high arithmetic intensity, over many data elements. Because every data element executes the same program, there is a lower requirement for complex flow control. PTX defines a virtual machine and an instruction set for general-purpose parallel thread execution.
At install time, PTX is translated into the instruction set of the target hardware; through PTX, the GPU can therefore be used as a programmable parallel computer. Like an ordinary compilation process, GPU programming also proceeds from a high-level language to a low-level language, and from assembly language to binary: from the programmer's perspective the path runs from CUDA C to PTX to SASS to the cubin binary. To accurately measure the instruction delay of a program executing on the GPU, we must first eliminate all sources of interference to ensure the correctness of our results. Therefore, when measuring clock cycles, we feed the binary file directly to the hardware for execution, so that translation time does not perturb the final result. First, we write a program to count the number of occurrences of each PTX instruction in a PTX program: we set up a count array in which each counter represents one PTX instruction, and finally output the counts of these instructions to a TXT file. Then, by analyzing the hardware structure of the GPU, we calculate the clock delay of each PTX instruction. Because instructions are executed in the GPU in a pipelined fashion, we can use the GPU's instruction workflow to derive a formula for the clock delay of a PTX instruction; from this formula the delay of each PTX instruction can be derived, and from those per-instruction delays the total clock delay of a PTX program can be calculated.
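The counting and delay-estimation steps described above can be sketched in Python. This is a minimal illustration only: the PTX fragment is hand-written, and the latency numbers in the table are placeholders, not the measured Fermi delays the thesis derives from its pipeline formula.

```python
from collections import Counter

# Illustrative per-opcode latency table (placeholder cycle counts, NOT
# measured Fermi data; the thesis derives real values from the pipeline).
LATENCY = {
    "ld.global.u32": 400,  # placeholder: global loads cost hundreds of cycles
    "st.global.u32": 400,  # placeholder
    "add.s32": 18,         # placeholder: arithmetic pipeline latency
}

def count_ptx_opcodes(ptx_text):
    """Count how many times each PTX opcode occurs in a program."""
    counts = Counter()
    for line in ptx_text.splitlines():
        line = line.strip()
        # Skip blanks, directives (.reg, .visible, ...), braces, comments, labels.
        if not line or line.startswith((".", "//", "{", "}")) or line.endswith(":"):
            continue
        if line.startswith("@") and " " in line:   # drop predicate guard, e.g. "@%p1 bra L1;"
            line = line.split(None, 1)[1]
        counts[line.split()[0].rstrip(";")] += 1
    return counts

def estimate_clock_delay(counts, latency):
    """Weighted sum: total cycles = sum over opcodes of count * per-opcode delay."""
    return sum(n * latency.get(op, 0) for op, n in counts.items())

# Hand-written example PTX fragment.
ptx = """
.visible .entry kernel()
{
    ld.global.u32 %r1, [%rd1];
    add.s32 %r2, %r1, %r1;
    st.global.u32 [%rd1], %r2;
}
"""
counts = count_ptx_opcodes(ptx)
total = estimate_clock_delay(counts, LATENCY)

# Write the instruction counts to a TXT file, as the method describes.
with open("ptx_counts.txt", "w") as f:
    for op, n in sorted(counts.items()):
        f.write(f"{op} {n}\n")
```

The weighted sum stands in for the thesis's pipeline formula; in the actual method, each opcode's delay is derived from the GPU's pipelined instruction workflow rather than looked up from a fixed table.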
Keywords/Search Tags:CUDA, Fermi, PTX, GPU, pipeline