
Performance Analysis And Modeling Of Parallel Thread Execution Program

Posted on: 2014-02-18
Degree: Master
Type: Thesis
Country: China
Candidate: H S Suo
Full Text: PDF
GTID: 2248330395496958
Subject: Software engineering
Abstract/Summary:
With the rapid progress of modern science and technology and of the electronics industry, GPU performance has grown enormously. The GPU has evolved from a pure graphics processor into a massively parallel computing processor, and it now takes over much of the work that formerly belonged to the CPU. Because of the GPU's distinctive hardware architecture, it handles large-scale parallel computations far faster than the CPU. The GPU has clear advantages in general-purpose computing: its parallelism is strong, it can sustain high-density arithmetic, and it can effectively reduce the frequent communication between the GPU and the CPU while a program runs. As a result, the computing power of the GPU has attracted more and more attention. The CUDA platform provides a programming model that does not require learning a new language: it merely extends an existing programming language, which greatly lowers the barrier to entry for CUDA. With the release of Fermi, NVIDIA's new generation of graphics architecture, CUDA C programming has become familiar to more and more developers. CUDA C is an extension of the C programming language, and PTX is an instruction set the GPU uses for scalable parallel computing. Driven by the market demand for real-time, high-definition 3D graphics, the GPU has evolved into a highly parallel, multithreaded processor with enormous computational power and high memory bandwidth. The GPU is particularly well suited to problems that can be expressed as data-parallel computations, in which the same program executes in parallel, with high arithmetic intensity, over many data elements. Because every data element executes the same program, there is a lower requirement for complex flow control. PTX defines a virtual machine and an instruction set for general-purpose parallel thread execution.
At install time, PTX is translated into the instruction set of the target hardware; through PTX, the GPU can therefore be used as a programmable parallel computer. Like an ordinary compilation process, GPU programming also proceeds from a high-level language to a low-level language, and from assembly language to binary: from the programmer's perspective the path runs from CUDA C to PTX to SASS to the cubin binary. To accurately measure the instruction delay of a program executing on the GPU, we must first eliminate all sources of interference to ensure the correctness of our results. Therefore, when measuring clock cycles, we feed the binary file directly to the hardware for execution, so that translation time does not perturb the final result. First, we write a program to count the number of occurrences of each PTX instruction in a PTX program: we set up a count array in which each counter represents one PTX instruction, and finally output the counts of these instructions to a TXT file. Then, by analyzing the hardware structure of the GPU, we calculate the clock delay of each PTX instruction. Because instructions are executed in the GPU in a pipelined fashion, we can use the GPU's instruction workflow to derive a formula for the clock delay of a PTX instruction; from this formula the delay of each PTX instruction can be derived, and from those per-instruction delays the total clock delay of a PTX program can be calculated.
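The counting and delay-estimation steps described above can be sketched in Python. This is a minimal illustration only: the PTX fragment is hand-written, and the latency numbers in the table are placeholders, not the measured Fermi delays the thesis derives from its pipeline formula.

```python
from collections import Counter

# Illustrative per-opcode latency table (placeholder cycle counts, NOT
# measured Fermi data; the thesis derives real values from the pipeline).
LATENCY = {
    "ld.global.u32": 400,  # placeholder: global loads cost hundreds of cycles
    "st.global.u32": 400,  # placeholder
    "add.s32": 18,         # placeholder: arithmetic pipeline latency
}

def count_ptx_opcodes(ptx_text):
    """Count how many times each PTX opcode occurs in a program."""
    counts = Counter()
    for line in ptx_text.splitlines():
        line = line.strip()
        # Skip blanks, directives (.reg, .visible, ...), braces, comments, labels.
        if not line or line.startswith((".", "//", "{", "}")) or line.endswith(":"):
            continue
        if line.startswith("@") and " " in line:   # drop predicate guard, e.g. "@%p1 bra L1;"
            line = line.split(None, 1)[1]
        counts[line.split()[0].rstrip(";")] += 1
    return counts

def estimate_clock_delay(counts, latency):
    """Weighted sum: total cycles = sum over opcodes of count * per-opcode delay."""
    return sum(n * latency.get(op, 0) for op, n in counts.items())

# Hand-written example PTX fragment.
ptx = """
.visible .entry kernel()
{
    ld.global.u32 %r1, [%rd1];
    add.s32 %r2, %r1, %r1;
    st.global.u32 [%rd1], %r2;
}
"""
counts = count_ptx_opcodes(ptx)
total = estimate_clock_delay(counts, LATENCY)

# Write the instruction counts to a TXT file, as the method describes.
with open("ptx_counts.txt", "w") as f:
    for op, n in sorted(counts.items()):
        f.write(f"{op} {n}\n")
```

The weighted sum stands in for the thesis's pipeline formula; in the actual method, each opcode's delay is derived from the GPU's pipelined instruction workflow rather than looked up from a fixed table.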
Keywords/Search Tags:CUDA, Fermi, PTX, GPU, pipeline