
Research On GPU-based Parallel Computation Performance Optimization

Posted on: 2015-01-31    Degree: Master    Type: Thesis
Country: China    Candidate: T J Shao    Full Text: PDF
GTID: 2268330428497995    Subject: Computer application technology
Abstract/Summary:
Within the constraints of hardware resources and the high-performance computing programming model, this thesis studies the performance of GPU-based parallel computation. The main work is as follows:

First, the thesis introduces the development status, prospects, and growing range of applications of GPU high-performance computing. Taking CUDA, the most widely used GPU programming framework, as an example, it presents the hardware and software architecture of GPU parallel computing and gives a detailed introduction to the CUDA programming model, covering the thread hierarchy, memory access strategies, thread block organization and allocation, the execution units, and the CUDA compilation process.

Next, the thesis investigates optimization methods for GPU parallel computing performance. Because scheduling is carried out at warp granularity, branch divergence among threads within a warp is reduced. To raise resource utilization, the degree of parallelism is increased; for large data sets, batch processing is adopted; data that is accessed repeatedly is partitioned into tiles and loaded into shared memory; data prefetching is performed so that independent instructions can hide memory access latency; the locality of memory accesses is exploited by coalescing reads and writes and accessing contiguous data wherever possible; bank conflicts are avoided to eliminate unnecessary delays; and instruction overhead is reduced by removing redundant instructions, unrolling loops, and preferring higher-throughput instructions. A minimal CUDA sketch of the shared-memory tiling, coalescing, and loop-unrolling techniques is given after this summary.

Finally, experiments are carried out in a specific environment to verify that the optimizations are sound. The results of a complete program are reported and the measured data are analyzed. The test data from the final experiments show that the CUDA parallel computing performance optimizations discussed in this thesis are effective.
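To make the optimization techniques summarized above concrete, the following is a minimal CUDA sketch of a tiled matrix multiplication, a standard illustration of shared-memory tiling, coalesced global memory access, and loop unrolling. The kernel name, tile size, and row-major square-matrix layout are assumptions for illustration only; this is not code from the thesis.

// Illustrative sketch only. Assumes n is a multiple of TILE and the kernel
// is launched with TILE x TILE thread blocks, one thread per output element.
#define TILE 16

__global__ void tiledMatMul(const float *A, const float *B, float *C, int n)
{
    // Tiles of A and B staged in shared memory so each global element is
    // read once per tile instead of once per product term.
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < n / TILE; ++t) {
        // Adjacent threads (consecutive threadIdx.x) read adjacent
        // addresses, so these global loads are coalesced.
        As[threadIdx.y][threadIdx.x] = A[row * n + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * n + col];
        __syncthreads();

        // Unrolled inner loop over the tile reduces loop overhead.
        #pragma unroll
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }

    C[row * n + col] = acc;
}

Because every thread in a warp follows the same control path here, the kernel also avoids the warp-level branch divergence that the thesis identifies as a performance hazard.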
Keywords/Search Tags:GPU, Parallel computing, Performance optimization, CUDA