
Research On GPU-based Parallel Computation Performance Optimization

Posted on: 2015-01-31    Degree: Master    Type: Thesis
Country: China    Candidate: T J Shao    Full Text: PDF
GTID: 2268330428497995    Subject: Computer application technology
Abstract/Summary:
Within the constraints of hardware resources and the high-performance computing programming model, this thesis studies the performance of GPU-based parallel computation. The main work is as follows:

First, the thesis introduces the development status, prospects, and growing range of applications of GPU high-performance computing. Taking CUDA, the most widely used GPU programming framework, as an example, it presents the hardware and software architecture of GPU parallel computing and gives a detailed introduction to the CUDA programming model, covering the thread hierarchy, memory access strategies, thread block organization and allocation, the execution units, and the CUDA compilation process.

Next, the thesis investigates optimization methods for GPU parallel computing performance. Because scheduling is carried out at warp granularity, branch divergence among threads within a warp is reduced. To raise resource utilization, the degree of parallelism is increased; for large data sets, batch processing is adopted; data that is accessed repeatedly is partitioned into tiles and loaded into shared memory; data prefetching is performed so that independent instructions can hide memory access latency; the locality of memory accesses is exploited by coalescing reads and writes and accessing contiguous data wherever possible; bank conflicts are avoided to eliminate unnecessary delays; and instruction overhead is reduced by removing redundant instructions, unrolling loops, and preferring higher-throughput instructions. A minimal CUDA sketch of the shared-memory tiling, coalescing, and loop-unrolling techniques is given after this summary.

Finally, experiments are carried out in a specific environment to verify that the optimizations are sound. The results of a complete program are reported and the measured data are analyzed. The test data from the final experiments show that the CUDA parallel computing performance optimizations discussed in this thesis are effective.
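To make the optimization techniques summarized above concrete, the following is a minimal CUDA sketch of a tiled matrix multiplication, a standard illustration of shared-memory tiling, coalesced global memory access, and loop unrolling. The kernel name, tile size, and row-major square-matrix layout are assumptions for illustration only; this is not code from the thesis.

// Illustrative sketch only. Assumes n is a multiple of TILE and the kernel
// is launched with TILE x TILE thread blocks, one thread per output element.
#define TILE 16

__global__ void tiledMatMul(const float *A, const float *B, float *C, int n)
{
    // Tiles of A and B staged in shared memory so each global element is
    // read once per tile instead of once per product term.
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < n / TILE; ++t) {
        // Adjacent threads (consecutive threadIdx.x) read adjacent
        // addresses, so these global loads are coalesced.
        As[threadIdx.y][threadIdx.x] = A[row * n + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * n + col];
        __syncthreads();

        // Unrolled inner loop over the tile reduces loop overhead.
        #pragma unroll
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }

    C[row * n + col] = acc;
}

Because every thread in a warp follows the same control path here, the kernel also avoids the warp-level branch divergence that the thesis identifies as a performance hazard.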
Keywords/Search Tags:GPU, Parallel computing, Performance optimization, CUDA