
Research On Performance Optimization Of Heterogeneous Platform Based On CPU-GPU And Multicore Parallel Programming Model

Posted on: 2012-06-19
Degree: Master
Type: Thesis
Country: China
Candidate: B Chen
GTID: 2178330338492039
Subject: Computer application technology
Abstract:
As the computing power and programmability of the graphics processing unit (GPU) keep increasing, general-purpose computing on GPUs (GPGPU) is gradually becoming a research hotspot. GPGPU computing usually uses a heterogeneous CPU-GPU mode. Although a heterogeneous CPU-GPU system can achieve good performance gains, developing and optimizing programs for it is more complex than for a homogeneous system. Computing on a heterogeneous CPU-GPU system runs into many performance bottlenecks, such as load balancing, synchronization and latency, data locality, and task partitioning, and addressing these factors is essential to improving program performance. On the other hand, although the CUDA programming model has greatly reduced the difficulty of programming heterogeneous CPU-GPU systems, it still places high demands on most developers who are used to writing serial programs. Moreover, when the underlying hardware changes, developers have to learn a new programming model and rewrite their programs for the new hardware platform, which increases the burden on programmers. It is therefore important to design a simple, platform-independent multicore parallel programming model.

We mainly carried out the following research:

(1) We analyzed the key factors that affect the performance of CUDA programs, comprehensively summarized existing optimization methods, and proposed new optimization methods and an optimization strategy, such as using atomic functions to synchronize different thread blocks. For each optimization method, we verified its effectiveness experimentally and analyzed it theoretically.
Our method of synchronizing different thread blocks with atomic functions is 4 to 5 times faster than the existing method of restarting the kernel function.

(2) To further validate the effectiveness of the various optimization methods, and to describe the development process (algorithm design, programming, and performance optimization) on a heterogeneous CPU-GPU platform, we used a CPU-GPU heterogeneous computing platform to solve DNA or protein sequence alignment, a bioinformatics problem. Specifically, we designed and implemented a new column-based parallel Smith-Waterman algorithm on the CUDA platform. The optimized parallel program is 37 times faster than the serial program.

(3) After an in-depth analysis of the OpenMM parallel programming framework, we proposed a library-based, hardware-independent multicore parallel programming model. To verify the feasibility and simplicity of the model, we implemented and tested a prototype system for scientific computing. Through a well-designed hierarchical API architecture, it hides the details of the underlying hardware from upper-level users. To parallelize a serial program, programmers only need to select, at compile time, the dynamic-link library that matches the underlying hardware.
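The inter-block synchronization in item (1) can be illustrated with a CPU analogue. The abstract does not give the thesis's implementation, so this sketch assumes the common counter-based scheme: every thread block atomically increments one global counter and then spins until the counter reaches the number of blocks times the barrier index. Python threads stand in for thread blocks and a lock stands in for CUDA's `atomicAdd`; `CounterBarrier` and `run_rounds` are hypothetical names. On a real GPU such a barrier only works if all blocks are resident at the same time, otherwise spinning blocks can wait forever for blocks that never get scheduled.

```python
import threading
import time


class CounterBarrier:
    """CPU stand-in for a GPU inter-block barrier built on an atomic counter.

    Assumption of this sketch: a Lock makes the increment atomic, playing the
    role of atomicAdd. The counter is never reset; the k-th barrier
    (k = 0, 1, ...) releases once the counter reaches num_blocks * (k + 1).
    """

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.count = 0
        self.lock = threading.Lock()

    def wait(self, k):
        with self.lock:              # stands in for atomicAdd(&count, 1)
            self.count += 1
        goal = self.num_blocks * (k + 1)
        while True:
            with self.lock:
                if self.count >= goal:
                    return
            time.sleep(0)            # yield; a GPU block would spin on a volatile read


def run_rounds(num_blocks, rounds):
    """Each hypothetical 'block' b combines its value with its left neighbour's.

    Round r+1 reads values written in round r, so all blocks must finish a
    round before any block starts the next one.
    """
    cur = [1] * num_blocks
    nxt = [0] * num_blocks
    barrier = CounterBarrier(num_blocks)

    def block(b):
        left = (b - 1) % num_blocks
        for r in range(rounds):
            nxt[b] = cur[b] + cur[left]
            barrier.wait(2 * r)          # every block has written nxt
            cur[b] = nxt[b]
            barrier.wait(2 * r + 1)      # every block has updated cur

    threads = [threading.Thread(target=block, args=(b,)) for b in range(num_blocks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cur
```

For example, `run_rounds(4, 5)` returns `[32, 32, 32, 32]`, because every value doubles in each of the five rounds. The baseline the abstract compares against, restarting the kernel, gets the same ordering because a kernel boundary acts as a global synchronization point, but it pays launch overhead every round.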
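Item (2) does not detail the column-based parallel scheme, so the sketch below shows only a serial Smith-Waterman baseline of the kind a parallel version is measured against. It assumes a linear gap penalty and hypothetical scores (match +2, mismatch -1, gap 1); the function name is also hypothetical. Each cell follows the standard recurrence H(i, j) = max(0, H(i-1, j-1) + s(a_i, b_j), H(i-1, j) - g, H(i, j-1) - g) with H(i, 0) = H(0, j) = 0, and the best local-alignment score is the largest H in the matrix.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=1):
    """Best local-alignment score of sequences a and b (linear gap penalty).

    The scoring matrix H is filled one column at a time (one column per
    character of b), keeping only the previous column, so memory is O(len(a)).
    """
    prev = [0] * (len(a) + 1)          # column j-1 of H; H[0][*] = H[*][0] = 0
    best = 0
    for j in range(1, len(b) + 1):
        cur = [0] * (len(a) + 1)
        for i in range(1, len(a) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[i] = max(0,
                         prev[i - 1] + s,   # diagonal: align a[i-1] with b[j-1]
                         cur[i - 1] - gap,  # vertical move: gap in b
                         prev[i] - gap)     # horizontal move: gap in a
            best = max(best, cur[i])
        prev = cur
    return best
```

With these parameters, `smith_waterman_score("ACGT", "ACT")` returns 5: ACGT aligned with AC-T scores two matches, one gap, and one more match (2 + 2 - 1 + 2). Because each cell depends only on its left, upper, and upper-left neighbours, all cells on one anti-diagonal are independent, which is a common starting point for GPU parallelization; the abstract does not say how the thesis's column-based scheme handles these dependencies.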
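Item (3) describes user code written once against a hierarchical API, with the hardware-specific implementation chosen by linking a different dynamic-link library. The prototype's actual API is not given in the abstract, so the sketch below only illustrates that layering. Every name in it (`SerialBackend`, `ThreadedBackend`, `load_backend`, `vector_add`, `user_program`) is hypothetical, and because Python has no link step, a lookup table consulted once by name stands in for selecting the library at compile time.

```python
from concurrent.futures import ThreadPoolExecutor


class SerialBackend:
    """Hypothetical reference backend: plain single-threaded loops."""

    def vector_add(self, x, y):
        return [a + b for a, b in zip(x, y)]


class ThreadedBackend:
    """Hypothetical multicore backend: splits the vectors into chunks for a thread pool."""

    def __init__(self, workers=4):
        self.workers = workers

    def vector_add(self, x, y):
        n = len(x)
        step = max(1, -(-n // self.workers))           # ceil(n / workers)
        chunks = [(i, min(i + step, n)) for i in range(0, n, step)]
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            parts = list(pool.map(lambda c: [x[k] + y[k] for k in range(*c)], chunks))
        return [v for part in parts for v in part]     # chunks come back in order


# Stand-in for the build step: choose one implementation of the same API,
# as one would choose which dynamic-link library to link against.
BACKENDS = {"serial": SerialBackend, "multicore": ThreadedBackend}


def load_backend(name):
    return BACKENDS[name]()


def user_program(backend):
    """User code is written once against the API and never touches the hardware layer."""
    x = list(range(10))
    y = [2 * v for v in x]
    return backend.vector_add(x, y)
```

Swapping `load_backend("serial")` for `load_backend("multicore")` changes how the work is executed but not the user code or its result, which is the property the abstract attributes to the library-based model.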
Keywords: GPU, GPGPU, CUDA, Heterogeneous Computing Platform, Performance Optimization, Parallel Programming Model