
Research On General Computation Model For CPU_GPU Heterogeneous System

Posted on: 2016-02-15
Degree: Master
Type: Thesis
Country: China
Candidate: X Han
Full Text: PDF
GTID: 2428330542454607
Subject: Computer software and theory
Abstract/Summary:
In recent years, the use of the Graphics Processing Unit (GPU) for general-purpose computing has become a research hotspot. The reason is that its computing power can be ten or even several tens of times that of the CPU, and, more importantly, its programmability continues to improve. Currently, a CPU_GPU heterogeneous system is generally adopted when the GPU is used to accelerate general computation. Compared with homogeneous systems, such a heterogeneous system can achieve a good speedup, but its program development, performance optimization, and related issues are also more complex. Although the programming model for CPU_GPU heterogeneous systems reduces the difficulty of application development, several factors still limit performance during computation, so there is clearly room for improvement. Further research on and optimization of the CUDA programming model is therefore of great significance.

In this thesis, we mainly complete the following work. We research and analyze the bottlenecks of the CUDA programming model, including memory-access optimization, task partitioning, and communication delay, and propose an effective optimization strategy for each factor. For memory optimization, we propose a coordinated static and dynamic cache-bypass optimization framework: a profiling-based static analysis classifies global loads into three categories according to their locality, and a run-time management technique modulates the ratio of thread blocks that use or bypass the cache. For task partitioning, we propose an SM-centric transformation, which for the first time enables precise spatial scheduling of GPU tasks; it offers the missing piece of the puzzle for circumventing GPU hardware restrictions to implement flexible control of task scheduling. For communication delay, we propose a long-short kernel mechanism that segments a long kernel into several short kernels in order to hide data-communication latency.

To validate the feasibility of the proposed optimization strategies, we study the local sequence alignment algorithm Smith-Waterman, a classic bioinformatics problem. We design and implement a row-based Smith-Waterman algorithm on the CUDA programming platform using the proposed optimization strategies. Experimental results show that the optimized parallel program achieves a higher speedup.
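The thesis implements Smith-Waterman in CUDA; as a reference for the dynamic-programming recurrence being parallelized, the following is a minimal sequential Python sketch. The scoring parameters (match = 2, mismatch = -1, gap = -1) are illustrative defaults, not the values used in the thesis. Note that each cell H[i][j] depends only on its upper, left, and upper-left neighbors, which is what makes row-by-row (and anti-diagonal) parallel evaluation on the GPU possible.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the optimal local-alignment score of sequences a and b
    by filling the Smith-Waterman DP matrix row by row."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # row/column 0 stay zero
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                      # local alignment floor
                          H[i - 1][j - 1] + s,    # match / mismatch
                          H[i - 1][j] + gap,      # gap in b (deletion)
                          H[i][j - 1] + gap)      # gap in a (insertion)
            best = max(best, H[i][j])
    return best
```

In the CUDA version described above, the inner loop over j is the natural candidate for parallelization, since all cells that share an anti-diagonal are mutually independent.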
Keywords/Search Tags:Heterogeneous system, GPU, CPU_GPU, programming model, CUDA, performance optimization, Smith-Waterman