
Research On General Computation Model For CPU_GPU Heterogeneous System

Posted on: 2016-02-15
Degree: Master
Type: Thesis
Country: China
Candidate: X Han
Full Text: PDF
GTID: 2428330542454607
Subject: Computer software and theory
Abstract/Summary:
In recent years, the use of the Graphics Processing Unit (GPU) for general-purpose computing has become a research hotspot. The reason is that its computing power can be ten or even several tens of times that of the CPU, and, more importantly, its programmability continues to improve. Currently, a CPU_GPU heterogeneous system is generally adopted when the GPU is used to accelerate general computation. Compared with homogeneous systems, such a heterogeneous system can achieve a good speedup, but its program development, performance optimization, and related issues are also more complex. Although the programming model for CPU_GPU heterogeneous systems reduces the difficulty of application development, several factors still limit performance during computation, so there is clearly room for improvement. Further research on and optimization of the CUDA programming model is therefore of great significance.

In this thesis, we mainly complete the following work. We research and analyze the bottlenecks of the CUDA programming model, including memory-access optimization, task partitioning, and communication delay, and propose an effective optimization strategy for each factor. For memory optimization, we propose a coordinated static and dynamic cache-bypass optimization framework: a profiling-based static analysis classifies global loads into three categories according to their locality, and a run-time management technique modulates the ratio of thread blocks that use or bypass the cache. For task partitioning, we propose an SM-centric transformation, which for the first time enables precise spatial scheduling of GPU tasks; it offers the missing piece of the puzzle for circumventing GPU hardware restrictions to implement flexible control of task scheduling. For communication delay, we propose a long-short kernel mechanism that segments a long kernel into several short kernels in order to hide data-communication latency.

To validate the feasibility of the proposed optimization strategies, we study the local sequence alignment algorithm Smith-Waterman, a classic bioinformatics problem. We design and implement a row-based Smith-Waterman algorithm on the CUDA programming platform using the proposed optimization strategies. Experimental results show that the optimized parallel program achieves a higher speedup.
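The thesis implements Smith-Waterman in CUDA; as a reference for the dynamic-programming recurrence being parallelized, the following is a minimal sequential Python sketch. The scoring parameters (match = 2, mismatch = -1, gap = -1) are illustrative defaults, not the values used in the thesis. Note that each cell H[i][j] depends only on its upper, left, and upper-left neighbors, which is what makes row-by-row (and anti-diagonal) parallel evaluation on the GPU possible.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the optimal local-alignment score of sequences a and b
    by filling the Smith-Waterman DP matrix row by row."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # row/column 0 stay zero
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                      # local alignment floor
                          H[i - 1][j - 1] + s,    # match / mismatch
                          H[i - 1][j] + gap,      # gap in b (deletion)
                          H[i][j - 1] + gap)      # gap in a (insertion)
            best = max(best, H[i][j])
    return best
```

In the CUDA version described above, the inner loop over j is the natural candidate for parallelization, since all cells that share an anti-diagonal are mutually independent.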
Keywords/Search Tags:Heterogeneous system, GPU, CPU_GPU, programming model, CUDA, performance optimization, Smith-Waterman