
Research Of Optimization Method About Branch Divergence And Irregular Memory Access On GPU

Posted on: 2016-06-23
Degree: Master
Type: Thesis
Country: China
Candidate: Q Yu
Full Text: PDF
GTID: 2348330536967304
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, GPUs have been widely adopted for general-purpose computing and have become an important part of high-performance computing systems. However, because GPUs use the SIMT (single instruction, multiple threads) execution model, their efficiency suffers when a GPU application exhibits branch divergence. To save memory bandwidth and reduce memory access latency, GPUs also employ memory coalescing. Although this mechanism improves memory access efficiency, irregular memory accesses still degrade program performance considerably. To address these two problems, this thesis analyzes their causes, proposes corresponding optimization methods, and evaluates the optimized programs on the GPGPU-Sim simulator. It also discusses the impact of the proposed methods on the performance, power consumption, and energy consumption of the optimized programs. The main work can be summarized as follows:

1) We analyzed the cause of branch divergence and the influence of the existing thread swapping algorithm on power consumption, then further optimized Reduction and Bitonic Sort. For Reduction, the existing thread swapping algorithm increases power consumption; to solve this, we modified the algorithm to reduce bank conflicts in shared memory, which lowers power consumption. Experiments show that this method reduces power consumption by 5% with a performance loss of 5% on average. For Bitonic Sort, we eliminated some unnecessary swaps based on an analysis of the original thread swapping algorithm. Experiments show that this method improves performance by 6% with no increase in power consumption.

2) We discussed the impact of the thread swapping range on program performance. The thread swapping range is an important parameter when the thread swapping algorithm is used to optimize programs. We found that the larger the thread swapping range, the more thoroughly threads are swapped and the more likely branch divergence is to be reduced, but also the more extra overhead is introduced. Consequently, the appropriate thread swapping range depends on the specific program.

3) We analyzed the cause of irregular memory access and proposed an optimization method based on matrix operations. Several programs from the PolyBench/GPU benchmark suite were then optimized and tested. Experimental results show that the proposed method is effective in reducing irregular memory accesses and produces significant performance improvements for the tested programs. Under certain conditions, the highest speedup reaches 78.9x, while the average kernel speedup is 35.9x; power consumption increases by 106.2% on average, while energy consumption decreases by 86.2% on average.

4) We summarized the reasons for the change in power consumption: total power is the sum of the power drawn by every GPU component, and the change before and after optimization is mainly accounted for by changes in the performance counters associated with each GPU component and in program execution time. In addition, we studied the impact of shared memory size on the optimization results for the PolyBench/GPU programs. Results show that a larger shared memory yields a higher speedup with little change in power consumption, so the optimization is more effective.
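The two effects discussed above can be illustrated with a small host-side model. This is only a sketch under stated assumptions, not the thesis's algorithm or its GPGPU-Sim setup: the warp size of 32, the 128-byte memory segment, and the helper names `divergent_warps`, `swap_within_range`, and `memory_transactions` are all chosen here for illustration, and thread swapping is approximated by simply regrouping threads by branch outcome inside each swapping range.

```python
WARP_SIZE = 32  # assumed number of threads per warp


def divergent_warps(branch_taken):
    """Count warps in which some threads take the branch and others do not."""
    count = 0
    for start in range(0, len(branch_taken), WARP_SIZE):
        warp = branch_taken[start:start + WARP_SIZE]
        if any(warp) and not all(warp):
            count += 1
    return count


def swap_within_range(branch_taken, swap_range):
    """Hypothetical stand-in for thread swapping: inside each window of
    `swap_range` threads, place the threads that take the branch first."""
    regrouped = []
    for start in range(0, len(branch_taken), swap_range):
        window = branch_taken[start:start + swap_range]
        regrouped.extend(sorted(window, reverse=True))
    return regrouped


def memory_transactions(byte_addresses, segment_bytes=128):
    """Count the distinct aligned segments touched by one warp's accesses.
    The 128-byte segment size is an assumption; real hardware may differ."""
    return len({addr // segment_bytes for addr in byte_addresses})


# 128 threads where every even-numbered thread takes the branch.
pattern = [tid % 2 == 0 for tid in range(128)]
print(divergent_warps(pattern))                         # 4
print(divergent_warps(swap_within_range(pattern, 32)))  # 4
print(divergent_warps(swap_within_range(pattern, 64)))  # 0

# One warp reading 4-byte floats from a row-major matrix with 1024 columns.
n_cols, elem = 1024, 4
row_access = [(0 * n_cols + t) * elem for t in range(WARP_SIZE)]
col_access = [(t * n_cols + 0) * elem for t in range(WARP_SIZE)]
print(memory_transactions(row_access))  # 1
print(memory_transactions(col_access))  # 32
```

In this toy model, a swapping range equal to the warp size cannot remove any divergence, while a range of two warps removes all of it; the model ignores the swapping overhead that, according to the abstract, grows with the range. Likewise, the row-wise access falls into a single segment, whereas the column-wise access touches one segment per thread, which is the kind of irregular pattern the third contribution targets.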
Keywords/Search Tags: Branch Divergence, Irregular Memory Access, Thread Swapping, Matrix Operation, Performance, Power Consumption