
Research On Control Divergence Optimization Of GPGPU

Posted on: 2019-11-23    Degree: Master    Type: Thesis
Country: China    Candidate: Y H Yang    Full Text: PDF
GTID: 2428330611993138    Subject: Computer Science and Technology
Abstract/Summary:
At present, GPUs are widely used for general-purpose computing because of their powerful parallel processing capability, and they rely on the SIMT execution model to manage thousands of threads efficiently. However, this model can cause control divergence during program execution, which limits performance gains. To address this problem, warp regrouping methods have been proposed that combine threads executing the same branch path, significantly improving thread-level parallelism. It turns out, however, that such methods generally perform some unnecessary regrouping, which introduces additional overheads and limits further performance improvement.

In view of these problems, this thesis analyzes the sources of the overheads and proposes a lightweight method, partial warp regrouping. It limits the scope of regrouping by setting a threshold: while preserving most of the benefit of regrouping, it avoids regrouping warps that already have a large number of active threads, thereby avoiding excessive overheads. The main contributions of this thesis are as follows.

First, we analyze the problems of existing warp regrouping mechanisms. To reduce the complexity of the hardware design, GPUs usually organize registers into banks. Under the static thread-to-lane mapping this causes no problem, but after regrouping, multiple threads may access the same bank, producing register bank conflicts that stall the pipeline. At the same time, regrouping may split warps, which increases the number of memory accesses and reduces memory efficiency. We find that these costs mostly arise from regrouping warps with a large number of active threads, where they offset the benefits and may even degrade program performance. To this end, we propose a partial regrouping method that restricts the scope of regrouping with a threshold, avoiding these additional overheads.

Second, further analysis of the regrouping scope shows that the optimal threshold differs from program to program. We therefore propose a general partial warp regrouping framework that adjusts the threshold dynamically: it periodically samples and analyzes performance parameters to guide the adjustment. When processing the regrouped (aligned) warps, a remapping scheme distributes the threads evenly across the SIMD lanes, which reduces the overheads introduced by regrouping and accelerates program execution.

Finally, we implemented and evaluated partial warp regrouping on GPGPU-Sim. The experimental results show that partial warp regrouping significantly reduces unnecessary overheads: compared with PDOM, performance improves by 12% on average and by up to 27%. Although the existing lane-aware warp regrouping also achieves good performance improvements, its hardware design overhead cannot be ignored. In the remapping experiments, specific programs obtain an average speedup of 9.1%, with a maximum improvement of nearly 20%.

In general, partial warp regrouping achieves good performance acceleration. Its main advantage is its simple hardware design, and it can easily be integrated into existing warp regrouping methods. Switching between PDOM and complete warp regrouping is possible simply by adjusting the threshold explicitly, which greatly improves its versatility.
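
To make the core idea concrete, the following C++ sketch illustrates threshold-gated partial warp regrouping. It is an illustrative model, not the hardware design described in the thesis: the Warp structure and the greedy merge policy are assumptions. Warps whose active-thread count is at or above the threshold keep their original (PDOM-style) schedule; only sparsely populated warps on the same branch path are merged, and only when their active lanes do not overlap, so the merge itself introduces no register bank conflicts.

    #include <bit>
    #include <cstdint>
    #include <vector>

    struct Warp {
        uint32_t active_mask;  // one bit per thread on the taken branch path
        int      pc;           // program counter of that path
    };

    int active_count(const Warp& w) { return std::popcount(w.active_mask); }

    // Regroup only warps whose active-thread count is below `threshold`;
    // densely populated warps are left untouched, so their register accesses
    // stay bank-aligned and their memory accesses are not split.
    std::vector<Warp> partial_regroup(const std::vector<Warp>& warps, int threshold) {
        std::vector<Warp> sparse, result;
        for (const Warp& w : warps)
            (active_count(w) < threshold ? sparse : result).push_back(w);

        // Greedily merge sparse warps that follow the same path and occupy
        // disjoint SIMD lanes (a lane-aware constraint, simplified here).
        for (std::size_t i = 0; i < sparse.size(); ++i) {
            Warp merged = sparse[i];
            for (std::size_t j = i + 1; j < sparse.size(); ) {
                if (sparse[j].pc == merged.pc &&
                    (sparse[j].active_mask & merged.active_mask) == 0) {
                    merged.active_mask |= sparse[j].active_mask;
                    sparse.erase(sparse.begin() + j);
                } else {
                    ++j;
                }
            }
            result.push_back(merged);
        }
        return result;
    }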
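
The abstract states that the framework samples performance parameters at run time to steer the threshold, but the exact parameters and policy are not given, so the controller below is only an assumed hill-climbing form. It nudges the threshold after each sampling window according to whether the measured IPC improved; clamping the threshold at 0 (pure PDOM) or 32 (complete warp regrouping) recovers the two extremes mentioned in the abstract.

    // Assumed sampling-driven threshold controller (hill climbing on IPC).
    struct ThresholdController {
        int    threshold = 16;   // initial threshold, assumed
        int    step      = 4;    // adjustment granularity, assumed
        double last_ipc  = 0.0;

        // Called once per sampling window with the IPC measured in that window.
        void on_sample(double ipc) {
            if (ipc < last_ipc) step = -step;                      // last change hurt: reverse
            threshold += step;
            if (threshold < 0)  { threshold = 0;  step = -step; }  // PDOM only
            if (threshold > 32) { threshold = 32; step = -step; }  // complete regrouping
            last_ipc = ipc;
        }
    };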
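
The remapping step is likewise only outlined in the abstract ("threads are evenly distributed to each SIMD lane"), so the sketch below assumes the simplest even-spread rule and assumes the number of register banks matches a SIMD width of 32. Spacing the k active threads of a regrouped warp uniformly over the lanes keeps their operand fetches in different banks.

    #include <vector>

    constexpr int NUM_LANES = 32;  // assumed SIMD width / number of register banks

    // Assign lanes to the k active threads of a regrouped warp (0 < k <= NUM_LANES),
    // spacing them as evenly as possible; e.g. k = 8 maps to lanes 0, 4, 8, ..., 28.
    std::vector<int> remap_lanes(int k) {
        std::vector<int> lanes(k);
        for (int j = 0; j < k; ++j)
            lanes[j] = j * NUM_LANES / k;
        return lanes;
    }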
Keywords/Search Tags:SIMT, control divergence, warp regrouping, threshold, extra overheads, performance