Research And Evaluation Of A Micro Architecture Solution To GPGPU Memory Irregularity Problem Based On Thread Regrouping

Posted on: 2017-09-13
Degree: Master
Type: Thesis
Country: China
Candidate: W Meng
Full Text: PDF
GTID: 2348330491962947
Subject: Microelectronics and Solid State Electronics
Abstract/Summary:
Over the last decade, improvements in programmability and computing power have extended the application domain of GPUs to general-purpose computing. Many general-purpose applications exhibit memory irregularity, which prevents the GPU cache from capturing intra-warp and inter-warp data locality and thus slows program execution. Considerable GPU microarchitecture research has been devoted to protecting data locality and improving the performance of irregular applications. Unfortunately, prior techniques fail to protect inter-warp locality in programs with significant memory irregularity. This thesis therefore proposes a microarchitecture solution based on thread regrouping to protect inter-warp locality.

First, the two main strategies of the thread regrouping solution are explained: (1) exchanging threads among warps to alleviate cache contention and protect data locality, and (2) reshaping the memory access stream to reduce memory latency and improve the performance of irregular programs. Second, the GPU microarchitecture modifications needed to support thread regrouping are described, including the addition of a regroup buffer and changes to the issue logic. Third, the overhead of thread regrouping is discussed, and measures to reduce it are given. Finally, the thesis combines thread regrouping with MRPB, an intra-warp locality preservation technique, to design a holistic solution capable of protecting both inter-warp and intra-warp locality. The holistic solution analyzes the memory access characteristics of a GPU program at startup and then chooses either thread regrouping or MRPB based on those characteristics.

Both the thread regrouping solution and the holistic solution are implemented on the GPGPU-Sim simulator and evaluated with PolyBench, a GPGPU benchmark suite consisting mainly of programs with memory irregularity. Experimental results show that, compared with the baseline architecture, the thread regrouping solution reduces L1 cache misses by 28.2% and increases IPC by 44.9% on average, indicating that thread regrouping can effectively protect data locality and speed up program execution. The holistic solution reduces L1 cache misses by 34.9% and increases IPC by 63.2% on average, which demonstrates that it protects data locality better than thread regrouping alone and further speeds up program execution.
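
The abstract does not spell out the regrouping policy, so the following is only a toy sketch of the general idea behind exchanging threads among warps, not the thesis's actual mechanism. It assumes a 128-byte cache line, a warp size of 4 (real GPUs typically use 32), and a hypothetical policy of sorting threads by the cache line they access. The helper names `cache_line`, `lines_per_warp`, and `regroup_by_line` are invented for illustration.

```python
# Toy model: count the distinct cache lines each warp touches,
# before and after regrouping threads by the line they access.
# All constants and function names here are illustrative assumptions.

WARP_SIZE = 4      # toy value; real GPUs typically use 32 threads per warp
LINE_BYTES = 128   # assumed cache line size in bytes


def cache_line(addr):
    """Index of the cache line that holds byte address `addr`."""
    return addr // LINE_BYTES


def lines_per_warp(thread_addrs, warp_size=WARP_SIZE):
    """For each warp, the number of distinct cache lines its threads touch."""
    warps = [thread_addrs[i:i + warp_size]
             for i in range(0, len(thread_addrs), warp_size)]
    return [len({cache_line(a) for a in warp}) for warp in warps]


def regroup_by_line(thread_addrs):
    """Hypothetical policy: place threads that touch the same line side by side."""
    return sorted(thread_addrs, key=cache_line)


# Eight threads with an irregular pattern: lines 0..3 are each touched
# once by the first warp and once by the second warp.
addrs = [0, 128, 256, 384, 4, 132, 260, 388]

before = lines_per_warp(addrs)
after = lines_per_warp(regroup_by_line(addrs))
print("lines per warp before regrouping:", before)
print("lines per warp after regrouping: ", after)
```

In this toy input, every cache line is shared by both warps before regrouping, so each line must survive in the cache until the second warp runs. After regrouping, each line is used by exactly one warp, which is the kind of effect that could relieve cache contention under the stated assumptions.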
Keywords/Search Tags: GPGPU, Memory Irregularity, Cache Contention, Data Locality, Microarchitecture