Research And Evaluation Of A Micro Architecture Solution To GPGPU Memory Irregularity Problem Based On Thread Regrouping

Posted on:2017-09-13

Degree:Master

Type:Thesis

Country:China

Candidate:W Meng

Full Text:PDF

GTID:2348330491962947

Subject:Microelectronics and Solid State Electronics

Abstract/Summary:

Over the last decade, improvements in programmability and computing power have extended the application domain of GPUs to general-purpose computing. Many general-purpose applications exhibit memory irregularity, which prevents the GPU cache from capturing intra-warp and inter-warp data locality and thus slows down program execution. Considerable GPU microarchitecture research has been devoted to protecting data locality and improving the performance of irregular applications. Unfortunately, prior techniques fail to protect inter-warp locality in programs with significant memory irregularity. Therefore, this thesis proposes a microarchitecture solution based on thread regrouping to protect inter-warp locality.

First, the two main strategies of the thread regrouping solution are explained: (1) exchanging threads among warps to alleviate cache contention and protect data locality, and (2) reshaping the memory access stream to reduce memory latency and improve the performance of irregular programs. Second, the GPU microarchitecture modifications needed to support thread regrouping are described, including the addition of a regroup buffer and changes to the issue logic. Third, the overhead of thread regrouping is discussed and measures to reduce it are given. Finally, this thesis combines thread regrouping with MRPB, an intra-warp locality preservation technique, to design a holistic solution capable of protecting both inter-warp and intra-warp locality. The holistic solution analyzes the memory access characteristics of a GPU program on startup and then chooses either thread regrouping or MRPB based on those characteristics.

Both the thread regrouping solution and the holistic solution are implemented on the GPGPU-Sim simulator and evaluated with PolyBench, a GPGPU benchmark suite consisting mainly of programs with memory irregularity. Experimental results show that, compared with the baseline architecture, the thread regrouping solution reduces L1 cache misses by 28.2% and increases IPC by 44.9% on average. These results show that thread regrouping can effectively protect data locality and speed up program execution. Moreover, the holistic solution reduces L1 cache misses by 34.9% and increases IPC by 63.2% on average, which demonstrates that it protects data locality better than thread regrouping alone and further speeds up program execution.
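The idea of exchanging threads among warps can be illustrated with a small software model. The sketch below is not the thesis's hardware mechanism (the regroup buffer and issue logic are not specified in this abstract). It is a toy example under stated assumptions: a 128-byte cache line, a warp size of 4 chosen for readability, a made-up address pattern, and the hypothetical helper names `cache_line`, `form_warps`, `lines_per_warp`, and `regroup`. It only shows why grouping threads that touch the same cache line reduces the number of distinct lines each warp requests.

```python
from typing import Dict, List

LINE_SIZE = 128  # bytes per cache line (assumption for this sketch)
WARP_SIZE = 4    # threads per warp, kept small for readability (assumption)


def cache_line(address: int) -> int:
    """Index of the cache line that contains the given byte address."""
    return address // LINE_SIZE


def form_warps(thread_ids: List[int]) -> List[List[int]]:
    """Split an ordered list of thread IDs into consecutive warps."""
    return [thread_ids[i:i + WARP_SIZE]
            for i in range(0, len(thread_ids), WARP_SIZE)]


def lines_per_warp(warps: List[List[int]],
                   addresses: Dict[int, int]) -> List[int]:
    """Number of distinct cache lines each warp touches."""
    return [len({cache_line(addresses[t]) for t in warp}) for warp in warps]


def regroup(thread_ids: List[int],
            addresses: Dict[int, int]) -> List[List[int]]:
    """Toy regrouping: order threads by accessed cache line, then re-form warps."""
    ordered = sorted(thread_ids, key=lambda t: (cache_line(addresses[t]), t))
    return form_warps(ordered)


# Made-up irregular pattern: even threads hit line 0, odd threads hit line 8.
ADDRESSES = {0: 0, 1: 1024, 2: 8, 3: 1032, 4: 16, 5: 1040, 6: 24, 7: 1048}
THREADS = list(range(8))

baseline = form_warps(THREADS)
regrouped = regroup(THREADS, ADDRESSES)

print(lines_per_warp(baseline, ADDRESSES))   # [2, 2]
print(lines_per_warp(regrouped, ADDRESSES))  # [1, 1]
```

In this toy pattern, each baseline warp touches two cache lines, while each regrouped warp touches only one. A real design would also have to account for the cost of moving threads between warps, which the thesis discusses as the overhead of thread regrouping.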

Keywords/Search Tags:

GPGPU, Memory Irregularity, Cache Contention, Data Locality, Microarchitecture

Related items

1	The Optimized Design Of GPGPU On-chip Memory System
2	The Study Of GPGPU Microarchitecture And Performance Analysis
3	Data Sharing Optimization On CPU-GPGPU Shared Last Level Cache System
4	Model Driven Cache Management
5	An Efficient Cache System For Hybrid Memory
6	Research On Intra-cluster Redundant Memory Request Coalescence Optimization Mechanism Of GPGPU
7	A fresh look at data locality on emerging multicores and manycores
8	Research On Cache Optimization Mechanism In Heterogeneous Memory Environment
9	Frequent value locality and its applications to energy efficient memory design
10	Contention resolution and memory load balancing algorithms on distributed shared memory multiprocessors