Research On Intra-cluster Redundant Memory Request Coalescence Optimization Mechanism Of GPGPU

Posted on:2022-03-17

Degree:Master

Type:Thesis

Country:China

Candidate:J C Xie

Full Text:PDF

GTID:2518306557461294

Subject:Computer Science and Technology

Abstract/Summary:

General-purpose graphics processing units (GPGPUs) support massive thread-level parallelism, which makes them especially suitable for high-throughput applications. To achieve higher throughput and support more parallel threads, GPGPUs integrate more, and more powerful, streaming multiprocessors (SMs). More parallel threads need to fetch the data they require in order to run, which puts more pressure on the cache subsystem. To reduce the total number of accesses to the cache subsystem, each SM has two built-in redundant memory request coalescing mechanisms, one for the level-1 cache and one for the level-2 cache, which coalesce accesses by different threads to the same cache line. Concurrent threads on different SMs also request the same level-2 cache data block. Coalescing redundant memory requests further across the SMs of an SM cluster can therefore effectively reduce the total number of level-2 cache accesses and the communication burden on the on-chip network. Building on an analysis of previous research findings on GPGPU subsystem performance, this thesis studies redundant memory request coalescing.

1. The Self-Adaptive Intra-Cluster Coalescence mechanism (SAICC) is proposed. It consists of three parts. The first is a redundant memory request probe table based on a sliding window mechanism, which is used to discover redundant access requests with the shortest send interval. The second is the Intra-Cluster Miss-Status Handling Registers (ICMSHR), which handle redundant memory requests and perform further redundancy probing. The third is the Intra-Cluster Public Cache (ICPC), which holds the data fetched by redundant memory requests, reduces data redundancy across the caches of different SMs, and responds to repeated read requests from SMs within the cluster.

2. Two SAICC optimization methods are proposed. One is an access cost balancing algorithm, which keeps the ICPC from becoming excessively busy by restricting which data fetched by redundant memory requests is loaded into the ICPC. The other is a SAICC-based level-1 cache bypass strategy, which keeps the ICPC from sitting excessively idle by allowing data that bypasses the level-1 cache to be loaded into the ICPC.

Experiments show that SAICC reduces level-2 cache accesses, reduces on-chip network latency, and improves cache responsiveness. Built on SAICC, the access cost balancing algorithm and the SAICC-based level-1 cache bypass strategy yield further performance gains.
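
The idea of probing for redundant requests inside a sliding window can be illustrated with a small sketch. This is not the thesis's implementation: the class name, the 128-byte line size, the window length, and the oldest-first eviction policy are all assumptions made for illustration. The sketch only shows how requests from different SMs to the same cache line could be merged at cluster level so that a single request reaches the level-2 cache.

```python
from collections import OrderedDict

LINE_SIZE = 128  # assumed cache-line size in bytes (hypothetical value)


class SlidingWindowProbeTable:
    """Hypothetical probe table: tracks the most recent distinct cache lines
    requested inside one SM cluster."""

    def __init__(self, window=4):
        self.window = window          # assumed number of tracked lines
        self.entries = OrderedDict()  # line index -> set of requesting SM ids

    def probe(self, sm_id, address):
        """Return True if the request hits a line already in the window,
        meaning it is redundant and can be coalesced."""
        line = address // LINE_SIZE
        if line in self.entries:
            self.entries[line].add(sm_id)  # merge with the tracked request
            return True
        self.entries[line] = {sm_id}
        if len(self.entries) > self.window:
            self.entries.popitem(last=False)  # slide: drop the oldest line
        return False


def count_l2_requests(requests, window=4):
    """Count the requests that would still be forwarded to the level-2 cache."""
    table = SlidingWindowProbeTable(window)
    return sum(0 if table.probe(sm, addr) else 1 for sm, addr in requests)


# (sm_id, byte address) pairs; addresses 0 and 64 share one line,
# addresses 4096 and 4100 share another.
requests = [(0, 0), (1, 64), (2, 4096), (1, 0), (3, 4100)]
print(count_l2_requests(requests))            # window of 4 lines
print(count_l2_requests(requests, window=1))  # window of 1 line
```

With a window of four lines, the five requests above collapse into two level-2 accesses; with a window of one line, interleaved requests evict each other and four accesses remain. This gives a rough feel for why the window size matters to any such scheme.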

Keywords/Search Tags:

general purpose graphics processing units, cache subsystems, redundant memory requests coalescence, cache bypass

Related items

1	Research On Cache Optimization Mechanism In Heterogeneous Memory Environment
2	Key Techniques Research Of Memory In Homogeneous General Purpose Stream Processor
3	Key Research Issues Of Memory Architecture For Three Dimensional Multi-Core Processors
4	Low-Power Cache Design Based On Non-Volatile Memory
5	Design And Implementation Of Distributed Cache Management System For In-memory Columnar Database
6	Research On Resources Scheduling For Irregular Applications On Graphics Processing Units
7	Architectural Level Leakage Power Optimization For Cache Memory In Microprocessors
8	GPU General-Purpose Computing Applications In CT
9	Power Analysis And Optimization Of The General Purpose Computing Of Graphics Processing Unit
10	Research On Performance Optimization Of General Purpose Graphics Processing Unit Based On Thread Scheduling