Research On Intra-cluster Redundant Memory Request Coalescence Optimization Mechanism Of GPGPU

Posted on: 2022-03-17
Degree: Master
Type: Thesis
Country: China
Candidate: J C Xie
GTID: 2518306557461294
Subject: Computer Science and Technology
Abstract/Summary:
General-purpose graphics processing units (GPGPUs) support massive thread-level parallelism, which makes them especially suitable for high-throughput applications. To achieve higher throughput and support more parallel threads, GPGPUs integrate more, and more powerful, streaming multiprocessors (SMs). More parallel threads must fetch the data they need to run, which puts more pressure on the cache subsystem. To reduce total accesses to the cache subsystem, each SM has two built-in redundant memory request coalescence mechanisms, one for the level-1 cache and one for the level-2 cache, which coalesce accesses by different threads to the same cache line. Concurrent threads on different SMs also request the same level-2 cache data block. Coalescing redundant memory requests further across the SMs of an SM cluster can effectively reduce total level-2 cache accesses and the communication burden on the on-chip network. Building on an analysis of previous research findings on GPGPU subsystem performance, this thesis studies redundant memory request coalescence.

1. The Self-Adaptive Intra-Cluster Coalescence mechanism (SAICC) is proposed. It consists of three parts. The first is a redundant memory request probe table based on a sliding window mechanism, which discovers redundant access requests with the shortest send interval. The second is the Intra-Cluster Miss-Status Handling Registers (ICMSHR), which handle redundant memory requests and perform further redundancy probing. The third is the Intra-Cluster Public Cache (ICPC), which holds the data obtained by redundant memory requests, reduces data redundancy across the caches of different SMs, and responds to repeated read requests from SMs within the cluster.

2. Two SAICC optimization methods are proposed. One is the access cost balancing algorithm, which keeps the ICPC from becoming excessively busy by restricting the loading of data obtained by redundant memory requests into the ICPC. The other is the SAICC-based level-1 cache bypass strategy, which keeps the ICPC from sitting excessively idle by allowing data that bypasses the level-1 cache to be loaded into the ICPC.

Experiments show that SAICC reduces level-2 cache accesses, reduces on-chip network latency, and improves cache responsiveness. Built on SAICC, the access cost balancing algorithm and the SAICC-based level-1 cache bypass strategy yield further performance gains.
Keywords/Search Tags: general purpose graphics processing units, cache subsystems, redundant memory request coalescence, cache bypass
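
To make the idea of a sliding-window probe table concrete, the following is a minimal Python sketch and not the thesis's actual design. It assumes a 128-byte cache line, a fixed-size window of recently seen line addresses with oldest-first eviction, and a request stream of (SM id, byte address) pairs. The names `ProbeWindow`, `probe`, and `count_forwarded` are hypothetical and chosen only for illustration. A request whose cache line is already in the window is treated as redundant and coalesced; all others are forwarded toward the level-2 cache.

```python
from collections import OrderedDict


class ProbeWindow:
    """Hypothetical sliding window over recently requested cache lines."""

    def __init__(self, window_size=4, line_size=128):
        # window_size and line_size are illustrative assumptions.
        self.window_size = window_size
        self.line_size = line_size
        # Maps cache-line index -> list of SM ids waiting on that line.
        self.entries = OrderedDict()

    def probe(self, sm_id, addr):
        """Return True if the request is redundant (its line is in the window)."""
        line = addr // self.line_size
        if line in self.entries:
            # Same line already requested by some SM in the cluster: coalesce.
            self.entries[line].append(sm_id)
            return True
        if len(self.entries) >= self.window_size:
            # Window is full: slide it by dropping the oldest entry.
            self.entries.popitem(last=False)
        self.entries[line] = [sm_id]
        return False


def count_forwarded(requests, window_size=4, line_size=128):
    """Count requests that are not coalesced and so go on to the level-2 cache."""
    window = ProbeWindow(window_size, line_size)
    return sum(1 for sm_id, addr in requests if not window.probe(sm_id, addr))


if __name__ == "__main__":
    # Five requests from four SMs; three of them touch cache line 0.
    stream = [(0, 0), (1, 64), (2, 128), (0, 4096), (3, 0)]
    print(count_forwarded(stream, window_size=4))
    print(count_forwarded(stream, window_size=1))
```

In this toy model a larger window catches redundant requests that arrive further apart, at the cost of a larger table. The sketch leaves out the ICMSHR and ICPC entirely, along with the self-adaptive behaviour the abstract attributes to SAICC.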