
The Optimized Design Of GPGPU On-chip Memory System

Posted on: 2018-11-07
Degree: Master
Type: Thesis
Country: China
Candidate: F F Fan
Full Text: PDF
GTID: 2428330590977644
Subject: Computer Science and Technology
Abstract/Summary:
Nowadays, the GPGPU has become a popular platform for general-purpose computing, delivering powerful computing ability by leveraging thousands of threads. However, massive thread counts also put heavy pressure on some GPGPU on-chip memory resources. In particular, many threads often compete for the small first-level data (L1D) cache, which leads to severe cache thrashing and further hurts GPGPU performance. At the same time, other GPGPU on-chip memory resources suffer serious under-utilization; for example, many registers and much shared memory sit unoccupied at runtime, which wastes resources. In this thesis, we explore the optimized design of the GPGPU on-chip memory system to solve these problems.

First, we introduce a victim cache to the GPGPU to keep more data in on-chip memory and thereby alleviate L1D cache thrashing. In CPUs, the victim cache is usually a small fully associative cache; to suit GPGPU applications with massive concurrent threads, we instead design a set-associative victim cache whose organization matches that of the L1D cache. Second, we apply a simple prediction scheme that keeps the most frequently used data in the L1D cache. This avoids costly cache-line evictions and interchanges between the victim cache and the L1D cache, and thus enables better cooperation between the two. Third, we propose using the unoccupied registers and shared memory to hold the data of the cache lines in the victim cache, which further saves chip area and hardware cost.

The experimental results show that introducing the victim cache improves the hit ratio of the on-chip data cache by 26.8% and the system performance by 36.3% on average. Meanwhile, our prediction scheme reduces cache-line evictions and interchanges between the L1D cache and the victim cache by 21.8% and further improves system performance by 4.9%. Moreover, after using the unoccupied registers and shared memory as the storage units for the data blocks of the victim cache, the utilization of GPGPU on-chip memory is largely improved and the hardware cost of the victim cache is significantly reduced.
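The interplay described above between the L1D cache, the set-associative victim cache, and the prediction scheme can be illustrated with a small behavioral simulation. This is only a minimal sketch under illustrative assumptions, not the thesis implementation: the cache geometry, the per-line reuse counter used as the predictor, and the `promote_threshold` parameter are all invented here for demonstration.

```python
# Behavioral sketch (illustrative only): a set-associative L1D cache backed by
# a victim cache with the same geometry. A simple reuse-count predictor skips
# the costly L1D<->victim interchange for lines that look cold.
from collections import OrderedDict

class SetAssocCache:
    def __init__(self, num_sets, ways):
        self.num_sets, self.ways = num_sets, ways
        # one LRU-ordered tag store per set
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def _index(self, line_addr):
        return line_addr % self.num_sets

    def lookup(self, line_addr):
        s = self.sets[self._index(line_addr)]
        if line_addr in s:
            s.move_to_end(line_addr)  # refresh LRU position
            return True
        return False

    def insert(self, line_addr):
        """Insert a line; return the evicted line address, or None."""
        s = self.sets[self._index(line_addr)]
        victim = None
        if len(s) >= self.ways:
            victim, _ = s.popitem(last=False)  # evict the LRU line
        s[line_addr] = None
        return victim

    def remove(self, line_addr):
        self.sets[self._index(line_addr)].pop(line_addr, None)

class L1WithVictim:
    def __init__(self, num_sets=4, ways=4, promote_threshold=2):
        self.l1 = SetAssocCache(num_sets, ways)
        self.vc = SetAssocCache(num_sets, ways)  # same geometry as the L1D
        self.reuse = {}          # per-line access counter (the "predictor")
        self.threshold = promote_threshold
        self.interchanges = 0

    def access(self, line_addr):
        """Return 'l1', 'victim', or 'miss' for one memory access."""
        self.reuse[line_addr] = self.reuse.get(line_addr, 0) + 1
        if self.l1.lookup(line_addr):
            return "l1"
        if self.vc.lookup(line_addr):
            # Predictor: only pay for an interchange if the line looks hot;
            # cold lines are served from the victim cache in place.
            if self.reuse[line_addr] >= self.threshold:
                self.vc.remove(line_addr)
                evicted = self.l1.insert(line_addr)
                if evicted is not None:
                    self.vc.insert(evicted)
                self.interchanges += 1
            return "victim"
        # Full miss: fill into L1D; the L1D victim spills into the victim cache
        # instead of being dropped, keeping more data on chip.
        evicted = self.l1.insert(line_addr)
        if evicted is not None:
            self.vc.insert(evicted)
        return "miss"
```

For example, with a 1-set, 2-way configuration, the access stream `0, 1, 2, 0` misses three times, but the fourth access to line 0 hits in the victim cache rather than going off chip, which is the thrashing case the design targets.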
Keywords/Search Tags:GPGPU, register, L1D cache, shared memory, victim cache, prediction scheme