
Research On Dynamic Cache Partition For Fused CPU-GPU Architecture

Posted on: 2016-03-23    Degree: Master    Type: Thesis
Country: China    Candidate: C W Sun    Full Text: PDF
GTID: 2308330473461600    Subject: Computer system architecture
Abstract/Summary:
Recently, with the rapid development of GPU hardware and the availability of easy-to-adopt programming models, using GPUs to accelerate non-graphics CPU applications has become increasingly popular and is an inevitable trend. Integrating CPUs and GPUs on the same chip has therefore emerged as a new architectural direction. Compared with traditional discrete systems, such fused architectures offer several advantages, including lower communication cost, more efficient resource utilization, and better performance. However, because the hierarchical cache structure filters accesses at multiple levels, the locality seen by the last-level cache (LLC) is poor; the traditional LRU replacement policy therefore uses the LLC inefficiently, and performance suffers. Managing the LLC well is thus critical to overall performance. First, the CPU and GPU have different architectures and differ in their sensitivity to LLC capacity. Second, the GPU has a large number of cores, and under LRU, GPU applications read and write the cache heavily, yet their performance does not improve significantly as cache capacity grows. Meanwhile, CPU applications are left with less cache and their performance degrades severely. These different characteristics of CPU and GPU applications pose new challenges for managing the shared LLC between the CPU and the GPU.

In this thesis, we analyze the characteristics of GPU applications and, drawing on previous cache management schemes, propose I-M CP, a dynamic cache partitioning scheme for the fused architecture. The main research contents and contributions are as follows:

1. To manage the LLC of the CPU-GPU fused architecture efficiently, we analyze the similarities and differences between the CPU and GPU architectures, including thread-switching cost, the number of parallel cores, memory bandwidth, and cache access patterns. After briefly introducing the advantages and the programming model of the CPU-GPU fused architecture, we put forward a cache sensitivity evaluation method for GPU applications and use it to classify them.

2. We analyze the importance of the LLC and the shortcomings of the LRU replacement policy on multi-core processors. Based on LLC optimization strategies and the characteristics of CPU and GPU applications, we propose the I-M CP dynamic cache partitioning scheme for the fused architecture.

Through a detailed experimental design, we evaluate static cache partitioning along multiple dimensions. The experiments indicate that cache partitioning avoids interference between CPU and GPU applications. The results show that, after partitioning, overall system performance improves significantly, which demonstrates the effectiveness of the proposed method. Compared with LRU, the optimal static cache partition and the I-M CP dynamic cache partition improve performance by 11.68% and 13.68%, respectively, while GPU application performance degrades by only 3.27% and 0.87%, respectively.
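The abstract does not describe the internals of the I-M CP scheme, so the sketch below is only a hedged illustration of the general idea of dynamic LLC way partitioning between a CPU and a GPU partition: a controller samples per-partition hit statistics each epoch and shifts cache ways toward the partition that makes better use of them, mirroring the observation above that cache-insensitive GPU applications should not crowd out cache-sensitive CPU applications. All names here (WayPartitioner, rebalance, the per-way hit-rate heuristic, the epoch granularity) are hypothetical and are not taken from the thesis.

```cpp
// Minimal sketch of dynamic LLC way partitioning between CPU and GPU
// (an illustration only, not the thesis's actual I-M CP algorithm).
// Each epoch the controller compares per-way hit rates and moves one way
// toward the partition that extracts more hits per way, so a streaming,
// cache-insensitive GPU workload gradually cedes capacity to
// cache-sensitive CPU workloads.
#include <cstdint>
#include <iostream>

struct PartitionStats {
    uint64_t hits = 0;      // LLC hits observed in the current epoch
    uint64_t accesses = 0;  // total LLC accesses in the current epoch
    double hitRate() const {
        return accesses ? static_cast<double>(hits) / accesses : 0.0;
    }
};

class WayPartitioner {
public:
    WayPartitioner(int totalWays, int minWaysEach)
        : totalWays_(totalWays), minWays_(minWaysEach),
          cpuWays_(totalWays / 2), gpuWays_(totalWays - totalWays / 2) {}

    // Called once per epoch (e.g., every few million cycles) with the
    // hit statistics gathered by the cache controller.
    void rebalance(const PartitionStats& cpu, const PartitionStats& gpu) {
        double cpuPerWay = cpu.hitRate() / cpuWays_;
        double gpuPerWay = gpu.hitRate() / gpuWays_;
        // Grant one extra way to the partition with the higher per-way
        // hit rate, while keeping at least minWays_ ways per partition.
        if (cpuPerWay > gpuPerWay && gpuWays_ > minWays_) {
            ++cpuWays_; --gpuWays_;
        } else if (gpuPerWay > cpuPerWay && cpuWays_ > minWays_) {
            ++gpuWays_; --cpuWays_;
        }
    }

    int cpuWays() const { return cpuWays_; }
    int gpuWays() const { return gpuWays_; }

private:
    int totalWays_;
    int minWays_;
    int cpuWays_;
    int gpuWays_;
};

int main() {
    WayPartitioner part(16, 2);   // 16-way LLC, at least 2 ways per partition
    PartitionStats cpu;           // cache-sensitive CPU workload
    cpu.hits = 800; cpu.accesses = 1000;
    PartitionStats gpu;           // streaming GPU workload, low hit rate
    gpu.hits = 100; gpu.accesses = 5000;
    part.rebalance(cpu, gpu);
    std::cout << "CPU ways: " << part.cpuWays()
              << ", GPU ways: " << part.gpuWays() << "\n";
}
```

In a real design the repartitioning decision would be enforced by the replacement logic (e.g., restricting which ways each requester may allocate into), and the sensitivity classification from contribution 1 could seed the initial split; those details are beyond what the abstract specifies.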
Keywords/Search Tags: GPU cache sensitivity, fused architecture, shared last-level cache, dynamic cache partition