
Research On Compressed Cache Technology For Performance Optimization

Posted on: 2008-07-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X H Tian
Full Text: PDF
GTID: 1118360242999234
Subject: Electronic Science and Technology
Abstract/Summary:
Since innovations in CMOS technology in recent years have widened the performance gap between processor and memory, modern processors use one or more levels of on-chip caches to alleviate the ever-increasing pressure of memory accesses. In addition, as chip density increases, chip multiprocessors (MP) and multithreading (MT) are becoming the mainstream architectures of current processor design. Both architectures can greatly improve processor performance and throughput by exploiting thread-level as well as instruction-level parallelism, but the growing memory access demand in an MP/MT environment challenges the throughput of the memory subsystem. The processor designer must balance cores against caches within a fixed area budget so that neither becomes the sole performance bottleneck. Compressed cache technology can change this tradeoff and allow a design in which more on-chip area is allocated to processor cores, since on-chip cache compression increases the effective cache size, and thereby avoids some misses, without significantly increasing cache area. Unfortunately, cache compression also has a negative side effect: compressed cache lines must be decompressed before the processor can use them, so storing compressed lines increases cache hit latency. This dissertation therefore studies compressed cache technology for performance optimization. Methods such as optimizing the compressed cache hierarchy, simplifying the compression algorithm, and improving the cache replacement policy are proposed to improve the performance of compressed caches.

The main contributions of this dissertation are as follows:

1. By simplifying the Frequent Pattern Compression (FPC) algorithm used for L2 cache compression and dividing the decompression of a compressed cache line into two stages, we propose a novel decompression process for L2 compressed cache lines based on a Simple Frequent Pattern Compression (S-FPC) algorithm. The proposed scheme reduces L2 decompression latency by one cycle and also supports compressing the L1 data cache. We evaluate the scheme by simulation and describe its hardware implementation in detail.

2. We propose a unified compressed cache hierarchy (UCCH) that uses a single compression algorithm, S-FPC, in both the L1 D-cache and the L2 cache. UCCH increases the effective capacity of the L1 D-cache and the L2 cache without any sacrifice of L1 cache access latency. The layout of compressed data in the L1 data cache of UCCH enables partial cache line prefetching without introducing prefetch buffers or increasing cache pollution and memory traffic. Experiments show that UCCH distinctly improves performance.

3. We propose a novel modified LRU replacement policy for compressed caches (MLRU-C). MLRU-C uses the extra tags in a compressed cache to construct a shadow tag structure, which identifies and records mistaken replacement decisions made by the LRU policy. The mistaken replacements recorded by the shadow tag structure are stored in a Mistake Record Table (MRT), and MLRU-C corrects subsequent replacement decisions according to the records in the MRT. Experiments show that MLRU-C evidently decreases the L2 compressed cache miss rate.

4. We propose using compressed cache technology to improve multithreaded processor performance. Because sharing the on-chip cache hierarchy between threads hurts the data locality of the L1 D-cache and the L2 cache, MT technology distinctly increases the cache miss rate and memory traffic, and the demands on cache capacity and on data bus bandwidth between cache levels rise accordingly. Because our UCCH scheme increases the capacity and distinctly decreases the miss rate of both the L1 D-cache and the L2 cache, it can alleviate the bandwidth demand between L1, L2, and main memory and improve the performance of MT processors.
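To illustrate the kind of per-word pattern encoding that FPC-style compression performs, the following Python sketch classifies each 32-bit word of a cache line by a compressible pattern. The abstract does not specify the exact pattern set of S-FPC, so the patterns below (zero word, sign-extended byte, sign-extended halfword, uncompressed) are a hypothetical reduced subset chosen for illustration, not the dissertation's actual encoding.

```python
# Sketch of a frequent-pattern word classifier in the spirit of FPC.
# Pattern set and 2-bit prefix values are illustrative assumptions.

def classify_word(word: int) -> tuple[int, int]:
    """Return (prefix, data_bits) for one 32-bit word."""
    word &= 0xFFFFFFFF
    # Reinterpret as a signed 32-bit value to test sign extension.
    signed = word - (1 << 32) if word & 0x80000000 else word
    if word == 0:
        return (0b00, 0)        # zero word: prefix only, no data bits
    if -128 <= signed <= 127:
        return (0b01, 8)        # value fits in a sign-extended byte
    if -32768 <= signed <= 32767:
        return (0b10, 16)       # value fits in a sign-extended halfword
    return (0b11, 32)           # incompressible: stored uncompressed

def compressed_size_bits(line_words) -> int:
    """Encoded size of a cache line: 2-bit prefix plus data per word."""
    return sum(2 + classify_word(w)[1] for w in line_words)
```

For example, a 64-byte line of sixteen zero words encodes in 32 bits of prefixes, and a line of small integers shrinks to roughly a quarter of its size, which is the effect that lets a compressed cache hold more lines in the same area.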
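The MLRU-C mechanism described in contribution 3 can be sketched as follows. A shadow tag structure remembers tags recently evicted by plain LRU; if an access hits in the shadow tags, that earlier eviction was a mistake and the tag is recorded in a Mistake Record Table (MRT); on later replacements the policy prefers victims not listed in the MRT. The structure sizes and the exact correction rule below are assumptions for illustration, not the dissertation's parameters.

```python
from collections import OrderedDict

# Minimal one-set sketch of an MLRU-C-style policy: LRU with a shadow tag
# store and a Mistake Record Table (MRT). Sizes are illustrative.

class MLRUCSet:
    def __init__(self, ways=4, shadow=4, mrt=8):
        self.ways, self.shadow_cap, self.mrt_cap = ways, shadow, mrt
        self.lines = OrderedDict()   # resident tags, LRU order (oldest first)
        self.shadow = OrderedDict()  # tags recently evicted by LRU
        self.mrt = OrderedDict()     # tags whose eviction proved mistaken

    def access(self, tag) -> bool:
        """Access a tag; return True on hit, False on miss."""
        if tag in self.lines:
            self.lines.move_to_end(tag)          # refresh LRU position
            return True
        if tag in self.shadow:
            # Re-reference shortly after eviction: LRU made a mistake.
            del self.shadow[tag]
            self.mrt[tag] = None
            if len(self.mrt) > self.mrt_cap:
                self.mrt.popitem(last=False)
        if len(self.lines) >= self.ways:
            self._evict()
        self.lines[tag] = None
        return False

    def _evict(self):
        # Correct the LRU decision: skip MRT-protected lines when a
        # non-protected victim exists.
        victim = next((t for t in self.lines if t not in self.mrt),
                      next(iter(self.lines)))
        del self.lines[victim]
        self.shadow[victim] = None               # remember the eviction
        if len(self.shadow) > self.shadow_cap:
            self.shadow.popitem(last=False)
```

In a 2-way set, the access sequence A, B, C, A, D, A evicts A at C, detects the mistake on the second access to A, and then protects A so that the final access to A hits; plain LRU would miss there.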
Keywords/Search Tags:S-FPC, Compressed Cache, Partial Cache Line Prefetching, Compressed Cache Replacement Policy, SMT, MLRU-C Replacement Policy