
Design And Implementation Of Level One Cache Miss Pipelining On High Performance DSP

Posted on: 2010-03-15    Degree: Master    Type: Thesis
Country: China    Candidate: Y H Fu    Full Text: PDF
GTID: 2178360278457231    Subject: Software engineering
Abstract/Summary:
A DSP is a specialized embedded processor that processes digital signals at high speed and in real time; the defining requirement of a real-time DSP is that correct computation results must be delivered on time. DSP performance has been improving at roughly 60% per year, while memory access speed advances by no more than 10% per year, so the so-called "Memory Wall" has become a major obstacle to DSP development and has long been a hot research topic in computer architecture.

YHFT-DX is a high-performance DSP designed by the National University of Defense Technology. It adopts a VLIW architecture and integrates direct memory access and high-performance external interfaces. Its memory system uses an "on-chip Cache and RAM" structure: the L1P (Level One Program Cache) and L1D (Level One Data Cache) are each 16 KB, and the L2 (Level Two Cache) is 1 MB. The large L2 capacity leads to long cache-miss stalls in the system pipeline. This paper presents the design and implementation of a compatible strategy for optimizing the cache system of YHFT-DX, easing the mismatch between the low efficiency of the memory system and the high performance of the CPU core. The main work and contributions of this paper are as follows:

First, after analyzing traditional methods of reducing cache-miss stall penalties, such as prefetching and non-blocking caches, a strategy called "miss pipelining" is proposed. It combines the optimization ideas of both methods to optimize the two-level cache system of YHFT-DX, which has strict real-time and low-power requirements. To further reduce miss-pipeline bubbles and improve efficiency, the miss pipelining strategy is then analyzed and refined; the basic effect is illustrated by the sketch following this abstract.

Second, the pipeline structures of L1D and L1P are studied in depth, and several schemes are presented to solve the key problems of miss pipelining, such as the protocol for sending miss requests, the handling of dependent requests, and the invalidation of missed fetch packets by branch instructions. The resulting design, implemented in both L1D and L1P, is characterized by simple hardware, good efficiency, and wide applicability.

Third, to reach the frequency target of 600 MHz, the whole design is synthesized and optimized. All critical paths are eliminated using multiple optimization methods, including adjusting logic structure, balancing logic across pipeline stages, and combining full-custom and semi-custom design. After optimization, the total delay of the netlist is no more than 1.26 ns under typical conditions in a 0.13 um process, and the design goal is met.

Finally, the miss pipelining policy is evaluated. Several classical benchmarks are chosen to test the performance of L1D and L1P. The experimental results show that miss pipelining improves the average performance of the cache system by up to 25% and the average performance of the entire system by up to 1%.
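The core idea of miss pipelining, as summarized above, is to let successive cache misses overlap their L2 access latency instead of stalling the CPU pipeline for each miss in turn. The following minimal C sketch is a back-of-the-envelope model of that effect only; the latency and interval values (NUM_MISSES, L2_LATENCY, REQ_INTERVAL) are illustrative assumptions and not parameters of YHFT-DX.

/*
 * Hypothetical stall-cycle model (not the thesis RTL): compares the total
 * stall cycles for a run of consecutive L1 misses when misses are serviced
 * one at a time (blocking) versus pipelined so that the L2 access latency
 * of successive misses overlaps.
 */
#include <stdio.h>

#define NUM_MISSES   8    /* consecutive L1 misses, e.g. a streaming access */
#define L2_LATENCY   20   /* assumed L2 service latency in cycles           */
#define REQ_INTERVAL 1    /* assumed issue interval of a pipelined L1 miss  */

int main(void)
{
    /* Blocking cache: each miss stalls the pipeline for the full L2
     * latency before the next miss request can even be sent. */
    long blocking_cycles = (long)NUM_MISSES * L2_LATENCY;

    /* Miss pipelining: the first miss pays the full latency; each later
     * miss is issued while earlier ones are still in flight, so it adds
     * only the issue interval (the bubble between requests). */
    long pipelined_cycles = L2_LATENCY + (long)(NUM_MISSES - 1) * REQ_INTERVAL;

    printf("blocking : %ld stall cycles\n", blocking_cycles);
    printf("pipelined: %ld stall cycles\n", pipelined_cycles);
    printf("speedup  : %.2fx\n", (double)blocking_cycles / pipelined_cycles);
    return 0;
}

This toy model deliberately ignores the complications the thesis addresses in hardware, such as dependent requests to the same line and branch instructions invalidating in-flight fetch packets.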
Keywords/Search Tags: DSP, miss pipelining, non-blocking cache, data prefetching