
The Design And Implementation Of High Performance Level Two Cache Controller On DSP Chip

Posted on: 2009-12-17
Degree: Master
Type: Thesis
Country: China
Candidate: S Liu
Full Text: PDF
GTID: 2178360242999020
Subject: Electronic Science and Technology
Abstract/Summary:
Digital signal processors (DSPs) have developed rapidly and are now widely used, and the "on-chip Cache and RAM" structure is becoming an indispensable technique in high-performance DSP design. The level two (L2) cache controller is a key component of the "on-chip two-level Cache and RAM" structure. How to design and implement an L2 cache controller that is correct, efficient, and meets its frequency target is therefore a worthwhile research problem.

FT-CXX is a 32-bit fixed-point high-performance DSP currently under design. It has a very long instruction word (VLIW) architecture and can issue 8 instructions per cycle. Its CPU will run at 600 MHz and its peripherals at 300 MHz. The total capacity of L2 is one megabyte. We design and implement the L2 cache controller of FT-CXX. The main work and contributions are as follows:

First, we thoroughly review cache techniques and the performance required of popular DSPs. We design and implement the cache/RAM structure and fix the data bank organization, the tag bank organization, and the address access rules. The associativity rules, the cache policy choices, and the write policies are also fixed and implemented.

Second, because the L2 data bank can run at only half the CPU frequency, we take several measures to reduce the cost of L1 (L1D and L1P) misses:
1) An L1 miss pipeline is designed. Once the pipeline is full, each additional miss costs only 2 cycles on average.
2) Between L1D and L2 we design an L1D write buffer that is 64 bits wide and 4 entries deep. The write buffer allows write requests to be merged, which effectively reduces the write miss cost.
3) A scheme that solves the nonaligned access problem is designed. It has little hardware cost, is more efficient, and places little burden on the compiler.

Third, we provide an efficient method for the EDMA (enhanced direct memory access) to access the L2 SRAM, exploiting the potential parallelism between accesses. The method supports burst access (bursts of 8 reads and 4 writes), pipelines snooping and sending, reduces the number of snoops by recording the snoop history, and reduces the number of L2 data bank accesses through bypassing and merging. Each EDMA access costs 2-3 cycles, a speedup of at least 2.0 over serial access.

Finally, an efficient memory consistency protocol is designed and implemented. On the one hand, various cache operations are provided; on the other hand, different kinds of snoops and write-backs are handled separately. In our experiments, the cost of some typical requests is reduced by at least 10%.

In addition, we complete the verification and synthesis of the L2 cache controller. In SMIC 0.13 μm technology, the design meets its frequency requirements: 600 MHz for the fast units and 300 MHz for the slow units.
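
The L1D write buffer described in the abstract (64 bits wide, 4 entries deep, with write merging) can be pictured with a small software model. The following is only an illustrative sketch, not the FT-CXX design: the class name `WriteBuffer`, the byte-granular merge, and the "drain the oldest entry when full" rule are all assumptions made for the example, since the abstract does not specify them.

```python
from collections import OrderedDict

ENTRY_BYTES = 8  # 64-bit wide entry (from the abstract)
DEPTH = 4        # four entries (from the abstract)


class WriteBuffer:
    """Toy model of a merging write buffer between L1D and L2 (hypothetical)."""

    def __init__(self):
        # Maps an 8-byte-aligned base address to {byte offset: value}.
        # Insertion order doubles as entry age.
        self.entries = OrderedDict()
        self.drained = []  # entries already sent on to L2, oldest first

    def write(self, addr, data):
        """Buffer a write of len(data) bytes that fits inside one 64-bit entry."""
        offset = addr % ENTRY_BYTES
        base = addr - offset
        if offset + len(data) > ENTRY_BYTES:
            raise ValueError("write crosses a 64-bit entry boundary")
        if base not in self.entries:
            if len(self.entries) == DEPTH:
                self.drain_oldest()  # assumed policy: make room when full
            self.entries[base] = {}
        # Merge: newer bytes overwrite older bytes at the same offsets.
        for i, byte in enumerate(data):
            self.entries[base][offset + i] = byte

    def drain_oldest(self):
        """Send the oldest entry to L2 (modelled here as appending to a list)."""
        base, valid_bytes = self.entries.popitem(last=False)
        self.drained.append((base, valid_bytes))


buf = WriteBuffer()
buf.write(0x100, b"\x11\x22\x33\x44")
buf.write(0x104, b"\x55\x66\x77\x88")  # same 64-bit entry, so it is merged
print(len(buf.entries))  # 1
```

In this model the two 32-bit writes occupy a single entry, so only one transfer to L2 would be needed instead of two. That is the intuition behind the reduced write miss cost, under the assumptions stated above.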
Keywords/Search Tags: "Cache and RAM" structure, miss pipeline, write buffer, write merge, nonaligned access, EDMA service, memory consistency