Timing Optimization For L2 Cache Of High Performance DSP Core

Posted on: 2016-11-20
Degree: Master
Type: Thesis
Country: China
Candidate: H L Ren
GTID: 2348330509460932
Subject: Software engineering
Abstract/Summary:
YHFT-XX is a high-performance eight-core DSP (Digital Signal Processor) chip developed independently by the Graduate School of the National University of Defense Technology. The design goal for the core is to reach a 1 GHz operating frequency at the worst-case corner. As the storage center of the core, the L2 Cache accounts for about one half of the total core area, so this module strongly affects timing closure of the entire DSP core. This thesis focuses on timing optimization during physical design of the L2 Cache in the YHFT-XX DSP core. The main research content is divided into three parts:

1) Hardening the data banks in the L2 Cache, taking into account their large area overhead and numerous SRAMs. The data banks are hardened in three respects. First, the placement structure of the data banks is adjusted during the placement stage by comparing a rectangular placement scheme with a lateral concave placement scheme. Experimental results show that the lateral concave placement has advantages in the distribution of routing resources and in optimization of the critical path. Second, three clock tree structures are compared: automatic clock tree synthesis, manual clock tree design, and multi-clock-source design. The analysis shows that the multi-clock-source design has the simplest clock structure and uses the fewest routing resources, and that it can also balance the setup time and hold time of the macro module. Third, registers are replaced. Because the output data is regular, the 16-bit registers are replaced by 16-bit pulse-triggered registers. Exploiting their negative setup time improves the internal reg2reg path timing by 12.2%, shortens the absolute delay of the reg2out path by 40 ps, and reduces total power consumption by 12%.

2) Analyzing the pipeline structure of the RTL code, given the large number of registers in the L2 Cache Controller. Based on the results of this analysis, the macro blocks and wide registers are placed manually, and some of the register groups on reg2out paths are placed close to the output ports. The L2 Cache Controller also contains several register arrays. For these parts, the netlist is obtained by circuit design methods according to a delay model, and the register arrays are placed by a combination of manual and automatic placement. Compared with traditional automatic timing optimization, this method improves the critical path by 35.6% and reduces the total number of violating paths by 22%.

3) Using useful clock skew to optimize timing further, since some timing violations remain in the L2 Cache after routing. An automatic clock skew compensation algorithm is implemented in TCL. The algorithm mainly increases the positive clock skew (or decreases the negative clock skew) of the current critical path. Based on statistical analysis of the timing of the current critical path and of the next stage, it inserts buffers in the common clock path of the critical path to increase the clock path delay. Application results show that this algorithm improves the timing of the critical path by 15.8% and reduces the total number of violating paths by 33.5%.
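
The idea behind useful clock skew can be illustrated with a small calculation. The sketch below is written in Python rather than TCL and is not the algorithm from the thesis. It assumes a simplified model in which delaying the capture clock of the critical path by some amount raises that path's setup slack by the same amount and lowers the setup slack of the next stage by the same amount. The function name, its parameters, and the fixed per-buffer delay are all hypothetical, and hold checks are ignored.

```python
def useful_skew_buffers(crit_slack_ps, next_slack_ps, buffer_delay_ps):
    """Estimate how many clock buffers to add on a critical path's capture clock.

    Simplified model (an assumption, not the thesis algorithm): each buffer
    delays the capture clock by buffer_delay_ps, which adds that much setup
    slack to the critical path and removes it from the next stage.
    Returns (buffer_count, new_critical_slack_ps, new_next_slack_ps).
    """
    if buffer_delay_ps <= 0:
        raise ValueError("buffer_delay_ps must be positive")

    # Nothing to fix, or nothing to borrow from the next stage.
    if crit_slack_ps >= 0 or next_slack_ps <= 0:
        return 0, crit_slack_ps, next_slack_ps

    # Buffers needed to remove the violation (ceiling division on integers).
    needed = -(crit_slack_ps // buffer_delay_ps)
    # Buffers the next stage can afford without going negative (floor).
    affordable = next_slack_ps // buffer_delay_ps

    count = min(needed, affordable)
    shift = count * buffer_delay_ps
    return count, crit_slack_ps + shift, next_slack_ps - shift


if __name__ == "__main__":
    # Hypothetical numbers: 50 ps violation, 120 ps of margin downstream,
    # 20 ps of delay per inserted buffer.
    print(useful_skew_buffers(-50, 120, 20))
```

In this model the next stage's positive slack caps how much delay can be borrowed, which is why the slack of both stages has to be examined before any buffer is inserted.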
Keywords/Search Tags: Physical Design, Timing Optimization, Clock Skew, Circuit Design, Pulse Trigger