Physical Design And Optimization Of Multiplication Unit In 40 Nanometer Process

Posted on: 2016-04-29
Degree: Master
Type: Thesis
Country: China
Candidate: C Bai
Full Text: PDF
GTID: 2348330509460516
Subject: Software engineering
Abstract/Summary:
The speed of the multiplication unit directly affects the overall performance of the CPU kernel data path, and the physical design and implementation of a high-performance, low-power multiplication unit has become one of the main difficulties today. Balancing chip design cost against overall performance means the physical design must be completed within a limited area. This leads to larger clock network skew and higher overall density, which in turn affect the timing and power consumption of the design. To address these problems, and aiming at performance optimization of the multiplication unit in the X-DSP CPU kernel, this paper studies the physical design flow in detail from four aspects: improving timing, reducing power consumption, timing analysis and equivalence verification. We propose several optimization methods and apply them to the physical design of the multiplication unit. The main research work in this paper includes the following aspects:

1) Floorplanning plays an important role in physical design, and its quality directly affects the performance of the chip. We floorplanned the top-level CPU data path using the hierarchical physical design method, derived two candidate floorplans and compared them according to the connections between modules. The results show that the improved floorplan improves timing by 9%. The floorplan of the multiplication unit was then determined from the floorplan of the CPU data path. After detailed analysis of the multiplication unit hierarchy and numerous iterations, the improved floorplan improves timing by 5%.

2) For the clock tree, the main goal of the clock network is to reduce clock latency and clock skew. The original clock skew of the multiplication unit was 47.2 ps and the clock delay (latency) was 304.7 ps. Optimizing the clock network by controlling the clock-driving cells, the maximum fanout and the maximum number of levels reduced the clock skew to 35.5 ps and the clock latency to 260.6 ~ 296.1 ps. On this basis, we further optimized the clock network by controlling the clock routing, which reduced the clock skew to 27.9 ps and the clock latency to 232.3 ~ 260.2 ps.

3) Chip power consumption has become as important a performance indicator as chip speed and chip area. Clock gating was inserted automatically to reduce dynamic power consumption, which cut dynamic power by 62.7%. Static power was optimized by reducing density, downsizing cells and replacing cells with multi-threshold variants, which cut static power by 9%.

4) For static timing analysis, because of the shortcomings of single-mode single-corner timing analysis, this paper discusses multi-mode multi-corner timing analysis from three aspects: corner composition, analysis model and analysis flow. We performed multi-mode multi-corner timing analysis on the multiplication unit and applied ice tools to optimize timing until the requirements were met.

5) By studying formal verification technology as applied to the multiplication unit, this paper carried out formal verification with Formality. We analyzed the problems that occurred during verification, such as those related to clock gating and scan chains, proposed corresponding solutions, and finally passed the verification.
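The clock-tree figures quoted in item 2) can be cross-checked with a few lines of arithmetic. The sketch below is not part of the thesis flow; it only takes the numbers stated in the abstract and shows that each reported skew equals the width of the corresponding latency range, then computes the relative skew reduction. The helper names `reduction_pct` and `skew_from_range` are illustrative, and treating the latency range as (minimum, maximum) insertion delay is an assumption.

```python
# Clock-tree figures from the abstract, in picoseconds.
# Assumption: each "a ~ b" latency range is (min, max) insertion delay.
ORIGINAL_SKEW = 47.2
STAGES = {
    # stage name: (reported skew, min latency, max latency)
    "cell/fanout/level control": (35.5, 260.6, 296.1),
    "plus clock routing control": (27.9, 232.3, 260.2),
}


def reduction_pct(before: float, after: float) -> float:
    """Relative reduction of a metric, in percent."""
    return (before - after) / before * 100.0


def skew_from_range(lat_min: float, lat_max: float) -> float:
    """Skew taken as the spread between slowest and fastest clock arrival."""
    return lat_max - lat_min


if __name__ == "__main__":
    for name, (skew, lat_min, lat_max) in STAGES.items():
        derived = skew_from_range(lat_min, lat_max)
        saved = reduction_pct(ORIGINAL_SKEW, skew)
        print(f"{name}: reported skew {skew} ps, "
              f"range width {derived:.1f} ps, "
              f"skew reduction vs. original {saved:.1f}%")
```

Under these assumptions the reported skews match the latency range widths, and the two optimization steps correspond to skew reductions of roughly 25% and 41% relative to the original 47.2 ps.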
Keywords/Search Tags: Multiplication Unit, Floorplan Optimization, Power Optimization, Static Timing Analysis, Formal Verification