The Design And Optimization Of 1GHz Vector Execution Unit

Posted on: 2015-11-20
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Ma
Full Text: PDF
GTID: 2308330479979086
Subject: Software engineering
Abstract/Summary:
YHFT-XX is a domestically developed high-performance multicore DSP chip. It uses an 11-issue very long instruction word (VLIW) architecture in which a single instruction is 40 or 80 bits wide. The vector processing unit (PXX) is one of the largest blocks in the DSP core. PXX contains 16 vector execution units (PX), so PX performance directly affects the performance of the entire chip. The chip targets an operating frequency above 1 GHz, which poses a severe challenge for the design and optimization of PX.

First, the overall structure of PX is analyzed to determine a hierarchical design method. The optimization strategy for each module is then chosen according to its DC synthesis results, and the modules of PX are optimized to reduce area and lower power consumption. Finally, PX achieves the design goal of an operating frequency above 1 GHz. The main results of this work are as follows:

1. An ASIC-based automatic synthesis flow is used to obtain timing and area data for each sub-module. The synthesis results are analyzed to determine the overall optimization plan for the vector execution unit, with different strategies and optimization measures applied to different modules. Critical paths are first optimized at the micro-architecture level; the vector register file (RF) and the 64-bit multiplier part of the multiply-accumulate (MAC) unit are then designed and optimized with custom methods; the remaining sub-modules use semi-custom design and optimization.

2. Several structural optimization methods at the micro-architecture level are analyzed and applied to the RF bypass decoding module, the storage array write decoding module, and the bypass arrays. Compared with the old design, the optimized design reduces absolute delay by 15%, and the area of 32 multipliers decreases by 64%.

3. The RF module is optimized with a mix of full-custom and semi-custom design, the 64-bit multiplier module is optimized with a data-flow-driven manual semi-custom design, and the other modules use semi-custom design. This effectively shortens the design cycle, reduces area and power consumption, and improves performance. The 64-bit multiplier and the full-custom RF modules meet the 1 GHz timing target, and their area meets the design requirements.

4. The physical design of PX is implemented hierarchically, and the integration of the entire physical design is completed. The MAC and the RF are first physically designed separately, with a detailed analysis of the data-flow-driven manual semi-custom physical design of the RF bypass array, the two-to-one mux, and the 64-bit multiplier. The full-custom macro modules and the MAC are then instantiated at the PX top level to complete integration and optimization of the physical design. Under a clock period constraint of 950 ps, both PX layout schemes meet the design requirements, with about 70 ps of margin on reg2reg paths. After physical design, PX can reach frequencies above 1 GHz in a 40 nm process technology.
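The timing figures quoted in the abstract can be checked with simple arithmetic. The sketch below is only an illustration and is not taken from the thesis: it assumes the "about 70 ps margin" is positive slack against the 950 ps clock period, and it ignores clock skew and uncertainty. The function and variable names are hypothetical.

```python
def period_ps_to_ghz(period_ps: float) -> float:
    """Convert a clock period in picoseconds to a frequency in GHz."""
    return 1000.0 / period_ps


def worst_path_delay_ps(period_ps: float, slack_ps: float) -> float:
    """Delay budget actually used by the worst path, assuming positive slack."""
    return period_ps - slack_ps


# Figures quoted in the abstract.
CLOCK_PERIOD_PS = 950.0   # clock period constraint
REG2REG_SLACK_PS = 70.0   # approximate margin on reg2reg paths (assumed to be slack)
TARGET_GHZ = 1.0          # design goal

# Frequency implied by the constraint itself.
constraint_ghz = period_ps_to_ghz(CLOCK_PERIOD_PS)

# Frequency implied if the clock were tightened until the slack reached zero.
worst_path_ps = worst_path_delay_ps(CLOCK_PERIOD_PS, REG2REG_SLACK_PS)
zero_slack_ghz = period_ps_to_ghz(worst_path_ps)

print(f"constraint frequency: {constraint_ghz:.3f} GHz")
print(f"worst reg2reg path:   {worst_path_ps:.0f} ps")
print(f"zero-slack frequency: {zero_slack_ghz:.3f} GHz")
print(f"meets 1 GHz target:   {constraint_ghz > TARGET_GHZ}")
```

Under these assumptions, a 950 ps period already corresponds to slightly more than 1 GHz, and the reported margin leaves further headroom on reg2reg paths.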
Keywords/Search Tags: Synthesis optimization, Micro-architecture optimization, Custom design, Multiplier, Register file, Hierarchical, Manual semi-custom