Research And Implementation Of The Division Unit And Elementary Function In X-DSP

Posted on: 2015-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Deng
Full Text: PDF
GTID: 2308330479479173
Subject: Software engineering
Abstract/Summary:
X-DSP is an independently developed 64-bit DSP with a 1 GHz clock frequency. It adopts a VLIW architecture and can dispatch 11 instructions at a time. As part of the research and development of the X-DSP kernel, we completed the design, verification, and optimization of the kernel's high-performance division unit, as well as the timing optimization of the whole kernel. The work is detailed as follows.

1. We design the overall structure and instruction set of the division unit based on the SRT-8 algorithm, implementing double-precision floating-point division and SIMD single-precision floating-point division on the same hardware structure in parallel. Because a traditional design has a long iteration cycle, which increases hardware complexity, we use an iteration-cutting method to split the iteration process into three instructions (the FSRTD instruction for double precision, FSRT8S32 for single precision) and design the corresponding standard floating-point instructions.

2. Following mature ASIC verification methodology, we design a verification scheme for the division unit and prepare the simulation environment, covering module-level verification, system-level verification, and fault coverage analysis. Building on simulation-based verification, we use a hybrid simulation-formal method to locate bugs quickly and shorten the iteration period. We also complete the test set to improve verification coverage.

3. After synthesizing the division unit and analyzing its synthesis report, we optimize the timing of the modules on the critical path. Taking practical constraints into account, we also optimize the timing and area of the whole division unit: in 45 nm technology, the critical-path delay is reduced by 100 ps and performance improves by 18.2%, so that both the performance and the area meet the chip's requirements. Finally, in combination with the physical design, we optimize the timing of the whole kernel so that its frequency can reach 1 GHz.

4. After analyzing different improved variants of the CORDIC algorithm, we propose a multifunctional hardware circuit for computing trigonometric and exponential functions, based on a low-latency CORDIC algorithm that uses CSA. The experimental results indicate that, in addition to meeting the high-precision requirement, the design has the following advantages over traditional designs: 5 times the performance, area reduced by 20.3%, and delay reduced by 44.8% relative to Parallel CORDIC, 52.6% relative to Radix-4 CORDIC, and 31.6% relative to Redundant CORDIC.
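
For readers unfamiliar with CORDIC, the basic rotation-mode iteration that such circuits build on can be sketched in software as below. This is only an illustrative floating-point model of textbook CORDIC, not the low-latency CSA-based design proposed in the thesis; the function name `cordic_sin_cos` and the `iterations` parameter are hypothetical and chosen for this example.

```python
import math


def cordic_sin_cos(theta, iterations=32):
    """Textbook rotation-mode CORDIC (illustrative only).

    Returns (sin(theta), cos(theta)) for theta in [-pi/2, pi/2].
    Hardware versions replace the multiplications by 2**-i with shifts.
    """
    # Elementary rotation angles atan(2^-i), normally stored in a small ROM.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]

    # Each micro-rotation stretches the vector by sqrt(1 + 2^-2i);
    # pre-scale the start vector so the final result has unit length.
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))

    x, y, z = gain, 0.0, theta
    for i in range(iterations):
        # Rotate toward the remaining angle z: direction is the sign of z.
        d = 1.0 if z >= 0.0 else -1.0
        shift = 2.0 ** -i
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * angles[i]
    return y, x


if __name__ == "__main__":
    s, c = cordic_sin_cos(math.pi / 6)
    print(s, c)
```

Each iteration uses only a shift, an add or subtract, and a table lookup, which is why the algorithm suits hardware. Variants such as the Parallel, Radix-4, and Redundant CORDIC designs named in the abstract differ mainly in how they shorten or overlap these iterations.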
Keywords/Search Tags:Digital Signal Processor, SRT, Division Cell, Verification, Optimization, CORDIC, Elementary Function