
The Research And Implementation Of High Performance SIMD Floating-point Multiplication Accumulator Unit For FT-XDSP

Posted on: 2014-08-17   Degree: Master   Type: Thesis
Country: China   Candidate: T Tian
GTID: 2268330422973747   Subject: Software engineering
Abstract/Summary:
FT-XDSP is a high-performance, general-purpose SIMD digital signal processor (DSP) with a very long instruction word (VLIW) architecture, developed by the National University of Defense Technology (NUDT). It runs at 1.25 GHz and is intended for wide use in high-performance computing, wireless communication, and video and image processing. One FT-XDSP core contains 50 floating-point fused multiply-accumulate (FMAC) units, and these FMAC units determine the peak floating-point performance of FT-XDSP.

This thesis studies in depth the implementation of a high-performance single instruction, multiple data (SIMD) FMAC unit for FT-XDSP that meets the performance demands of wireless communication base stations and high-performance computing. The main work and contributions are as follows:

1. A multi-functional, high-performance FMAC unit is designed and implemented on the basis of the classical low-latency FMAC structure. After an in-depth study of FMAC architectures, we propose a six-stage pipelined SIMD FMAC architecture, in which the pipeline partition is carefully analyzed to keep the number of stages low while meeting the delay requirement. The FMAC unit can perform one double-precision addition, multiplication, or fused multiply-accumulate; two parallel single-precision SIMD additions, multiplications, or fused multiply-accumulates; or one single-precision dot product or complex multiplication. The latencies of multiplication and addition are 4 and 5 cycles, respectively, and the latency of all other operations is 6 cycles.

2. Key modules are reused across the functions of the FMAC unit to reduce area. We analyze the design of the key modules: the mantissa multiplier, alignment shifter, compound adder, leading-zero anticipation (LZA) logic, and normalization modules.
Based on the double-precision FMAC architecture, these key modules are reused to implement the two-way single-precision SIMD fused multiply-accumulate, SIMD addition, dot product, and complex multiplication. To reduce area further, the mantissa multiplier is also extended and shared with the fixed-point MAC unit to support 64-bit fixed-point multiplication.

3. The multi-function SIMD FMAC unit is verified and optimized with the Cadence NC simulator and RC. We tested the FMAC design in both module-level and DSP-core-level simulation environments. The experimental results show that the result of every FMAC instruction complies with the IEEE 754 standard. We also optimized the critical path of the FMAC unit following a logic-delay optimization strategy. Finally, we synthesized the FMAC unit with the Cadence RTL Compiler (RC) in a 45 nm CMOS technology under typical conditions. The synthesis results show a maximum delay of 550 ps, a power consumption of 14.11 mW, and a cell area of 166,854 µm². The timing, area, and power of the multi-function SIMD FMAC unit therefore fully satisfy the requirements of FT-XDSP.
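To make the operation modes in items 1 and 2 concrete, here is a minimal behavioural sketch in Python, of the kind that could serve as a golden reference when checking results against IEEE 754. It is not the thesis's RTL or its verification environment. Everything below is an illustrative assumption: the function names (fma64, round_to_binary32, pack2, unpack2, simd_fma32, dot32, cmul32) are hypothetical; lane 0, and the real part of a complex operand, are assumed to sit in the low 32 bits of a 64-bit word; the dot product and each complex component are assumed to be rounded once; and only finite inputs with round-to-nearest-even are modelled (NaN and infinity inputs, exception flags, and signed-zero rules are omitted).

```python
from fractions import Fraction
import struct

# Hypothetical behavioural model of the FMAC operation modes (not the thesis RTL).
# Assumptions: finite inputs only, round-to-nearest-even, no exception flags,
# signed-zero rules ignored, lane 0 stored in bits 31..0 of a 64-bit word.


def fma64(a: float, b: float, c: float) -> float:
    """Double-precision fused multiply-add: a*b + c with a single rounding.

    The exact rational result is converted with float(Fraction), which divides
    two Python ints; CPython rounds that division correctly (ties to even)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def round_to_binary32(x: Fraction) -> float:
    """Round an exact rational to binary32 (ties to even), with subnormals.

    Results whose magnitude rounds to 2**128 or more become +/-inf."""
    if x == 0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    x = abs(x)
    # Find e such that 2**e <= x < 2**(e + 1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    e = max(e, -126)                  # below 2**-126 the spacing stays 2**-149
    ulp = Fraction(2) ** (e - 23)     # 24-bit significand
    q = x / ulp
    n = q.numerator // q.denominator  # truncated significand
    rem = q - n
    if rem > Fraction(1, 2) or (rem == Fraction(1, 2) and n % 2 == 1):
        n += 1                        # round to nearest, ties to even
    result = n * ulp
    if result >= Fraction(2) ** 128:
        return sign * float("inf")
    return sign * float(result)       # exact: result is a binary32 value


def pack2(lo: float, hi: float) -> int:
    """Pack two binary32 lanes into one 64-bit word (lane 0 in bits 31..0)."""
    return int.from_bytes(struct.pack("<ff", lo, hi), "little")


def unpack2(word: int) -> tuple:
    """Split a 64-bit word back into its two binary32 lanes."""
    return struct.unpack("<ff", word.to_bytes(8, "little"))


def _lanes(word: int) -> list:
    # Exact rational values of both lanes.
    return [Fraction(v) for v in unpack2(word)]


def simd_fma32(a: int, b: int, c: int) -> int:
    """Two independent single-precision FMAs, one per 32-bit lane."""
    lanes = [round_to_binary32(x * y + z)
             for x, y, z in zip(_lanes(a), _lanes(b), _lanes(c))]
    return pack2(*lanes)


def dot32(a: int, b: int) -> float:
    """Single-precision dot product a0*b0 + a1*b1, assumed rounded once."""
    (a0, a1), (b0, b1) = _lanes(a), _lanes(b)
    return round_to_binary32(a0 * b0 + a1 * b1)


def cmul32(a: int, b: int) -> int:
    """Complex product (ar + i*ai)(br + i*bi); each component assumed rounded once."""
    (ar, ai), (br, bi) = _lanes(a), _lanes(b)
    return pack2(round_to_binary32(ar * br - ai * bi),
                 round_to_binary32(ar * bi + ai * br))
```

With a = 1 + 2**-27, b = 1 - 2**-27, and c = -1.0, the unfused expression a * b + c gives 0.0, because the exact product 1 - 2**-54 rounds to 1.0 before the addition, whereas fma64(a, b, c) returns the exact result -2**-54. Keeping those low-order product bits until a single final rounding is what distinguishes a fused datapath from a separate multiplier and adder.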
Keywords/Search Tags: Digital Signal Processor, Double-precision, Floating-point Fused Multiply Accumulator, Single Instruction Multiple Data, Floating-point Complex Multiplier, Leading Zero Anticipation, Verification