Design And Verification Of SIMD Structure FMAC Unit In A 64-bit High Performance X-DSP

Posted on: 2015-09-08
Degree: Master
Type: Thesis
Country: China
Candidate: R Zhao
Full Text: PDF
GTID: 2308330479479185
Subject: Software engineering
Abstract/Summary:
X-DSP, developed by our school, is a 64-bit high-performance general-purpose DSP chip with an 11-issue Very Long Instruction Word (VLIW) architecture. The fused floating-point multiply-add unit (FMAC) is a key functional unit in X-DSP, and its performance directly determines the chip's peak performance. However, designing an FMAC with low power consumption, small area, and high performance is challenging. This thesis presents the optimized design and functional verification of the FMAC and makes the following contributions:

(1) Based on the performance requirements of X-DSP, a channel-separation FMAC structure is proposed. The unit performs 64-bit double-precision floating-point operations and 32-bit Single Instruction Multiple Data (SIMD) single-precision floating-point operations, including multiplication, addition, fused multiply-add, dot product, and complex multiplication. Three FMAC structures (non-fused, non-channel-separation, and channel-separation) are presented. The non-fused structure has a long latency because multiplication and addition execute sequentially. In the non-channel-separation structure, most logic units are shared between single-precision and double-precision operations, which leads to more complex algorithms, longer latency, and larger hardware area for the dot product and complex multiplication operations. Weighing the advantages and disadvantages of these two structures, the channel-separation FMAC is proposed to balance latency, hardware area, and power consumption.

(2) The optimized design of the channel-separation FMAC and module reuse across multiple operations are presented. First, the single-precision and double-precision channels are designed to shorten the critical path and reduce area. In the double-precision channel, a fused multiply-add (A*B+C) scheme is used in which operand C is treated as one of the partial products in the partial-product compression of the mantissa multiplication, which shortens the addition latency. In the single-precision channel, simple implementation algorithms for dot product and complex multiplication are employed to reduce area and latency. Second, an optimized mantissa multiplier structure is presented. Four 32×32 multipliers implement the mantissa multiplication for double-precision multiplication and fused multiply-add, and for single-precision multiplication, fused multiply-add, dot product, and complex multiplication. Moreover, the double-precision and single-precision channels are reused to calculate the real part and imaginary part of a complex multiplication, respectively. Third, based on the channel-separation FMAC, the alignment shifter, mantissa addition, and normalization modules are reused to implement double-precision and SIMD single-precision floating-point addition in a 5-cycle pipeline.

(3) Comprehensive functional verification of the FMAC is performed. Golden models are written in C according to the implementation algorithms of the 12 FMAC instructions, and they simulate the hardware execution under 4 rounding modes. These golden models are used for result comparison in functional verification and as reference models in formal verification from module level to system level. Finally, the coverage of the FMAC is analyzed...
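The idea of building the mantissa multiplication from four 32×32 multipliers can be illustrated with a small C sketch in the spirit of a golden model. This is not the thesis's implementation: the abstract does not give the actual partitioning, so the function names (`mul32x32`, `mul64_from_four_32x32`, `mul_four_sp_mantissas`), the `u128_t` type, and the assignment of one single-precision mantissa product per multiplier are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one hardware 32x32 -> 64-bit multiplier. */
uint64_t mul32x32(uint32_t a, uint32_t b) {
    return (uint64_t)a * (uint64_t)b;
}

/* 128-bit product held as two 64-bit halves (illustrative type). */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} u128_t;

/* Double-precision mode (assumed arrangement): combine four 32x32 partial
 * products into a full 64x64 product. A 53-bit mantissa fits in a uint64_t. */
u128_t mul64_from_four_32x32(uint64_t a, uint64_t b) {
    uint32_t al = (uint32_t)a, ah = (uint32_t)(a >> 32);
    uint32_t bl = (uint32_t)b, bh = (uint32_t)(b >> 32);

    uint64_t ll = mul32x32(al, bl);
    uint64_t lh = mul32x32(al, bh);
    uint64_t hl = mul32x32(ah, bl);
    uint64_t hh = mul32x32(ah, bh);

    /* Middle column: three 32-bit values, so the sum cannot overflow 64 bits. */
    uint64_t mid = (ll >> 32) + (uint32_t)lh + (uint32_t)hl;

    u128_t p;
    p.lo = (mid << 32) | (uint32_t)ll;
    p.hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
    return p;
}

/* SIMD single-precision mode (assumed arrangement): each multiplier computes
 * one independent 24-bit mantissa product, for example the four products
 * needed by a complex multiplication. */
void mul_four_sp_mantissas(const uint32_t a[4], const uint32_t b[4],
                           uint64_t out[4]) {
    for (int i = 0; i < 4; i++) {
        assert(a[i] < (1u << 24) && b[i] < (1u << 24));
        out[i] = mul32x32(a[i], b[i]);
    }
}
```

Calling `mul64_from_four_32x32` with two 53-bit mantissas yields the full product in `hi:lo`, while `mul_four_sp_mantissas` uses the same four multiplier calls for four separate 24-bit products. The sketch covers only the integer mantissa product; exponent handling, alignment, normalization, and rounding are left out.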
Keywords/Search Tags: Channel Separation Structure, Floating-point Multiply Add Fused Unit (FMAC), Channel Multiplexing, Complex Multiplication, Golden Model, Functional Verification