Design And Verification Of SIMD Structure FMAC Unit In A 64-bit High Performance X-DSP

Posted on:2015-09-08

Degree:Master

Type:Thesis

Country:China

Candidate:R Zhao

Full Text:PDF

GTID:2308330479479185

Subject:Software engineering

Abstract/Summary:

X-DSP, developed by our school, is a 64-bit high-performance general-purpose DSP chip built on an 11-issue Very Long Instruction Word (VLIW) architecture. The floating-point fused multiply-add unit (FMAC) is a key functional unit in X-DSP, and its performance directly determines the peak performance of the chip. Designing an FMAC that combines low power consumption, small area, and high performance is challenging. This thesis presents the optimized design and functional verification of the FMAC and makes the following contributions:

(1) A channel-separation FMAC structure is proposed to meet the performance requirements of X-DSP. The unit performs 64-bit double-precision floating-point operations and 32-bit Single Instruction Multiple Data (SIMD) single-precision floating-point operations, including multiplication, addition, fused multiply-add, dot product, and complex multiplication. Three FMAC structures are presented: non-fused, non-channel-separation, and channel-separation. The non-fused structure has long latency because the multiplication and the addition execute sequentially. In the non-channel-separation structure, most logic units are shared between single-precision and double-precision operations, which leads to more complex algorithms, longer latency, and larger hardware area for the dot product and complex multiplication operations. Weighing the advantages and disadvantages of these two structures, the channel-separation FMAC is proposed to balance latency, hardware area, and power consumption.

(2) An optimized design of the channel-separation FMAC and a module-reuse design for multiple operations are presented. First, the single-precision and double-precision channels of the FMAC are designed to shorten the critical path and reduce area. In the double-precision channel, a fused multiply-add (A*B+C) scheme is used in which the operand C is treated as one of the partial products in the partial-product compression of the mantissa multiplication, which shortens the addition latency. In the single-precision channel, simple implementation algorithms for the dot product and complex multiplication are employed to reduce area and latency. Second, an optimized mantissa multiplier structure is presented. Four 32*32 multipliers implement the mantissa multiplication for double-precision multiplication and fused multiply-add, and for single-precision multiplication, fused multiply-add, dot product, and complex multiplication. Moreover, the double-precision and single-precision channels are reused to calculate the real part and the imaginary part of a complex multiplication, respectively. Third, based on the channel-separation FMAC, the alignment shifter, mantissa addition, and normalization modules are reused to implement double-precision and SIMD single-precision floating-point addition in a 5-cycle pipeline.

(3) Comprehensive functional verification of the FMAC is performed. Golden models are written in C according to the implementation algorithms of the 12 FMAC instructions, and they simulate the hardware execution under 4 rounding modes. These golden models are used for result comparison in functional verification and as the reference model in formal verification from module level to system level. Finally, the coverage of the FMAC is analyzed...
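The abstract says that four 32*32 multipliers implement the mantissa multiplication. The following C sketch shows only the arithmetic idea behind that statement: a wide mantissa product assembled from four 32x32 partial products. The type `u128_t` and the function names are hypothetical and do not come from the thesis. The sketch does not model the thesis's partial-product compression tree, its treatment of operand C, or its golden models.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a 128-bit product held as two 64-bit halves. */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} u128_t;

/*
 * Multiply two operands of up to 64 bits using only four 32x32 -> 64-bit
 * products. A 53-bit double-precision mantissa fits in such an operand,
 * so its 106-bit product can be formed this way. This is an assumption
 * about the general technique, not the thesis's circuit.
 */
static u128_t mul64_from_four_32x32(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t p_ll = a_lo * b_lo; /* weight 2^0  */
    uint64_t p_lh = a_lo * b_hi; /* weight 2^32 */
    uint64_t p_hl = a_hi * b_lo; /* weight 2^32 */
    uint64_t p_hh = a_hi * b_hi; /* weight 2^64 */

    /* Sum the 2^32 column; the result is below 2^34, so it cannot overflow. */
    uint64_t mid = (p_ll >> 32) + (uint32_t)p_lh + (uint32_t)p_hl;

    u128_t r;
    r.lo = (mid << 32) | (uint32_t)p_ll;
    r.hi = p_hh + (p_lh >> 32) + (p_hl >> 32) + (mid >> 32);
    return r;
}

/* Small self-check: square the largest 53-bit mantissa, 2^53 - 1. */
static int fmac_sketch_self_check(void)
{
    const uint64_t m = (UINT64_C(1) << 53) - 1;
    u128_t r = mul64_from_four_32x32(m, m);

    /* (2^53 - 1)^2 = 2^106 - 2^54 + 1 */
    assert(r.hi == (UINT64_C(1) << 42) - 1);
    assert(r.lo == UINT64_C(0xFFC0000000000001));
    return 1;
}
```

The same four partial products are independent of one another. This is one plausible reason a set of four 32x32 multipliers could also serve operations that need several separate single-precision mantissa products, such as a dot product or a complex multiplication.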

Keywords/Search Tags:

Channel Separation Structure, Floating-point Multiply Add Fused Unit(FMAC), Channel Multiplexing, Complex Multiplication, Golden Model, Functional Verification

Related items

1	Research And Realization Of The 128-bit Floating-Point Multiply-Add Fused Unit
2	The Design And Implement Of Floating-point Fused-multiply-add Unit For High-performance Microprocessor
3	Design Optimization And Verification Of Floating Point Units Based On BOOM
4	The Design And Implementation Of Floating Point Unit Based On ARMv7 Floating Point Instruction Set
5	Research On Floating-point Multiply-add Fused Units And The Algorithm Based On FPGA
6	Research On Floating Point Multiply Add Unit Of High Performance Microprocessor
7	The Research And Implementation Of High Performance Vector FMAC Unit For LTE
8	Research And Optimization On Low Power Floating Point Multiply ADD Fused Unit
9	The Design And Implementation Of Multiple-precision Floating-point Multiply-Add Fused Unit
10	The Research And Development Of The High Performance Floating-Point Multiply-Add-Fused Unit