Research And Design Of Floating-Point Accelerator

Posted on:2014-01-09

Degree:Master

Type:Thesis

Country:China

Candidate:J Shen

Full Text:PDF

GTID:2248330395989057

Subject:Circuits and Systems

Abstract/Summary:

With the development of the information society, applications such as scientific research, industrial production, artificial intelligence, and 3D games place ever stronger demands on the performance of floating-point units (FPUs). A low-latency, high-throughput FPU is therefore a key component of all kinds of CPUs.

After studying the architecture and implementation of traditional floating-point adders, multipliers, and fused multiply-adders, this paper improves on the typical FPU structure according to the single-cycle accumulation algorithm. An accumulation loop is added to the pipeline; it uses carry-save arithmetic and delays the final addition and normalization. The resulting accumulator can complete a floating-point add operation in a single cycle, so the throughput of the unit is much higher when computing dot products of vectors. To meet different application requirements, the unit supports SIMD operations by reusing logic resources. It is compatible with four kinds of operands (double-precision floating-point, dual single-precision floating-point, 32-bit signed integer, and dual 16-bit integer) and four kinds of operations (add, multiply, fused multiply-add, and continuous multiply-accumulate). The multiplier and the leading-zero anticipation logic are optimized to reduce latency and area. Finally, operand isolation and clock gating are applied to reduce power consumption.

The verification platform is built in SystemVerilog and generates constrained-random test vectors. It checks results automatically, which makes the code coverage targets easier to reach. The design is synthesized in the SMIC 0.13 µm process. The multiply-accumulator can work at 400 MHz, and its area is equivalent to 58.4k NAND gates. Gate-level simulation shows a dynamic power of 54.8 mW, a reduction of 24.1% due to the low-power optimizations.
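The carry-save accumulation with a delayed final adder mentioned in the abstract can be illustrated with a small integer model. This is only a minimal sketch, not the thesis design: the 64-bit width, the names `csa_step` and `accumulate`, and the restriction to plain integers are all assumptions made for illustration, and it leaves out everything specific to floating point (exponent alignment, normalization, rounding).

```python
MASK = (1 << 64) - 1  # assumed accumulator width of 64 bits (illustrative only)


def csa_step(s, c, x):
    """One 3:2 carry-save compression of (s, c, x) into a new (sum, carry) pair.

    Each output bit depends only on the input bits at the same position
    (or the one below it), so no carry ripples across the word.
    """
    new_s = s ^ c ^ x                           # bitwise sum without carries
    new_c = ((s & c) | (s & x) | (c & x)) << 1  # majority bits, shifted up
    return new_s & MASK, new_c & MASK


def accumulate(values):
    """Accumulate integers in carry-save form; do one real addition at the end."""
    s, c = 0, 0
    for x in values:
        s, c = csa_step(s, c, x & MASK)  # cheap step inside the loop
    return (s + c) & MASK                # delayed final carry-propagate add


print(accumulate([1, 2, 3, 4]))  # same value as sum([1, 2, 3, 4]) modulo 2**64
```

The point of the arrangement is that the loop body contains no carry-propagate adder, which is what lets a hardware accumulation loop close in a single cycle; the expensive addition is paid once, after the last operand.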

Keywords/Search Tags:

FPU, fused multiply-add, multiply-accumulation algorithm, single-cycle accumulation, SIMD, operand isolation, clock gating

Related items

1	The Design Of Floating-Point Multiply-Add Fused Units In General Purpose Processors
2	Research On Floating-point Multiply-add Fused Units And The Algorithm Based On FPGA
3	The Design And Implementation Of Multiple-precision Floating-point Multiply-Add Fused Unit
4	The Design And Implement Of Floating-point Fused-multiply-add Unit For High-performance Microprocessor
5	Design And Implementation Of The Low-Power DSP Multiply-Add-Fused Unit
6	The Research And Development Of The High Performance Floating-Point Multiply-Add-Fused Unit
7	Design Of 128 Bit SIMD Arithmetic Unit Based On Subword Parallel Technology
8	The Research And Implementation Of Double Data-path Fused Floating Point Multiply-Add Supporting Parallel Multiply
9	The Design And Implementation Of High-performance 64 Bit Fixed-point SIMD Multiply Accumulate For FT-XDSP
10	Soc Low Power Design Methodology Research