
Research And Design Of Floating-Point Accelerator

Posted on: 2014-01-09 | Degree: Master | Type: Thesis
Country: China | Candidate: J Shen | Full Text: PDF
GTID: 2248330395989057 | Subject: Circuits and Systems
Abstract/Summary:
With the development of the information society, applications such as scientific research, industrial production, artificial intelligence and 3D games place ever stronger demands on the performance of floating-point units (FPUs). A low-latency, high-throughput FPU is therefore a key component of all kinds of CPUs.

After studying the architecture and implementation of traditional floating-point adders, multipliers and fused multiply-adders, this thesis improves on the typical FPU structure using a single-cycle accumulation algorithm. An accumulation loop is added to the pipeline; it uses carry-save arithmetic and delays the final addition and normalization. The resulting accumulator can complete a floating-point add operation in a single cycle, which greatly increases the throughput of the unit when computing dot products of vectors. To meet different application requirements, the unit supports SIMD operations by reusing logic resources. It is compatible with four kinds of operands (double-precision floating-point, dual single-precision floating-point, 32-bit signed integer and dual 16-bit integer) and four kinds of operations (add, multiply, fused multiply-add and continuous multiply-accumulate). The multiplier and the leading-zero anticipation logic are optimized to reduce latency and area. Finally, operand isolation and clock gating are applied to reduce power consumption.

The verification platform is built in SystemVerilog and generates constrained-random test vectors. It checks results automatically, so the code coverage targets are met easily. The design is synthesized in the SMIC 0.13 µm process. The multiply-accumulator can work at 400 MHz, and its area is equivalent to 58.4k NAND gates. Gate-level simulation shows a dynamic power of 54.8 mW, a reduction of 24.1% due to the low-power optimizations.
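The accumulation loop described above (carry-save arithmetic with a delayed final adder) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the thesis's datapath: it works on fixed-width two's-complement integers rather than aligned floating-point mantissas, and the names `WIDTH`, `csa`, `accumulate`, `dot` and `to_signed` are hypothetical. The point it shows is that each loop iteration needs only a 3:2 compressor with no carry propagation, while the single carry-propagate addition is deferred to the end.

```python
# Software model of a carry-save accumulation loop (illustrative only).
# Assumption: operands are already aligned fixed-width integers; a real FPU
# would also handle exponent alignment, rounding and normalization.

WIDTH = 64                      # hypothetical accumulator width in bits
MASK = (1 << WIDTH) - 1


def csa(a, b, c):
    """3:2 carry-save adder: returns (s, cy) with a + b + c == s + cy (mod 2**WIDTH)."""
    s = a ^ b ^ c                                   # bitwise sum, no carry chain
    cy = ((a & b) | (a & c) | (b & c)) << 1         # majority bits, shifted left
    return s & MASK, cy & MASK


def accumulate(values):
    """Accumulate values in redundant (sum, carry) form, then resolve once."""
    s, cy = 0, 0
    for v in values:
        s, cy = csa(s, cy, v & MASK)                # one compressor step per input
    return (s + cy) & MASK                          # delayed final carry-propagate add


def to_signed(x):
    """Interpret a WIDTH-bit value as a two's-complement signed integer."""
    return x - (1 << WIDTH) if x >> (WIDTH - 1) else x


def dot(xs, ys):
    """Dot product whose running sum is kept in carry-save form."""
    return to_signed(accumulate(x * y for x, y in zip(xs, ys)))


if __name__ == "__main__":
    print(dot([1, 2, 3], [4, 5, 6]))
```

In hardware terms, the loop body corresponds to the short critical path that makes single-cycle accumulation feasible, and the final addition in `accumulate` corresponds to the delayed final adder that runs once per accumulation sequence instead of once per operand.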
Keywords/Search Tags: FPU, fused multiply-add, multiply-accumulation algorithm, single-cycle accumulation, SIMD, operand isolation, clock gating