Key Technologies Of VLSI Implementation Of High Performance Floating-point Arithmetic Unit

Posted on:2017-03-18

Degree:Doctor

Type:Dissertation

Country:China

Candidate:D Liu

Full Text:PDF

GTID:1318330536981055

Subject:Microelectronics and Solid State Electronics

Abstract/Summary:

Arithmetic operations fall into two categories: fixed-point and floating-point arithmetic. The latter represents numbers with higher precision over a wider range. Because most quantities in the natural world are real-valued, floating-point numbers model and approximate them more faithfully. Many CPUs and GPUs now integrate multi-precision floating-point units to support SIMD instructions; these units can perform either a single high-precision operation or several low-precision operations. Some CPU and GPU chips also integrate fused floating-point arithmetic units to reduce the accumulated error of complex operations that are composed of simple ones. By decomposing and reorganizing a complex operation at the circuit level, a fused floating-point unit not only reduces the accumulated error but also increases computational speed. Based on the IEEE floating-point standard and the basic algorithms of binary floating-point arithmetic, and using circuit latency as the evaluation criterion, this dissertation studies the floating-point adder, the multiplier, multi-precision arithmetic units, fused arithmetic units, and the divide/square-root unit. The research focuses on the architecture and circuit structure of these units.

First, the algorithm and circuit structure of the floating-point adder are studied. There are two floating-point addition algorithms: single-path and dual-path. This dissertation uses the dual-path algorithm as the basis of its adder. To optimize the circuit structure, we propose a new path-dividing method that removes the rounding logic from the close path, reducing both area and delay. In the far path, building on the injection rounding method, this dissertation proposes a novel dual-way rounding structure that further reduces the far-path latency. Compared with the basic dual-path algorithm, the proposed adder improves performance by 5%. Based on the proposed architecture, this dissertation also proposes a triple-mode floating-point adder that supports one quadruple-precision, two double-precision, or four single-precision additions/subtractions. The performance of this design is improved by 8%.

Second, the algorithm and circuit structure of the floating-point multiplier are studied. The floating-point multiplication algorithm is simpler than that of addition: the exponents of the two operands are added and their mantissas are multiplied. In floating-point arithmetic, the delay of the rounding circuit is a persistent bottleneck. This dissertation proposes a novel dual-way rounding structure and a sticky-bit prediction algorithm that computes the sticky bit in parallel with the summation of the mantissa partial products. The latency is reduced by 16%. Based on the structure of the proposed multiplier, a dual-mode floating-point multiplier is also proposed, which can perform one high-precision or two low-precision floating-point multiplications. The proposed dual-mode multiplier reduces delay by roughly 13%.

Third, the circuit structures of the floating-point fused add-subtract unit (FAS), the fused two-term dot-product unit (FDP), and the fused radix-2 butterfly unit are studied. Fused arithmetic units are generally used to carry out complex operations. (1) The proposed FAS unit simultaneously computes the sum and the difference of two floating-point operands. Its rounding circuit is similar to that of the proposed floating-point adder. The FAS unit is implemented with the dual-path algorithm, and an extra adder is introduced in the far path to compute the sum or difference of the mantissas. By sharing circuit resources, the exponent adjustment and the mantissa addition/subtraction are both completed in the far path. With these optimizations, the delay of the proposed FAS unit is reduced by 32%. (2) The FDP unit reduces rounding error and circuit delay by fusing the addition of the partial products of the two mantissa multiplications with the final addition. The proposed FDP unit computes a two-term dot product of floating-point operands and is implemented with the dual-path algorithm. A dual-way rounding structure based on the injection rounding algorithm is used to further reduce the delay. Compared with related work, the performance of the proposed FDP unit is increased by 24%. (3) A radix-2 butterfly unit, which is widely used in the FFT, is constructed from the proposed FAS and FDP units; it achieves 24% higher performance than previous work.

Finally, the algorithm and circuit structure of the floating-point division/square-root unit are studied. Division and square-root algorithms include subtraction-based SRT algorithms and multiply-add-based iterative algorithms. A division/square-root unit implemented with an SRT algorithm takes more cycles than one implemented with an iterative algorithm and converges slowly, so the unit proposed in this dissertation uses an iterative algorithm. It performs either floating-point division or square root, selected by a mode-control signal. Compared with current research, the proposed division/square-root unit reduces the cycle count by 8.8% and the latency by 7%.
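
The claim that a fused unit reduces accumulated rounding error can be illustrated numerically. The following Python sketch is not the dissertation's circuit: it only models the arithmetic effect of rounding once instead of three times in a two-term dot product. The function names `dot2_unfused` and `dot2_fused` are hypothetical, and exact rational arithmetic (`fractions.Fraction`) is an assumed stand-in for a wide internal datapath.

```python
from fractions import Fraction


def dot2_unfused(a, b, c, d):
    """a*b + c*d with ordinary double-precision operations.

    Each multiplication and the final addition round separately,
    so up to three rounding errors accumulate.
    """
    return a * b + c * d


def dot2_fused(a, b, c, d):
    """a*b + c*d evaluated exactly, then rounded once to a double.

    Fraction is only a software stand-in (an assumption of this sketch)
    for the unrounded intermediate result kept inside a fused unit.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c) * Fraction(d)
    return float(exact)  # the single, final rounding


# Operands chosen so that the exact product a*b = 1 - 2**-60 cannot be
# held in a double and is rounded to 1.0 in the unfused version.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0
d = 1.0

print(dot2_unfused(a, b, c, d))  # the small term is lost to rounding
print(dot2_fused(a, b, c, d))    # the small term survives
```

With these operands the unfused version cancels to zero, while the single-rounding version returns the exact result, because the true value happens to be representable as a double.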
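
To make the contrast between SRT and multiply-add-based iteration concrete, here is a minimal Python sketch of one well-known iterative scheme, Newton-Raphson reciprocal refinement. The abstract does not say which iteration the dissertation uses, so the choice of Newton-Raphson, the linear initial estimate, and the names `reciprocal` and `divide` are all assumptions made for illustration. The sketch also omits the final correction step that hardware needs to deliver a correctly rounded quotient.

```python
import math


def reciprocal(m, iterations=4):
    """Approximate 1/m for m in [0.5, 1) by Newton-Raphson iteration.

    Each step uses only multiply-add shaped operations, and the relative
    error is roughly squared per step (quadratic convergence), whereas a
    digit-recurrence method gains a fixed number of bits per step.
    """
    # Common linear initial estimate for m in [0.5, 1]; its relative
    # error is at most 1/17 (an assumed seed, not taken from the source).
    x = 48.0 / 17.0 - (32.0 / 17.0) * m
    for _ in range(iterations):
        r = 1.0 - m * x      # residual, one multiply-add
        x = x + x * r        # refinement, one multiply-add
    return x


def divide(n, d, iterations=4):
    """Approximate n / d for d > 0 via a refined reciprocal of d."""
    if d <= 0.0:
        raise ValueError("this sketch handles positive divisors only")
    m, e = math.frexp(d)     # d == m * 2**e with 0.5 <= m < 1
    return math.ldexp(n * reciprocal(m, iterations), -e)


for k in range(5):
    # The error shrinks very quickly as iterations are added.
    print(k, abs(reciprocal(0.75, k) - 4.0 / 3.0))
print(divide(10.0, 4.0))
```

Starting from an error of at most 1/17, four steps are enough in this sketch to reach roughly double-precision accuracy, which is the kind of fast convergence the abstract attributes to iterative algorithms.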

Keywords/Search Tags:

floating point arithmetic unit, floating point fused arithmetic unit, multi-mode floating point arithmetic unit, floating point ALU

Related items

1	The Design And Implementation Of Floating Point Unit Based On ARMv7 Floating Point Instruction Set
2	Hardware Design And Implementation Of Floating-point Instruction Based On AltiVec
3	High-performance Floating-Point Unit Design
4	Implementation Of RISC-V Floating Point Instructions Based On ShenWei Architecture
5	The Architecture And Implementation Of Arithmetic Clusters Based On Stream Applications
6	Analysis And Design Of High-performance Floating-Point Unit
7	Design Of Double Precision Floating Point Unit
8	Implementation And Optimization Of High-Performance Floating-Point Unit In X Processor
9	Research On High Performance Floating Point Unit
10	The Design Of. P1750a Floating-point Implementation Of The Components, And Achieve