Research On High Performance Floating Point Unit

Posted on:2011-01-01

Degree:Master

Type:Thesis

Country:China

Candidate:K W Fu

Full Text:PDF

GTID:2178360302483145

Subject:Circuits and Systems

Abstract/Summary:

As the floating-point computational complexity of embedded applications has grown, a high-performance, low-power floating-point unit (FPU) has become a key component of embedded processors. This thesis focuses on the architecture design of the floating-point arithmetic unit and analyzes key techniques for improving the FPU's computing performance while reducing hardware cost and power dissipation. The original contributions of this thesis are as follows:

1. Floating-point SIMD instruction set extension and an efficient resource-reusing hardware framework. A floating-point SIMD instruction set is designed for parallel processing applications, and a hardware reuse technique is proposed for the design of the floating-point SIMD execution unit. In this technique, the data path for double-precision instructions is divided into two independent ways, a higher way and a lower way. With a small amount of added control logic, SIMD instructions can efficiently reuse both ways of the double-precision data path. The FPU's performance is significantly improved at the cost of little extra control logic.

2. Unified SRT algorithm for floating-point division and square root. A strategy is proposed for choosing the key parameters of the SRT algorithm, which strongly affect the algorithm's performance and hardware cost. A unified SRT selection function for both division and square root is designed by transforming the boundary value equation of square root and pre-processing the operands. To reduce the cost and power of the selection function, a technique based on constant comparison and decoding is proposed. An on-the-fly conversion mechanism is adopted for quotient/root accumulation, which replaces the complicated addition with simple shift and OR operations. To accelerate computation, a prediction-based method is proposed; the iteration time of the SRT algorithm is controlled by the prediction result.

3. Fast rounding mechanism for floating-point addition and division/square root. In the rounding logic of floating-point addition, the addition performed in the two's complement transformation of the fraction is merged with the addition performed in rounding, which reduces the latency of the rounding logic and saves hardware cost. The thesis also proposes an on-the-fly rounding mechanism for division and square root to shorten the critical path. It uses the iteration process of the SRT algorithm to generate all possible rounding results for the rounding stage without using any adders. To complete the rounding, the rounding stage only needs to select the correct value.

4. Fine-grained clock gating technique based on the operating characteristics of floating-point computation. A precision-based clock gating method is proposed: while a single-precision operation is in process, the idle lower part of the data path is shut down to avoid unnecessary power dissipation. An exception-prediction-based clock gating mechanism is also proposed. It predicts whether the current instruction will raise an exception; if so, all clocks of the data path are turned off, so no switching power is consumed. In addition, the thesis proposes a result-prediction-based clock gating technique. It predicts the computation result when the operands are zero or infinity; the whole data path is then shut down, resulting in zero dynamic power.
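The on-the-fly conversion mentioned in contribution 2 can be illustrated with a small software model. The sketch below is not taken from the thesis: it assumes a textbook radix-2 SRT division with digit set {-1, 0, 1} and selection constants of ±1/2, and it follows the standard on-the-fly conversion scheme in which two registers, Q and QM = Q - 1, are updated only by shift and OR. The function names `srt2_digits` and `on_the_fly` are hypothetical, and the thesis's actual radix and selection function are not stated in this abstract.

```python
from fractions import Fraction


def srt2_digits(x, d, n):
    """Generate n radix-2 SRT quotient digits in {-1, 0, 1}.

    Assumed preconditions (illustrative, not from the thesis):
    1/2 <= d < 1 and 0 <= x <= d, so the partial remainder stays in [-d, d].
    """
    w = Fraction(x)          # partial remainder, exact arithmetic
    d = Fraction(d)
    half = Fraction(1, 2)
    digits = []
    for _ in range(n):
        w2 = 2 * w           # shifted partial remainder
        # Selection by comparison against constants +1/2 and -1/2.
        if w2 >= half:
            q = 1
        elif w2 < -half:
            q = -1
        else:
            q = 0
        digits.append(q)
        w = w2 - q * d       # recurrence: w[j+1] = 2*w[j] - q*d
    return digits


def on_the_fly(digits, radix_bits=1):
    """Convert signed digits to a conventional integer using shift and OR only.

    q_reg holds the value so far; qm_reg always holds q_reg - 1.
    A negative digit is absorbed by appending to qm_reg instead of
    subtracting from q_reg, so no carry-propagate adder is needed.
    """
    r = 1 << radix_bits
    q_reg, qm_reg = 0, -1
    for q in digits:
        if q >= 0:
            new_q = (q_reg << radix_bits) | q
        else:
            new_q = (qm_reg << radix_bits) | (r + q)
        if q > 0:
            new_qm = (q_reg << radix_bits) | (q - 1)
        else:
            new_qm = (qm_reg << radix_bits) | (r - 1 + q)
        q_reg, qm_reg = new_q, new_qm
    return q_reg, qm_reg


if __name__ == "__main__":
    n = 12
    x, d = Fraction(1, 3), Fraction(2, 3)
    quotient, quotient_minus_1 = on_the_fly(srt2_digits(x, d, n))
    # quotient / 2**n approximates x / d to within 2**-n.
    print(quotient, quotient_minus_1, float(Fraction(quotient, 2**n)))
```

Keeping QM alongside Q is also what makes adder-free rounding plausible: candidate results one unit apart are already available when the iterations finish, so a final stage can select among them. How the thesis generates and selects its rounding candidates is not specified here.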

Keywords/Search Tags:

Floating Point Unit, Floating Point SIMD Instructions, Hardware Reused, Division and Square Root, SRT Algorithm, Selection Function, Rounding Mechanism, Clock Gating

Related items

1	Research And Implement Of Floating-point Division And Square Root Unit Based On Unified Structure
2	The Design And Implementation Of Floating Point Unit Based On ARMv7 Floating Point Instruction Set
3	Realization Of Adaptive Floating-point Multiplication, Division And Square Root Unit For Single, Double And Extended Precision
4	Design And Implementation Of High-performance Floating-point Division And Square Root
5	Research And Design Of High Precision And High-performance Floating-point Division And Square Root Unit
6	Verification Of Processor Floating-point Division/Square Root Unit Based On UVM
7	The Research And Implement Of The High Performance Floating-Point Multiply, Add Unit
8	The Research And Implementation Of The Floating-Point Divider Unit In X Processor
9	Design And Implementation Of FPU In X Microprocessor
10	Implementation Of RISC-V Floating Point Instructions Based On ShenWei Architecture