
Research On High Performance Floating Point Unit

Posted on: 2011-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: K W Fu
GTID: 2178360302483145
Subject: Circuits and Systems
Abstract/Summary:
As the floating-point computational complexity of embedded applications has increased, the high-performance, low-power floating-point unit (FPU) has become a key component of embedded processors. This thesis focuses on the architecture design of the floating-point arithmetic unit and analyzes key techniques for improving the FPU's computing performance and reducing its hardware cost and power dissipation. The original contributions of this thesis are as follows:

1. Floating-point SIMD instruction set extension and an efficient resource-reusing hardware framework. A floating-point SIMD instruction set is designed for parallel processing applications, and a hardware reuse technique is proposed for the design of the floating-point SIMD execution unit. In this technique, the data path for double-precision instructions is divided into two independent ways, a higher way and a lower way. With a small amount of added control logic, SIMD instructions can efficiently reuse both ways of the double-precision data path. The FPU's performance is significantly improved at the cost of little extra control logic.

2. Unified SRT algorithm for floating-point division and square root. A strategy is proposed for choosing the SRT algorithm's key parameters, which greatly affect the algorithm's performance and hardware cost. A unified SRT selection function for both division and square root is designed by transforming the square root's boundary value equation and pre-processing the operands. To reduce the cost and power of the selection function, a technique based on constant comparison and decoding is proposed. An on-the-fly conversion mechanism is adopted for quotient/root accumulation, which replaces the complicated add operation with simple shift and OR operations. To accelerate computation, a prediction-based method is proposed in which the number of SRT iterations is controlled by the prediction result.

3. Fast rounding mechanism for floating-point addition and division/square root. In the rounding logic of floating-point addition, the addition in the fraction's two's complement conversion and the addition in rounding are merged, which reduces the latency of the rounding logic and saves hardware cost. The thesis also proposes an on-the-fly rounding mechanism for division and square root to shorten the critical path. It uses the SRT algorithm's iteration process to generate all possible rounding results for the rounding stage without using any adders, so the rounding stage only needs to select the correct value.

4. Fine-grained clock gating techniques based on the operating characteristics of floating-point computation. A precision-based clock gating method is proposed: when a single-precision operation is in progress, the idle lower parts of the data path are shut down to avoid unnecessary power dissipation. An exception-prediction-based clock gating mechanism is also proposed. It predicts whether the current instruction will raise an exception; if so, all data path clocks are turned off, so no switching power is consumed in the data path. In addition, a result-prediction-based clock gating technique is proposed. It predicts the computation result when the operands are zero or infinity, and the whole data path is then gated off, resulting in zero dynamic power in the data path.
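The on-the-fly conversion described for quotient accumulation can be illustrated with a small software model. The sketch below is not the thesis's design: it assumes a plain radix-2 SRT division with digit set {-1, 0, 1}, uses exact fractions instead of a truncated carry-save remainder, and the function name `srt2_divide` is hypothetical. It keeps two registers, `q` and `qm` (always equal to `q - 1`), so that each signed digit is absorbed with a shift and an OR, and no carry-propagating subtraction is needed.

```python
from fractions import Fraction

def srt2_divide(x, d, n_digits):
    """Radix-2 SRT division model with on-the-fly quotient conversion.

    Assumes 1/2 <= x < 1 and 1/2 <= d < 1 (normalized significands).
    Returns (q, qm, w, digits) where q is the quotient of x/2 by d scaled
    by 2**n_digits, qm == q - 1, and w is the final partial remainder.
    """
    x, d = Fraction(x), Fraction(d)
    half = Fraction(1, 2)
    if not (half <= x < 1 and half <= d < 1):
        raise ValueError("operands must lie in [1/2, 1)")

    w = x / 2          # initial remainder, guarantees |w| <= d
    q, qm = 0, -1      # invariant: qm == q - 1
    digits = []
    for _ in range(n_digits):
        shifted = 2 * w
        # Selection function (simplified: exact comparison with +-1/2).
        if shifted >= half:
            digit = 1
        elif shifted < -half:
            digit = -1
        else:
            digit = 0
        w = shifted - digit * d

        # On-the-fly conversion: only shifts and ORs, no adder.
        if digit == 1:
            q, qm = (q << 1) | 1, (q << 1)
        elif digit == 0:
            q, qm = (q << 1), (qm << 1) | 1
        else:  # digit == -1
            q, qm = (qm << 1) | 1, (qm << 1)
        digits.append(digit)
    return q, qm, w, digits

n = 12
q, qm, w, digits = srt2_divide(Fraction(3, 4), Fraction(5, 8), n)
```

In this model the register `q` always matches the value obtained by summing the signed digits with their weights, which is the property that lets hardware avoid a final conversion addition.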
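The result-prediction idea behind the last clock gating technique can also be sketched in software. The abstract does not say which operations are covered, so the example below assumes multiplication only and follows IEEE 754 rules for zero and infinity operands. The function name `predict_mul_result` is hypothetical. It returns the known result when one can be predicted, and `None` when the full data path would be needed.

```python
import math

def predict_mul_result(a, b):
    """Predict a * b without the multiplier when an operand is 0 or inf.

    Returns the predicted result, or None if the data path must run.
    NaN operands are deliberately left to the normal data path here.
    """
    if math.isnan(a) or math.isnan(b):
        return None
    a_zero, b_zero = (a == 0.0), (b == 0.0)
    a_inf, b_inf = math.isinf(a), math.isinf(b)
    if not (a_zero or b_zero or a_inf or b_inf):
        return None                      # ordinary operands: no prediction
    if (a_zero and b_inf) or (a_inf and b_zero):
        return math.nan                  # 0 * inf is an invalid operation
    # The sign of the result is the XOR of the operand signs.
    sign = math.copysign(1.0, a) * math.copysign(1.0, b)
    if a_inf or b_inf:
        return math.copysign(math.inf, sign)
    return math.copysign(0.0, sign)

def datapath_clock_enable(a, b):
    """Clock the data path only when no result can be predicted."""
    return predict_mul_result(a, b) is None
```

In a hardware implementation the equivalent check would operate on the exponent and fraction fields of the operands. The model above only shows which cases allow the data path clock to stay off.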
Keywords/Search Tags: Floating Point Unit, Floating Point SIMD Instructions, Hardware Reuse, Division and Square Root, SRT Algorithm, Selection Function, Rounding Mechanism, Clock Gating