The Research And Development Of The High Performance Floating-Point Multiply-Add-Fused Unit

Posted on: 2007-12-05
Degree: Master
Type: Thesis
Country: China
Candidate: E K Mao
Full Text: PDF
GTID: 2178360215970163
Subject: Software engineering
Abstract/Summary:
The algorithms for floating-point Multiply-Add-Fused (MAF) operations are very complex, so a circuit implementation has especially long latency and a large gate count. This makes the MAF unit a bottleneck in high-performance processors, and MAF research has become one of the challenges in designing high-frequency processors. A high-performance MAF unit that has been thoroughly verified for functional correctness, and that is reusable with independent intellectual property, is significant for improving performance and shortening the design cycle.

Building on the reduced-latency double-precision floating-point MAF, this thesis studies the following aspects in depth, with the goals of a high clock frequency, a small area, and compliance with IEEE Std 754-1985 without interrupts or software assistance:

1. Algorithms: the reduced-latency MAF algorithms are improved to support denormal numbers, and a method for rounding signed integers is proposed.
2. Architecture: the unit is fully pipelined in eight stages. A 64-bit multiplier, a sign detector, and a Leading-Zero-Anticipator (LZA) are designed. A simplified rounding circuit and an architecture that supports denormal numbers are proposed.
3. By sharing hardware, the proposed MAF implements floating-point operations including multiply-add, normalization, and conversion between floating-point and integer formats, among others. In addition, a new instruction is implemented to extract the fraction of a floating-point number.
4. The design has passed tests including the IEEE CC754 test vectors, special operands and boundary conditions for each instruction, random data, transcendental functions, and system programs. The consistency of the behavioral-level description with the RTL description has also been verified.
5. The critical paths of the MAF, such as the 64-bit multiplier, alignment shifter, LZA, and sign detection, are optimized by full-custom design.

The soft IP core has been completed, including the behavioral-level model, test-vector sets with high error coverage, the instruction set description, and high-performance synthesizable code. Synthesis results show that the clock frequency of the design exceeds 500 MHz. After the mantissa path is optimized by full-custom design, the latency of the MAF is 40% lower than that of the semi-custom design, and the clock frequency can reach 700 MHz.
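The defining property of a fused multiply-add, as opposed to a separate multiply followed by an add, is that the product is kept at full width and the result is rounded only once. The short Python sketch below illustrates that difference in software. It is not the hardware algorithm developed in the thesis: it assumes finite double-precision inputs and round-to-nearest-even, uses exact rational arithmetic in place of the wide datapath, and the names `fma_reference` and `mul_then_add` are hypothetical.

```python
from fractions import Fraction


def fma_reference(a: float, b: float, c: float) -> float:
    """Software model of a fused multiply-add (illustrative only).

    The product and sum are formed exactly, then rounded once to double
    precision. Assumes finite inputs; NaN, infinities, the sign of a
    zero result, and overflow handling are not modelled here.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # no intermediate rounding
    return float(exact)  # single, correctly rounded conversion


def mul_then_add(a: float, b: float, c: float) -> float:
    """Unfused version: the product is rounded before the addition."""
    return a * b + c


# Operands chosen so that the exact product is 1 - 2**-104,
# which rounds to 1.0 when the multiply is rounded on its own.
a = 1.0 + 2.0**-52
b = 1.0 - 2.0**-52
c = -1.0

print(mul_then_add(a, b, c))   # 0.0: the low-order term was rounded away
print(fma_reference(a, b, c))  # exactly -2**-104: the term survives
```

With these operands the unfused computation cancels to zero, while the single-rounding model keeps the low-order term of the product. A reference model of this kind could serve as a checker for directed and random test vectors, although the behavioral model used in the thesis may be built quite differently.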
Keywords/Search Tags:Multiply-Add-Fused, Leading-Zero-Anticipator, Normalization, Sticky-bit, Signed-Detector