
The Design Of Floating-Point Multiply-Add Fused Units In General Purpose Processors

Posted on: 2006-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: X L Mei
Full Text: PDF
GTID: 2178360185496965
Subject: Computer system architecture
Abstract/Summary:
With the development of VLSI technology, the density of a single chip keeps increasing and the cost of chips keeps falling. Modern processors can therefore integrate more and more devices, adopt more complex architectures, and deliver higher performance.

Floating-point performance is an important measure of a general purpose processor. A floating-point multiply-add fused (FMAF) unit can significantly improve the floating-point performance of a microprocessor, as several recent commercial processors have shown. As a result, a high-performance general purpose processor needs an FMAF unit.

The advantages of the FMAF unit include lower latency and higher precision than separate multiply and add instructions, lower demand for bus bandwidth, reduced register file pressure, and higher instruction throughput. The FMAF unit can also perform a standalone multiply or add by setting one operand to 1 or 0. Moreover, the FMAF unit makes it possible to implement division and square root, and to compute transcendental functions, in software.

The structure of the traditional FMAF is "multiply-add-normalize-round". In contrast, the proposed FMAF structure is "multiply-normalize-add and round", which reduces the critical path delay and greatly improves floating-point performance. The disadvantage of this structure is that it needs more devices, so its hardware cost is higher than that of the traditional one.

Leading-one prediction (LOP, also called LZA, short for leading-zero anticipation) is a key piece of logic in the design of the FMAF unit. The FMAF unit needs a LOP circuit that handles 3 operands, but the traditional LOP algorithm cannot handle 3 operands directly. Applying it indirectly increases the critical path delay and enlarges the circuit area. This thesis describes the design of a 3-operand LOP logic that effectively reduces both the critical path delay and the area of the LOP circuit.

When every operation goes through the FMAF unit, multiply-add, multiply, and add operations all have the same latency. For some applications, the FMAF unit may therefore perform worse than separate FADD and FMUL units. Multiplication and addition bypass techniques avoid these cases by effectively reducing the latency of a standalone floating-point multiplication or addition executed on the FMAF unit. With these techniques, the overall floating-point performance of our CPU can be greatly improved.
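The precision advantage, the operand trick for a standalone multiply or add, and software division mentioned in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the thesis's hardware design: the function names `fused_multiply_add` and `reciprocal` are hypothetical, the fused operation is emulated with exact rational arithmetic followed by a single rounding, and only finite inputs are handled.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulate a*b + c with a single rounding (finite inputs only)."""
    # Fraction arithmetic is exact; float() then rounds once to the nearest double.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def reciprocal(d: float, guess: float, steps: int = 6) -> float:
    """Newton-Raphson refinement of 1/d built only from fused multiply-adds."""
    r = guess
    for _ in range(steps):
        e = fused_multiply_add(-d, r, 1.0)  # residual e = 1 - d*r
        r = fused_multiply_add(r, e, r)     # corrected estimate r + r*e
    return r


# Precision: the exact product is 1 - 2**-60, which a separate multiply rounds to 1.0.
a, b, c = 1.0 + 2.0**-30, 1.0 - 2.0**-30, -1.0
separate = a * b + c                 # product is rounded before the add
fused = fused_multiply_add(a, b, c)  # rounded once, at the end

# Standalone operations: set the addend to 0 for a multiply, a multiplier to 1 for an add.
product = fused_multiply_add(1.5, 2.25, 0.0)
total = fused_multiply_add(1.5, 1.0, 2.25)

# Software division: refine a rough guess for 1/3.
one_third = reciprocal(3.0, 0.3)
```

In this model the separately rounded result loses the small term entirely, while the fused result keeps it. The reciprocal loop shows why a fused unit suits software division: the residual `1 - d*r` is computed without an intermediate rounding, so each step can keep correcting the estimate.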
Keywords/Search Tags:Multiply-Add Fused, Multiply Accumulate, Leading-One Prediction, 3-operand Leading-One Prediction, Multiplication Bypass, Addition Bypass, Double-Datapath