
High-speed Floating Point Addition Operation Unit

Posted on: 2007-06-12
Degree: Master
Type: Thesis
Country: China
Candidate: Z P Jin
Full Text: PDF
GTID: 2208360182478664
Subject: Computer system architecture
Abstract/Summary:
A binary floating-point number can represent values with both high precision and a large range. For this reason, contemporary microprocessor designs use dedicated logic units to process floating-point operations. The FPU (Floating-Point Unit) is a principal component of graphics accelerators, DSPs (Digital Signal Processors), and high-performance computer systems. Chip area formerly limited the complexity of the FPU; however, the continuous development of semiconductor technology, the decrease in feature size, and the increase in chip area have provided a firm basis for the design and implementation of FPUs.

Hardware implementations of floating-point operations are usually slower than those of fixed-point operations because of the complexity of floating-point number systems. Meanwhile, many applications, such as scientific computation, 3D graphics, digital signal processing, and system performance evaluation programs, are floating-point computation-intensive. There is no doubt that the performance of these programs depends on the performance of the FPU. Therefore, it is necessary to design a high-performance FPU.

This thesis is supported by the National Defense Pre-Research Project of the Chinese "Tenth Five-Year Plan", "Design and Implementation of Application-Specific High Performance Microprocessor", and by the NPU Graduate Innovation Fund Project "Design and Implementation of High-Performance Floating-Point Arithmetic Unit".
In conjunction with the R&D of the "LongTeng R2" microprocessor, the author analyzes and discusses floating-point addition, which occupies an important position in high-performance floating-point arithmetic, and completes the related circuit design.

First, this thesis reviews the history of the FPU, introduces in detail the development of computer floating-point arithmetic and typical research achievements in this area worldwide, and explains the broad application of the FPU and the significance of research on floating-point arithmetic.

Second, this thesis analyzes the theory and operational procedure of floating-point adders, with particular attention to the Two-Path algorithm and the Combined Rounding Two-Path algorithm. Exploiting characteristics of floating-point addition/subtraction operations, these two algorithms parallelize the computational steps to reduce the total latency of the operation.

Third, based on a statistical analysis of floating-point operands, we illustrate the distribution of the absolute value of the exponent difference in floating-point addition/subtraction operations. Building on the idea of the Two-Path algorithm, we introduce a Triple Data Path floating-point adder architecture. Furthermore, according to the processing characteristics of each of the two data paths, this thesis presents a Variable Latency Algorithm and implements a One, Two, or Three Cycle Variable Latency Adder. These improvements target low-power applications as well as a reduction in the total latency of the operation.

Finally, as the kernel of a high-performance floating-point addition unit, a high-speed binary adder is proposed to improve the performance of floating-point addition.
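The path-selection idea behind the Two-Path algorithm can be illustrated with a minimal Python sketch. It assumes the classic textbook criterion: effective subtractions with an exponent difference of at most 1 go to a "close" path, and everything else goes to a "far" path. The names `fields` and `select_path` are hypothetical, and the sketch does not reproduce the thesis's Triple Data Path or variable-latency design, whose details are not given in this abstract.

```python
import struct


def fields(x: float):
    """Split an IEEE 754 double into (sign, biased exponent, fraction)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)


def select_path(a: float, b: float, subtract: bool = False) -> str:
    """Pick the data path for a + b (or a - b when subtract is True).

    Assumption: classic two-path criterion. Subnormals, infinities and
    NaNs are ignored in this illustration.
    """
    sign_a, exp_a, _ = fields(a)
    sign_b, exp_b, _ = fields(b)
    # The operation is an effective subtraction when the magnitudes are
    # subtracted: signs differ for an add, or signs match for a subtract.
    effective_sub = (sign_a != sign_b) != subtract
    exp_diff = abs(exp_a - exp_b)
    if effective_sub and exp_diff <= 1:
        # Massive cancellation is possible: at most a 1-bit alignment
        # shift, but a full leading-zero normalization may be required.
        return "close"
    # Large alignment shift possible, but at most a 1-bit normalization.
    return "far"
```

For example, `select_path(1.0, -1.5)` selects the close path because the signs differ and the exponents are equal, while `select_path(1024.0, -1.0)` selects the far path because the exponent difference is 10. Each path needs only one of the two expensive shifts, which is why the split shortens the critical path.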
For various CMOS technologies (0.18 μm, 0.15 μm, 0.13 μm, and 90 nm), this thesis compares the performance of three parallel prefix adders, which can be attractively fast and compact when implemented in VLSI, at different bit widths. The adder architecture best suited to deep-submicron technology is selected according to the impact of interconnect wires on adder performance. The organization and circuit design of a 64-bit high-speed binary parallel adder built in TSMC 2.5 V 0.18 μm 1P6M CMOS fabrication technology are presented. Clock-delayed domino logic is used to reduce the delay of each stage in the adder. The addition latency is no more than 668 ps, with about 4500 transistors integrated in an area of 0.13 mm².
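As a behavioral illustration of a parallel prefix adder, the sketch below models a Kogge-Stone adder in Python. Kogge-Stone is chosen here only as a well-known example; the abstract does not name the three architectures that were compared or the one that was selected. The function name `kogge_stone_add` is hypothetical, and the model says nothing about the domino circuit implementation or its timing.

```python
def kogge_stone_add(a: int, b: int, width: int = 64):
    """Add two unsigned integers with a Kogge-Stone prefix network.

    Returns (sum modulo 2**width, carry_out). Every bit position is
    processed at once with bitwise operations, mirroring the hardware.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    generate = a & b       # bit i produces a carry by itself
    propagate = a ^ b      # bit i passes an incoming carry along
    half_sum = propagate   # sum bits before the carries are applied

    # Prefix stages: after the stage with distance d, generate[i] covers
    # the group of bits i down to i - 2*d + 1. log2(width) stages total.
    dist = 1
    while dist < width:
        generate = (generate | (propagate & (generate << dist))) & mask
        propagate = (propagate & (propagate << dist)) & mask
        dist <<= 1

    # The carry into bit i is the group generate of bits i-1 down to 0.
    carries = (generate << 1) & mask
    carry_out = (generate >> (width - 1)) & 1
    return half_sum ^ carries, carry_out
```

For a 64-bit operand the loop runs six times (distances 1, 2, 4, 8, 16, 32), which is the logarithmic depth that makes prefix adders fast. As a quick check, `kogge_stone_add(255, 1, width=8)` returns `(0, 1)`: the sum wraps to zero and the carry-out is set.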
Keywords/Search Tags: Floating-Point Unit, Two-Path Algorithm, Variable Latency, Parallel Prefix Adder, Dynamic Domino Circuit