The design of an IC half precision floating point arithmetic logic unit

Posted on:2010-06-16

Degree:M.S

Type:Thesis

University:Clemson University

Candidate:Kannan, Balaji Navalpakkam

Full Text:PDF

GTID:2448390002985977

Subject:Engineering

Abstract/Summary:

A 16-bit floating point (FP) Arithmetic Logic Unit (ALU) was designed and implemented in 0.35 µm CMOS technology. Typical uses of the 16-bit FP ALU include graphics processors and embedded multimedia applications.

The ALUs of modern microprocessors use a fused multiply-add (FMA) design technique. An advantage of the FMA is that it removes the need for the comparator required by a normal FP adder. The FMA consists of a multiplier, shifters, adders and a rounding circuit. A fast multiplier based on the Wallace tree configuration was designed. The number of partial products was greatly reduced by the use of a modified Booth encoder. The Wallace tree was chosen to reduce the number of reduction layers of partial products. The multiplier also involved the design of a pass-transistor-based 4:2 compressor. The average delay of the pass-transistor-based compressor was 55 ps, and it was found to be 7 times faster than the full-adder-based 4:2 compressor. The shifters consist of separate left and right shifters built from multiplexers. The shift amount is calculated from the exponents of the three operands.

The addition operation is implemented using a carry-skip adder (CSK). The average delay of the CSK was 1.05 ns, about 400 ps slower than the carry look-ahead adder. The advantages of the CSK are reduced power, gate count and area compared with a similarly sized carry look-ahead adder. The adder computes the sum of the multiplier result and the shifted value of the addend.

In most modern computers, division is performed in software, thereby eliminating the need for a separate hardware unit. Here, the FMA hardware unit was used to perform FP division. The FP divider uses the Newton-Raphson algorithm to solve division by iteration. The initial approximation, accurate to five bits, was assumed to be pre-stored in cache memory, and a separate clock cycle for the cache read was assumed before the start of the FP division operation.
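The Newton-Raphson division described above can be sketched in a few lines of Python. This is an illustration only, not the thesis's circuit: the function names, the use of Python floats, the way the seed is generated (standing in for the pre-stored five-bit value), and the choice of two iterations are all assumptions. The recurrence itself, x ← x·(2 − d·x), is the standard Newton-Raphson step for a reciprocal, and each step maps naturally onto multiply-add hardware.

```python
def reciprocal_seed(d, bits=5):
    """Hypothetical stand-in for the pre-stored initial approximation.

    Returns 1/d rounded to `bits` fractional bits. Assumes d is a
    normalized significand in [1, 2), so the relative error of the
    seed is below 2**-bits.
    """
    scale = 1 << bits
    return round((1.0 / d) * scale) / scale


def nr_divide(a, d, iterations=2):
    """Approximate a / d by Newton-Raphson iteration on the reciprocal.

    Each pass computes e = 2 - d*x (a multiply-add) and then x = x*e
    (a multiply). The relative error is squared on every pass, so a
    seed good to about 5 bits yields about 10 and then about 20 bits.
    """
    x = reciprocal_seed(d)
    for _ in range(iterations):
        e = 2.0 - d * x   # multiply-add step
        x = x * e         # multiply step
    return a * x          # final multiply by the dividend


if __name__ == "__main__":
    for a, d in [(1.0, 1.5), (1.75, 1.25), (1.0, 1.9990234375)]:
        print(a, d, nr_divide(a, d), a / d)
```

Because the error is squared on each pass, two passes from a five-bit seed comfortably exceed the 11-bit significand precision of the half-precision format in this sketch; the number of iterations used in the actual design is not stated in the abstract.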
To reduce the area of the design significantly, only one multiplier was used. Round-to-nearest was implemented using an 11-bit variable CSK adder; it was judged the best of the rounding techniques compared. In both the FMA and division, rounding was performed after the computation of the final result, during the last clock cycle of the operation.

Testability analysis was performed for the multiplier, which is the most complex and critical part of the FP ALU. The specific aim of the testability analysis was to ensure the correct operation of the multiplier and thus guarantee the correctness of the FMA circuit at the layout stage. The multiplier's output was tested by identifying the minimal number of input vectors that toggle the inputs of the multiplier's 4:2 compressors. The test vectors were identified in a semi-automated manner using the Perl scripting language. The multiplier was tested with a test set of thirty-one vectors. The fault coverage of the multiplier was found to be 90.09%.

The layout was implemented using the IC Station tool from Mentor Graphics and resulted in a chip area of 1.96 mm². The specifications for the basic arithmetic operations were met successfully. The FP division operation was completed within six clock cycles. The other arithmetic operations (FMA, FP addition, FP subtraction and FP multiplication) were completed within three clock cycles.
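As a rough illustration of the round-to-nearest step, the sketch below rounds an integer significand to 11 bits by discarding low-order bits and conditionally adding one, which is the increment a small adder would perform. The function names are hypothetical, and the tie-breaking rule (ties to even, the IEEE 754 default) is an assumption; the abstract does not say how ties are resolved or how the circuit handles the carry-out.

```python
def round_nearest_even(sig, drop):
    """Discard the `drop` low bits of integer `sig`, rounding to nearest.

    Ties are resolved toward an even result (assumed, not taken from
    the thesis). `drop` must be at least 1.
    """
    kept = sig >> drop                 # bits that survive
    rem = sig & ((1 << drop) - 1)      # bits that are discarded
    half = 1 << (drop - 1)             # exactly halfway between neighbours
    if rem > half or (rem == half and (kept & 1)):
        kept += 1                      # the adder's increment
    return kept


def round_significand(sig, drop, width=11):
    """Round to `width` bits and report any exponent adjustment.

    If the increment carries out of the top bit (all ones rounding up),
    the result is shifted right once and the exponent must grow by one.
    """
    kept = round_nearest_even(sig, drop)
    if kept >> width:                  # carry-out past `width` bits
        return kept >> 1, 1
    return kept, 0


if __name__ == "__main__":
    print(round_nearest_even(0b10110, 2))          # tie, odd neighbour
    print(round_nearest_even(0b10010, 2))          # tie, even neighbour
    print(round_significand(0b111111111111, 1))    # carry-out case
```

The carry-out case shows why rounding cannot be treated as a pure truncation: an all-ones significand that rounds up needs one more normalization shift and an exponent increment.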

Keywords/Search Tags:

Arithmetic, FMA, Unit, ALU, FP division, Multiplier, Operation, Clock

Related items

1	Clock multiplier unit and clock data recovery circuit for 10 Gb/s broadband communication in 0.18 µm CMOS
2	Design And Implementation Of The Processor Integer Arithmetic Unit Based On PowerPC
3	The Full-Customed Design And Optimization Of Arithmetic Unit On High Performance DSP
4	Research On Arithmetic Unit And Key Technology In Residue Number System
5	Fixed Point Division Device And Vector ALU & Shifter Design
6	Design Of Majority Logic(ML) Based Arithmetic Operation Unit
7	Research On Key Technologies Of VLSI Implementation Of Adaptive Filtering Algorithm
8	A Large Width Arithmetic Multiplier Design With 256-bit Based On Toom-Cook-4 Algorithm
9	Design And Implementation Of Double-precision 64-bit Floating-point Division Operation Unit
10	Key Technologies Of VLSI Implementation Of High Performance Floating-point Arithmetic Unit