Font Size: a A A

Research On Several Frequently-used Algorithms And Their Implementation For Reconfigurable System

Posted on:2009-04-16Degree:DoctorType:Dissertation
Country:ChinaCandidate:S M MuFull Text:PDF
GTID:1118360278456591Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Hardware-based implementations of algorithms are desirable, since they can be several orders of magnitudes faster than software-based methods. Reconfigurable devices such as Field Programmable Gate Arrays are ideal candidates for this purpose, because of their high speed, low power and flexibility. With the development of integrated circuits, the density and performance of FPGA grow steadily, which makes it more widely used in applications such as digital signal processing and scientific computing.Aiming at reducing latency, optimizing resource utilization and increasing throughput, in this paper, we make research on such frequently used algorithms and their implementation methods as evaluation of elementary and compound functions, vector rotation and fast Fourier transform on reconfigurable devices. Some effective methods and optimization techniques are proposed based on approximation algorithms, table-based evaluation methods, pipelining and memory access scheduling. Our work includes:1) Research on the evaluation methods of exponential and logarithm function: First, we design a simplified circuit based on CORDIC algorithm to compute exponential function, which fuses the datapath of x and y to reduce the area cost. Second, we propose a unified exponential and logarithm function evaluation algorithm called LnE; after setting initial values and function selection signal properly, the wanted function are computed. LnE algorithm spares z datapath of conventional CORDIC algorithm and saves more than 1/3 of the area estate. Thirdly, we design and implement a fast binary logarithm calculator based on an approximation algorithm to cater the applications with requirement of low accuracy but high speed, with simple and effective correcting circuits to improve accuracy.2) Propose an effective table-based multi-function evaluation algorithm, which uses minimax quadratic approximation on each segment. It chooses different segmentation techniques to tradeoff segment address decoding complexity and memory cost of lookup tables, and uses fixed-point and floating-point datapath to compute nearly linear and highly non-linear functions respectively. Through a three-pass iterative approximation process, the algorithm takes into account the effect of rounding the polynomial coefficients to a finite size, allowing for a further reduction in the size of lookup tables to be used. Through accuracy controlling and intermediate results truncation, the area and delay of the system are reduced. Finally a multi-function system is realized by skillfully allocating memory space to different lookup tables.3)There are two types of vector rotation applications, one with random and unknown rotation angles, and the other with pre-known and limited rotation angles. Based on such two kinds of applications, we propose 2S-PCS and FFT CORDIC algorithms to reduce iteration numbers and area occupation. Making use of lookup tables to support computing, these two algorithms reduce the iteration number and computing delay greatly while not compromising the ease of scale factor computing and compensation as that in conventional CORDIC algorithm. When using 28-bit datapath, 2S-PCS algorithm requires 38% less pipeline stages and 27.9% less area consumption compared to those of conventional CORDIC algorithm, while achieving 3 more binary bits accuracy, which shows great performance superiority.We still make some improvements on conventional CORDIC algorithm based on two special FFT applications.4) Based on radix-2 time decimation FFT algorithms, we design a variable-length fixed-point FFT processor based on cascaded butterflies. Furthermore, we analyze the read-after-write dependency in floating-point FFT processors, give some proposals to reduce it, and realize a 32-bit IEEE 754 single-precision FFT processor with modified structure of butterfly units and optimized RAM access, which makes it possible to read and write two complex operands simultaneously per cycle, and double the throughput of traditional FFT processors.It should be mentioned that the implementation methods of algorithms discussed in this paper are general and not limited to reconfigurable hardware; they can also be used to guide ASIC design by making some modifications. It's desirable to integrate these algorithms into larger applications to achieve higher acceleration ratios. If just implementing these algorithms alone in hardware, we should also consider the effect of communication overhead between hardware and software.
Keywords/Search Tags:FPGA, Elementary function, CORDIC algorithm, Function evaluation, Lookup table, Polynomial approximation, Vector rotation, Fast Fourier transform
PDF Full Text Request
Related items