
Energy and time efficient designs for digital signal processing kernels on FPGAs

Posted on: 2005-08-29
Degree: Ph.D.
Type: Dissertation
University: University of Southern California
Candidate: Choi, Seonil
Full Text: PDF
GTID: 1458390008995169
Subject: Engineering
Abstract/Summary:
Reconfigurable hardware such as FPGAs is a flexible alternative to the DSPs or ASICs used in mobile devices, for which energy is a key performance metric. Designs on reconfigurable hardware expose many design parameters, such as operating frequency, precision, amount of storage, and degree of parallelism. These parameters define a large design space that must be explored to find energy-efficient solutions. It is also challenging to predict, in the early design phases, how energy will vary when a design is modified at the algorithm level. To address these challenges, a methodology for developing energy-efficient designs on FPGAs is proposed. The methodology integrates domain-specific modeling, coarse-grained performance evaluation, design space exploration (DSE), and low-level simulation to understand the trade-offs among energy, latency, and area. The domain-specific modeling technique defines a high-level model by identifying the components and parameters specific to a domain that affect system-wide energy dissipation. A domain is a family of architectures and corresponding algorithms for a given kernel. The high-level model also includes functions for estimating energy, latency, and area, which facilitate trade-off analysis. This model is used to understand the impact of the various parameters on system-wide energy and serves as the basis for energy-efficient designs. DSE analyzes the design space defined by the domain and selects a set of designs. Low-level simulations are then used to estimate performance accurately for the designs selected by the DSE and to make the final design selection.
The modeling technique and design methodology are applied to three digital signal processing kernels: matrix multiplication, matrix factorization, and the Fast Fourier Transform. The designs identified by our methodology demonstrate trade-offs among energy, latency, and area. Our designs are compared with state-of-the-art designs to demonstrate the effectiveness of the methodology.
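
The design space exploration step described above can be illustrated with a small sketch. It is not the dissertation's model: the cost functions in `estimate`, their coefficients, and the names `Design`, `dominates`, and `explore` are all hypothetical placeholders chosen only to show the flow of enumerating parameter choices, estimating energy, latency, and area with high-level functions, and keeping the designs that are not dominated in any of the three metrics.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Design:
    num_pes: int      # degree of parallelism (number of processing elements)
    freq_mhz: float   # operating frequency
    energy_nj: float
    latency_us: float
    area: float


def estimate(n, num_pes, freq_mhz):
    """Hypothetical high-level model for an n x n matrix multiplication.

    All coefficients are made-up placeholders, not measured FPGA data.
    """
    cycles = n ** 3 / num_pes                       # n^3 multiply-accumulates shared by the PEs
    latency_us = cycles / freq_mhz                  # MHz = cycles per microsecond
    power_mw = num_pes * (0.5 + 0.02 * freq_mhz)    # assumed static + dynamic power per PE
    energy_nj = power_mw * latency_us               # mW * us = nJ
    area = num_pes * 100.0                          # assumed area units per PE
    return Design(num_pes, freq_mhz, energy_nj, latency_us, area)


def dominates(a, b):
    """True if design a is no worse than b in every metric and differs in at least one."""
    ma = (a.energy_nj, a.latency_us, a.area)
    mb = (b.energy_nj, b.latency_us, b.area)
    return all(x <= y for x, y in zip(ma, mb)) and ma != mb


def explore(n, pe_options, freq_options):
    """Enumerate the design space and return the non-dominated (Pareto) designs."""
    candidates = [estimate(n, p, f) for p, f in product(pe_options, freq_options)]
    return [d for d in candidates
            if not any(dominates(other, d) for other in candidates)]


if __name__ == "__main__":
    for d in explore(16, [1, 2, 4], [50, 100]):
        print(d)
```

In the methodology summarized above, the designs that survive this coarse filter would then be passed to low-level simulation for accurate estimates; that step is not modeled here.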
As the first kernel, matrix multiplication is considered. In well-known designs for matrix multiplication, "energy hot spots", which are responsible for most of the energy dissipation, are identified. Three new algorithms and architectures that offer trade-offs among the number of I/O ports, registers, and PEs are then proposed. Functions that represent the impact of algorithm design choices on energy, area, and latency are derived. These functions are used either to optimize energy performance or to provide trade-offs for a family of candidate algorithms and architectures.
As the second kernel, two designs for matrix factorization are proposed. The first design performs standard LU factorization. A linear array architecture is employed to minimize the use of long interconnects, leading to lower energy dissipation. The optimal latency is achieved on the linear array architecture. The second design performs block-based LU decomposition. It re-uses the linear-array-based design for LU decomposition and the design for the matrix multiplication kernel. Through analysis of the design trade-offs, the block size that minimizes the total energy is identified.
As the third kernel, energy-efficient designs for the Fast Fourier Transform are proposed. Architectural parameters such as the degrees of vertical and horizontal parallelism are identified, and a design domain is created through a combination of design choices. Design trade-off analysis is performed using a high-level performance model to obtain energy-efficient designs.
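
The way block-based LU decomposition re-uses an unblocked LU step and a matrix multiplication step can be seen in a short software sketch. This is the textbook right-looking block LU without pivoting, written in plain Python for illustration; it is not the linear array architecture from the dissertation, and the function names `lu_inplace`, `block_lu`, and `multiply_lu` are assumptions introduced here.

```python
def lu_inplace(a, lo, hi):
    """Unblocked Doolittle LU (no pivoting) of the diagonal block a[lo:hi][lo:hi].

    L (unit lower triangular) and U overwrite the block in place.
    """
    for k in range(lo, hi):
        for i in range(k + 1, hi):
            a[i][k] /= a[k][k]
            for j in range(k + 1, hi):
                a[i][j] -= a[i][k] * a[k][j]


def block_lu(a, b):
    """Right-looking block LU with block size b, in place, assuming nonzero pivots."""
    n = len(a)
    for lo in range(0, n, b):
        hi = min(lo + b, n)
        # Step 1: factor the diagonal block (the unblocked LU kernel).
        lu_inplace(a, lo, hi)
        # Step 2: U12 = L11^{-1} * A12 by forward substitution with unit-lower L11.
        for j in range(hi, n):
            for i in range(lo, hi):
                for k in range(lo, i):
                    a[i][j] -= a[i][k] * a[k][j]
        # Step 3: L21 = A21 * U11^{-1}, solved one column at a time.
        for i in range(hi, n):
            for j in range(lo, hi):
                for k in range(lo, j):
                    a[i][j] -= a[i][k] * a[k][j]
                a[i][j] /= a[j][j]
        # Step 4: trailing update A22 -= L21 * U12 (the matrix multiplication kernel).
        for i in range(hi, n):
            for j in range(hi, n):
                a[i][j] -= sum(a[i][k] * a[k][j] for k in range(lo, hi))
    return a


def multiply_lu(a):
    """Rebuild L * U from the packed in-place factors, for checking the result."""
    n = len(a)
    result = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else a[i][k]   # unit diagonal of L is implicit
                result[i][j] += l_ik * a[k][j]
    return result


if __name__ == "__main__":
    original = [[4.0, 3.0, 2.0, 1.0],
                [3.0, 4.0, 3.0, 2.0],
                [2.0, 3.0, 4.0, 3.0],
                [1.0, 2.0, 3.0, 4.0]]
    packed = block_lu([row[:] for row in original], 2)
    print(multiply_lu(packed))   # should reproduce `original` up to rounding
```

The block size `b` controls how much work falls in the unblocked LU step versus the matrix multiplication step, which is the kind of parameter the trade-off analysis above would tune for energy.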
Keywords/Search Tags:Energy, Designs, Kernel, Trade-offs, Used, Performance, Matrix multiplication, Model