
Computational limits of VLIW architectures for digital signal processing transforms

Posted on: 2004-05-17
Degree: Ph.D
Type: Dissertation
University: The University of Texas at Dallas
Candidate: Sankaran, Jagadeesh
Full Text: PDF
GTID: 1468390011473775
Subject: Engineering
Abstract/Summary:
The VLIW architectural style has been rapidly accepted by many DSP silicon providers, who gave up on previous-generation devices, because of its ability to exploit the instruction-level parallelism (ILP) found in computational problems. The introduction of instructions that act on multiple data (SIMD) allows VLIW architectures to take advantage of data-level parallelism (DLP) as well. VLIW architectures have also been accepted by the microprocessor community, based on their promise of flexibility in sustaining high levels of performance even on general-purpose processing. However, realizing the true levels of performance that VLIW architectures have to offer depends critically on software optimization of the algorithms used. Current research on VLIW architectures is trying to further enhance the available performance levels by: (a) enhancing clock rates of the device by adopting new process technology nodes for fabrication; (b) using multiple cores, in order to expose task-level parallelism in addition to data-level parallelism; (c) enhancing the performance of single cores through the addition of more powerful instructions to the instruction set architecture, and hardware accelerators where appropriate.

Irrespective of which approach or combination of approaches wins in the end, there is still an essential need to maximize the performance achievable within a single core. This dissertation presents answers to the following questions, which will aid in achieving optimal performance using VLIW cores for a given load-store bandwidth: (a) What bottlenecks limit the computational performance of VLIW architectures on DSP transforms? (b) What is the additional boost in performance offered by combining SIMD and VLIW architectural styles? (c) What are the preferred software techniques to achieve optimal performance on these architectures? (d) How much of the performance can be obtained from high-level languages? (e) What is the additional performance that can be gained by hand-optimized coding? (f) What are the new levels of performance that can be gained by new instructions?

The answers to these questions are provided in this dissertation by an in-depth analysis of optimal implementations of DSP algorithms using the C62x (VLIW) and C64x (VLIW+SIMD) instruction set architectures. The study analyzes the performance of DSP algorithms in the time (convolution), frequency (FFT) and spatial (DWT) domains. Time- and frequency-domain methods form the bulk of existing DSP algorithms. (Abstract shortened by UMI.)
Keywords/Search Tags:VLIW, DSP, Performance, Level parallelism, Computational