Computational limits of VLIW architectures for digital signal processing transforms

Posted on:2004-05-17

Degree:Ph.D

Type:Dissertation

University:The University of Texas at Dallas

Candidate:Sankaran, Jagadeesh

GTID:1468390011473775

Subject:Engineering

Abstract/Summary:

Many DSP silicon providers have rapidly adopted the VLIW architectural style, abandoning their previous-generation devices, because of its ability to exploit the instruction-level parallelism (ILP) found in computational problems. The introduction of instructions that act on multiple data (SIMD) allows VLIW architectures to take advantage of data-level parallelism (DLP) as well. VLIW architectures have also been accepted by the microprocessor community, based on their promise of flexibility in sustaining high levels of performance even on general-purpose processing. However, realizing the true levels of performance that VLIW architectures offer depends critically on software optimization of the algorithms used. Current research on VLIW architectures is trying to further enhance the available performance levels by: (a) enhancing the clock rate of the device, by adopting new process technology nodes for fabrication; (b) using multiple cores, in order to expose task-level parallelism in addition to data-level parallelism; (c) enhancing the performance of single cores through the addition of more powerful instructions to the instruction set architecture, and of hardware accelerators where appropriate.

Irrespective of which approach or combination of approaches wins in the end, there is still an essential need to maximize the performance achievable within a single core. This dissertation presents answers to the following questions, which will aid in achieving optimal performance using VLIW cores for a given load-store bandwidth: (a) What bottlenecks limit the computational performance of VLIW architectures on DSP transforms? (b) What is the additional boost in performance offered by combining the SIMD and VLIW architectural styles? (c) What are the preferred software techniques to achieve optimal performance on these architectures? (d) How much of the performance can be obtained from high-level languages? (e) What additional performance can be gained by hand-optimized coding? (f) What new levels of performance can be gained from new instructions?

The answers to these questions are provided in this dissertation by an in-depth analysis of optimal implementations of DSP algorithms using the C62x (VLIW) and C64x (VLIW+SIMD) instruction set architectures. The study is performed by analyzing the performance of DSP algorithms in the time (convolution), frequency (FFT), and spatial (DWT) domains. Time- and frequency-domain methods form the bulk of existing DSP algorithms. (Abstract shortened by UMI.)...
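
As a rough illustration of what "optimal performance for a given load-store bandwidth" can mean for a time-domain kernel, the sketch below computes a simple resource-bound lower limit on cycles for a direct-form FIR convolution. It is not taken from the dissertation: the function name, its parameters, and the numbers used are hypothetical, and the model ignores pipeline latency, loop overhead, and register pressure.

```python
from math import ceil


def fir_cycle_lower_bound(num_outputs, num_taps, macs_per_cycle,
                          loads_per_cycle, reuse=1):
    """Resource-bound lower limit on cycles for a direct-form FIR filter.

    All parameters are illustrative assumptions, not figures for any
    real device:
      macs_per_cycle  -- multiply-accumulates the core can issue per cycle
      loads_per_cycle -- operands the load units can deliver per cycle
      reuse           -- average number of MACs fed by each loaded operand
                         (register reuse, or several values per wide load)
    Returns (bound, compute_bound, memory_bound) in cycles.
    """
    macs = num_outputs * num_taps
    # Each MAC consumes one sample and one coefficient; reuse reduces
    # how many of those operands actually have to be loaded.
    loads = ceil(2 * macs / reuse)
    compute_bound = ceil(macs / macs_per_cycle)
    memory_bound = ceil(loads / loads_per_cycle)
    # The kernel can run no faster than its most contended resource.
    return max(compute_bound, memory_bound), compute_bound, memory_bound


if __name__ == "__main__":
    # Hypothetical machine: 2 MACs and 2 operand loads per cycle.
    print(fir_cycle_lower_bound(100, 32, 2, 2))           # no operand reuse
    print(fir_cycle_lower_bound(100, 32, 2, 2, reuse=4))  # 4 MACs per load
```

In this toy model, the kernel without operand reuse is limited by load bandwidth rather than by the multipliers, and raising `reuse` moves the limit back to the compute units. This is the kind of bottleneck shift that question (a) asks about.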

Keywords/Search Tags:

VLIW, DSP, Performance, Level parallelism, Computational

Related items

1	VLIW processors: Efficiently exploiting instruction level parallelism
2	Branch optimizations and instruction-level parallelism exploitation for dynamic superscalar and VLIW processors
3	Complementary compiler and architecture features for embedded VLIW processors
4	Investigation On Basic Block Scheduling Optimization For Predicate Execution VLIW DSP
5	The Research And Implementation Of Key Techniques On Block Cipher ASIP
6	Macroblock Level Parallel Implementation And Its Scheduling Optimization Strategy For H.264 Decoders
7	Research On Memory-level Parallelism For Multi-core Microprocessor Chip
8	Modeling Memory-level Parallelism Of Cache Analytically
9	Research On BGP Parallelism Technologies For Multicore And Multi-threading
10	Parallel Algorithm Design And Optimization For H.264 Video Encoding