Compute Efficient Embedded Processors

Posted on:2013-02-06

Degree:Ph.D.

Type:Dissertation

University:The University of Wisconsin - Madison

Candidate:Gilani, Syed Zohaib

GTID:1458390008970649

Subject:Engineering

Abstract/Summary:

Emerging embedded computing applications are becoming increasingly compute intensive and require high-performance processors. However, embedded processors are typically battery-powered, with limited power and energy budgets. This dissertation focuses on improving the power efficiency of modern embedded processors. For floating-point (FP) intensive applications, it proposes a novel FP fused multiply-add (FMA) design and a low-overhead approach to FP hardware using virtual floating-point units (Virtual-FPUs). The proposed approaches improve the performance, accuracy and power efficiency of low-power embedded processors.

Modern embedded architectures integrate graphics processing units (GPUs) with embedded processors. These integrated GPUs can be used to accelerate general-purpose GPU (GPGPU) applications. This dissertation proposes a novel compiler-directed data-forwarding approach that can significantly improve the performance of GPGPU applications without the high power overhead of traditional data-forwarding networks (DFNs). The same approach is also used to reduce GPU power consumption: the voltage of the execution units is lowered without increasing the read-after-write (RAW) time of a large percentage of instructions. This allows a significant reduction in GPU power consumption with negligible performance impact.

This dissertation also proposes to improve the performance of integer applications by efficiently utilizing the FP execution units in GPUs. This allows considerable energy and performance improvements for GPGPU applications. Further improvements in performance and power efficiency are achieved by exploiting computational redundancy within a set of co-issued threads in GPUs. This redundancy exists whenever the operand values are identical for all co-issued threads, so that every thread produces the same result.

Finally, to efficiently utilize the register file and execution bandwidth in GPUs, this dissertation proposes a sliced GPU architecture that considerably increases instruction throughput for instructions whose operands require 16 or fewer bits for accurate representation.
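
The two operand conditions named in the abstract can be illustrated with a short software sketch. This is not the dissertation's hardware design. The function names `execute_group` and `fits_in_16_bits` are hypothetical, and the 16-bit check assumes signed two's-complement integer operands, which the abstract does not specify.

```python
from typing import Callable, List, Sequence, Tuple


def execute_group(
    op: Callable[[int, int], int],
    a: Sequence[int],
    b: Sequence[int],
) -> Tuple[List[int], int]:
    """Apply op across one group of co-issued threads.

    Returns the per-thread results and the number of times op was
    actually evaluated. When every thread holds the same operand
    values, op is evaluated once and the result is shared.
    """
    if len(a) != len(b):
        raise ValueError("operand vectors must have the same length")
    if not a:
        return [], 0

    # Redundancy condition from the abstract: identical operands in all threads.
    uniform = all(x == a[0] for x in a) and all(y == b[0] for y in b)
    if uniform:
        result = op(a[0], b[0])
        return [result] * len(a), 1

    # Otherwise each thread computes its own result.
    return [op(x, y) for x, y in zip(a, b)], len(a)


def fits_in_16_bits(value: int) -> bool:
    """Assumption: signed two's-complement range, -32768 to 32767."""
    return -(1 << 15) <= value < (1 << 15)
```

For example, `execute_group(lambda x, y: x + y, [3, 3, 3, 3], [4, 4, 4, 4])` returns `([7, 7, 7, 7], 1)`, a single evaluation shared by four threads, while distinct operands fall back to one evaluation per thread.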

Keywords/Search Tags:

Embedded processors, Power, Performance, Applications

Related items

1	Study On Low-Power Technologies Of High-Performance Embedded Processors
2	Power-aware compilation techniques for high performance processors
3	Cross-layer customization for low power and high performance embedded multi-core processors
4	Automated Software Synthesis for Streaming Applications on Embedded Manycore Processors
5	System oriented delta sigma analog-to-digital modulator design for ultra high precision data acquisition applications
6	Effectiveness of SPEC CPU2006 and multimedia applications on Intel's single, dual and quad core processors
7	Performance And Power Prediction Models On Multi-core Processors For DVFS
8	Data-parallel digital signal processors: Algorithm mapping, architecture scaling and workload adaptation
9	Analog circuit design for embedded and high performance processors in nanoscale technologies
10	Performance enhancing software loop transformations for embedded VLIW/EPIC processors