
Compute Efficient Embedded Processors

Posted on: 2013-02-06
Degree: Ph.D.
Type: Dissertation
University: The University of Wisconsin - Madison
Candidate: Gilani, Syed Zohaib
GTID: 1458390008970649
Subject: Engineering
Abstract/Summary:
Emerging embedded computing applications are becoming increasingly compute-intensive and require high-performance processors. However, embedded processors are typically battery-powered and have limited power and energy budgets. This dissertation focuses on improving the power efficiency of modern embedded processors. For floating-point (FP) intensive applications, it proposes a novel FP fused multiply-add (FMA) design and a low-overhead approach to FP hardware using virtual floating-point units (Virtual-FPUs). These approaches improve the performance, accuracy, and power efficiency of low-power embedded processors.

Modern embedded architectures integrate graphics processing units (GPUs) with embedded processors, and these integrated GPUs can be used to accelerate general-purpose (GPGPU) applications. This dissertation proposes a novel compiler-directed data-forwarding approach that can significantly improve the performance of GPGPU applications without the high power overhead of traditional data-forwarding networks (DFNs). The same approach is also used to reduce GPU power consumption by lowering the voltage of the execution units without increasing the read-after-write (RAW) latency of a large percentage of instructions. This yields a significant reduction in GPU power consumption with negligible performance impact.

This dissertation also proposes to improve the performance of integer applications by efficiently utilizing the FP execution units in GPUs, which yields considerable energy and performance improvements for GPGPU applications. Further improvements in performance and power efficiency are achieved by exploiting computational redundancy within a set of co-issued threads in GPUs. This redundancy exists whenever the operand values for all co-issued threads are identical, so that every thread produces the same result.

Finally, to efficiently utilize the register file and execution bandwidth in GPUs, this dissertation proposes a sliced GPU architecture that considerably increases throughput for instructions whose operands require no more than 16 bits for accurate representation.
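The accuracy benefit of any fused multiply-add comes from rounding once. The expression a*b + c is evaluated as if exactly and then rounded a single time, whereas a separate multiply and add round twice.

The Python sketch below models a correctly rounded FMA with exact rational arithmetic. It assumes Python floats are IEEE 754 doubles, and the helper names `fma_model` and `unfused` are hypothetical. It illustrates the general FMA property only, not the FMA design proposed in the dissertation.

```python
from fractions import Fraction

def fma_model(a: float, b: float, c: float) -> float:
    """Correctly rounded a*b + c: the product and the sum are computed
    exactly as rationals, then rounded to the nearest double only once."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def unfused(a: float, b: float, c: float) -> float:
    """Separate multiply and add: the product is rounded before the add."""
    return a * b + c

a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0
# The exact product is 1 - 2**-60, which rounds to 1.0 as a double,
# so the unfused version loses the small term entirely.
print(unfused(a, b, c))
print(fma_model(a, b, c))  # keeps the exact residual -2**-60
```

The same single-rounding property is what makes FMA useful for error-compensated arithmetic such as dot products and polynomial evaluation.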
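A longer execution latency, for example from running the execution units at a lower voltage, only hurts an instruction that consumes a result soon after that result is produced.

The sketch below is a hypothetical compiler-style pass over one straight-line instruction stream. For each instruction, it measures the distance to the nearest producer of its sources. It then flags the instructions whose producer is closer than an assumed pipeline latency; these would need forwarding or would stall.

Several pieces are simplifying assumptions, not the dissertation's compiler algorithm:

- the `Instr` type and the function names;
- one instruction per issue slot;
- ignoring warp interleaving.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple

class Instr(NamedTuple):
    dst: Optional[str]      # destination register, or None (e.g. a store)
    srcs: Tuple[str, ...]   # source registers

def raw_distances(prog: List[Instr]) -> List[Optional[int]]:
    """Distance, in issue slots, from each instruction to the nearest earlier
    producer of one of its sources (None if no earlier producer exists)."""
    last_write: Dict[str, int] = {}
    dists: List[Optional[int]] = []
    for i, ins in enumerate(prog):
        ds = [i - last_write[s] for s in ins.srcs if s in last_write]
        dists.append(min(ds) if ds else None)
        if ins.dst is not None:
            # Record the write after reading sources, so r1 = r1 + 1
            # depends on the previous writer of r1, not on itself.
            last_write[ins.dst] = i
    return dists

def needs_forwarding(prog: List[Instr], latency: int) -> List[int]:
    """Indices of instructions whose producer is fewer than `latency` slots
    earlier; only these are exposed to a longer execution latency."""
    return [i for i, d in enumerate(raw_distances(prog))
            if d is not None and d < latency]

prog = [
    Instr("r1", ()),            # r1 = load
    Instr("r2", ("r1", "r0")),  # r2 = r1 + r0  (producer 1 slot back)
    Instr("r3", ("r0", "r0")),  # r3 = r0 * r0  (no earlier producer)
    Instr("r4", ("r3", "r2")),  # r4 = r3 + r2  (nearest producer 1 slot back)
    Instr("r5", ("r1", "r0")),  # r5 = r1 - r0  (producer 4 slots back)
]
print(raw_distances(prog))
print(needs_forwarding(prog, latency=2))
```

In this toy model, raising the assumed latency only affects instructions whose producer is nearby. Those that read long-lived values, like `r5` here, are unaffected until the latency exceeds their distance.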
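A single-precision FP unit has a 24-bit significand: 23 stored bits plus an implicit leading 1. It therefore computes integer additions and multiplications exactly as long as every operand and result has magnitude at most 2**24.

The sketch below models an FP32 unit by rounding through the 4-byte float format of Python's `struct` module. It pairs the model with a conservative operand-range check, `safe_on_fp32`, that guarantees an exact result. That check is a hypothetical policy; the abstract does not say how integer instructions are selected for the FP units.

```python
import struct

SIG_BITS = 24                 # FP32 significand: 23 stored bits + implicit 1
EXACT_LIMIT = 1 << SIG_BITS   # every integer with |v| <= 2**24 is exact in FP32

def to_fp32(x: float) -> float:
    """Round a double to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def int_op_on_fp32(op: str, x: int, y: int) -> int:
    """Model an integer add or multiply executed on an FP32 unit."""
    fx, fy = to_fp32(float(x)), to_fp32(float(y))
    if op == "add":
        r = fx + fy
    elif op == "mul":
        r = fx * fy
    else:
        raise ValueError(f"unsupported op: {op}")
    # Computing in double and rounding once to FP32 gives the correctly
    # rounded FP32 result for + and *, because 53 >= 2 * 24 + 2.
    return int(to_fp32(r))

def safe_on_fp32(op: str, x: int, y: int) -> bool:
    """Hypothetical operand check that guarantees |result| <= 2**24."""
    if op == "add":
        bound = EXACT_LIMIT >> 1        # |x|, |y| <= 2**23
    elif op == "mul":
        bound = 1 << (SIG_BITS // 2)    # |x|, |y| <= 2**12
    else:
        return False
    return abs(x) <= bound and abs(y) <= bound

print(int_op_on_fp32("mul", 4095, 4096), safe_on_fp32("mul", 4095, 4096))
print(int_op_on_fp32("add", 2**24, 1), safe_on_fp32("add", 2**24, 1))
```

The second call shows why a range check is needed. 2**24 + 1 is not representable in FP32, so the FP unit returns a rounded value, and the check correctly rejects those operands.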
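The redundancy condition stated in the abstract can be checked per instruction. If every co-issued thread supplies the same operand values, a single ALU evaluation can be broadcast to all threads.

The sketch below is a hypothetical behavioral model, not the dissertation's hardware. The 32-thread group size and the `execute_group` name are assumptions. It also reports how many ALU evaluations were spent, which is where an energy saving would come from.

```python
from typing import Callable, List, Sequence, Tuple

GROUP_SIZE = 32  # assumed number of co-issued threads (one warp)

def execute_group(op: Callable[[int, int], int],
                  a: Sequence[int], b: Sequence[int]) -> Tuple[List[int], int]:
    """Execute `op` for every thread; return (results, ALU evaluations)."""
    if len(a) != len(b):
        raise ValueError("operand vectors must have one entry per thread")
    if len(set(a)) == 1 and len(set(b)) == 1:
        # All threads hold identical operands: compute once and broadcast.
        r = op(a[0], b[0])
        return [r] * len(a), 1
    # Otherwise every thread needs its own evaluation.
    return [op(x, y) for x, y in zip(a, b)], len(a)

uniform = [7] * GROUP_SIZE
lanes = list(range(GROUP_SIZE))
results, evals = execute_group(lambda x, y: x * y, uniform, [3] * GROUP_SIZE)
print(results[:4], evals)
results, evals = execute_group(lambda x, y: x + y, lanes, uniform)
print(results[:4], evals)
```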
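One way to picture a sliced datapath is SIMD within a register: when operands fit in 16 bits, two independent 16-bit operations can share one 32-bit lane. The actual sliced GPU organization in the dissertation may differ.

The sketch below relies on three assumptions:

- operands are unsigned;
- a hypothetical conservative issue rule, `can_slice`, requires every operand of both instructions to be at most 0x7FFF, so neither 16-bit sum can overflow;
- carries are isolated between the two halves.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF
HIGH_BITS = 0x80008000  # bit 15 of each 16-bit half

def pack(hi: int, lo: int) -> int:
    """Place two 16-bit values in one 32-bit word."""
    return ((hi & MASK16) << 16) | (lo & MASK16)

def unpack(w: int) -> tuple:
    """Split a 32-bit word into its (high, low) 16-bit halves."""
    return (w >> 16) & MASK16, w & MASK16

def packed_add16(x: int, y: int) -> int:
    """Two independent 16-bit adds (mod 2**16) using one 32-bit add.
    Bit 15 of each half is cleared so no carry can cross into the other
    half; it is then restored with XOR (the sum bit without carry-out)."""
    low = (x & ~HIGH_BITS & MASK32) + (y & ~HIGH_BITS & MASK32)
    return (low ^ ((x ^ y) & HIGH_BITS)) & MASK32

def can_slice(*operands: int) -> bool:
    """Hypothetical issue rule: unsigned operands <= 0x7FFF cannot overflow
    16 bits when added, so the packed result is exact."""
    return all(0 <= v <= 0x7FFF for v in operands)

def sliced_adds(op1, op2):
    """Issue two adds in one slot when sliceable, else two full-width adds.
    Returns ((sum1, sum2), issue slots used)."""
    (a1, b1), (a2, b2) = op1, op2
    if can_slice(a1, b1, a2, b2):
        return unpack(packed_add16(pack(a1, a2), pack(b1, b2))), 1
    return (a1 + b1, a2 + b2), 2

print(sliced_adds((100, 200), (300, 400)))
print(sliced_adds((70000, 1), (2, 3)))
```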
Keywords/Search Tags: Embedded processors, Power, Performance, Applications