Architectural enhancements for efficient operand transport in multimedia systems

Posted on: 2008-12-29
Degree: Ph.D
Type: Dissertation
University: Georgia Institute of Technology
Candidate: Kim, Hongkyu
Full Text: PDF
GTID: 1448390005970773
Subject: Engineering
Abstract/Summary:
Multimedia applications pose new challenges to computer architecture. Their tremendous communication demands severely burden the interconnect between functional units (FUs), which has become a bottleneck in high performance architectures. This dissertation addresses a critical challenge in multimedia processors: transporting operands efficiently among computational and storage components. It provides architectural enhancements that enable the high bandwidth, low latency communication demanded by multimedia applications.

This research analyzes multimedia workloads to characterize the communication patterns that occur in the execution of standard multimedia benchmarks. The empirical analysis indicates that most operands exhibit strong locality, which enables several optimizations of transport mechanisms, particularly in operand transport networks, storage structures, and instruction steering algorithms. It shows that an eight-entry local buffer with approximate information on operand lifetime is sufficient to suppress 81% of operand writes. In addition, chaining selected pairs of FUs based on producer-consumer information allows 50% of reads to be accessed through the shortest path.

These results guide the design and development of two efficient operand transport mechanisms: (i) a traffic-driven operand bypass network and (ii) dynamic instruction clustering.

The traffic-driven operand bypass network is designed using a novel, systematic design customization process for wide-issue architectures. It is driven by a technology model-based evaluation methodology on different execution engines, resulting in a low cost, high performance bypass network targeted for multimedia applications. This technique places microarchitectural components to exploit the operand transport patterns, reorganizes each of the bypass paths based on its traffic rate, and maps inter-instruction communication onto the local paths. The reduction in operand transport latency, combined with a faster clock cycle, achieves an instruction throughput gain of 2.9x over the broadcast bypass network at 45nm. In addition, the instruction throughput gain over a typical clustered architecture is 1.3x.

Dynamic instruction clustering groups dependent instructions into clusters during instruction execution, detects the operand lifetime, performs intra- and inter-cluster operand transport pattern analysis, and maps the clustered instructions to an efficient cluster execution unit. Two cluster execution unit implementations are explored: network ALUs and a dynamically-scheduled SIMD PE array. In the network ALUs, intermediate values within the inner loops are propagated among ALUs without distribution through global bypass buses. The reduction in operand transport latency results in a 35% IPC speedup over a conventional ILP processor. The dynamically-scheduled SIMD PE array supports DLP processing of the innermost loops in image processing applications. Data-parallel operations combined with localized operand communication produce an IPC speedup of 2.59x over a 16-way, four-clustered microarchitecture.
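To make the local-buffer idea concrete, the following is a minimal toy model, not the dissertation's mechanism. It assumes a FIFO buffer, exact (rather than approximate) lifetime information, and a hypothetical trace format in which `last_use[i]` is the index of the last instruction that reads the value produced by instruction `i`. A value evicted while still live counts as a register-file write; a value that is already dead when evicted has its write suppressed.

```python
from collections import deque


def count_register_writes(last_use, capacity=8):
    """Count register-file writes in a toy local-buffer model.

    last_use[i] is the index of the last instruction reading the value
    produced by instruction i (an assumed, illustrative trace format).
    Values still resident when the trace ends are treated as dead.
    """
    buffer = deque()  # FIFO of producer indices held in the local buffer
    writes = 0
    for now in range(len(last_use)):
        if len(buffer) == capacity:
            victim = buffer.popleft()
            # The evicted value is still needed at or after `now`,
            # so it must be written back to the register file.
            if last_use[victim] >= now:
                writes += 1
        buffer.append(now)
    return writes


if __name__ == "__main__":
    # Short-lived operands: each value is consumed by the next instruction.
    short_lived = [i + 1 for i in range(20)]
    # Long-lived operands: each value is read ten instructions later.
    long_lived = [min(i + 10, 19) for i in range(20)]
    print(count_register_writes(short_lived))
    print(count_register_writes(long_lived))
```

In this model, operands with strong locality die inside the buffer and never reach the register file, which is the effect the abstract attributes to the eight-entry buffer. The 81% figure comes from the dissertation's benchmark analysis and is not reproduced by this sketch.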
Keywords/Search Tags:Operand, Multimedia, Communication, Applications, Bypass network, Efficient, Over