Dataflow-like Driven Tiled Processor Architecture

Posted on: 2010-10-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Cong
Full Text: PDF
GTID: 1118360302471485
Subject: Computer system architecture
Abstract/Summary:
The development of the traditional program execution model and processor architecture is constrained by problems that arise in nanometer-scale designs, such as power dissipation, wire delay, and design complexity. Tiled processor architecture is a potential solution to these challenges. It organizes computation, storage, and interconnect resources into basic tiles, which are relatively simple, distributed, and reusable. A high-productivity processor could be composed of many such tiles, interconnected by efficient and scalable on-chip networks. In a tiled architecture, the design of a single tile could be much less complicated and the wire delay of long interconnects could be eliminated, so the plentiful, cheap transistors provided by Moore's Law could be fully used to achieve higher performance. However, tiled architecture is still at an early stage of study, and several key issues remain to be investigated.

This dissertation explores tiled processor architecture from the perspectives of both execution model and architecture. The major research contributions are as follows. (1) Based on a study of the theory of dataflow-like computation models, a novel dataflow-like driven program execution model suited to tiled processor architecture is proposed. In this model, a sequential program is partitioned into a series of VLIBs (Very Long Instruction Blocks), each of which may contain tens or hundreds of instructions and forms the atomic unit of instruction fetch, execution, and commit. Within a VLIB, instructions are related in dataflow style: the dataflow graph serves as the machine language and exposes parallelism to the hardware explicitly, which reduces the design complexity and hardware cost of dynamic dependence detection.
Between VLIBs, control-flow style is retained, which makes effective use of the dataflow locality in programs while also exploiting thread-level parallelism. (2) The design space of dataflow-like driven architecture for tiled processors is explored in several respects, and the key factors that affect performance are analyzed. First, to improve the utilization of computational resources in a tiled processor, the feasibility of aggressive speculative execution is studied for both dataflow and control flow, and a quantitative metric for the depth of speculation is given. Second, to determine a good interconnect topology for a tiled processor, the impact of several topologies on processor performance is evaluated and analyzed. Third, to mitigate the effect of the tiled architecture and the multi-hop interconnect on memory access, the memory access behavior of many applications on a tiled processor is analyzed, and a data prefetching scheme to reduce memory access latency is proposed and studied. Fourth, a systematic analysis of the behavior of applications executed under the dataflow-like computing model is presented, from which more accurate architectural requirements are derived. (3) An optimized scheme for tiled processors is investigated, and a novel tile design is proposed that allows parallelism to be fully exploited and communication cost to be effectively reduced. The computing power of a single tile should be limited to the granularity of the potential instruction-level parallelism of applications. Meanwhile, local communication connectivity should be increased in proportion to the dataflow locality of programs, without affecting the design of the global communication network.
Our experiments show that the instruction-level parallelism required by applications can be met and that communication latency on the critical path can be effectively reduced. (4) Based on the above scheme, an optimized dataflow-like driven tiled processor architecture called TPA-PI is designed and implemented. For the TPA-PI processor, an instruction set named DISC-I is defined and implemented following the dataflow-driven execution model. The TPA-PI tile design strikes a good tradeoff among exploiting more instruction-level parallelism, the limited computing power of a single tile, and increasingly severe wire-delay constraints, and it scales well in performance, architecture, and technology. (5) Using the TPA-PI experimental platform, the effectiveness of the dataflow-like execution model and the architectural design is evaluated. The experimental results confirm the performance advantage of the dataflow-like execution model over the control-flow execution model, the validity of the proposed tile design, and the soundness of the optimized TPA-PI architecture.

Based on the work of this dissertation, the following conclusions on dataflow-like driven tiled processor architecture are drawn. First, the granularity of processor cores, the execution model, the on-chip interconnect model, and the choice of applications are the key factors that determine the performance of a dataflow-like driven tiled processor architecture. Second, the combination of a dataflow-like driven program execution model and tiled processor architecture can use massive computational resources effectively and has considerable potential to exploit both instruction-level and thread-level parallelism, meeting the requirements of applications with arbitrary characteristics.

The research and experimental results in this dissertation can provide direction for the design and optimization of tiled processor architecture.
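
As an illustration only, the block-atomic, dataflow-style execution described in contribution (1) can be sketched in Python. This is not the DISC-I instruction set or the TPA-PI implementation, neither of which is specified in the abstract. The names `Instr` and `run_block`, the string-named operands, and the "wave" scheduling are all assumptions made for this sketch: an instruction fires as soon as all of its operands are available, and only the declared block outputs are committed when the whole block has finished.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Instr:
    """Hypothetical instruction: names its operands instead of using a program order."""
    op: Callable[..., int]
    srcs: Tuple[str, ...]  # block inputs or destinations of other instructions
    dest: str              # assumed unique within a block


def run_block(
    instrs: List[Instr], inputs: Dict[str, int], outputs: List[str]
) -> Tuple[Dict[str, int], List[List[str]]]:
    """Run one block in dataflow order.

    Returns the committed outputs and the list of firing waves, where each
    wave holds the destinations of instructions that could run in parallel.
    """
    values = dict(inputs)      # block-local values, not visible outside
    pending = list(instrs)
    waves: List[List[str]] = []
    while pending:
        # An instruction is ready once every operand has been produced.
        ready = [i for i in pending if all(s in values for s in i.srcs)]
        if not ready:
            raise ValueError("block contains an unsatisfiable dependence")
        results = {i.dest: i.op(*(values[s] for s in i.srcs)) for i in ready}
        values.update(results)
        waves.append(sorted(results))
        pending = [i for i in pending if i.dest not in results]
    # Atomic commit: only the declared outputs leave the block.
    return {name: values[name] for name in outputs}, waves


# (a + b) * (c - d), listed out of order on purpose: order does not matter.
block = [
    Instr(operator.mul, ("t0", "t1"), "r"),
    Instr(operator.add, ("a", "b"), "t0"),
    Instr(operator.sub, ("c", "d"), "t1"),
]
committed, waves = run_block(block, {"a": 1, "b": 2, "c": 7, "d": 3}, ["r"])
print(committed, waves)
```

In this toy block the addition and the subtraction fire in the same wave because neither depends on the other, and the multiplication fires in the next wave. The dependences are read directly from the operand names, which is the sense in which an explicit dataflow encoding spares the hardware from detecting them dynamically.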
Keywords/Search Tags: tiled processor architecture, program execution model, dataflow-like driven, design space exploration, limit of computing model