
Research On The Design Techniques Of Synchronous Data Triggered Multi-core Architecture

Posted on: 2009-05-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M C Lai
Full Text: PDF
GTID: 1118360278956582
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of very-large-scale integration technology and ever-growing application demands, advanced multi-core architectures, rather than higher clock frequencies, have become the prevalent approach to further improving processor performance. As integrated circuit technology has advanced, multi-core processors have come into view. However, many problems remain to be solved, including the multi-core parallel architecture, on-chip communication, and a bandwidth-balanced multi-level memory system. In-depth study of these theoretical and design problems is of great theoretical and practical significance for implementing future high-performance multi-core processors.

As part of research on high-performance processors, this dissertation presents a synchronous data triggered architecture (SDTA) for multi-core processors, in which each processor element is scalable and delivers high performance while keeping a simple structure and high utilization of transistor resources. Building on this architecture, several key design techniques for the SDTA processor element are studied. A novel resource optimization approach is used to improve performance and reduce hardware cost, and a code compression method is studied in depth to address the code density problem. In addition, an accurate analytical performance analysis approach for networks on chip is developed, and an on-chip communication structure with high bandwidth, low latency, and low cost is implemented. The main contributions are as follows.

1. We propose a synchronous data triggered multi-core architecture composed of SDTA computing cores, an SDTA memory system, an on-chip communication structure, a multi-core synchronization mechanism, and so on. Each processor has a simple and flexible structure, supports both SIMD and MIMD, and achieves high performance by exploiting parallelism at different levels. The memory system includes an instruction cache, local memory, a DMA engine, and a secondary eDRAM-based cache. A network on chip serves as the on-chip communication structure, and an effective synchronization mechanism is adopted to remain compatible with the SPARC architecture.

2. We develop software and hardware utility suites for the synchronous data triggered processor element and introduce an analytical approach to cost estimation that meets the precision requirement while being flexible and efficient. We also propose a novel automated approach to exploring and designing a high-efficiency processor element. The design space is explored with a divide-and-conquer approach: a heuristic-based search is used to find optimal computing cores, and an analytical method using trace-driven simulation is applied to the overall processor element.

3. We put forward a template vertical dictionary-based program compression scheme to address the poor code density of the synchronous data triggered architecture. The scheme emphasizes three aspects: a low compression ratio, limited hardware cost, and run-time decompression. Furthermore, we develop a multi-stream parallel decompression engine and update the software utility suites. The scheme achieves an ultra-low compression ratio at the expense of little execution overhead, while saving area and power consumption.

4. We propose a novel performance analysis approach for networks on chip based on analytical router modeling. Starting from a generalized router architecture, an analytical router model using an M/G/1/N queuing system is established; it can be used to explore the communication architecture and guide application mapping. To eliminate the bottleneck during performance analysis, analytical models for the improved multi-channel structures are described, which can further guide the design of on-chip routers. Guided by the analytical results, the on-chip network micro-architecture for the multi-core processor is designed and implemented.

5. We further present a novel dynamic virtual channel architecture with a congestion awareness scheme to address low buffer utilization and eliminate various kinds of blocking. By modifying the previous high-speed router, the VLSI implementation of a router with dynamic channels is completed. The modified router can regulate its channel organization according to traffic conditions, and it provides higher throughput and lower latency with clear savings in silicon area and power consumption.

Extensive experiments are carried out. For the multimedia and signal processing domains, the optimized processor element offers high performance and low cost. The computing core is similar to the TI TMS320-C64 series DSPs, and the overall processor element clearly accelerates multimedia applications. In addition, a communication structure with low latency and high throughput is presented, and a measure for reducing hardware cost is put forward. These key techniques, which rest on a sound theoretical basis, can be directly applied to the design and implementation of future multi-core processors.
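
To make the compression terminology in contribution 3 concrete, the following is a minimal sketch of a generic dictionary-based code compressor. It is not the dissertation's template vertical scheme, whose details the abstract does not give. The function names, the one-bit flag per token, and the bit widths are all assumptions made for illustration. It also assumes the usual convention in code-compression work that the compression ratio is compressed size divided by original size, so that a lower ratio is better.

```python
from collections import Counter


def build_dictionary(instructions, max_entries):
    """Select the most frequent instruction words as dictionary entries (hypothetical helper)."""
    counts = Counter(instructions)
    return [word for word, _ in counts.most_common(max_entries)]


def compress(instructions, dictionary):
    """Replace each word found in the dictionary with its index; keep other words raw."""
    index = {word: i for i, word in enumerate(dictionary)}
    stream = []
    for word in instructions:
        if word in index:
            stream.append(("idx", index[word]))
        else:
            stream.append(("raw", word))
    return stream


def decompress(stream, dictionary):
    """Rebuild the original instruction sequence, as a run-time decompressor would."""
    return [dictionary[value] if tag == "idx" else value for tag, value in stream]


def compression_ratio(stream, word_bits, index_bits):
    """Compressed size / original size (assumed convention: lower is better).

    Each token is assumed to cost one flag bit plus either an index or a raw word.
    """
    compressed = sum(
        1 + (index_bits if tag == "idx" else word_bits) for tag, _ in stream
    )
    original = len(stream) * word_bits
    return compressed / original


# Toy program of 32-bit instruction words (values are made up for illustration).
program = [0xA, 0xB, 0xA, 0xA, 0xC, 0xB, 0xA, 0xD]
dictionary = build_dictionary(program, max_entries=2)
stream = compress(program, dictionary)
ratio = compression_ratio(stream, word_bits=32, index_bits=1)
```

In this toy case six of the eight words hit the two-entry dictionary, so the stream shrinks to 78 of the original 256 bits. A real scheme would also have to account for the storage cost of the dictionary itself.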
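
As a rough illustration of the finite-buffer queueing analysis mentioned in contribution 4, the sketch below evaluates an M/M/1/N queue. This is a deliberate simplification: the dissertation uses an M/G/1/N model with a general service-time distribution, whereas exponential service is assumed here because it has a simple closed-form stationary distribution. The function name and the parameters (arrival rate, service rate, and capacity counted in flits, including the one in service) are hypothetical.

```python
def mm1n_metrics(arrival_rate, service_rate, capacity):
    """Blocking probability, mean occupancy, and mean latency of an M/M/1/N queue.

    Simplifying assumption: exponential service times, not the general
    distribution of an M/G/1/N model. `capacity` is the maximum number of
    flits in the system, including the one being served.
    """
    rho = arrival_rate / service_rate
    # The stationary probability of k flits in the system is proportional to rho**k.
    weights = [rho ** k for k in range(capacity + 1)]
    total = sum(weights)
    probs = [w / total for w in weights]

    blocking = probs[capacity]  # an arriving flit finds the buffer full
    mean_occupancy = sum(k * p for k, p in enumerate(probs))
    accepted_rate = arrival_rate * (1.0 - blocking)
    mean_latency = mean_occupancy / accepted_rate  # Little's law on accepted traffic
    return blocking, mean_occupancy, mean_latency


# Example: a buffer holding at most 3 flits with equal arrival and service rates.
blocking, occupancy, latency = mm1n_metrics(1.0, 1.0, 3)
```

Sweeping `capacity` or `arrival_rate` in such a model gives a quick view of how buffer depth trades off against blocking and latency. That is the kind of exploration an analytical router model is meant to support.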
Keywords/Search Tags:Synchronous Data Triggered Architecture, Multi-core, Hardware Exploration, Code Compression, Performance Analysis, Network on Chip, Virtual Channel