
Study And Implementation Of High Performance Parallel Hierarchy Stream Memory System

Posted on: 2008-11-12
Degree: Master
Type: Thesis
Country: China
Candidate: Z F Dai
Full Text: PDF
GTID: 2178360242499000
Subject: Electronic Science and Technology
Abstract/Summary:
X64 is a stream processor implemented at the National University of Defense Technology. It aims to bridge the gap in performance and energy efficiency between traditional programmable processors and fixed-function processors. X64 uses a VLIW and SIMD structure to expose the special characteristics of stream applications, and tries to meet the computational requirements of scientific applications by integrating a large number of arithmetic units on a single chip. With its three-tier bandwidth memory system, X64 can exploit the parallelism and locality present in scientific applications, and thus relieves the heavy demand for off-chip memory bandwidth. Tests performed on X64 show it to be a successful implementation that provides the three required characteristics: performance, energy efficiency, and programmability.

The work in this dissertation is based on the architecture of the X64 stream processor, but puts more emphasis on optimizing and completing the stream memory system. The goal is to make X64-like stream processors suitable for a larger set of applications, including scientific computing, which is expected to be achievable by improving the bandwidth of the stream memory system. The advantages and disadvantages of X64's three-tier bandwidth memory system are analyzed, and the need for and feasibility of a more optimized memory bandwidth structure are discussed. We then design and implement an L2 cache subsystem shared by heterogeneous processors to exploit the additional parallelism and locality in stream applications and to improve off-chip memory bandwidth. The result is a more complete parallel hierarchy stream memory system. The design is verified at every design phase and at every level of integration to ensure its validity and reliability. In addition, we describe in detail the method and process of the performance tests carried out on the parallel hierarchy stream memory system; the results support the soundness and efficiency of our design.

Finally, we discuss the scalability of our stream memory system and the problem of how a memory system can sustain performance and energy efficiency in future VLSI technology, where thousands of arithmetic units on a single chip will be feasible. Furthermore, the method used to design our shared L2 cache can be a valuable reference for the study of memory system design in a heterogeneous processor environment.
Keywords/Search Tags:Parallel Hierarchy Stream Memory System, Heterogeneity Processor, Shared Level 2 Cache, Data Prefetch, Stream Program Model, Scalability