
Key Techniques Research Of High Productivity Stream Architecture

Posted on: 2009-05-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: N Wu
Full Text: PDF
GTID: 1118360278956524
Subject: Computer Science and Technology
Abstract/Summary:
The increasing importance of numerical applications and the properties of modern VLSI processes have led to a resurgence in the development of parallel microarchitectures with large numbers of ALUs and extensive support for parallelism. As a result, many novel architecture models have been designed. In particular, the kernel-stream model, which breaks away from the von Neumann processor architecture and originated in media processing, has proved applicable to broader application domains, including signal processing, digital graphics, cryptography, and scientific simulation. Emerging stream architectures based on the kernel-stream model are becoming a focus of microarchitecture research. Academia and industry have designed a series of typical stream processors (or prototype systems) such as Imagine, Merrimac, STORM, YHFT64-2 and MASA. In addition, processors such as CELL, Trips, RAW, Clearspeed Tiles and some 3D GPUs are compatible with the kernel-stream model or use stream accelerators. They achieve area- and energy-efficient high performance while remaining fully programmable. The author believes that stream architecture will not replace scalar architecture, but that it can become an important component of future high-performance processors.

However, as VLSI technology develops and stream applications broaden, emerging stream architectures face many challenges, such as irregular streams, overloaded registers, and the scalability of architecture and performance. The key problem is how to maintain and improve the parallel efficiency of stream architecture. This dissertation therefore focuses on techniques for high-productivity stream architecture. It studies stream architecture in depth, including the kernel-stream model, the mapping of stream applications, hardware architecture, and the stream programming model and compiler, and develops a complete research platform for stream architecture: the MASA research platform. This is a system platform developed specifically for stream architecture research and an important component of the tool chain for that research. Based on this platform, the dissertation makes the following main contributions and innovations:

1) The dissertation presents a novel dual-mode syncretic adaptive memory system (DSAM) that supports irregular stream access for an extended irregular stream model. In mapping a broader range of applications to stream architecture, we found that a complex application usually has characteristics of both regular and irregular streams. These characteristics impose requirements beyond the capturable locality scope, management granularity, and space allocation of a pure SRF, a cache, or a simple combination of the two. It is therefore essential to optimize the memory hierarchy of a stream architecture so that it supports regular and irregular streams simultaneously, as well as efficient transformation between them. DSAM's basic principle is that software manages space allocation and prefetching at the coarse granularity of the entire stream, while hardware manages runtime accesses at the fine granularity of the cache line. This provides several advantages: irregular and regular streams share space, data transfers are avoided, access modes switch seamlessly, and misses are reduced. We completed the hardware design, the programming and compiler extensions, and the evaluation. The results show that DSAM allows a broader range of irregular stream applications to execute more efficiently.

2) The dissertation presents a novel compiler optimization technique that solves the problem of overloaded distributed local register files in stream processors: spill scheduling.
It handles register allocation and scheduling for computation-intensive kernels in stream processors through three strategies: workload moving, instruction slot inserting, and basic module repartition. The technique is significant in three respects. First, for stream memory systems that do not allow memory accesses caused by register spills, spill scheduling guarantees that kernels with excessive register demand can be scheduled successfully, without the performance loss caused by changing the program. Second, for stream memory systems that do allow such accesses, it reduces the memory access stalls caused by register spilling and improves kernel performance. Third, for processor designers, the technique can reduce the number of registers needed in each register file, which is especially useful in embedded processors.

3) The dissertation presents a novel Tiled Multi-Core Stream Architecture (TiSA) that supports multiple levels of parallelism and scales well. As demand for processor performance grows and the application domain of the kernel-stream model broadens, stream applications and algorithms are becoming more complex. They require a stream architecture whose performance and resources scale well. Therefore, in an era of more than one billion transistors and 1000 ALUs on one chip, architectural innovation is necessary to preserve the performance and cost efficiency of stream architecture.
TiSA introduces the macro-Tile, which consists of multiple stream cores, as a new category of architectural resource, and provides an on-chip network to support stream transfers among Tiles. In addition, the stream programming model and execution mode are extended. Multiple levels of parallelism, including ILP, DLP and TLP, are exploited at different granularities of processing elements. This work covers the hardware microarchitecture design, the extension of the stream programming and compiler model, and the construction of a prototype system. Finally, the hardware overhead is analyzed using a scaling cost model, and TiSA's performance is evaluated on several applications. The results show that TiSA is a VLSI- and performance-efficient architecture for the billion-transistor era.
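
The dual-granularity principle stated for DSAM in contribution 1 (software manages whole streams, hardware manages individual cache lines) can be illustrated with a toy model. This is only a sketch under stated assumptions, not the dissertation's design: the class DualModeBuffer, its methods, and the line size LINE_WORDS are hypothetical names chosen for illustration, and capacity limits, eviction, and write-back are omitted.

```python
LINE_WORDS = 4  # assumed cache-line size in words (hypothetical value)


class DualModeBuffer:
    """Toy on-chip buffer shared by regular and irregular streams.

    Hypothetical illustration only: no capacity limit, no eviction.
    """

    def __init__(self, memory):
        self.memory = memory  # backing off-chip memory, modeled as a list
        self.lines = {}       # resident lines: line index -> list of words
        self.misses = 0       # demand misses seen by the fine-grained path

    def _fill(self, line):
        # Copy one line from backing memory into the buffer.
        base = line * LINE_WORDS
        self.lines[line] = self.memory[base:base + LINE_WORDS]

    def prefetch_stream(self, start, length):
        """Coarse-grained, software-managed path: load an entire stream."""
        first = start // LINE_WORDS
        last = (start + length - 1) // LINE_WORDS
        for line in range(first, last + 1):
            if line not in self.lines:
                self._fill(line)

    def read(self, addr):
        """Fine-grained, hardware-managed path: per-line lookup, fill on miss."""
        line = addr // LINE_WORDS
        if line not in self.lines:
            self.misses += 1
            self._fill(line)
        return self.lines[line][addr % LINE_WORDS]
```

In this sketch, a regular stream is loaded once with prefetch_stream and then read without misses, while irregular (indexed) reads go through the same storage and miss only on lines that no earlier access has brought in. Both access modes share one pool of lines, which mirrors the space-sharing advantage the abstract describes.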
Keywords/Search Tags:Stream Processor, Stream Architecture, Kernel-stream Model, Irregular Stream, Distributed Register File, Multi-core Scaling