
Compiler And Program Optimization On Reconfigurable Manycore Stream Processor

Posted on: 2014-01-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: G Liu
GTID: 1228330395489291
Subject: Computer system architecture
Abstract/Summary:
The development of stream processor architecture is driven mainly by semiconductor technology and the stream programming model. To make full use of increasingly cheap transistors, the chip multiprocessor (CMP) is widely regarded as the trend for next-generation processor architecture. In addition, the stream programming model has become a commonly used method for exploiting parallelism in many kinds of applications. However, the runtime characteristics of stream applications demand a flexible, reconfigurable processor architecture. On the one hand, programs differ from one another in their hardware resource requirements. On the other hand, a single program may go through different phases of execution that exhibit dynamic, even periodic, behavior. To address this problem, we present TPA-S, a reconfigurable manycore stream processor that adaptively reconfigures its hardware by composing logical processors out of lightweight physical cores.

In this dissertation, we study compiler and program optimization techniques for the reconfigurable manycore stream processor. Our goal is to implement the CUDA programming model on TPA-S and to make as many applications as possible, including irregular programs, perform well on this processor. The major contributions are as follows:

(1) We implement a multi-thread mapping scheme on the reconfigurable stream processor. Based on the essential principles of the stream programming model, a program is separated into controller threads and worker threads. The controller threads handle program control flow and data organization, while the worker threads are extracted from computation-intensive kernel functions. In this way, data-level parallelism and producer-consumer locality can each be exploited. On the TPA-S architecture, two thread mapping schemes, the master&slave scheme and the phased scheme, are designed to implement the stream programming model.
In addition, we present two methods for generating worker threads: extracting them from a single CUDA thread, or combining a group of threads.

(2) We present DISC-S, an extended dataflow-like instruction set architecture. This ISA extends the EDGE (Explicit Dataflow Graph Execution) instruction set, which is governed by a dataflow-like execution model. It combines the benefits of reconfigurable logical processors and stream processors. The CUDA programming model brings new features to the TPA-S processor that must be supported by the software/hardware interface. These features include fast access to the thread index at the special-register level, a software-manageable cache (SMC) shared by worker threads, and an efficient barrier synchronization mechanism. To meet these requirements, DISC-S adds special-register mapping and new instructions for the SMC and for synchronization.

(3) We design and implement the compiler framework for TPA-S. We use the NVCC compiler to separate the worker thread code from the controller thread code, and implement a two-level compiler chain. The worker thread code is processed by the Ptx2EDGE compiler, which has a PTX assembly front end and a TPA-S back end. The controller thread code is compiled by the Scale compiler, and we port the CUDA runtime library and API. Finally, the two kinds of threads cooperate on our simulator with the support of the runtime system Mpsim. The correctness and efficiency of our compiler system have been verified on a set of benchmarks.

(4) We also study program optimization on the stream processor. We focus mainly on irregular programs that do not perform very well on TPA-S. By analyzing the performance bottlenecks and the particular parallelism patterns of irregular programs, we present several general optimization techniques through a case study of the breadth-first search problem.
These techniques can be applied to other irregular algorithms as well.

The work in this dissertation offers the following insights:

(1) The stream programming model is a driving force for instruction set architecture design.
(2) The multi-thread mapping scheme requires support from the runtime system, if not from the operating system.
(3) Designing a compiler system requires a suitable software engineering pattern.
(4) Irregular programs can benefit from a parallelism profile.
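
The breadth-first search case study and the remark on parallelism profiles can be illustrated with a small sketch. The abstract does not describe the dissertation's actual optimization techniques, so the following is only a generic level-synchronous BFS written in Python, not the TPA-S implementation. The function name `bfs_levels`, the sample graph, and the reading of "parallelism profile" as the frontier size at each level are all assumptions made for illustration.

```python
def bfs_levels(adj, source):
    """Level-synchronous BFS (illustrative sketch, not the TPA-S code).

    adj    : dict mapping each vertex to a list of its neighbours
    source : start vertex
    Returns (dist, profile), where dist maps each reachable vertex to its
    BFS depth and profile[i] is the number of vertices in the frontier at
    level i (a simple, assumed notion of a parallelism profile).
    """
    dist = {source: 0}
    frontier = [source]
    profile = []
    while frontier:
        # All vertices in the current frontier are independent of one
        # another; in a data-parallel setting each could be handled by
        # one worker thread.
        profile.append(len(frontier))
        next_frontier = []
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in dist:          # first visit fixes the depth
                    dist[v] = dist[u] + 1
                    next_frontier.append(v)
        frontier = next_frontier           # implicit barrier between levels
    return dist, profile


# Hypothetical sample graph used only for this example.
graph = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5], 5: []}
dist, profile = bfs_levels(graph, 0)
print(dist)
print(profile)
```

On the sample graph the frontier size changes from level to level, which is the kind of varying, data-dependent parallelism that makes BFS an irregular workload.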
Keywords/Search Tags: stream programming model, reconfigurable manycore stream processor, dataflow-like execution model, runtime system, program optimization