Research On Optimized Programming For Heterogeneous Multi-core Platform

Posted on: 2012-11-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Li
Full Text: PDF
GTID: 1118330362955223
Subject: Computing system structure
Abstract/Summary:
Over the past few decades, CPU clock frequency rose steadily and was the dominant source of CPU performance gains. However, raising the frequency is no longer a viable route to higher performance because of limits in the chip manufacturing process, power consumption, CPU architecture design, and other unavoidable challenges. To keep pace with Moore's Law, chip vendors moved the CPU into the multicore era. Multicore CPUs currently follow one of two architectures: homogeneous or heterogeneous. For compute-intensive applications, a homogeneous multicore processor, which consists of a few identical CPU cores, is not the most appropriate solution. In contrast, a heterogeneous multicore architecture, which combines CPU cores with dedicated accelerator cores, can achieve better performance and is becoming the mainstream architecture in the high-performance computing community. This hardware innovation introduces new programming challenges, and how to raise the performance of heterogeneous multicore systems has become a hot research topic. This dissertation presents research on optimized programming for heterogeneous multicore platforms.
First, to maximize performance on heterogeneous multicore processors, programs need to expose parallelism at multiple granularities simultaneously. Unfortunately, programming with multiple dimensions of parallelism is laborious and relies heavily on the intuition and skill of the programmer. Formal techniques are needed to optimize multi-dimensional parallel program designs. This dissertation presents a model of multi-grain parallel computation that steers the parallelization process on heterogeneous multicore processors, and it surveys principles for the optimized implementation of an application on a heterogeneous multicore platform.
Evaluation results from Cell-specific implementations of two applications show that these optimization schemes can exploit the computing potential of the heterogeneous multicore processor.
Second, to make the best use of the CPU's computing power, a novel heterogeneous data-parallel computational model for heterogeneous multicore platforms is proposed. With an optimized workload distribution across the heterogeneous cores, this aggressive model exploits the computing power of both the APU and the CPU cores and aggregates them to accelerate purely data-parallel applications. The model is used to implement a ray tracing algorithm on the Cell processor, and the results show that it improves the overall performance of the system.
Third, with the same goal as the preceding chapter, namely exploiting CPU performance, a three-stage streaming model is proposed for certain kinds of streaming applications. A filter module running on the CPU preprocesses the raw dataset and removes a large part of the data that is "empty" for the APU. This avoids unnecessary data transfer between the CPU and the APU and reduces the APU's computation workload. To evaluate the efficiency of the model, the MC algorithm is implemented on Cell with the streaming model as a benchmark program.
Finally, when programming a GPU-APU system, the programmer must manually manage APU local memory and explicitly handle data transfer between host memory and GPU device memory. To relieve this burden, front-end source-to-source compilation and runtime library techniques are used to implement an experimental prototype system, called memCUDA, based on the NVIDIA CUDA programming environment. It automatically maps NVIDIA GPU device memory to host memory. Using a few pragma directives, the programmer can use host memory directly in CUDA kernel functions, and the tedious, error-prone data transfer and device memory management are hidden from the programmer.
Performance is further improved with several near-optimal techniques. Experimental results show that memCUDA programs achieve performance similar to well-optimized CUDA programs with more compact source code.
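The CPU-side filtering idea behind the three-stage streaming model can be illustrated with a minimal C++ sketch. It is not the dissertation's implementation. The `Block` type, the threshold-based emptiness test, the function names, and the split into filter, transfer, and compute stages are all assumptions made for illustration, and the accelerator is simulated on the host.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical unit of streamed work: one block of raw samples.
struct Block {
    std::vector<float> samples;
};

// Assumed emptiness test: a block is "empty" when no sample reaches the threshold.
bool is_empty(const Block& b, float threshold) {
    return std::none_of(b.samples.begin(), b.samples.end(),
                        [threshold](float v) { return v >= threshold; });
}

// Stage 1 (CPU): keep only the blocks the accelerator actually needs.
std::vector<Block> filter_stage(const std::vector<Block>& raw, float threshold) {
    std::vector<Block> kept;
    for (const Block& b : raw) {
        if (!is_empty(b, threshold)) kept.push_back(b);
    }
    return kept;
}

// Stage 2 (transfer): stand-in for the host-to-accelerator copy.
// It only counts the samples that would have to be moved.
std::size_t transfer_stage(const std::vector<Block>& blocks) {
    std::size_t moved = 0;
    for (const Block& b : blocks) moved += b.samples.size();
    return moved;
}

// Stage 3 (accelerator): stand-in kernel that sums the samples of each block.
std::vector<float> compute_stage(const std::vector<Block>& blocks) {
    std::vector<float> out;
    for (const Block& b : blocks) {
        float sum = 0.0f;
        for (float v : b.samples) sum += v;
        out.push_back(sum);
    }
    return out;
}
```

Calling `filter_stage` before `transfer_stage` shrinks both the data to be moved and the number of blocks passed to `compute_stage`, which is the saving the streaming model aims for.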
Keywords/Search Tags: Heterogeneous Multicore Architecture, Programming Model, Parallel Computing, Cell BE, CUDA, Performance Evaluation