Research On Optimized Programming For Heterogeneous Multi-core Platform

Posted on:2012-11-17

Degree:Doctor

Type:Dissertation

Country:China

Candidate:B Li

Full Text:PDF

GTID:1118330362955223

Subject:Computing system structure

Abstract/Summary:

Over the past few decades, CPU clock frequency increased steadily and dominated processor performance. However, raising the frequency is no longer a viable route to better performance because of limits in the chip manufacturing process, power consumption, CPU architecture design, and other unavoidable challenges. To keep pace with Moore's Law, chip vendors have moved the CPU into the multicore era. Multicore CPUs currently follow one of two architectures: homogeneous or heterogeneous. For compute-intensive applications, a homogeneous multicore built from a few identical CPU cores is not the most appropriate solution. By contrast, a heterogeneous multicore architecture that combines CPU cores with dedicated accelerator cores can achieve better performance and is becoming the mainstream architecture in the high-performance computing community. This hardware innovation introduces new programming challenges, and how to raise the performance of heterogeneous multicore systems is becoming a hot research topic. This dissertation presents research on optimized programming for heterogeneous multi-core platforms.

Firstly, to achieve maximum performance on heterogeneous multi-core processors, programs need to expose parallelism at multiple granularities simultaneously. Unfortunately, programming with multiple dimensions of parallelism is laborious and relies heavily on the intuition and skill of the programmer. Formal techniques are needed to optimize multi-dimensional parallel program designs. This dissertation presents a model of multi-grain parallel computation that steers the parallelization process on heterogeneous multicore processors, and surveys guidelines for optimizing the implementation of an application on a heterogeneous multicore platform. Evaluation results from Cell-specific implementations of two applications show that these optimization schemes can exploit the computing potential of the heterogeneous multicore.

Secondly, to make the best use of the CPU's computing power, a novel heterogeneous data-parallel computational model for heterogeneous multicore platforms is proposed. With an optimized workload distribution across the heterogeneous cores, this aggressive model exploits the computing power of both the APU and the CPU cores and aggregates them to accelerate purely data-parallel applications. The model is used to implement the ray tracing algorithm on the Cell processor, and the results show that it can boost the overall performance of the system.

Thirdly, with the same goal as the previous chapter of exploiting CPU performance, a three-stage streaming model is proposed for certain kinds of streaming applications. A filter module running on the CPU preprocesses the raw dataset and removes a large part of the "empty" data before it reaches the APU. This avoids unnecessary data transfer between the CPU and the APU and reduces the APU's computation workload. To evaluate the efficiency of the model, the MC algorithm is implemented on Cell with the streaming model as a benchmark program.

Finally, when programming on a GPU-APU system, the programmer must manually manage APU local memory and explicitly transfer data between host memory and GPU device memory. To relieve this burden, front-end source-to-source compilation and runtime library techniques are used to implement an experimental prototype system, called memCUDA, based on the NVIDIA CUDA programming environment. It automatically maps NVIDIA GPU device memory to host memory. With a few pragma directives, the programmer can use host memory directly in CUDA kernel functions, and the tedious and error-prone data transfer and device memory management are hidden from the programmer. Performance is further improved with some near-optimal techniques. Experimental results show that memCUDA programs achieve performance similar to well-optimized CUDA programs with more compact source code.
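
The idea of distributing a data-parallel workload across heterogeneous cores can be illustrated with a minimal Python sketch. It is not the dissertation's method: it assumes a simple policy that splits work in proportion to each side's measured throughput, and the names `split_workload` and `estimated_time`, as well as the rate values, are hypothetical.

```python
def split_workload(n_items, cpu_rate, apu_rate):
    """Split n_items between CPU and accelerator (APU) cores.

    Assumption: work is divided in proportion to throughput
    (items per second), so both sides finish at about the same time.
    """
    if n_items < 0:
        raise ValueError("n_items must be non-negative")
    if cpu_rate < 0 or apu_rate < 0 or cpu_rate + apu_rate == 0:
        raise ValueError("rates must be non-negative and not both zero")
    # Floor the CPU share; the accelerator takes the remainder.
    n_cpu = int(n_items * cpu_rate / (cpu_rate + apu_rate))
    return n_cpu, n_items - n_cpu


def estimated_time(n_cpu, n_apu, cpu_rate, apu_rate):
    """Estimate completion time when both sides run concurrently.

    The overall time is that of the slower side; transfer and
    synchronization costs are ignored in this sketch.
    """
    cpu_time = n_cpu / cpu_rate if n_cpu else 0.0
    apu_time = n_apu / apu_rate if n_apu else 0.0
    return max(cpu_time, apu_time)


if __name__ == "__main__":
    # Hypothetical rates: the accelerator is three times faster than the CPU.
    n_cpu, n_apu = split_workload(1000, cpu_rate=1.0, apu_rate=3.0)
    combined = estimated_time(n_cpu, n_apu, 1.0, 3.0)
    apu_only = estimated_time(0, 1000, 1.0, 3.0)
    print(n_cpu, n_apu, combined, apu_only)
```

Under these assumed rates, giving the CPU a share of the work shortens the estimated completion time compared with running everything on the accelerator alone. A real system would also have to account for data transfer costs, which this sketch ignores.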

Keywords/Search Tags:

Heterogeneous Multicore Architecture, Programming Model, Parallel Computing, Cell BE, CUDA, Performance Evaluation

Related items

1	Research On Performance Optimization Of Heterogeneous Platform Based On CPU-GPU And Multicore Parallel Programming Model
2	A programming model and processor architecture for heterogeneous multicore computers
3	Research Of Parallel Computing On CPU/GPU Heterogeneous Architecture
4	Study On Multi-thread Parallel Programming Method Based On Multi-core Environment
5	Research On The Performance And Scalability Of Data-Parallel Programming Model On Multicore
6	Analysis And Extension Of The Typical Programming Model For Heterogeneous Platforms
7	Research Based On CUDA Parallel Computation Of FFT
8	Design And Implementation Of Unified Programming And Compiled Separately For Heterogeneous Multi-core Processor
9	Research On Heterogeneous System Oriented Parallel Programming
10	Parallel Computing Scalability Studies And Applications On The Distributed Memory Environments