
Parallel Compilation And Optimization For Multi-core Processors

Posted on: 2011-10-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Wang
Full Text: PDF
GTID: 1118360308485655
Subject: Computer Science and Technology
Abstract/Summary:
Single-core processors improve computing performance by increasing the clock frequency. However, simply adding more transistors consumes resources, and higher clock frequencies also increase power consumption; performance is thus at odds with energy consumption. As a result, the performance of single-core processors has reached its limit. To address this problem, the multi-core processor architecture was proposed: integrating more cores on a single chip improves performance without raising the clock frequency. According to the type of cores, multi-core processors can be classified into homogeneous and heterogeneous multi-cores. A heterogeneous multi-core integrates different types of cores on one chip and is superior to a homogeneous multi-core in efficiency and performance. Recently, heterogeneous multi-cores have delivered high performance in accelerating practical applications and have been adopted in the high-performance computing field. Although heterogeneous multi-cores offer huge potential for high-performance computing, parallel programming and memory management for them are very complicated, and many problems in automatic parallelization tools and software remain unsolved. Fully exploiting heterogeneous multi-cores therefore poses great challenges to the designers of programming models and compilers.

This thesis focuses on compilation design and optimization techniques for heterogeneous multi-cores. The compilation approaches proposed in this thesis are not only suitable for a specific heterogeneous multi-core but can also be adapted to other multi-core systems. The main contributions of this work are as follows:

1. An automatic code generation framework for heterogeneous multi-cores is proposed. Based on the distributed memory model, a source-to-source compiler is designed for the heterogeneous multi-core Cell processor.
The compiler first performs data alignment and data distribution to partition the program data across the SPEs. Communication is then generated and inserted into the program. Finally, the compiler generates different versions of SPMD code for the Cell according to the different data distribution schemes. A version of the MCAPI library is implemented on the Cell; this is the first reported MCAPI library for the Cell. The library includes several communication schemes, such as send/recv, shift, and transpose, with a communication protocol based on the Cell's mailbox mechanism. Experiments verify the effectiveness and performance of the compiler. Its performance is compared with the IBM XL C/C++ OpenMP compiler, and the results demonstrate that the distributed memory model suits heterogeneous multi-cores better than the shared memory model.

2. An automatic data management framework for heterogeneous multi-cores is proposed. Heterogeneous multi-cores such as GPUs and the Cell provide fast, explicitly managed local memories in addition to slow off-chip memory. However, the local memories are usually too small to hold large amounts of data. To solve this problem, an automatic data management system is designed. The system performs hierarchical data distribution, communication generation, loop tiling, and loop splitting to decompose the data and computation into tiles that fit the local memories. To reduce memory accesses and improve data reuse, a communication optimization method is implemented that builds a reuse graph for the program and deletes redundant communication through graph partitioning. The whole framework is built on the Cell. Experiments show that the data management framework generates efficient code by orchestrating data movement between the local stores and main memory.

3.
Model-driven iterative multi-dimensional parallelization of multi-task programs for heterogeneous multi-cores is studied. To orchestrate the allocation of computational and memory resources on the multi-cores, a resource allocation model is first presented. The model is expressed as a three-dimensional optimization space consisting of variant selection, grouping, and PE assignment. A genetic algorithm-based approach is then proposed to search this optimization space intelligently and pick out the best-performing parallelization schemes for a multi-task program. The model is implemented for the Cell. Experiments show that it can derive good schemes in a very short time, which significantly reduces the programmers' burden.

4. Automatic SIMD code generation techniques for multimedia applications are proposed. Several loop transformations are applied to exploit the SIMD parallelism in loops. An instruction selection method based on a cost subgraph is proposed to evaluate which parts of a loop are better executed with SIMD instructions. Compilation techniques such as loop unrolling and register renaming are used to generate SIMD instructions for the program. Experiments show the effectiveness of the instruction selection and the performance improvement achieved by SIMDization.
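The send/recv scheme of contribution 1 is built on the Cell's mailbox mechanism, which exchanges small control words through a shallow hardware FIFO. As a rough illustration (not the actual MCAPI or Cell SDK API; all names here are invented for the sketch), the following C code emulates a fixed-depth mailbox with non-blocking send and receive:

```c
#include <assert.h>

/* Illustrative stand-in for a Cell-style mailbox: a fixed-depth FIFO of
 * 32-bit words, since the real mailbox registers hold small control words. */
#define MBOX_DEPTH 4

typedef struct {
    unsigned int slots[MBOX_DEPTH];
    int head;   /* index of the oldest queued word */
    int count;  /* number of queued words */
} mailbox_t;

/* Returns 0 on success, -1 if the mailbox is full (the sender would stall). */
int mbox_send(mailbox_t *mb, unsigned int word) {
    if (mb->count == MBOX_DEPTH) return -1;
    mb->slots[(mb->head + mb->count) % MBOX_DEPTH] = word;
    mb->count++;
    return 0;
}

/* Returns 0 on success, -1 if empty (the receiver would stall). */
int mbox_recv(mailbox_t *mb, unsigned int *word) {
    if (mb->count == 0) return -1;
    *word = mb->slots[mb->head];
    mb->head = (mb->head + 1) % MBOX_DEPTH;
    mb->count--;
    return 0;
}
```

A real implementation would layer MCAPI-style message calls on top of such a FIFO, using the mailbox words to signal DMA transfer completion between PPE and SPEs.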
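The tiling scheme of contribution 2 can be sketched in plain C. Assuming a hypothetical local store that holds only TILE elements at a time, the loop is tiled so each chunk is staged into a small buffer, processed, and written back; the memcpy calls stand in for the DMA transfers a real Cell implementation would issue, and the shortened last chunk corresponds to loop splitting for the remainder:

```c
#include <stddef.h>
#include <string.h>

#define TILE 64  /* hypothetical local-store capacity, in elements */

/* Scale an array of n doubles by k, staging TILE-sized chunks through a
 * small buffer that plays the role of an SPE local store. */
void scale_tiled(double *data, size_t n, double k) {
    double local[TILE];                  /* stand-in for the local store */
    for (size_t base = 0; base < n; base += TILE) {
        /* loop splitting: the final tile may be shorter than TILE */
        size_t len = (n - base < TILE) ? (n - base) : TILE;
        memcpy(local, data + base, len * sizeof *local);  /* "DMA in"  */
        for (size_t i = 0; i < len; i++)
            local[i] *= k;                                /* compute on tile */
        memcpy(data + base, local, len * sizeof *local);  /* "DMA out" */
    }
}
```

The framework described above additionally overlaps the "DMA in"/"DMA out" steps of one tile with the computation of another and prunes transfers shown redundant by the reuse graph.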
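The genetic search of contribution 3 can be illustrated with a deliberately reduced model: the three-dimensional space (variant selection, grouping, PE assignment) is collapsed here to a one-dimensional PE assignment of tasks to two processing elements, with makespan as the fitness. The task costs, population size, and operators below are all hypothetical choices for the sketch, not the thesis's actual parameters:

```c
#include <stdlib.h>

#define NTASKS 8
#define POP    20
#define GENS   100

/* Hypothetical per-task costs (total 36, so the ideal 2-PE makespan is 18). */
static const int task_cost[NTASKS] = {7, 3, 5, 8, 2, 6, 4, 1};

/* Fitness: makespan of a 2-PE assignment; lower is better. */
int makespan(const int *assign) {
    int load[2] = {0, 0};
    for (int t = 0; t < NTASKS; t++) load[assign[t]] += task_cost[t];
    return load[0] > load[1] ? load[0] : load[1];
}

/* Tiny GA: tournament selection, one-point crossover, point mutation. */
int ga_best_makespan(unsigned seed) {
    int pop[POP][NTASKS], next[POP][NTASKS];
    srand(seed);
    for (int i = 0; i < POP; i++)
        for (int t = 0; t < NTASKS; t++) pop[i][t] = rand() % 2;
    for (int g = 0; g < GENS; g++) {
        for (int i = 0; i < POP; i++) {
            int a = rand() % POP, b = rand() % POP;  /* tournament 1 */
            int *p1 = makespan(pop[a]) < makespan(pop[b]) ? pop[a] : pop[b];
            a = rand() % POP; b = rand() % POP;      /* tournament 2 */
            int *p2 = makespan(pop[a]) < makespan(pop[b]) ? pop[a] : pop[b];
            int cut = rand() % NTASKS;               /* one-point crossover */
            for (int t = 0; t < NTASKS; t++)
                next[i][t] = t < cut ? p1[t] : p2[t];
            if (rand() % 10 == 0)                    /* occasional mutation */
                next[i][rand() % NTASKS] ^= 1;
        }
        for (int i = 0; i < POP; i++)
            for (int t = 0; t < NTASKS; t++) pop[i][t] = next[i][t];
    }
    int best = makespan(pop[0]);
    for (int i = 1; i < POP; i++) {
        int m = makespan(pop[i]);
        if (m < best) best = m;
    }
    return best;
}
```

In the full model the chromosome would also encode variant and grouping choices, and fitness would come from the resource allocation model rather than a toy cost table.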
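The loop unrolling step of contribution 4 can be shown without target-specific intrinsics. Unrolling a reduction by 4 with independent accumulators mirrors the four lanes of a 128-bit SIMD unit, and the scalar epilogue handles trip counts not divisible by the vector width; a simplified sketch, not the thesis's actual code generator output:

```c
#include <stddef.h>

/* Dot product unrolled by 4 with independent accumulators, mirroring the
 * 4-wide lanes a SIMD code generator would target. */
float dot_unrolled(const float *x, const float *y, size_t n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {   /* "vector" body: 4 lanes per iteration */
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }
    float s = s0 + s1 + s2 + s3;   /* horizontal reduction of the lanes */
    for (; i < n; i++)             /* scalar epilogue for the remainder */
        s += x[i] * y[i];
    return s;
}
```

The cost-subgraph instruction selection described above decides whether a given loop region is worth rewriting into this SIMD form at all, since packing and unpacking overhead can outweigh the lane-level parallelism.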
Keywords/Search Tags: multi-core, heterogeneous multi-core, distributed memory model, automatic code generation, automatic data management, resource allocation model, data distribution, SIMDization