Font Size: a A A

Research On Automatic Data And Computation Partition And Optimization In Parallelizing Compilation

Posted on:2008-12-21Degree:DoctorType:Dissertation
Country:ChinaCandidate:C L DongFull Text:PDF
GTID:1118360242472200Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
High performance parallel machine is a key support for researching modern science, developing engineering technology and processing large scale data. With the development of high performance parallel machine, the architecture of parallel machine is becoming more and more complex, and the design of parallel program is becoming more and more difficult. An advanced parallelizing compiler can automatically translate serial program into parallel program. And it is an important mean to overcome the difficulty of software programming and transplanting. In recent years, parallelizing compilation has made a great progress, but there are still many problems need research. For distributed memory machine, the first problem in developing parallelizing compiler is to distribute the data in a program. Comparing with share memory machine, the great difference is that data on distributed memory machine is stored in different processors. We need considering not only the computation partition but also data partition. We are currently developing a parallelizing compiler S-KAP to automatically translate sequential scientific programs into efficient parallel codes on distributed memory machines. This thesis describes a compiler algorithm that automatically finds the data and computation partition and provides the optimization for data and computation decompositions. We represent partitions as two separate components. First, the data and computation are mapped onto the virtual processor space. Second, the processors of the virtual processor space are mapped onto the physical processors via one of the following folding functions: BLOCK, CYCLIC or BLOCK-CYCLIC (b), where b is the block size.This thesis firstly discusses an algorithm of data and computation partition with uniform data decomposition. Secondly, this thesis discusses the mapping relationship between the virtual processor space and the physical processor space. Finally, the optimization methods of data and computation partition are discussed. Main contributions of this thesis are as follows:1. Analyzing the algorithm presented by Anderson and Lam of data and computation partition. The algorithm sacrificed the parallelisms of a program in order to minimize communication. An improved algorithm is presented in our thesis, and the experiment shows that the algorithm is effective.2. Based on neighbor communication and load balancing, the principle of choosing three mapping methods is presented. And based on symbol linear inequalities system, the mapping function between the virtual processor space and the physical processor space are implemented. The experiment shows that the cyclic mapping can improve load balancing effectively.3. Based on the improved data and computation partition algorithm and the accurate array data-flow dependence, a partition algorithm of data and computation which can process DOALL and DOACROSS parallel is implemented.4. Based on the improved data and computation partition algorithm and the symbol linear inequalities system, a partition algorithm of data and computation which can process read-only data is implemented. The experiment shows that the boundary replication can reduce communication overhead effectively, and improve the performance of generated parallel programs.
Keywords/Search Tags:Parallelized Compiler, Data Decomposition, Computation Decomposition, Data Reorganization, Read-only Data Replication, Boundary Replication, Data Flow Analysis
PDF Full Text Request
Related items