Font Size: a A A

Research On Automatic Data And Computation Decomposition On Distributed-Memory Systems

Posted on:2007-05-07Degree:DoctorType:Dissertation
Country:ChinaCandidate:Y R WangFull Text:PDF
GTID:1118360185454181Subject:Computer system architecture
Abstract/Summary:PDF Full Text Request
Data and computation decomposition is the basis of parallel programs. To achieve highperformance, locality of data access and available parallelism must be considered at the sametime. The complexity comes from both the program and the target machine. Automatic dataand computation partitioning are of great challenge in parallelizing compilers.Automatic data layout is the basis for automatic parallelization. To find a good datalayout, the compiler must consider the characteristic of the program and the target machine atthe same time. Supporting of dynamic data layout further complicated the problem. There areseveral state-of-art former researches and employing ILP (integer linear programming)framework to solve the automatic data layout problem is a promising method. We proposed anovel framework for automatic data layout tools. Compared to previous works, this frameworklowers the analysis cost considerably and increases the precision of performance model. Toachieve these goals, we employ several techniques. First, we introduce tree shaped programpartitioning directed by parallelism and locality characteristics. Such a program partitioningmethod reduces the analyzing cost and enables more precise performance prediction. Second,this framework models the effect of communication optimization for shift communications,which increase the precision of problem formulation. There are also some other improvements,such as automatic multipartitioning support for the first time and consideration for exact shapeand size of processor grid. Experimental results show that our framework generate data layoutfor example programs equal to or better than previous works.Partial replication is an important optimization that reduces near-neighbor communicationat the cost of some repeated computation. It can considerably improve the performance andscalability of parallel programs. Former exploration of partial replicate computationpartitioning is limited within a single loop nest, and no explicit cost model is used. In thispaper, we present a formal description of more general partial replicate computationpartitioning problems, which is called global partial replicate computation partitioning. Asredundant message elimination exerts great effluence on the effect of such optimizations, weintroduced a linear cost model which considers its effect. We also developed a frameworkwhich employs the integer linear programming method. Experimental results show that thesolution is superior to local approaches. Compared to heuristic method, the new approach candeal with more general cases and is easier to be adapted to different data distribution.
Keywords/Search Tags:distributed memory parallel systems, parallelizing compiler, automatic data distribution, computation partitioning
PDF Full Text Request
Related items