
Research On Cost Model And Memory Optimization Techniques For Heterogeneous Processor

Posted on: 2014-09-21
Degree: Master
Type: Thesis
Country: China
Candidate: P F Huang
GTID: 2268330401476761
Subject: Computer Science and Technology
Abstract/Summary:
Heterogeneous processors are now widely used in high-performance computing. By integrating different types of cores on a single chip, they offer great potential for parallel computing, but they also pose serious challenges. A programmer must master a higher-level programming model and a deeper knowledge of the architecture before writing parallel code that makes full use of heterogeneous resources, which is a difficult task for ordinary programmers. Parallelizing compilation has therefore become the primary way to obtain efficient parallel programs quickly.

Research on parallelizing compilation for heterogeneous architectures has already produced many results, but many problems remain to be analyzed and solved. This dissertation targets OpenACC heterogeneous parallel programs and studies the cost model and memory optimization techniques used in automatic parallelization. The main research contents are as follows:

1. Because traditional compilation frameworks are not suited to new architectures, a parallelizing compilation system called Auto-ACC is designed for a domestically developed heterogeneous processor. The system recognizes parallel loops and optimizes parallel regions using a parallel cost model oriented to heterogeneous architectures, then applies several memory optimization methods to partition loop data correctly across the local memories of the slave cores, and finally generates parallel code that runs efficiently on the heterogeneous processor. Experiments demonstrate that the system generates correct OpenACC programs that achieve high performance on the target platform.

2. Existing parallel cost models have shortcomings when evaluating the execution performance of loops on heterogeneous processors. To address this, a novel cost model for heterogeneous architectures is designed.
Based on the characteristics of heterogeneous processors, the model introduces parameters that describe the different computing capabilities of the master and slave cores, the different access latencies of main memory and local memory, and the cost of transferring data between them, which makes the evaluation of a loop's parallel profit more accurate. Experiments show that the model effectively prevents the parallelization of loops with negative profit and improves the performance of the generated parallel programs.

3. To make the best use of the limited local memory of the slave cores, a multi-dimensional, self-adaptive memory optimization framework for local data memory is proposed. The framework efficiently manages the data referenced by parallel loops through five memory optimization methods. First, array blocking partitions the loop's arrays so that each data tile fits within the local memory limit. Second, data distribution guided by array access pattern analysis extracts the required elements of regularly accessed arrays and transfers them to local memory, while irregularly accessed arrays are mapped directly to a software cache. Third, array transposition changes the storage order of the array dimensions so that the array is stored in the order in which the loop nest accesses it, turning non-contiguous accesses into contiguous ones. Fourth, scalar aggregation combines multiple discrete scalars into a single data unit, reducing the number of transfers and making scalar transfer more efficient. Fifth, creating an accelerator data region moves data transfer operations from inside the loop nest to outside it, so that data is transferred only before and after the loop computation; this avoids copying the data again for the inner parallel loop on every iteration of the outer sequential loop.
Experiments show that these methods optimize data transfer, storage, and access at different levels and granularities, significantly improving the performance of the generated OpenACC parallel programs.
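The abstract names the inputs of the heterogeneous cost model (master versus slave compute capability, main versus local memory latency, and transfer cost) but not its formula. The following C sketch assumes a simple additive model in which a loop is parallelized only when its estimated profit is positive; every type, field, and function name is hypothetical, and the thesis's actual model may weigh these terms differently.

```c
#include <assert.h>

/* Hypothetical machine parameters; the thesis's real parameter set is not given. */
typedef struct {
    double master_cycles_per_op;  /* cost of one operation on the master core */
    double slave_cycles_per_op;   /* cost of one operation on one slave core */
    double main_mem_latency;      /* cycles per main-memory access */
    double local_mem_latency;     /* cycles per local-memory access */
    double transfer_startup;      /* fixed cycles per main<->local transfer */
    double transfer_per_byte;     /* cycles per transferred byte */
    double launch_overhead;       /* cycles to start and join the slave cores */
    int    num_slaves;            /* number of slave cores */
} cost_params;

/* Hypothetical summary of one candidate loop. */
typedef struct {
    long   iterations;    /* trip count */
    double ops_per_iter;  /* arithmetic operations per iteration */
    double mem_per_iter;  /* memory accesses per iteration */
    double bytes_moved;   /* total bytes copied into and out of local memory */
    long   transfers;     /* number of separate transfer operations */
} loop_info;

/* Serial estimate: the whole loop runs on the master against main memory. */
double serial_cost(const cost_params *p, const loop_info *l) {
    return l->iterations * (l->ops_per_iter * p->master_cycles_per_op +
                            l->mem_per_iter * p->main_mem_latency);
}

/* Parallel estimate: iterations are split evenly over the slaves, accesses hit
   local memory, and transfers are assumed NOT to overlap with computation. */
double parallel_cost(const cost_params *p, const loop_info *l) {
    assert(p->num_slaves > 0);
    long per_slave = (l->iterations + p->num_slaves - 1) / p->num_slaves;
    double compute = per_slave * (l->ops_per_iter * p->slave_cycles_per_op +
                                  l->mem_per_iter * p->local_mem_latency);
    double transfer = l->transfers * p->transfer_startup +
                      l->bytes_moved * p->transfer_per_byte;
    return p->launch_overhead + compute + transfer;
}

/* Positive profit: parallelizing is expected to pay off; negative: keep serial. */
double parallel_profit(const cost_params *p, const loop_info *l) {
    return serial_cost(p, l) - parallel_cost(p, l);
}
```

With a large trip count the faster local-memory accesses outweigh launch and transfer overheads, while a loop with only a handful of iterations comes out with negative profit and would stay on the master core, which is the behavior the abstract reports for negative-profit loops.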
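Array blocking, the first memory optimization method, can be illustrated on a one-dimensional loop. In this sketch the 64 KB capacity (LOCAL_MEM_BYTES), the names tile_length and blocked_add, and the use of memcpy in place of the processor's DMA interface are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LOCAL_MEM_BYTES (64 * 1024)  /* assumed local-memory capacity per slave core */

/* Largest tile length (in elements) such that one tile of each of n_arrays
   arrays with elem_size-byte elements fits in `capacity` bytes. */
size_t tile_length(size_t n_arrays, size_t elem_size, size_t capacity) {
    assert(n_arrays > 0 && elem_size > 0);
    return capacity / (n_arrays * elem_size);
}

/* Same computation as tile_length(3, sizeof(double), LOCAL_MEM_BYTES), done at
   compile time so it can size the local buffers. */
#define TILE (LOCAL_MEM_BYTES / (3 * sizeof(double)))

/* c[i] = a[i] + b[i], processed tile by tile. The static buffers stand in for
   slave-core local memory and memcpy stands in for a DMA transfer. */
void blocked_add(const double *a, const double *b, double *c, size_t n) {
    static double la[TILE], lb[TILE], lc[TILE];  /* one tile per array */
    for (size_t start = 0; start < n; start += TILE) {
        size_t len = (n - start < TILE) ? n - start : TILE;  /* last tile may be short */
        memcpy(la, a + start, len * sizeof(double));         /* "DMA in" */
        memcpy(lb, b + start, len * sizeof(double));
        for (size_t i = 0; i < len; i++)
            lc[i] = la[i] + lb[i];                            /* compute on local copies */
        memcpy(c + start, lc, len * sizeof(double));         /* "DMA out" */
    }
}
```

Three double-precision arrays share the assumed 64 KB, so each tile holds 2730 elements and a 10,000-element loop is processed in four tiles, the last one shorter.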
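For the second method, only the irregular side is sketched here: reads whose index is not known in advance (for example a[idx[i]]) are served by a software cache held in local memory. The direct-mapped, read-only organization and the sizes LINE_ELEMS and NUM_LINES are assumptions; the abstract does not describe how the thesis's software cache is organized.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LINE_ELEMS 8   /* assumed elements per cache line */
#define NUM_LINES  16  /* assumed number of lines (direct-mapped) */

typedef struct {
    const double *main;                          /* backing array in main memory */
    size_t        len;                           /* its length in elements */
    long          tag[NUM_LINES];                /* block held by each slot, -1 = empty */
    double        data[NUM_LINES][LINE_ELEMS];   /* cached lines (in local memory) */
    long          misses;                        /* number of line fetches so far */
} soft_cache;

void cache_init(soft_cache *c, const double *main, size_t len) {
    c->main = main;
    c->len = len;
    c->misses = 0;
    for (int s = 0; s < NUM_LINES; s++)
        c->tag[s] = -1;
}

/* Read main[idx] through the cache; on a miss, fetch the whole enclosing line. */
double cache_read(soft_cache *c, size_t idx) {
    assert(idx < c->len);
    long block = (long)(idx / LINE_ELEMS);
    int slot = (int)(block % NUM_LINES);         /* direct mapping: one possible slot */
    if (c->tag[slot] != block) {
        size_t start = (size_t)block * LINE_ELEMS;
        size_t n = (c->len - start < LINE_ELEMS) ? c->len - start : LINE_ELEMS;
        memcpy(c->data[slot], c->main + start, n * sizeof(double));  /* "DMA in" */
        c->tag[slot] = block;
        c->misses++;
    }
    return c->data[slot][idx % LINE_ELEMS];
}
```

A direct-mapped design keeps the lookup to one division and one comparison, which matters on simple slave cores; the cost is that two blocks mapping to the same slot evict each other.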
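The effect of array transposition, the third method, shows up in a column-sum loop nest. This sketch transposes at run time purely for illustration, whereas a compiler would more likely change the array's declared layout; the sizes ROWS and COLS and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 4
#define COLS 3

/* Original layout: the inner loop walks down a column, so consecutive
   iterations touch elements COLS doubles apart (non-contiguous). */
void col_sums_strided(double a[ROWS][COLS], double out[COLS]) {
    for (size_t j = 0; j < COLS; j++) {
        out[j] = 0.0;
        for (size_t i = 0; i < ROWS; i++)
            out[j] += a[i][j];        /* stride = COLS elements */
    }
}

/* Store the array with its two dimensions swapped. */
void transpose(double a[ROWS][COLS], double at[COLS][ROWS]) {
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            at[j][i] = a[i][j];
}

/* Same computation on the transposed array: the inner loop is now unit-stride,
   so each row at[j] is one contiguous block that can be transferred at once. */
void col_sums_contiguous(double at[COLS][ROWS], double out[COLS]) {
    for (size_t j = 0; j < COLS; j++) {
        out[j] = 0.0;
        for (size_t i = 0; i < ROWS; i++)
            out[j] += at[j][i];       /* stride = 1 element */
    }
}
```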
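Scalar aggregation, the fourth method, trades several small transfers for a single one. In this sketch a hypothetical dma_get wrapper around memcpy counts transfers so that the difference is observable; the scalar_pack layout and all names are assumptions, not the thesis's actual data-unit format.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a DMA engine: each call models one transfer,
   each with its own startup cost, so we count them. */
static int transfer_count = 0;

static void dma_get(void *local, const void *main_mem, size_t bytes) {
    memcpy(local, main_mem, bytes);
    transfer_count++;
}

/* Loop-invariant scalars that a parallel loop reads (illustrative only). */
typedef struct {
    double alpha;
    double beta;
    int    n;
    int    stride;
} scalar_pack;

/* Without aggregation: one transfer per scalar. */
void load_scalars_separately(const double *alpha, const double *beta,
                             const int *n, const int *stride, scalar_pack *local) {
    dma_get(&local->alpha, alpha, sizeof *alpha);
    dma_get(&local->beta, beta, sizeof *beta);
    dma_get(&local->n, n, sizeof *n);
    dma_get(&local->stride, stride, sizeof *stride);
}

/* With aggregation: the scalars were packed into one unit in main memory,
   so a slave core fetches them all with a single transfer. */
void load_scalars_aggregated(const scalar_pack *packed, scalar_pack *local) {
    dma_get(local, packed, sizeof *packed);
}
```

Both functions leave the same values in local memory; the aggregated version pays the per-transfer startup cost once instead of four times.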
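The fifth method corresponds to hoisting transfers out of the outer sequential loop with an OpenACC data construct. The sketch uses standard OpenACC directives (data, parallel loop, and the copy and present clauses) rather than Auto-ACC's actual output, which the abstract does not show; the relaxation loop itself is a made-up example. A compiler without OpenACC support ignores the directives and runs both versions serially with identical results.

```c
#include <assert.h>
#include <stddef.h>

/* Without a data region: every launch of the inner parallel loop copies a[]
   to the device and back, once per iteration of the sequential outer loop. */
void relax_naive(double *a, size_t n, int steps) {
    for (int t = 0; t < steps; t++) {
        #pragma acc parallel loop copy(a[0:n])
        for (size_t i = 0; i < n; i++)
            a[i] = 0.5 * a[i] + 1.0;
    }
}

/* With a data region around the outer loop: a[] is copied in once before the
   loop and out once after it; present() asserts it is already on the device. */
void relax_hoisted(double *a, size_t n, int steps) {
    #pragma acc data copy(a[0:n])
    {
        for (int t = 0; t < steps; t++) {
            #pragma acc parallel loop present(a[0:n])
            for (size_t i = 0; i < n; i++)
                a[i] = 0.5 * a[i] + 1.0;
        }
    }
}
```

For `steps` outer iterations, the naive version performs `steps` round trips of the array, while the hoisted version performs exactly one.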
Keywords/Search Tags: Heterogeneous Processor, Parallelization Compilation, Heterogeneous Architecture, Parallel Programming Model, Parallel Cost Model, Memory Optimization