Research On Performance Analysis And Optimization Techniques For Scientific Programs

Posted on: 2005-07-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y G Che
GTID: 1118360152457202
Subject: Computer Science and Technology
Abstract/Summary:
High Performance Computing (HPC) is widely used in science and engineering to solve large computational problems. As HPC advances, more and more high-performance computers are being developed and deployed, and their peak performance continues to rise rapidly. However, the sustained performance that real applications achieve does not grow at the same rate as the machines' peak performance, so the gap between the two keeps widening. Program performance optimization is one effective way to address this problem, and it has drawn attention from both the research community and industry. As computer architectures and program structures become increasingly complex, the number of factors that affect program performance grows, and these factors interact in complex, nonlinear ways. This makes program performance analysis and optimization challenging.
Aiming to optimize the performance of scientific programs and to narrow the gap between their sustained performance and the machines' peak performance, we study optimization parameter selection, program performance measurement and analysis, and performance tuning of real applications. Our work includes the following:
(1) We propose a novel approach that reduces the cost of execution-driven optimization parameter selection through program reduction transformations. We formulate a theory of program reduction transformations and identify several situations, common in scientific applications, in which legal reduction transformations can be applied. Program reduction transformations substantially reduce the time spent evaluating each candidate optimization parameter while preserving the quality of parameter selection, which makes execution-driven optimization parameter selection affordable in a wide range of cases.
(2) To overcome the difficulties faced by exhaustive search and iterative compilation, we propose an optimization parameter selection framework called Lega (Limited Execution and Genetic Algorithms based optimization parameter selection). Lega includes Edga (an Execution-Driven and Genetic Algorithms based optimization parameter search engine), which searches the space of feasible optimization parameters quickly and globally. Lega uses program reduction transformations to reduce the time spent evaluating each candidate parameter, and program parameterization to avoid repeating preprocessing and native compilation for every candidate. Experiments on three platforms show that Lega selects better optimization parameters than DAT, a representative analytical-model-based optimization parameter selection algorithm, at a much lower cost than iterative compilation. Lega also adapts to new platforms automatically.
(3) We investigate hardware performance monitoring mechanisms and the performance measurement software built on them. To overcome the limitations of existing performance measurement tools, we design and implement PTracker (Performance Tracker), a program performance measurement and analysis tool based on hardware performance counters. Besides simplifying performance measurement, PTracker provides users with synthesized, high-level performance data that characterize a program's memory access behavior and ILP (Instruction-Level Parallelism). These data help users analyze and optimize program performance.
(4) We adopt an inter-nest reuse optimization method to improve the memory locality of Jacobi iteration code. Unlike Time Skewing and New Tiling, this method does not impair the inter-node parallelism of Jacobi iteration code. Experimental results verify the effectiveness of our method.
(5) We optimize the single-node performance of a real, large CFD program. Our optimizations improve the program's performance remarkably. When the optimized program runs on 64 processors of a domestically developed MPP (Massively Paral...
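Item (2) describes a genetic-algorithm search over the space of optimization parameters, where each candidate is evaluated by running the program. The following Python code is a minimal sketch of that general shape of search. It is not Edga itself. The two genes (a tile size and an unroll factor), the function names `genetic_search` and `synthetic_cost`, and all GA settings are hypothetical. In a real execution-driven setting, `evaluate` would build and time a reduced version of the program for the candidate parameters. Here a made-up cost function stands in so the sketch runs anywhere.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical parameter vector: (tile_size, unroll_factor).
Params = Tuple[int, int]


def synthetic_cost(p: Params) -> float:
    """Made-up stand-in for 'run the reduced program and time it'."""
    tile, unroll = p
    return abs(tile - 64) / 64 + abs(unroll - 4) / 4


def genetic_search(
    tiles: Sequence[int],
    unrolls: Sequence[int],
    evaluate: Callable[[Params], float],
    pop_size: int = 8,
    generations: int = 10,
    mutation_rate: float = 0.2,
    seed: int = 0,
) -> Tuple[Params, float]:
    """Minimize evaluate(p) over tiles x unrolls with a simple genetic algorithm."""
    if pop_size < 4:
        raise ValueError("pop_size must be at least 4")
    rng = random.Random(seed)
    cache: Dict[Params, float] = {}  # each distinct candidate is evaluated only once

    def cost(p: Params) -> float:
        if p not in cache:
            cache[p] = evaluate(p)
        return cache[p]

    def mutate(p: Params) -> Params:
        tile, unroll = p
        if rng.random() < mutation_rate:
            tile = rng.choice(tiles)
        if rng.random() < mutation_rate:
            unroll = rng.choice(unrolls)
        return (tile, unroll)

    population: List[Params] = [
        (rng.choice(tiles), rng.choice(unrolls)) for _ in range(pop_size)
    ]
    for _ in range(generations):
        population.sort(key=cost)
        parents = population[: pop_size // 2]  # elitist truncation selection
        children: List[Params] = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: tile gene from one parent, unroll gene from the other.
            children.append(mutate((a[0], b[1])))
        population = parents + children
    best = min(population, key=cost)
    return best, cost(best)
```

Caching evaluations means a candidate that reappears in a later generation is not re-executed. This matters when every evaluation is a real program run. Keeping the better half of each generation (elitism) ensures that the best candidate found so far is never lost.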
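Item (3) mentions high-level performance data synthesized from raw hardware counter values. The sketch below illustrates what such derived metrics can look like; it does not reproduce PTracker's actual output. It turns a set of raw counts into IPC, cache miss ratios, MFLOPS, and machine efficiency (sustained over peak floating-point rate). The counter fields, the class and function names, and the example inputs are hypothetical. On a real system the counts would come from a counter interface such as PAPI or Linux perf.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CounterSample:
    """Raw event counts for one measured region (hypothetical field names)."""
    cycles: int
    instructions: int
    l1d_accesses: int
    l1d_misses: int
    l2_misses: int
    flops: int


def derived_metrics(s: CounterSample, clock_hz: float, peak_flops: float) -> Dict[str, float]:
    """Turn raw counts into higher-level memory and ILP indicators."""
    seconds = s.cycles / clock_hz
    flop_rate = s.flops / seconds
    return {
        "ipc": s.instructions / s.cycles,                 # instructions retired per cycle
        "l1d_miss_ratio": s.l1d_misses / s.l1d_accesses,  # fraction of L1 data accesses that miss
        "l2_mpki": 1000 * s.l2_misses / s.instructions,   # L2 misses per 1000 instructions
        "mflops": flop_rate / 1e6,                        # sustained floating-point rate
        "machine_efficiency": flop_rate / peak_flops,     # sustained rate / peak rate
    }
```

For example, 3e9 instructions retired in 2e9 cycles give an IPC of 1.5. On a 1 GHz clock with a 4 GFLOPS peak, 1e9 floating-point operations in those cycles correspond to 500 MFLOPS, which is a machine efficiency of 0.125.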
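Item (4) does not spell out the inter-nest reuse method. One common way to get reuse between the two loop nests of a Jacobi sweep (compute new values, then copy them back) is to fuse the nests with a lag of one row. Once row `i` of the new grid has been computed, row `i-1` of the old grid is no longer read, so it can be overwritten while it is likely still in cache. The sketch below assumes this particular form and is only illustrative; the function names are hypothetical. Each fused sweep still advances exactly one time step. A row-block decomposition across nodes with one halo exchange per sweep therefore works as before, which is the property the abstract contrasts with Time Skewing and New Tiling.

```python
from typing import List

Grid = List[List[float]]


def jacobi_two_nests(grid: Grid) -> Grid:
    """One Jacobi sweep written as two separate loop nests."""
    a = [row[:] for row in grid]
    b = [row[:] for row in grid]
    n, m = len(a), len(a[0])
    # Nest 1: compute new interior values from the old grid.
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            b[i][j] = 0.25 * (a[i - 1][j] + a[i + 1][j] + a[i][j - 1] + a[i][j + 1])
    # Nest 2: copy the new values back. For large grids, the early rows of b
    # have typically been evicted from cache by the time this nest reaches them.
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            a[i][j] = b[i][j]
    return a


def jacobi_fused(grid: Grid) -> Grid:
    """The same sweep with the copy-back nest fused into the compute nest, lagged by one row."""
    a = [row[:] for row in grid]
    b = [row[:] for row in grid]
    n, m = len(a), len(a[0])
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            b[i][j] = 0.25 * (a[i - 1][j] + a[i + 1][j] + a[i][j - 1] + a[i][j + 1])
        # Later iterations only read old rows >= i, so old row i-1 is dead:
        # copy new row i-1 back while it is likely still in cache.
        if i - 1 >= 1:
            a[i - 1][1 : m - 1] = b[i - 1][1 : m - 1]
    # Copy back the last interior row, which no later iteration released.
    if n - 2 >= 1:
        a[n - 2][1 : m - 1] = b[n - 2][1 : m - 1]
    return a
```

Pure Python will not show the cache effect. The point is the access order. Both versions produce identical grids, because every new value is computed from the same old values in the same order.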
Keywords/Search Tags: Program performance optimization, Optimization parameter selection, Reduction transformation, Genetic Algorithms, Hardware performance monitoring, Performance analysis, Machine efficiency