
Research On Integral Optimization Techniques For Scientific Programs

Posted on: 2006-10-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y Dan
Full Text: PDF
GTID: 2178360185463731
Subject: Computer Science and Technology
Abstract/Summary:
High Performance Computing (HPC) is widely used in science and engineering to solve large problems. As HPC has advanced, many high-performance computers have been developed and deployed. The most popular parallel computer architectures today are SMP, MPP, COW, and SMP clusters. The SMP cluster, which combines the characteristics of shared-memory and distributed-memory architectures, offers both strong computing capability and excellent scalability. This thesis focuses on developing multi-level parallelism and optimizing the performance of scientific programs on this architecture. Our contributions are summarized as follows.

(1) We propose a multi-level parallel computing time model suited to SMP clusters that analyzes program performance at a fine-grained level. We also provide a multi-level parallel optimization speedup model based on the single-processor speedup factor, which evaluates program performance across three parallel levels and guides program improvement.

(2) We propose a new scheme for mapping grid points to processors that organizes the processors so as to minimize communication cost.

(3) Based on measured execution times, we propose a new iso-communication-requirement scaling model.

(4) We improve the Fox algorithm for processors organized as a rectangular grid, and we develop multi-level parallelism and optimize the improved algorithm on an SMP cluster at the process, thread, and instruction levels. The experimental results indicate that the hybrid parallel programming model achieves better performance on this problem than pure MPI (see the sketch after this list).

(5) We parallelize a real, large CFD application, LM3D, and an explosion-in-a-box application using several techniques, including the divide-and-conquer method and the new mapping scheme mentioned above. Both parallel programs achieve satisfactory performance, and we draw useful conclusions from a detailed analysis of the experimental results.
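To make the hybrid programming model referred to in item (4) concrete, the following minimal sketch shows the general MPI-plus-OpenMP pattern on an SMP cluster: MPI processes run across the nodes while OpenMP threads exploit the shared memory within each node. This is a generic illustration, not the thesis code; the sum-of-squares kernel, the problem size N, and the block decomposition are assumptions made only for the example.

```c
/* Minimal hybrid MPI + OpenMP sketch (illustrative only, not the thesis code).
 * Process level: MPI processes across SMP nodes.
 * Thread level: OpenMP threads within one node. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N 1000000  /* assumed problem size for this example */

int main(int argc, char **argv)
{
    int provided, rank, size;
    /* Request funneled threading: only the master thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each process works on a contiguous block of the global index range. */
    int chunk = N / size;
    int lo = rank * chunk;
    int hi = (rank == size - 1) ? N : lo + chunk;

    double local = 0.0;
    /* Thread-level parallelism inside the SMP node via an OpenMP reduction. */
    #pragma omp parallel for reduction(+:local)
    for (int i = lo; i < hi; ++i) {
        double x = (double)i;
        local += x * x;
    }

    /* Process-level combination of the partial results across the cluster. */
    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum of squares = %e\n", global);

    MPI_Finalize();
    return 0;
}
```

In this pattern, communication volume depends only on the number of MPI processes, so placing one process per node and one thread per core can reduce message traffic relative to a pure MPI run with one process per core, which is the kind of advantage item (4) reports for the hybrid model.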
Keywords/Search Tags:SMP Cluster, parallel computing time model, speedup, scaling model, multi-level parallelism, single-processor performance optimization, mapping scheme