Font Size: a A A

Research On Parallel Program Performance Tuning On Multi-Core Computing Platform

Posted on:2013-02-16Degree:DoctorType:Dissertation
Country:ChinaCandidate:H LvFull Text:PDF
GTID:1118330362968400Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
It has been a long time that computer industries use the method of scalingfrequency and increasing instruction level parallelism continuously to improveprocessor performance. In recent years, this method is coming to the end for sure, dueto the limitation of memory bandwidth, instruction level parallelism, and power wall.On the other hand, a more powerful processor is urgent need in application fields suchas multimedia, virtualization for their demand for high performance computing. As aresult, multi-core processor has become the de-fact solution to improve peakperformance given the power and frequency-limited single-thread performance. Thatis, abandon instruction-level parallelism and frequency-scaling in favor of theexponential scaling of the number of compute cores per microprocessor. The newprocessor architecture provides massive thread level parallelism, resulting intremendous potential performance. However, it requires efficient parallel program,which is a task that existing software tools are ill-equipped for. In fact, theperformance portability must be guaranteed on multi-core computers, the ability towrite a program once and not only have it deliver good performance on thedevelopment computer, but on all multicore computers today and tomorrow.In such a case, layered parallel computing model is used for guidance in thisarticle. From the perspective of application-drived parallel program performanceoptimization, this article designed a new parallel multi-core processor computingmodel, which can be used in the architecture of multi-core processor chip. Based onthis model, g-scan algorithm is designed which has good extendibility together withhigh performance after analyzing and optimizing some fundamental parallelalgorithms. To verify the newly designed parallel computing model and algorithms, ascalable multi-core processor simulator is developed. At the last, newly designedparallel algorithm is applied to OpenSees which is a finite element software widelyused in structure engineering. The main innovations include:(1) The architecture of the current multi-core processor chip is the research objectin this article. A new parallel computing model named UPMM model which can beused in the architecture of multi-core processor chip is designed after analyzing thecharacteristic of the interconnection of cores on chip and the cache hierarchy. The costof data access of UPMM model is analyzed Based on data access pattern which isderived from basic unit of numerical computation and scientific computingapplications algorithms. At the end, the performance and cache misses of the matrixmultiplication are studied in detail based on UPMM model and conclusion of the costof data access. The experiments proved that UPMM model can give exact analysis ofparallel computing performance and cost of cache accessing. (2) This article designed and implemented a flexible, scalable multi-coreprocessor chip simulator by using SystemC, based on SimpleScalar tool set. Thefunctional experiment shows that all cores in this multi-core processor chip simulatecan simultaneously execute instructions. In contrast with real multi-core computer,this multi-core processor simulator has almost the same parallel execution behavior asreal multi-core processor, which is proved that this simulator can do an accuratesimulation of multi-core process. The performance experiment showed that themulti-core simulator perform more efficient than SimpleScalar's sim-outordersimulator. Finally, the performance prediction experiment proved the accuracy whenusing UPMM model to analyze parallel algorithm.(3) In this article, the parallel scan algorithm is studied in detail, which is widelyused in other algorithm, such as sorting, minimum spanning tree, sparse matrix-vectormultiplication. Based on the analysis and the comparison of parallel scan on PRAMmodel and on UPMM model, order of data accessing is changed, so that time andspace locality of data accessing is better. A new parallel scan algorithm named g-scanalgorithm is designed, which is based grouping the scan elements. g-scan algorithm isapplied to sparse matrix-vector multiplication, resulting in a new sparse matrix-vectormultiplication algorithm. Experiments showed that group based parallel scanalgorithm and scan based sparse matrix-vector multiplication algorithm have a betterperformance and better scalability. This research provides a high performance basicfor many scientific applications such as finite element analysis, molecular dynamicsanalysis.(4) OpenSeesSP, a finite element software widely used in structure engineering, isthe research object in this article. Three performance bottlenecks are found inOpenSeesSP by using performance analysis tools. They are matrix decomposition,communication among process and solving linear/non-linear equation systems. In thisarticle, a parallel UTDU maritx decomposition is developed to solve the firstperformance bottleneck and threads communication supported in MPI2are used tosolve the second one. At last, the sparse marix-vector multiplication algorithm newlydesigned in this article is applied to solve the third performance bottleneck.Experiment shows that these three performance optimization schemes greatly improvethe OpenSeesSP efficiency.
Keywords/Search Tags:multi-core processor computing model, multi-core simulator, parallelalgorithm, scan algorithm, sparse matrix-vector multiplicaton, finite element method
PDF Full Text Request
Related items