With the rapid development of multi-core and many-core processors in high performance computing, task-parallel programming models have become a research focus in recent years. Their aim is to exploit the full computing power of these processors, reduce the difficulty of parallel programming, and improve the efficiency of parallel computing. AceMesh is a task-parallel programming model that automatically discovers data-driven parallelism in structured grid applications and supports multi-core and heterogeneous many-core platforms. It outperforms other task-parallel programming models. The AceMesh model consists of the AceMesh task scheduling system and the AceMesh compiler. The compiler translates AceMesh directives into a task-parallel program that calls the functions of the underlying task scheduling system, which is responsible for scheduling and executing that program. To improve the performance of the AceMesh task scheduling system, this paper parallelizes its task graph construction (composition) phase and then further optimizes graph construction. On the Sunway TaihuLight platform, a spacecraft application is then parallelized as a task graph using AceMesh directives. Test data show that, after the graph construction optimizations, the AceMesh model outperforms the OpenACC* programming model already available on Sunway TaihuLight.

Specifically, based on an in-depth analysis of the graph construction phase of the AceMesh task scheduling system, this paper designs a scheme of thread separation and two-level data domain partitioning to parallelize graph construction. Partitioned management of hash table addresses and a partitioned edge-building strategy for tasks are adopted to keep the constructed graph correct. To ensure that all tasks are executed, a safe task termination check is introduced. The parallel graph construction method improves the performance of the graph construction phase by up to 158%. Graph construction is further optimized with respect to slave-core variables, a memory pool, the collection of tasks without successors, and other aspects. To verify the effect of these optimizations, seven hotspot subroutines of the spacecraft application are tested. The test data show that the optimizations improve graph construction performance by about 500%.

Finally, based on the optimized AceMesh task scheduling system, the spacecraft application is automatically parallelized as a task graph using AceMesh directives. To improve computing efficiency, the optimal number of loop blocks and execution threads is determined, and a task registration method based on virtual addresses is designed. An appropriate data transfer mode is selected according to the structure of the Sunway 26010 heterogeneous many-core processor. To fit within the 64 KB limit of slave-core local memory on Sunway TaihuLight, an additional one-dimensional partition is applied within each task. Test results for the spacecraft application on Sunway TaihuLight show that, with these optimizations, the AceMesh task-parallel programming model performs nearly 50% better than the original OpenACC* programming model on the platform.