Research On Optimizations Methods Of Sparse Matrix And Stencil Computation Based On New Generation Domestic Exa-scale Supercomputing System

Posted on:2023-12-19

Degree:Master

Type:Thesis

Country:China

Candidate:Y J Ye

Full Text:PDF

GTID:2568307025452274

Subject:Computer science and technology

Abstract/Summary:

The von Neumann architecture remains the mainstream design in today's computer systems, and its memory-access bottleneck is still unresolved. Two classes of computation that are strongly limited by this bottleneck are sparse-matrix operations and stencil computations. In this study, we investigate both scenarios in depth on the Sunway heterogeneous manycore architecture and optimize the existing algorithms. The main contributions are as follows.

First, we propose a method that combines blocked Jacobian and Cholesky algorithms. The blocking approach geometrically decouples the matrix into blocks and completely eliminates the data dependencies between the matrix iterations that run in parallel. In addition, we use an RMA-based double-cache manycore optimization mechanism for matrix multiplication on the Sunway system. Experiments with OpenFOAM benchmarks show that, at a matrix dimension of 2,354,928, our solution is 9.15 times faster than the Incomplete Cholesky decomposition method and 8.3 times faster than the Geometric-Algebraic Multi-Grid method.

Second, after extensive study of memory access in unstructured grids, we develop a discrete-memory-access optimization algorithm for Sunway systems. It uses a message-queue technique and the on-chip communication mechanism of the slave cores to increase performance, and a non-blocking data allocation approach to further improve memory access. The results show that our method attains an average memory bandwidth of 70% of the theoretical value, and that the targeted discrete memory access in various kernels is accelerated by a factor of up to 45, with an average of 10. The method also proves adaptable and robust across a range of domains and applications.

Finally, we combine our program with the characteristics of stencil computation and conduct a thorough analysis of the stencil application on Sunway systems, which includes:
(1) an adaptive four-level parallel framework for the Sunway architecture based on master-slave merging, which in our tests outperforms the conventional three-level framework in memory bandwidth and speeds up the master-slave process by a factor of 12 to 65;
(2) a partial-block parallelism and dynamic cache scheduling algorithm based on the Sunway on-chip communication mechanism via RMA, which makes heavy use of both space and time resources;
(3) a mixed-precision method that combines half, single and double precision, which improves overall performance while the results are validated.
Together, these three advances deliver a 7.53-fold speedup of the overall application, with 70.58% of the parallelized programs optimized, which is 6.8 times ahead of the ideal figure, and achieve 99.29% parallel efficiency with 27,988,480 cores in the global 500-metre-resolution case on the new-generation Sunway system.
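
The first contribution can be illustrated with a small sketch. It is not the thesis implementation: it assumes that "blocked Jacobian" refers to a block-Jacobi scheme, that each diagonal block is factored with a dense Cholesky decomposition, and that the result is used as a preconditioner inside a conjugate-gradient loop. The function names (`build_block_factors`, `apply_preconditioner`, `pcg`, `laplacian_1d`) and the 1D Laplacian test matrix are hypothetical, and nothing here models Sunway hardware, RMA, or OpenFOAM. The point it shows is that each block solve reads and writes only its own slice of the vector, so the blocks carry no data dependency on one another and could be handled by separate cores.

```python
import math

def cholesky(a):
    """Return lower-triangular L with a = L L^T for a small dense SPD block."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l

def cholesky_solve(l, b):
    """Solve L L^T x = b by forward, then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x

def build_block_factors(a, block):
    """Factor every diagonal block on its own; couplings between blocks are dropped."""
    factors = []
    for start in range(0, len(a), block):
        end = min(start + block, len(a))
        sub = [row[start:end] for row in a[start:end]]
        factors.append((start, end, cholesky(sub)))
    return factors

def apply_preconditioner(factors, r):
    """Compute z = M^{-1} r; each block touches only its own slice of r and z."""
    z = [0.0] * len(r)
    for start, end, l in factors:  # iterations are independent of each other
        z[start:end] = cholesky_solve(l, r[start:end])
    return z

def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def pcg(a, b, factors, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradient for an SPD matrix a (dense lists here)."""
    x = [0.0] * len(b)
    r = list(b)
    z = apply_preconditioner(factors, r)
    p = list(z)
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        ap = matvec(a, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if math.sqrt(dot(r, r)) < tol:
            return x, it
        z = apply_preconditioner(factors, r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

def laplacian_1d(n):
    """Hypothetical SPD test matrix: tridiagonal (-1, 2, -1)."""
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]
```

A possible use is `a = laplacian_1d(16)`, `b = matvec(a, [1.0] * 16)`, then `x, iters = pcg(a, b, build_block_factors(a, 4))`, after which `x` should be close to a vector of ones. Dense Python lists keep the sketch self-contained; a real sparse solver would use a compressed storage format and distribute the per-block solves.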

Keywords/Search Tags:

Heterogeneous Manycore Architecture, Coupling blocked Jacobian and Cholesky, Message Queue, Master-slave Merging, RMA, Mixed Precision

Related items

1	Research On The Precision Of Universal Surgical Master Hand With Force Sense
2	Machine Learning-inspired High-performance and Energy-efficient Heterogeneous Manycore Chip Desig
3	Research On Master-slave Heterogeneous Manipulators Motion Algorithm Based On Reinforcement Learning
4	Design Of Exoskeleton Teleoperation Master Manipulator And Research Of Master-slave Heterogeneous Control Method
5	Design Of 6-dof Master-slave Heterogeneous Robot Control System In Virtual Environment
6	Design Of Master Manipulator Of Master-slave Telecontrol Robot Based On The Internet
7	The Design And Implementation Of Room Server On Video Live Streaming Platform
8	Development And Maneuverability Of Modular Master-slave Robot Teleoperation System
9	Design And Implementation Of A Heterogeneous Data Interactive System Base On Message Queue
10	Research On Master-slave Tele-robotics System Based On Virtual Reality