Design and Implementation of Heterogeneous Parallel Algorithms on the Sunway TaihuLight

Posted on: 2021-09-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y D Chen
GTID: 1488306122480234
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of science, technology, and the mobile Internet, high-performance computing is now widely used in industry, scientific computing, and other fields, and the computational complexity and data scale involved keep growing. Supporting very large-scale, highly efficient computation urgently requires high-performance algorithms for domain applications. In recent years, China's research and development of supercomputing systems has ranked among the best in the world. For example, the heterogeneous parallel supercomputers Tianhe and Sunway TaihuLight have ranked first on the TOP500 list several times. However, most existing parallel and optimized basic algorithms are designed for homogeneous computing systems or for single-node accelerators, so there is a lack of scalable algorithms that take full advantage of large-scale heterogeneous parallel computing systems for real-world applications.

On the one hand, information security is becoming an increasingly serious problem in large-scale applications. It requires not only that data be protected, but also that massive amounts of data be encrypted and decrypted efficiently. Large-scale heterogeneous parallel systems bring both opportunities and challenges for protecting data efficiently. On the other hand, sparse linear algebra contains the most basic and important algorithms used in many high-performance engineering applications. Parallelizing and optimizing sparse linear algebra on high-performance heterogeneous systems faces several challenges, such as redundant memory footprint and computation, load imbalance, irregular memory access patterns, and a low ratio of computation to memory access. To alleviate these difficulties, this dissertation studies heterogeneous parallel algorithms for cipher algorithms and sparse linear algebra on the 100P Sunway TaihuLight supercomputer. In summary, it makes the following contributions:

(1) To achieve efficient data encryption and decryption for large-scale applications, this dissertation designs a heterogeneous parallel AES algorithm based on the heterogeneous many-core, cache-less architecture of the SW26010 processor. First, an adaptive heterogeneous parallel scheme is designed according to the characteristics of the AES algorithm and the Sunway architecture. Second, based on this scheme, multi-level parallelization designs for AES encryption and decryption are proposed. Third, two performance optimizations are proposed for the heterogeneous parallel AES algorithm to further leverage the computing resources of the SW26010 processor and to improve computation and communication performance.

(2) To ensure data security and integrity while retaining efficient encryption and decryption for large-scale applications, this dissertation proposes a high-performance, secure data protection system based on parallel AES and SHA-3 algorithms on the Sunway TaihuLight supercomputer. In addition, a fine-grained heterogeneous parallelization design for the data protection system is proposed to fully leverage the multi-level parallelism of the Sunway architecture and to control its memory more effectively. Furthermore, several optimization strategies are adopted to improve the encryption and decryption performance of the system.

(3) To alleviate the problems of limited local memory, high memory access latency, and load imbalance, this dissertation proposes a two-phase large-scale SpMV (TPSpMV) computation framework on the Sunway TaihuLight. First, a two-phase parallel execution technique for TPSpMV, which splits parallel CSR-based SpMV into two separate phases, is proposed to overcome the limit on computational scale. Second, adaptive partitioning methods and parallelization designs that use a local memory caching technique are designed for the two phases to exploit the architectural advantages of the platform and to alleviate high memory access latency and load imbalance. Third, several optimizations are designed to improve bandwidth usage and further optimize the performance of TPSpMV.

(4) To alleviate the problems of redundant data and irregular accesses when scaling SpMSpV over large-scale heterogeneous many-core systems, this dissertation proposes a fine-grained parallel SpMSpV (fgSpMSpV) framework on the Sunway TaihuLight. First, a re-collection method for SpMSpV, which removes unnecessary data and exploits data locality, is proposed to eliminate redundant computations, optimize coalesced memory accesses, and improve bandwidth utilization. Second, a customized fine-grained parallelization for the re-collected SpMSpV, with an adaptive compressed sparse matrix format, is presented to fit the limited local memory and exploit the multi-level parallelism of the Sunway. Third, several optimization techniques are demonstrated to further exploit the computing resources for parallel re-collected SpMSpV.
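For readers unfamiliar with the two sparse kernels named in contributions (3) and (4), the following is a minimal serial sketch of CSR-based SpMV and of SpMSpV. It is a plain baseline for illustration only, not the TPSpMV or fgSpMSpV frameworks described above. The function names `csr_spmv` and `csr_spmspv`, the dictionary representation of the sparse vector, and the 3x3 example matrix are all assumptions introduced here.

```python
from typing import Dict, List, Sequence


def csr_spmv(row_ptr: Sequence[int], col_idx: Sequence[int],
             values: Sequence[float], x: Sequence[float]) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR format (dense x)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Nonzeros of this row live in values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # x is read through col_idx: this is the irregular access pattern.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def csr_spmspv(row_ptr: Sequence[int], col_idx: Sequence[int],
               values: Sequence[float],
               x_sparse: Dict[int, float]) -> Dict[int, float]:
    """Compute y = A @ x where x is sparse, given as {index: value}.

    Matrix entries whose column has no stored entry in x contribute
    nothing and are skipped; only rows that receive a contribution
    appear in the result.
    """
    y: Dict[int, float] = {}
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        touched = False
        for k in range(row_ptr[row], row_ptr[row + 1]):
            xv = x_sparse.get(col_idx[k])
            if xv is not None:
                acc += values[k] * xv
                touched = True
        if touched:
            y[row] = acc
    return y


# Hypothetical example matrix:
#     [[1, 0, 2],
#      [0, 3, 0],
#      [4, 0, 5]]
ROW_PTR = [0, 2, 3, 5]
COL_IDX = [0, 2, 1, 0, 2]
VALUES = [1.0, 2.0, 3.0, 4.0, 5.0]

dense_result = csr_spmv(ROW_PTR, COL_IDX, VALUES, [1.0, 2.0, 3.0])
sparse_result = csr_spmspv(ROW_PTR, COL_IDX, VALUES, {2: 3.0})
```

In the SpMSpV case only the entries in column 2 take part in the computation, so most of the matrix is read without being used. That wasted traffic is the kind of redundant data the abstract refers to; how the dissertation's re-collection method actually removes it is not specified here.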
Keywords/Search Tags: Cipher System, Information Security, Sparse Linear Systems, Heterogeneous Parallel Computing, High-performance Computing, Sunway TaihuLight