Font Size: a A A

Heterogeneous Parallelization Of Heuristic Algorithms Based On Sunway Manycore Architecture

Posted on:2020-09-21Degree:DoctorType:Dissertation
Country:ChinaCandidate:Y LiuFull Text:PDF
GTID:1368330596967733Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Heuristic algorithms are widely used to solve optimization problems,which tend to be computationally expensive when dealing with large-scale problems.The Sunway TaihuLight system,the first supercomputer offering a peak performance over 100 PFlops,is powered by SW26010 many-core processors.The system can provide a powerful computing platform for the heuristics.Owing to the unique heterogeneous architecture and software environment of SW26010,it is of great practical significance to study the heterogeneous parallelization of heuristic algorithms to utilize the system.The genetic algorithm(GA)and self-organizing map(SOM)are two typical heuristic algorithms with different algorithm characteristics.The heterogeneous parallelization of GA and SOM can significantly reduce the solving time,and the complementary design principles,such as the hierarchical models and optimization methods,can provide references for the heterogeneous parallelization of other heuristic algorithms.The main contributions are as follows.1.A heterogeneous parallel genetic algorithm based on the Sunway TaihuLight is proposed,referred to as swGA.A migration pool with dynamic immigration strategy and interleaving emigration scheme,an island model based on the migration pool,three hierarchical models based on the island model,and a data sharing scheme among cells by using register-level communication(RLC)are proposed.The pool preserves the elitists,and guarantees the population diversity.The dynamic immigration strategy is based on the proposed criterion,diversity in population.The interleaving emigration scheme is proposed to reduce the communication overhead during the pool updating.The data sharing scheme is designed for the crossover among cells for the island-cellular model.Besides,several optimization methods,such as double buffering and vectorization are presented.The experiment results show that the better search ability and faster convergence are obtained by swGA,compared with the serial GA and the parallel GA of island model with directed migration.For the hierarchical models,the even super-linear speedup about 76 times is achieved on a single core group,and the speedup up to 3500 times is achieved using 64 MPEs with 4096 CPEs,compared with the serial.2.A heterogeneous parallel nondominated sorting genetic algorithm II based on the Sunway TaihuLight is proposed,referred to as swNSGA-II.A pool based enhanced island-cellular model,a RLC based data sharing scheme,and two parallel algorithms based on the scheme,non-dominated sorting and crowed-distance computation,are proposed.In the model,CPEs are assigned the most computationally expensive parts,such as non-dominated sorting and croweddistance computation.By using the data sharing scheme,the parallel algorithms can avoid the performance degradation caused by the insufficient memory bandwidth,and the extra overhead caused by indeterminate communication traffic and direction.Besides,several optimization methods,such as data reorganization and vectorization are presented.The experiment results show that swNGSA-II can get the solutions with more diversity,compared with the serial.The super-linear speedup is achieved by the process-level swNSGA-II.The speedup beyond 17 times is achieved on a single core group,and the speedup up to 46000 times is obtained using 64 MPEs with 4096 CPEs by the two-level swNGSA-II,compared with the serial.3.A heterogeneous parallel self-organizing map based on the Sunway TaihuLight is proposed,referred to as swSOM.A hierarchical parallel model based on data parallelism,the matrix operations of distance computation and model updating,a software-emulated cache are proposed.In the model,CPEs are assigned the most computationally expensive parts to utilize CPEs.The distance computation and model updating are re-structured into matrix operations,leading to a significantly acceleration beyond 2700%.The proposed software-emulated cache can compensate the performance degradation caused by the incomplete memory hierarchy of SW26010.Besides,several optimization methods,such as vectorization and double buffering are presented.The experiment results show that the better training quality on clustering is obtained by swSOM,compared with Matlab and Somoclu.And the ultra-high super-linear speedup of 923 times is achieved on a single core group,and the speedup of 577 times is achieved using 1 MPE with 64 CPEs compared with Somoclu using 1 MPE.The speedup of 29439 times is achieved using 128 MPEs with 8192 CPEs,compared with the serial SOM.In summary,the dissertation studies the parallelization and optimization of the heuristics represented by GA and SOM on the Sunway TaihuLight.The satisfied acceleration has been achieved,and the algorithms have been or are being included in the software repository of the Sunway TaihuLight,and also are in the repository plan of the forthcoming exa-scale supercomputer.
Keywords/Search Tags:heterogeneous parallel, Sunway, manycore, heuristic algorithms
PDF Full Text Request
Related items