
Parallel Optimization Of Data Intensive Computing On Sunway TaihuLight

Posted on: 2020-05-10 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: L D Li
GTID: 1368330626964468 | Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet of Things, mobile networks, artificial intelligence and other technologies, human society has entered the big data era, and the volume of data being generated is growing exponentially. Big data usually carries great value and is therefore regarded as a strategic resource by many countries. Extracting that value places ever higher demands on computing. However, with the slowing of Moore's Law and the power wall, the gap between available computing capability and computing demand keeps widening, so new approaches to data-intensive computing are urgently needed. Supercomputers with heterogeneous many-core architectures are considered a "killer" weapon for big data problems, but fully exploiting their potential faces many challenges, including memory access, thread organization, data sharing and the programming model. Taking the Sunway TaihuLight supercomputer as the target platform, and the unsupervised machine learning algorithm k-means, deep learning kernel function computation and the AES (Advanced Encryption Standard) algorithm for data security as typical data-intensive computing paradigms, this dissertation explores how to parallelize and optimize these computations efficiently so as to meet the need for timely and accurate data analysis. Our main contributions are as follows:

(1) For the unsupervised machine learning algorithm k-means, this dissertation proposes and implements, for the first time, a multi-level hierarchical optimization method in which three dimensions of the problem (the number of data samples, the number of cluster centroids and the data dimensionality) can be parallelized simultaneously and independently, thus eliminating the bottleneck in processing high-dimensional data. In addition, parallel optimization strategies such as two AllReduce operations and large-scale parallel communication are designed and implemented to address large-scale scalability and achieve efficient parallel computation. Experimental results show that, using 4,096 nodes (1,064,960 cores) in parallel, the optimized k-means algorithm takes less than 18 seconds per iteration on a large-scale clustering case with 1,265,723 samples, 196,608 dimensions and 2,000 centroids, making k-means a more practical solution for complex scenarios.

(2) For deep learning kernel function computation, this dissertation designs and implements a parallel optimization strategy for the Sunway heterogeneous many-core architecture. It shortens computation time and improves the training and inference efficiency of neural network models through mechanisms such as register communication, DMA memory access, loop tiling and loop merging, and double buffering. Experimental results show that, on a single SW26010 heterogeneous many-core processor, the proposed method reaches 23% to 116% of the overall performance of an NVIDIA K40m GPU, and achieves speedups of 3.04x to 7.84x over a dual-socket Intel Xeon E5-2680 v3 platform (12 cores per socket).

(3) For the AES algorithm in data security, this dissertation designs and implements a vectorized programming model together with an optimization strategy that exploits inter- and intra-core-group parallelism as well as instruction-level parallelism. This solves the problem of vectorizing AES and executing its instructions in parallel on the SW26010 heterogeneous many-core processor, and fully exploits the processor's capability. Experimental results show that the method achieves a maximum throughput of 13.49 GB/s on a single SW26010 processor; when scaled to 1,024 computing nodes with a 1 GB input data block, it reaches a throughput of up to 13,381.58 GB/s, i.e. near-linear scalability.
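The abstract does not spell out how the three parallel levels in contribution (1) are combined, so the following is only a minimal sketch of one plausible decomposition of a single k-means (Lloyd) iteration. It assumes the dimensions are split so that partial squared distances are summed by a first AllReduce, centroids are split so that each group finds a local nearest centroid that is then reduced to a global minimum, and samples are split so that per-centroid sums and counts are merged by a second AllReduce. The workers are simulated in-process, and the names `split`, `allreduce_sum` and `kmeans_iteration` are hypothetical, not the dissertation's Sunway code.

```python
from typing import List

Matrix = List[List[float]]


def split(n: int, parts: int) -> List[range]:
    """Split range(n) into `parts` contiguous chunks of near-equal size."""
    q, r = divmod(n, parts)
    chunks, start = [], 0
    for p in range(parts):
        size = q + (1 if p < r else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def allreduce_sum(buffers: List[List[float]]) -> List[float]:
    """Emulated AllReduce(SUM): element-wise sum of every worker's buffer."""
    return [sum(vals) for vals in zip(*buffers)]


def kmeans_iteration(X: Matrix, C: Matrix, n_parts: int,
                     k_parts: int, d_parts: int) -> Matrix:
    """One Lloyd iteration with samples, centroids and dimensions partitioned."""
    n, d, k = len(X), len(X[0]), len(C)
    d_chunks, k_chunks = split(d, d_parts), split(k, k_parts)

    # Assignment step.
    assign = [0] * n
    for i in range(n):
        best_j, best_dist = -1, float("inf")
        for kc in k_chunks:  # each centroid group owns a subset of centroids
            # Every dimension group computes partial squared distances on its slice...
            partial = [[sum((X[i][t] - C[j][t]) ** 2 for t in dc) for j in kc]
                       for dc in d_chunks]
            # ...and AllReduce #1 sums the slices into full distances.
            dist = allreduce_sum(partial)
            # Local argmin inside the group, folded into a running (distance, index)
            # minimum; a distributed run would use a MINLOC-style reduction here.
            for j, dj in zip(kc, dist):
                if dj < best_dist:
                    best_j, best_dist = j, dj
        assign[i] = best_j

    # Update step: each sample group accumulates per-centroid sums and counts.
    local = []
    for nc in split(n, n_parts):
        buf = [0.0] * (k * (d + 1))  # k rows laid out as [sum_0 .. sum_{d-1}, count]
        for i in nc:
            base = assign[i] * (d + 1)
            for t in range(d):
                buf[base + t] += X[i][t]
            buf[base + d] += 1.0
        local.append(buf)
    # AllReduce #2 merges the partial sums so every worker sees the same totals.
    total = allreduce_sum(local)

    new_C = []
    for j in range(k):
        row = total[j * (d + 1):(j + 1) * (d + 1)]
        count = row[d]
        # An empty cluster keeps its previous centroid instead of dividing by zero.
        new_C.append([s / count for s in row[:d]] if count else list(C[j]))
    return new_C


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    C = [[0.0, 0.0], [10.0, 10.0]]
    print(kmeans_iteration(X, C, n_parts=2, k_parts=2, d_parts=2))
    # prints [[0.0, 0.5], [10.0, 10.5]]
```

The point of the decomposition is that no single worker ever needs a full sample row or the full centroid matrix, which is what makes very high dimensionality (such as the 196,608 dimensions reported above) tractable; the partitioned result matches the serial one because the two sums and the minimum are associative.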
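Double buffering, one of the mechanisms listed in contribution (2), overlaps the transfer of the next data tile into local memory with computation on the current tile. The sketch below shows only that control structure, under loudly stated assumptions: a single Python worker thread stands in for the asynchronous DMA engine, `dma_get` and `double_buffered_map` are hypothetical names (not the Sunway athread DMA API), and because of Python's GIL it illustrates the scheduling pattern rather than delivering a real speedup.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List, Sequence


def dma_get(src: Sequence[float], start: int, size: int) -> List[float]:
    """Stand-in for an asynchronous DMA copy from main memory into local store."""
    return list(src[start:start + size])


def double_buffered_map(src: Sequence[float], tile: int,
                        kernel: Callable[[float], float]) -> List[float]:
    """Apply `kernel` tile by tile, prefetching tile i+1 while computing tile i."""
    if tile <= 0:
        raise ValueError("tile must be positive")
    starts = list(range(0, len(src), tile))
    out: List[float] = []
    if not starts:
        return out
    # One worker thread plays the role of the DMA engine.
    with ThreadPoolExecutor(max_workers=1) as dma:
        pending: Future = dma.submit(dma_get, src, starts[0], tile)  # prime buffer 0
        for idx in range(len(starts)):
            current = pending.result()  # wait until this buffer has been filled
            if idx + 1 < len(starts):
                # Start filling the other buffer before computing on this one,
                # so the copy overlaps with the computation below.
                pending = dma.submit(dma_get, src, starts[idx + 1], tile)
            out.extend(kernel(x) for x in current)
    return out


if __name__ == "__main__":
    relu = lambda v: max(v, 0.0)
    print(double_buffered_map([-1.0, 2.0, -3.0, 4.0, 5.0], 2, relu))
    # prints [0.0, 2.0, 0.0, 4.0, 5.0]
```

At any moment at most two tiles are live (the one being computed and the one being fetched), which is why the technique fits the small per-core local memory of a many-core processor.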
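The "near-linear scalability" claim in contribution (3) can be made concrete with a back-of-the-envelope parallel-efficiency estimate. This assumes the 13.49 GB/s single-processor peak is a fair per-node baseline for the 1,024-node run with 1 GB blocks, which the abstract does not state; the speedup S and efficiency E are defined here only for illustration.

```latex
S = \frac{13{,}381.58\ \text{GB/s}}{13.49\ \text{GB/s}} \approx 991.96,
\qquad
E = \frac{S}{1024} = \frac{13{,}381.58}{1024 \times 13.49}
  = \frac{13{,}381.58}{13{,}813.76} \approx 0.969
```

Under that assumption, the 1,024-node run retains roughly 97% of ideal linear scaling.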
Keywords/Search Tags: Data-Intensive Computing, Heterogeneous Many-core Architecture, k-means, Deep Learning, AES