
Research on Parallel Optimization Technology of PCA Algorithm

Posted on: 2019-12-16
Degree: Master
Type: Thesis
Country: China
Candidate: W Yu
GTID: 2428330611993151
Subject: Computer Science and Technology
Abstract/Summary:
In science, the widespread use of precision instruments and the growing volume of simulation output mean that researchers must analyze and process ever more data. At this scale, traditional serial computing cannot deliver results within an acceptable time, so analyzing huge scientific datasets efficiently and extracting valuable information from them has become a challenging problem. Principal Component Analysis (PCA) simplifies information and compresses data effectively, and has become a widely used preprocessing technique in big data applications. As data sizes keep growing, preprocessing with PCA compresses and simplifies the data and saves memory while minimizing information loss. In practice, however, datasets often exceed the computer's memory capacity, so the sample data cannot be read into memory in full, and the preprocessing step itself still incurs a huge time overhead. Making PCA as fast and efficient as possible on large-scale data is therefore a problem worth studying.

Traditional supercomputing systems focus on computation-intensive applications, whereas the emerging big data analytics ecosystem, driven by demand, focuses on data-intensive applications; to some extent the two technologies have developed separately. In the big data era, integrating big data techniques with the computing power of high-performance computing (HPC) systems has become both an opportunity and a challenge. This thesis takes the PCA algorithm as its research object and studies parallel optimization techniques and efficient implementations of PCA for processing large-scale scientific data on the Tianhe-2 high-performance computing system. The main results are as follows:

1) A fast parallel PCA optimization algorithm is designed. It first simplifies the PCA solution process, then transfers the idea of the MapReduce computing model from big data analytics to the HPC environment. This design makes the algorithm simple and efficient to implement on different platforms while exploiting the computing power of HPC systems.

2) A multi-threaded PCA implementation for multi-core architectures is proposed. It is an efficient implementation of the fast PCA design on multi-core processors: OpenMP implements the MapReduce computing model, and SIMD further optimizes the computation. It runs up to 110 times faster than an efficient serial implementation and achieves a 28x speedup over the PCA implementation in the Intel DAAL library in the same computing environment.

3) A hierarchical hybrid parallel PCA implementation for distributed architectures is proposed. It is an efficient implementation of the fast PCA design on distributed systems: MPI provides coarse-grained parallelism, OpenMP provides fine-grained parallelism, and SIMD further optimizes the computation. On 128 nodes it runs 145 times faster than the efficient serial implementation, and on 256 nodes its performance is 29.6 times that of Spark MLlib.

4) A heterogeneous hybrid PCA implementation for CPU+GPU architectures is proposed. It is an efficient implementation of the fast PCA design on CPU+GPU systems: CUDA implements the fast PCA solution process, and OpenMP threads control multiple GPUs in parallel. It runs 202 times faster than the serial implementation on a single GPU and reaches a maximum speedup of 553 times on a single node with multi-threaded hybrid parallel optimization.

The algorithm was also applied to SNP genetic analysis on the HapMap3 dataset, where it achieved good performance.
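The abstract does not spell out how the PCA solution is simplified or exactly how the MapReduce model is applied. One common approach, assumed here purely for illustration, is a single pass over the data in which each map task computes, for its chunk of rows, the row count n, the column sums s and the sum of outer products G = sum(x x^T); the reduce step adds these partials, the covariance matrix follows as (G - n mu mu^T) / (n - 1), and the leading principal component is then found by power iteration. The sketch uses only the Python standard library; all function names (`map_chunk`, `reduce_pair`, `covariance`, `top_component`, `parallel_pca`) are hypothetical, and `ThreadPoolExecutor` merely stands in for OpenMP threads. It is not the thesis implementation.

```python
"""Minimal MapReduce-style PCA sketch (stdlib only, illustrative).

Assumption: the "simplified" PCA is a single pass that accumulates the
row count, column sums and sum of outer products per chunk; the top
principal component is then obtained by power iteration.
"""
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import math
import random


def map_chunk(rows):
    """Map step: per-chunk count, column sums and sum of outer products."""
    d = len(rows[0])
    s = [0.0] * d
    g = [[0.0] * d for _ in range(d)]
    for x in rows:
        for i in range(d):
            s[i] += x[i]
            for j in range(d):
                g[i][j] += x[i] * x[j]
    return len(rows), s, g


def reduce_pair(a, b):
    """Reduce step: partial statistics combine by element-wise addition."""
    n1, s1, g1 = a
    n2, s2, g2 = b
    return (n1 + n2,
            [u + v for u, v in zip(s1, s2)],
            [[u + v for u, v in zip(r1, r2)] for r1, r2 in zip(g1, g2)])


def covariance(n, s, g):
    """Sample covariance from the reduced statistics: (G - n*mu*mu^T) / (n - 1)."""
    mu = [v / n for v in s]
    d = len(s)
    return [[(g[i][j] - n * mu[i] * mu[j]) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(c, iters=500):
    """Power iteration for the dominant eigenvector of a symmetric PSD matrix."""
    d = len(c)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(c[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def parallel_pca(data, n_chunks=4):
    """Split rows into chunks, map them concurrently, reduce, then solve."""
    size = math.ceil(len(data) / n_chunks)
    chunks = [data[k:k + size] for k in range(0, len(data), size)]
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_chunk, chunks))
    n, s, g = reduce(reduce_pair, partials)
    return top_component(covariance(n, s, g))


if __name__ == "__main__":
    random.seed(0)
    # Synthetic 2-D data stretched along (1, 1): the leading component
    # should point close to (0.707, 0.707), up to sign.
    data = []
    for _ in range(1000):
        t, e = random.gauss(0, 3), random.gauss(0, 0.3)
        data.append([t + e, t - e])
    print(parallel_pca(data))
```

Because the partial statistics combine by plain addition, the reduce is associative and the same decomposition can be nested: threads within a node, then processes across nodes (for example with `MPI_Reduce` or `MPI_Allreduce`), so each worker only communicates d + d^2 accumulated values regardless of how many rows it holds. In CPython the GIL keeps the threads from speeding up this pure-Python arithmetic, so the sketch shows the data decomposition rather than the performance. The one-pass formula can also lose precision when column means are large relative to the spread; a production implementation would guard against this, for example by combining per-chunk centered statistics.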
Keywords: PCA, big data, multi-core, distributed architecture, GPU