
Application Research of Singular Value Decomposition and PCA Algorithm Based on MapReduce

Posted on: 2016-04-04 | Degree: Master | Type: Thesis
Country: China | Candidate: S L Hu | Full Text: PDF
GTID: 2428330542457357 | Subject: Computer application technology
Abstract/Summary:
The Internet has become the main tool through which people around the world communicate and work. Although computer hardware is upgraded continually, data is produced faster than hardware can process it, and this growth in data volume makes data analysis considerably harder. Many algorithms currently cannot process massive data sets, and a good solution to this problem is to parallelize them. How to parallelize algorithms so that they can process massive data sets has therefore become a popular research direction in computing. Among the algorithms used for data mining and analysis, Principal Component Analysis (PCA) is frequently used to analyze a data matrix and to reduce its dimensionality by extracting the main features of the matrix. This thesis summarizes recent research on parallelizing PCA and analyzes the advantages and shortcomings of that work, especially in the step that computes the eigenvalues of the matrix. The literature on PCA shows that singular value decomposition (SVD) is easier to parallelize than the traditional eigenvalue solution algorithm. Moreover, SVD can handle any type of matrix, so parallelizing SVD is an indirect way to parallelize PCA.

This thesis divides the PCA algorithm into two stages: the first stage computes the correlation coefficient matrix, and the second stage is the singular value decomposition of the matrix. The core of the algorithm is implementing the second stage in a parallel environment. By combining the most popular parallel framework with the QR decomposition of a matrix, this thesis proposes a new way to implement SVD in parallel. The computation speed of the parallel algorithm is verified experimentally on data sets consisting of randomly generated double-precision floating-point matrices of different dimensions, and the results are compared with the traditional serial algorithm to show the efficiency improvement.

The SVD algorithm is then integrated into the PCA algorithm, and a parallel computation of the correlation coefficient matrix is proposed, so that both stages of the PCA algorithm run in parallel. Comparisons across matrices of different dimensions, and against an existing PCA algorithm that is not fully parallel and the ordinary PCA algorithm, show that the proposed algorithm takes less time when processing massive data sets. In this way, the thesis combines the parallel SVD algorithm with the traditional PCA algorithm to obtain a new parallel PCA algorithm based on MapReduce. Finally, the algorithm is applied to three application data sets (economic, medical, and electronic competitive data), and the PCA results are analyzed for their practical significance to demonstrate the application value of the algorithm in real life.
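
The two-stage structure described in the abstract can be illustrated with a small single-machine sketch. This is not the thesis's implementation: the abstract does not give the exact algorithm, so everything below is an assumption made for illustration. The map and reduce steps are plain Python functions rather than Hadoop jobs, the row-block layout and all function names (`map_partial_sums`, `reduce_partial_sums`, `correlation_matrix`, `tsqr_svd`, `pca`) are hypothetical, and the QR-based SVD uses the well-known "tall-and-skinny QR" idea (factor each row block, then factor the stacked R factors), which may differ from the method the thesis proposes.

```python
from functools import reduce

import numpy as np


def map_partial_sums(block):
    """Map step (assumed layout: one block of rows per mapper).
    Emits the row count, the column sums, and the Gram matrix X^T X."""
    return block.shape[0], block.sum(axis=0), block.T @ block


def reduce_partial_sums(a, b):
    """Reduce step: the partial statistics simply add up."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]


def correlation_matrix(blocks):
    """Stage 1: correlation coefficient matrix from row blocks."""
    n, col_sum, gram = reduce(reduce_partial_sums, map(map_partial_sums, blocks))
    mean = col_sum / n
    cov = (gram - n * np.outer(mean, mean)) / (n - 1)
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std)


def tsqr_svd(blocks):
    """Singular values and right singular vectors of the stacked blocks.
    Map: QR-factor each block and keep only the small R factor.
    Reduce: stack the R factors, QR-factor again, then run a small SVD."""
    r_factors = [np.linalg.qr(block, mode="r") for block in blocks]
    r_final = np.linalg.qr(np.vstack(r_factors), mode="r")
    _, singular_values, vt = np.linalg.svd(r_final)
    return singular_values, vt


def pca(blocks, k):
    """Stage 2: SVD of the correlation matrix; returns the top-k
    component directions and their share of the total variance."""
    corr = correlation_matrix(blocks)
    _, singular_values, vt = np.linalg.svd(corr)
    explained = singular_values / singular_values.sum()
    return vt[:k], explained[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.standard_normal((1000, 6))   # random double-precision matrix
    row_blocks = np.array_split(data, 8)    # stand-in for distributed splits
    components, ratio = pca(row_blocks, k=2)
    print(components.shape, ratio)
```

The design point the sketch tries to show is that both stages reduce to small per-block results (sums, Gram matrices, R factors) that can be combined independently of the number of rows, which is what makes a MapReduce formulation plausible for tall data matrices.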
Keywords/Search Tags: Principal Component Analysis, SVD, MapReduce, Application