Font Size: a A A

Research And Implementation Of A Large Scale Matrix LU Decomposition And Inversion Algorithm Based On Spark Platform

Posted on:2017-02-27Degree:MasterType:Thesis
Country:ChinaCandidate:X Y ZhaoFull Text:PDF
GTID:2180330485460541Subject:Software engineering
Abstract/Summary:PDF Full Text Request
Matrix inversion operation is a fundamental building block of many computational tasks in many fields such as machine learning and image processing. With the development of computer science, the matrices to be processed are usually very large and become even larger in the current big data period. Therefore, the parallelization of the matrix inversion operation has become an important research topic.With the development of Spark, machine learning algorithms on Spark have gradually become a popular research topic nowadays. The inversion of a matrix can be computed using many methods, such as LU decomposition, SVD decomposition and QR decomposition. In this thesis, the inversion of input matrix is computed based on LU decomposition because using LU decomposition method for matrix inversion is advantageous in parallelization over other methods. This thesis analyzes Spark platform’s feature and the characteristics of LU decomposition algorithm, then proposes a Spark based parallel matrix inversion algorithm. There are two steps in the algorithm:(1) Spark based large scale matrix LU decomposition algorithm is designed to calculate the input matrix’s LU decomposition. Based on the parallel LU decomposition algorithm, the input matrix is splitted into a set of independent blocks, which are fitted into memory and iteratively processed to get the lower triangular matrix and upper triangular matrix. (2) Based on the triangular matrices in step one and block matrix principle, Spark based triangular matrix inversion algorithm is proposed to invert the two triangular matrices. Therefore the inversion of the input matrix can be computed with the two inverted triangular matrices. In the implementation process, three methods are recommended to optimize efficiency of the algorithm:(1) Cache the intermediate results which are frequently used into memory. (2) Tune the efficiency of shuffle process. (3) Use non-recursive method to implement the algorithm.The algorithm in this thesis is valuable to other Spark based matrix operation researchers. In the experiments, the scalability of our algorithm on Spark is validated in both local mode with multiple cores and cluster mode with multiple computing nodes. The algorithm is compared with the conventional matrix inversion algorithm on Hadoop. Experimental results indicate that the algorithm achieves great scalabilities both on local mode with multiple cores and cluster mode with multiple computing nodes. Compared with the previous Hadoop based algorithm, efficiency has been promoted by 24.6%.
Keywords/Search Tags:Parallel Computing, LU Decomposition, Matrix Inversion, Spark, MapReduce
PDF Full Text Request
Related items