
The Design And Application Of SVD Algorithm Based On Spark Platform

Posted on: 2016-08-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Ou
Full Text: PDF
GTID: 2348330479954619
Subject: Electronics and Communications Engineering
Abstract/Summary:
In linear algebra, the singular value decomposition (SVD) is an important matrix factorization. It is widely used in signal processing and machine learning for tasks such as reducing the dimensionality of complex data sets, principal component analysis, and noise filtering. In the era of big data, traditional SVD algorithms cannot cope with massive data sets, so combining a data processing platform with an efficient distributed algorithm has become a significant and challenging research problem.
Spark, developed by the AMPLab at the University of California, Berkeley, is a distributed computing framework based on in-memory computation. Compared with the MapReduce distributed computing framework, Spark is well suited to iterative computation and handles large volumes of complex data efficiently, which makes it convenient for developing distributed iterative algorithms.
To address the problem of massive data processing, this thesis proposes a parallel SVD algorithm for large-scale sparse matrices and implements it on the Spark platform. Two important problems must be addressed in a big data setting: the first is preserving the sparsity of the data, and the second is making the algorithm convenient and efficient to parallelize. To deal with these problems, an SVD algorithm based on the Lanczos algorithm, the bisection (binary) algorithm, and the inverse power algorithm (inverse iteration) is proposed. The Lanczos algorithm transforms a real symmetric matrix into a symmetric tridiagonal matrix by an orthogonal similarity transformation and is one of the most effective methods for solving large-scale eigenvalue problems. The bisection algorithm and the inverse power algorithm are then used to efficiently compute the eigenvalues and the eigenvectors of the tridiagonal matrix, respectively.
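The Lanczos and bisection steps described above can be illustrated with a small single-machine sketch. It is not the thesis's Spark implementation. It assumes the symmetric matrix is M^T M for an input matrix M (the abstract does not say which symmetric matrix is used), a start vector of all ones, and full reorthogonalization. The function names are hypothetical, and the inverse iteration step for eigenvectors is omitted. The matrix is touched only through matrix-vector products, which is what lets a sparse M stay sparse.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def lanczos(apply, n, k):
    """Run up to k Lanczos steps on a symmetric operator given only as a
    matrix-vector product `apply`. Returns the diagonal (alpha) and the
    off-diagonal (beta) of the tridiagonal matrix T."""
    q = [1.0 / math.sqrt(n)] * n          # assumed start vector (normalized ones)
    basis, alpha, beta = [q], [], []
    for j in range(k):
        w = apply(basis[j])
        alpha.append(dot(w, basis[j]))
        # Full reorthogonalization: simple and stable, but costly at scale.
        for v in basis:
            c = dot(w, v)
            w = [wi - c * vi for wi, vi in zip(w, v)]
        b = math.sqrt(dot(w, w))
        if j == k - 1 or b < 1e-12:       # finished, or invariant subspace found
            break
        beta.append(b)
        basis.append([wi / b for wi in w])
    return alpha, beta

def count_below(alpha, beta, x):
    """Number of eigenvalues of T smaller than x (Sturm sequence count)."""
    count, d = 0, 1.0
    for i in range(len(alpha)):
        off = beta[i - 1] ** 2 if i > 0 else 0.0
        d = alpha[i] - x - off / d
        if d == 0.0:
            d = 1e-300                    # avoid dividing by zero in the next step
        if d < 0:
            count += 1
    return count

def kth_eigenvalue(alpha, beta, k, tol=1e-12):
    """k-th smallest eigenvalue (k = 0, 1, ...) of T by bisection."""
    n = len(alpha)
    radius = [(beta[i - 1] if i > 0 else 0.0) + (beta[i] if i < n - 1 else 0.0)
              for i in range(n)]
    lo = min(a - r for a, r in zip(alpha, radius))   # Gershgorin bounds
    hi = max(a + r for a, r in zip(alpha, radius))
    while hi - lo > tol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if count_below(alpha, beta, mid) > k:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def singular_values(rows):
    """Singular values of a matrix given as a list of rows, largest first."""
    n = len(rows[0])
    def gram_apply(x):                    # x -> M^T (M x), never forms M^T M
        y = [dot(r, x) for r in rows]
        return [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(n)]
    alpha, beta = lanczos(gram_apply, n, n)
    eig = [kth_eigenvalue(alpha, beta, i) for i in range(len(alpha))]
    return sorted((math.sqrt(max(e, 0.0)) for e in eig), reverse=True)
```

Calling `singular_values([[2.0, 0.0], [0.0, 1.0], [0.0, 0.0]])` should return values close to 2 and 1. If the Lanczos recurrence stops early because the start vector lies in an invariant subspace, fewer values are returned. In a distributed setting, the two products inside `gram_apply` are the natural place to parallelize over rows.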
Experiments on the accuracy and efficiency of the parallel SVD algorithm on the Spark platform show that the algorithm is highly efficient for large-scale data processing.
The thesis also proposes a new application of the SVD algorithm to query recommendation in the field of information retrieval. Using the SVD algorithm, a latent semantic analysis (LSA) model is built from the text of clicked titles in a search engine query log and is used to calculate the similarity between queries. The results show that the algorithm also performs well in query recommendation.
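The query-similarity idea can be sketched in a few lines. It is a toy illustration, not the thesis's model. It assumes each query is represented by the words of its clicked titles, builds a term-by-query count matrix X, and obtains the right singular vectors of X from the eigenvectors of X^T X by power iteration with deflation. The function names and the sample log are hypothetical, and k is assumed not to exceed the rank of X.

```python
import math
from collections import Counter

def top_eigenpairs(G, k, iters=500):
    """Top-k eigenpairs of a symmetric positive semidefinite matrix G,
    found by power iteration with deflation."""
    n = len(G)
    pairs = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(n)]      # assumed fixed start vector
        for _ in range(iters):
            # Remove components along eigenvectors that were already found.
            for _lam, u in pairs:
                c = sum(a * b for a, b in zip(v, u))
                v = [a - c * b for a, b in zip(v, u)]
            w = [sum(g * x for g, x in zip(row, v)) for row in G]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(G[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
    return pairs

def query_embeddings(titles_by_query, k):
    """Map each query to k latent coordinates (rows of V * Sigma in X = U Sigma V^T)."""
    queries = list(titles_by_query)
    counts = [Counter(titles_by_query[q].lower().split()) for q in queries]
    # Gram matrix G = X^T X of the term-by-query count matrix X.
    G = [[sum(ci[t] * cj[t] for t in ci) for cj in counts] for ci in counts]
    pairs = top_eigenpairs(G, k)
    return {q: [math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs]
            for i, q in enumerate(queries)}

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

# Hypothetical query log: query -> concatenated clicked title text.
sample_log = {
    "svd spark": "singular value decomposition on spark",
    "matrix factorization": "distributed singular value decomposition matrix",
    "cheap flights": "book cheap flights online",
}
embeddings = query_embeddings(sample_log, 2)
related = cosine(embeddings["svd spark"], embeddings["matrix factorization"])
unrelated = cosine(embeddings["svd spark"], embeddings["cheap flights"])
```

In this toy log the first two queries share clicked-title vocabulary, so `related` should be much larger than `unrelated`. A recommender along these lines would rank candidate queries by cosine similarity to the user's query.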
Keywords/Search Tags:SVD, LSA, Big Data, Spark, Query Recommendation