Spark-Based Large-Scale Matrix Algorithms

Posted on: 2018-04-23 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Liang | Full Text: PDF
GTID: 2348330518995454 | Subject: Information and Communication Engineering
Abstract/Summary:
In recent years, with the rapid growth of the mobile internet, processing and querying massive amounts of data has become a challenge. The Hadoop ecosystem offers a rich set of libraries, applications, and systems with which you can build scalable big data applications. Linear algebra is fundamental to machine learning algorithms, and the growing scale and importance of big data has driven the development of parallel linear algebra algorithms. Spark is better suited to iterative machine learning algorithms than the MapReduce framework. First, we build a parallel linear algebra library based on Spark. Distributed matrices, including the indexed row matrix and the block matrix, are represented by Resilient Distributed Datasets (RDDs). The matrix operations built into the library are matrix addition and matrix multiplication. Matrix multiplication is the foundation of the other matrix algorithms, so we adapt it to the characteristics of Spark. On that basis we propose two parallel linear algebra algorithms: dense matrix inversion and singular value decomposition (SVD) of a bidiagonal matrix. Dense matrix inversion is both CPU-bound and I/O-bound. We present a block-recursive algorithm based on LU (lower-upper) decomposition for large-scale matrix inversion. With a well-designed implementation and an optimized data structure, we reduce space complexity and network communication. Bidiagonal singular value decomposition is both time-consuming and memory-demanding. We propose a novel algorithm for solving the bidiagonal SVD; with lazy matrix multiplication, we sharply reduce network communication. The proposed algorithms and implementations will become a solid foundation for building a high-performance linear algebra library on Spark for big data processing and applications.
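
To make the idea of block-recursive inversion concrete, the following is a minimal single-machine sketch in plain Python. It is not the thesis's Spark implementation: it assumes the matrix is held in memory as a list of rows, it uses the block LU (Schur complement) identity without pivoting, and it therefore assumes every leading block it meets is nonsingular (true, for example, for symmetric positive definite matrices). All function names (`matmul`, `split`, `join`, `block_inverse`) are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense product a @ b; in a distributed setting each block product is one task."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def add(a: Matrix, b: Matrix, sign: float = 1.0) -> Matrix:
    """Element-wise a + sign * b."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def neg(a: Matrix) -> Matrix:
    """Element-wise negation."""
    return [[-x for x in row] for row in a]


def split(m: Matrix, k: int) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """Partition m into four blocks, with a k x k top-left block."""
    a11 = [row[:k] for row in m[:k]]
    a12 = [row[k:] for row in m[:k]]
    a21 = [row[:k] for row in m[k:]]
    a22 = [row[k:] for row in m[k:]]
    return a11, a12, a21, a22


def join(b11: Matrix, b12: Matrix, b21: Matrix, b22: Matrix) -> Matrix:
    """Reassemble four blocks into one matrix."""
    top = [r1 + r2 for r1, r2 in zip(b11, b12)]
    bottom = [r1 + r2 for r1, r2 in zip(b21, b22)]
    return top + bottom


def block_inverse(m: Matrix) -> Matrix:
    """Invert a square matrix by recursing on its blocks (no pivoting)."""
    n = len(m)
    if n == 1:
        if m[0][0] == 0.0:
            raise ZeroDivisionError("singular leading block; pivoting would be needed")
        return [[1.0 / m[0][0]]]
    a11, a12, a21, a22 = split(m, n // 2)
    a11_inv = block_inverse(a11)
    l21 = matmul(a21, a11_inv)                 # block of L: A21 * A11^-1
    u12 = matmul(a11_inv, a12)                 # scaled block of U: A11^-1 * A12
    schur = add(a22, matmul(l21, a12), -1.0)   # S = A22 - A21 * A11^-1 * A12
    s_inv = block_inverse(schur)
    b12 = neg(matmul(u12, s_inv))
    b21 = neg(matmul(s_inv, l21))
    b11 = add(a11_inv, matmul(matmul(u12, s_inv), l21))
    return join(b11, b12, b21, s_inv)


if __name__ == "__main__":
    sample = [[4.0, 1.0, 2.0], [1.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
    for row in matmul(sample, block_inverse(sample)):
        print([round(x, 6) for x in row])
```

Every step of the recursion is expressed through block multiplication and addition, which is why the abstract treats matrix multiplication as the foundation of the other algorithms. A distributed version would keep the blocks partitioned across the cluster rather than slicing Python lists.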
Keywords/Search Tags: matrix computation, parallel computation, matrix multiplication, matrix inversion, singular value decomposition