
Research on Tensor Decomposition Algorithms in a Distributed Environment

Posted on: 2018-04-01
Degree: Master
Type: Thesis
Country: China
Candidate: C Mai
Full Text: PDF
GTID: 2428330518957952
Subject: Software Engineering Technology
Abstract/Summary:
Tensors are higher-order generalizations of matrices. While a matrix is a two-dimensional array of rows and columns, a tensor is a multi-dimensional array. Two-dimensional arrays can describe pairwise relationships between variables, whereas tensors can represent higher-order relationships among three or more variables. Owing to this expressive power, tensors are not only effective for text analysis but are also widely used in social network analysis, time series analysis, and other areas.

Over the past few decades, tensor research has been concentrated mainly in physics, numerical analysis, signal processing, theoretical computer science, and other theoretical fields. Because early computers were not very powerful and the time complexity of tensor-related algorithms is usually exponential, matrices were used heavily in the engineering side of computer science instead. With the evolution of computers, the rise of big data, and the development of tensor theory, tensors are once again receiving a lot of attention. When processing massive volumes of data, it is common to work in high-dimensional feature spaces. The two-dimensional form of the matrix is increasingly inadequate for describing such data, and tensors are becoming the mainstream choice for handling high-dimensional data.

Tensor decomposition is a primary tool for applying tensors to high-dimensional data. Through tensor decomposition, the characteristics that lie implicitly in the data can be extracted efficiently while the unimportant parts are removed. In this way, noise removal, dimensionality reduction, and a reduction in data volume are achieved. CP decomposition is an important tensor decomposition method.

This thesis first describes tensors and CP decomposition and analyzes the computational and implementation difficulties of the traditional ALS-based CP decomposition algorithm. Then, to address the efficiency problem of the traditional algorithm, it designs and implements ParaTD, a distributed CP decomposition algorithm based on the Spark platform. Compared with the traditional CP decomposition algorithm, the main contributions of this thesis are as follows:

(1) A distributed algorithm for CP decomposition based on the Spark platform is proposed and implemented in Scala. ParaTD takes advantage of Spark's RDDs and uses memory as the main storage for data during the computation, which reduces the overhead of disk access.
(2) An algorithm to decouple the Khatri-Rao product is designed and implemented. The algorithm splits a tensor into multiple fibers, which avoids the expansion of temporary data during the computation and lays a solid foundation for CP decomposition of large-scale data.
(3) An algorithm to compute outer products in parallel, and a method that uses the distributed cache to accelerate matrix multiplication, are designed and implemented. The algorithm splits the matrix used in the outer-product computation into row vectors and computes the associated outer products in a distributed manner. At the same time, it uses Spark broadcast variables to distribute small vectors, breaking a large matrix product into parts, which further improves computational efficiency.

The experiments show that the algorithm substantially improves computational efficiency and resource utilization compared with the traditional CP decomposition algorithm.
Keywords/Search Tags: Tensor, CP decomposition, Spark, Distributed algorithm
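To make the "expansion of temporary data" mentioned in contribution (2) concrete, the following is a minimal single-machine sketch in plain Python. It is not the ParaTD algorithm and not Spark code. The abstract does not give the details of the fiber-based split, so the sketch swaps in a simpler per-nonzero accumulation. It computes the matrix product at the core of an ALS update for a three-way tensor (often called MTTKRP) in two ways: by materializing the Khatri-Rao product, and by accumulating contributions entry by entry. All function names, the example tensor, and the row-ordering convention are assumptions made for illustration.

```python
def khatri_rao(B, C):
    """Column-wise Kronecker product of B (J x R) and C (K x R).

    Assumed ordering: row j*K + k holds B[j][r] * C[k][r].
    The result has J*K rows, which is the temporary data that grows quickly.
    """
    R = len(B[0])
    return [[B[j][r] * C[k][r] for r in range(R)]
            for j in range(len(B)) for k in range(len(C))]


def mttkrp_explicit(dense, B, C):
    """Mode-1 product using the materialized Khatri-Rao matrix."""
    I, J, K, R = len(dense), len(B), len(C), len(B[0])
    kr = khatri_rao(B, C)  # (J*K) x R temporary matrix
    out = [[0.0] * R for _ in range(I)]
    for i in range(I):
        for j in range(J):
            for k in range(K):
                row = kr[j * K + k]
                for r in range(R):
                    out[i][r] += dense[i][j][k] * row[r]
    return out


def mttkrp_sparse(nonzeros, I, B, C):
    """Same product, accumulated per nonzero entry; no Khatri-Rao matrix is built."""
    R = len(B[0])
    out = [[0.0] * R for _ in range(I)]
    for (i, j, k), x in nonzeros.items():
        for r in range(R):
            out[i][r] += x * B[j][r] * C[k][r]
    return out


# Hypothetical 2 x 2 x 2 tensor stored as {(i, j, k): value}.
nonzeros = {(0, 0, 0): 1.0, (0, 1, 1): 2.0, (1, 0, 1): 3.0}
B = [[1.0, 2.0], [3.0, 4.0]]
C = [[5.0, 6.0], [7.0, 8.0]]

# Dense copy of the same tensor, needed only by the explicit variant.
dense = [[[nonzeros.get((i, j, k), 0.0) for k in range(2)]
          for j in range(2)] for i in range(2)]

explicit_result = mttkrp_explicit(dense, B, C)
sparse_result = mttkrp_sparse(nonzeros, 2, B, C)
print(sparse_result)  # [[47.0, 76.0], [21.0, 48.0]]
```

Both variants return the same matrix, but the second touches only the stored entries and never allocates the J*K-row intermediate. In a distributed setting, each entry's contribution could in principle be computed independently and summed per output row, which is the kind of structure a platform such as Spark can exploit.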