
Research On Distributed Manifold Learning Algorithm Based On Spark

Posted on: 2021-11-26  Degree: Master  Type: Thesis
Country: China  Candidate: Z Z Fang  Full Text: PDF
GTID: 2518306560453034  Subject: Computer technology
Abstract/Summary:
With the rapid development of the information age, growth in both data scale and data dimensionality has made data processing more difficult, leading to so-called "data disasters". Dimensionality reduction is an important means of making high-dimensional big data easier to process. Manifold learning, as a dimensionality reduction method, can preserve the original topological structure of high-dimensional data and extract its intrinsic features. Traditional manifold learning algorithms, however, have two problems. First, methods based on spectral decomposition must solve for the eigenvalues of a matrix, which has high space and time complexity and makes it difficult to meet the dimensionality reduction requirements of large-scale data. Second, serial manifold learning methods running in a stand-alone environment are limited by that environment and likewise struggle with large-scale data. To address these problems, this thesis studies manifold learning methods based on iterative solution and implements them in parallel on the Spark platform. The main contents are as follows:

(1) To address the high cost of eigenvalue computation in manifold learning methods based on spectral decomposition, a manifold learning algorithm framework based on iterative solution is proposed, in which the low-dimensional embedding of high-dimensional data is obtained iteratively. The framework uses the steepest descent method to establish a unified iterative framework for the classic manifold learning algorithms. This makes the iterative solving process consistent across the different algorithms and greatly reduces their space and time complexity. The reduction in space complexity in particular greatly lowers the storage requirements of the algorithms.

(2) To overcome the limitations of serial manifold learning algorithms in a stand-alone environment, a distributed iterative manifold learning framework was designed on Spark on top of the proposed iterative framework, and ISOMAP, LLE, and LE were implemented in it. Parallel processing greatly improves the execution efficiency of these algorithms. In addition, the characteristics of Spark are used to optimize the algorithms through RDD tuning and operator tuning, which further improves execution efficiency.

(3) The proposed method is validated on the Swiss roll and S-Curve datasets. The experimental results show that the manifold learning algorithms based on iterative solution can effectively map high-dimensional data to a low-dimensional space while greatly reducing algorithmic complexity. They also show that parallel solving and optimization greatly improve execution efficiency, so the approach can handle dimensionality reduction in big data environments.
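The abstract does not give the thesis's update rule, so the following is only a minimal sketch of the general idea of replacing an eigendecomposition with an iterative solution. It assumes a Laplacian Eigenmaps (LE) style objective, minimizing y^T L y subject to zero mean and unit norm, and uses projected gradient descent with a fixed step size rather than a line search. The function names, the chain-graph test data, the step size, and the iteration count are all illustrative assumptions, not taken from the thesis.

```python
import math
import random

def path_graph(n):
    """Hypothetical test data: adjacency lists [(neighbor, weight), ...] for a chain of n points."""
    return [[(j, 1.0) for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]

def laplacian_times(adj, y):
    """Return L y for the graph Laplacian L = D - W without forming the n x n matrix."""
    return [sum(w * (y[i] - y[j]) for j, w in adj[i]) for i in range(len(y))]

def project(y):
    """Enforce the constraints: zero mean (excludes the constant solution) and unit norm."""
    mean = sum(y) / len(y)
    centered = [v - mean for v in y]
    norm = math.sqrt(sum(v * v for v in centered))
    return [v / norm for v in centered]

def objective(adj, y):
    """The quantity being minimized, y^T L y."""
    return sum(a * b for a, b in zip(y, laplacian_times(adj, y)))

def iterative_embedding(adj, steps=500, step_size=0.25, seed=0):
    """One-dimensional embedding by projected gradient descent (assumed update rule).

    step_size is assumed to be at most 1 / (2 * max weighted degree), which keeps
    the iteration stable for the unnormalized Laplacian.
    """
    rng = random.Random(seed)
    y = project([rng.uniform(-1.0, 1.0) for _ in adj])
    for _ in range(steps):
        grad = laplacian_times(adj, y)  # gradient of 0.5 * y^T L y
        y = project([v - step_size * g for v, g in zip(y, grad)])
    return y

adj = path_graph(10)
y = iterative_embedding(adj)
```

On the chain graph the result orders the points along the chain, and `objective(adj, y)` approaches the smallest nonzero Laplacian eigenvalue, which an eigensolver would otherwise have to compute. Each entry of `laplacian_times` depends only on one point's neighbors, which is the kind of per-row work a distributed framework could partition across workers; how the thesis actually partitions it on Spark is not stated in the abstract.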
Keywords/Search Tags: Manifold learning, Spark, eigenvalue solution, steepest descent method, distributed computing