Research On Multi-manifolds Learning Algorithms In Spark Environment

Posted on: 2019-01-27  Degree: Master  Type: Thesis
Country: China  Candidate: Z Y Cao  Full Text: PDF
GTID: 2428330548495252  Subject: Computer application technology
Abstract/Summary:
The rapid growth of high-dimensional image datasets containing redundant information has made such data increasingly difficult to use, so new dimensionality reduction methods are urgently needed. Since manifold learning algorithms were published in the journal Science in 2000, manifold learning has become a hot topic in machine learning, and it is significant both in theory and in practice. After many years of research, manifold learning has evolved from traditional manifold structures to multi-manifold structures, and from unsupervised manifold learning to supervised, semi-supervised, incremental, and multi-manifold learning algorithms. Manifold learning has important application value in data visualization, pattern recognition, file retrieval, and biometrics. However, as science and technology develop, the size of high-dimensional data keeps increasing, and traditional manifold learning algorithms have become inadequate. In recent years, popular parallel data processing technology has provided a new way for manifold learning algorithms to handle massive, high-dimensional data. In this thesis, the LLE (locally linear embedding) algorithm for multiple manifolds is studied, and the Spark parallel programming framework is used to parallelize the multi-manifold learning algorithm. The main contributions are as follows: (1) An improved MM-LLE algorithm (IMM-LLE) is proposed. To address the shortcomings of the MM-LLE algorithm, the IMM-LLE framework establishes a local low-dimensional embedding between any two manifolds, and an embedding method and a classification method for out-of-sample points are designed. An adaptive method is proposed for finding the optimal embedding dimension. This method first extracts a validation set from the training set, then uses the out-of-sample learning method to embed the validation set into the low-dimensional manifold, and takes the dimensions with the highest classification accuracy as candidate best dimensions. The best dimension is then selected from these candidates as the one that maximizes the ratio of the inter-manifold distance to the intra-manifold density. (2) A Spark-based parallel IMM-LLE learning algorithm (PIMM-LLE) is proposed. To adapt manifold learning to big data environments, a parallel learning framework is introduced. Building on IMM-LLE, the Spark programming framework is used to parallelize the k-nearest-neighbor search, the cost matrix construction, and the eigenvector extraction. Results on different datasets and with different numbers of computing nodes show the superiority of the Spark-based parallel IMM-LLE learning algorithm.
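The two-stage dimension selection described in contribution (1) can be illustrated with a minimal Python sketch. It is not the thesis implementation. The function name `select_dimension`, its parameters, and the tolerance `tol` are hypothetical. The abstract does not say how the inter-manifold distance or the intra-manifold density is computed, so both are assumed to be precomputed, positive values supplied per candidate dimension.

```python
from typing import Dict


def select_dimension(
    accuracy: Dict[int, float],
    inter_distance: Dict[int, float],
    intra_density: Dict[int, float],
    tol: float = 1e-9,
) -> int:
    """Pick an embedding dimension in two stages (illustrative sketch only).

    accuracy:       validation classification accuracy per dimension
    inter_distance: assumed precomputed inter-manifold distance per dimension
    intra_density:  assumed precomputed intra-manifold density per dimension
    """
    # Stage 1: keep every dimension whose validation accuracy ties the best.
    best_accuracy = max(accuracy.values())
    candidates = [d for d, a in accuracy.items() if best_accuracy - a <= tol]

    # Stage 2: among the candidates, maximize inter-distance / intra-density.
    for d in candidates:
        if intra_density[d] <= 0:
            raise ValueError(f"intra_density must be positive for dimension {d}")
    return max(candidates, key=lambda d: inter_distance[d] / intra_density[d])


# Made-up numbers, purely for illustration.
accuracy = {2: 0.90, 3: 0.95, 4: 0.95, 5: 0.93}
inter_distance = {2: 1.0, 3: 2.0, 4: 3.0, 5: 4.0}
intra_density = {2: 1.0, 3: 1.0, 4: 2.0, 5: 1.0}
chosen = select_dimension(accuracy, inter_distance, intra_density)
```

With these made-up inputs, dimensions 3 and 4 tie on accuracy, and the ratio criterion (2.0 versus 1.5) breaks the tie in favor of dimension 3.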
Keywords/Search Tags: Manifold Learning, Multi-Manifolds Learning, Parallel Multi-Manifolds Learning, LLE