In the era of big data,people need to analyze all kinds of complex data in the fields of biology,meteorology,transportation,economy,medicine and so on.They have presented a lot of features such as large scale,high dimension,high complexity and hard real-time,resulting in the ”Curse of Dimensionality”,and ”data explosion but knowledge poverty”,and many other phenomena,which have brought many practical challenges to data analysis.As an effective data analysis technique,data dimensionality reduction can reveal valuable information hidden in the high dimensional data,providing an important technical approach for analyzing the internal structure of the high dimensional data.In recent years,non-negative matrix factorization and manifold learning,which are typical traditional dimensionality reduction methods,have been widely researched and applied.However,they are faced with many problems when dealing with large-scale complex high dimensional data,such as high computational complexity,large storage overhead,low learning accuracy,etc.Oriented to large-scale batch and incremental datasets,this thesis studies the dimensionality reduction method combining non-negative matrix factorization and manifold learning respectively,from mathematical theoretical model,software parallel computing and hardware accelerated implementation.Meanwhile,by taking various optimization strategies,this thesis proposes parallel batch algorithm,online algorithm and hardware acceleration for non-negative matrix factorization with manifold regularization.Specifically,the main work and innovations of this thesis are as follows:1.Proposing parallel non-negative matrix factorization with manifold regularization method(Chapter 2)In this thesis,we parallelize non-negative matrix factorization with manifold regularization method for large-scale batch datasets,establish the theoretical model PNMF-M of the proposed parallel method,and present a load-balanced data partition strategy,by which we define the objective function of parallel solving model and derive the update rules for factor matrices.According to the different definition of neighborhood patterns in manifold construction,we propose parallel manifold construction methods respectively.A multi-process parallel algorithm based on distributed cluster system is designed as well.Experimental results show that the PNMF-M algorithm greatly improves the learning accuracy and scalability,with less computational and communication overhead.2.Proposing online non-negative matrix factorization with manifold regularization method(Chapter 3)We introduce the manifold learning method into the incremental non-negative matrix factorization,and propose an online non-negative matrix factorization with manifold regularization model for large-scale incremental datasets,called INMF-M.By adding manifold regularization item,we use the recursive method to reconstruct the objective function of the model,give the update rules for factor matrices in detail,and analyze the scalability.In order to further improve the scalability,we propose a buffering optimization strategy and design an online non-negative matrix factorization with manifold regularization using buffer algorithm,called INMF-MB,and analyze its complexity.Experimental results show that this algorithm greatly improves the learning accuracy and scalability.3.Proposing online non-negative matrix factorization with manifold regularization and feature selection method(Chapter 4)A lot of high dimensional data in practical applications often contain some noisy,irrelevant or redundant attributes.Thus,we introduce the feature selection technique into online non-negative matrix factorization with manifold regularization method,and propose an online non-negative matrix factorization with manifold regularization and feature selection model,called INMF-M.By adding feature selection matrix,we redefine the objective function of the model.Then we take the feature selection matrix as one of the factor matrices,to re-derive the update rules for all the factor matrices.For the scalability problem in the FS-INMF-M,we present an random projection tree optimization strategy and design an online non-negative matrix factorization with manifold regularization and feature selection using random projection tree algorithm,called FS-INMF-MT,and analyze the complexity of the algorithm.Experimental results verify the feasibility and effectiveness of the proposed algorithm.4.Proposing hardware acceleration technique for non-negative matrix factorization with manifold regularization(Chapter 5)Due to the high real-time performance of many practical applications,just using traditional computing platform to reduce the data dimension is difficult to meet the actual needs.In view of the online non-negative matrix factorization with manifold regularization method,we study the feasibility of hardware acceleration.Focusing on analyzing the data and operation characteristics of the core code in the INMFMB algorithm,we give the specific optimization measure and the modular design,and propose two different hardware acceleration methods,which are based on InterFeature Parallel and Intra-Feature Parallel respectively,with detailed design scheme of data storage structures and functional modules described.We realize the design scheme based on FPGA,and Experimental results show that our design can effectively accelerate the execution of the INMF-MB algorithm,and get considerable speedup. |