
Research On Manifold Embedding Matrix Factorization Algorithm

Posted on: 2018-08-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Li
GTID: 1368330575978865
Subject: Control Science and Engineering
Abstract/Summary:
With the rapid development of information technology and the widespread use of the Internet, the data to be processed are characterized by massive volume, rapid growth, high dimensionality, and nonlinearity. Processing such complex, massive data quickly and effectively, and extracting the information users need from it, has long been a common concern in pattern recognition and computer vision. Matrix factorization, which decomposes a high-dimensional matrix into the product of two or more matrices, has become a new focus in machine learning as a dimensionality reduction method. From the perspective of manifold learning, this dissertation explores how to effectively extract the valuable information of data embedded in a high-dimensional space. Traditional manifold learning methods have several shortcomings. They rely on simplistic graphs, cope poorly with complex sample distributions, are limited to single-layer decomposition, and either do not make full use of the geometric structure of the data or ignore the label information of labeled samples. This dissertation analyzes these problems and proposes solutions. Its major contributions are summarized as follows:

1. Global Data Nonnegative Matrix Factorization (GDNMF) is proposed. Traditional graph-based methods encode the structure of the data with a K-nearest-neighbor graph, which cannot fully capture the geometric structure when the samples are distributed in a complex way. Moreover, the optimal value of K differs from dataset to dataset, so selecting an appropriate K is a persistent difficulty for graph learning methods. GDNMF keeps the topological relations between each sample and all other points unchanged in the low-dimensional space. In addition, to learn a better data representation and reduce redundancy, it requires the bases to be as orthogonal as possible. As a result, the structure of the data in the new representation space is consistent with that of the original data. GDNMF is evaluated on the ORL, USPS, and OUTEX datasets.

2. Structured Discriminative Nonnegative Matrix Factorization (SDNMF) is proposed for hyperspectral unmixing. SDNMF preserves the structural information of hyperspectral data through structured discriminative regularization terms that model both the local affinity and the distant repulsion of the observed spectral responses. It thereby exploits local affinity to guarantee that similar raw data have similar abundances, while ensuring that dissimilar data have different estimated abundances. However, because of the low resolution of the spectral imager and the complex distribution of hyperspectral data, SDNMF, which uses only the distances between pixels as its evaluation criterion, is inaccurate in determining endmembers, especially for materials distributed at junctions. We therefore further propose Global centralized and Structured discriminative Nonnegative Matrix Factorization (GSNMF) for hyperspectral unmixing. GSNMF adds both the structured discriminative regularization term and a global centralized clustering term to the NMF framework: the former helps discover the underlying geometric structure of the data, and the latter the characteristics that distinguish different categories of signatures. By maintaining the global centralized clustering and the local structured discriminative regularization together, GSNMF derives a discriminative representation of the spectra, and the estimated fractional abundances agree well with the real distributions of the constituent materials. Experimental results on a synthetic dataset and on the real hyperspectral image datasets Urban and Washington DC demonstrate the effectiveness of SDNMF and GSNMF.

3. Traditional Concept Factorization (CF) may yield inferior results because it factorizes the data in a single layer. To address this issue, we propose Multilayer Concept Factorization (MCF), based on hierarchical data representation. MCF is a cascade of sub-systems that iteratively decomposes the observation matrix into a number of layers, and the feature matrix produced by this sequential decomposition benefits performance. Inspired by manifold learning, we further propose an extension of MCF, Graph regularized Multilayer Concept Factorization (GMCF), which incorporates graph Laplacian regularization in each layer to preserve the manifold structure of the data. In general, multilayer matrix factorization methods outperform their single-layer counterparts. Experimental results on the TDT2 corpus, COIL-20, and NJUrobt datasets demonstrate that the proposed multilayer methods, MCF and GMCF, effectively improve clustering accuracy and normalized mutual information.

4. Manifold learning methods based on simple graph models ignore the high-order relationships among data points. To overcome this shortcoming, we propose a novel algorithm, Hyper-graph Regularized Concept Factorization (HRCF). HRCF captures high-order relationships among samples by constructing hyperedges, each of which joins a subset of data points sharing similar attributes, and preserves these relationships in the manifold structure by adding a hyper-graph regularization term to the CF framework. To take the label information of the data into account, we further propose Hyper-graph regularized Constrained Concept Factorization (HCCF). HCCF not only extracts the multi-geometry information of the samples through an undirected weighted hyper-graph Laplacian regularization term, but also uses the labels of the labeled samples as hard constraints so that label consistency is preserved in the low-dimensional space. Experimental results on the Reuters corpus, MNIST, and OUTEX datasets show that the hyper-graph-based HRCF and HCCF outperform the compared methods in data representation and clustering performance.
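For reference, the conventional K-nearest-neighbor graph that GDNMF is contrasted with in contribution 1 can be sketched below. The sketch also includes the widely used graph-regularized NMF (GNMF) multiplicative updates that consume such a graph. It is a minimal illustration that assumes NumPy, a symmetric 0-1 weighted graph, and one sample per column. It is not GDNMF itself, whose global-structure and orthogonality terms are not specified in this abstract. The names `knn_graph` and `gnmf` are hypothetical.

```python
import numpy as np


def knn_graph(X, k):
    """Symmetric 0-1 k-NN graph over the columns (samples) of X (features x samples)."""
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2.0 * X.T @ X  # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)                      # a sample is not its own neighbor
    nbrs = np.argsort(dist, axis=1)[:, :k]              # indices of the k nearest samples
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(W, W.T)                           # i~j if either is a neighbor of the other


def gnmf(X, k, W, lam=1.0, iters=200, seed=0, eps=1e-10):
    """Graph-regularized NMF: min ||X - U V^T||_F^2 + lam * tr(V^T L V), with L = D - W."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k))   # basis matrix (features x k)
    V = rng.random((n, k))   # low-dimensional representation (samples x k)
    D = np.diag(W.sum(axis=1))
    for _ in range(iters):
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * W @ V) / (V @ (U.T @ U) + lam * D @ V + eps)
    return U, V
```

Changing `k` in `knn_graph` changes which pairwise relations the regularizer sees. This is the sensitivity to K that the abstract points out. One common way to push the bases toward orthogonality, as GDNMF requires, is to penalize the off-diagonal entries of U^T U. That term is not included here.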
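SDNMF and GSNMF in contribution 2 build on the linear mixing model X ≈ M A. Here X holds one spectrum per column (bands x pixels), the columns of M are endmember signatures, and each column of A holds one pixel's abundances, which should be nonnegative and sum to one. The sketch below covers only this base model. It assumes NumPy and pairs plain multiplicative NMF updates with a common trick: appending a constant row `delta` to X and M so that the sum-to-one constraint is enforced softly. The names `nmf_unmix` and `delta` are hypothetical.

```python
import numpy as np


def nmf_unmix(X, p, delta=15.0, iters=500, seed=0, eps=1e-10):
    """Estimate endmembers M (bands x p) and abundances A (p x pixels) with X ~ M A.

    Minimizes ||X - M A||_F^2 + delta^2 * ||1^T - 1^T A||^2 by multiplicative updates;
    a larger delta enforces the abundance sum-to-one constraint more strongly.
    """
    rng = np.random.default_rng(seed)
    bands, n = X.shape
    M = rng.random((bands, p))
    A = rng.random((p, n))
    A /= A.sum(axis=0, keepdims=True)              # start from abundances that sum to one
    Xa = np.vstack([X, delta * np.ones((1, n))])   # augmented data
    for _ in range(iters):
        Ma = np.vstack([M, delta * np.ones((1, p))])  # augmented endmembers
        A *= (Ma.T @ Xa) / (Ma.T @ Ma @ A + eps)
        M *= (X @ A.T) / (M @ A @ A.T + eps)       # the constant row does not depend on M
    return M, A
```

SDNMF's local-affinity and distant-repulsion regularizers and GSNMF's global centralized clustering term would be added to this objective. They are omitted because the abstract does not give their exact form.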
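Concept factorization, the basis of contribution 3, writes each basis vector as a nonnegative combination of the samples, X ≈ X W V^T. Its updates therefore need only the Gram matrix K = X^T X. The sketch below assumes NumPy, and all function names are hypothetical. It uses the standard CF multiplicative updates, with an optional graph-Laplacian term lam * tr(V^T (D - S) V) in the V update as in locally consistent CF. The multilayer part is only one plausible reading of the cascade described above, in which each layer factorizes the previous layer's representation V^T. The dissertation's exact MCF and GMCF formulations may differ.

```python
import numpy as np


def concept_factorization(X, k, S=None, lam=0.0, iters=300, seed=0, eps=1e-10):
    """X (m x n, nonnegative) ~ X W V^T with W, V (n x k) nonnegative.

    If an affinity matrix S (n x n) and lam > 0 are given, lam * tr(V^T (D - S) V)
    is added to the objective, with D the diagonal degree matrix of S.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    K = X.T @ X                   # CF only needs inner products between samples
    W = rng.random((n, k))        # combination weights: basis = X @ W
    V = rng.random((n, k))        # new representation of the n samples
    if S is None:
        S = np.zeros((n, n))
    D = np.diag(S.sum(axis=1))
    for _ in range(iters):
        W *= (K @ V) / (K @ W @ (V.T @ V) + eps)
        V *= (K @ W + lam * S @ V) / (V @ (W.T @ K @ W) + lam * D @ V + eps)
    return W, V


def multilayer_cf(X, layer_sizes, S=None, lam=0.0, iters=300, seed=0):
    """Hypothetical cascade: layer i factorizes V^T from layer i - 1 (X for layer 1).

    Passing S and lam > 0 applies the same graph regularizer in every layer.
    """
    Z, factors = X, []
    for i, k in enumerate(layer_sizes):
        W, V = concept_factorization(Z, k, S=S, lam=lam, iters=iters, seed=seed + i)
        factors.append((W, V))
        Z = V.T                   # k x n representation feeds the next layer
    return factors                # factors[-1][1] is the deepest representation
```

With `lam=0` the sketch reduces to plain CF and its cascade. The sample-level affinity S can stay the same across layers because every layer keeps one column per sample.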
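Clustering results for the multilayer methods are reported as accuracy and normalized mutual information (NMI). A minimal sketch of NMI computed from the contingency table of ground-truth and predicted labels follows. It assumes NumPy and the geometric-mean normalization; conventions vary, and some papers divide by the larger entropy instead. The function name `nmi` is hypothetical.

```python
import numpy as np


def nmi(labels_true, labels_pred):
    """NMI = I(T; P) / sqrt(H(T) * H(P)), using natural logarithms."""
    _, t = np.unique(labels_true, return_inverse=True)
    _, p = np.unique(labels_pred, return_inverse=True)
    joint = np.zeros((t.max() + 1, p.max() + 1))
    np.add.at(joint, (t, p), 1.0)                # contingency table
    joint /= joint.sum()                         # joint distribution P(T, P)
    pt, pp = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    mi = np.sum(joint[nz] * np.log(joint[nz] / np.outer(pt, pp)[nz]))
    ht = -np.sum(pt * np.log(pt))
    hp = -np.sum(pp * np.log(pp))
    if ht == 0.0 or hp == 0.0:                   # a single cluster on either side
        return 1.0 if ht == hp else 0.0
    return float(mi / np.sqrt(ht * hp))
```

Clustering accuracy additionally requires the best one-to-one mapping between predicted clusters and classes. That mapping is typically found with the Hungarian algorithm, for example `scipy.optimize.linear_sum_assignment`.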
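HRCF and HCCF in contribution 4 rely on a hypergraph, in which one hyperedge can join several samples. A common construction is sketched below under stated assumptions: NumPy, one hyperedge per sample containing that sample and its k nearest neighbors, and heat-kernel hyperedge weights with a hypothetical width `sigma`. It builds the incidence matrix H and the unnormalized hypergraph Laplacian L = D_v - H W_e D_e^{-1} H^T. The dissertation's exact hyperedge definition and weighting may differ, and `hypergraph_laplacian` is a hypothetical name.

```python
import numpy as np


def hypergraph_laplacian(X, k=3, sigma=1.0):
    """Unnormalized hypergraph Laplacian over the columns (samples) of X.

    Returns (L, S) with S = H W_e D_e^{-1} H^T and L = D_v - S.
    """
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    dist = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X.T @ X, 0.0)  # squared distances
    H = np.zeros((n, n))      # H[v, e] = 1 if sample v belongs to hyperedge e
    w = np.zeros(n)           # hyperedge weights W_e
    for e in range(n):
        d = dist[e].copy()
        d[e] = -np.inf        # make sure the center sample is in its own hyperedge
        members = np.argsort(d)[: k + 1]
        H[members, e] = 1.0
        w[e] = np.exp(-dist[e, members] / sigma ** 2).sum()
    De_inv = np.diag(1.0 / H.sum(axis=0))     # inverse hyperedge degrees
    S = H @ np.diag(w) @ De_inv @ H.T         # pairwise affinity induced by the hyperedges
    Dv = np.diag(S.sum(axis=1))               # vertex degrees (equal to H @ w)
    return Dv - S, S
```

Because L has the same form D - S as an ordinary graph Laplacian, the penalty tr(V^T L V) can replace a simple-graph regularizer in a CF objective. S and D_v then play the roles of the affinity and degree matrices in the multiplicative update for V.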
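HCCF treats the labels of the labeled samples as hard constraints. One standard way to do this, used in constrained concept factorization, is to write the representation as V = A Z, where the label matrix A gives all samples of one class the same row. The sketch below builds A and runs multiplicative updates derived from the gradient of the CF objective with V = A Z. It assumes NumPy and that the labeled samples are placed first, and the function names are hypothetical. The hypergraph term of HCCF is omitted.

```python
import numpy as np


def label_constraint_matrix(labels, n):
    """A = [[C, 0], [0, I]] for n samples whose first len(labels) are labeled.

    labels: integer class ids 0..c-1 for the labeled samples (placed first).
    """
    labels = np.asarray(labels)
    l, c = len(labels), int(labels.max()) + 1
    A = np.zeros((n, c + n - l))
    A[np.arange(l), labels] = 1.0   # C: one-hot class indicator for labeled samples
    A[l:, c:] = np.eye(n - l)       # each unlabeled sample keeps its own free row
    return A


def constrained_cf(X, A, k, iters=300, seed=0, eps=1e-10):
    """X ~ X W (A Z)^T; V = A Z gives same-label samples identical representations."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    K = X.T @ X
    W = rng.random((n, k))
    Z = rng.random((A.shape[1], k))
    for _ in range(iters):
        V = A @ Z
        W *= (K @ V) / (K @ W @ (V.T @ V) + eps)
        Z *= (A.T @ K @ W) / (A.T @ A @ Z @ (W.T @ K @ W) + eps)
    return W, A @ Z
```

Since rows of A are identical for samples of the same class, the corresponding rows of V are identical by construction, whatever Z the updates reach.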
Keywords/Search Tags: Manifold Embedding, Nonnegative matrix factorization, Concept factorization, Orthogonal, Hyperspectral unmixing, Structured discriminative, Clustering, Multilayer factorization, Graph Laplacian, Hyper-graph, Semi-supervised, Hard constraint