Font Size: a A A

Fast Algorithms For Trace-Ratio And Nonnegative Matrix Factorization Problems On Dimensionality Reduction Of High Dimensional And Large Sample Data

Posted on:2022-04-10Degree:DoctorType:Dissertation
Country:ChinaCandidate:W Y ShiFull Text:PDF
GTID:1488306533968399Subject:Computational Mathematics
Abstract/Summary:PDF Full Text Request
High dimensional data is the product of the era of big data.The data obtained from aerospace,internet of things engineering,computer vision and other fields are often high dimensional.With the rapid flow of information,the dimension and sample size of data gradually increase exponentially,which leads to a large number of uncertain factors in the collected data,such as white noise,abnormal data,missing data and so on.Therefore,how to extract effective information and rules from high-dimensional data both quickly and effectively has become a crucial challenge for researchers.Dimensionality reduction is a common technique for feature extraction,which can effectively overcome the problem of dimension disaster.There have been many fruitful results for dimensionality reduction.However,classical methods are difficult to deal with high-dimensional large sample data effectively,especially high-dimensional dense large sample data.This paper proposes a series of new fast algorithms for highdimensional large-scale dense sample data.The new algorithms can also effectively deal with large-scale sparse data,and the effectiveness of the proposed method is verified in face recognition,text classification and other practical applications.The main contributions of this paper are as follows:a)We focus on the trace-ratio problem.The trace-ratio problem is a key problem to feature extraction and dimensionality reduction to circumvent the high dimensional space.However,it has been long believed that this problem has no closed-form solution,and one has to solve it by using some inner-outer iterative algorithms which are very time consuming.Therefore,efficient algorithms for high-dimension and largesample trace-ratio problems are still lacking,especially for dense data problems.In this work,we present a closed-form solution for the trace-ratio problem,and propose two algorithms to solve it.Based on the closed-form solution and the randomized singular value decomposition(RSVD),we first propose a randomized algorithm for solving high-dimension and large-sample dense trace-ratio problems.For high-dimension and large-sample sparse trace-ratio problems,we then propose an algorithm based on the closed-form solution and solving some consistent under-determined linear systems.This algorithm can keep the sparse structure of data unchanged.Theoretical results are established to show the rationality and efficiency of the proposed methods.b)We study the graph embedding method based on principal component analysis(PCA)preprocessing.PCA plus graph embedding methods are commonly used techniques and are often used as benchmark algorithms for dimensionality reduction and face recognition.In these methods,the PCA is used first,and then some graph embedding methods are applied.In this work,we point out a shortcoming related to PCA:the instability of the widely used PCA plus graph embedding methods.That is,the classification performance of these methods may vary significantly after data perturbation or contamination.To the best of our knowledge,there are no theoretical results for this.To fill in this gap,we show the reason from a matrix perturbation analysis point of view.Moreover,so as to cure the drawback of instability,a framework of PCA plus exponential graph embedding methods is proposed.The computational cost of the proposed methods is comparable to that of their original counterparts,and our methods are much more stable.c)We focus on exponential discriminant analysis for dimensionality reduction which is a research topic in recent years.However,one has to solve some large scale matrix exponential eigenvalue problems which are the bottleneck in these types of methods.In this work,we propose a framework for fast implementation on general matrix exponential-based discriminant analysis methods.The key is to equivalently transform large scale matrix computation problems into much smaller ones.As the equivalence of the transformation,the fast algorithm do not loss recognition rates.On the other hand,it was mentioned that the exponential model is more reliable than the original one and suppresses the sensitivity to perturbations.However,the interpretation is only heuristic,and to the best of our knowledge,there is no theoretical justification for reliability and stability of the matrix discriminant analysis methods.To fill in this gap,the second contribution of our work is to provide stability analysis for the fast exponential discriminant analysis method from a theoretical point of view.It is shown that the exponential discriminant analysis methods are stable.d)Nonnegative matrix factorization(NMF)is a prominent technique for data dimensionality reduction.Alternating iterative methods are usually applied to solve this problem,which involves the computation of NMF subproblem,namely negative least squares(NLS)problem.In this paper,a framework called ARk NLS(Alternating Rankk Nonnegativity constrained Least Squares)is proposed for computing NMF.First,a recursive formula for the solution of the rank-k nonnegativity-constrained least squares(NLS)is established.This recursive formula can be used to derive the closed-form solution for the rank-k problem for any integer k ? 1.As a result,each subproblem for an alternating rank-k nonnegative least squares framework can be obtained based on this closed form solution.This paper is then focused on the framework with k = 3,which leads to a new algorithm for NMF via the closed-form solution of the rank-3 NLS problem.Furthermore,a new strategy that efficiently overcomes the potential singularity problem in rank-3 NLS within the context of NMF computation is also presented.
Keywords/Search Tags:dimensionality reduction, large scale discriminant analysis, trace-ratio problem, fast algorithm, nonnegative matrix factorization
PDF Full Text Request
Related items