Low-Rank Structure Based Learning of Data Representations  Posted on: 2013-09-05  Degree: Doctor  Type: Dissertation  Country: China  Candidate: F H Shang  Full Text: PDF  GTID: 1228330395955444  Subject: Circuits and Systems

Abstract/Summary:

In recent years, with rapid developments in computer and sensor technology, high-dimensional data such as images, video, text and web data have become increasingly ubiquitous across science and engineering. Beyond requiring extensive computational and memory resources, such data can make inference, learning and recognition infeasible, and they pose severe challenges to traditional machine learning and statistical methodology, such as the "curse of dimensionality" and the "small sample size" problem. Finding an effective representation through appropriate dimensionality reduction has therefore become an important, necessary and challenging part of data analysis, and it has attracted considerable interest from the statistical and machine learning, data mining and computer vision communities.

Effective dimensionality reduction can mitigate the curse of dimensionality and make learning tasks more reliable and more efficient. However, classical dimensionality reduction techniques have certain limitations. The traditional spectral clustering algorithm, nuclear norm minimization based algorithms for matrix recovery and completion, and nonparametric kernel learning algorithms are intractable for large-scale problems because their computational complexity can be as high as O(n^3) or O(n^6.5), where n is the number of samples. The traditional nonnegative matrix factorization (NMF) method and its existing variants do not consider the intrinsic geometrical structure of the features. And existing approaches to learning data representations from supervised information do not exploit the low-rank structure of the involved kernel matrix.
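The O(n^3) cost cited above is precisely what low-rank structure lets one avoid. A minimal Nyström-style sketch in NumPy, using uniform landmark sampling rather than the density-weighted scheme developed in the dissertation (the RBF kernel, gamma, and landmark count m are illustrative assumptions, not the dissertation's choices):

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """Gaussian (RBF) kernel between the rows of X and the rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_approximation(X, m, gamma=0.5, seed=0):
    """Approximate the n x n kernel matrix K from only m sampled columns.

    The cost is O(n*m*d + m^3), instead of forming all of K and running
    the O(n^3) eigendecomposition that spectral clustering would need.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=m, replace=False)   # uniform landmark sampling
    C = rbf_kernel(X, X[idx], gamma)             # n x m cross-kernel
    W = C[idx]                                   # m x m landmark kernel
    return C @ np.linalg.pinv(W) @ C.T           # rank-m approximation of K

# small demo: the rank-m approximation stays close to the exact kernel
X = np.random.default_rng(1).normal(size=(200, 3))
K = rbf_kernel(X, X)
K_hat = nystrom_approximation(X, m=50)
rel_err = np.linalg.norm(K - K_hat) / np.linalg.norm(K)
print(round(rel_err, 3))
```

Nyström quality depends heavily on which columns are sampled, which is why the dissertation replaces the uniform sampling above with a density-weighted, non-uniform scheme.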
To overcome these problems, this dissertation learns effective data representations by exploiting the underlying data structure, such as geometrical, sparse and low-rank structure. By further utilizing a small amount of available supervised information, such as scarce labeled data and pairwise constraints, even more effective data representations can be learned. The author's major contributions are outlined as follows:

1. An efficient two-stage spectral clustering framework with local and global consistency is proposed; it exploits the intrinsic geometrical and low-rank structures of the given data. The framework consists of two phases: (I) use a fast sampling method to obtain a small set of representative exemplars; (II) run the proposed density-weighted Nyström approximation spectral clustering algorithm to extrapolate from this small set of exemplars to the whole data set. Under this framework, local and global distances are defined that relatively elongate or shorten the distances between data points lying on different manifolds or on the same manifold, and they are highly robust against noise and outliers. A fast two-stage affinity propagation algorithm, which amounts to a non-uniform sampling technique, is then presented to produce a small number of representative exemplars. Finally, a fast density-weighted low-rank approximation spectral clustering algorithm is proposed. Encouraging experimental results show that the proposed algorithm outperforms state-of-the-art affinity propagation and spectral clustering algorithms in terms of speed, memory usage and clustering quality, and that it can handle large-scale clustering problems and produce low-dimensional spectral embeddings.

2.
A graph dual-regularized nonnegative matrix factorization framework is proposed; it exploits the geometrical and low-rank structures of the given data. The framework simultaneously considers the geometrical structure of both the data manifold and the feature manifold, each encoded by constructing a nearest-neighbor graph. Under this framework, a graph dual-regularized nonnegative matrix factorization (DNMF) model is presented, together with its iterative multiplicative updating scheme. As an extension of DNMF, a graph dual-regularized nonnegative matrix tri-factorization model is also designed. Finally, convergence proofs for the two proposed iterative multiplicative updating schemes are provided. Experiments on pattern clustering, face clustering and SAR target recognition validate the effectiveness of the two proposed algorithms and show that they learn better parts-based representations of the data than competing methods.

3. A robust low-rank matrix factorization (LRMF) framework is proposed to recover a data matrix contaminated by noise and outliers and to complete a matrix with missing entries. The framework makes full use of the sparse and low-rank structures of the given data. State-of-the-art nuclear norm minimization algorithms involve a singular value decomposition (SVD) at each iteration and therefore suffer from the high computational cost of the repeated SVDs.
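That per-iteration expense comes from the singular value thresholding step that nuclear norm solvers repeat at every iteration. A minimal sketch of the generic operator (the test matrix and the threshold tau are illustrative assumptions; this is not the dissertation's bi-factorization method, which is designed to avoid exactly this full SVD):

```python
import numpy as np

def singular_value_thresholding(Y, tau):
    """Proximal operator of tau * (nuclear norm): shrink the spectrum.

    Requires a full SVD, costing O(min(m, n) * m * n) per call; nuclear
    norm solvers pay this price at every iteration.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)          # soft-threshold singular values
    return U @ np.diag(s_shrunk) @ Vt

# demo: thresholding a noisy low-rank matrix zeroes the noise directions
rng = np.random.default_rng(0)
L = rng.normal(size=(60, 5)) @ rng.normal(size=(5, 60))  # rank-5 signal
Y = L + 0.1 * rng.normal(size=(60, 60))                  # plus dense noise
Xhat = singular_value_thresholding(Y, tau=2.0)
print(np.linalg.matrix_rank(Xhat, tol=1e-6))
```

Here the five signal singular values are far above tau while the noise-only singular values fall below it, so the thresholded matrix is exactly low-rank again.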
To resolve this difficulty, an efficient matrix bi-factorization idea is introduced into the original nuclear norm minimization models, yielding a small-scale matrix nuclear norm minimization problem. Depending on whether the given data lie on a single linear subspace or on a union of multiple subspaces, two different robust low-rank matrix factorization models are presented; they are hybrid optimization problems combining the nuclear norm with the l1-norm or the l2,1-norm. Furthermore, an efficient iterative scheme based on the alternating direction method (ADM) is developed to solve the proposed problems. Finally, the proposed LRMF approach is extended to handle observed data with missing entries.

4. A data representation learning framework based on binary classification of the entries of the kernel matrix is proposed; it makes full use of the geometrical and low-rank structures and of the given pairwise constraints. The framework takes advantage of Laplacian spectral regularization and learns an enhanced spectral embedding toward an ideal representation that is as consistent with the pairwise constraints as possible. Specifically, a symmetry-favored kNN graph is constructed; it is highly robust against noise and outliers and reflects the underlying manifold structure of the data. Under the squared loss function, learning the data representation is formulated as a semidefinite-quadratic-linear programming problem, which can be solved efficiently with standard software packages such as SDPT3. Finally, the enhanced spectral embedding is applied to semi-supervised clustering and transductive classification problems.

5. A data representation learning framework based on completing an "ideal" kernel matrix is proposed.
The framework not only considers the geometrical structure of the data but also takes advantage of given side information such as pairwise constraints. In general, the number of available pairwise constraints is far smaller than what would suffice to complete the low-rank kernel matrix with high probability. Therefore, the spectral embedding of the graph Laplacian is used as auxiliary information, so that only a small-scale positive semidefinite (PSD) matrix with nuclear norm regularization needs to be solved for. The proposed model for learning the enhanced matrix is a nuclear norm regularized least squares problem, and a modified fixed point continuation iterative algorithm with an eigenvalue thresholding operator is developed for it. A formal proof is provided that the proposed algorithm converges to an optimal solution. Moreover, the algorithm is extended to semi-supervised clustering and transductive classification problems.

6. A semi-supervised learning framework with mixed information is proposed. The framework can simultaneously handle both sparse labeled data and additional pairwise constraints, and it utilizes a large amount of unlabeled data to assist the classification learning process. Specifically, a unified semi-supervised learning model with nuclear norm regularization is presented for classification tasks. A two-stage optimization strategy is then developed, and a modified fixed point continuation algorithm is proposed to learn low-rank kernel matrices. Moreover, the global convergence of the proposed algorithm is analyzed, and the Barzilai-Borwein technique is incorporated to accelerate its convergence. Finally, a semi-supervised classification approach with an enhanced spectral kernel is presented, along with an effective method that extends the proposed approach to out-of-sample data.
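Several of the contributions above (1, 2, 4 and 5) build on a graph Laplacian constructed from a nearest-neighbor graph, whose bottom eigenvectors give the spectral embedding. A minimal sketch using a plain symmetrized kNN graph with Gaussian edge weights, as a stand-in for (not a reproduction of) the symmetry-favored kNN graph described in contribution 4 (k and gamma are illustrative choices):

```python
import numpy as np

def knn_graph_laplacian(X, k=5, gamma=1.0):
    """Normalized graph Laplacian L = I - D^{-1/2} W D^{-1/2}
    of a symmetrized kNN graph with Gaussian edge weights."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]        # k nearest, excluding self
        W[i, nbrs] = np.exp(-gamma * d2[i, nbrs])
    W = np.maximum(W, W.T)                       # symmetrize the graph
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.eye(n) - D_inv_sqrt @ W @ D_inv_sqrt

# demo: the spectral embedding is the eigenvectors of the smallest eigenvalues
X = np.random.default_rng(2).normal(size=(40, 3))
L = knn_graph_laplacian(X)
evals, evecs = np.linalg.eigh(L)
embedding = evecs[:, :2]                         # 2-D spectral embedding
print(embedding.shape)                           # -> (40, 2)
```

The supervised variants in contributions 4-6 then push this embedding toward consistency with the given pairwise constraints.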
Keywords/Search Tags: Local and global consistency, Graph Laplacian regularization, Nonnegative matrix factorization, Nuclear norm, l1 or l2,1 norm, Matrix recovery and completion, Mixed information, Low-rank spectral kernel
 
