Font Size: a A A

Isometry and convexity in dimensionality reduction

Posted on:2010-05-12Degree:Ph.DType:Thesis
University:Georgia Institute of TechnologyCandidate:Vasiloglou, Nikolaos, IIFull Text:PDF
GTID:2448390002479580Subject:Engineering
Abstract/Summary:
In this dissertation we study two current state of the art dimensionality reduction methods, Maximum Variance Unfolding (MVU) and Non-Negative Matrix Factorization (NMF). These two dimensionality reduction methods do not fit under the umbrella of Kernel PCA. MVU is cast as a Semidefinite Program, a modern convex nonlinear optimization algorithm, that offers more flexibility and power compared to KPCA. Although MVU and NMF seem to be two disconnected problems, we show that there is a connection between them. Both are special cases of a general nonlinear factorization algorithm that we developed.;Two aspects of the algorithms are of particular interest: computational complexity and interpretability. In other words computational complexity answers the question of how fast we can find the best solution of MVU/NMF for large data volumes. Since we are dealing with optimization programs, we need to find the global optimum. Global optimum is strongly connected with the convexity of the problem. Interpretability is strongly connected with local isometry1 that gives meaning in relationships between data points. Another aspect of interpretability is association of data with labeled information.;The contributions of this thesis are the following: (1) MVU is modified so that it can scale more efficient. Results are shown on 1 million speech datasets. Limitations of the method are highlighted. (2) An algorithm for fast computations for the furthest neighbors is presented for the first time in the literature. (3) Construction of optimal kernels for Kernel Density Estimation with modern convex programming is presented. For the first time we show that the Leave One Cross Validation (LOOCV) function is quasi-concave. (4) For the first time NMF is formulated as a convex optimization problem (5) An algorithm for the problem of Completely Positive Matrix Factorization is presented. (6) A hybrid algorithm of MVU and NMF the isoNMF is presented combining advantages of both methods. (7) The Isometric Separation Maps (ISM) a variation of MVU that contains classification information is presented. (8) Large scale nonlinear dimensional analysis on the TIMIT speech database is performed. (9) A general nonlinear factorization algorithm is presented based on sequential convex programming. Despite the efforts to scale the proposed methods up to 1 million data points in reasonable time, the gap between the industrial demand and the current state of the art is still orders of magnitude wide.;1Preservation of local distances.
Keywords/Search Tags:MVU, Dimensionality, Convex, NMF, Methods
Related items