
Geometric Perspective On Linear Dimensionality Reduction Algorithms

Posted on: 2015-06-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J R He
Full Text: PDF
GTID: 1228330467464389
Subject: Computer software and theory
Abstract/Summary:
Data is one of the most valuable resources of the new digital economy and has been highly valued in academia, business, and government. The core issue of data science is how to extract valuable information from high-dimensional, large-scale data sets. Dimensionality reduction is an important technique for high-dimensional data analysis: it lets us visualize data in 2D or 3D space, uncover meaningful latent structure, reduce computational complexity, and obtain data representations better suited to classification or clustering. On this subject, this thesis offers several geometric perspectives on linear dimensionality reduction algorithms. A high-dimensional data set can be viewed as a set of points distributed in a high-dimensional Euclidean space, with the relationships between points represented by Euclidean distance or another similarity measure. The geometric structure and discriminant structure of the data can be modeled as a graph, so that dimensionality reduction can be cast as graph embedding. The main work of this thesis includes:

(1) The geometric and statistical properties of high-dimensional data spaces are reviewed, and the impact of the distance measure between sample points on the performance of the KNN and K-means algorithms is examined through experimental analysis. The results show that fractional-norm distance measures can reduce the "concentration of measure" effect and thereby improve the performance of KNN and K-means.

(2) An intrinsic dimensionality estimation method for high-dimensional data, based on the manifold assumption, is proposed. A high-dimensional data set can be viewed as lying on a low-dimensional manifold, so the manifold's dimensionality can be approximated from the local geometric relationships among the high-dimensional data points.
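The fractional-norm observation in (1) can be sketched with a small experiment (a minimal illustration under common assumptions, not the thesis's code): a fractional Minkowski "distance" with p < 1, computed directly from its definition d_p(x, y) = (Σ_i |x_i − y_i|^p)^(1/p), tends to show a larger relative contrast between the nearest and farthest neighbor than the Euclidean distance does in high dimensions.

```python
import numpy as np

def minkowski_dist(x, y, p):
    """Minkowski-style distance; for 0 < p < 1 this is a fractional
    'distance' (no longer a true metric, but still usable by KNN/K-means)."""
    return np.sum(np.abs(np.asarray(x) - np.asarray(y)) ** p) ** (1.0 / p)

def relative_contrast(X, p):
    """(d_max - d_min) / d_min over distances from the first point to the
    rest. Smaller values indicate stronger concentration of measure."""
    d = np.array([minkowski_dist(X[0], x, p) for x in X[1:]])
    return (d.max() - d.min()) / d.min()

rng = np.random.default_rng(0)
X = rng.random((200, 100))          # 200 points in a 100-dimensional cube
print(relative_contrast(X, p=2.0))  # Euclidean: distances concentrate
print(relative_contrast(X, p=0.5))  # fractional norm: larger contrast
```

The contrast gap is what makes nearest-neighbor rankings more informative under fractional norms, which is the mechanism behind the KNN/K-means improvement reported in (1).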
Because the number of sample points in a local hyperball centered at each sample point grows with the ball's radius raised to the power of the intrinsic dimensionality, we can derive an explicit formula for the local intrinsic dimensionality in the neighborhood of each sample point, and then approximate the intrinsic dimensionality of the whole data set by varying the neighborhood parameter. Compared with traditional intrinsic dimensionality estimators such as the correlation dimension, maximum likelihood estimation, and the geodesic minimum spanning tree, the proposed method is simple and is insensitive to noise and to the choice of the neighborhood parameter.

(3) Two improved algorithms for Marginal Fisher Analysis are proposed. Marginal Fisher Analysis is a classical supervised linear dimensionality reduction method that has been widely used for pattern classification of high-dimensional data. It involves a matrix inversion that can suffer from singularity in numerical computation, especially when the number of samples is smaller than the sample dimensionality; this is the "small sample size problem". Principal Component Analysis can be used as a preprocessing step to avoid the singularity, but it may discard discriminative information in the samples. To address these limitations, we exploit the non-singularity of the matrix exponential: applying the matrix exponential to the scatter matrices in Marginal Fisher Analysis removes the singularity from the inversion step. Theoretical analysis shows that this method is equivalent to Null Space Marginal Fisher Analysis, which extracts information from the null space of the intra-class scatter matrix, so the discriminant ability of the algorithm is enhanced.
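The counting argument in (2) can be sketched as follows (a minimal illustration under the stated assumption that the neighbor count grows like r^d, not the thesis's implementation): if the k-th nearest neighbor of a point lies at distance r_k, then k ∝ r_k^d, so d ≈ log(k2/k1) / log(r_k2/r_k1) for two neighborhood sizes k1 < k2.

```python
import numpy as np

def local_intrinsic_dim(X, k1=5, k2=20):
    """Estimate intrinsic dimensionality from the growth of neighbor counts.

    Assumes the number of points in a ball of radius r around a sample
    scales like r^d, so d ~= log(k2/k1) / log(r_k2 / r_k1), where r_k is
    the distance to the k-th nearest neighbor. Returns the mean of the
    per-point local estimates.
    """
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    dists.sort(axis=1)                   # column 0 is the point itself
    r1, r2 = dists[:, k1], dists[:, k2]  # k-th nearest-neighbor radii
    return float(np.mean(np.log(k2 / k1) / np.log(r2 / r1)))

# Points on a 2-D plane linearly embedded in 10-D space:
# the estimate should land near the manifold dimension, 2.
rng = np.random.default_rng(0)
coords = rng.random((500, 2))
X = coords @ rng.random((2, 10))
print(local_intrinsic_dim(X))
```

Varying k1 and k2 plays the role of the neighborhood-parameter selection described above; averaging the per-point estimates gives the global figure.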
Regularized Marginal Fisher Analysis models local diversity information, intra-class similarity, and inter-class separation in high-dimensional data, and unifies them into a trace-difference optimization model through regularization, so that the intrinsic geometric structure of the data is preserved and the discriminant ability of the projection matrix is enhanced.

(4) A supervised linear dimensionality reduction algorithm called Marginal Discriminant Projection is proposed. In this algorithm, the two farthest samples within a class are treated as intra-class marginal sample points, and the two nearest samples from different classes as inter-class marginal sample points. The algorithm minimizes the distances between the low-dimensional representations of the intra-class marginal points and maximizes the distances between those of the inter-class marginal points, while constraining the projection directions to be orthogonal, thereby enhancing the discriminant ability of the low-dimensional representation. In the data modeling, we discuss Marginal Discriminant Projection under both the trace-difference criterion and the trace-ratio criterion. Moreover, to encode local diversity information in the samples, we propose a regularized Marginal Discriminant Projection algorithm that extracts both the discriminant structure and the geometric structure of high-dimensional data. Vectorized gray-level face images are typical high-dimensional data. To extract image features useful for recognition, the dimensionality reduction algorithms are applied for feature extraction in the experiments, and the low-dimensional feature representations are classified by the nearest neighbor method. Face recognition experiments on benchmark data sets demonstrate the effectiveness of the proposed algorithms for discriminant feature extraction.
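The trace-difference variant of the idea in (4) can be sketched as follows (a simplified reconstruction from the description above, not the author's code; all function and variable names are illustrative): collect the farthest same-class pair in each class (intra-class marginal points) and the nearest cross-class pair for each class pair (inter-class marginal points), form the two pair-difference scatter matrices, and take the top eigenvectors of their difference. Because that matrix is symmetric, the resulting projection directions are automatically orthogonal.

```python
import numpy as np

def marginal_discriminant_projection(X, y, n_components=2):
    """Sketch of a trace-difference marginal projection.

    S_w: scatter of the farthest same-class pair in each class
         (intra-class marginal points, to be pulled together).
    S_b: scatter of the nearest different-class pair for each class pair
         (inter-class marginal points, to be pushed apart).
    Maximizing tr(W^T (S_b - S_w) W) subject to W^T W = I is solved by
    the top eigenvectors of the symmetric matrix S_b - S_w.
    """
    X, y = np.asarray(X, float), np.asarray(y)
    d = X.shape[1]
    S_w, S_b = np.zeros((d, d)), np.zeros((d, d))
    classes = np.unique(y)
    # Intra-class margins: farthest pair inside each class.
    for c in classes:
        Xc = X[y == c]
        D = np.linalg.norm(Xc[:, None] - Xc[None, :], axis=-1)
        i, j = np.unravel_index(np.argmax(D), D.shape)
        diff = (Xc[i] - Xc[j])[:, None]
        S_w += diff @ diff.T
    # Inter-class margins: nearest pair between each pair of classes.
    for a in range(len(classes)):
        for b in range(a + 1, len(classes)):
            Xa, Xb = X[y == classes[a]], X[y == classes[b]]
            D = np.linalg.norm(Xa[:, None] - Xb[None, :], axis=-1)
            i, j = np.unravel_index(np.argmin(D), D.shape)
            diff = (Xa[i] - Xb[j])[:, None]
            S_b += diff @ diff.T
    # Top eigenvectors of the symmetric trace-difference matrix.
    vals, vecs = np.linalg.eigh(S_b - S_w)
    return vecs[:, ::-1][:, :n_components]  # columns are orthonormal

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 5)), rng.normal(4, 1, (30, 5))])
y = np.array([0] * 30 + [1] * 30)
W = marginal_discriminant_projection(X, y)
print(W.T @ W)  # identity: the projection directions are orthogonal
```

The trace-ratio variant mentioned above would instead maximize tr(W^T S_b W) / tr(W^T S_w W), which generally requires an iterative solver rather than a single eigendecomposition.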
Keywords/Search Tags: Dimensionality Reduction, Graph Embedding Model, Intrinsic Dimensionality Estimation, Marginal Discriminant Projection, Geometric Structure, Face Recognition