The Application Of Manifold Learning In Data Dimensionality Reduction

Posted on: 2016-03-03
Degree: Master
Type: Thesis
Country: China
Candidate: S W Chen
Full Text: PDF
GTID: 2308330464967291
Subject: Control Science and Engineering
Abstract/Summary:
With the rapid development of information technology, more and more data is high-dimensional and nonlinearly distributed. Dimensionality reduction makes it possible to mine the underlying nature of such data, and it has gradually attracted the attention of scholars. General dimensionality reduction methods rarely consider the geometric structure of the data, whereas manifold learning can find the low-dimensional manifold structure hidden in high-dimensional data. It is therefore widely used in data visualization, pattern recognition, image processing, and image or text information retrieval. After reviewing and summarizing the relevant literature at home and abroad, this paper studies manifold learning and its application to data dimensionality reduction. The main work is as follows:

1. This paper describes a variety of classic manifold learning methods, in particular the isometric mapping (ISOMAP) algorithm, and summarizes and compares these algorithms. Taking two data sets as examples, the results directly illustrate the effect of dimensionality reduction by manifold learning.

2. Starting from manifold learning based on geodesic distance, the paper describes the theory of geodesic distance. To address the limitations of the ISOMAP algorithm, such as being unsupervised and having no explicit mapping function, it proposes an improved algorithm called Vector Quantization Landmark Points for Supervised Isometric Mapping with Explicit Mapping (SE-VQ-ISOMAP). The algorithm introduces category information and adds landmark points for iterative optimization when processing the distance matrix. Finally, the algorithm obtains an explicit mapping function using RBF functions as the basis. The experimental results show that the algorithm is fast and stable, and that it achieves a much higher recognition rate than the traditional ISOMAP algorithm and its improved versions.

3. Using semi-supervised manifold learning, the paper proposes Semi-Supervised Kernel Discriminant Analysis (SS-KDA), Semi-Supervised Discriminative Orthogonal Neighborhood Preserving Projection (SDONNP), and Regularized Semi-Supervised Isometric Mapping (Reg-SS-ISOMAP). SS-KDA uses labeled data to maximize the separation between different classes and unlabeled data to estimate the intrinsic geometric structure of the data, which enhances the effect of dimensionality reduction. SDONNP retains the orthogonality property of ONPP and takes into account both intraclass and interclass geometries, as well as the neighborhood information of interclass relationships. Reg-SS-ISOMAP first constructs a K-CG graph using the labeled samples among the training samples, and obtains the approximate geodesic distances between samples, which serve as feature vectors in place of the original data points. The algorithm then takes the geodesic distance as the kernel and processes the feature vectors with a semi-supervised regularization method instead of the MDS algorithm. Finally, the algorithm constructs the objective function from a regularized regression model and obtains an explicit mapping to the low-dimensional representation. The experimental results show that the dimensionality reduction of the algorithm is stable and has a high recognition rate, which demonstrates the effectiveness of the algorithm.

4. To handle the complex multi-manifold characteristics of real data sets, the paper proposes Multi-Manifold Isometric Mapping (Multi-ISOMAP). Multi-ISOMAP first uses a neighborhood graph construction method and a geodesic distance calculation method that both apply to multi-manifold data. Sammon mapping is then used to preserve the shortest paths. Finally, test samples or new samples can be judged according to the similarities between neighboring local tangent spaces. The experimental results show that the algorithm applies well to multi-manifold data sets and demonstrates good generalization ability.
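
For orientation, the classic ISOMAP baseline that the proposed variants build on can be sketched as follows. This is a minimal illustration of the standard unsupervised algorithm (k-nearest-neighbor graph, graph shortest paths as approximate geodesic distances, then classical MDS), not an implementation of SE-VQ-ISOMAP, Reg-SS-ISOMAP, or Multi-ISOMAP. The function name `isomap` and the parameters `n_neighbors` and `n_components` are assumptions chosen for this example; the sketch also assumes the input contains no duplicate points.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import cdist


def isomap(X, n_neighbors=8, n_components=2):
    """Classic ISOMAP sketch (hypothetical helper, not the thesis code)."""
    n = X.shape[0]
    D = cdist(X, X)  # pairwise Euclidean distances

    # Step 1: k-nearest-neighbor graph. np.inf marks "no edge", which
    # scipy's dense graph routines treat as a missing edge.
    G = np.full((n, n), np.inf)
    nbrs = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]  # skip self (column 0)
    rows = np.repeat(np.arange(n), n_neighbors)
    G[rows, nbrs.ravel()] = D[rows, nbrs.ravel()]
    G = np.minimum(G, G.T)  # make the graph symmetric

    # Step 2: approximate geodesic distances by graph shortest paths (Dijkstra).
    geo = shortest_path(G, method="D", directed=False)
    if np.isinf(geo).any():
        raise ValueError("neighborhood graph is disconnected; increase n_neighbors")

    # Step 3: classical MDS on the geodesic distance matrix.
    J = np.eye(n) - np.full((n, n), 1.0 / n)  # centering matrix
    B = -0.5 * J @ (geo ** 2) @ J             # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)      # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale            # rows are embedded points
```

Calling `isomap(X, n_neighbors=4, n_components=2)` on an array `X` of shape `(n_samples, n_features)` returns an `(n_samples, 2)` embedding. Note that this baseline uses no label information and yields no explicit mapping for new samples, which are the limitations items 2 and 3 set out to address.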
Keywords/Search Tags: manifold learning, data dimensionality reduction, geodesic distance, semi-supervised method, multi-manifold method