
Research on Robust Manifold Learning

Posted on: 2012-04-16
Degree: Master
Type: Thesis
Country: China
Candidate: Z B Dai
Full Text: PDF
GTID: 2218330341451937
Subject: Computer application technology
Abstract/Summary:
In the information age, much data is represented in high-dimensional form, producing large volumes of high-dimensional data. On the one hand, high-dimensional information provides a detailed description of the objective world and brings great convenience. On the other hand, it also raises unprecedented problems: the information is too complex for the useful content to be found within the vast amount available, and, more seriously, existing machine learning and data mining algorithms can hardly process it effectively, which leads to the "curse of dimensionality". How to reduce the dimensionality of data while uncovering its essential information and internal rules is therefore a focal point of current data mining and machine learning research, and dimensionality reduction is a hot topic.

Current dimensionality reduction methods can be divided into two types. The first is linear dimensionality reduction, with representative algorithms such as Principal Component Analysis (PCA), Multidimensional Scaling (MDS), and Non-negative Matrix Factorization (NMF). These methods can find the linear structure of data and have the advantages of simplicity, easy interpretation, and scalability. The second type is nonlinear dimensionality reduction, also called manifold learning. The main methods include Isometric Mapping (Isomap), Locally Linear Embedding (LLE), Laplacian Eigenmap (LE), and Local Tangent Space Alignment (LTSA). These methods can find nonlinear relationships in the data, and have the advantages of few parameters, fast computation, and ease of finding a globally optimal solution. Since many real-world high-dimensional data sets are distributed, exactly or approximately, on nonlinear manifolds, manifold learning algorithms have in recent years been widely applied to images, video, and many other fields.
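As a minimal illustration of the linear case discussed above, PCA can be sketched in a few lines of NumPy. This is a hedged sketch for illustration only, not the implementation used in the thesis; the toy data (a noisy 1-D line embedded in 3-D) are invented:

```python
import numpy as np

def pca(X, d):
    """Project X (n_samples x n_features) onto its top-d principal components."""
    Xc = X - X.mean(axis=0)                              # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)    # right singular vectors = PC directions
    return Xc @ Vt[:d].T                                 # coordinates in the d-dim subspace

# Toy data: points near a 1-D line embedded in 3-D space
rng = np.random.default_rng(0)
t = rng.normal(size=(100, 1))
X = np.hstack([t, 2 * t, -t]) + 0.01 * rng.normal(size=(100, 3))
Y = pca(X, 1)    # recovers the underlying 1-D parameter (up to sign and scale)
```

Because the data here are (almost) linear, PCA succeeds; the point of the manifold learning methods named above is precisely that such a linear projection fails when the data lie on a curved manifold.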
At the same time, research on manifold learning has made great progress. As a hot direction within manifold learning, semi-supervised manifold learning provides a good platform for applying manifold learning algorithms in practice. To improve the effectiveness of manifold learning, researchers have recently combined it with semi-supervised machine learning methods; the resulting semi-supervised manifold learning algorithms can fully exploit the essential information in high-dimensional data. These algorithms use prior information to infer the essential structure of the test data, effectively improve the recognition rate of classifiers, and show good application prospects. The main representative algorithms are Semi-Supervised Locally Linear Embedding (SSLLE), Semi-Supervised Local Tangent Space Alignment (SSLTSA), Semi-Supervised Laplacian Eigenmap (SSLE), and so on.

However, an unavoidable problem in applying manifold learning algorithms is their sensitivity to outliers. Both unsupervised and semi-supervised manifold learning algorithms lack robustness to outliers. This is mainly because these algorithms share a common scheme: find the local geometry around each data point, then use the collected local geometric information to nonlinearly map the manifold to a lower-dimensional space. In the presence of outliers, the local relationships between data points cannot be constructed accurately and thus fail to reflect the local characteristics of the manifold, so the algorithms cannot find the essential structure of the data. Because each manifold learning algorithm is based on its own mathematical model, it is hard to find a unified approach to improving their robustness.
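The common scheme described above (build a local neighborhood graph, then map it nonlinearly via an eigenproblem) can be sketched with a minimal Laplacian Eigenmap in NumPy. This is an illustrative sketch under simplifying assumptions (binary k-NN weights instead of heat-kernel weights, an unnormalized Laplacian, invented toy data), not the thesis's code:

```python
import numpy as np

def laplacian_eigenmap(X, k, d):
    """Minimal Laplacian Eigenmap: k-NN graph -> graph Laplacian -> bottom eigenvectors."""
    n = len(X)
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D2[i])[1:k + 1]                 # k nearest neighbors (skip self)
        W[i, nbrs] = 1.0                                  # binary adjacency weights
    W = np.maximum(W, W.T)                                # symmetrize the graph
    L = np.diag(W.sum(axis=1)) - W                        # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)
    return vecs[:, 1:d + 1]                               # skip the constant 0-eigenvector

# Toy data: points along a line (a trivial 1-D manifold)
X = np.linspace(0.0, 1.0, 30)[:, None]
Y = laplacian_eigenmap(X, k=2, d=1)
```

The sensitivity to outliers is visible directly in this sketch: an outlier's row of `W` connects it to points that are not its true manifold neighbors, and those corrupted edges propagate into the Laplacian matrix `L` and hence into the embedding.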
We can only make appropriate improvements based on the characteristics of each algorithm. Accordingly, the contributions of this thesis are as follows:

1) For the robustness problem of unsupervised manifold learning algorithms, this thesis presents a Robust Laplacian Eigenmap (RLE). RLE projects each outlier and its neighbors into a low-dimensional tangent space using a robust PCA method. In that tangent space, RLE constructs the weighted graph connecting the outlier and its neighbors, which reflects the intrinsic local geometry around the outlier. In this way, RLE reduces the impact of outliers on the Laplacian matrix.

2) For the robustness problem of semi-supervised manifold learning algorithms, this thesis presents a Robust Semi-Supervised Locally Linear Embedding (RSSLLE). RSSLLE improves robustness against outliers in two ways. On the clean data set, SSLLE is applied to obtain the low-dimensional results, avoiding the influence of the outliers. On the outlier set, the local reconstruction weights of the outliers are computed from local projection coordinates, which reflect the intrinsic local geometry of the manifold; the clean data points are then treated as training points, and the low-dimensional coordinates of the outliers are computed by SSLLE.

Finally, this thesis gives plentiful synthetic and real-world examples showing that the improved algorithms are robust against outliers.
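The tangent-space projection idea underlying RLE can be illustrated in simplified form. The sketch below uses plain PCA on the neighbors to estimate the tangent space, whereas the thesis uses a robust PCA method; the toy data (neighbors on a plane, an outlier far off it) are assumptions for illustration:

```python
import numpy as np

def project_to_tangent(point, neighbors, d):
    """Project `point` onto the d-dim tangent space estimated from `neighbors` by PCA."""
    mu = neighbors.mean(axis=0)
    _, _, Vt = np.linalg.svd(neighbors - mu, full_matrices=False)
    T = Vt[:d]                                  # orthonormal basis of the tangent space
    return mu + (point - mu) @ T.T @ T          # drop the off-manifold component

# Toy data: neighbors lie on the z = 0 plane; the outlier sits far off the manifold
rng = np.random.default_rng(1)
neighbors = np.hstack([rng.normal(size=(20, 2)), np.zeros((20, 1))])
outlier = np.array([0.3, -0.2, 5.0])
proj = project_to_tangent(outlier, neighbors, d=2)   # pulled back onto the plane
```

Projecting the outlier onto the local tangent space removes its off-manifold deviation, so the graph weights built from the projected coordinates reflect the local geometry of the manifold rather than the corruption.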
Keywords/Search Tags: Manifold Learning, Semi-Supervised Machine Learning, Robust, Outlier, Local Tangent Space