Font Size: a A A

Research On Manifold Learning: Theories And Approaches

Posted on:2007-07-10Degree:DoctorType:Dissertation
Country:ChinaCandidate:J WangFull Text:PDF
GTID:1100360185459973Subject:Computational Mathematics
Abstract/Summary:PDF Full Text Request
With the progress of science, especially the development of informational industry, we enter a brand-new information age. When doing research in information age, one is inevitably confronted with large volumes of high-dimensional data, such as global climate patterns, image classification system, text clustering and gene expression. In real-world applications, observations represented as high-dimensional data or vectors can be modeled as samples lying on or close to a low-dimensional nonlinear manifold possibly with noise. Hence, data reduction especially nonlinear dimensionality reduction is an important tool of data mining, and the goal of dimension reduction is to find the low dimensional structure of the nonlinear manifold from the high dimensional data.In passed years, the problem of nonlinear dimensionality reduction has aroused a great deal of interest in many research fields including data mining, machine learning, image analysis, and computer vision. Recently, there have been advances in developing effective and efficient algorithms to perform nonlinear dimension reduction. The proposed algorithms include isometric mapping (ISOMAP), locally linear embedding (LLE) and its variations Hessian LLE and local tangent space alignment (LTSA). All these algorithms share a common characteristic: find a local geometry around each data point and then use the collected local geometric information to nonlinearly map the manifold to a lower dimensional space. The performances of these algorithms, however, are different in collecting local information within neighborhoods and in constructing a global embedding using the collected local information. For example, ISOMAP constructs a graph on the data points using the connections and Euclidean distances between neighbors in each neighborhood and estimates the global geodesic distance between all the data points in terms of the graph distance. Global low dimensional coordinates are constructed to preserve the estimated geodesic distances. LLE finds a linear combination of each points with respect to its neighbors and determines a low dimensional embedding that preserves linear combination structures. LTSA projects all neighbors of each point onto its local tangent space of the manifold and then aligns the local coordinates to determine low dimensional global coordinates. Clearly the effec-tiveness of the local geometry retrieved will determines the efficiency of the methods.LLE is a classical local nonlinear approach in manifold learning, and it has many applications such as image classification, image recognition, spectra reconstruction and data visualization because of its few parameters, rapid computation, and global optimization. However, LLE may map faraway inputs to nearby outputs in the low dimensional space which lead to a visible distortion in the embedding results. One of the curses that make LLE fail is that only one set of reconstruction weights can not reflect the local geometry of the manifold. Furthermore, the constrained least squares problem for finding the weights may do not have a unique solution. Using regularization method to solve the constrained LS problem is involved in the selection of a regularization term 7, and we can not verify that whether the computed solution is optimal. These questions remain to be answered.There are some common issues that determine the effectiveness of the manifold learning algorithms. The first step of manifold learning approaches is the selection of neighborhood, and there is a need to select an appropriate neighborhood to find local linear information. The effectiveness of the selection of the neighborhood will determine the embedding results. Clearly the smaller the neighborhood is, the more distinct the linear structure of the neighborhood will be. However, we should notice that the neighborhoods should be fully overlapping such that the faraway points are fully connected, which lead that the neighborhood can not be too small.By intuition, larger curvature tends to shrink the neighborhood, while smaller curvature gives rise to a larger neighborhood. So the main difficulty of the selection of neighborhood is that when imposing the connectivity structure on the sample points, how to adaptively select the neighborhood sizes to match the local geometry of the manifold. Furthermore, the variation in the curvature of the manifold and the sampling density of the data points will inevitably lead to a bias in the collected local structure. There is a need to take the bias into account when using the collected local structures to construct global low dimensional embedding. Hence these questions in manifold learning remain to be answered: how to estimate the curvature of the manifold, and how to account for the bias in the collected structure caused by the variation in the curvature of the manifold and the sampling density of the sample points, and how to reduce the bias in the low dimensional embedding by accounting for the bias in the collected local structure when using the col-lected local structure to construct the global low dimensional embedding. To confront these proposed problems, we give relatively consummate answers. The contributions of this paper are as follows:1. We give a detailed analysis to characterize the reconstruction weights and show that (using the regularization method) to obtain the optimal weight is not stable in numerical computation, and there exist multiple linearly independent weights which approximate to be optimal within a given accuracy.2. The proposed modified locally linear embedding using multiple weights (MLLE) use the linearly independent solutions to establish stable local linear structure and preserves the local linear structure in the low dimensional embedding. MLLE improves the stability and effectiveness of LLE.3. We show that MLLE can retrieve the ideal embedding for data points sampled from an isometric manifold, and we give the detailed comparisons between MLLE and LTSA to discover the internal connection between LLE, MLLE and LTSA. It lays the foundation of the further comprehension and analysis.4. We propose the adaptive neighborhood selection strategy to solve the problem of the selection of neighborhood. Based on the analysis of the local linear approximation of neighborhood, we give a criterion for deciding whether a neighbor set can be well approximated within a give accuracy by a liner fitting. Then we propose two algorithms for selecting neighbor sets that satisfy the criterion using the strategies of neighborhood contraction and neighborhood expansion. We show that the selected neighbor set by our methods can be expanded as large as possible to reinforce the connectivity between the sample points which can fit the local geometry of the manifold. These adaptive neighborhood selection methods can be used for all neighborhood-based manifold learning methods.5. We propose a method to estimate the local curvatures of the manifold and modify the minimization model in LTSA by tacking into account local curvatures of the manifold. This improvement can reduce the bias in construction of global embedding by LTSA. The curvature model can be used together with the adaptive neighborhood selection strategy, and we propose an adaptive manifold learning method, namely, adaptive local tangent space alignment (ALTSA). Although the curvature model is specially designed for LTSA, we believe that the basic ideas we propose can be adapted to othermanifold learning algorithms as well.6. We give plentiful examples(synthetic examples and real-world examples) to compare our proposed algorithms with Isomap, LLE and LTSA, and the experiment results show the efficiency of the proposed new methods in this paper.
Keywords/Search Tags:reconstruction weights, Local tangent space alignment, adaptive neighborhood selection, bias reduction, curvature and tangent space
PDF Full Text Request
Related items