
Research On Data Dimensionality Reduction Algorithms Based On Manifold Learning

Posted on: 2011-10-17    Degree: Master    Type: Thesis
Country: China    Candidate: X T Wu    Full Text: PDF
GTID: 2178330332456554    Subject: Computer software and theory
Abstract/Summary:
With the arrival of the information age, researchers inevitably encounter massive high-dimensional data in their work, for example in global climate models, human gene distribution and text clustering, and therefore frequently face the problem of data dimensionality reduction. The goal of dimensionality reduction is to find the low-dimensional structure embedded in a high-dimensional data set, and precise mathematical methods are essential for describing such structure and its variations. Manifold learning is an important machine learning approach that has risen in recent years and has achieved remarkable success in non-linear data dimensionality reduction.

In the past decades, non-linear data dimensionality reduction has attracted a great deal of attention in many fields, including data mining, machine learning, image analysis and computer vision. In recent years, many effective non-linear dimensionality reduction methods based on manifold learning have been developed, mainly including Isometric Mapping (Isomap), Locally Linear Embedding (LLE) and its variant Hessian LLE, Laplacian Eigenmaps, Local Tangent Space Alignment, and so on. LLE is a classical non-linear dimensionality reduction method based on manifold learning, with many applications in data dimensionality reduction, clustering and visualization. For each sample point on the manifold, LLE approximates the point linearly by a combination of its neighbors and obtains a local reconstruction weight matrix; from this weight matrix it constructs a reconstruction error, minimizes it, and thereby obtains the low-dimensional embedding (a minimal sketch of this standard procedure follows the abstract). However, different selections of neighbors lead to different reconstruction errors, and these differences seriously affect how the data is represented in the low-dimensional space, making the results of LLE unstable. Moreover, LLE does not make good use of the density information of the data points, even though the data density of the high-dimensional observation space has an important influence on determining the intrinsic dimensionality. LLE is only suitable for manifolds with a uniform distribution; when the data density changes greatly, LLE may map points that are far apart in the high-dimensional space to nearby points in the embedded space, in which case it is difficult for LLE to obtain correct dimensionality reduction results. These are the main problems of LLE.

Starting from the structural properties of the data, this paper analyzes and summarizes the existing data dimensionality reduction algorithms, puts its emphasis on the LLE algorithm, and proposes improved versions of it. Specifically, the following innovative work is presented in this paper:

(1) A comprehensive survey of existing dimensionality reduction methods is given; the representative methods are classified systematically, described in detail, and analyzed and compared in depth in terms of computational complexity, advantages and disadvantages.

(2) In LLE, different selections of neighbors produce different reconstruction errors and thus different dimensionality reduction results. Making use of the fact that a cluster center carries a large amount of information, this paper defines an approximate reconstruction coefficient and on that basis proposes an improved LLE algorithm.

(3) LLE assumes that samples in the high-dimensional space are uniformly distributed, and when the data density changes greatly it has difficulty obtaining correct dimensionality reduction results. This paper analyzes the LLE algorithm, studies this shortcoming and proposes a density-based improved algorithm, Density Locally Linear Embedding (DLLE).
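To make the baseline concrete: standard LLE first solves $\min_W \sum_i \| x_i - \sum_j W_{ij} x_j \|^2$ subject to $\sum_j W_{ij} = 1$, where the sum over $j$ runs over each point's $k$ nearest neighbors, and then, keeping $W$ fixed, finds the embedding $Y$ that minimizes $\sum_i \| y_i - \sum_j W_{ij} y_j \|^2$. The sketch below runs this standard LLE (scikit-learn's implementation, not the improved variants proposed in the thesis) on a synthetic swiss roll; the neighborhood sizes are hypothetical choices used only to illustrate how the neighbor selection changes the reconstruction error.

```python
# Minimal sketch of the standard LLE baseline described in the abstract,
# using scikit-learn's implementation on a synthetic swiss-roll data set.
# The neighborhood sizes below are illustrative choices, not values from
# the thesis; the improved variants (approximate reconstruction
# coefficient, DLLE) are the thesis's own methods and are not shown here.
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1000, random_state=0)  # 3-D points sampled from a 2-D manifold

for k in (5, 12, 30):  # different neighbor selections give different reconstruction errors
    lle = LocallyLinearEmbedding(n_neighbors=k, n_components=2, random_state=0)
    Y = lle.fit_transform(X)  # low-dimensional embedding, shape (1000, 2)
    print(f"k={k}: reconstruction error {lle.reconstruction_error_:.6f}")
```

Consistent with the abstract's observation, both the embedding and the reconstruction error vary with the neighborhood size, which is the instability the proposed improvements target.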
Keywords/Search Tags: Dimensionality reduction, Manifold learning, Locally Linear Embedding, Approximate reconstruction coefficient, Density information