Study On Theories And Algorithms In Manifold Learning

Posted on:2008-04-08

Degree:Doctor

Type:Dissertation

Country:China

Candidate:M M Sun

Full Text:PDF

GTID:1118360215498557

Subject:Pattern Recognition and Intelligent Systems

Abstract/Summary:

PDF Full Text Request

With the development of information technology, the data to be processed become moreand more complex so that human can hardly understand the inner structure of the data bydirect-viewing cognition. People must fall back on machine learning methods to learn fromdata. Manifold learning is an important area of machine learning that rose in recent years.It has archived remarkable success in learning the inner structure of nonlinear data. Thispaper studies some theoretical and algorithmic problems in manifold learning.The basic idea of the paper is that the learning results should help people to under-stand the structure of the data. Based on the idea, the paper analyzes the requirements oftwo unsupervised learning tasks on the manifold learning algorithms: feature extraction andtopology description, and then studies the compatibility of current manifold learning meth-ods with these requirements. For feature extraction tasks, the paper proposes two criterionsthat the learning manifolds should obey: Reconstruction Error Minimization, and SimilarityPreserving. These two criterions are proposed to ensure that people can get reliable informa-tion of the data from the extracted features. The compatibilities of current manifold learningmethods with these two criterions are analyzed. As a result, we find that no theoreticalmodel satisfies the both criterions. For the topology description tasks, two requirements areproposed. The first is that the learning model should have powerful ability of topology gen-eration; the second is that the result of the learning algorithm should build a correspondencebetween topology of the data and that of itself. These two requirements are proposed tohelp people understand the topology characteristic of the data with direct-viewing cognition.However, through the analysis of current topology description algorithms, we find that thereis no algorithm that satisfies both requirements. In this paper, we find new theories andalgorithms to solve these problems, which can be summarized as follows:1) We develop the theory of Similarity Preserving Principal Curves (SPPC). The SPPCsare the curves that satisfy both the Reconstruction Error Minimization criterion and the Sim-ilarity Preserving criterion. So the SPPC of a data set is the optimal one-dimensional featureextraction model for manifold learning. In this paper, the existence and the differentiabilityof SPPC are studied and the high-dimensional generalizations of SPPC are discussed. Wealso propose a practical learning algorithm of SPPC and test the algorithm on several sim-ulation data sets and real data sets. The experimental results show the effectiveness of thelearning algorithm thus reveal the reasonability of the theory of SPPC.2) Current embedding methods in manifold learning always suffer from the problem thaton some data sets lying manifolds with high nonlinearity, they could not learn the manifold very well. To solve the problem, we propose a new embedding method - Global LaplacianEigenmap. Compared with local embedding methods such as LLE, Global Laplacian Eigen-map method considers the global structures of the manifold besides the local structures; com-pared with global embedding methods such as ISOMap, Global Laplacian Eigenmap methodincorporates the global information of data in a more flexible way other than preserving themanifold distances in ISOMap. In some tests on data sets lying on manifolds with high non-linearity, the Global Laplacian Eigenmap algorithm obtains lower reconstruction error thanmost even all other embedding methods that attend the tests.3) For the topology description task, to help people understand the topology character-istic of the data with direct-viewing cognition, we propose the concept of Principal Topologyand Topology Graph of data. The principal topology of data is the most notable topologycharacteristic of data, while the topology graph is the concise representation of principaltopology. There exists a one-to-one correspondence between the principal topology and thetopology of the topology graph, so people can understand the most notable topology charac-teristic of data via a simple graph. We propose the Divide-and-Combine learning strategy tolearn the topology graph. Based on the learning strategy, we propose a Cluster Growing Al-gorithm. The algorithm works well on both simulation data sets and reai-world applications.

Keywords/Search Tags:

Manifold Learning, Feature Extraction, Topology Description, Reconstruction Error, Similarity Preserving, Topology Graph

PDF Full Text Request

Related items

1	Feature Extraction Based On Manifold Learning
2	Research On Robust Training Methods Of GCN Against Topology Attack
3	Research On Manifold Learning Feature Extraction Method Based On Subspace And Its Applications To Face Recognition
4	A Nearest Neighbor Subspace Preserving Feature Extraction Method And Its Application In High-Dimensional Images
5	Research On Unsupervised Hashing Methods Based On Similarity Preserving
6	Research On Linear Reconstruction Based Feature Extraction And Classification
7	Design And Simulation On Topology Reconstruction Scheme In Ad Hoc Networks
8	Manifold-based Feature Extraction And Face Recognition Analysis
9	A Research On Several Fundamental Problems And Core Algorithms Of Manifold Learning
10	Research On Text Retrieval Based On Feature Representations Learning