Font Size: a A A

Study On Theories And Algorithms In Manifold Learning

Posted on:2008-04-08Degree:DoctorType:Dissertation
Country:ChinaCandidate:M M SunFull Text:PDF
GTID:1118360215498557Subject:Pattern Recognition and Intelligent Systems
Abstract/Summary:PDF Full Text Request
With the development of information technology, the data to be processed become moreand more complex so that human can hardly understand the inner structure of the data bydirect-viewing cognition. People must fall back on machine learning methods to learn fromdata. Manifold learning is an important area of machine learning that rose in recent years.It has archived remarkable success in learning the inner structure of nonlinear data. Thispaper studies some theoretical and algorithmic problems in manifold learning.The basic idea of the paper is that the learning results should help people to under-stand the structure of the data. Based on the idea, the paper analyzes the requirements oftwo unsupervised learning tasks on the manifold learning algorithms: feature extraction andtopology description, and then studies the compatibility of current manifold learning meth-ods with these requirements. For feature extraction tasks, the paper proposes two criterionsthat the learning manifolds should obey: Reconstruction Error Minimization, and SimilarityPreserving. These two criterions are proposed to ensure that people can get reliable informa-tion of the data from the extracted features. The compatibilities of current manifold learningmethods with these two criterions are analyzed. As a result, we find that no theoreticalmodel satisfies the both criterions. For the topology description tasks, two requirements areproposed. The first is that the learning model should have powerful ability of topology gen-eration; the second is that the result of the learning algorithm should build a correspondencebetween topology of the data and that of itself. These two requirements are proposed tohelp people understand the topology characteristic of the data with direct-viewing cognition.However, through the analysis of current topology description algorithms, we find that thereis no algorithm that satisfies both requirements. In this paper, we find new theories andalgorithms to solve these problems, which can be summarized as follows:1) We develop the theory of Similarity Preserving Principal Curves (SPPC). The SPPCsare the curves that satisfy both the Reconstruction Error Minimization criterion and the Sim-ilarity Preserving criterion. So the SPPC of a data set is the optimal one-dimensional featureextraction model for manifold learning. In this paper, the existence and the differentiabilityof SPPC are studied and the high-dimensional generalizations of SPPC are discussed. Wealso propose a practical learning algorithm of SPPC and test the algorithm on several sim-ulation data sets and real data sets. The experimental results show the effectiveness of thelearning algorithm thus reveal the reasonability of the theory of SPPC.2) Current embedding methods in manifold learning always suffer from the problem thaton some data sets lying manifolds with high nonlinearity, they could not learn the manifold very well. To solve the problem, we propose a new embedding method - Global LaplacianEigenmap. Compared with local embedding methods such as LLE, Global Laplacian Eigen-map method considers the global structures of the manifold besides the local structures; com-pared with global embedding methods such as ISOMap, Global Laplacian Eigenmap methodincorporates the global information of data in a more flexible way other than preserving themanifold distances in ISOMap. In some tests on data sets lying on manifolds with high non-linearity, the Global Laplacian Eigenmap algorithm obtains lower reconstruction error thanmost even all other embedding methods that attend the tests.3) For the topology description task, to help people understand the topology character-istic of the data with direct-viewing cognition, we propose the concept of Principal Topologyand Topology Graph of data. The principal topology of data is the most notable topologycharacteristic of data, while the topology graph is the concise representation of principaltopology. There exists a one-to-one correspondence between the principal topology and thetopology of the topology graph, so people can understand the most notable topology charac-teristic of data via a simple graph. We propose the Divide-and-Combine learning strategy tolearn the topology graph. Based on the learning strategy, we propose a Cluster Growing Al-gorithm. The algorithm works well on both simulation data sets and reai-world applications.
Keywords/Search Tags:Manifold Learning, Feature Extraction, Topology Description, Reconstruction Error, Similarity Preserving, Topology Graph
PDF Full Text Request
Related items