
A Research On Several Fundamental Problems And Core Algorithms Of Manifold Learning

Posted on: 2009-06-06    Degree: Doctor    Type: Dissertation
Country: China    Candidate: D Y Meng    Full Text: PDF
GTID: 1118330338489044    Subject: Computer software and theory
Abstract/Summary:
Feature extraction, i.e., dimensionality reduction of data, is one of the fundamental issues in data modeling and data mining. Currently, one of the most active topics in feature extraction research is manifold learning. In the recent decade, several effective methods for this problem have been proposed, such as Isomap, LLE, and the Laplacian eigenmap. However, many problems remain open, including the essential principles that determine when these methods are effective, model selection, and their applicability to data sets with complex structure. This thesis addresses these classical problems in manifold learning research and makes a series of theoretical and algorithmic contributions, summarized as follows.

(1) For the automatic model selection problem of manifold learning methods, we propose an effective strategy to automatically choose the two parameters involved: the neighborhood size (k) and the intrinsic dimension (d). Specifically, by formulating a mathematical expression for the (exponentially weighted) mean of the k-NN distances of the given data set, we obtain a function that approximately describes the local intrinsic dimension distribution of the data manifold. By defining a measure for the stability of this description function, we construct an automatic algorithm for selecting the optimal neighborhood size; furthermore, by selecting a compromise value of the global distribution of the local intrinsic dimensions, we obtain an automatic algorithm for selecting the intrinsic dimension of the data. The proposed theory and algorithms are simple, effective, and stable, and are therefore valuable for broadening the application of manifold learning methods.
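As a hedged illustration only (not the exact formulation in the thesis), the following Python sketch estimates an intrinsic dimension from mean k-NN distances, using the standard scaling relation that the mean distance to the k-th neighbor of an n-point sample from a d-dimensional manifold grows roughly like (k/n)^(1/d); the function names, the least-squares fit, and the example data are assumptions made for illustration.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def mean_knn_distances(X, k_max):
    """Mean distance to the j-th nearest neighbor, for j = 1..k_max."""
    nn = NearestNeighbors(n_neighbors=k_max + 1).fit(X)
    dists, _ = nn.kneighbors(X)        # column 0 is each point's distance to itself
    return dists[:, 1:].mean(axis=0)   # shape (k_max,)

def estimate_intrinsic_dimension(X, k_max=20):
    """Crude global estimate: the slope of log(mean k-NN distance) vs log(k)
    is roughly 1/d for data sampled from a d-dimensional manifold."""
    ks = np.arange(1, k_max + 1)
    mean_d = mean_knn_distances(X, k_max)
    slope, _ = np.polyfit(np.log(ks), np.log(mean_d), 1)
    return 1.0 / slope

if __name__ == "__main__":
    # Swiss-roll-like data: a 2-D manifold embedded in 3-D space.
    rng = np.random.default_rng(0)
    t = 3 * np.pi * (1 + 2 * rng.random(2000))
    X = np.column_stack([t * np.cos(t), 20 * rng.random(2000), t * np.sin(t)])
    print("estimated intrinsic dimension:", estimate_intrinsic_dimension(X))
```

The thesis additionally couples such a description function with a stability measure to pick the neighborhood size automatically; the sketch above only shows the k-NN-distance ingredient that both estimates build on.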
(2) To illuminate the essential principle underlying manifold learning methods, we prove an important result, taking the well-known Isomap method as the entry point: "the manifold is loopy iff the manifold learning method is ineffective." This result gives the first rigorous explanation of why Isomap fails on loopy manifolds. For the dimensionality reduction of loopy manifold data, we further develop an effective manifold learning method. In particular, the proposed algorithm can: (a) judge whether the data lie on a loopy manifold based only on the given data set; (b) generate a series of loopy paths that approximately depict the loopy structure of the data manifold; (c) compute an approximate maximal loop-free subset of the data set; (d) implement effective dimensionality reduction for loopy manifold data. The experimental results verify the correctness of the proposed theory and the effectiveness of the presented algorithm.

(3) To solve the nonlinear dimensionality reduction problem for data residing on a multi-cluster manifold, we propose two efficient manifold learning methods: the tunnel method and the decomposition-composition method. The core idea of the tunnel method is to build smooth transitional inter-cluster tunnels whose intrinsic dimension is consistent with that of the intra-cluster parts, by means of graph techniques and differential-geometric knowledge, and thus to construct a globally smooth manifold configuration superimposed on the original data set and the constructed tunnel set. The union of the two sets satisfies the effectiveness condition underlying manifold learning methods, and hence lets the adopted methods avoid the short-circuit, disconnectivity, and rough-connection problems, three of the most commonly encountered problems in multi-cluster manifold learning. The core idea of the decomposition-composition method is first to decompose the data into their intrinsic clusters and perform nonlinear dimensionality reduction on each cluster, and then to use the inter-cluster relationships to appropriately locate and orient each cluster embedding and compose them into the final result. Both proposed methods can be effectively applied to multi-cluster manifold learning, and both prominently outperform other current methods.

(4) To improve the precision of neighborhood graph construction and the accuracy of geodesic distance estimation, we present an improved neighborhood graph construction strategy, and based on this strategy develop a more accurate geodesic distance estimation method. In particular, using the well-known locally linear assumption (i.e., a locally linear patch, expressed as a convex combination of the neighbors of a vertex, approximately resides on the manifold), we extend the traditional neighborhood graph, composed of points and neighboring edges, to an improved graph composed of points and neighborhood patches. The neighborhood graph so constructed describes the configuration of the manifold underlying the data more precisely. We further propose a new geodesic distance estimation method by embedding paths located on the locally linear patches and computing shortest paths on the improved neighborhood graph (a minimal sketch of the standard point-and-edge baseline appears after this summary). A series of experimental results verifies the effectiveness of the new neighborhood graph construction strategy and geodesic distance estimation method.

(5) By composing the above achievements with current manifold learning methods, we develop an integrated application system for manifold learning. The system provides automatic model selection, manifold type detection, traditional and improved neighborhood graph construction, non-loopy and loopy manifold learning, and multi-cluster manifold learning. All these functions facilitate the operation and application of manifold learning to a greater extent. The developed system also strengthens the degree of automation of traditional manifold learning methods, expands their applicable range, and improves their capability in feature detection, especially in geodesic distance estimation. Furthermore, we apply the developed system to a series of practical image data sets as test samples, and the results are all satisfactory, which verifies the system's potential for future use.
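For context, the following Python sketch shows the traditional point-and-edge construction that the improved patch-based graph in (4) refines: a k-NN graph with Euclidean edge weights, followed by graph shortest paths as the geodesic distance estimate (the standard Isomap-style construction). The helper name, parameter choices, and example data are illustrative assumptions; the thesis's patch-based graph is not reproduced here.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

def geodesic_distances(X, k=10):
    """Standard Isomap-style geodesic estimate: build a k-NN graph whose
    edge weights are Euclidean distances, then run graph shortest paths."""
    G = kneighbors_graph(X, n_neighbors=k, mode="distance")
    G = G.maximum(G.T)                          # symmetrize: make the graph undirected
    D = shortest_path(G, method="D", directed=False)  # Dijkstra on the sparse graph
    return D                                    # (n, n) matrix of geodesic estimates

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.sort(2 * np.pi * rng.random(500))
    X = np.column_stack([np.cos(t), np.sin(t)])  # points on a circle (a loopy manifold)
    D = geodesic_distances(X, k=8)
    print("max estimated geodesic distance:", D.max())
```

On a loopy manifold such as this circle, the shortest-path estimates themselves remain meaningful, but embedding them into a line necessarily tears the loop, which is the failure mode characterized in (2).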
Keywords/Search Tags: Manifold learning, Nonlinear dimensionality reduction, Feature extraction, Model selection, Neighborhood graph, Loopy manifold, Multi-cluster manifolds, Geodesic distance