
A Study On Classifiers In High-dimensional Space Based On Manifold Learning

Posted on: 2009-08-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: R J Gu
Full Text: PDF
GTID: 1118360272957318
Subject: Light Industry Information Technology and Engineering
Abstract/Summary:
Raw data sets acquired with various capturing devices are usually high-dimensional and need to be preprocessed before subsequent operations such as clustering, classification, and outlier detection. One step of this preprocessing is dimensionality reduction, which aims to discard or de-emphasize information of secondary importance and to retain or highlight meaningful information while reducing the dimensionality of the data. Since real-world data are often nonlinear in nature, linear dimensionality reduction techniques such as principal component analysis (PCA) fail to preserve the structure and relationships of the high-dimensional space when the data are mapped into a low-dimensional one. Manifold learning is a popular recent approach to nonlinear dimensionality reduction. Algorithms for this task are based on the idea that the dimensionality of many data sets is only artificially high: though each data point may consist of thousands of features, it can be described as a function of only a few underlying parameters. That is, the data points are actually samples from a low-dimensional manifold embedded in a high-dimensional space. Manifold learning algorithms attempt to uncover these parameters in order to find a low-dimensional representation of the data.

Manifold learning has important applications in areas such as pattern analysis, data mining, and image processing. Research on manifold-learning-based classification and clustering has attracted many scholars only in recent years, and many problems remain unsolved. This dissertation introduces several classic manifold learning algorithms, with particular attention to Isomap; analyzes the properties of each algorithm; studies key problems such as the distance metric, intrinsic dimension, and kernel methods in detail; and proposes several improved algorithms.

(1) Spectral partition theory is a foundation of manifold learning. This dissertation proposes an improved spectral clustering algorithm based on a neighbor-adaptive scale, which, like DBSCAN, fully accounts for the local structure of the data set. Unlike the standard spectral clustering algorithm, which uses a single global scale, the improved algorithm gives each point its own adaptive scale, computed as the mean distance to its k nearest neighbors. The neighbor-adaptive scale simplifies parameter selection and makes the algorithm insensitive to both density variations and outliers. The improved spectral clustering is then applied successfully to color quantization. (A sketch of such an affinity construction appears below.)

(2) Estimating the intrinsic dimension of a data set is a key problem in manifold learning. This dissertation proposes an improved maximum likelihood estimation method named AMLE. Taking the distribution of the data set into account, AMLE adjusts each point's contribution to the estimator through a designed weight function. Experiments on a number of simulated and real data sets show that it performs better than MLE and other methods. (A sketch of the baseline MLE estimator also appears below.)
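As a concrete illustration of the neighbor-adaptive scale in (1), the sketch below builds a Gaussian affinity matrix in which each point's bandwidth is the mean distance to its k nearest neighbors. This is a minimal reading of the description above, not the dissertation's implementation; the function name and the default k are assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist

def adaptive_scale_affinity(X, k=7):
    """Gaussian affinity with a per-point scale: sigma_i is the mean
    distance from point i to its k nearest neighbors (an assumed
    reading of the neighbor-adaptive scale described above)."""
    D = cdist(X, X)                      # pairwise Euclidean distances
    # Column 0 of each sorted row is the point itself (distance 0),
    # so columns 1..k are the k nearest neighbors.
    sigma = np.sort(D, axis=1)[:, 1:k + 1].mean(axis=1)
    W = np.exp(-(D ** 2) / (sigma[:, None] * sigma[None, :]))
    np.fill_diagonal(W, 0.0)             # no self-affinity
    return W
```

For (2), the baseline that AMLE improves on is the Levina-Bickel maximum likelihood estimator, sketched below with every point contributing equally; AMLE's weight function is not reproduced here. The sketch reuses the imports above.

```python
def mle_intrinsic_dimension(X, k=10):
    """Levina-Bickel MLE of intrinsic dimension: each point's estimate is
    the inverse mean log-ratio of its k-th neighbor distance to its
    closer neighbor distances; the global estimate averages over points."""
    D = cdist(X, X)
    T = np.sort(D, axis=1)[:, 1:k + 1]   # distances to the k nearest neighbors
    m = 1.0 / np.log(T[:, -1:] / T[:, :-1]).mean(axis=1)
    return float(m.mean())
```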
(3) Isomap is a representative manifold learning algorithm that can discover low-dimensional manifolds in high-dimensional data. Isomap is simple but time-consuming. To speed it up, L-Isomap, which uses landmark points, was proposed, but how to select the landmarks remains an open problem. This dissertation presents an extension of Isomap, named WL-Isomap, which assigns each data point a weight according to the distances between it and its neighbors; a point with a higher weight has a larger probability of being selected as a landmark point. Experimental results show that WL-Isomap is more stable than L-Isomap and outperforms it, especially when the number of landmark points is quite small. (A sketch of one such weighted selection appears after this summary.)

(4) Image data are common among high-dimensional data sets, and defining the distance between two images is a challenging task. This dissertation proposes an improved Isomap based on an image distance, named IMD-Isomap. Because the image distance takes the spatial information of images into account, the experiments show that IMD-Isomap outperforms Isomap in data visualization, especially when noise is added. A classifier is then constructed by combining IMD-Isomap with a generalized regression neural network, which has good approximation ability. Experimental results show that this method is more robust to noise in image classification than KNN, Isomap, or Eigenface.

(5) Since Isomap is an unsupervised learning algorithm with no out-of-sample ability, it cannot be used directly for classification. Kernel Isomap (KIsomap) improves Isomap and gains a generalization property through the kernel trick. Taking class labels into account, a weighted Euclidean distance is designed that draws data from the same class closer together and pushes data from different classes farther apart, a property that favors classification. On this basis, a weighted-distance-based kernel Isomap (WKIsomap) is proposed. Experimental results show that WKIsomap is more robust than Isomap and KIsomap in data visualization; moreover, when noise is added to the data, WKIsomap-based classifiers are more robust to noise than KIsomap-based ones. (A sketch of such a label-aware distance follows below.)
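For the landmark weighting in (3), one simple reading is to sample landmarks with probability proportional to a density-derived weight; the inverse-mean-neighbor-distance weight below is an illustrative assumption, not necessarily the dissertation's weight function.

```python
import numpy as np
from scipy.spatial.distance import cdist

def select_landmarks(X, n_landmarks, k=10, seed=0):
    """Weighted landmark sampling in the spirit of WL-Isomap: points whose
    k nearest neighbors are close receive larger weights and are more
    likely to be drawn (the weight function here is assumed)."""
    rng = np.random.default_rng(seed)
    D = cdist(X, X)
    mean_knn = np.sort(D, axis=1)[:, 1:k + 1].mean(axis=1)
    w = 1.0 / (mean_knn + 1e-12)          # guard against zero distances
    return rng.choice(len(X), size=n_landmarks, replace=False, p=w / w.sum())
```

For the label-aware distance in (5), the sketch below shrinks within-class distances and inflates between-class distances before the kernel Isomap steps; the multiplicative scheme and the factor alpha are assumptions, not the dissertation's exact formula. It reuses the imports above.

```python
def label_weighted_distance(X, y, alpha=0.5):
    """Supervised distance in the spirit of WKIsomap: pairs from the same
    class are pulled closer (scaled by alpha < 1) and pairs from
    different classes pushed apart (scaled by 1/alpha)."""
    D = cdist(X, X)
    y = np.asarray(y)
    same_class = (y[:, None] == y[None, :])
    return np.where(same_class, alpha * D, D / alpha)
```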
Keywords/Search Tags: Manifold Learning, Pattern Classification, Kernel Method, Spectral Clustering, Image Distance, Intrinsic Dimension, Landmark Point