Research And Application Of Nonlinear Dimensionality Reduction

Posted on:2005-05-20

Degree:Master

Type:Thesis

Country:China

Candidate:C J Yu

Full Text:PDF

GTID:2168360122489393

Subject:Computer application technology

Abstract/Summary:


Scientists working with large volumes of high-dimensional data, such as global climate patterns or human gene distributions, regularly confront the problem of dimensionality reduction: finding meaningful low-dimensional structures hidden in their high-dimensional observations. Dimensionality reduction is therefore an important research topic in machine learning. Dimensionality reduction algorithms fall into two categories: linear methods, such as principal component analysis (PCA) and classical multidimensional scaling (CMDS), and nonlinear methods, such as locally linear embedding (LLE), isometric mapping (Isomap) and self-organizing isometric embedding (SIE).

First, several dimensionality reduction algorithms are analyzed. The classical techniques, PCA and CMDS, are simple to implement and are guaranteed to discover the true structure of data lying on or near a linear subspace of the high-dimensional input space. Because they are linear, however, they cannot reveal the true structure of complex nonlinear manifolds. Isomap is a globally optimal algorithm: it builds on CMDS but seeks to preserve the intrinsic geometry of the data, as captured by the geodesic manifold distances between all pairs of data points. LLE, an unsupervised learning algorithm, attempts to discover the global structure of nonlinear manifolds. Its mapping is derived from the symmetries of locally linear reconstructions, and the data are mapped into a single global coordinate system of lower dimensionality. SIE is based on a geometric intuition: a global isometric embedding must also be a local isometric embedding, and conversely, a suitable set of local isometry constraints will entail a global isometric embedding. SIE uses the point-to-point distribution as its local constraints and enforces global isometry in a probabilistic sense. To evaluate the reconstruction quality of the nonlinear dimensionality reduction algorithms, this thesis uses both simulated and natural data sets.
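To illustrate the linear baseline, here is a minimal NumPy sketch of classical MDS (double-centering of the squared distance matrix, then an eigendecomposition). The function names `classical_mds` and `pairwise_distances` and the test data are illustrative assumptions, not code from the thesis. The usage example places points on a 2-D plane rotated into 3-D, the case where CMDS is guaranteed to recover the structure.

```python
import numpy as np


def classical_mds(D, d):
    """Classical multidimensional scaling (CMDS).

    D: (n, n) matrix of pairwise Euclidean distances.
    d: target dimensionality.
    Returns an (n, d) embedding whose pairwise distances approximate D.
    """
    n = D.shape[0]
    # Double centering: B = -1/2 * J D^2 J, with J = I - (1/n) 11^T.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # eigh returns the eigenvalues of the symmetric matrix B in ascending
    # order; keep the d largest.
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:d]
    # Coordinates are eigenvectors scaled by sqrt(eigenvalue); clip tiny
    # negative eigenvalues caused by round-off or non-Euclidean input.
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))


def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)


# Points on a 2-D plane that has been rotated into 3-D space.
rng = np.random.default_rng(0)
X2 = rng.normal(size=(50, 2))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
X3 = np.column_stack([X2, np.zeros(50)]) @ Q.T
D = pairwise_distances(X3)
Y = classical_mds(D, 2)
print(np.allclose(D, pairwise_distances(Y)))  # True: the linear structure is recovered
```

On a curved manifold the same procedure only preserves straight-line (chordal) distances, which is the limitation the abstract attributes to linear methods.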
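The Isomap idea described above (approximate geodesic distances on a neighborhood graph, then apply CMDS to them) can be sketched as follows. This is a minimal version under stated assumptions: a symmetrized k-nearest-neighbor graph and Floyd-Warshall shortest paths, which cost O(n^3) and suit only small data sets. The function `isomap` and its parameters are hypothetical names, not the thesis's implementation.

```python
import numpy as np


def isomap(X, n_neighbors, d):
    """Minimal Isomap: k-NN graph -> shortest-path geodesics -> classical MDS."""
    n = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    # 1. Neighborhood graph: connect each point to its k nearest neighbors
    #    (column 0 of argsort is the point itself), then symmetrize.
    G = np.full((n, n), np.inf)
    nbrs = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = nbrs.ravel()
    G[rows, cols] = D[rows, cols]
    G = np.minimum(G, G.T)
    np.fill_diagonal(G, 0.0)

    # 2. Approximate geodesic distances by graph shortest paths (Floyd-Warshall).
    for k in range(n):
        G = np.minimum(G, G[:, k:k + 1] + G[k:k + 1, :])
    if np.isinf(G).any():
        raise ValueError("neighborhood graph is disconnected; increase n_neighbors")

    # 3. Classical MDS on the geodesic distance matrix.
    J = np.eye(n) - 1.0 / n
    B = -0.5 * J @ (G ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:d]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))


# A semicircle is a 1-D manifold curved in 2-D; Isomap should unroll it so that
# the single embedding coordinate follows the arc-length parameter t.
t = np.linspace(0.0, np.pi, 100)
X = np.column_stack([np.cos(t), np.sin(t)])
Y = isomap(X, n_neighbors=4, d=1)
print(abs(np.corrcoef(Y[:, 0], t)[0, 1]))  # close to 1
```

The choice of `n_neighbors` matters: too few neighbors can disconnect the graph, while too many create "short-circuit" edges across the manifold, which is one way noise distorts the geodesic estimates.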
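The two LLE steps mentioned above (locally linear reconstruction weights, then a global low-dimensional coordinate system that preserves those weights) can be sketched in NumPy as below. The function name `lle` and the regularization constant `reg` are assumptions. Scaling the regularizer by the trace of the local Gram matrix is a common practical choice, not something the abstract specifies.

```python
import numpy as np


def lle(X, n_neighbors, d, reg=1e-3):
    """Minimal locally linear embedding (LLE)."""
    n = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    nbrs = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]

    # 1. Reconstruction weights: express each x_i as an affine combination
    #    (weights summing to one) of its k nearest neighbors.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]                # neighbors shifted so x_i is the origin
        C = Z @ Z.T                          # local k x k Gram matrix
        # Regularize: C is singular whenever k exceeds the input dimension.
        C += reg * np.trace(C) * np.eye(n_neighbors)
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, nbrs[i]] = w / w.sum()

    # 2. Embedding: bottom eigenvectors of M = (I - W)^T (I - W).
    I_W = np.eye(n) - W
    M = I_W.T @ I_W
    vals, vecs = np.linalg.eigh(M)           # eigenvalues in ascending order
    # The first eigenvector is constant (eigenvalue ~ 0) and is discarded.
    return vecs[:, 1:d + 1]


t = np.linspace(0.0, np.pi, 100)
X = np.column_stack([np.cos(t), np.sin(t)])
Y = lle(X, n_neighbors=4, d=1)
print(abs(np.corrcoef(Y[:, 0], t)[0, 1]))  # high when the arc is unrolled
```

Unlike Isomap, this sketch never computes distances between far-apart points; the global coordinates emerge from overlapping local reconstructions.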
In this thesis, the nonlinear dimensionality reduction (NLDR) methods are also applied to text categorization, and the usability of NLDR-based text categorization is validated. The simulation experiments show that on noise-free data sets, the reconstruction quality of Isomap is similar to that of SIE. On data sets containing noise, however, the reconstruction quality of global nonlinear dimensionality reduction algorithms such as LLE and Isomap degrades severely, because noise distorts the reconstructed manifolds. SIE, in contrast, maintains good reconstruction quality by screening out a small number of noise points. On the natural data sets, the reconstruction quality of the NLDR methods varies from one application to another.
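The abstract does not state how reconstruction quality is measured. One common choice, used in the original Isomap paper, is the residual variance 1 - R^2, where R is the linear correlation between reference distances (e.g. estimated geodesic distances) and distances in the embedding; lower is better. The sketch below assumes that measure, and the function name `residual_variance` is hypothetical. The toy example shows noise raising the residual variance of an otherwise perfect 1-D embedding.

```python
import numpy as np


def residual_variance(D_ref, Y):
    """1 - R^2 between reference distances and embedding distances.

    D_ref: (n, n) reference distances (e.g. estimated geodesic distances).
    Y:     (n, d) low-dimensional embedding.
    Only the n(n-1)/2 distinct pairs (upper triangle) are compared.
    """
    D_Y = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    iu = np.triu_indices(D_ref.shape[0], k=1)
    r = np.corrcoef(D_ref[iu], D_Y[iu])[0, 1]
    return 1.0 - r ** 2


# Reference: points evenly spaced on a line, so true distances are |t_i - t_j|.
t = np.linspace(0.0, 1.0, 50)
D_ref = np.abs(t[:, None] - t[None, :])

good = 3.0 * t[:, None]   # rescaled copy: distances perfectly correlated with D_ref
noisy = good + np.random.default_rng(0).normal(scale=0.5, size=good.shape)
rv_good = residual_variance(D_ref, good)
rv_noisy = residual_variance(D_ref, noisy)
print(rv_good)    # ~0 (perfect linear relation)
print(rv_noisy)   # larger: the noise degrades the embedding
```

Because R^2 ignores a global scale factor, this measure rewards embeddings that preserve relative distances even when the overall scale differs.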

Keywords/Search Tags:

Machine Learning, Nonlinear Dimensionality Reduction, Locally Linear Embedding, Isometric Mapping, Self-organizing Isometric Embedding, Text Categorization


Related items

1	The Application Of Global Optimization Algorithm Based On Nonlinear Dimensionality Reduction In EEG Problem
2	Students' Performance Prediction Based On Dimensionality Reduction
3	Research On Some Problems Of Isometric Mapping (Isomap)
4	The Study Of Manifold Learning Algorithms And Their Applications
5	Analysis On The Advantages And Disadvantages Of Isomap And LLE In Dimension Reduction
6	Research Of Dimensionality Reduction And Its Application On Data Mining Of Large-Scale Text
7	The 3D Shapes Isometric Deformation Based On Stochastic Neighbor Embedding
8	Research On Locally Linear Embedding Dimensionality Reduction Algorithms Based On Density
9	Study And Application Of Several Improved Methods Of Nonlinear Dimension Reduction For High Dimensional Data
10	Study Of Data Reduction Technique Based On Manifold Learning