Research And Application Of Nonlinear Dimensionality Reduction

Posted on: 2005-05-20
Degree: Master
Type: Thesis
Country: China
Candidate: C J Yu
Full Text: PDF
GTID: 2168360122489393
Subject: Computer application technology
Abstract/Summary:
Scientists working with large volumes of high-dimensional data, such as global climate patterns or human gene distributions, regularly confront the problem of dimensionality reduction: finding meaningful low-dimensional structures hidden in their high-dimensional observations. Dimensionality reduction is an important research topic in machine learning. Dimensionality reduction algorithms can be classified into two categories: linear methods, such as principal component analysis (PCA) and classical multidimensional scaling (CMDS), and nonlinear methods, such as LLE, Isomap and SIE.

First, several dimensionality reduction algorithms are analyzed. The classical techniques, PCA and CMDS, are simple to implement and are guaranteed to discover the true structure of data lying on or near a linear subspace of the high-dimensional input space. Because they are linear, however, they cannot reveal the true structure of complex nonlinear manifolds. Isomap is a globally optimal algorithm. It builds on CMDS but seeks to preserve the intrinsic geometry of the data, as captured in the geodesic manifold distances between all pairs of data points. LLE, an unsupervised learning algorithm, attempts to discover the global structure of nonlinear manifolds. Its mapping is derived from the symmetries of locally linear reconstructions, and the data are mapped into a single global coordinate system of lower dimensionality. SIE is based on a geometric intuition: a global isometric embedding must also be a local isometric embedding, and, conversely, a proper set of local isometry constraints will entail a global isometric embedding. SIE uses the point-to-point distribution as local constraints and enforces global isometry in a probabilistic sense. To evaluate the reconstruction quality of the nonlinear dimensionality reduction algorithms, this thesis uses both simulated and natural data sets.
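The difference between straight-line distances and the geodesic manifold distances that Isomap seeks to preserve can be illustrated with a small sketch. The example below is not taken from the thesis: the semicircle data set, the neighbourhood size `k=2`, and the helper names `semicircle` and `geodesic_distances` are assumptions chosen for illustration. It only approximates geodesic distances with shortest paths over a nearest-neighbour graph, and it omits the CMDS embedding step.

```python
import math

def semicircle(n):
    """n points evenly spaced along a unit semicircle (a 1-D manifold in 2-D)."""
    return [(math.cos(math.pi * i / (n - 1)), math.sin(math.pi * i / (n - 1)))
            for i in range(n)]

def geodesic_distances(points, k):
    """Approximate geodesic distances: k-nearest-neighbour graph + Floyd-Warshall."""
    n = len(points)
    euclid = [[math.dist(p, q) for q in points] for p in points]
    graph = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        graph[i][i] = 0.0
        # Indices of the k nearest neighbours of point i (position 0 is i itself).
        nearest = sorted(range(n), key=lambda j: euclid[i][j])[1:k + 1]
        for j in nearest:
            graph[i][j] = graph[j][i] = euclid[i][j]  # keep the graph symmetric
    # All-pairs shortest paths over the neighbourhood graph.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                via = graph[i][m] + graph[m][j]
                if via < graph[i][j]:
                    graph[i][j] = via
    return graph

points = semicircle(50)
geo = geodesic_distances(points, k=2)
straight = math.dist(points[0], points[-1])  # Euclidean distance between the two ends
along_curve = geo[0][-1]                     # graph-based estimate of the arc length
print(straight, along_curve)
```

For the two end points of the semicircle, the Euclidean distance is the diameter (2), while the graph-based estimate is close to the arc length (π). A linear method working from Euclidean distances sees the ends as closer than they are along the manifold, which is the gap that geodesic distances are meant to close.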
In this thesis, nonlinear dimensionality reduction (NLDR) methods are also applied to text categorization, and the usability of NLDR-based text categorization is validated. The simulation experiments show that, for noise-free data sets, the reconstruction quality of Isomap is similar to that of SIE. For data sets containing noise, the reconstruction quality of global nonlinear dimensionality reduction algorithms, such as LLE and Isomap, degrades sharply because the reconstructed manifolds are distorted by the noise, whereas SIE maintains good reconstruction quality by shielding a few noise points. For the natural data sets, the reconstruction quality of the NLDR methods differs from one application to another.
Keywords/Search Tags: Machine Learning, Nonlinear Dimensionality Reduction, Local Linear Embedding, Isometric Mapping, Self-organizing Isometric Embedding, Text Categorization