Research On Nonlinear Dimensionality Reduction Methods In Manifold Learning And Their Applications For Tobacco Data

Posted on: 2012-12-18
Degree: Master
Type: Thesis
Country: China
Candidate: L Z Xu
Full Text: PDF
GTID: 2218330338464819
Subject: Computer software and theory
Abstract/Summary:
One purpose of research in machine learning and data mining is to uncover the regularities hidden in high-dimensional data through analysis and processing. Because the real world is complex, more and more data sets exhibit very high dimensionality, large volume, nonlinear structure, and rapid growth, which poses a severe test for traditional machine learning and data mining methods. Manifold learning, a newer approach to data analysis in machine learning, is better able to find the intrinsic geometric structure of high-dimensional data, to extract its internal regularities and feature information, and, combined with visualization techniques, to expose the internal properties of high-dimensional data in a low-dimensional space. Manifold learning is now widely used in many fields with good results.

This thesis introduces the main linear and nonlinear dimensionality reduction methods, covering both theory and application in detail; the nonlinear methods are motivated by an analysis of the limitations of linear methods on high-dimensional data. The study focuses on two classic nonlinear methods, the LLE algorithm and the ISOMAP algorithm, which are compared, summarized, and analyzed on several tobacco data sets and artificial data sets. This analysis shows that how to choose the samples in a neighborhood properly is a problem common to both algorithms. To address it, the major research work is summarized as follows:

1. A kernel transformation is introduced to handle the nonlinear distribution of data in high-dimensional space. It maps the original data space into a linear or approximately linear space in which regularities can be mined and analyzed. The main purpose is to deal effectively with the sparsity and nonlinearity of some local blocks of the tobacco material quality data samples, which supports the algorithm proposed next and the verifiability of the experimental results.
2. The choice of the number of neighbors K has an important influence on the performance of the LLE algorithm. The premise for obtaining an adaptive number of nearest neighbors K is to determine whether the distribution of the high-dimensional data conforms to some known distribution. To this end, the characteristics and concepts of the Gaussian (normal) distribution are briefly introduced, and several quality index attributes (random variables) of the tobacco quality data are studied. The results show that the tobacco quality data exhibit normal characteristics. Following this normality analysis of the tobacco material quality data, a method for obtaining the adaptive neighbor number K under a Gaussian distribution is proposed.
3. The KANNLLE algorithm is proposed. It builds on the kernel solution to the sparsity and nonlinearity of local areas of the tobacco material quality data and on the conformity of these data to the Gaussian distribution. A detailed analysis and design process of the algorithm is also given.
4. Experiments and analysis are carried out using the KANNLLE algorithm in combination with a clustering algorithm. The clustering results of KANNLLE and LLE are compared using two-dimensional visualization; visually, the KANNLLE algorithm is more effective than the LLE algorithm. The superiority of the algorithm is also confirmed by numerical statistics of the clustering results, which suggests a new way for manifold learning algorithms to be combined with techniques from other fields.
5. The thesis summarizes the major innovative work and research findings and indicates three directions for future research: practical applications of the algorithm, combination with clustering algorithms other than K-means, and combining the idea of the algorithm with other manifold learning algorithms such as ISOMAP.
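
The abstract names neighbor selection as the problem shared by LLE and ISOMAP and brings in a kernel transformation in item 1, but it does not say which kernel is used or how KANNLLE picks its neighbors. The following is therefore only a minimal sketch of one generic idea: measuring distances in a kernel feature space and taking the K nearest samples there. The Gaussian (RBF) kernel, the `gamma` value, the function names, and the toy samples are all assumptions made for illustration and are not taken from the thesis.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def kernel_distance(x, y, kernel=rbf_kernel):
    """Distance between the images of x and y in the kernel feature space.

    Uses the identity ||phi(x) - phi(y)||^2 = k(x, x) - 2 k(x, y) + k(y, y),
    so the feature map phi never has to be computed explicitly.
    """
    sq = kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)
    return math.sqrt(max(sq, 0.0))  # clamp tiny negative rounding errors


def k_nearest_neighbors(points, index, k, kernel=rbf_kernel):
    """Indices of the k samples closest to points[index] in feature space."""
    others = [j for j in range(len(points)) if j != index]
    others.sort(key=lambda j: kernel_distance(points[index], points[j], kernel))
    return others[:k]


# Hypothetical toy samples (not tobacco data): two small groups of points.
samples = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (1.0, 1.0), (1.1, 0.9)]
print(k_nearest_neighbors(samples, 0, k=2))  # -> [1, 2]
```

With the RBF kernel the feature-space distance grows monotonically with the Euclidean distance, so the neighbor ranking matches the Euclidean one; passing a different kernel through the `kernel` argument can change the ranking. Here K is fixed by hand, whereas the thesis proposes choosing it adaptively.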
Keywords/Search Tags: Manifold Learning, Nonlinear Dimensionality Reduction, Kernel Transformation, Adaptive Neighbor, Tobacco Data