
Research on Manifold Learning for the Problem of Sparse Sample Data

Posted on: 2012-08-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Liang
Full Text: PDF
GTID: 2218330362953132
Subject: Software engineering
Abstract/Summary:
As technology progresses, society has entered a new information age, and complex high-dimensional data arise widely in problems such as image classification and retrieval, text clustering, and gene sequence modeling. Discovering the inherent laws of such data requires suitable tools, and dimensionality reduction is one of the important ways to address this class of problems. Its aim is to uncover the low-dimensional structure hidden in high-dimensional data, and the methods are usually divided into linear and nonlinear dimensionality reduction.

Linear dimensionality reduction maps the sample points of a high-dimensional space into a low-dimensional space through a linear transformation, so as to obtain low-dimensional representations of the intrinsic features of the original data set. It rests on a firm theoretical basis and is easy to implement and apply. In practice, however, the useful features are often not simple linear combinations of the original variables, so nonlinear dimensionality reduction, namely manifold learning, has attracted widespread interest.

Manifold learning methods fall into two categories. Global methods proceed from the data set as a whole: when reducing dimension, neighboring sample points on the manifold are mapped into the low-dimensional space while their original global features are preserved. Local methods only need to consider the relations among neighboring sample points on the manifold; the low-dimensional space corresponding to the manifold need not be convex, and the computational complexity is lower, so local methods apply to a wider range of objects.

Local manifold learning algorithms share the same pattern: find the local properties around each sample point and map them into a low-dimensional space. Clearly, how well the local geometric structure is maintained and recovered determines the quality of a local manifold learning algorithm. When extracting the local manifold information, these algorithms assume that every sufficiently small neighborhood is homeomorphic to a connected open set of a Euclidean space, which means that, when neighborhoods are chosen, the sample points inside them must satisfy this local homeomorphism condition. When the sample points are sparse, local homeomorphism is hard to maintain, so applying local manifold learning algorithms to sparse data sets produces large errors or even fails outright.

Based on the concepts and theory of manifold learning, this thesis focuses on analyzing the failure of manifold learning algorithms on sparse sample sets. The author summarizes the basic framework and procedure of manifold learning, analyzes why the results degrade or even fail when sparse data sets are processed, and illustrates the structure of the neighborhoods when the sample points are dense versus sparse. One effective way to handle sparse sample data is to add interpolation points so that the sample set becomes dense; accordingly, a linear and a nonlinear interpolation method are proposed. The first method takes the centroids of the triangles formed by the sample points within each neighborhood as the interpolation points, as sketched below.
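The abstract does not spell out the triangle-selection rule, so the following Python sketch is only one plausible reading of the linear method: for every sample point, triangles are formed with pairs of its nearest neighbours and their centroids are added as interpolation points. The function name densify_with_triangle_centroids and the neighbourhood size k are illustrative choices, not part of the original work.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def densify_with_triangle_centroids(X, k=5):
    """Add triangle centroids from each point's neighbourhood as interpolation points.

    Assumption: every pair of a point's k nearest neighbours, together with the
    point itself, defines one triangle; the thesis may use a different rule.
    """
    # k nearest neighbours of every sample point (column 0 is the point itself).
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    new_points = []
    for i, neighbours in enumerate(idx):
        for a in range(1, k + 1):
            for b in range(a + 1, k + 1):
                # Centroid of the triangle (point, neighbour a, neighbour b).
                tri = X[[i, neighbours[a], neighbours[b]]]
                new_points.append(tri.mean(axis=0))
    # Append the deduplicated centroids to the original sample set.
    return np.vstack([X, np.unique(np.asarray(new_points), axis=0)])
```

The densified set returned here would then replace the original sparse samples as the input to a local manifold learning algorithm.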
Viewed globally, this method, which uses both the original sample points and the centroids of the triangles formed within each neighborhood as interpolation points, increases the density of the sample set. The sample points within each neighborhood are represented more accurately, and the larger overlap between neighborhoods reduces the error of the global alignment. For this reason the algorithm alleviates the sparsity problem to a certain degree, but it does not reduce the error of the locally linear approximation, and the new interpolation points do not reflect the intrinsic structure and features of the manifold. For these reasons, a nonlinear interpolation method based on MATLAB's four-lattice-point grid and spline interpolation is proposed. According to the intrinsic features of the sparse sample set, and combining the structure and features of the manifold, this method uses grid and spline nonlinear interpolation, built on surface reconstruction, to select a certain number of points as interpolation points. Finally, the dimensionality reduction effects of the linear and nonlinear interpolation methods are compared and analyzed through experiments with manifold learning algorithms. The experiments show that, compared with the linear interpolation method, the nonlinear interpolation method reduces the approximation error through its choice of interpolation points and better maintains and reflects the intrinsic structure and features of the manifold.
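As a rough illustration of the nonlinear route, the sketch below reconstructs a surface from sparse scattered samples and draws new interpolation points from it. It assumes the data lie on a surface z = f(x, y) and substitutes scipy's cubic griddata interpolation for the MATLAB grid/spline interpolation named in the abstract; densify_by_surface_reconstruction, grid_size, and n_new are hypothetical names and parameters, and the closing LLE comparison only mirrors the kind of experiment described, not its actual data or results.

```python
import numpy as np
from scipy.interpolate import griddata
from sklearn.manifold import LocallyLinearEmbedding

def densify_by_surface_reconstruction(X, grid_size=30, n_new=200, seed=0):
    """Reconstruct a surface z = f(x, y) from sparse samples and add grid points from it."""
    rng = np.random.default_rng(seed)
    xy, z = X[:, :2], X[:, 2]
    # Regular grid over the sampled (x, y) range.
    gx, gy = np.meshgrid(
        np.linspace(xy[:, 0].min(), xy[:, 0].max(), grid_size),
        np.linspace(xy[:, 1].min(), xy[:, 1].max(), grid_size),
    )
    # Cubic interpolation stands in for the grid/spline surface reconstruction.
    gz = griddata(xy, z, (gx, gy), method="cubic")
    candidates = np.column_stack([gx.ravel(), gy.ravel(), gz.ravel()])
    candidates = candidates[~np.isnan(candidates[:, 2])]  # drop points outside the convex hull
    picked = rng.choice(len(candidates), size=min(n_new, len(candidates)), replace=False)
    return np.vstack([X, candidates[picked]])

# Toy comparison in the spirit of the experiments described above.
rng = np.random.default_rng(0)
xy = rng.uniform(-1.0, 1.0, size=(80, 2))                  # sparse samples on a toy surface
X_sparse = np.column_stack([xy, np.sin(np.pi * xy[:, 0])])
X_dense = densify_by_surface_reconstruction(X_sparse)
lle = LocallyLinearEmbedding(n_neighbors=8, n_components=2)
Y_sparse = lle.fit_transform(X_sparse)                      # embedding of the sparse set
Y_dense = lle.fit_transform(X_dense)                        # embedding of the densified set
```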
Keywords/Search Tags: Manifold Learning, Dimensionality Reduction, Sparse Sample, Interpolation