Nonlinear Dimensionality Reduction Based On Stochastic Initialization

Posted on:2017-01-23

Degree:Master

Type:Thesis

Country:China

Candidate:S C Tian

Full Text:PDF

GTID:2308330503961532

Subject:Computer technology

Abstract/Summary:

Since the beginning of the twenty-first century, with the development of the Internet and the information industry, the amount of data that people need to process has grown geometrically. In many fields, such as electronic commerce, astronomy, and aerospace, the data collected every day can no longer be measured in gigabytes or terabytes. Data now has an important and far-reaching impact on many aspects of social and economic life. Such a huge amount of data brings convenience to people’s lives, but it also creates difficulties for enterprises and researchers. Abundant information lets people weigh more options before making a choice, but extracting the information that is meaningful to an enterprise from such large amounts of data has become a problem that enterprises must face. How to analyze these larger, higher-dimensional data sets effectively has likewise puzzled researchers.
The fundamental solution to these problems lies in representing high-dimensional data with low-dimensional data, because lower-dimensional data is simpler to process. Dimensionality reduction technology addresses exactly this kind of problem, and its emergence has greatly helped enterprises and researchers: by analyzing a low-dimensional representation of high-dimensional data, we can make better use of the original data. Dimensionality reduction has progressed from linear to nonlinear methods. Early on, linear techniques were widely used, and many linear dimensionality reduction algorithms were proposed, for example principal component analysis, linear discriminant analysis, and projection pursuit.
Linear dimensionality reduction algorithms usually produce a good low-dimensional representation when applied to high-dimensional data with linear structure. However, as the volume and dimension of data increase, the structure of the data changes: it is no longer a simple linear relationship but a more complex nonlinear structure. When linear dimensionality reduction algorithms are applied to data with nonlinear structure, the low-dimensional results are often unsatisfactory. To improve this situation, many nonlinear dimensionality reduction algorithms have been proposed in recent years, such as multidimensional scaling, ISOMAP, locally linear embedding, stochastic neighbor embedding, and t-distributed stochastic neighbor embedding. Compared with linear algorithms, nonlinear algorithms have clear advantages on data with nonlinear structure, and the low-dimensional results they produce are usually better. This thesis mainly discusses nonlinear dimensionality reduction algorithms. After discussing the principles of several linear dimensionality reduction algorithms, we explain in detail the principles and implementation of some nonlinear ones. By comparing the principles of different nonlinear dimensionality reduction algorithms, we present a new nonlinear dimensionality reduction algorithm based on stochastic initialization, called the nearest neighbor stochastic embedding algorithm (stochastic nonlinear dimensionality reduction based on nearest neighbors, NNSE).
We first compare the low-dimensional data produced by NNSE with that of three other algorithms (principal component analysis, locally linear embedding, and t-distributed stochastic neighbor embedding) by visualization. For algorithms that are difficult to distinguish from the visual results, we present a comparison based on a quantitative indicator. So far, there is no commonly accepted quantitative indicator for evaluating the low-dimensional representations produced by different dimensionality reduction algorithms. Based on the argument that if any two sample points are adjacent in the high-dimensional space, that relationship should be maintained in the low-dimensional space after dimensionality reduction, we put forward a reasonable quantitative indicator to evaluate the four dimensionality reduction algorithms.
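
The abstract does not give the exact form of the proposed indicator. As an illustration of the underlying argument only, the sketch below computes one common neighborhood-preservation score: the average fraction of each point's k nearest neighbors in the high-dimensional space that are still among its k nearest neighbors in the low-dimensional space. The function names `knn_indices` and `neighbor_preservation`, the use of Euclidean distance, and the parameter `k` are assumptions made for this example and are not taken from the thesis.

```python
import numpy as np

def knn_indices(points, k):
    """Return, for each row of `points`, the indices of its k nearest other rows."""
    points = np.asarray(points, dtype=float)
    # Pairwise squared Euclidean distances, shape (n, n).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)
    # A point is not counted as its own neighbor.
    np.fill_diagonal(dist, np.inf)
    # Stable sort so that ties are broken by index, deterministically.
    return np.argsort(dist, axis=1, kind="stable")[:, :k]

def neighbor_preservation(high, low, k):
    """Mean fraction of each point's k nearest neighbors in `high`
    that are also among its k nearest neighbors in `low` (hypothetical score)."""
    nn_high = knn_indices(high, k)
    nn_low = knn_indices(low, k)
    shared = [len(set(a) & set(b)) for a, b in zip(nn_high, nn_low)]
    return float(np.mean(shared)) / k

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    high = rng.normal(size=(50, 10))      # 50 samples in 10 dimensions
    low = high[:, :2]                     # a crude 2-D "embedding" for demonstration
    print(neighbor_preservation(high, low, k=5))
```

A score of 1 means every neighborhood is preserved exactly, and a score near 0 means almost none are. Under these assumptions, such a score could be computed for each algorithm's output on the same data set and compared directly.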

Keywords/Search Tags:

High dimensional data, dimensionality reduction algorithm, nearest neighbor, evaluation method

Related items

1	Research On K-nearest Neighbor Search Algorithm In High Dimensional Space
2	Research Of Method And Application On Dimensionality Reduction Of High Dimensional Data Based On Multivariate Chart
3	Study And Application Of Several Improved Methods Of Nonlinear Dimension Reduction For High Dimensional Data
4	Efficient computation of k-nearest neighbor graphs for large high-dimensional data sets on gpu clusters
5	Research On Dimensionality Reduction And Quantification Methods Of Approximate Nearest Neighbor Query For Streaming Data
6	Research On The High-Efficient K-Nearest Neighbor Algorithm And Its Parallelization Of MPI
7	Similarity Search On Large-scale High-dimensional Data
8	A Research Of Key Technology Of Dimensionality Reduction Of High Dimensional Data
9	Hubness-based Measure For High-dimensional And Imbalanced Data Classification
10	Research Of Local Sensitive Hash Index Based On Nearest Neighbor Graph