Font Size: a A A

Nonlinear Dimensionality Reduction Based On ISOMAP And Its Applications

Posted on:2018-05-05Degree:DoctorType:Dissertation
Country:ChinaCandidate:H ShiFull Text:PDF
GTID:1318330512982682Subject:Control Science and Engineering
Abstract/Summary:PDF Full Text Request
With the arrival of Big Data era,massive data brings us opportunities as well as more challenges.Massive data usually has a high dimensionality,which makes them difficult to analyze using traditional data analysis method.This is known as the "curse of dimensionality".In order to deal with the high-dimensional data adequately,dimen-sionality reduction has been a research hot spot in machine learning,artificial intelli-gence,data mining and so on.Dimensionality reduction is the transformation of high-dimensional data into a meaningful representation of reduced dimensionality.It can remove redundant information from the original data and alleviate the "curse of dimen-sionality" problem.In the last few decades,many dimensionality reduction algorithms have been proposed.Principal component analysis(PCA)and multidimensional scal-ing(MDS)are traditionally linear methods.However,these linear methods fail to deal with complex nonlinear data.So in recent years,nonlinear dimensionality reduction method has received more and more attention.Manifold learning,as one of the most popular nonlinear dimensionality methods,has been successfully applied in many dif-ferent fields.Currently,a number of manifold learning methods are available,such as isometric feature mapping(Isomap),local linear embedding(LLE),local tangent space alignment(LTSA).However,there are still some unresolved problems in the classical manifold learning methods.Under this background,we have conducted some deep theoretical and practical researches regarding the manifold learning to address the limitations of the existing methods.The main work and contributions of this dissertation are as follows.1.Regarding the landmark selection problem in L-ISOMAP,two landmark selec-tion methods are put forward in this thesis.The first landmark selection method is based on the set cover problem of the neighbourhood graph.It optimizes the original neigh-bourhood graph by a greedy heuristic process and obtains the landmark candidate points.After that,the method determines the landmark points by removing the candidate points that are contained in the neighborhood of other candidate points.The second landmark selection method is based on the coloring problem of the neighbourhood graph.It uses the well-known Welsh-Powell approximation algorithm to select an independent set of the neighbourhood graph as the landmark points.Experiments on different artificial and real-world data indicate L-ISOMAP can faithfully recover the low-dimensional struc-ture hidden behind the original data sets by adopting the landmark points selected by either method.2.Regarding the instability problem in L-ISOMAP caused by "short-circuit" edges,two "short-circuit" elimination methods are put forward in this thesis.The first method uses "edge flow" to judge whether an edge is a "short-circuit" edge.In order to calcu-late the "edge flow" efficiently,the method selects some of the shortest paths as "test-ing path" and estimates the number of times an edge appears in the "testing path".The method shrinks the neighborhood size of the endpoints of the edges that have extraor-dinarily high "edge flow" to eliminate the "short-circuit" edge.The second method calculates data local density and edge areal density respectively using kernel density estimation.When the input data are not sampled uniformly,the method shrinks the neighbourhood size of the data points with extremely low local density values.This way the "short-circuit" edges can be removed.When the data sets are sampled uni-formly,the edge which has extraordinarily low edge density is claimed as the "short-circuit" edge and our method removes such edges from the neighbourhood graph.The performance of the proposed methods is validated by several numerical experiments.3.Analyze the low dimensional structure of the high-dimensional traffic matrix data.The Internet traffic matrix data usually has a high dimensionality in the observa-tion space.We apply the improved L-ISOMAP algorithm to reduce the dimensionality of the matrix data.It is found that there indeed exists a low dimensional structure be-hind the high-dimensional Internet traffic matrix data and different traffic matrix data presents different structure in the low-dimensional space.This helps us understand the characteristics of the Internet traffic from a whole network perspective.
Keywords/Search Tags:Dimensionality Reduction, Manifold Learning, L-ISOMAP, robustness
PDF Full Text Request
Related items