Research On Nonlinear Isomap Dimension Reduction Method For High-Dimensional Datasets

Posted on: 2022-06-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Mahwish Yousaf
GTID: 1488306323482064
Subject: Computer software and theory
Abstract/Summary:
Manifold learning is a family of non-linear dimensionality reduction methods that find compact low-dimensional representations of high-dimensional observation data and explore the inherent laws and intrinsic structure of the data. It has become a hot topic in data mining, pattern recognition, machine learning, and related research areas. Manifold learning methods yield impressive results on some artificial and real-world benchmark datasets owing to their non-linear nature, geometric intuition, high accuracy, and computational feasibility. The basic principle of manifold learning is to preserve the topological structure of the data when mapping it from the high-dimensional space to the low-dimensional space. Manifold learning methods include Classical Isomap (C-Isomap), Local Linear Embedding (LLE), Local Tangent Space Alignment (LTSA), Multidimensional Scaling (MDS), and others. However, methods such as C-Isomap still suffer from problems including topological instability, a costly shortest-path computation, and noise sensitivity.

This thesis systematically studies and improves the C-Isomap algorithm to overcome these problems. First, we introduce the basic concepts of manifold learning methods and compare them in detail. Second, we discuss linear and non-linear methods in detail. Finally, we propose four new methods based on C-Isomap, compare them with C-Isomap in detail, and present experimental results that demonstrate their effectiveness. The main work of this dissertation can be summarized as follows:

1. C-Isomap faces two main problems: it may create incorrect links in the neighbourhood graph G, and it has a high computational cost. To overcome these problems, we introduce a new method, FastIsomap. Its primary purpose is to increase the accuracy of the graph by using two state-of-the-art algorithms: a randomized division tree (KD-tree) and NN-descent. The basic idea of FastIsomap is to construct an accurate approximate KNN graph from millions of data points with hundreds of dimensions and then project the graph into a low-dimensional space. Experiments were performed on six large-scale, high-dimensional datasets, including social networks: Facebook, Twitter, LiveJournal, YouTube, Orkut, and SIFT1M. We compared FastIsomap with the existing C-Isomap method to verify its efficiency and its accuracy on large-scale, high-dimensional datasets.

2. To address the lack of topological stability of the nearest-neighbourhood graph G and the shortest-path problem of C-Isomap, we design a novel method, FastIsomapVis. When C-Isomap is applied to real-world datasets, computing the shortest paths between all pairs of data points on the nearest-neighbourhood graph G with the Dijkstra algorithm is a very time-consuming step. FastIsomapVis uses a hierarchical divide, conquer, and combine approach built on two algorithms: a randomized division tree (KD-tree) and Dijkstra Buckets Double (DKD). This makes it easy to construct an accurate K-nearest-neighbourhood graph G and to map high-dimensional data points into a low-dimensional space. We compare the proposed method with C-Isomap to verify its effectiveness and the reliability of its results on high-dimensional datasets. Experiments were performed on nine large-scale, high-dimensional datasets, including social networks: Facebook, Twitter, LiveJournal, YouTube, Orkut, SIFT1M, Amazon, Perfume, and 20Newsgroups. The results show that the proposed method is much faster than C-Isomap and reduces the time complexity.

3. We propose the Noise Removal Isomap (NR-Isomap) method for the noise sensitivity problem in topological instability. The Topological Instability Problem (TIP) gives rise to noise sensitivity and short-circuit edges. In the severe case, noisy data cause C-Isomap to produce short-circuit edges that directly connect data points on two different submanifolds; this is known as the TIP. Thus, if the input data points of C-Isomap are corrupted with noise, C-Isomap faces the TIP. The TIP is aggravated by a large number of nearest neighbours and depends on the neighbourhood size. We use the Local Tangent Space Alignment (LTSA) algorithm to address the noise and short-circuit edge problems in topological instability. LTSA provides the best results for the short-circuit edge problem and can easily handle large neighbourhood sizes. Our experimental results show that NR-Isomap reduces the noise in datasets and provides effective, noise-free results, and that it is considerably more capable than the existing C-Isomap algorithm.

4. We introduce a new denoising approach called Noise Removal Isomap with Classification (NRIC). The core problem of C-Isomap is its sensitivity to noise. NRIC uses the LTSA algorithm together with classification techniques to remove noise and optimize the neighbourhood structure of C-Isomap. We use four classification techniques: Support Vector Machine (SVM), K Nearest Neighbour (KNN), Naive Bayes (NB), and Random Forest (RF). The key purpose of NRIC is to increase effectiveness, decrease noise, and improve the quality of the graph. Experiments on five real-world datasets show that NRIC outperforms C-Isomap and overcomes its noise problem. LTSA combined with classification techniques achieves high accuracy, mean precision, mean recall, and area under the ROC curve (AUC) on the high-dimensional datasets and optimizes the graphs. NRIC is therefore a promising method for reducing noise and generating an effective graph.
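
As a rough illustration of the neighbourhood-graph and shortest-path steps of classical Isomap discussed in the abstract, and of how short-circuit edges arise, the sketch below builds a brute-force KNN graph on a toy semicircle and measures graph (geodesic) distances with a plain heap-based Dijkstra search. It is not the dissertation's FastIsomap, FastIsomapVis, or DKD implementation; the function names `knn_graph` and `dijkstra`, the toy data, and the choices of k are assumptions made only for this example.

```python
import heapq
import math


def knn_graph(points, k):
    """Brute-force k-nearest-neighbour graph as an undirected adjacency dict.

    Illustrative helper (hypothetical name); costs O(n^2 log n), which is the
    kind of expense approximate KNN methods aim to avoid.
    """
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        # Distances from point i to every other point, nearest first.
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d  # symmetrise so the graph is undirected
    return graph


def dijkstra(graph, source):
    """Shortest-path distances from source over the weighted graph."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter path was already found
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Toy manifold (assumed for illustration): 50 points on a unit semicircle.
n = 50
points = [
    (math.cos(math.pi * i / (n - 1)), math.sin(math.pi * i / (n - 1)))
    for i in range(n)
]

# Straight-line distance between the two ends of the arc.
euclidean = math.dist(points[0], points[-1])

# Small k: paths follow the arc, so the graph distance approximates
# the arc length (close to pi for a unit semicircle).
geodesic_small_k = dijkstra(knn_graph(points, 2), 0)[n - 1]

# Very large k: a direct edge joins the two ends (a short-circuit edge),
# so the graph distance collapses to the straight-line distance.
geodesic_large_k = dijkstra(knn_graph(points, n - 1), 0)[n - 1]

print(f"Euclidean distance:      {euclidean:.3f}")
print(f"Graph distance, k=2:     {geodesic_small_k:.3f}")
print(f"Graph distance, k={n - 1}:    {geodesic_large_k:.3f}")
```

In this toy setting a small neighbourhood keeps the graph distance near the arc length, while an overly large neighbourhood introduces an edge across the gap and the geodesic estimate falls back to the Euclidean distance. Running one Dijkstra search per source point over all points is also what makes the all-pairs shortest-path step expensive on large datasets.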
Keywords/Search Tags:Classical Isomap, Dimension Reduction, FastIsomap, FastIsomapVis, Manifold Learning, Noise-Removal Isomap, Noise Removal Isomap with Classification