Font Size: a A A

Research On Fast Unsupervised Dimensionality Reduction Algorithm Based On Graph

Posted on:2022-07-26Degree:MasterType:Thesis
Country:ChinaCandidate:L H TaoFull Text:PDF
GTID:2518306545953569Subject:Control Science and Engineering
Abstract/Summary:PDF Full Text Request
With the continuous popularization of"digitalization",a large amount of information such as text,web pages,and images in real-word needs to be stored.Many Internet and high-tech companies are equipped with dedicated servers to store these valuable data.Large-scale and high-dimensional data usually contains noise and feature redundancy.The storage of such data will bring high storage costs,and is undesirable and unnecessary.Machine learning algorithms are used to analyze and mine valuable information in the data.Most of the machine learning algorithms used on small-scale data sets can no longer handle such large-scale data well,because they will become very time-consuming when processing these data,and sometimes even cannot run.In addition,when dealing with high-dimensional data,there will be some difficulties such as sample sparsity and distance calculation.Such problems are serious problems faced by all machine learning methods,which is called"Curse of Dimensionality".In general,there are two ways to eliminate this problem.One is that when the dimensionality increases linearly,the number of samples required for learning increases exponentially.However,this method is obviously not advisable,because the burden of data storage is increasing.The other is dimensionality reduction,which can reduce the dimensionality of the data when the loss of data information is acceptable or even without loss,and also alleviate the burden of data storage,so it is an effective approach.Graph embedding is a widely used method for dimensionality reduction due to its computational effectiveness.The graph embedding methods involves a two-stage process:the first step is to construct a similarity or dissimilarity graph between samples,and the computational complexity of the graph construction is at least O(n~2d),where n and d represent the number of the samples and the sample dimensions,respectively;The second step is graph embedding analysis.Since the computational complexity of the graph construction of traditional graph embedding methods is proportional to the square of the number of samples,they will become very time-consuming when dealing with large-scale data.The bipartite graph between the samples and anchors(representative points)is constructed to reduce the computational complexity of graph construction.And its computational complexity is O(nmd+nmlog(m)),which is proportional to the number of the samples,where m is the number of anchors.In addition,the initial cluster centers of k-means++method are used as the anchors so that the anchors can cover(represent)the entire data set in a better way.The proposed method is called Unsupervised Bipartite Graph Embedding(UBGE).On this basis,an alternate optimization method is introduced to optimize the bipartite graph.The proposed method is called Unsupervised Optimal Bipartite Graph Embedding(UOBGE).In addition,a simple but effective strategy was adopted to accelerate the convergence of UOBGE for large-data sets.The convergence of UOBGE is theoretically provided,and the experimental results show that the performance of UOBGE is better than that of UBGE,which shows that the bipartite graph learned by UOBGE is optimized in the iterative optimization process.Extensive experiments on several benchmark data sets(Coil20,MSRA25,Palm,Protein,Connect-4,and SensIT)demonstrate the effectiveness and efficiency of UBGE and UOBGE.
Keywords/Search Tags:Dimensionality reduction, Graph embedding framework, Bipartite graph, Anchors, K-means++, Orthogonal constraint
PDF Full Text Request
Related items