
Dimension Reduction Study Based On Sparse Structure And Deep Learning

Posted on: 2021-10-07  Degree: Doctor  Type: Dissertation
Country: China  Candidate: Y Meng  Full Text: PDF
GTID: 1488306050464354  Subject: Intelligent information processing
Abstract/Summary:
With the arrival of the big-data era, the volume and dimensionality of data keep growing, and efficient dimension reduction methods are urgently needed to extract effective feature information from massive high-dimensional data so that it can be processed quickly. However, most commonly used dimension reduction methods do not make full use of the structure information of the original high-dimensional data or of the small amount of available label information, and cannot fully mine the intrinsic structural features of the data. As a result, the low-dimensional data representation is not sufficiently discriminative, and the clustering results leave room for improvement. In addition, the mapping between the new low-dimensional representation and the original high-dimensional data is very complex, so a single-layer clustering method cannot fully express it. In view of these problems, this dissertation carries out the following research and obtains these results:

(1) A feature-selection-based dual-graph sparse non-negative matrix factorization (DSNMF) is proposed. The algorithm finds an appropriate low-dimensional representation of the data by NMF and then selects more discriminative features to further reduce the dimension of the low-dimensional space. DSNMF combines the dual-graph model with non-negative matrix factorization, which not only simultaneously preserves the geometric structures of both the data space and the feature space, but also lets the two non-negative matrix factors update interactively, giving full play to the dual-graph model. Furthermore, a new feature-selection-based local discriminative clustering method, DSNMF for local discriminative clustering (DSNMF-LDC), is proposed; it has stronger discriminative ability and a better clustering effect than other clustering algorithms. Experimental results show that DSNMF-LDC has clear advantages in clustering accuracy (ACC) and normalized mutual information (NMI) over 8 feature selection algorithms and 7 clustering algorithms.

(2) A dual-graph regularized non-negative matrix factorization with sparse and orthogonal constraints (SODNMF) is proposed. Semi-supervised non-negative matrix factorization is not only an efficient technique for dimension reduction of high-dimensional data, but can also use a fraction of the label information to effectively learn local information of the targets (such as texts and faces). SODNMF introduces the dual-graph model into semi-supervised non-negative matrix factorization and takes the manifold structures of both the data space and the feature space into account. In addition, a sparse constraint is used in SODNMF, which simplifies the calculation and accelerates processing. Most importantly, SODNMF uses bi-orthogonal constraints, which avoid the non-correspondence between images and basis vectors. It therefore effectively enhances the discrimination and exclusivity of clustering and improves clustering performance. Empirical experiments demonstrate encouraging results for SODNMF compared with four state-of-the-art algorithms on three real datasets.

(3) A dual-graph sparse deep non-negative matrix factorization (DSDNMF) is proposed. Non-negative matrix factorization can learn a low-dimensional data representation from the original high-dimensional data space. However, the mapping between the new low-dimensional representation and the original high-dimensional data is very complex, so a single-layer method cannot express it well. DSDNMF learns a hidden-layer representation for clustering according to the unknown and varied attributes of the original datasets. Moreover, to fully mine the local geometric information of both the data space and the feature space, DSDNMF adopts multi-layer dual-graph manifold learning, which can handle not only high-dimensional datasets but also datasets with large numbers of samples. A multi-layer sparse representation is introduced, which simplifies computation, accelerates processing, and improves performance. Experimental results show that DSDNMF outperforms six other state-of-the-art algorithms on four different datasets.

(4) A semi-supervised dual-hypergraph deep non-negative matrix factorization with bi-orthogonal constraints (SDDNMFB) is proposed. Semi-supervised non-negative matrix factorization not only retains the advantage of NMF of effectively learning local information of the target, but can also use a fraction of the label information to improve dimension reduction for high-dimensional data. Based on the unknown and varied attributes of the original datasets, SDDNMFB learns a representation for clustering from the hidden layers of a deep framework. SDDNMFB imposes bi-orthogonal constraints on the two factors produced by each layer's dimension reduction, so the solution is unique and the clusters are easier to interpret. Moreover, SDDNMFB adopts multi-layer dual-hypergraph manifold learning, which mines the high-order relations among data points in both the data space and the feature space and fully retains the intrinsic geometric structure of the original data. Empirical experiments demonstrate encouraging results for SDDNMFB compared with six state-of-the-art algorithms on four different datasets.

(5) A graph convolutional neural network with geometric and discrimination information (GDGCN) is proposed. GDGCN integrates traditional machine-learning ideas into convolutional networks to improve feature extraction. The graph convolution network (GCN) considers the structure information of the original data, but it performs graph convolution on a single fixed feature graph and ignores the differences between the local structures of different samples. To exploit these differences and make full use of the structure information of the original data, GDGCN constructs different feature graphs for different training batches. Moreover, discriminant regularization is introduced into GDGCN to effectively utilize the discriminant information of the original data, giving it good discriminative ability and robustness. Experimental results show that GDGCN performs feature extraction tasks very well and is superior to existing methods in classification, with high accuracy and F1-score.
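The dissertation's exact DSNMF and SODNMF objectives are not reproduced here; as a rough illustration of their shared building block — NMF with a graph (manifold) regularizer on the sample factor, solved by multiplicative updates — a minimal NumPy sketch follows. The function names, the single-graph Laplacian term, and all parameter values are illustrative assumptions, not the dissertation's formulation (which additionally uses a dual graph plus sparse and orthogonal constraints).

```python
import numpy as np

def knn_graph(points, k=5):
    """Symmetric k-nearest-neighbour affinity matrix (binary weights)."""
    n = points.shape[0]
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    W = np.zeros((n, n))
    for i in range(n):
        W[i, np.argsort(dist[i])[1:k + 1]] = 1.0  # skip self at index 0
    return np.maximum(W, W.T)  # symmetrise

def graph_nmf(X, r, W, lam=0.1, iters=200, eps=1e-10, seed=0):
    """Graph-regularized NMF: min ||X - U V^T||_F^2 + lam * tr(V^T L V),
    with L = D - W, via multiplicative updates (keeps U, V non-negative)."""
    rng = np.random.default_rng(seed)
    n_feat, n_samp = X.shape
    U = rng.random((n_feat, r)) + eps
    V = rng.random((n_samp, r)) + eps
    D = np.diag(W.sum(axis=1))
    for _ in range(iters):
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * W @ V) / (V @ (U.T @ U) + lam * D @ V + eps)
    return U, V
```

In this sketch the columns of `X` are samples, so the affinity graph is built over `X.T`; clustering would then run k-means on the rows of `V`.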
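The multi-layer decompositions behind DSDNMF and SDDNMFB rest on the deep-NMF idea of factorizing the data matrix layer by layer, X ≈ W1 W2 … WL HL, so that the last hidden representation HL feeds the clustering step. The sketch below shows only that layer-wise pretraining skeleton with plain Lee–Seung updates; the dissertation's dual-graph/dual-hypergraph regularizers, sparse terms, and bi-orthogonal constraints are omitted, and all names are assumptions.

```python
import numpy as np

def nmf(X, r, iters=200, eps=1e-10, seed=0):
    """Plain NMF X ≈ W H via Lee–Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(iters):
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        H *= (W.T @ X) / (W.T @ W @ H + eps)
    return W, H

def deep_nmf_pretrain(X, layer_sizes):
    """Layer-wise pretraining of a deep NMF X ≈ W1 W2 ... WL HL:
    factor X, then recursively factor each hidden representation H."""
    Ws, H = [], X
    for r in layer_sizes:
        W, H = nmf(H, r)
        Ws.append(W)
    return Ws, H
```

For example, `deep_nmf_pretrain(X, [16, 8, 4])` yields a 4-dimensional final representation; full deep-NMF methods typically follow this with joint fine-tuning of all layers.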
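For the GDGCN idea of building a fresh feature graph per training batch and propagating over it, a minimal sketch of the two pieces — a per-batch k-NN feature graph and one symmetrically normalized graph-convolution layer in the standard GCN style — is given below. The discriminant-regularization term and the actual GDGCN architecture are not shown; every name and shape here is an illustrative assumption.

```python
import numpy as np

def batch_feature_graph(X, k=3):
    """k-NN adjacency built from the current batch's features
    (GDGCN rebuilds such a graph for each training batch)."""
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    A = np.zeros((n, n))
    for i in range(n):
        A[i, np.argsort(dist[i])[1:k + 1]] = 1.0  # skip self
    return np.maximum(A, A.T)

def gcn_layer(X, A, Theta):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 X Theta)."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ X @ Theta, 0.0)
```

Because the adjacency depends on the batch's own features, two batches with different local structures propagate over different graphs, which is the point the abstract makes against a single fixed feature graph.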
Keywords/Search Tags: structure information, discrimination information, sparse constraint, orthogonal constraint, dimension reduction, deep learning, spectral graph theory, graph convolution