
Study On The Classification Of The High Dimensional Data Based On The Density Subspace And Null Space

Posted on: 2018-01-23  Degree: Master  Type: Thesis
Country: China  Candidate: J Y Tian  Full Text: PDF
GTID: 2348330533960997  Subject: Mathematics
Abstract/Summary:
Classification comprises supervised classification and unsupervised classification, the latter also called clustering. Many classification methods use the Euclidean distance to measure the similarity between two samples and then group samples by that distance, for example the clustering technique K-means and the classification method Linear Discriminant Analysis (LDA). In the age of big data, however, the input to these algorithms consists of high-dimensional samples, which severely challenges their performance. Geometrically, such samples usually carry a great deal of redundant information, so the geometric structure of the data becomes complex and exhibits high local curvature; in this case, Euclidean-similarity-based algorithms such as K-means and K-nearest neighbor will mismatch samples. Algebraically, high dimensionality usually means the data matrix has low rank; in other words, such a matrix is singular, so algorithms that need to compute a matrix inverse, such as LDA, become unavailable.

In this thesis we mainly study methods that learn the semantic structure of data in order to obtain a better representation of the samples. In the semantic space, the structure of the data becomes clearer and more discriminative, and each sample carries only the useful information.

For unsupervised classification, we propose LDSC, a method that detects the semantic space of the data by learning its distribution. LDSC transforms the data manifold in the ambient space into the semantic space through a homeomorphism whose existence is guaranteed by Moser's theorem. According to a corollary of Moser's theorem, this homeomorphism can be realized by matching the density of the samples in the ambient space with their density in the semantic space. Once the semantic representation of the samples is identified, their geometric structure becomes locally compact and globally discriminative.

For supervised classification, we improve the classical Linear Discriminant Analysis (LDA) and propose the Sparse Orthogonal Null-Space LDA (SONLDA). This method constrains the discriminative vectors in the semantic space of the data to be sparse and orthogonal by solving an optimization problem with an orthogonality constraint; this problem is in fact the minimization of a reconstruction objective on the Stiefel manifold. Meanwhile, we adopt the idea of null-space LDA to overcome the small-sample-size problem, which makes the within-class scatter matrix singular. Thanks to the sparsity and orthogonality of the discriminative vectors, the samples become more discriminative in the semantic space.

In the experimental section, we use LDSC to cluster an image data set, document data, and data with abstract representations, and use SONLDA to classify the face data set ORL and the object data set COIL20. The results show that both LDSC and SONLDA outperform the competing algorithms. Moreover, the semantic structure of the data in the experiments confirms that the structure is clean and discriminative, as the theoretical analysis predicts.
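The claim that Euclidean similarity degrades in high dimensions can be illustrated numerically: for i.i.d. samples, pairwise distances concentrate as the dimension grows, so the gap between the nearest and farthest pair shrinks relative to the distances themselves, and distance-based matching loses contrast. A minimal sketch (not the thesis code; `relative_contrast` is a name chosen here for illustration):

```python
import numpy as np

def relative_contrast(n_points, dim, rng):
    """(max - min) pairwise Euclidean distance, relative to the min distance.

    Small values mean distances are nearly indistinguishable, so
    nearest-neighbor style matching carries little information.
    """
    X = rng.normal(size=(n_points, dim))
    sq = (X ** 2).sum(axis=1)
    # Squared pairwise distances via the Gram-matrix identity
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 <x, y>.
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    d = np.sqrt(np.maximum(d2[np.triu_indices(n_points, k=1)], 0.0))
    return (d.max() - d.min()) / d.min()

rng = np.random.default_rng(0)
low_dim_contrast = relative_contrast(50, 2, rng)     # large: near/far pairs differ a lot
high_dim_contrast = relative_contrast(50, 2000, rng)  # small: distances concentrate
```

With 50 Gaussian points, the relative contrast in 2 dimensions is orders of magnitude larger than in 2000 dimensions, matching the observation above that Euclidean-similarity methods mismatch high-dimensional samples.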
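The null-space LDA idea that SONLDA builds on can be sketched as follows: when the small sample size makes the within-class scatter matrix S_w singular, project the data onto the null space of S_w, where within-class scatter vanishes exactly, and maximize the between-class scatter there. The sketch below illustrates only this null-space step, not the sparsity or Stiefel-manifold optimization of SONLDA itself; `null_space_lda` is a hypothetical name, not the thesis implementation:

```python
import numpy as np

def null_space_lda(X, y, n_components=1):
    """Null-space LDA sketch: find directions in the null space of the
    within-class scatter S_w that maximize the between-class scatter S_b."""
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)                    # within-class scatter
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)                  # between-class scatter
    # Null space of S_w: eigenvectors whose eigenvalue is numerically zero.
    evals, evecs = np.linalg.eigh(Sw)
    null_basis = evecs[:, evals < 1e-10 * evals.max()]   # d x k basis
    # Maximize between-class scatter inside the null space.
    evals_b, evecs_b = np.linalg.eigh(null_basis.T @ Sb @ null_basis)
    top = evecs_b[:, ::-1][:, :n_components]             # largest eigenvalues first
    return null_basis @ top                              # d x n_components
```

In the small-sample regime (fewer samples than dimensions), S_w is guaranteed to be singular, so its null space is nonempty and the projected within-class scatter is zero while the class means remain separated.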
Keywords/Search Tags:Density subspace, Null space, Homeomorphism, Sparseness, Stiefel Manifold