
Research On The Key Technology Of Dimensionality Reduction

Posted on: 2022-06-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S L Liao
Full Text: PDF
GTID: 1488306602993759
Subject: Communication and Information System
Abstract/Summary:
With the rapid development of the "5G"-based internet and of information collection technology, the dimensionality of data has grown rapidly, giving rise to the "curse of dimensionality". Dimensionality reduction is an effective tool for high-dimensional data with small sample sizes, aiming to extract valuable information (features) from high-dimensional data that contains a great deal of redundancy. In addition, data collection and transmission are easily affected by the external environment, so the collected data inevitably contain outliers (noise, occlusion, etc.), which cause the statistical distribution of the data to deviate from its true distribution and greatly increase the difficulty of classification. To address these problems, this dissertation focuses on the robustness of models to noise and redundant features, and constructs robust feature extraction and feature selection models by exploiting discriminant information, nonlinear features, local geometric information (submanifold structure), and the feature distribution of the data. The research involves robust distance metrics, the Euler transform, the "class separation" problem, graph embedding optimization, and feature distribution constraints. The specific contents are as follows:

The presence of noise greatly weakens the separability of data, and the squared l2-norm is highly sensitive to noise, which degrades performance. To address this, a robust discriminant feature extraction method combining the Euler transform and the l2,1-norm is proposed. First, the Euler representation of the data is obtained via the Euler transform, so that latent nonlinear features can be extracted effectively and the differences between samples are enlarged. The model then uses the l2,1-norm (which is rotation invariant) to measure distances between samples, which effectively reduces the influence of noise relative to the squared l2-norm and thereby improves the robustness of the model. Finally, a fast, non-greedy method is proposed to solve the model, and the one-dimensional model is extended to a corresponding two-dimensional model. Experimental results show that the proposed method significantly improves classification performance.

Some existing graph-based feature selection methods cannot accurately describe the true neighbor relationships, because the graph is predefined in the original space, which usually contains considerable noise and redundant information. We therefore propose an adaptive graph embedding discriminant feature selection method, in which graph learning is embedded into the optimization of the objective function, so that the model adaptively learns the nearest-neighbor relationships (graph structure) of the data from the optimal subspace. The sparse structure of the graph is enforced by row-wise l0-norm regularization, so as to effectively mine the low-dimensional manifold structure embedded in high-dimensional data. Moreover, the l2,1-norm constrains the projection matrix to be row-sparse for feature selection, and an entropy regularization constrains the similarity matrix to avoid trivial solutions. Finally, the effectiveness of the proposed method is demonstrated on six data sets, including face, voice, and biological data.

Existing distance-based feature selection methods mostly maximize the overall separation of the data in terms of average distance, so they cannot guarantee the separability of every pair of classes, which degrades model performance. Therefore, a discriminant feature selection model based on the worst classification case is proposed: it maximizes the separability between the two hardest-to-distinguish classes, so that all of the data remain as separable as possible. However, this approach scores all features independently, so the selected feature subset easily contains redundant features. To reduce this redundancy, we improve the model from the perspectives of feature uncorrelation and uncorrelation of feature discriminant ability, and propose an uncorrelated worst-case discriminant feature selection model. We also reconstruct the model from the perspective of subset evaluation and propose a worst-case feature selection method based on optimal subsets. Finally, experiments are carried out on eight data sets, including face, handwritten digit, and biological data, with comparisons against existing methods.

Existing feature selection models based on the l1,2-norm attend only to the sparsity of the transformation matrix, ignoring the distribution sparsity of the dimension-reduced data along the sample dimension within each specified class. A feature selection method based on class-specific distributions is therefore proposed. It uses the l1,2-norm to constrain the competition (sparsity) among the projected data within each specified class, so that features of samples in the same class maintain a specific distribution while features of samples from different classes differ markedly. The model is built on the deep learning framework PyTorch and optimized by means of its automatic differentiation. Experimental results show that the proposed method achieves good classification performance and fast convergence.
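The robustness argument behind the first contribution can be illustrated concretely. The sketch below is not the dissertation's exact formulation: the Euler mapping z = (1/√2)·e^(iαπx) and the choice α = 1.9 are assumptions borrowed from the general Euler-representation literature, but it shows the two ingredients named above: features mapped onto the complex unit circle, and the l2,1-norm, which penalizes an outlying row linearly where the squared l2 (Frobenius) norm penalizes it quadratically.

```python
# Illustrative sketch only; the mapping and alpha = 1.9 are assumptions,
# not the dissertation's exact model.
import cmath
import math

def euler_representation(X, alpha=1.9):
    """Map a matrix of features in [0, 1] to its complex Euler representation:
    each entry x becomes (1/sqrt(2)) * exp(i * alpha * pi * x), a point of
    fixed magnitude, which bounds the influence of outlying feature values."""
    return [[cmath.exp(1j * alpha * math.pi * x) / math.sqrt(2) for x in row]
            for row in X]

def l21_norm(M):
    """l2,1-norm: the sum of the l2 norms of the rows of M."""
    return sum(math.sqrt(sum(abs(v) ** 2 for v in row)) for row in M)

def frobenius_sq(M):
    """Squared Frobenius (squared l2) norm, for comparison."""
    return sum(abs(v) ** 2 for row in M for v in row)

# One corrupted row inflates the squared norm quadratically, but the
# l2,1-norm only linearly -- the robustness property exploited above.
clean = [[0.1, 0.2], [0.2, 0.1]]
noisy = [[0.1, 0.2], [10.0, 10.0]]
```

Comparing `l21_norm(noisy) / l21_norm(clean)` with `frobenius_sq(noisy) / frobenius_sq(clean)` shows the corrupted row dominating the squared measure far more than the l2,1 measure, which is why an objective built on the latter degrades more gracefully under noise.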
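The worst-case idea of the third contribution can likewise be sketched. The toy scoring below is a hypothetical, per-feature stand-in (using gaps between class means, not the dissertation's actual objective): instead of averaging separability over all class pairs, each feature is scored by its *hardest* pair, so a feature rates highly only if even the two most confusable classes are separated on it.

```python
# Hypothetical per-feature worst-case score; a simplified stand-in for the
# worst-case discriminant feature selection model described above.
from itertools import combinations

def class_means(X, y):
    """Per-class mean of each feature; X is a list of feature rows, y the labels."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {c: [sum(col) / len(rows) for col in zip(*rows)]
            for c, rows in groups.items()}

def worst_case_scores(X, y):
    """Score each feature by the minimum squared gap between class means
    over all class pairs -- the worst (hardest) pair, not the average."""
    means = class_means(X, y)
    n_features = len(next(iter(means.values())))
    return [min((means[a][j] - means[b][j]) ** 2
                for a, b in combinations(means, 2))
            for j in range(n_features)]
```

Features would then be ranked by these scores; an average-based criterion could rate a feature highly even when two classes coincide on it, whereas the worst-case score drops to zero in that situation, which is exactly the failure mode the dissertation targets.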
Keywords/Search Tags: l2,1-norm, l1,2-norm, Euler transform, dimensionality reduction, feature extraction, feature selection, KNN graph, feature distribution