
Research On The Key Technology Of Dimensionality Reduction

Posted on: 2022-06-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S L Liao
Full Text: PDF
GTID: 1488306602993759
Subject: Communication and Information System
Abstract/Summary:
With the rapid development of the "5G"-based internet and of information collection technology, the dimensionality of data has grown rapidly, giving rise to the "curse of dimensionality". Dimensionality reduction is an effective tool for high-dimensional data with small sample sizes, aiming to extract valuable information (features) from high-dimensional data that contains a great deal of redundancy. In addition, data collection and transmission are easily affected by the external environment, so the collected data inevitably contain outliers (noise, occlusion, etc.), which cause the statistical distribution of the data to deviate from its true distribution and greatly increase the difficulty of classification. To address these problems, this dissertation focuses on the robustness of models to noise and redundant features, and constructs robust feature extraction and feature selection models by exploiting discriminant information, nonlinear features, local geometric information (submanifold structure), and the feature distribution of the data. The research involves robust distance metrics, the Euler transform, the "class separation" problem, graph embedding optimization, and feature distribution constraints. The specific contents are as follows:

The presence of noise greatly weakens the separability of data, and the squared l2-norm is highly sensitive to noise, which degrades performance. To address this, a robust discriminant feature extraction method combining the Euler transform and the l2,1-norm is proposed. First, the Euler representation of the data is obtained via the Euler transform, so that latent nonlinear features can be extracted effectively and the differences between samples are enlarged. The model then uses the l2,1-norm (which is rotation invariant) to measure distances between samples, which effectively reduces the influence of noise relative to the squared l2-norm and thereby improves the robustness of the model. Finally, a fast, non-greedy method is proposed to solve the model, and the one-dimensional model is extended to a corresponding two-dimensional model. Experimental results show that the proposed method significantly improves classification performance.

Some existing graph-based feature selection methods cannot accurately describe the true neighbor relationships, because the graph is predefined in the original space, which usually contains considerable noise and redundant information. We therefore propose an adaptive graph embedding discriminant feature selection method, in which graph learning is embedded into the optimization of the objective function, so that the model adaptively learns the nearest-neighbor relationships (graph structure) of the data from the optimal subspace. The sparse structure of the graph is enforced by row-wise l0-norm regularization, so as to effectively mine the low-dimensional manifold structure embedded in high-dimensional data. Moreover, the l2,1-norm constrains the projection matrix to be row-sparse for feature selection, and an entropy regularization constrains the similarity matrix to avoid trivial solutions. Finally, the effectiveness of the proposed method is demonstrated on six data sets, including face, voice, and biological data.

Existing distance-based feature selection methods mostly maximize the overall separation of the data in terms of average distance, so they cannot guarantee the separability of every pair of classes, which degrades model performance. Therefore, a discriminant feature selection model based on the worst classification case is proposed: it maximizes the separability between the two hardest-to-distinguish classes, so that all of the data remain as separable as possible. However, this approach scores all features independently, so the selected feature subset easily contains redundant features. To reduce this redundancy, we improve the model from the perspectives of feature uncorrelation and uncorrelation of feature discriminant ability, and propose an uncorrelated worst-case discriminant feature selection model. We also reconstruct the model from the perspective of subset evaluation and propose a worst-case feature selection method based on optimal subsets. Finally, experiments are carried out on eight data sets, including face, handwritten digit, and biological data, with comparisons against existing methods.

Existing feature selection models based on the l1,2-norm attend only to the sparsity of the transformation matrix, ignoring the distribution sparsity of the dimension-reduced data along the sample dimension within each specified class. A feature selection method based on class-specific distributions is therefore proposed. It uses the l1,2-norm to constrain the competition (sparsity) among the projected data within each specified class, so that features of samples in the same class maintain a specific distribution while features of samples from different classes differ markedly. The model is built on the deep learning framework PyTorch and optimized by means of its automatic differentiation. Experimental results show that the proposed method achieves good classification performance and fast convergence.
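The robustness argument behind the first contribution can be illustrated concretely. The sketch below is not the dissertation's exact formulation: the Euler mapping z = (1/√2)·e^(iαπx) and the choice α = 1.9 are assumptions borrowed from the general Euler-representation literature, but it shows the two ingredients named above: features mapped onto the complex unit circle, and the l2,1-norm, which penalizes an outlying row linearly where the squared l2 (Frobenius) norm penalizes it quadratically.

```python
# Illustrative sketch only; the mapping and alpha = 1.9 are assumptions,
# not the dissertation's exact model.
import cmath
import math

def euler_representation(X, alpha=1.9):
    """Map a matrix of features in [0, 1] to its complex Euler representation:
    each entry x becomes (1/sqrt(2)) * exp(i * alpha * pi * x), a point of
    fixed magnitude, which bounds the influence of outlying feature values."""
    return [[cmath.exp(1j * alpha * math.pi * x) / math.sqrt(2) for x in row]
            for row in X]

def l21_norm(M):
    """l2,1-norm: the sum of the l2 norms of the rows of M."""
    return sum(math.sqrt(sum(abs(v) ** 2 for v in row)) for row in M)

def frobenius_sq(M):
    """Squared Frobenius (squared l2) norm, for comparison."""
    return sum(abs(v) ** 2 for row in M for v in row)

# One corrupted row inflates the squared norm quadratically, but the
# l2,1-norm only linearly -- the robustness property exploited above.
clean = [[0.1, 0.2], [0.2, 0.1]]
noisy = [[0.1, 0.2], [10.0, 10.0]]
```

Comparing `l21_norm(noisy) / l21_norm(clean)` with `frobenius_sq(noisy) / frobenius_sq(clean)` shows the corrupted row dominating the squared measure far more than the l2,1 measure, which is why an objective built on the latter degrades more gracefully under noise.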
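The worst-case idea of the third contribution can likewise be sketched. The toy scoring below is a hypothetical, per-feature stand-in (using gaps between class means, not the dissertation's actual objective): instead of averaging separability over all class pairs, each feature is scored by its *hardest* pair, so a feature rates highly only if even the two most confusable classes are separated on it.

```python
# Hypothetical per-feature worst-case score; a simplified stand-in for the
# worst-case discriminant feature selection model described above.
from itertools import combinations

def class_means(X, y):
    """Per-class mean of each feature; X is a list of feature rows, y the labels."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {c: [sum(col) / len(rows) for col in zip(*rows)]
            for c, rows in groups.items()}

def worst_case_scores(X, y):
    """Score each feature by the minimum squared gap between class means
    over all class pairs -- the worst (hardest) pair, not the average."""
    means = class_means(X, y)
    n_features = len(next(iter(means.values())))
    return [min((means[a][j] - means[b][j]) ** 2
                for a, b in combinations(means, 2))
            for j in range(n_features)]
```

Features would then be ranked by these scores; an average-based criterion could rate a feature highly even when two classes coincide on it, whereas the worst-case score drops to zero in that situation, which is exactly the failure mode the dissertation targets.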
Keywords/Search Tags: l2,1-norm, l1,2-norm, Euler transform, dimensionality reduction, feature extraction, feature selection, KNN graph, feature distribution