
Research Of Dimension Reduction Algorithm And Its Application

Posted on: 2021-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Xu
Full Text: PDF
GTID: 2518306527482964
Subject: Software engineering
Abstract/Summary:
In the era of information explosion, there is a large amount of high-dimensional data, and how to quickly obtain discriminative information from such data has become an important research topic. The common approach is dimensionality reduction. Existing dimensionality reduction methods have achieved good results through continuous innovation in theory and practice, but they still have shortcomings: (1) the distribution of noise in real samples is not uniform, and the differences among samples are ignored; (2) the structure captured along the sample dimension and the feature dimension of the data is incomplete; (3) the graph matrix that encodes neighborhood relationships is often inaccurate because of noise. To address these problems, three new dimensionality reduction methods are proposed: one feature extraction method and two feature selection methods. The specific work is as follows.

Firstly, existing methods based on principal component analysis do not consider the differences among samples and cannot jointly extract the important information in the samples, so a sparse optimal-mean principal component analysis algorithm based on self-paced learning is proposed. In this model, the loss function is defined with the L2,1 norm and the mean is treated as a variable to be optimized, which improves robustness to outliers; the projection matrix is regularized by the L2,1 norm to achieve feature selection; and, since training samples differ in difficulty, the idea of self-paced learning is used so that the model learns from "simple" samples to "complex" ones, improving classification accuracy.

Secondly, existing unsupervised feature selection methods do not consider the importance of the sample dimension and the feature dimension at the same time and are not robust to noisy samples, so a dual-graph regularized robust feature selection algorithm based on self-representation is proposed. The model exploits the self-representation property, considering sample self-representation and feature self-representation simultaneously, and defines the loss function with the L2,1 norm so that it is robust to noise. A sample graph and a feature graph are built from the self-representation coefficients of the samples and the features, and the corresponding graph Laplacian matrices are computed, so that the local geometric structure of both dimensions is preserved and classification and clustering performance is improved.

Finally, existing feature selection methods often rely on a Laplacian graph matrix to preserve the manifold structure and ignore the combination of global and local structures. To solve this issue, a robust feature selection algorithm based on group low rank is proposed. The model takes label information as prior knowledge and uses a class-wise low-rank representation so that transformed samples lie in their own subspaces as far as possible, which replaces the original Laplacian graph matrix and avoids the influence of noise on preserving local geometry. The regression projection matrix is defined as the product of two low-rank matrices, so that both the global and the local structure of the samples are preserved, and the L2,1 norm of the regression projection matrix is used to achieve feature selection, improving the model's performance in classification tasks.

Experiments on several public data sets and noisy data sets show that the proposed methods outperform existing algorithms and have practical significance.
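
The first method combines an L2,1-norm reconstruction loss with an optimized mean, an L2,1-regularized projection matrix, and self-paced sample weights. The abstract does not give the exact objective, so the snippet below is only a minimal sketch of two of the building blocks it names, the L2,1 norm and a hard self-paced weighting rule; the function names and the age parameter lam are illustrative assumptions rather than the thesis implementation.

    import numpy as np

    def l21_norm(M):
        """L2,1 norm: the sum of the Euclidean norms of the rows of M."""
        return np.sum(np.linalg.norm(M, axis=1))

    def reconstruction_losses(X, W, b):
        """Per-sample reconstruction error ||(x_i - b) - W W^T (x_i - b)||_2,
        with the mean b treated as a variable to be optimized."""
        Xc = X - b                    # center by the learned (optimal) mean
        R = Xc - Xc @ W @ W.T         # residual after projecting onto W
        return np.linalg.norm(R, axis=1)

    def self_paced_weights(losses, lam):
        """Hard self-paced weighting: samples whose loss is below the age
        parameter lam count as 'simple' (weight 1); the rest are 'complex'
        (weight 0) and are only admitted as lam grows across iterations."""
        return (losses < lam).astype(float)

In an alternating scheme of this kind, W and b would be updated on the currently weighted samples and lam would then be increased, so that training moves from simple to complex samples as described above.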
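The second method preserves local geometry in both the sample dimension and the feature dimension through two graph Laplacians built from self-representation coefficients. The following sketch assumes the affinity is the symmetrized magnitude of the coefficient matrix and that the regularizer takes the usual trace form; both are plausible readings of the abstract, not its verified formulation.

    import numpy as np

    def graph_laplacian(C):
        """Graph Laplacian L = D - A, where the affinity A is the symmetrized
        magnitude of a self-representation coefficient matrix C."""
        A = (np.abs(C) + np.abs(C).T) / 2.0   # symmetric affinity
        D = np.diag(A.sum(axis=1))            # degree matrix
        return D - A

    def dual_graph_regularizer(X, W, L_sample, L_feature):
        """Dual-graph smoothness: Tr((XW)^T L_s (XW)) keeps the projections of
        neighboring samples close, and Tr(W^T L_f W) does the same for rows of
        W that correspond to neighboring features (illustrative trace form)."""
        XW = X @ W
        return np.trace(XW.T @ L_sample @ XW) + np.trace(W.T @ L_feature @ W)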
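The third method writes the regression projection matrix as the product of two low-rank matrices and uses its L2,1 norm for feature selection, which amounts to ranking features by the row norms of the product. The sketch below shows only this selection step; the factor shapes and names are assumptions.

    import numpy as np

    def select_features(A, B, num_features):
        """Given the low-rank factorization W = A @ B of the regression
        projection matrix (A is d x r, B is r x c), score each of the d
        features by the Euclidean norm of its row of W and return the
        indices of the highest-scoring features."""
        W = A @ B
        scores = np.linalg.norm(W, axis=1)    # row-wise L2 norms
        return np.argsort(scores)[::-1][:num_features]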
Keywords/Search Tags: feature extraction, feature selection, self-paced learning, self-representation, manifold learning, group low-rank