
Research On Feature Extraction Method Of High Dimension Small Sample Based On Graph Embedding

Posted on: 2022-07-31
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H S Hu
Full Text: PDF
GTID: 1488306602493704
Subject: Signal and Information Processing
Abstract/Summary:
With the development of science and technology, high-dimensional observation data appear in many practical applications, such as natural language processing, image retrieval, and face recognition. Although high-dimensional data provide rich information, they also bring difficulties in transmission, storage, and processing. Directly processing high-dimensional data consumes large amounts of storage and computing resources, making data processing inefficient. At the same time, because high-dimensional data contain much redundant and noisy information, they can seriously degrade the performance of retrieval, classification, and recognition tasks. Especially when the sample size is small, many complex methods are prone to overfitting, which causes them to fail.

Dimension reduction, also called dimensionality reduction, is an important approach to processing high-dimensional data. Its purpose is to transform high-dimensional data into a low-dimensional space without losing important information. Feature extraction is an important type of dimensionality reduction: the low-dimensional representation of the data obtained by feature extraction greatly improves the efficiency of data processing, and it can also eliminate the redundancy and noise in high-dimensional data, thereby improving the performance of tasks such as classification, recognition, and visualization. From the graph-embedding point of view, most feature extraction methods differ only in how the weighted graph is constructed and how the embedding is computed; for example, the classic PCA and LDA use the global information of the data to construct the relationship graph between samples. Although some existing feature extraction methods achieve good performance, they still have shortcomings such as insufficient discriminant ability and unstable graph structure.

This dissertation focuses on feature extraction methods for high-dimensional data. Taking high-dimensional, small-sample-size data as the research object and graph-embedding discriminant analysis as the basis, it studies feature extraction methods within the graph-embedding framework. Aiming at the performance limitations of existing methods, several effective feature extraction methods are proposed, including supervised linear and nonlinear methods and unsupervised linear methods. The main research work is as follows:

1. Collaborative representation projection uses collaborative representation coefficients to describe the relationships between samples, which ignores the class information of the samples. Moreover, when there are few training samples it suffers a large reconstruction error, which weakens its discriminative ability. To solve this problem, a minimum eigenvector collaborative representation discriminant projection (MECRDP) method is proposed that considers the sample space and the eigenvalue space at the same time. The method uses the original sample data together with the smallest eigenvectors to reconstruct the samples, which effectively reduces the reconstruction error when the number of samples is small while preserving more of the collaborative representation relationships in the projection space. In addition, inspired by the idea of discriminant analysis, the class information of the samples and the scatter matrix between the reconstructed samples are introduced into the method, improving the discriminability of the samples in the projection space. Experimental results verify the excellent performance of the proposed method.

2. Since the representation coefficients of sparse or collaborative representation are easily affected by the addition or removal of samples, the local structural relationships they define are not robust enough. To overcome this problem, a similarity order preserving discriminant analysis (SOPDA) method is proposed. It uses the similarity relationships between samples to define a stable global structural relationship, constructs a within-class similarity order matrix to obtain a more robust local relationship between samples, and then preserves these global and local relationships in the projection space. In addition, a scale parameter is introduced into the between-class scatter to balance the between-class and within-class distributions of the samples. Experimental results show that the proposed method achieves better performance than competing methods and is more robust to the choice of projection dimension; for example, on the FERET data set it improves classification accuracy by about 3% to 5% over other methods.

3. Kernel-based collaborative representation projection is an extension of collaborative representation projection, but it has poor discriminative ability on nonlinear feature extraction problems. To address this, a kernel-based within-class collaboration preserving discriminant projection (KWCCPDP) is proposed using kernel techniques and collaborative representation projection. The method first maps the samples into a kernel space, then obtains the collaborative reconstruction relationships of the samples and preserves these relationships among same-class samples in the low-dimensional projection space. In addition, following the idea of discriminant analysis, the optimal low-dimensional projection matrix is obtained by maximizing the between-class distance and minimizing the within-class distance in the projection space, thereby improving the discriminability of the projected features. To solve for the optimal projection matrix of KWCCPDP, a two-step eigenvalue decomposition method is proposed. Theoretical analysis shows that the method obtains a stable solution when the number of training samples is more than twice the number of sample classes. Simulation experiments verify the effectiveness of the proposed method.

4. Existing unsupervised feature extraction methods do not make full use of the global and local structure information of the samples, which reduces the discriminativeness of the samples in the low-dimensional space. To deal with this, a global neighbor-preserving cluster feature extraction (GNPCFE) method is proposed. First, to improve the discriminativeness of samples in the projection space, samples that are mutual k-nearest neighbors are used to define the local structure; following the idea of supervised discriminant analysis, samples that are mutual k-nearest neighbors are kept close in the low-dimensional space, while samples that are not are kept far apart. Second, the global and local structure information of the samples is considered simultaneously, and the idea of unsupervised clustering is used so that the samples automatically retain good clustering characteristics in the low-dimensional space. Finally, to solve for the optimal projection matrix of GNPCFE, two effective alternating iterative optimization methods are proposed, namely the orthogonality-relaxed GNPCFEr algorithm and the k-means-based GNPCFEk algorithm, and the convergence of both algorithms is proved theoretically. Experimental results show that GNPCFE outperforms existing unsupervised feature selection and feature extraction methods; for example, on the UMIST, ORL, and COIL20 data sets, its classification accuracy improves over other unsupervised feature extraction methods by about 9.1%, 5.1%, and 10%, respectively.
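To make the graph-embedding framework underlying all of these methods concrete, the following is a minimal sketch (not the dissertation's implementation; the function name and the ridge term `reg`, added because X D X^T is singular in the small-sample case, are assumptions). Given data X and a symmetric affinity graph W, a linear embedding is found by solving the generalized eigenproblem X L X^T a = lambda X D X^T a:

```python
import numpy as np
from scipy.linalg import eigh

def graph_embedding(X, W, n_components=2, reg=1e-6):
    """Linear graph embedding. X is (d, n) data with samples as columns,
    W an (n, n) symmetric affinity graph. Returns the (d, n_components)
    projection whose directions minimize sum_ij W_ij (a^T x_i - a^T x_j)^2
    subject to the usual a^T X D X^T a scale constraint."""
    D = np.diag(W.sum(axis=1))          # degree matrix
    L = D - W                           # graph Laplacian
    A = X @ L @ X.T
    # Ridge term keeps B positive definite when d > n (small-sample case).
    B = X @ D @ X.T + reg * np.eye(X.shape[0])
    vals, vecs = eigh(A, B)             # generalized symmetric eigenproblem
    return vecs[:, :n_components]       # directions with smallest eigenvalues
```

Different choices of W recover different classic methods; for example, a fully connected W built from global statistics corresponds to the PCA/LDA family mentioned above.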
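The collaborative representation coefficients that items 1 and 3 build on are a standard l2-regularized reconstruction; a sketch under that standard formulation (function name and `lam` default are illustrative, not the dissertation's):

```python
import numpy as np

def collaborative_coefficients(A, y, lam=1e-2):
    """Collaborative representation: reconstruct query y from all training
    samples A (d, n) at once with a ridge-regularized least-squares code,
    alpha = (A^T A + lam I)^{-1} A^T y."""
    n = A.shape[1]
    alpha = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)
    return alpha
```

The reconstruction residual ||A @ alpha - y|| is what grows when training samples are few, which is the shortcoming MECRDP targets by augmenting the dictionary with the smallest eigenvectors.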
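For the kernel extension in item 3, the same collaborative code can be computed entirely from kernel evaluations; a sketch assuming an RBF kernel (the kernel choice, function names, and parameter defaults are assumptions, not KWCCPDP's actual settings):

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """RBF kernel between the columns of X (d, n) and Y (d, m)."""
    d2 = (np.sum(X**2, axis=0)[:, None]
          + np.sum(Y**2, axis=0)[None, :]
          - 2.0 * X.T @ Y)               # squared Euclidean distances
    return np.exp(-gamma * d2)

def kernel_collab_coefficients(Xtr, y, gamma=1.0, lam=1e-2):
    """Collaborative representation in the kernel-induced feature space:
    alpha = (K + lam I)^{-1} kappa, where K is the training Gram matrix
    and kappa_i = k(x_i, y). No explicit feature map is ever formed."""
    K = rbf_kernel(Xtr, Xtr, gamma)
    kappa = rbf_kernel(Xtr, y[:, None], gamma).ravel()
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), kappa)
```

Because only K and kappa appear, the nonlinear map into kernel space described in item 3 stays implicit, which is the usual kernel trick.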
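The mutual k-nearest-neighbor relation that defines GNPCFE's local structure in item 4 can be sketched as a symmetric binary graph (an illustrative construction; the function name and brute-force distance computation are assumptions):

```python
import numpy as np

def mutual_knn_graph(X, k=3):
    """Binary affinity graph: W[i, j] = 1 iff samples i and j (columns of
    X, shape (d, n)) are each among the other's k nearest neighbours."""
    n = X.shape[1]
    s = np.sum(X**2, axis=0)
    d2 = s[:, None] + s[None, :] - 2.0 * X.T @ X   # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                   # a sample is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]             # k nearest neighbours per sample
    knn = np.zeros((n, n), dtype=bool)
    for i in range(n):
        knn[i, nn[i]] = True
    return (knn & knn.T).astype(float)             # keep only mutual pairs
```

The intersection `knn & knn.T` is what makes the relation mutual and the graph symmetric, which is what lets pairs with W[i, j] = 1 be pulled together and all other pairs pushed apart in the low-dimensional space.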
Keywords/Search Tags: dimensionality reduction, feature extraction, graph embedding, collaborative representation, sparse representation, small sample size