With the spread of digital cameras and Internet technology, image and video data have become much easier to acquire, and set-based classification has received extensive attention from researchers. Unlike traditional single-image recognition, an image set provides richer information, which helps improve recognition accuracy. However, owing to scene changes, different views, diverse poses, and uncontrolled illumination, each image set contains large variations, so the differences within a class may be much larger than those between classes. Moreover, image set data lie in a high-dimensional space and often contain many redundant features, which leads to high time cost. How to model image sets effectively and measure the similarity between them efficiently has therefore become the key problem in image set classification.

Image sets are usually nonlinear data with complex intrinsic properties, and conventional models based on the Euclidean metric have difficulty representing such structure. Statistical models such as covariance matrices and linear subspaces can effectively encode the complex variations in an image set and allow the distance between two sets to be measured at low cost, and they have become robust feature representations of image sets. Since these statistical models commonly reside on specific Riemannian manifolds, Riemannian manifold learning provides an effective and efficient learning scheme for this problem. By incorporating collaborative representation, locality preservation, hash learning, and manifold dimensionality reduction, this paper studies Riemannian manifold learning from two perspectives, preserving intrinsic structure and reducing time complexity, and makes the following four main contributions.

(1) Because existing Grassmannian sparse representation methods cannot effectively design an over-complete dictionary, a new Grassmannian collaborative
representation method based on prototype learning is proposed. The method draws on the idea of superposed sparse representation classifiers to construct a principal component subspace and a variation subspace. These subspaces are then mapped to a symmetric space, which yields a more complete dictionary and improves the robustness of the training samples. Theoretical analysis shows that the principal component subspace and the variation subspace can represent class prototypes and intra-class differences, respectively. Prototype learning and collaborative representation are integrated into a unified framework, so that the dictionary learning problem on the Grassmann manifold is transformed into a traditional Euclidean collaborative representation problem with an L2 penalty term. The method preserves more of the discriminative class-specific parts and mitigates the negative effects of intra-class variation. Experimental results show that the proposed method achieves good recognition performance and is more robust to contaminated datasets.

(2) For data representation in the multiple-set scenario, we propose a Grassmannian locality-aware sparse coding method. Since a single subspace can hardly cover the variations within an image set, a Hierarchical Divisive Clustering (HDC) algorithm is employed to extract multiple subspace structures from the training set. In other words, each image set is represented as multiple points on the Grassmann manifold. To reduce the interference of irrelevant information and outliers, we exploit the relationships among image sets to capture intra-set and inter-set variations simultaneously. Specifically, a representation-adaptive term and a locality-consistent term are introduced to constrain the sparse coefficients, so that similar Grassmann points are forced to have similar sparse codes. In addition, to handle the nonlinearity of the data, the original data are mapped to the Reproducing Kernel Hilbert
Space (RKHS), and a locality-aware sparse representation model based on the kernel metric is established. Experiments on set-based face recognition and object classification tasks verify that the proposed algorithm achieves better classification accuracy.

(3) To reduce the high time cost of image set classification, we propose a discrete metric learning method based on hashing representations. Most existing image set-based approaches focus on extracting effective latent discriminative features and ignore their computational time and memory costs. To overcome this limitation, this paper introduces hash learning for fast image set classification. First, each image set is regarded as a point on a Riemannian manifold (the Grassmann manifold or the manifold of symmetric positive definite (SPD) matrices). The manifold is then mapped to the Hamming space by a new metric learning method. In this space, we jointly learn a projection matrix and hash functions to ensure that the hash codes are strongly discriminative. On this basis, by exploiting the matrix structure, a bilinear discrete metric learning method is proposed, which copes better with high-dimensional data. Extensive experiments demonstrate the efficiency and effectiveness of the proposed approach on image set classification tasks.

(4) To address the high time complexity caused by unlabeled high-dimensional manifold representations, we propose a Grassmannian dimensionality reduction method based on the Neighborhood Preserving Embedding (NPE) criterion, which transforms the original high-dimensional Grassmann manifold into a lower-dimensional but more discriminative one. In this method, two strategies for constructing the similarity graph on manifolds are first proposed, which use self-representation properties to eliminate the effects of errors. By building a manifold-to-manifold mapping, the traditional NPE is extended to the Grassmann manifold, so that the neighborhood structure of high-dimensional Grassmann samples can be well
preserved in the lower-dimensional, more discriminative one. In addition, because a pre-defined similarity graph is sensitive to noise, we propose a new framework that jointly learns the similarity graph and the projection matrix, which yields a more accurate structure and a more discriminative mapping. Experimental results on several datasets verify the effectiveness of the proposed method.
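
To make the subspace modelling behind contribution (1) concrete, the following is a minimal Python sketch and not the proposed method: it omits prototype learning and the variation subspace entirely. It assumes that each image set is a d x n matrix of vectorized images, that the set is modelled by the span of its top-q left singular vectors, and that the mapping to symmetric matrices is the standard projection embedding X -> X X^T, whose inner product is the projection kernel ||X^T Y||_F^2. Under these assumptions, collaborative representation with an L2 penalty reduces to a ridge regression solved in closed form through the Gram matrix. All function names and the parameters `q` and `lam` are hypothetical.

```python
import numpy as np


def subspace_basis(image_set, q):
    """Orthonormal d x q basis spanning the top-q principal directions
    of a d x n image set (one vectorized image per column)."""
    u, _, _ = np.linalg.svd(image_set, full_matrices=False)
    return u[:, :q]


def projection_kernel(x, y):
    """Projection kernel k(X, Y) = ||X^T Y||_F^2 = <X X^T, Y Y^T>."""
    return np.linalg.norm(x.T @ y, "fro") ** 2


def projection_distance(x, y):
    """Projection metric ||X X^T - Y Y^T||_F / sqrt(2) for two
    orthonormal bases with the same number of columns q."""
    q = x.shape[1]
    return np.sqrt(max(q - projection_kernel(x, y), 0.0))


def crc_classify(dictionary, labels, query, lam=0.1):
    """Collaborative representation with an L2 penalty in the embedded
    (symmetric-matrix) space, followed by class-wise residuals.

    dictionary: list of orthonormal bases (training subspaces)
    labels:     class label of each dictionary atom
    query:      orthonormal basis of the test subspace
    lam:        ridge weight (assumed value, not taken from the paper)
    """
    labels = np.asarray(labels)
    n = len(dictionary)
    gram = np.array([[projection_kernel(a, b) for b in dictionary]
                     for a in dictionary])
    k_q = np.array([projection_kernel(a, query) for a in dictionary])
    # Closed-form ridge solution: alpha = (K + lam * I)^{-1} k_q
    alpha = np.linalg.solve(gram + lam * np.eye(n), k_q)
    k_qq = projection_kernel(query, query)
    residuals = {}
    for c in np.unique(labels):
        idx = labels == c
        a_c = alpha[idx]
        # Squared reconstruction error using only the atoms of class c
        residuals[c] = (k_qq - 2.0 * a_c @ k_q[idx]
                        + a_c @ gram[np.ix_(idx, idx)] @ a_c)
    return min(residuals, key=residuals.get)
```

In this sketch a query set is assigned to the class whose atoms reconstruct its embedded representation with the smallest residual. Because every quantity is expressed through the Gram matrix, the d x d projection matrices never need to be formed explicitly.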