Image classification is a fundamental task in computer vision. With the development of deep learning techniques, the image classification ability of machines has steadily improved, but deep learning methods easily overfit when training data are insufficient and then fail to obtain satisfactory results. Few-shot image classification aims to enable machines to recognize new categories from a small number of images. To this end, existing methods generally use prior knowledge of known categories to assist in learning new categories; among them, feature transfer-based methods take image representations learned from known categories as prior knowledge and can achieve better performance. However, feature transfer-based methods still face several problems. (1) Insufficient feature discriminability. Because of the semantic differences between new and known categories, representations learned on known categories do not transfer well to new categories. Moreover, since models trained with limited samples cannot fully discard background features, these features may be treated as category-relevant in the few-shot case, which further impairs the discriminative ability of the features. (2) Poor cross-domain capability. Because of changes in imaging equipment, shooting environment, lighting conditions, and other factors, distribution shift between image domains is common, and image features learned in one domain may perform poorly in another, substantially degrading model performance. (3) High learning cost. Current methods generally rely on supervised learning for feature learning, which requires labeled data for the known categories and incurs a high annotation cost, making it difficult to extend these methods cheaply to the large amount of unlabeled data on the Internet.

To address the above problems, this dissertation investigates methods for learning robust image representations in a
low-cost manner from the perspective of representation learning, so as to address insufficient feature discriminability, poor cross-domain capability, and high learning cost, and to improve the effectiveness and practicality of feature transfer methods. Specifically, this dissertation combines multi-modal learning, domain adaptation, and unsupervised learning to improve feature discriminability, cross-domain consistency, and learning efficiency, respectively, and thereby achieve robust representation learning overall.

The main contributions and innovations of this dissertation are as follows:

(1) To address insufficient feature discriminability, this dissertation draws on the idea of multi-modal learning and proposes the Semantic Prompt (SP) method, which improves the discriminability of features for novel classes by using the text information in few-shot image labels. Unlike previous methods that use text information only to adjust the classifier, SP uses text information as a prompt that interacts with image features along the spatial and channel dimensions to adaptively adjust the feature extraction process of the image encoder, so that the encoder focuses on the essential features of the category related to the prompt and suppresses features unrelated to the category. In comparisons with the baseline method on four datasets, the proposed method achieves a significant performance lead on one-shot learning tasks, which verifies the effectiveness of the semantic prompt method for improving feature discriminability.

(2) To address poor cross-domain capability, this dissertation first models the cross-domain problem in few-shot image classification, proposes a new cross-domain cross-set few-shot learning problem setting, and constructs a benchmark dataset, an evaluation method, and a series of baseline methods. Under this problem setting, this dissertation proposes a new bi-directional compact and
aligned representation learning method based on a strongly augmented bi-directional prototypical alignment framework, which learns an aligned representation space to solve the domain shift problem between the source and target domains, and a compact representation space to improve few-shot classification ability. The experimental results show that this method substantially outperforms the baseline method on two benchmark datasets, verifying its effectiveness in mitigating poor cross-domain capability.

(3) To address high learning cost, this dissertation draws inspiration from unsupervised learning and proposes a few-shot image classification method based on unsupervised part discovery, enhancement, and alignment, which directly learns discriminative and generalizable object part features from unlabeled images and better applies the learned part features to downstream few-shot image classification tasks through part augmentation and part alignment. The method is validated on five few-shot image classification datasets, where it outperforms previous unsupervised learning methods and matches supervised learning methods, verifying that the proposed method can learn effective image representations from low-cost unlabeled data.

In summary, this dissertation presents a series of solutions to the few-shot image classification problem from the perspective of robust representation learning, by analyzing the shortcomings of current methods and considering the difficulties of practical application scenarios. Extensive experimental results show that the proposed methods effectively improve feature discriminability, enhance the domain adaptation capability of current methods, and reduce the cost of data annotation required for training. The research results of this dissertation have been published in authoritative
conferences and journals in this field and have promoted research on few-shot image classification methods.
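
As an illustration of the feature transfer setting discussed above, the following is a minimal sketch of how a single few-shot episode might be classified once features have been extracted by an encoder trained on known categories. It is not the method of this dissertation: the nearest-prototype rule with cosine similarity is an assumed, commonly used classifier, the function names `class_prototypes`, `cosine_similarity`, and `classify` are hypothetical, and the two-dimensional feature vectors are toy values standing in for real encoder outputs.

```python
import math
from collections import defaultdict


def class_prototypes(support_features, support_labels):
    """Average the support feature vectors of each class into one prototype."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(support_features, support_labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def cosine_similarity(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(query_feature, prototypes):
    """Return the label whose prototype is most similar to the query."""
    return max(
        prototypes,
        key=lambda label: cosine_similarity(query_feature, prototypes[label]),
    )


# Toy 2-way 2-shot episode (assumed values, not real encoder features).
support_features = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
support_labels = ["cat", "cat", "dog", "dog"]

prototypes = class_prototypes(support_features, support_labels)
prediction = classify([0.8, 0.3], prototypes)
print(prediction)
```

In this sketch the quality of the prediction depends entirely on the transferred features, which is why the discriminability, cross-domain consistency, and learning cost of the representation are the focus of the three contributions.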