Deep learning has achieved remarkable results in many fields. However, training deep neural network models often relies on large amounts of manually annotated data, which is costly in time and labour and constitutes an important bottleneck for applying deep learning in real-world scenarios. Few-shot learning has therefore received increasing attention in recent years; it focuses on adapting deep learning models to a novel task with only a very small number of samples. This thesis studies the problem of few-shot image classification, with the following main focus.

Recent research shows that good performance on few-shot tasks can be achieved with standard transfer learning, which consists of pre-training followed by fine-tuning. However, directly applying transfer learning to few-shot scenarios faces a dilemma: should the parameters of the feature extractor be updated during fine-tuning? Fine-tuning a feature extractor with a large number of parameters on very few samples easily leads to overfitting, while freezing the feature extractor leads to bias in the features extracted from novel samples. To address this problem, previous studies assist fine-tuning by reusing the pre-training data during the fine-tuning phase, but in practical scenarios the pre-training data are often unavailable at that stage. Unlike previous studies, this thesis proposes to improve fine-tuning without using any additional data set. Specifically, this thesis redefines fine-tuning on few-shot tasks as the calibration of biased features by an additional auxiliary network, and accordingly proposes a feature self-calibration framework. The framework uses a Transformer as the auxiliary network and calibrates features by aligning the local features of an image in an unsupervised manner. Extensive experiments
demonstrate the effectiveness of the proposed method.

In conventional supervised learning, global average pooling is usually used to transform feature maps into one-dimensional feature vectors, but in few-shot scenarios this tends to under-utilize local information. As a result, recent few-shot learning research has started to focus on exploiting block-level features of images. A fundamental problem is that in image classification the label applies to the entire image while individual image blocks are unlabeled, so performance degrades if blocks depicting the background are mistakenly used for classification. In this thesis, we observe that this setting coincides with the application scenario of multiple instance learning, and propose to introduce multiple instance learning into few-shot learning. Specifically, we propose a relationship-aware multiple instance learning framework that constructs a global image feature by explicitly modeling the relationships between image blocks and aggregating block-level features with a graph attention network. We conduct extensive experiments on several few-shot learning datasets to demonstrate the effectiveness of the proposed method.
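The feature self-calibration idea of the first contribution can be sketched minimally. Below, a single self-attention pass stands in for the Transformer auxiliary network (whose exact architecture the abstract does not specify), refining the local patch features produced by a frozen extractor; a residual connection preserves the original, possibly biased, features. All names, weights, and dimensions are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_calibrate(patch_feats, W_q, W_k, W_v):
    """One self-attention pass over an image's local patch features.
    The residual connection keeps the frozen extractor's features as
    the starting point; attention aligns patches against each other."""
    q, k, v = patch_feats @ W_q, patch_feats @ W_k, patch_feats @ W_v
    d = q.shape[-1]
    attn = softmax(q @ k.T / np.sqrt(d))   # (n_patches, n_patches)
    return patch_feats + attn @ v          # calibrated patch features

rng = np.random.default_rng(0)
n, d = 16, 64                              # 16 patches, 64-dim features (assumed)
patches = rng.normal(size=(n, d))          # features from a frozen extractor
W_q, W_k, W_v = (rng.normal(scale=0.1, size=(d, d)) for _ in range(3))
calibrated = self_calibrate(patches, W_q, W_k, W_v)
image_feat = calibrated.mean(axis=0)       # pooled image representation
```

In a real framework the projection weights would be learned on the few-shot support set (here they are random placeholders), and the calibrated patch features would feed the classifier.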
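The block-level aggregation of the second contribution can likewise be sketched: a single graph-attention layer (in the GAT style) computes attention coefficients between image blocks and aggregates their features into one global image feature. The graph connectivity, weights, and dimensions below are illustrative assumptions, not the thesis's actual model.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gat_aggregate(patch_feats, adj, W, a):
    """Single graph-attention layer over a patch graph: pairwise
    attention scores, masked by the adjacency matrix, weight the
    aggregation of block-level features into a global feature."""
    h = patch_feats @ W                    # projected block features (n, d_out)
    d_out = h.shape[1]
    src = h @ a[:d_out]                    # score contribution of block i
    dst = h @ a[d_out:]                    # score contribution of block j
    e = src[:, None] + dst[None, :]        # e_ij = a^T [h_i || h_j]
    e = np.where(e > 0, e, 0.2 * e)        # LeakyReLU
    e = np.where(adj > 0, e, -1e9)         # keep only edges in the patch graph
    alpha = softmax(e, axis=1)             # attention over neighbours
    return (alpha @ h).mean(axis=0)        # global image feature

rng = np.random.default_rng(1)
n, d, d_out = 9, 32, 32                    # 3x3 patch grid, 32-dim features (assumed)
patches = rng.normal(size=(n, d))
adj = np.ones((n, n))                      # fully connected patch graph (assumed)
W = rng.normal(scale=0.1, size=(d, d_out))
a = rng.normal(scale=0.1, size=(2 * d_out,))
g = gat_aggregate(patches, adj, W, a)
```

Because background blocks receive low attention from discriminative blocks, this kind of relation-aware aggregation down-weights them, which is the multiple-instance-learning intuition the abstract describes.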