
The Research Of Visual Recognition Based On Few-shot And Zero-shot Learning

Posted on: 2021-01-28   Degree: Master   Type: Thesis
Country: China   Candidate: S Q Tan   Full Text: PDF
GTID: 2428330647450751   Subject: Computer technology
Abstract/Summary:
In recent years, deep learning has made great progress in artificial intelligence, with especially notable achievements in computer vision. However, most methods require large amounts of labeled data for training, which makes them hard to apply to tasks where data are difficult to obtain and costly to label. In addition, new categories keep emerging in some settings, and acquiring the ability to recognize them without any instances of those categories is also challenging. To address these problems, this thesis studies few-shot and zero-shot visual recognition in turn.

For few-shot visual recognition, only a small number of labeled samples are available for each category, so the learner has to learn quickly from few instances. Inspired by meta-learning, we construct meta-learning tasks drawn from the same distribution in order to learn meta-knowledge shared across tasks. We first propose the Memory-augmented Compact Bilinear Network (MCBNet), which treats the similarity metric between samples and categories as meta-knowledge, constructs task-specific feature representations through spatial attention mechanisms and memory enhancement techniques, and uses Compact Bilinear Pooling (CBP) to obtain fused representations. In a series of experiments, MCBNet performs strongly on the Omniglot, miniImageNet and tieredImageNet image datasets and on the UCF101, UCF11 and HMDB51 video datasets. We then propose an alternative to CBP named Element-wise Convolution Pooling (ECP), which is essentially a non-linear generalization of a fixed-distance metric implemented by a neural network. Experiments show that CBP performs better with low-quality features, while ECP has the advantage with high-quality ones.

For zero-shot visual recognition, there are no sample instances for the target categories, but each category is provided with a semantic representation. The key problem of zero-shot recognition is therefore to learn the compatibility between visual features and semantic representations. We first discuss, from the perspective of metric learning, how different compatibility functions and embedding spaces affect zero-shot recognition. We then point out that differences in category distribution lead to differences in feature importance. Inspired by the Feature Generation Network, which can synthesize sample features for unseen categories, we propose Meta-WGAN, which builds on the task-based training used in meta-learning. Unlike the Feature Generation Network, Meta-WGAN is a task-agnostic generative model that aims to generate synthetic features of target categories given specific semantic representations. Because the classification loss of the synthesized target samples, computed by a metric learning model trained on the base categories, can be used to guide the training of the WGAN, Meta-WGAN can synthesize high-quality target-category features and thereby greatly reduce the difficulty of metric learning. Experiments on the CUB, AWA, SUN and FLO image datasets and the UCF101, HMDB51 and Olympic video datasets show that the features generated by Meta-WGAN are of higher quality than those of the Feature Generation Network. Combining Meta-WGAN with a metric learning model based on pooling methods such as CBP or element-wise product improves performance further.
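The abstract names Compact Bilinear Pooling (CBP) as the way a sample feature and a category feature are fused, but gives no implementation details. The following is a minimal sketch of the standard count-sketch construction of CBP, not MCBNet's actual code: each vector is projected with a random count sketch and the two sketches are circularly convolved, which equals a count sketch of their full outer product. All function names, dimensions and feature values are hypothetical, and the convolution is written as a direct O(d^2) loop for clarity, where practical implementations typically use an FFT.

```python
import random

def make_sketch_params(in_dim, out_dim, rng):
    """Draw a random bucket index and a random sign for each input coordinate."""
    buckets = [rng.randrange(out_dim) for _ in range(in_dim)]
    signs = [rng.choice((-1.0, 1.0)) for _ in range(in_dim)]
    return buckets, signs

def count_sketch(vec, buckets, signs, out_dim):
    """Project vec to out_dim dimensions by adding signed entries into buckets."""
    out = [0.0] * out_dim
    for value, bucket, sign in zip(vec, buckets, signs):
        out[bucket] += sign * value
    return out

def circular_convolution(a, b):
    """Direct circular convolution; an FFT computes the same result faster."""
    d = len(a)
    out = [0.0] * d
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            out[(i + j) % d] += a_i * b_j
    return out

def compact_bilinear(x, y, params_x, params_y, out_dim):
    """Fuse x and y: sketch each vector, then circularly convolve the sketches."""
    sx = count_sketch(x, params_x[0], params_x[1], out_dim)
    sy = count_sketch(y, params_y[0], params_y[1], out_dim)
    return circular_convolution(sx, sy)

def sketched_outer_product(x, y, params_x, params_y, out_dim):
    """Reference: count sketch applied to the explicit outer product of x and y."""
    (bx, sgx), (by, sgy) = params_x, params_y
    out = [0.0] * out_dim
    for i, x_i in enumerate(x):
        for j, y_j in enumerate(y):
            out[(bx[i] + by[j]) % out_dim] += sgx[i] * sgy[j] * x_i * y_j
    return out

# Hypothetical 6-dimensional features fused into an 8-dimensional representation.
rng = random.Random(0)
in_dim, out_dim = 6, 8
sample_feat = [rng.gauss(0, 1) for _ in range(in_dim)]  # assumed sample feature
class_feat = [rng.gauss(0, 1) for _ in range(in_dim)]   # assumed category feature
params_x = make_sketch_params(in_dim, out_dim, rng)
params_y = make_sketch_params(in_dim, out_dim, rng)

fused = compact_bilinear(sample_feat, class_feat, params_x, params_y, out_dim)
reference = sketched_outer_product(sample_feat, class_feat, params_x, params_y, out_dim)
```

Here `fused` and `reference` agree up to floating-point rounding, which is why CBP can stand in for full bilinear pooling without ever forming the `in_dim * in_dim` outer product. In a learned model the fused vector would feed a similarity head; that part is omitted because the abstract does not describe it.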
Keywords/Search Tags: few-shot, zero-shot, metric learning, meta-learning, GAN