Transduction-based Zero-shot Image Classification Method

Posted on: 2021-02-24
Degree: Master
Type: Thesis
Country: China
Candidate: H Q Mao
GTID: 2518306512987869
Subject: Software engineering
Abstract/Summary:
Supervised learning is a machine learning task that infers a function from a labeled training dataset, and it is one of the most important branches of machine learning. In recent years, with the development of deep learning, the performance of supervised learning has improved greatly. For example, top-5 recognition accuracy over the 1000 categories of the ImageNet dataset has reached 97.7%, which is reported to exceed human recognition ability. However, supervised learning is subject to a very strong constraint: the test data must come from the same categories as the training data, and hundreds or even thousands of labeled samples must be collected for each category. Because there are billions of species on Earth and new ones appear every day, it is impossible to include all classes when training a model. To address this problem, transfer learning, few-shot learning, one-shot learning, zero-shot learning and related methods have emerged.

Zero-shot learning aims to recognize categories for which no labeled data is available during training. Because it is closer to the way humans learn, it has attracted great attention in recent years and has become a research hotspot in many fields. Against this background, this thesis studies zero-shot image classification in the transductive setting. Based on an in-depth analysis of the shortcomings of existing methods, different modeling approaches are used to address the domain shift problem and the hubness problem faced by zero-shot learning, and thereby to improve its performance. The main research contents and contributions of this thesis are as follows:

1. A general regularizer, GTR, is proposed for transductive zero-shot learning. It assigns unlabeled samples to known attributes by means of a Kullback-Leibler divergence (KLD) term. Specifically, it forces the distribution of the unlabeled unseen data toward a target distribution. Existing regularizers for the transductive setting apply only to the models they were proposed with and cannot be carried over to other models, which seriously limits their use. GTR, in contrast, is independent of the original method it is attached to, so it can be easily extended to many zero-shot learning methods, especially compatibility-based linear models and deep models, and it significantly improves the classification accuracy of the original methods.

2. A probabilistic framework is proposed that defines a new latent space with two characteristics. First, features of the same class are clustered together in the space while different classes are pushed apart, which is achieved with a triplet network. Second, the class prototype of each unseen class is composed of the class prototypes of the seen classes weighted by non-negative coefficients. These coefficients are derived from the relationship between seen and unseen classes, which is computed by non-negative matrix factorization (NMF). A Gaussian model is used to complete the transductive part of the framework. In this way the probabilistic classification model is improved, metric learning makes the data more discriminative, and the relationship established between seen and unseen classes addresses the domain shift problem.

3. A method based on robust principal component analysis (RPCA) is proposed, which relaxes the model by adding a sparse noise constraint. To avoid confusion between similar classes, orthogonality constraints are used to disperse all class prototypes (of both seen and unseen classes) in the latent space. To alleviate the domain shift problem, vectors from the latent space are used to reconstruct the visual features and the semantic attributes respectively. In addition, using a maximum probability model over the three combined spaces also alleviates the hubness problem.
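
The KLD-based regularizer in contribution 1 can be illustrated with a minimal sketch. The abstract does not say how the target distribution is built, so the sketch assumes one common choice: sharpening the model's own softmax predictions over the unseen classes and renormalizing. The function names (`softmax`, `sharpen`, `transductive_regularizer`), the `power` parameter and the example scores are all hypothetical and are not taken from the thesis.

```python
import math


def softmax(scores):
    """Turn raw compatibility scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sharpen(probs, power=2.0):
    """Assumed target distribution: raise each probability to `power`
    and renormalize, which emphasizes the most likely class."""
    raised = [p ** power for p in probs]
    total = sum(raised)
    return [r / total for r in raised]


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) for two discrete distributions."""
    return sum(
        t * math.log((t + eps) / (q + eps))
        for t, q in zip(target, predicted)
        if t > 0.0
    )


def transductive_regularizer(score_rows, power=2.0):
    """Mean KL term over unlabeled samples.

    Each row of `score_rows` holds one unlabeled sample's compatibility
    scores against the unseen classes, produced by any base model.
    """
    total = 0.0
    for scores in score_rows:
        predicted = softmax(scores)
        target = sharpen(predicted, power)
        total += kl_divergence(target, predicted)
    return total / len(score_rows)


if __name__ == "__main__":
    # Hypothetical scores for three unlabeled samples and three unseen classes.
    unlabeled_scores = [[2.0, 0.0, 0.0], [0.5, 0.4, 0.1], [0.0, 0.0, 0.0]]
    print(transductive_regularizer(unlabeled_scores))
```

Because the term depends only on the scores a base model assigns to unlabeled samples, it could in principle be added to the training loss of any compatibility-based model, which is the kind of model independence the abstract describes.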
Keywords/Search Tags: transductive ZSL, KLD, NMF, RPCA, domain shift