Zero-Shot Learning Based On Cross-Modal Feature Synthesis

Posted on: 2020-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Wang
Full Text: PDF
GTID: 2518306518465184
Subject: Electronics and Communications Engineering
Abstract/Summary:
Deep learning has greatly advanced computer vision tasks such as image retrieval and object detection. However, these tasks require large amounts of labeled training data, and some real-world categories have few or even no labeled samples, which poses new challenges for traditional classification algorithms. To address this problem, Zero-Shot Learning (ZSL), which aims to recognize unseen classes that are absent during the training stage, has attracted much attention. In this thesis, two models are proposed for ZSL: the Class-Specific Synthesized Dictionary model (CSSD) and the Multi-Modal Generative Adversarial Network (M²GAN).

First, the CSSD approach maps class semantic prototypes to the visual space and learns a class-specific coding matrix for each class, replacing the global coding matrix shared by all classes. CSSD consists of two stages: reconstruction of visual features and synthesis of pseudo instances. By synthesizing pseudo instances, the ZSL problem can be transformed into a traditional classification problem. To account for both the specificity of and the similarity among classes, the reconstruction stage learns a class-specific coding matrix for each class together with a coding matrix shared by all classes. In the synthesis stage, we find the seen classes that are similar to each unseen class and use a linear combination of those seen classes to synthesize pseudo instances of the unseen class. A support vector machine (SVM) is then used to complete the classification task. CSSD is evaluated on four benchmark datasets (AwA, CUB, aPY and SUN), and the results demonstrate its effectiveness and its competitiveness on both traditional ZSL and generalized ZSL tasks.

Second, ZSL based on generative adversarial networks is explored. Existing approaches of this kind usually map a single type of class semantic prototype to the visual space. Considering the complementary information among different types of semantic prototypes, the proposed M²GAN constructs multiple generators, each taking one type of semantic prototype as input to generate corresponding pseudo features, and then assigns different weights to fuse these features and exploit the complementary information. The whole training process is formulated as an adversarial framework, so that the fused pseudo features better fit the distribution of real features. To verify the effectiveness of M²GAN, we conduct experiments on three datasets (AwA1, AwA2 and CUB) and show that its performance is on par with current advanced algorithms. We further analyze the influence of the different types of semantic prototypes and their weights on performance, which verifies the effectiveness of exploiting complementary information through feature fusion.
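The two combination steps described in the abstract can be illustrated with a small sketch: synthesizing a pseudo instance of an unseen class as a linear combination of similar seen classes, and fusing pseudo features from several generators with weights. This is not the thesis's implementation. The abstract does not say how similarity or the weights are computed, so the cosine similarity, the top-k selection, the weight normalization, and all function and parameter names below are assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def synthesize_pseudo_instance(unseen_semantic, seen_semantics, seen_visual_means, k=2):
    """Build one pseudo visual feature for an unseen class.

    Assumption: the k seen classes whose semantic prototypes are most similar
    (by cosine similarity) are combined linearly, with weights proportional to
    their similarity. The thesis may select and weight classes differently.
    """
    sims = [cosine(unseen_semantic, s) for s in seen_semantics]
    top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]

    # Clip negative similarities, then normalize so the weights sum to 1.
    weights = [max(sims[i], 0.0) for i in top]
    total = sum(weights)
    if total == 0.0:
        weights = [1.0 / len(top)] * len(top)
    else:
        weights = [w / total for w in weights]

    dim = len(seen_visual_means[0])
    pseudo = [
        sum(w * seen_visual_means[i][d] for w, i in zip(weights, top))
        for d in range(dim)
    ]
    return pseudo, top, weights


def fuse_features(features, weights):
    """Weighted average of pseudo features produced by several generators.

    Assumption: fusion is a normalized weighted sum. In the thesis the weights
    are assigned per type of semantic prototype; how they are set is not stated.
    """
    total = sum(weights)
    dim = len(features[0])
    return [
        sum(w * f[d] for w, f in zip(weights, features)) / total
        for d in range(dim)
    ]


if __name__ == "__main__":
    # Toy data: three seen classes with 2-D semantic prototypes and visual means.
    seen_semantics = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    seen_visual_means = [[2.0, 0.0], [0.0, 2.0], [4.0, 4.0]]
    pseudo, chosen, w = synthesize_pseudo_instance(
        [1.0, 0.0], seen_semantics, seen_visual_means, k=2
    )
    print("chosen seen classes:", chosen, "weights:", w, "pseudo feature:", pseudo)
    print("fused:", fuse_features([[1.0, 3.0], [3.0, 1.0]], [1.0, 1.0]))
```

In a full pipeline, pseudo features like these would stand in for the missing training samples of unseen classes, so that an ordinary classifier such as an SVM can be trained on them, as the abstract describes.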
Keywords/Search Tags: Zero-Shot Learning, Dictionary Learning, Generative Adversarial Network, Image Classification