
Research On Zero-shot Learning Techniques And Their Applications In Semantic Segmentation

Posted on: 2021-08-13 | Degree: Master | Type: Thesis
Country: China | Candidate: Y D Wang | Full Text: PDF
GTID: 2518306512987709 | Subject: Computer technology
Abstract/Summary:
In recent years, thanks to the great potential of deep neural networks, object recognition models based on them have achieved great success, and some have even outperformed humans at recognizing objects. However, these models require large amounts of labeled training data. On the one hand, annotating that data costs a great deal of manpower and resources; on the other hand, a trained model fails when it meets a category that did not appear during training, so recognizing a new category usually means adding data for that category to the training set and retraining the model. To circumvent this issue, Zero Shot Learning (ZSL), which can recognize unseen categories without training on them, has attracted growing attention from researchers and is of both theoretical and practical significance. ZSL builds a bridge between seen and unseen classes through auxiliary intermediate information, just as humans use predefined descriptions to help recognize new objects. ZSL models therefore behave more like humans when recognizing new things, and they can be applied more flexibly.

However, existing ZSL methods still have many shortcomings, such as the hubness problem, the inconsistent manifold structures of the semantic and visual spaces, the domain shift problem, and the lack of descriptiveness and discriminability of the latent space. In addition, scene parsing has become one of the most fundamental and challenging tasks in computer vision, but in the era of big data more and more new categories appear in practical application scenarios, and traditional fully supervised models cannot adapt well to categories that never appeared in the training phase. On this basis, the major work and achievements of this thesis are as follows:

(1) To alleviate the hubness problem and the inconsistent manifold structures of the semantic and visual spaces, we propose a novel ZSL model, Asymmetric Graph-based Zero Shot Learning (AGZSL), which simultaneously preserves the class-level semantic manifold and the instance-level visual manifold in a latent space. To make the model more discriminative, we also constrain the latent space to be orthogonal: projected visual features and semantic embeddings are orthogonal in the latent space when they belong to different categories.

(2) To alleviate the projection domain shift problem and make the latent space more discriminative, we propose a novel method called Learning Discriminative Domain-Invariant Prototypes (DDIP). In DDIP, the target and source domains are combined and projected into a hyperspherical space, which is learned automatically by regularized dictionary learning. In addition, an orthogonal constraint is applied to the latent hyperspherical space so that all class prototypes, for both seen and unseen classes, are orthogonal to each other and therefore more discriminative.

(3) To fully capture the underlying cross-modal semantic consistency, without which latent representations become dissimilar and less discriminative, we propose a novel deep framework for image classification called Modality Independent Adversarial Network (MIAN), an end-to-end deep architecture with three submodules. First, both visual features and semantic descriptions are embedded into a latent hyperspherical space, where two orthogonal constraints ensure that the learned latent representations are discriminative. Second, a modal-adversarial submodule makes the latent representations indistinguishable across modalities, so that the shared representations capture more cross-modal high-level semantic information during training. Third, a cross-reconstruction submodule reconstructs each latent representation into its counterpart in the other modality, rather than into itself, so that the representations capture more modality-independent information.

(4) Inspired by traditional ZSL methods, which use auxiliary information to connect seen and unseen categories, we propose a novel method called Semantic Embedded Network (SCN) for Zero Shot Scene Parsing. SCN aims to learn a scene parsing model that is trained only on seen classes yet also works on unseen classes. In addition, because label semantics are embedded into traditional networks, SCN can further improve the performance of traditional fully supervised scene parsing methods.
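
The orthogonal constraint on a hyperspherical latent space, which recurs in contributions (1) to (3), can be illustrated with a minimal sketch. This is not the thesis's formulation, which the abstract does not give. It assumes that class prototypes are rows of a matrix normalized to unit length, that orthogonality is encouraged by penalizing the squared Frobenius distance between their Gram matrix and the identity, and that a sample is assigned to the prototype with the highest cosine similarity. All function names and toy values below are hypothetical.

```python
import numpy as np


def normalize_rows(x):
    """Scale each row to unit length, i.e. place it on the unit hypersphere."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def orthogonality_penalty(prototypes):
    """Squared Frobenius norm of (P P^T - I) for row-normalized prototypes P.

    Assumed penalty form: it is zero exactly when the unit-length
    prototypes are mutually orthogonal.
    """
    p = normalize_rows(prototypes)
    gram = p @ p.T
    return float(np.sum((gram - np.eye(p.shape[0])) ** 2))


def predict(latent_features, prototypes):
    """Assign each latent feature to the prototype with highest cosine similarity."""
    sims = normalize_rows(latent_features) @ normalize_rows(prototypes).T
    return np.argmax(sims, axis=1)


# Toy data (hypothetical): three class prototypes in a 4-dimensional latent space.
orthogonal = np.eye(3, 4)
overlapping = np.array([[1.0, 0.0, 0.0, 0.0],
                        [1.0, 0.1, 0.0, 0.0],
                        [0.0, 0.0, 1.0, 0.0]])
features = np.array([[0.9, 0.1, 0.0, 0.0],
                     [0.0, 0.0, 2.0, 0.1]])

print(orthogonality_penalty(orthogonal))
print(orthogonality_penalty(overlapping))
print(predict(features, orthogonal))
```

In this toy setting the penalty vanishes for the mutually orthogonal prototypes and is positive for the pair that nearly coincide, which is the sense in which such a constraint pushes class prototypes apart. In a ZSL setting the unseen-class prototypes would come from semantic embeddings rather than from labeled images.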
Keywords/Search Tags: Zero Shot Learning, Generalized Zero Shot Learning, Scene Parsing, Semantic Segmentation, Dictionary Learning, Orthogonal Constraint, Adversarial Learning