
Research On Compositional Zero-Shot Recognition Method Based On Visual And Semantic Embedding

Posted on: 2023-03-15
Degree: Master
Type: Thesis
Country: China
Candidate: R Pan
GTID: 2568306908453854
Subject: Circuits and Systems
Abstract/Summary:
With the rapid development of deep learning, supervised methods have achieved strong performance on a wide range of visual tasks in recent years. However, such methods rely on large numbers of labeled training samples. The learned classifier usually recognizes samples from categories seen during training well, but it often fails to transfer to samples from categories that were not seen during the training phase. To address learning without sufficient labeled data, Zero-Shot Learning (ZSL) has attracted wide attention. ZSL aims to train a neural network that transfers the knowledge learned from samples of seen classes to unseen classes, so that samples from unseen classes can be recognized. This thesis mainly studies Compositional Zero-Shot Learning (CZSL) in the setting of Generalized Zero-Shot Learning (GZSL), which is more realistic and more challenging: the learning system must not only learn information that transfers to unseen classes but also generalize to new data from seen classes. GZSL raises several challenging issues, such as the semantic gap caused by the mismatch between the semantic and visual spaces, and the tendency to predict test data as seen classes. To address these problems, this thesis carries out in-depth research, and its main contributions are as follows:

(1) We propose a novel learning model based on a Dual-Stream Contrastive Network (DSCN) for CZSL, a specific sub-problem of ZSL. Previous CZSL work has either focused on learning the dependency between objects and attributes from the visual primitives shared by seen compositions, or on disentangling objects from attributes to obtain independent representations. We argue that both properties are equally important for CZSL. A model that considers only the dependency may capture spurious correlations that overfit to seen compositions and degrade generalization, whereas a model that focuses only on disentangling the concepts sacrifices the holistic information of each image. We therefore propose a concept-level contrastive module that builds a dedicated prototype library, organized by the similarity between different concepts, to supply the samples for contrastive learning and thereby improve the discriminability of the independent concept representations. In addition, an instance-level contrastive module learns a compositional representation in a self-supervised manner. It mines the relevance specific to each instance by using more confusable composition samples, which further improves the transferability of the model. Extensive experiments on two popular datasets, MIT-States and UT-Zappos, demonstrate that the proposed method outperforms state-of-the-art approaches by a large margin.

(2) We propose a novel learning model based on visual feature reconstruction and semantic enhancement. To further improve the model's ability to transfer to unseen classes and the discriminability of the learned attribute and object representations, this thesis improves existing methods in both the visual and the semantic aspects. On the visual side, we introduce a visual feature decomposition and reconstruction module that reconstructs the complete visual feature from the decomposed, independent object and attribute representations, and we use a new combination strategy to increase the diversity of training samples and improve the generalization of the model. On the semantic side, we use a compositional graph to propagate the label embeddings of the attributes, objects, and compositions in the dataset to one another, which enriches the semantic information of each concept and promotes the transfer of semantic knowledge from seen classes to unseen classes. Extensive experiments on two benchmark datasets demonstrate the effectiveness of the proposed approach, which outperforms state-of-the-art CZSL methods.
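The concept-level contrastive module in (1) can be illustrated with a minimal InfoNCE sketch. The abstract only says the prototype library is built "according to the similarity of different concepts", so the following assumes one prototype vector per concept and takes the k prototypes most similar to the positive one as hard negatives. The function name `concept_contrastive_loss`, the parameters `k` and `tau`, and the toy prototypes are hypothetical, not taken from the thesis.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)

def concept_contrastive_loss(feature, label, prototypes, k=2, tau=0.1):
    """InfoNCE loss of one concept feature against a prototype library.

    prototypes maps concept name -> prototype vector (hypothetical structure).
    The positive is the prototype of `label`; the negatives are the k other
    prototypes most similar to it, i.e. the concepts easiest to confuse.
    """
    positive = prototypes[label]
    others = [c for c in prototypes if c != label]
    # Assumed hard-negative selection by prototype similarity
    # (Python's sort is stable, so ties keep their original order).
    others.sort(key=lambda c: cosine(prototypes[c], positive), reverse=True)
    negatives = others[:k]
    logits = [cosine(feature, positive) / tau]
    logits += [cosine(feature, prototypes[c]) / tau for c in negatives]
    # Numerically stable negative log-softmax of the positive logit.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[0], negatives

# Toy attribute prototypes (made up for illustration).
prototypes = {"red": [1.0, 0.0, 0.0], "orange": [0.9, 0.1, 0.0],
              "wet": [0.0, 1.0, 0.0], "old": [0.0, 0.0, 1.0]}
loss_near, negatives = concept_contrastive_loss([1.0, 0.0, 0.0], "red", prototypes)
loss_far, _ = concept_contrastive_loss([0.0, 1.0, 0.0], "red", prototypes)
print(negatives)            # ['orange', 'wet']
print(loss_near < loss_far)  # True
```

Choosing "orange" as a negative for "red" forces the representation to separate visually close concepts, which is one plausible reading of how a similarity-organized library improves discriminability.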
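The instance-level module in (1) is described as self-supervised and as relying on "more confusable samples of compositions". A minimal sketch, assuming that a confusable composition is one sharing exactly one primitive with the anchor (same attribute or same object, but not both) and that the positive is a second augmented view of the same image. Both assumptions, and the names `confusable_indices` and `instance_contrastive_loss`, are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)

def confusable_indices(anchor_pair, batch_pairs):
    """Batch indices whose (attribute, object) pair shares exactly one
    primitive with the anchor, e.g. 'wet cat' or 'dry dog' for 'wet dog'."""
    attr, obj = anchor_pair
    return [i for i, (a, o) in enumerate(batch_pairs) if (a == attr) != (o == obj)]

def instance_contrastive_loss(anchor_feat, positive_feat, negative_feats, tau=0.1):
    """InfoNCE: pull the anchor toward another view of the same instance and
    push it away from the confusable compositions."""
    logits = [cosine(anchor_feat, positive_feat) / tau]
    logits += [cosine(anchor_feat, n) / tau for n in negative_feats]
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[0]

pairs = [("wet", "dog"), ("wet", "cat"), ("dry", "dog"), ("old", "car"), ("wet", "dog")]
feats = [[1.0, 0.0], [0.6, 0.8], [0.8, -0.6], [0.0, 1.0], [1.0, 0.1]]
idx = confusable_indices(pairs[0], pairs)
augmented_view = [0.98, 0.05]  # hypothetical second view of sample 0
loss = instance_contrastive_loss(feats[0], augmented_view, [feats[i] for i in idx])
print(idx)  # [1, 2]
```

Samples with the identical pair (index 4) and unrelated pairs (index 3) are excluded, so the negatives are only the compositions a model is most likely to mix up with the anchor.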
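The decomposition, reconstruction, and recombination steps in (2) can be sketched as follows. The abstract does not describe the composer network or the exact combination strategy, so this assumes an element-wise sum as a stand-in composer, a mean-squared-error reconstruction loss, and a recombination rule that cross-pairs attribute and object features from different samples to form label pairs not already in the batch. `compose`, `reconstruction_loss`, and `recombine` are hypothetical names.

```python
def compose(attr_feat, obj_feat):
    """Hypothetical composer: element-wise sum of the two primitive features.
    A learned network would normally play this role."""
    return [a + o for a, o in zip(attr_feat, obj_feat)]

def reconstruction_loss(visual_feat, attr_feat, obj_feat):
    """Mean squared error between the original visual feature and the
    feature rebuilt from its decomposed attribute and object parts."""
    rebuilt = compose(attr_feat, obj_feat)
    return sum((r - v) ** 2 for r, v in zip(rebuilt, visual_feat)) / len(visual_feat)

def recombine(samples):
    """Cross-pair attribute and object features from different samples.

    samples: list of (attr_label, obj_label, attr_feat, obj_feat).
    Returns virtual (attr_label, obj_label, composed_feat) triples for
    label pairs that are not already present in the batch.
    """
    present = {(s[0], s[1]) for s in samples}
    virtual = []
    for attr_label, _, attr_feat, _ in samples:
        for _, obj_label, _, obj_feat in samples:
            if (attr_label, obj_label) not in present:
                present.add((attr_label, obj_label))  # avoid duplicates
                virtual.append((attr_label, obj_label, compose(attr_feat, obj_feat)))
    return virtual

batch = [("wet", "dog", [1.0, 0.0], [0.0, 2.0]),
         ("old", "car", [0.5, 0.5], [3.0, 0.0])]
print(reconstruction_loss([1.0, 2.0], [1.0, 0.0], [0.0, 2.0]))  # 0.0
print(recombine(batch))  # [('wet', 'car', [4.0, 0.0]), ('old', 'dog', [0.5, 2.5])]
```

The reconstruction term keeps the decomposed parts faithful to the original image feature, while the virtual "wet car" and "old dog" samples add label pairs the batch did not contain, which is one way to increase training diversity.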
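The compositional graph in (2) propagates label embeddings among attributes, objects, and compositions. Given the "Graph Convolution Network" keyword, a minimal sketch assumes a standard graph-convolution layer, ReLU(D^-1/2 (A + I) D^-1/2 H W), on a graph where each composition node is linked to its attribute node and its object node. The graph construction and the names `build_compositional_graph` and `gcn_layer` are assumptions for illustration, not the thesis's implementation.

```python
import math

def build_compositional_graph(attrs, objs, pairs):
    """Nodes: attributes, objects, and compositions; each composition node
    is linked to its attribute and its object. Self-loops are included."""
    nodes = [("attr", a) for a in attrs] + [("obj", o) for o in objs]
    nodes += [("pair", a, o) for a, o in pairs]
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for a, o in pairs:
        c = index[("pair", a, o)]
        for p in (index[("attr", a)], index[("obj", o)]):
            adj[c][p] = adj[p][c] = 1.0
    return nodes, adj

def gcn_layer(adj, h, w):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    # Symmetrically normalized aggregation of neighbour embeddings.
    agg = [[sum(adj[i][j] / math.sqrt(deg[i] * deg[j]) * h[j][k] for j in range(n))
            for k in range(len(h[0]))] for i in range(n)]
    # Linear transform by the weight matrix, then ReLU.
    return [[max(0.0, sum(row[k] * w[k][m] for k in range(len(w))))
             for m in range(len(w[0]))] for row in agg]

nodes, adj = build_compositional_graph(["wet"], ["dog"], [("wet", "dog")])
h = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # the composition node starts at zero
identity = [[1.0, 0.0], [0.0, 1.0]]
out = gcn_layer(adj, h, identity)
print(out[2])  # both entries equal 1/sqrt(6), about 0.408
```

After one layer, the "wet dog" node receives equal contributions from "wet" and "dog", which shows how semantic information flows from primitives to compositions; in a trained model `w` would be learned rather than the identity.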
Keywords/Search Tags: Zero-Shot Learning, Contrastive Learning, Compositional Zero-Shot Learning, Self-Supervised Learning, Graph Convolution Network