Zero Shot Image Classification Based On Semantic Guidance

Posted on: 2023-04-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y Tian
Full Text: PDF
GTID: 2568306827475404
Subject: Software engineering
Abstract/Summary:
Training a high-performance supervised model requires a large amount of high-quality annotated data, and the resulting model can only identify the categories seen during training. Zero-shot learning transfers the knowledge learned by an existing model to unseen categories, which addresses the lack of labeled data and the low model reusability of supervised learning. Because no images of the unseen classes are available, semantic information is introduced as ground truth so that unseen classes can be classified correctly. However, this introduces the problems of domain shift and modal gap. Focusing on these two problems, this thesis proposes two zero-shot learning methods based on semantic guidance. The specific research contents are as follows:

(1) To address the domain shift caused by the poor generalization of the learned seen-class-related attributes, a zero-shot image classification method based on a discriminative attribute space is proposed. The method introduces two attribute enhancement strategies and an adaptive margin loss to make the unseen-class attributes more discriminative. In the first strategy, visual features are weighted and combined to expand the training set, so that the model learns the attributes relevant to the unseen classes more strongly. In the second, a relevant-sample retraining module fine-tunes the model at the testing stage on seen-class samples whose visual features are similar to those of the unseen-class samples, which increases the weight of unseen-class-related attributes in the model's predictions. The adaptive margin loss function improves the generalization of the model by optimizing its discriminativeness and transferability, thereby improving the discriminability of the unseen-class attribute space. The proposed method proves effective and advanced on multiple public datasets.

(2) To address the modal gap between semantic embeddings and visual features, a zero-shot image classification method based on visual-semantic feature decoupling is proposed. The method decouples the rich information contained in an image into visual and semantic modal information and fully exploits the representation capability of each in its own modality. First, two feature extractors with the same structure but unshared parameters extract visual features and semantic features, respectively. Second, the attribute correction module produces modified attribute labels by weighted fusion of the attribute labels and the semantic features predicted for the seen classes; the modified attributes are then used as labels to supervise the model so that it generates more discriminative visual features. Finally, at the testing stage, the visual information in the semantic features and the semantic information in the visual features are discarded, and the remaining information is mixed to obtain more discriminative mixed features. Compared with similar methods, the proposed method achieves the best classification accuracy on multiple public datasets.
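The abstract does not give formulas, so the following is only a rough sketch of two ideas it mentions: weighted fusion of attribute labels with predicted semantic information, and classifying an image by comparing its predicted attributes with per-class semantic descriptions. The function names, the fusion weight `alpha`, the cosine-similarity rule, and the toy attribute vectors are all assumptions made for illustration; they are not taken from the thesis.

```python
import math


def fuse_attributes(label_attrs, predicted_attrs, alpha=0.5):
    """Weighted fusion of two attribute vectors.

    alpha is a hypothetical fusion weight; the thesis does not state one.
    """
    if len(label_attrs) != len(predicted_attrs):
        raise ValueError("attribute vectors must have the same length")
    return [alpha * a + (1.0 - alpha) * p
            for a, p in zip(label_attrs, predicted_attrs)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(predicted_attrs, class_attrs):
    """Return the class whose attribute vector is most similar to the prediction.

    Nearest-prototype matching is an assumed decision rule, used here only
    to illustrate how semantic information can stand in for unseen-class images.
    """
    return max(class_attrs,
               key=lambda name: cosine(predicted_attrs, class_attrs[name]))


# Toy attribute vectors (striped, lives in water, has hooves); purely illustrative.
unseen_classes = {
    "zebra": [1.0, 0.0, 1.0],
    "dolphin": [0.0, 1.0, 0.0],
}
predicted = [0.8, 0.1, 0.7]
print(classify(predicted, unseen_classes))  # prints "zebra"
```

In this sketch no image of a zebra or dolphin is needed at training time; only their attribute vectors are required at test time, which is the role the abstract assigns to semantic information.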
Keywords/Search Tags:zero shot learning, image classification, data augmentation, feature decoupling