
Research On Zero-Shot Image Classification Based On Deep Feature Representation

Posted on: 2023-07-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S M Chen
Full Text: PDF
GTID: 1528307172953699
Subject: Information and Communication Engineering
Abstract/Summary:
Image classification is a fundamental task in computer vision. In recent years, image classification has made great progress by using deep learning for visual feature representation (i.e., deep feature representation). However, conventional image classification can only classify images of seen classes and fails to recognize the thousands of unknown objects. Guided by semantic information (e.g., attributes), zero-shot image classification (ZSIC) can classify images of unseen classes by transferring knowledge from seen classes, and it has become a popular research topic in artificial intelligence. To release the potential of ZSIC based on deep feature representations, this thesis focuses on three challenges: (1) cross-dataset bias; (2) the inconsistency between visual and semantic feature representations; and (3) heterogeneous feature alignment between the visual and semantic domains. The main contributions and innovations of this thesis are as follows:

(1) To tackle cross-dataset bias in ZSIC based on deep feature representations, this thesis proposes two visual feature enhancement models, one for embedding-based ZSIC methods and one for generative ZSIC methods. For embedding-based ZSIC, a graph-guided dual attention network is introduced to fuse local visual features with explicit global visual features, thereby enhancing the visual features. For generative ZSIC, a feature refinement learning mechanism is proposed to enhance the visual features and encourage the generator to synthesize realistic visual features for unseen classes. The experimental results show that the proposed methods effectively improve the discriminability and transferability of visual features, which enables effective interaction between visual and semantic features and yields significant performance gains.

(2) To tackle the inconsistency between semantic and visual feature representations in ZSIC based on deep feature representations, this thesis proposes ZSIC methods based on the key common semantic knowledge between visual and attribute features. First, an attribute-guided Transformer network employs cross-attention to learn visual features with accurate attribute localization, which represent the key common semantic knowledge. Then, a mutually semantic distillation network uses bidirectional attention sub-nets to learn attribute-based visual features and visual-based attribute features. Guided by mutually semantic distillation learning, the two sub-nets learn consistent semantic features. Finally, the two networks are integrated into a unified framework to fully and precisely discover the key common semantic knowledge between visual and attribute features, which improves the semantic consistency between visual and semantic features. As a result, the method transfers semantic knowledge effectively from seen classes to unseen ones for ZSIC.

(3) To tackle heterogeneous feature alignment between the visual and semantic domains in ZSIC based on deep feature representations, this thesis proposes a ZSIC method based on hierarchical semantic-visual adaptation. Unlike existing one-step adaptation methods, which focus on aligning the feature distributions of the visual and semantic domains, this method uses hierarchical adaptation to learn an intrinsic common space for semantic and visual feature representations by applying structure adaptation and distribution adaptation in sequence. In this way, the proposed method achieves a true alignment of visual and semantic features and improves the classification performance of ZSIC based on common space learning.

This thesis conducts extensive experiments to demonstrate the effectiveness of the proposed methods, which achieve state-of-the-art performance on several popular benchmark datasets.
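
As a rough illustration of the general idea behind attribute-guided ZSIC summarized above (and not of the specific networks proposed in the thesis), the following Python sketch maps a visual feature into attribute space and assigns the unseen class whose attribute vector is most similar. The function names, the toy projection matrix, and the class attribute vectors are all hypothetical; in practice the projection would be learned on seen classes, and cosine similarity is only one possible compatibility score.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def project(visual_feature, weights):
    """Linearly map a visual feature into attribute space.

    `weights` holds one row per attribute. It is assumed to have been
    learned on seen classes; here it is just a fixed toy matrix.
    """
    return [sum(w * x for w, x in zip(row, visual_feature)) for row in weights]


def classify_zero_shot(visual_feature, weights, class_attributes):
    """Return the best-matching unseen class and the score of every class."""
    predicted = project(visual_feature, weights)
    scores = {
        name: cosine(predicted, attrs) for name, attrs in class_attributes.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy setup: 3-dimensional visual features, 3 attributes
# (say striped, four-legged, aquatic), and an identity projection.
TOY_WEIGHTS = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
UNSEEN_CLASS_ATTRIBUTES = {
    "zebra": [1.0, 1.0, 0.0],
    "whale": [0.0, 0.0, 1.0],
}

label, scores = classify_zero_shot([0.9, 0.8, 0.1], TOY_WEIGHTS, UNSEEN_CLASS_ATTRIBUTES)
print(label, scores)
```

No image of either toy class is needed at training time: the unseen classes enter only through their attribute vectors, which is what allows knowledge learned on seen classes to transfer.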
Keywords/Search Tags:Zero-shot learning, Zero-shot image classification, Deep feature representations, Knowledge transfer, Feature enhancement