
Zero-Shot Retrieval Based on Cross-Modal Semantic Guidance

Posted on: 2019-08-23    Degree: Master    Type: Thesis
Country: China    Candidate: Y X Sun    Full Text: PDF
GTID: 2428330623462500    Subject: Information and Communication Engineering
Abstract/Summary:
With the explosive growth of multimedia data, fast and accurate image retrieval has become an urgent issue. However, typical image retrieval methods are trained on massive labeled data, and acquiring labels is time-consuming and labor-intensive. To address this problem, zero-shot retrieval has been proposed to handle image retrieval over "unseen" categories. In this work, we devise an effective strategy for zero-shot hashing using a semantic Softmax loss (SSL) and an attribute-guided network (AgNet). Different from typical zero-shot learning (ZSL) methods, which integrate the visual features and the class semantic descriptors into a multimodal framework with a linear or bilinear model, we propose a nonlinear approach that casts ZSL as a multi-class classification problem via a semantic Softmax loss, embedding the class semantic descriptors into the Softmax layer of a multi-class classification network. To narrow the structural differences between the visual features and the semantic descriptors, we further apply an L2 normalization constraint to the differences between the visual features and the visual prototypes reconstructed from the semantic descriptors. Results on three benchmark datasets (AwA, CUB, and SUN) demonstrate that the proposed approach steadily boosts performance and achieves state-of-the-art results for both zero-shot classification and zero-shot retrieval.

The combination of zero-shot retrieval and hashing is another topic of this work. Existing efforts in zero-shot hashing mainly focus on single-modal retrieval tasks, especially Image-Based Image Retrieval (IBIR). However, as a highlighted research topic in the field of hashing, cross-modal retrieval is more common in real-world applications. To address the Cross-Modal Zero-Shot Hashing (CMZSH) retrieval task, we propose a novel Attribute-Guided Network (AgNet), which aligns data from different modalities in a higher-level semantic space, i.e., the attribute space. In addition, category similarity is utilized to construct the relationships between different modalities, while attribute similarity is introduced to regularize the distance between similar categories within a single modality. Experimental results on both cross-modal and single-modal retrieval tasks demonstrate the superiority of the proposed approach. To further evaluate the performance of AgNet in each category, a confusion matrix and t-SNE are used to visualize the neighbor relationships between the textual and visual hash codes produced by AgNet.
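The core idea of the semantic Softmax loss, scoring an image against every class's semantic descriptor rather than against learned per-class weights, can be sketched as follows. This is a minimal illustrative sketch, not the thesis's actual implementation: the projection matrix `W`, the descriptor values, and the simple squared-L2 penalty are all assumptions for demonstration.

```python
import numpy as np

def semantic_softmax_loss(visual_feat, class_descriptors, label, W):
    """Cross-entropy where the classifier "weights" are the fixed class
    semantic descriptors, so unseen classes can be scored at test time.

    visual_feat:       (d_vis,)      feature of one image
    class_descriptors: (C, d_sem)    one semantic descriptor per class
    label:             int           ground-truth class index
    W:                 (d_sem, d_vis) learned visual-to-semantic projection
    """
    projected = W @ visual_feat                    # embed image in semantic space
    logits = class_descriptors @ projected         # compatibility with each class
    logits = logits - logits.max()                 # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()  # softmax over classes
    return -np.log(probs[label])

def l2_reconstruction_penalty(visual_feat, prototype):
    """Squared-L2 constraint between a visual feature and the visual
    prototype reconstructed from its class's semantic descriptor
    (a simplified stand-in for the thesis's normalization constraint)."""
    return float(np.sum((visual_feat - prototype) ** 2))
```

Because the descriptors, not learned weight vectors, occupy the Softmax layer, adding an unseen class at test time only requires appending its descriptor row.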
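The cross-modal hashing step, mapping each modality into a shared attribute space and binarizing so that images and text can be compared by Hamming distance, might be sketched as below. The linear projections and sign-based binarization here are illustrative assumptions; the thesis's AgNet learns these mappings with a deep network.

```python
import numpy as np

def attribute_hash(features, projection):
    """Map modality-specific features into the shared attribute space,
    then binarize with sign() to obtain hash codes in {-1, +1}."""
    attributes = features @ projection              # (N, d_attr)
    return np.where(attributes >= 0, 1, -1)

def hamming_rank(query_code, gallery_codes):
    """Rank gallery items by Hamming distance to the query code."""
    distances = (gallery_codes != query_code).sum(axis=1)
    return np.argsort(distances, kind="stable")
```

Since both modalities land in the same attribute space before binarization, a text query can rank image codes (and vice versa) with nothing more than bitwise comparison.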
Keywords/Search Tags:Zero-Shot Learning, Cross-Modal Hashing, Image Retrieval, Attribute-Guided, Semantic Embedding