
Zero-Shot Sketch-Based Image Retrieval Via Deep Semantic Information Mining

Posted on: 2021-08-08
Degree: Master
Type: Thesis
Country: China
Candidate: X X Xu
Full Text: PDF
GTID: 2518306047984119
Subject: Master of Engineering

Abstract/Summary:
With the explosive growth of image content on the Internet, retrieving targeted images from large-scale collections has become a focus of attention. Conventional image retrieval requires textual descriptions, but obtaining them to train a retrieval model costs considerable manpower and resources. In recent years, with the spread of mobile intelligent devices, people can quickly draw sketches that describe the features of objects simply and clearly. Sketch-Based Image Retrieval (SBIR) has therefore aroused extensive research interest. In addition, it is difficult to ensure that all object classes are seen during training, so Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR), a more practical and challenging problem, has become an important research topic in computer vision. ZS-SBIR faces two main difficulties: the semantic gap in cross-modal retrieval and knowledge transfer in Zero-Shot Learning (ZSL). The key to solving both is mining the semantic information common to images and sketches. This thesis therefore focuses on mining that semantic information, so that the model can generate retrieval features carrying valid semantics. The main contributions of this thesis are as follows:

1. To mine the common semantic information of the two modalities during learning, this thesis proposes a ZS-SBIR method based on semantic cross-modal reconstruction. First, two word-embedding models are used to extract word vectors. To ensure that the retrieval features preserve semantic information, a progressive generation strategy is adopted: the model first maps visual features into a semantic space aligned with the word vectors to obtain semantic features, and then maps the semantic features into a low-dimensional common space for efficient retrieval. When generating semantic features, the thesis adopts the idea of a generative adversarial network, taking the word vectors as real samples to constrain the semantic features, so that the learned semantic features approximate the distribution of the word vectors as closely as possible. To mine semantic information further, the thesis proposes a semantic cross-modal reconstruction loss, which requires that the semantic features generated from either modality can be reconstructed back to the original sketch and image visual features. Experiments on Sketchy and TU-Berlin demonstrate the effectiveness of the proposed method.

2. To make the learned semantic features more complete and pure, this thesis proposes a ZS-SBIR method based on visual feature decomposition. In practice, a visual feature contains both semantic information and domain information, and the domain information interferes with the retrieval task. A visual-feature decomposition module is therefore adopted to split visual features into retrieval features containing only semantic information and domain features containing only domain information; the retrieval features are again generated progressively. To mine complete and pure semantic information further, the thesis proposes a cross-combination reconstruction loss, which requires that the decomposed retrieval and domain features can be recombined, in different combinations, to reconstruct the original sketch and image visual features. Experiments on Sketchy and TU-Berlin demonstrate the effectiveness of the proposed method.
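The cross-combination reconstruction idea in the second contribution can be sketched in a few lines of NumPy. Everything here is an illustrative assumption rather than the thesis's actual architecture: the encoders and decoder are plain random linear maps, the dimensionalities (512-d visual, 300-d semantic, 64-d domain) are invented, and the loss is an unweighted sum of mean-squared errors. The sketch only shows the structure of the loss: each modality's semantic part, recombined with either modality's domain part, must reconstruct the visual feature of the domain it is paired with.

```python
import numpy as np

rng = np.random.default_rng(0)

D_VIS, D_SEM, D_DOM = 512, 300, 64  # assumed feature dimensionalities

# Hypothetical linear "encoders": split a visual feature into a
# retrieval (semantic-only) part and a domain-only part.
W_sem = rng.normal(scale=0.01, size=(D_VIS, D_SEM))
W_dom = rng.normal(scale=0.01, size=(D_VIS, D_DOM))
# Hypothetical "decoder": rebuild a visual feature from the
# concatenated (semantic, domain) pair.
W_dec = rng.normal(scale=0.01, size=(D_SEM + D_DOM, D_VIS))

def decompose(v):
    """Decompose visual features into (semantic, domain) components."""
    return v @ W_sem, v @ W_dom

def reconstruct(sem, dom):
    """Reconstruct a visual feature from a (semantic, domain) pair."""
    return np.concatenate([sem, dom], axis=-1) @ W_dec

def mse(a, b):
    return float(np.mean((a - b) ** 2))

# One sketch visual feature and one image visual feature (batch of 1).
v_sk = rng.normal(size=(1, D_VIS))
v_im = rng.normal(size=(1, D_VIS))

s_sk, d_sk = decompose(v_sk)  # sketch: semantic + domain parts
s_im, d_im = decompose(v_im)  # image:  semantic + domain parts

# Cross-combination reconstruction loss: the two same-modality pairs
# and the two cross-modality pairs must each rebuild the visual
# feature belonging to the *domain* part of the combination.
loss = (mse(reconstruct(s_sk, d_sk), v_sk)    # sketch sem + sketch dom
        + mse(reconstruct(s_im, d_im), v_im)  # image sem  + image dom
        + mse(reconstruct(s_sk, d_im), v_im)  # sketch sem + image dom
        + mse(reconstruct(s_im, d_sk), v_sk)) # image sem  + sketch dom
```

In training, minimizing the two cross terms is what forces the semantic parts of the two modalities to become interchangeable, since a sketch's semantic features must suffice to rebuild an image-domain feature and vice versa; the domain parts then absorb the modality-specific information that would otherwise interfere with retrieval.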
Keywords/Search Tags:Sketch-Based Image Retrieval, Zero-Shot Learning, Deep Learning, Semantic Mining, Generative Adversarial Network