
Cross-modal Retrieval For Free-hand Sketch

Posted on: 2021-04-22  Degree: Master  Type: Thesis
Country: China  Candidate: J Y Xue  Full Text: PDF
GTID: 2428330632962937  Subject: Electronic and communication engineering
Abstract/Summary:
In recent years, cross-modal retrieval for free-hand sketches has become a popular research area. This thesis focuses on fine-grained cross-modal retrieval between free-hand sketches and natural images, i.e., the fine-grained sketch-based image retrieval (FG-SBIR) task. A query sketch is abstract and ambiguous, while the retrieval targets are ordinary natural images, so a domain gap exists between the two modalities. The key problem of FG-SBIR is therefore to build a bridge between the two modalities that eliminates this gap: visual features must be extracted from both sketches and natural images and embedded into a common embedding space. Accordingly, the FG-SBIR task poses two main challenges: (1) the abstraction of sketches makes it difficult to extract effective visual features from sketches and natural images; (2) a common embedding space suitable for cross-modal information must be constructed.

This thesis is dedicated to solving these two challenges. By analyzing the characteristics of sketches and the difficulties of the cross-modal retrieval task, corresponding solutions are proposed in a novel FG-SBIR model: (1) analysis of sketch data shows that sketches are abstract and sparse; targeting these two characteristics, an attention mechanism is introduced so that the model extracts more effective visual features. (2) Further study shows that existing models focus only on features extracted from the final fully-connected (FC) layer and ignore features from intermediate layers, which are rich in low-level visual information; this thesis therefore fuses the intermediate-layer features with the final FC-layer features to build the common embedding space. (3) To better exploit the intermediate-layer features, a multiple triplet ranking model is proposed, which introduces an auxiliary supervised loss on the intermediate layer to obtain more effective features. Finally, a novel distance metric is proposed to further improve the model's performance.

Extensive experiments are performed on three public fine-grained sketch-image retrieval datasets: QMUL-Shoe, QMUL-Chair, and QMUL-Handbag. The experimental results show that the proposed method outperforms state-of-the-art methods, and comparative experiments verify the effectiveness of each module in the model.
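As a rough illustration of the multiple triplet ranking idea summarized above, the sketch below combines a final-layer triplet loss with an auxiliary triplet loss on intermediate-layer features. All function names, the concatenation-based fusion, and the auxiliary weight are illustrative assumptions; the abstract does not specify the thesis's actual implementation details.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fuse_features(mid_feat, fc_feat):
    """Fuse intermediate-layer and final FC-layer features.
    Concatenation is an assumed fusion operator; the abstract
    only states that the two feature levels are fused."""
    return list(mid_feat) + list(fc_feat)

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Standard triplet ranking loss: pull the matching photo
    (positive) closer to the sketch (anchor) than the
    non-matching photo (negative), by at least `margin`."""
    d_pos = l2_distance(anchor, positive)
    d_neg = l2_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

def multiple_triplet_loss(final_triplet, mid_triplet,
                          aux_weight=0.5, margin=0.3):
    """Final-layer triplet loss plus an auxiliary supervised
    triplet loss on intermediate-layer features; `aux_weight`
    is an illustrative hyperparameter."""
    main = triplet_loss(*final_triplet, margin=margin)
    aux = triplet_loss(*mid_triplet, margin=margin)
    return main + aux_weight * aux
```

In practice the three feature vectors in each triplet would come from a shared CNN branch applied to the sketch, its matching photo, and a non-matching photo; here plain lists stand in for those embeddings.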
Keywords/Search Tags:Cross-modal, Fine-grained retrieval, Free-hand sketch, Embedding space, Visual feature