Free-hand Sketch Based Visual Retrieval Study

Posted on: 2020-06-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P Xu
Full Text: PDF
GTID: 1368330572472160
Subject: Information and Communication Engineering
Abstract/Summary:
This thesis studies sketch-based visual retrieval, mainly including fast retrieval for large-scale free-hand sketches, fine-grained sketch-based image retrieval, and fine-grained sketch-based video retrieval. The contributions of this thesis can be summarized as follows.

Firstly, the third chapter defines a novel topic, fast retrieval for large-scale free-hand sketches, and studies several intrinsic data traits of large-scale free-hand sketches using million-scale sketches as a test-bed. The author proposes a deep hashing network whose major innovations are: (1) A two-branch architecture of CNN and RNN is used for sketch feature learning and representation, with the CNN extracting abstract visual concepts and the RNN modeling the temporal order of human sketching. (2) A novel sketch hashing loss is proposed that suppresses the impact of noisy samples during network training, alleviating the intrinsic abstraction and noise problems of large-scale sketches. This loss supervises the model to learn a feature space with category-level cohesiveness. Moreover, the proposed two-branch architecture can also be applied to large-scale sketch recognition. Another novel research problem is also defined in the third chapter, namely zero-shot classification for large-scale sketches. The author proposes a deep embedding model to solve this challenging problem, which uses category-level semantic vectors extracted from edge-maps to conduct domain alignment. To obtain high-quality edge-map based semantic vectors, a large-scale edge-map dataset is collected, covering 290,281 edge-maps and 345 categories.

Secondly, the fourth chapter explores cross-modal subspace learning for sketch-based image retrieval, and introduces a variety of classical cross-modal subspace learning methods that have been successfully applied to cross-modal matching between images and texts. These methods are then applied to the mutual retrieval between sketches and photos, and detailed experimental results and analysis are provided. Based on the comparison experiments, the key elements that need to be considered when modeling sketches and photos across modalities are discussed. The experiments also verify the feasibility of applying cross-modal subspace learning to cross-modal matching between sketches and photos.

Thirdly, the fifth chapter defines a challenging problem: fine-grained sketch-based instance-level video retrieval, in which a single sketch or a sequence of multiple sketches is used as a query to retrieve the corresponding video instance. In this scenario, a sketch contains both fine-grained visual appearance information and fine-grained motion information, and fine-grained motion trajectories are denoted by arrowed straight lines, curves, and circles. To investigate this problem, the author collects the first fine-grained sketch-based video retrieval dataset, containing 1448 sketches and 528 video clips with rich manual annotations. A multi-stream multi-modality neural network is proposed, which uses the idea of meta-learning to address the scarcity of training samples, and achieves good experimental results. The proposed network can be trained not only under a strong supervision strategy, but also under a weak supervision strategy based on a multi-instance learning framework.
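
As a rough illustration of why hashing suits fast large-scale retrieval, the sketch below binarizes real-valued feature vectors into compact codes and ranks a gallery by Hamming distance. It is a generic example, not the author's deep hashing network: the sign-based binarization, the function names (binarize, hamming_distance, retrieve), and the 8-dimensional toy features are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def binarize(features: Sequence[float]) -> int:
    """Pack the signs of a real-valued feature vector into an integer hash code.

    Assumption: positive values map to bit 1, everything else to bit 0.
    """
    code = 0
    for value in features:
        code = (code << 1) | (1 if value > 0 else 0)
    return code


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hash codes differ."""
    return bin(a ^ b).count("1")


def retrieve(query: int, gallery: List[int], top_k: int = 3) -> List[Tuple[int, int]]:
    """Return (gallery index, distance) pairs for the top_k closest codes."""
    scored = [(idx, hamming_distance(query, code)) for idx, code in enumerate(gallery)]
    # Sort by distance first, then by index so that ties are deterministic.
    scored.sort(key=lambda pair: (pair[1], pair[0]))
    return scored[:top_k]


# Hypothetical 8-dimensional features standing in for learned sketch features.
gallery_features = [
    [0.9, -0.2, 0.4, -0.7, 0.1, 0.3, -0.5, 0.8],
    [-0.6, 0.5, -0.3, 0.2, -0.9, -0.1, 0.7, -0.4],
    [0.8, -0.1, 0.5, -0.6, -0.2, 0.4, -0.3, 0.6],
]
gallery_codes = [binarize(f) for f in gallery_features]
query_code = binarize([0.7, -0.3, 0.6, -0.5, 0.2, 0.1, -0.4, 0.9])

results = retrieve(query_code, gallery_codes, top_k=2)
print(results)
```

Because each comparison reduces to an XOR and a bit count, a query can be matched against a very large gallery far more cheaply than with floating-point distances, which is the general motivation for hashing-based retrieval.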
Keywords/Search Tags:Free-Hand Sketch, Retrieval, Hashing Retrieval, Image Retrieval, Video Retrieval, Cross-Modal Retrieval