
Fine-Grained Sketch-Based Image Retrieval

Posted on: 2020-05-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: K Li
Full Text: PDF
GTID: 1368330575456576
Subject: Information and Communication Engineering
Abstract/Summary:
Traditional sketch-based image retrieval (SBIR) mainly focuses on category-level retrieval, where intra-category variations are neglected. This is not ideal: given a specific shoe sketch (e.g., high-heel, open-toe) as a query, such a system can return any shoe, including those with different part semantics (e.g., a flat running shoe). Fine-grained sketch-based image retrieval (FG-SBIR) is therefore emerging as a way to go beyond conventional category-level SBIR and fully exploit the detail that can be conveyed in sketches. By providing a mode of interaction that is more expressive than the ubiquitous browsing of textual categories, FG-SBIR is more likely to underpin practical commercial adoption of SBIR technology.

We study the problem of fine-grained sketch-based image retrieval. By performing instance-level (rather than category-level) retrieval, it embodies a timely and practical application. Three factors contribute to the challenging nature of the problem: (1) free-hand sketches are inherently abstract and iconic, making visual comparisons with photos difficult; (2) sketches and photos belong to two different visual domains, i.e., black-and-white lines vs. colour pixels; and (3) fine-grained distinctions are especially challenging when made across both domain and abstraction level.

To bridge the cross-domain gap between sketch and image: (a) we proposed fine-grained attributes and a part-aware method to predict them, which decorrelates semantic visual attributes; (b) since existing cross-domain alignment methods focus on the category level and are ill-suited to fine-grained tasks, we proposed a synergistic instance-level subspace alignment that exploits both subspace and instance-level cues. Existing FG-SBIR models aim to learn an embedding space in which sketch and photo can be directly compared. While successful, the learned embedding space is domain-specific, so these models do not generalise well across categories, which limits the practical applicability of FG-SBIR. Hence, (c) we proposed a generalising FG-SBIR method that improves performance on unseen categories; and (d) we proposed a deep universal sketch perceptual grouper, which can be applied to the edgemaps of photos to synthesise human-like sketches and retrain the FG-SBIR model, so that it can generalise to novel categories without free-hand sketches.

The hardest challenge in FG-SBIR is the semantic gap, which we bridge with attributes. However, attributes are hard to predict due to spurious correlations. To address this, (1) we contribute an FG-SBIR dataset in which each sketch and image is annotated with its semantic parts and associated part-level attributes. With the help of this dataset, we investigate (2) how strongly-supervised deformable part-based models can be learned that subsequently enable automatic detection of part-level attributes and provide pose-aligned sketch-image comparisons. Finally, (3) these multi-level features are combined in an integrated matching framework. Extensive experiments on FG-SBIR datasets demonstrate the effectiveness of the proposed method.

Sketch and image are inherently different domains, and aligning them at the instance level is very challenging. To address this, we propose a novel method for instance-level domain alignment that exploits both subspace and instance-level cues to better align the domains. Extensive experiments on FG-SBIR datasets demonstrate the effectiveness of our method compared with other cross-modal matching and domain-adaptation methods, and even with deep-learning-based FG-SBIR methods.

Existing FG-SBIR methods cannot generalise to unseen categories, so (i) we propose, for the first time, a novel unsupervised learning approach to model a universal manifold of prototypical visual sketch traits. This manifold can then be used to parameterise the learning of a sketch/photo representation that adapts to novel categories. Experiments on the two largest FG-SBIR datasets, Sketchy and QMUL-Shoe-V2, demonstrate the efficacy of our approach in enabling cross-category generalisation of FG-SBIR.

Collecting an FG-SBIR database is time-consuming; we address this laterally with deep sketch perceptual grouping. Specifically, (1) we contribute the largest sketch perceptual grouping (SPG) dataset to date, consisting of 20,000 unique sketches evenly distributed over 25 object categories; (2) we develop a universal sketch grouper and show that it significantly outperforms state-of-the-art groupers; and (3) we use our grouper as an abstraction model, so that edgemaps extracted from photos can be grouped and abstracted to synthesise human-like sketches for training a state-of-the-art FG-SBIR model without using any real human sketches. Experiments on the largest FG-SBIR datasets, QMUL-Shoe-V2 and Chair-V2, demonstrate the effectiveness of the proposed method.
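The subspace-alignment idea described above can be illustrated with a minimal sketch. The code below is not the thesis's synergistic instance-level method; it is a generic PCA-based subspace alignment (in the style of Fernando et al.) used only to show the mechanics of mapping "sketch" features into "photo" coordinates. All names and the random toy data are illustrative assumptions.

```python
import numpy as np

def pca_basis(X, d):
    """Top-d principal directions (columns) of the centred data matrix X (n x D)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt from the SVD of the centred data are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:d].T  # D x d

def subspace_alignment(Xs, Xt, d):
    """Map source (sketch) features into the target (photo) subspace coordinates.

    Illustrative stand-in only: the thesis's method additionally exploits
    instance-level cues, which are omitted here.
    """
    Ps = pca_basis(Xs, d)             # D x d source basis
    Pt = pca_basis(Xt, d)             # D x d target basis
    M = Ps.T @ Pt                     # d x d alignment matrix
    Zs = (Xs - Xs.mean(0)) @ Ps @ M   # aligned source features
    Zt = (Xt - Xt.mean(0)) @ Pt       # target features in their own subspace
    return Zs, Zt

# Toy demo with random feature clouds standing in for CNN features.
rng = np.random.default_rng(0)
Xs = rng.normal(size=(50, 10))        # "sketch" features
Xt = rng.normal(size=(60, 10))        # "photo" features
Zs, Zt = subspace_alignment(Xs, Xt, d=3)
print(Zs.shape, Zt.shape)             # (50, 3) (60, 3)
```

After alignment, sketch and photo features live in comparable d-dimensional coordinates, so nearest-neighbour retrieval can be run directly between the two domains.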
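The joint embedding space mentioned above is typically trained with a triplet objective: a sketch (anchor) is pulled towards its matching photo and pushed away from non-matching ones. The snippet below is a hedged, self-contained illustration of that loss on pre-computed embeddings; the margin value and the toy vectors are assumptions, not values from the thesis.

```python
def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss on embedding vectors (plain lists of floats).

    Zero when the matching photo is already closer to the sketch than the
    non-matching photo by at least `margin`; positive otherwise.
    """
    def sqdist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return max(0.0, sqdist(anchor, positive) - sqdist(anchor, negative) + margin)

# Matching photo near the sketch, non-matching photo far away: no loss.
easy = triplet_loss([0.0, 0.0], [0.1, 0.0], [1.0, 1.0])
# Reversed roles: the loss is positive, driving the embedding to reorder them.
hard = triplet_loss([0.0, 0.0], [1.0, 1.0], [0.1, 0.0])
print(easy, hard)  # 0.0 and a positive value
```

During training this loss is minimised over many (sketch, matching photo, non-matching photo) triplets, which is what makes direct sketch-to-photo comparison in the learned space possible.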
Keywords/Search Tags:Fine-grained Sketch-based Image Retrieval, Instance-level, Subspace, Cross-modal, Generalising model, Universal Sketch Perceptual Grouping