Research On Novel Retrieval Techniques For Fashion Media Data

Posted on: 2018-05-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X L Gu
Full Text: PDF
GTID: 1368330548477383
Subject: Computer Science and Technology
Abstract/Summary:
The growing popularity of social media and the prosperity of e-commerce have produced massive fashion media data, such as street fashion data shared by users, runway show data released by fashion brands, and product data provided by e-commerce sites. Fashion media data is a special kind of cross-media data with distinctive characteristics: it is multi-modal, multi-domain, multi-scenario, and weakly labeled. This thesis focuses on new multimedia retrieval methods for fashion media data that support a variety of retrieval modes as well as fashion trend analysis, which is of significant research and application value.

These distinctive characteristics pose several new research challenges for retrieval techniques on fashion data: 1) how to leverage weakly labeled fashion media data to learn a feature representation of fashion images that can be applied to multiple image understanding tasks such as image retrieval and image classification; 2) how to map multi-modal fashion data items from different domains into a common space, where metric distance measures can be used for data retrieval and analysis; 3) how to design an image-based retrieval model that searches for fashion objects across different scenarios. To address these challenges, this thesis presents a new image retrieval method, a new cross-media retrieval method, and a new cross-scenario object retrieval method. These novel multimedia retrieval methods provide users with flexible data retrieval and analysis tools, which is of great practical value to the fashion industry in the big data era. In summary, the main contributions of the thesis are as follows:

(1) We propose a novel image retrieval method based on neighbor-constrained embedding learning. Our approach learns a feature representation of fashion images from weakly labeled fashion media data, combining both semantic category knowledge and similarity measure knowledge so that the learned feature can serve multiple image understanding tasks such as image retrieval and image classification. We present QuadNet, an effective image embedding network based on a deep convolutional neural network and driven jointly by a multi-task classification loss function and a neighbor-constrained quadruplet loss function (a generic sketch of such a joint objective follows this paragraph). The multi-task classification loss ensures that the learned feature carries semantic category information, while the quadruplet loss ensures that the learned feature supports similarity measurement. The resulting representation is therefore general and robust, and can be used for image retrieval, image classification, image clustering, image labeling, and other tasks. A quantitative evaluation of QuadNet and a trend analysis of street fashion, conducted on a real-world weakly labeled street fashion dataset, demonstrate the effectiveness of the proposed approach.
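As a concrete illustration of the joint objective, here is a minimal PyTorch sketch that combines a classification head with a generic quadruplet ranking loss on a shared embedding. The backbone, layer sizes, margins, and the way the two negatives are sampled are illustrative assumptions; the thesis' exact QuadNet architecture and its neighbor-constrained sampling scheme are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuadrupletLoss(nn.Module):
    # Generic quadruplet ranking loss over (anchor, positive, negative1,
    # negative2); the neighbor-constrained variant differs in how the
    # negatives are chosen, which is outside this sketch.
    def __init__(self, margin1=1.0, margin2=0.5):
        super().__init__()
        self.margin1, self.margin2 = margin1, margin2

    def forward(self, anchor, positive, negative1, negative2):
        d_ap = F.pairwise_distance(anchor, positive)
        d_an = F.pairwise_distance(anchor, negative1)
        d_nn = F.pairwise_distance(negative1, negative2)
        # Term 1: the positive must sit closer to the anchor than negative1.
        # Term 2: positive pairs must be tighter than unrelated negative pairs.
        return (F.relu(d_ap - d_an + self.margin1)
                + F.relu(d_ap - d_nn + self.margin2)).mean()

class EmbeddingNet(nn.Module):
    # Tiny stand-in backbone (the thesis uses a deep CNN) with an
    # embedding head for the metric loss and a category head on top.
    def __init__(self, embed_dim=128, num_classes=20):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.embed = nn.Linear(64, embed_dim)
        self.classify = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        z = F.normalize(self.embed(self.backbone(x)), dim=1)
        return z, self.classify(z)

# Joint objective: the classification term keeps semantic category
# information, the quadruplet term shapes the metric for retrieval.
net, quad = EmbeddingNet(), QuadrupletLoss()
a = torch.randn(8, 3, 64, 64)
p, n1, n2 = torch.randn_like(a), torch.randn_like(a), torch.randn_like(a)
labels = torch.randint(0, 20, (8,))
z_a, logits = net(a)
z_p, _ = net(p); z_n1, _ = net(n1); z_n2, _ = net(n2)
loss = F.cross_entropy(logits, labels) + quad(z_a, z_p, z_n1, z_n2)
loss.backward()
```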
(2) We propose a novel cross-media retrieval method based on multi-domain embedding learning. Our approach maps multi-modal fashion data items from different domains into a common space, where metric distance measures can be applied for data retrieval and analysis. Considering the multi-modal and multi-domain characteristics of fashion media data, and in view of the fact that most cross-media retrieval methods learn only heterogeneous similarity, our approach learns homogeneous similarity and heterogeneous similarity simultaneously. Specifically, the proposed framework comprises two steps: 1) learning homogeneous similarity for the image modality, mapping images from the original pixel space to a fine-tuned visual space; 2) learning heterogeneous similarity for the image and text modalities, mapping text and image data items into a unified feature space. In the first step, we design a quintuplet network based on a deep convolutional neural network, together with a quintuplet-based ranking loss, to capture homogeneous similarity and exploit the multi-domain characteristic of fashion media data. In the second step, we design a two-branch image-text network architecture and a cross-view similarity ranking loss to capture heterogeneous similarity (a sketch of this second step is given after contribution (3) below). A quantitative evaluation and a trend analysis of fashion brands, conducted on a newly collected multi-modal, multi-domain fashion cross-media dataset, demonstrate the effectiveness of the proposed approach.

(3) We propose a novel keypoint-based cross-scenario object retrieval method for searching fashion objects across different scenarios. The proposed framework consists mainly of a recognition module and a retrieval module: the recognition module recognizes the semantic meaning of products in query images, with an object detection model responsible for locating objects and an attribute recognition model responsible for identifying the semantic attributes of products; the retrieval module quickly searches for similar objects in the image database. Taking glasses as an example, we design a keypoint-based scheme for describing glasses, implement a keypoint-based glasses detection model, devise multiple feature extraction mechanisms (shape-based, color-based, and region-based), and design a coarse-to-fine search strategy so that the retrieval module can quickly find similar glasses in the image database (see the final sketch below). Comprehensive experiments on a newly collected multi-scenario glasses dataset verify the efficacy and efficiency of the proposed approach.
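For the second step of contribution (2), the following is a minimal sketch of a two-branch image-text network trained with a bidirectional cross-view ranking loss, a common formulation of heterogeneous similarity learning. The input feature dimensions, hidden sizes, and margin are illustrative assumptions, not the thesis' exact architecture or loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchNet(nn.Module):
    # One branch per modality, both projecting into a shared space where
    # cosine similarity is meaningful after L2 normalization.
    def __init__(self, img_dim=2048, txt_dim=300, shared_dim=256):
        super().__init__()
        self.img_proj = nn.Sequential(nn.Linear(img_dim, 512), nn.ReLU(),
                                      nn.Linear(512, shared_dim))
        self.txt_proj = nn.Sequential(nn.Linear(txt_dim, 512), nn.ReLU(),
                                      nn.Linear(512, shared_dim))

    def forward(self, img_feat, txt_feat):
        return (F.normalize(self.img_proj(img_feat), dim=1),
                F.normalize(self.txt_proj(txt_feat), dim=1))

def cross_view_ranking_loss(img_emb, txt_emb, margin=0.2):
    # Bidirectional hinge ranking: each matched image-text pair (the
    # diagonal of the score matrix) must beat every mismatched pair
    # by a margin, in both retrieval directions.
    scores = img_emb @ txt_emb.t()
    pos = scores.diag().view(-1, 1)
    cost_i2t = F.relu(margin + scores - pos)      # image queries text
    cost_t2i = F.relu(margin + scores - pos.t())  # text queries image
    mask = torch.eye(scores.size(0), dtype=torch.bool)
    cost_i2t = cost_i2t.masked_fill(mask, 0)
    cost_t2i = cost_t2i.masked_fill(mask, 0)
    return (cost_i2t.sum() + cost_t2i.sum()) / scores.size(0)

net = TwoBranchNet()
img = torch.randn(16, 2048)  # e.g., visual features from step 1
txt = torch.randn(16, 300)   # paired text features
loss = cross_view_ranking_loss(*net(img, txt))
loss.backward()
```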
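For the retrieval module of contribution (3), here is a minimal sketch of a coarse-to-fine search strategy: a coarse stage filters the database by recognized semantic attributes, and a fine stage ranks the surviving candidates by feature distance. The `shape` attribute and the Euclidean fine ranking are illustrative assumptions about the general scheme, not the thesis' exact pipeline.

```python
import numpy as np

def coarse_to_fine_search(query_vec, query_attrs, db_vecs, db_attrs, k=10):
    # Coarse stage: keep only database items whose recognized attributes
    # match the query's (here a single hypothetical "shape" attribute).
    candidates = [i for i, attrs in enumerate(db_attrs)
                  if attrs.get("shape") == query_attrs.get("shape")]
    if not candidates:  # fall back to the full database if nothing matches
        candidates = list(range(len(db_vecs)))
    # Fine stage: Euclidean ranking over the much smaller candidate set.
    dists = np.linalg.norm(db_vecs[candidates] - query_vec, axis=1)
    order = np.argsort(dists)[:k]
    return [(candidates[int(i)], float(dists[i])) for i in order]

# Toy usage with random features and a hypothetical attribute vocabulary.
rng = np.random.default_rng(0)
db_vecs = rng.normal(size=(100, 64)).astype(np.float32)
db_attrs = [{"shape": str(rng.choice(["round", "square", "aviator"]))}
            for _ in range(100)]
query = rng.normal(size=64).astype(np.float32)
print(coarse_to_fine_search(query, {"shape": "round"}, db_vecs, db_attrs, k=5))
```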
Keywords/Search Tags:image retrieval, cross-media retrieval, cross-scenario object retrieval, deep learning, deep convolutional neural networks, neighbor-constrained embedding learning, multi-domain embedding learning, keypoint