
Research On Algorithm Of Deep Convolution Network And Feature Fusion For Cross Modal Commodity Retrieval

Posted on: 2019-04-15    Degree: Master    Type: Thesis
Country: China    Candidate: D Z Wang    Full Text: PDF
GTID: 2428330548479808    Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the rapid development and wide application of Internet technology and multimedia research, multi-modal data on the Internet has grown explosively, and retrieval over large volumes of cross-modal data is emerging as a new search paradigm. A cross-modal search engine accepts a user query in one modality and retrieves relevant objects from multi-modal datasets to return to the user. Traditional search engines based on keyword matching satisfy neither the needs nor the technical requirements of cross-modal retrieval. The principal approach to this problem is to learn a set of mapping functions that project data objects from the multi-modal datasets into a common latent semantic space, where traditional vector-space similarity functions or indexing solutions can be applied directly to capture the relations between objects. These mapping functions are designed to project relevant multi-modal objects to nearby positions while placing irrelevant objects far apart in the common latent semantic space.

Most existing cross-modal retrieval methods assume that the query and the candidate data each come from a single modality and map the data directly. We focus instead on the online-shopping search scenario, where each candidate commodity consists of multi-modal data comprising both text and images, so the common methods cannot be applied directly. We therefore fuse the text and image features with a tensor fusion method to form a comprehensive item feature, and project it, together with the query text feature, into the common latent semantic space using deep learning techniques; the similarity between the query and the items can then be computed directly.

To implement the retrieval algorithm, we modify the original deep residual network, designed for image classification, into a multi-task multi-label learning network, adding item keywords and property tags to force the network to learn fine-grained detail features. The image feature vectors are obtained from this multi-task multi-label ResNet, and the item titles are likewise converted into feature vectors. A tensor fusion method based on an outer product formulation combines the feature vectors of both modalities while retaining all bi-modal interactions between every pair of dimensions; the result is treated as the comprehensive feature representation of an item. Query text is converted to vectors by the same word-to-vector method and transformed through a deep network that shares weights with the item-title subnet. The relation between a query and an item is then obtained by computing the similarity of their final vectors, which are produced by a deep neural network. During training we adopt a pair-wise scheme, treating query-item clickthrough records as positive samples and randomly selected items as negative samples. Evaluations on a real-world dataset demonstrate the effectiveness of the proposed method.
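The outer-product tensor fusion, the weight-shared query/title subnet, and the pair-wise training objective described above can be illustrated with a minimal PyTorch sketch. This is not the author's code: the dimensions (IMG_DIM, TXT_DIM, LATENT_DIM), module names (TextEncoder, ItemEncoder), and the hinge-style ranking loss are illustrative assumptions standing in for the unspecified details of the thesis.

```python
# Minimal sketch (assumptions, not the thesis implementation) of outer-product
# tensor fusion of image and title features, a weight-shared text subnet for
# titles and queries, and a pair-wise ranking loss over click positives vs.
# randomly sampled negatives.
import torch
import torch.nn as nn
import torch.nn.functional as F

IMG_DIM, TXT_DIM, LATENT_DIM = 64, 32, 128  # assumed feature sizes


class TextEncoder(nn.Module):
    """Maps a text feature vector into the common latent space. The same
    module (shared weights) encodes both item titles and user queries."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU(),
                                 nn.Linear(out_dim, out_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class ItemEncoder(nn.Module):
    """Fuses an image feature (e.g. from a ResNet backbone) and an encoded
    title feature via an outer product, then projects the flattened tensor
    into the common latent semantic space."""
    def __init__(self, img_dim: int, txt_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(img_dim * txt_dim, out_dim)

    def forward(self, img: torch.Tensor, txt: torch.Tensor) -> torch.Tensor:
        # Outer product keeps every pairwise interaction between modalities.
        fused = torch.einsum('bi,bj->bij', img, txt)   # (B, img_dim, txt_dim)
        return self.proj(fused.flatten(start_dim=1))   # (B, out_dim)


def pairwise_hinge_loss(query_vec, pos_item_vec, neg_item_vec, margin=0.2):
    """Clicked (positive) items should score higher than random negatives
    by at least `margin` in cosine similarity."""
    pos = F.cosine_similarity(query_vec, pos_item_vec)
    neg = F.cosine_similarity(query_vec, neg_item_vec)
    return F.relu(margin - pos + neg).mean()


if __name__ == "__main__":
    text_enc = TextEncoder(TXT_DIM, LATENT_DIM)
    item_enc = ItemEncoder(IMG_DIM, LATENT_DIM, LATENT_DIM)

    # Toy batch: random stand-ins for ResNet image features and word-vector
    # features of titles, queries, and sampled negative items.
    img_feat, title_feat = torch.randn(4, IMG_DIM), torch.randn(4, TXT_DIM)
    query_feat = torch.randn(4, TXT_DIM)
    neg_img, neg_title = torch.randn(4, IMG_DIM), torch.randn(4, TXT_DIM)

    q = text_enc(query_feat)                              # query embedding
    pos_item = item_enc(img_feat, text_enc(title_feat))   # fused positive item
    neg_item = item_enc(neg_img, text_enc(neg_title))     # fused negative item
    print(pairwise_hinge_loss(q, pos_item, neg_item).item())
```

At retrieval time, under the same assumptions, item embeddings would be precomputed with the item encoder and ranked against the query embedding by cosine similarity.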
Keywords/Search Tags:Cross-modal retrieval, Deep learning, Information retrieval, Image processing, Search engine