
Image-text Cross-modal Retrieval Based On Deep Hashing Method

Posted on: 2019-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: W N Yao
GTID: 2428330545472236
Subject: Software engineering
Abstract/Summary:
With the development of the mobile Internet and the spread of smartphones, digital cameras and other devices, the amount of multimedia data has grown explosively. In information retrieval, this continuous growth of multimedia big data has created a need for cross-modal retrieval. Cross-modal retrieval handles cases in which the query and the data to be retrieved belong to different modalities, such as using an image query to search for text or video results. However, most current mainstream search engines, such as Baidu, Google and Bing, only provide search results based on one modality. In addition, as deep learning has achieved a series of breakthroughs in computer vision and natural language processing, combining multimedia big data with deep learning has become a common development trend for the two fields. Exploring novel cross-modal retrieval models that take these new requirements and technologies into account has therefore become one of the challenges in information retrieval.

This thesis focuses on image-text cross-modal search. An in-depth analysis and comparison of existing methods shows that hashing offers high storage efficiency and fast retrieval speed for large-scale cross-modal retrieval. However, most current cross-modal hashing methods still rely on traditional hand-crafted features and cannot make full use of the semantic information in the labels when processing multi-label data, which leads to unsatisfactory retrieval results. To address these defects, this thesis proposes a deep multi-level semantic hashing (DMSH) method for image-text cross-modal retrieval. It solves a problem shared by most current cross-modal hashing methods: the rich semantic information embedded in the class labels of multi-label data is not fully exploited during hash code learning. The proposed DMSH combines the strength of deep learning in feature extraction and representation with the efficiency of hashing in data storage and computation.

Specifically, the main work of this thesis includes:
(1) A comprehensive survey of cross-modal retrieval, deep learning and hashing methods is conducted, and the research status and open problems in these fields are analyzed in depth.
(2) A similarity matrix based on label co-occurrence is proposed, solving the problem that existing methods cannot make full use of the semantic information of labels, which results in low search accuracy.
(3) Based on an analysis of existing network structure designs for deep hashing methods, a unified framework that integrates deep feature extraction and hash code learning is proposed. Two different deep neural networks extract the semantic features of images and texts, respectively, and at the output end the two subnetworks are linked through label semantic relations to achieve end-to-end learning.
(4) The performance of the proposed DMSH is compared with several state-of-the-art cross-modal hashing methods, including CCA, CMFH, STMH, SCM, SePH and DCMH, on the MIRFlickr-25K dataset.
(5) The effects of three different convolutional neural networks, CNN-F, VGG-16 and ResNet-50, on the retrieval results are compared experimentally.

Experiments show that the proposed DMSH outperforms the compared methods on the image-text cross-modal retrieval task, and that retrieval results with the CNN-F network are better than those with VGG-16 and ResNet-50. Further improvements can be made by exploring ways to better integrate label semantic information, discovering more semantic information, refining the text feature learning module, and improving the network structure to learn better feature representations.
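
As a rough illustration of two ideas the abstract relies on, the Python sketch below (1) computes a graded similarity between two multi-label annotations and (2) ranks binary hash codes by Hamming distance. The cosine similarity over label vectors, the function names, and the toy labels and 8-bit codes are all assumptions made for illustration. The abstract does not state the thesis's own co-occurrence-based similarity matrix, its code length, or its training objective, so this should not be read as the DMSH method itself.

```python
from math import sqrt


def label_similarity(a, b):
    """Graded similarity in [0, 1] between two binary label vectors.

    Assumption for illustration only: cosine similarity is used here; the
    thesis's own co-occurrence-based definition is not given in the abstract.
    """
    shared = sum(x & y for x, y in zip(a, b))  # number of labels both items carry
    norm = sqrt(sum(a) * sum(b))
    return shared / norm if norm else 0.0


def hamming(code_a, code_b):
    """Hamming distance between two hash codes stored as Python ints."""
    return bin(code_a ^ code_b).count("1")


def rank_by_hamming(query_code, database_codes):
    """Return database indices sorted by Hamming distance to the query."""
    return sorted(
        range(len(database_codes)),
        key=lambda i: hamming(query_code, database_codes[i]),
    )


if __name__ == "__main__":
    # Hypothetical multi-label annotations over 4 classes.
    image_labels = [1, 1, 0, 1]
    text_labels = [[1, 1, 0, 1], [1, 0, 0, 0], [0, 0, 1, 0]]
    for t in text_labels:
        print(t, round(label_similarity(image_labels, t), 3))

    # Hypothetical 8-bit hash codes: one image query, three text items.
    query = 0b10110010
    texts = [0b01001101, 0b10110011, 0b10010000]
    print(rank_by_hamming(query, texts))
```

A graded score of this kind distinguishes items that share all labels from items that share only some, which a binary "shares at least one label" indicator cannot do. The ranking step needs only an XOR and a bit count per comparison, which is why compact binary codes are cheap to store and fast to search.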
Keywords/Search Tags:Cross-modal retrieval, Deep learning, Hashing method, Multi-label learning