
Research On Semantic Preserving And Correlation Mining For Cross-modal Hashing

Posted on: 2020-08-16
Degree: Master
Type: Thesis
Country: China
Candidate: Z K Hu
Full Text: PDF
GTID: 2428330590463045
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, a tremendous amount of information in different representations, including text, image, video and sound, can be obtained easily; this varied information constitutes multi-modal data. Nowadays, single-modal retrieval, such as searching text by text, cannot meet the needs of Internet users, who may want to retrieve information across different modalities. For example, they may want to retrieve text, sound and video with an image query. Thus, more and more attention has shifted to the cross-modal retrieval task. Because of their low storage cost and fast computing speed, hashing methods are well suited to the needs of retrieval tasks, and it is meaningful to extend such methods to cross-modal retrieval. In this thesis, we focus on cross-modal hashing methods via semantic preserving and correlation exploration. The main contributions are as follows:

(1) We organize traditional cross-modal hashing methods and analyze four typical ones. Using the idea of controlling variables, we try to identify the parts of these approaches that benefit retrieval performance from four aspects: framework, regression strategy, iteration strategy and hash code regeneration strategy. Finally, we draw a series of conclusions that we think are instructive for later research.

(2) We propose a cross-modal hashing method based on matrix tri-factorization, termed MTFH. The dimensions of data from different modalities commonly differ, so using hash codes of equal length to represent them would harm the representation of these data. In addition, there are numerous unpaired multi-modal data, to which few researchers pay attention. To solve these two problems, we use matrix tri-factorization to learn hash codes of different lengths for multi-modal data, and a semantic affinity matrix to handle both paired and unpaired data. Extensive experimental results demonstrate that MTFH can seamlessly handle all scenarios: equal-length and unequal-length cross-modal hashing retrieval, paired and unpaired cross-modal retrieval, and single-modal retrieval.

(3) We propose a supervised corresponding autoencoder model and a series of variants. In these models, two autoencoders are used to learn different representations of multi-modal data separately. In addition, by utilizing label information, the learnt common representations of multi-modal data become more discriminative. These models are based on real-valued representations; we try adding a hashing layer on the hidden layer to convert them into hash representations, and the new hashing model achieves acceptable results.

(4) To deal with unpaired multi-modal data, we propose a triplet fusion network hashing model, termed TFNH. Unlike two-stream network models, this model uses a fusion network to deal with multi-modal data simultaneously, which strengthens the correlation between the networks of the two-stream framework. Thus, TFNH can explore the correlation between modalities through both the loss constraints and the networks themselves. In addition, by introducing a zero-padding operation, TFNH can not only cope with both paired and unpaired data but also tackle the domination domain problem. Experimental results show that TFNH can seamlessly handle both paired and unpaired scenarios and is not highly dependent on the paired relationship constraint.
Keywords/Search Tags: Cross Modal Retrieval, Semantic Common Subspace, Hashing, Matrix Tri-Factorization, Autoencoder, Fusion Network
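
The abstract's two central retrieval ideas can be illustrated with a minimal sketch. It is not the thesis's MTFH or TFNH implementation. It assumes hash codes take values in {-1, +1}, and the function names `hamming_rank` and `cross_length_scores`, as well as the matrix `U`, are hypothetical. The first function shows why hashing is cheap at query time: ranking by Hamming distance reduces to one matrix-vector product. The second shows one plausible way to compare codes of unequal length, by scoring them through a correlation matrix whose shape bridges the two code lengths. In a tri-factorization method such a matrix would be learned; here it is fixed by hand.

```python
import numpy as np

def hamming_rank(query_code, db_codes):
    """Rank equal-length {-1,+1} codes by Hamming distance to the query.

    For codes of length k, hamming(q, b) = (k - <q, b>) / 2.
    Returns (indices sorted from nearest to farthest, distances).
    """
    k = db_codes.shape[1]
    dists = (k - db_codes @ query_code) // 2
    return np.argsort(dists, kind="stable"), dists

def cross_length_scores(query_code, db_codes, U):
    """Score codes of different lengths through a correlation matrix.

    query_code: shape (k_q,), db_codes: shape (n, k_db), U: shape (k_q, k_db).
    U is a hand-set stand-in for a learned matrix (an assumption of this sketch).
    Higher score means more similar.
    """
    return db_codes @ (U.T @ query_code)

# Equal-length case: a 4-bit query against three 4-bit database codes.
query = np.array([1, -1, 1, 1])
database = np.array([[1, -1, 1, 1],
                     [-1, 1, -1, -1],
                     [1, -1, -1, 1]])
order, dists = hamming_rank(query, database)

# Unequal-length case: a 2-bit query code against 3-bit database codes.
short_query = np.array([1, -1])
long_codes = np.array([[1, -1, 1],
                       [-1, 1, -1]])
U = np.array([[1, 0, 1],
              [0, 1, 0]])
scores = cross_length_scores(short_query, long_codes, U)
```

In the equal-length case the database item identical to the query is ranked first and its bitwise complement last. In the unequal-length case the ranking depends entirely on `U`, which is why a method of this kind has to learn that matrix from semantic supervision.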