Heterogeneous Graph Hashing For Cross-Modal Audio-Image Retrieval

Posted on: 2022-01-25
Degree: Master
Type: Thesis
Country: China
Candidate: X Liang
GTID: 2518306605466034
Subject: Master of Engineering
Abstract/Summary:
With the rapid development of the Internet and digital multimedia technology, the era of big data has arrived. How to store and analyze massive multimedia data effectively, and how to provide users with rich media data, has attracted much attention in both academia and industry. Cross-modal hashing, a key technology for multimedia big data retrieval, aims to map heterogeneous multimedia data into a common Hamming space, which enables fast and flexible retrieval across different modalities. However, current research on cross-modal hashing retrieval focuses mainly on text-image retrieval and pays little attention to audio-image retrieval. Although audio-image retrieval is widely used in early childhood education and in assistance for visually impaired people, it is difficult to accomplish because of the sparsity difference between the modalities.

The biggest challenge for cross-modal retrieval is the heterogeneity of data from different modalities, which creates a heterogeneous gap between modalities and degrades performance. Existing methods correlate data of different modalities under a simple similarity constraint, so they build only weak cross-modal associations and produce unreliable hash codes. Moreover, existing methods rely on plentiful labeled samples for supervised training and cannot be applied to unlabeled samples. Cross-modal audio-image hash retrieval under zero-shot learning is therefore another problem that urgently needs to be solved. Based on the above analysis, this thesis proposes two cross-modal audio-image hash retrieval methods based on heterogeneous graph learning, which to a certain extent address the heterogeneity gap between modalities and the dependence on labeled information, and which obtain better retrieval performance. The specific contents are as follows:

(1) To address the heterogeneous gap between modalities, this thesis proposes a cross-modal audio-image hashing retrieval method based on heterogeneous graph learning. First, a basic framework for deep cross-modal audio-image hashing retrieval is built, combining feature extraction and hash code learning. Then, a heterogeneous graph learning module is designed, which uses heterogeneous graphs to construct cross-modal data associations and thereby exchange information between modalities. The heterogeneous graphs iteratively generate features rich in heterogeneity information, from which the cross-modal hash codes are learned. Experiments show that, compared with existing methods, the proposed method enhances the features, constructs stronger cross-modal associations, reduces the heterogeneity gap between modalities, and effectively improves retrieval accuracy.

(2) To address the reliance of cross-modal hash retrieval on labeled data, this thesis proposes a zero-shot cross-modal hashing retrieval method based on attribute mining. First, a common attribute space is built, in which different categories can be linked through attribute values. Then, the associations among the samples in the feature space, the hash code space, and the attribute space are learned, and the features of different modalities are mapped to the common attribute space for alignment; that is, supervision information is shared through attribute values that the categories have in common. In the training step, samples from known categories are used to train the cross-modal hash network. In the test step, input samples from unknown categories are associated with the known categories through the attribute information, and this association is then used to generate correct hash codes. Experiments show that the proposed method can effectively model the relationship between known and unknown categories and achieves zero-shot cross-modal audio-image hashing retrieval.
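The retrieval step that both methods share, ranking items of one modality by Hamming distance to a query from the other, can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names (`binarize`, `hamming`, `retrieve`) and the 8-bit codes are hypothetical, the codes are hard-coded instead of being produced by trained audio and image networks, and sign thresholding is assumed as the binarization step.

```python
def binarize(values):
    """Turn real-valued network outputs into a hash code (assumed sign thresholding).

    Each non-negative value becomes bit 1, each negative value bit 0;
    the bits are packed into one integer, first value = most significant bit.
    """
    code = 0
    for v in values:
        code = (code << 1) | (1 if v >= 0 else 0)
    return code


def hamming(a, b):
    """Hamming distance: number of bit positions where two codes differ."""
    return bin(a ^ b).count("1")


def retrieve(query_code, database_codes, top_k=2):
    """Return indices of the top_k database codes closest to the query."""
    ranked = sorted(
        range(len(database_codes)),
        key=lambda i: hamming(query_code, database_codes[i]),
    )
    return ranked[:top_k]


# Hypothetical 8-bit codes for four images (stand-ins for learned codes).
image_codes = [0b10110010, 0b01001101, 0b10110110, 0b00001111]
# Hypothetical code of an audio query mapped into the same Hamming space.
audio_query = 0b10110011

nearest = retrieve(audio_query, image_codes, top_k=2)
```

Because the comparison reduces to an XOR and a bit count, ranking stays cheap even for large databases, which is the motivation for mapping both modalities into a common Hamming space.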
Keywords/Search Tags: Cross-modal retrieval, Hashing, Graph convolutional neural network, Zero-shot learning