
Research On Deep Hashing Method And Security For Cross-Modal Retrieval

Posted on: 2021-12-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Li
Full Text: PDF
GTID: 1488306311471644
Subject: Circuits and Systems

Abstract/Summary:
With the rapid development of the Internet and digital multimedia technology, the world has entered the era of big data. The volume of network data is growing explosively, and transmitted data has evolved from plain text to multimedia including text, images, video, 3D models, and more. How to store and analyze this massive multimedia data effectively, so that users can retrieve the cross-media content they are interested in, has attracted enormous attention in both industry and academia. Cross-modal hashing maps heterogeneous multimedia data into a common Hamming space, enabling fast and flexible retrieval across different modalities, and has therefore become the main technology for managing and analyzing massive multimedia data. At the same time, because the Internet is open by nature, the abuse of multimedia data is widespread, so the security of cross-modal hashing also faces serious challenges. Designing adversarial examples, constructing adversarial attacks on retrieval tasks, and analyzing the security of deep cross-modal hashing models in order to improve their robustness have become hot research topics in cross-modal retrieval.

Existing cross-modal hashing methods fall roughly into two categories: those built on hand-crafted features and those based on deep learning. This thesis focuses on deep cross-modal hashing and aims to address shortcomings of existing methods, such as weakly discriminative feature representations, weak cross-modal correlation, inaccurate semantic similarity, and poor model robustness. Our main contributions are summarized as follows.

First, a deep self-supervised adversarial hashing approach is proposed for cross-modal retrieval. Traditional cross-modal hashing methods rely on expert knowledge to design features manually; limited by that knowledge, they often fail to learn highly discriminative feature representations, which results in unsatisfactory retrieval performance. Moreover, existing deep hashing methods that associate different modalities through a simple similarity constraint can build only weak cross-modal correlations and produce unreliable hash codes. To solve these problems, a novel deep cross-modal hashing network based on self-supervised learning is proposed. A self-supervised semantic network is first constructed to explore the latent semantic similarity structure between cross-modal data; a deep cross-modal hashing framework based on generative adversarial networks then combines the learned semantic knowledge to learn hash codes in a generative adversarial fashion. To learn reliable semantic features and hash codes, the end-to-end framework is trained under guidance from both the Euclidean space and the Hamming space; a hedged sketch of such an objective follows this paragraph. Experimental results show that the proposed method effectively strengthens the correlation between modalities and achieves state-of-the-art cross-modal retrieval performance compared with both traditional methods and existing deep methods.
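The abstract gives no implementation details, so the following is only a minimal PyTorch sketch, under stated assumptions, of what Hamming-space guidance plus adversarial modality alignment could look like: each modality encoder produces a continuous code, a self-supervised label network supplies semantic supervision, and a discriminator aligns image and text codes adversarially. The module names (Encoder, lab_net, disc), layer sizes, similarity loss, and loss weighting are all illustrative assumptions, not the thesis's actual design.

```python
# Hypothetical sketch of a self-supervised adversarial hashing objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

BITS = 64

class Encoder(nn.Module):
    """Maps a modality-specific feature to a continuous hash code in [-1, 1]."""
    def __init__(self, in_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(),
                                 nn.Linear(512, BITS), nn.Tanh())
    def forward(self, x):
        return self.net(x)

img_net, txt_net = Encoder(4096), Encoder(1386)   # image / text branches
lab_net = Encoder(24)                             # self-supervised semantic (label) network
disc = nn.Sequential(nn.Linear(BITS, 256), nn.ReLU(), nn.Linear(256, 1))

x_img, x_txt = torch.randn(8, 4096), torch.randn(8, 1386)
labels = torch.randint(0, 2, (8, 24)).float()     # multi-label annotations

b_img, b_txt, b_lab = img_net(x_img), txt_net(x_txt), lab_net(labels)

# Pairwise semantic similarity from labels (1 if any label is shared).
S = (labels @ labels.t() > 0).float()

def sim_loss(u, v):
    """Negative log-likelihood of pairwise similarity, a common hashing loss."""
    theta = 0.5 * (u @ v.t())
    return (F.softplus(theta) - S * theta).mean()

# Hamming-space guidance from the semantic network, plus quantization losses
# that pull the continuous codes toward binary values.
loss_hash = (sim_loss(b_img, b_lab) + sim_loss(b_txt, b_lab)
             + F.mse_loss(b_img, torch.sign(b_img).detach())
             + F.mse_loss(b_txt, torch.sign(b_txt).detach()))

# Adversarial alignment: the discriminator tries to tell image codes from
# text codes; the encoders are trained to fool it (shown for the text branch).
real, fake = disc(b_img.detach()), disc(b_txt.detach())
loss_disc = (F.binary_cross_entropy_with_logits(real, torch.ones_like(real))
             + F.binary_cross_entropy_with_logits(fake, torch.zeros_like(fake)))
loss_gen = F.binary_cross_entropy_with_logits(disc(b_txt), torch.ones_like(fake))
print(loss_hash.item(), loss_disc.item(), loss_gen.item())
```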
Second, a deep unsupervised cross-modal hashing method based on a coupled cycle generative adversarial network is proposed. Multimedia data exists in huge volumes but largely without annotations, whereas existing deep methods depend on high-quality manual annotations, and obtaining such annotated training data is expensive; this greatly limits the application of existing deep cross-modal hashing methods. To solve this problem, we construct a unified framework with two generative adversarial networks that build cross-modal correlations and learn hash codes, respectively. Exploiting the co-occurrence of paired cross-modal data, the outer-cycle network learns shared semantic representations through mutual generation, while the inner-cycle network generates reliable, compact hash codes from the learned representations. Experimental results show that the proposed method effectively mines accurate similarity relationships across modalities and greatly alleviates the dependence on manual labels.

Third, a novel cross-modal correlation learning method built on adversarial examples is proposed. Recent studies show that deep networks commonly lack robustness and are easy to attack. Unfortunately, most state-of-the-art cross-modal hashing methods adopt deep networks directly or indirectly, which exposes them to security risks and further limits the application of deep cross-modal hashing in real-world tasks. To address this, we propose cross-modal correlation learning with adversarial examples. By comparing intra-modal and inter-modal retrieval tasks, we reveal the distributional differences in semantic structure between regular data and adversarial examples. We then learn adversarial examples by maximizing intra-modal semantic consistency while minimizing inter-modal semantic consistency, so that the attack degrades inter-modal retrieval precisely while leaving intra-modal retrieval intact, which demonstrates the effectiveness of the obtained adversarial examples (a minimal sketch of this objective appears after the final contribution). Experiments show that the cross-modal adversarial examples crafted by our method effectively attack the most popular existing deep cross-modal hashing networks; conversely, training a target model with the learned adversarial examples effectively improves its defense capability.

Fourth, a disentangled adversarial example method is proposed to boost the robustness of cross-modal hashing networks. Existing deep cross-modal hashing methods generally construct similarity relations as supervision and, relying on the strong fitting capacity of deep networks, obtain hash codes that satisfy the designed similarity supervision. However, the cause of the vulnerability of deep hashing networks has not been well explored, and even as appealing methods continue to be proposed, the risk of malicious attacks on deep hashing networks remains. By studying cross-modal data, we first explore the correlation between components of different data and decouple each sample into modality-related and modality-unrelated components. Then, combining this with adversarial example learning, we design two similarity constraints that simultaneously maximize the similarity between the original data and the adversarial examples while minimizing their similarity to the modality-related examples; as a result, the modality-related components are disentangled from the original features. Experimental results show that the robustness of a deep cross-modal hashing network can be effectively improved by training it with the modality-related examples learned by our method.
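To make the third contribution's objective concrete, here is a hedged sketch, not the thesis's actual algorithm, of a PGD-style attack that maximizes intra-modal consistency (the perturbed image's hash code stays near its own clean code) while minimizing inter-modal consistency (the code is pushed away from the paired text's code). The hash_img/hash_txt stand-in networks, the epsilon bound, step size, iteration count, and the cosine-similarity surrogate are assumptions for illustration.

```python
# Hypothetical sketch of crafting a cross-modal adversarial example.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
BITS = 32
hash_img = torch.nn.Sequential(torch.nn.Linear(4096, BITS), torch.nn.Tanh())
hash_txt = torch.nn.Sequential(torch.nn.Linear(1386, BITS), torch.nn.Tanh())

x_img, x_txt = torch.randn(1, 4096), torch.randn(1, 1386)
with torch.no_grad():
    code_clean = hash_img(x_img)          # intra-modal anchor
    code_txt = hash_txt(x_txt)            # inter-modal (paired) anchor

delta = torch.zeros_like(x_img, requires_grad=True)
eps, lr = 0.03, 0.01

for _ in range(50):                       # PGD-style iterative attack
    code_adv = hash_img(x_img + delta)
    # loss = inter-modal similarity minus intra-modal similarity; descending
    # it pushes the code away from the text while keeping it near the clean code.
    loss = (F.cosine_similarity(code_adv, code_txt).mean()
            - F.cosine_similarity(code_adv, code_clean).mean())
    loss.backward()
    with torch.no_grad():
        delta -= lr * delta.grad.sign()   # signed gradient descent step
        delta.clamp_(-eps, eps)           # keep the perturbation imperceptibly small
    delta.grad.zero_()

print(F.cosine_similarity(hash_img(x_img + delta), code_txt).item())
```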
Keywords/Search Tags:hash code, deep learning, generative adversarial network, cross-modal retrieval, adversarial attack