Research on Multi-label Cross-modal Semantic Hashing for Image-text Retrieval

Posted on: 2023-02-26 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: X T Zou | Full Text: PDF
GTID: 1528307046954009 | Subject: Intelligent computing and complex systems

Abstract/Summary:
With the development of the Internet and social networks, a large amount of multi-source, heterogeneous, multi-modal data exists on the network. There is an increasing need to use data from one modality to retrieve data from other modalities, and cross-modal retrieval is suited to this scenario. When the amount of data is very large, achieving fast and accurate cross-modal retrieval becomes an intractable problem. Owing to the low storage cost and fast similarity computation of hash codes, together with the powerful feature extraction capability of deep neural networks, deep cross-modal hashing retrieval has received extensive attention from researchers in recent years. However, most existing deep cross-modal hashing methods simply define the semantic similarity of intra-modal and inter-modal paired instances as a discrete 0 (no common category) or 1 (at least one common category). This ignores the fact that many cross-modal retrieval datasets and practical applications contain multi-label information: many paired instances share some categories while also belonging to their own distinct categories, so a binary value cannot accurately express the semantic similarity of paired instances. The inaccurate similarity in turn harms the subsequent learning of hash mapping functions and hash codes, leading to suboptimal performance of the learned cross-modal retrieval algorithm. To solve this problem, this thesis takes two modalities (image and text) as an example, introduces multi-label learning into deep cross-modal hashing retrieval to accurately define the semantic similarity of instances, and further introduces effective multi-label semantic similarity preserving strategies to improve the performance of deep cross-modal hashing retrieval. The main contributions of this thesis are as follows:

1. Most existing deep cross-modal hashing methods cannot use multi-labels to accurately compute the semantic similarity of pairwise instances, which leads to suboptimal performance of the learned deep cross-modal hashing model. To solve this problem, we propose a multi-label semantics preserving based deep cross-modal hashing (MLSPH) method. MLSPH first uses the multi-labels of instances to compute the semantic similarity of the original data (one common formulation is sketched right after this contribution). It then introduces a multi-label semantics preserving strategy based on the defined multi-label semantic similarity computation formula, together with a memory bank mechanism that preserves the multi-label semantic similarity constraints. Extensive experiments on several benchmark datasets show that MLSPH surpasses prominent baselines and reaches state-of-the-art performance in cross-modal hashing retrieval.
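The abstract does not give MLSPH's exact similarity formula. As a minimal sketch, assuming the common choice of cosine similarity between multi-hot label vectors (all names here are illustrative, not from the thesis):

    import torch

    def multilabel_similarity(L1: torch.Tensor, L2: torch.Tensor) -> torch.Tensor:
        """Graded semantic similarity in [0, 1] from multi-hot label matrices.

        L1: (n, c) labels of one batch; L2: (m, c) labels of another.
        Entries grow with the number of shared categories instead of
        collapsing to a hard 0/1 decision.
        """
        L1, L2 = L1.float(), L2.float()
        inner = L1 @ L2.t()                                  # shared-category counts
        norms = L1.norm(dim=1, keepdim=True) @ L2.norm(dim=1, keepdim=True).t()
        return inner / norms.clamp(min=1e-8)                 # 0 = disjoint, 1 = identical

    # Example: pairs sharing one of several categories get a similarity
    # strictly between 0 and 1 rather than a binary value.
    img_labels = torch.tensor([[1, 0, 1, 0], [0, 1, 0, 0]])
    txt_labels = torch.tensor([[1, 0, 0, 0], [0, 1, 0, 1]])
    print(multilabel_similarity(img_labels, txt_labels))     # [[0.7071, 0], [0, 0.7071]]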
2. MLSPH introduces the multi-label semantic similarity of paired instances, but jointly optimizing that similarity and the similarity of the corresponding hash codes remains difficult. To deal with this problem, this thesis proposes a hierarchical semantic preserving based multi-label deep cross-modal hashing (HCMH) method. Concretely, HCMH first introduces a multi-label based semantic similarity computation criterion, which uses multi-labels to compute the semantic similarity of cross-modal pairwise instances. Then, depending on the range in which a pair's multi-label semantic similarity falls, HCMH selectively applies the Jensen-Shannon divergence or the negative log-likelihood loss to keep the semantic similarity unchanged while generating the real-valued hash representations (sketched after contribution 4). Comprehensive experiments on three practical datasets verify that HCMH achieves prominent performance.

3. Both MLSPH and HCMH straightforwardly employ all modalities to learn hash functions, neglecting the fact that the original instances in each modality may contain noise. To avoid this weakness, this thesis proposes a novel multi-label enhancement based self-supervised deep cross-modal hashing (MESDCH) approach. MESDCH first proposes a multi-label semantic affinity preserving module, which uses a ReLU transformation to unify the similarities of the learned hash representations with the multi-label semantic affinities of the original instances, and defines a positive-constraint Kullback-Leibler loss function to preserve their similarity (sketched after contribution 4). This module is then integrated into a self-supervised semantic generation module to further enhance the performance of deep cross-modal hashing. Extensive evaluation experiments on four well-known datasets demonstrate that MESDCH achieves state-of-the-art performance in cross-modal hashing retrieval.

4. MESDCH uses multi-labels to supervise the learning of hash functions; however, the feature space of multi-labels is sparse, which leads to suboptimal performance of the learned hash functions. This thesis therefore further proposes a multi-label modality enhanced attention-based self-supervised deep cross-modal hashing (MMACH) framework. Specifically, a multi-label modality enhanced attention module is designed to integrate the significant features of the cross-modal data into the multi-label feature representations, aiming to improve their representation capability. Moreover, a multi-label cross-modal triplet loss is defined based on the criterion that the feature representations of cross-modal pairwise instances with more common categories should preserve higher semantic similarity than those of other instances (sketched below). To the best of our knowledge, this is the first multi-label cross-modal triplet loss designed for cross-modal retrieval. The effectiveness and efficiency of MMACH are demonstrated by experiments on four multi-label cross-modal retrieval datasets.
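For contribution 2, the abstract does not specify which similarity ranges trigger which loss. A minimal sketch, assuming pairs with fractional multi-label similarity (0 < s < 1) are matched with a Jensen-Shannon term between Bernoulli distributions, while fully similar or dissimilar pairs use the standard negative log-likelihood of deep cross-modal hashing:

    import torch

    def bernoulli_js(p, q, eps=1e-8):
        """Jensen-Shannon divergence between Bernoulli(p) and Bernoulli(q)."""
        p, q = p.clamp(eps, 1 - eps), q.clamp(eps, 1 - eps)
        m = 0.5 * (p + q)
        kl = lambda a, b: a * (a / b).log() + (1 - a) * ((1 - a) / (1 - b)).log()
        return 0.5 * kl(p, m) + 0.5 * kl(q, m)

    def hierarchical_loss(theta, s):
        """theta: pairwise inner products of real-valued hash representations;
        s: multi-label semantic similarities in [0, 1]."""
        p = torch.sigmoid(theta)              # model's belief that a pair is similar
        extreme = (s == 0) | (s == 1)         # hard pairs -> negative log-likelihood
        nll = -(s * p.clamp(1e-8).log() + (1 - s) * (1 - p).clamp(1e-8).log())
        js = bernoulli_js(p, s)               # soft pairs -> distribution matching
        return torch.where(extreme, nll, js).mean()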
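For contribution 3, a minimal sketch of the affinity preserving idea, assuming (the abstract does not give the exact formulation) that the ReLU clips negative cosine similarities of the hash representations so that both affinity matrices lie in [0, 1], after which a KL term aligns their row distributions:

    import torch
    import torch.nn.functional as F

    def affinity_preserving_loss(h_img, h_txt, label_affinity, eps=1e-8):
        """h_img, h_txt: (n, k) real-valued hash representations;
        label_affinity: (n, n) multi-label semantic affinities in [0, 1]."""
        # ReLU unifies the value ranges: cosine similarity in [-1, 1] -> [0, 1].
        hash_sim = F.relu(F.normalize(h_img, dim=1) @ F.normalize(h_txt, dim=1).t())
        p = label_affinity / (label_affinity.sum(dim=1, keepdim=True) + eps)  # targets
        q = hash_sim / (hash_sim.sum(dim=1, keepdim=True) + eps)              # model
        return (p * ((p + eps) / (q + eps)).log()).sum(dim=1).mean()          # KL(p || q)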
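For contribution 4, a minimal sketch of a multi-label cross-modal triplet loss in the stated spirit: for each anchor image, any text sharing strictly more categories with it than another text should also be more similar to it by a margin. The margin value and the similarity measure are illustrative assumptions:

    import torch
    import torch.nn.functional as F

    def multilabel_triplet_loss(img, txt, labels, margin=0.3):
        """img, txt: (n, k) hash representations; labels: (n, c) multi-hot."""
        shared = labels.float() @ labels.float().t()    # (n, n) common-category counts
        sim = F.normalize(img, dim=1) @ F.normalize(txt, dim=1).t()
        # valid[a, p, n] is True when text p shares strictly more categories
        # with anchor image a than text n does.
        valid = shared.unsqueeze(2) > shared.unsqueeze(1)
        viol = F.relu(margin - sim.unsqueeze(2) + sim.unsqueeze(1))
        mask = valid.float()
        return (viol * mask).sum() / mask.sum().clamp(min=1)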
Keywords/Search Tags: Cross-modal semantic hashing retrieval, deep neural networks, multi-label learning, self-supervised learning