
A Study Of Large-scale Retrieval Based On Cross-modal Hashing

Posted on: 2024-09-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Wang
Full Text: PDF
GTID: 1528307331973079
Subject: Control Science and Engineering
Abstract/Summary:
Benefiting from the rapid development of information technology, various types of data such as images, text and video are generated in large quantities from different devices; these are collectively referred to as multi-modal data. Cross-modal retrieval establishes deep associations between the different modalities of multi-modal data, enabling users to retrieve content in one modality directly from a query in another, extracting meaningful information from chaotic and complex data and helping to analyse and process multi-modal data more effectively. However, large-scale multi-modal data is characterised by large data size, complex content structure, poor information completeness, and semantic gaps between modalities. To achieve faster retrieval and higher accuracy, numerous cross-modal retrieval methods have been designed. As an approximate nearest neighbour search technique, hashing offers low storage consumption and high search efficiency, and has received increasing attention and application in cross-modal retrieval tasks over large-scale multi-modal data, which has grown rapidly in recent years. Cross-modal hashing establishes correlations across the modalities of multi-modal data, fully exploits the similarity structure in the data, and generates compact hash codes for retrieving data of different modalities.

Although research progress has been made in cross-modal hashing, and existing methods can support large-scale retrieval under a variety of conditions, several problems remain in practical application scenarios: (1) unsupervised methods construct similarity relationships from local information only, leaving global information unused; (2) existing methods struggle to retrieve when the pairwise information of multi-modal data is incomplete; (3) the influence of common nearest neighbours is ignored when constructing similarity relationships between data; and (4) the similarity matrix cannot be updated during the hash learning process. Existing cross-modal hashing methods do not solve these problems well, so the generated hash codes are of low quality, which degrades the performance of subsequent large-scale cross-modal retrieval. This dissertation delves into cross-modal hashing and proposes four cross-modal hashing models for large-scale retrieval tasks, as follows.

(1) An asymmetric cross-modal hashing with recovery of subspace (RSACH) is proposed. To preserve latent global semantic information of multi-modal data in the absence of label information, a new hashing method is obtained by introducing an improved low-rank representation model into cross-modal learning. The data is transformed into a low-rank representation to recover the subspace structure and obtain global similarity relations. The complex binary-constraint problem of discrete graphs is better solved by an asymmetric strategy, which optimises the hash learning process and reduces quantisation loss. Each part can be iterated separately, improving hash learning speed, saving time and space overhead, and yielding hash codes quickly. Experiments on the Wiki, MIRFlickr-25K and NUS-WIDE datasets show that RSACH outperforms existing methods on unsupervised cross-modal retrieval tasks; on MIRFlickr-25K it improves on the comparison methods by at least 3%.

(2) A semi-paired asymmetric deep cross-modal hashing (SADCH) is proposed. Multi-modal data may carry incomplete pairing information: only some of the data have pairing relationships between modalities, and the pairing information of the rest is missing. Such data has received little study in the field of hash retrieval. A hashing method for semi-paired retrieval is constructed that uses anchor graphs to obtain the latent structure of the common subspace, associating paired and unpaired data while maintaining the semantic relationships of the data. Deep learning is used to improve the acquisition of latent semantic information, and, to reduce the huge computational overhead of deep learning on large-scale cross-modal data, asymmetric learning is introduced to train the model quickly. A deep hashing architecture is designed to balance the requirements of each component and generate high-quality binary hash codes. Experimental results on three datasets show that SADCH performs well in both fully paired and semi-paired scenarios; on NUS-WIDE it achieves a performance improvement of at least 2.7%.

(3) A semi-paired semi-supervised deep cross-modal hashing (SPSDH) is proposed to process semi-paired data that carries a small amount of label information. Combining the label information with the anchor-graph information better maintains the similarity structure of the data. To address the loss of neighbourhood information when computing cross-modal data distances, a higher-order affinity measure is used, which combines local and non-local correlations to obtain more similarity information. Experimental results on three public datasets show that SPSDH outperforms several other representative comparison methods in the semi-supervised, semi-paired setting.

(4) An adaptive label correlation and similarity matrix cross-modal hashing (ALSCH) is proposed. For supervised data, existing hashing methods derive similarity matrices from label information alone, ignoring the structural relationships of multi-modal data, so the similarity information between samples is not reflected well. ALSCH constructs two adaptive matrices: one captures the semantic associations of the labels, and the other combines data features with label relationships to build the similarity matrix, maintaining consistency between the feature space and the label space. Both matrices can be updated adaptively during the hash learning process to obtain more effective similarity relations and improve retrieval performance. Experiments on three standard datasets show that ALSCH outperforms several other representative benchmark hashing methods on metrics such as average precision and PR curves.
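To make concrete why compact hash codes give the low storage cost and fast search the abstract attributes to hashing, the following is a minimal sketch (not taken from the dissertation): each item is reduced to a short binary code, and similarity search becomes XOR plus a popcount. The 64-bit code length and the toy random database are illustrative assumptions.

```python
# Sketch of Hamming-space retrieval: a 64-bit hash code stores an item in
# 8 bytes, and comparing two items costs one XOR and one bit count.
import random

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hash codes."""
    return bin(a ^ b).count("1")

def knn_search(query: int, database: list, k: int = 5) -> list:
    """Indices of the k database codes nearest to the query in Hamming space."""
    ranked = sorted(range(len(database)), key=lambda i: hamming(query, database[i]))
    return ranked[:k]

random.seed(0)
db = [random.getrandbits(64) for _ in range(10_000)]  # toy code database
q = db[42] ^ 0b111          # a query code 3 bits away from item 42
print(knn_search(q, db)[0])  # item 42 ranks first
```

Real systems replace the linear scan with multi-index or lookup-table tricks, but the cost model (bytes per item, bitwise distance) is what makes the approach scale.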
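The anchor graphs mentioned for SADCH and SPSDH can be sketched as follows. This follows the common anchor-graph formulation S = Z Λ⁻¹ Zᵀ with a row-normalised Gaussian affinity Z between samples and m ≪ n anchors; the kernel choice, bandwidth `sigma`, and random anchor selection are assumptions for illustration, not the dissertation's exact design.

```python
import numpy as np

def anchor_graph_similarity(X, anchors, sigma=1.0):
    """Approximate an n x n sample similarity via m << n anchors:
    S = Z @ inv(Lambda) @ Z.T, avoiding explicit n^2 pairwise distances."""
    # Z[i, j]: Gaussian affinity of sample i to anchor j, row-normalised
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    Z = np.exp(-d2 / (2 * sigma ** 2))
    Z /= Z.sum(axis=1, keepdims=True)
    lam_inv = np.diag(1.0 / Z.sum(axis=0))  # Lambda = diag(Z^T 1)
    return Z @ lam_inv @ Z.T

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))                     # toy single-modality features
A = X[rng.choice(100, size=10, replace=False)]    # 10 anchors picked at random
S = anchor_graph_similarity(X, A)
print(S.shape)  # (100, 100)
```

The resulting S is symmetric and row-stochastic, which is why anchor graphs can stand in for a full similarity matrix when linking paired and unpaired data at scale.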
Keywords/Search Tags:Machine learning, Cross-view learning, Hash learning, Semi-supervised Learning, Deep Learning, Multimedia retrieval, Large-scale data, Nearest neighbor retrieval