Research On Image-text Cross-modal Hash Retrieval Based On Semantic Preservation And Attention Mechanism

Posted on: 2024-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: J L Hong
Full Text: PDF
GTID: 2568307178473944
Subject: Computer technology
Abstract/Summary:
With the exponential growth of network devices and the development of mobile networks, a large amount of multi-modal data has emerged on the Internet, and different data modalities may share semantic correlations. As the volume of data increases, traditional cross-modal hash retrieval methods struggle to extract effective features, which degrades retrieval performance. To address this issue, methods based on deep cross-modal hash retrieval have recently received increasing attention. Many researchers have replaced traditional feature extraction with neural network models and have produced effective results. An analysis of current methods, however, still reveals several shortcomings: insufficient accuracy in feature extraction from multi-modal data, a mismatch between the similarity defined by multi-label annotations and the similarity of the hash codes generated by the hash function, and the difficulty of balancing model accuracy and efficiency. To address these issues, this thesis carries out the following work.

To address the measurement mismatch between similarity coefficients in cross-modal multi-label retrieval, an interval (margin) parameter is introduced to correct the bias. In addition, the Transformer architecture, which has shown excellent performance on a wide range of computer vision and natural language processing tasks, is introduced into cross-modal hash retrieval. A new supervised hashing method, Deep Semantics Preserving Vision Transformer Hashing (DSPVTH), is proposed. This method maps data of different modalities into binary hash codes using network structures such as the Vision Transformer, and preserves the semantic correlations between modalities through multi-label similarity relationships. The effectiveness and robustness of DSPVTH are demonstrated on four classic multi-modal image-text datasets, where its average precision is 2% to 8% higher than that of current state-of-the-art methods.

Although DSPVTH achieves high accuracy, its large number of parameters and heavy computation limit its efficiency in cross-modal hash retrieval. Therefore, a lightweight model is adopted for feature extraction, with the number of parameters and the amount of computation kept below the current benchmark level. In addition, to address the problem that lightweight pre-trained models ignore soft-label information in cross-modal hashing, a new intermediate-level feature extraction module is built on the middle layers of the network to integrate secondary but still important features into the hash representation. Combined with the interval-parameter strategy of the previous method, a new lightweight supervised hashing method, Lightweight Cross-modal Attention mechanism Hashing (LCAH), is proposed. LCAH recovers features that would otherwise be ignored in the middle layers and yields a better fused representation. It has a number of parameters similar to the current baseline method but lower computational complexity. The effectiveness of this method is verified on four classic benchmark datasets.
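To make the role of the interval parameter more concrete, the following is a minimal sketch of a margin-corrected pairwise loss that aligns hash-code similarity with multi-label similarity. It is not the thesis's actual formulation: the cosine-style label similarity, the function names, and the loss weights are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def multilabel_similarity(labels_a, labels_b):
    # Soft similarity between multi-label vectors: normalized label overlap
    # (cosine), instead of the usual binary "share at least one label" flag.
    return F.cosine_similarity(labels_a.unsqueeze(1), labels_b.unsqueeze(0), dim=-1)

def margin_corrected_hash_loss(codes_img, codes_txt, labels_img, labels_txt, margin=0.1):
    """Pairwise loss aligning hash-code similarity with multi-label similarity.

    codes_*: real-valued relaxations of binary codes in [-1, 1], shape (B, K).
    margin:  interval parameter that tolerates small deviations between the
             two similarity scales before penalizing them (assumed usage of
             the 'interval parameter' mentioned in the abstract).
    """
    K = codes_img.size(1)
    # Normalized inner product of relaxed codes, rescaled to [0, 1].
    code_sim = (codes_img @ codes_txt.t()) / K
    code_sim = (code_sim + 1.0) / 2.0
    label_sim = multilabel_similarity(labels_img, labels_txt)
    # Only deviations larger than the interval/margin are penalized.
    gap = (code_sim - label_sim).abs() - margin
    sim_loss = F.relu(gap).pow(2).mean()
    # Quantization loss pushing relaxed codes toward {-1, +1}.
    quant_loss = (codes_img.abs() - 1.0).pow(2).mean() + (codes_txt.abs() - 1.0).pow(2).mean()
    return sim_loss + 0.1 * quant_loss
```

At retrieval time the relaxed codes would be binarized with the sign function so that image and text codes can be compared in the same Hamming space.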
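For the second method, the intermediate-level feature module described in the abstract might look roughly like the sketch below: a lightweight channel-attention block re-weights mid-layer features and fuses them, through a residual projection, with the backbone's final features before the hash layer. The module name, the squeeze-and-excitation style attention, and all dimensions are assumptions for illustration, not the thesis's actual architecture.

```python
import torch
import torch.nn as nn

class MidLevelAttentionFusion(nn.Module):
    """Hypothetical sketch of an intermediate-level attention fusion module."""

    def __init__(self, mid_dim, final_dim, hash_bits):
        super().__init__()
        # Channel attention over pooled mid-level features.
        self.attn = nn.Sequential(
            nn.Linear(mid_dim, mid_dim // 4),
            nn.ReLU(inplace=True),
            nn.Linear(mid_dim // 4, mid_dim),
            nn.Sigmoid(),
        )
        self.proj = nn.Linear(mid_dim, final_dim)
        self.hash_layer = nn.Linear(final_dim, hash_bits)

    def forward(self, mid_feat, final_feat):
        # mid_feat:   (B, C, H, W) intermediate feature map from a lightweight backbone.
        # final_feat: (B, final_dim) globally pooled features from the last stage.
        pooled = mid_feat.mean(dim=(2, 3))          # global average pool -> (B, C)
        weighted = pooled * self.attn(pooled)       # attention re-weights mid-level channels
        fused = final_feat + self.proj(weighted)    # residual fusion with final features
        return torch.tanh(self.hash_layer(fused))   # relaxed binary codes in (-1, 1)
```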
Keywords/Search Tags:cross-modal hashing, semantics preserving, attention mechanism, supervised learning, deep learning