
Attention-aware Deep Cross-modal Hashing

Posted on: 2021-04-15    Degree: Master    Type: Thesis
Country: China    Candidate: H L Yao    Full Text: PDF
GTID: 2428330602983751    Subject: Computer Science and Technology
Abstract/Summary:
From production to daily life, from manufacturing to services, and from industry to finance and commerce, the era of big data has arrived and is quietly changing the world. The Internet generates massive amounts of data every day and presents it in a variety of forms, such as text, images, video, and audio, which greatly enrich our lives. At the same time, the efficient storage and rapid retrieval of large-scale data has attracted wide attention as a very challenging task. Hashing methods map high-dimensional features into compact low-dimensional binary hash codes; because of their low storage consumption and fast retrieval speed, they have become important tools for large-scale cross-modal retrieval. With the rapid development of deep learning, its powerful feature-extraction ability has been applied in many fields: deep learning compensates for the limitations of hand-crafted feature extraction and yields abstract, effective feature representations. Therefore, many scholars have proposed models that combine deep learning with cross-modal hashing to improve retrieval performance.

However, many deep cross-modal hashing methods proposed in recent years still have shortcomings. Real-world data are generally imperfect and more or less redundant, which makes cross-modal retrieval challenging, yet most existing cross-modal hashing methods fail to deal with this redundancy and therefore perform unsatisfactorily on such datasets. Taking the image and text modalities as an example, many methods do not consider the richness of the image content in the original dataset: they simply feed the entire picture into the network to extract features and then learn hash codes, so they cannot focus on the key information of the image, and redundant parts such as the background interfere with the extraction of effective features. Similarly, the original text annotations contain a great deal of noise, so using such data directly may also harm the extraction of effective features. In addition, to improve performance, many deep methods introduce complex network components such as generative adversarial networks or LSTM networks, but the substantial increase in the number of parameters may lead to a substantial increase in time cost.

In view of the above problems, we propose a new deep cross-modal hashing method, TEACH (aTtEntion-Aware deep Cross-modal Hashing), which performs feature learning and hash-code learning simultaneously. Drawing on attention methods that are currently popular in the computer vision field, and exploiting the ability of the attention mechanism to select a specific subset of the input (or of the features), we introduce attention into our cross-modal hash retrieval model. Specifically, different attention modules are designed for samples of different modalities, so as to highlight the key parts and reduce the contribution of redundant, interfering terms in the retrieval task. Moreover, to avoid the sharp growth of training time caused by introducing complex mechanisms into deep network models, this thesis obtains the two local attention maps in a pre-training stage. The time complexity of this classification step is O(n), far less than the training time of deep hashing networks that use the similarity matrix as supervision. At the same time, the simple parameters of the classification network are a clear advantage compared with more complex models such as generative adversarial networks. Therefore, compared with some simple deep cross-modal hashing methods, the training and retrieval time of TEACH does not increase greatly.

To verify the effectiveness of the proposed model, extensive experiments have been conducted on three common benchmark datasets, i.e., MIRFlickr-25K, NUS-WIDE, and Wiki, comparing TEACH with current effective cross-modal hashing retrieval models. The results show that TEACH is effective and practical.
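The abstract describes the attention-aware idea only in prose. The following is a minimal sketch, assuming PyTorch, of how a lightweight classification head can produce a local attention map that down-weights redundant regions (e.g., image background) before pooling and hash-code learning; the module name, layer sizes, and pooling choice are illustrative assumptions, not the thesis implementation.

```python
# A minimal sketch (not the thesis code) of attention-aware hashing for the image modality,
# assuming PyTorch and hypothetical dimensions: a 1x1-conv classification head yields a
# class-activation-style map that serves as a local attention map over spatial positions.
import torch
import torch.nn as nn

class AttentionAwareImageHasher(nn.Module):
    def __init__(self, in_channels=512, num_classes=24, code_len=64):
        super().__init__()
        self.cls_head = nn.Conv2d(in_channels, num_classes, kernel_size=1)  # pre-trained classifier
        self.hash_layer = nn.Linear(in_channels, code_len)                  # hash-code learning

    def forward(self, feat_map):                     # feat_map: (B, C, H, W) from a CNN backbone
        logits = self.cls_head(feat_map)             # (B, num_classes, H, W)
        attn = torch.sigmoid(logits.max(dim=1, keepdim=True).values)  # (B, 1, H, W) attention map
        weighted = feat_map * attn                   # suppress background, keep salient regions
        pooled = weighted.mean(dim=(2, 3))           # (B, C) attention-weighted pooling
        return torch.tanh(self.hash_layer(pooled))   # relaxed binary codes in (-1, 1)

# Usage: codes = AttentionAwareImageHasher()(torch.randn(8, 512, 7, 7))
# At retrieval time the binary codes would be torch.sign(codes).
```

A text-modality module would follow the same pattern with a gating vector over annotation features instead of a spatial map.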
Keywords/Search Tags:Learning to hash, Cross-modal retrieval, Attention mechanism, Approximate nearest neighbor retrieval, Deep learning