Research On Image Retrieval Method Based On Twins-SVT

Posted on:2022-12-12

Degree:Master

Type:Thesis

Country:China

Candidate:A B Zeng

Full Text:PDF

GTID:2518306776992789

Subject:Computer Software and Application of Computer

Abstract/Summary:

Image retrieval has long been a research hotspot in computer vision. Recently, Transformer models have achieved better performance than convolutional neural networks for image retrieval. However, there is still little research on Transformers for image retrieval, and their potential for this task has not been fully exploited. Therefore, based on Twins-SVT, one of the Transformer models, and the framework of deep metric learning, this paper studies deep image retrieval from three perspectives, namely model structure, loss function, and retrieval process, to improve retrieval accuracy.

Firstly, this paper proposes an Attention-Enhanced Twins-SVT model. This model replaces the original Patch Embedding modules in Twins-SVT with Attention-Enhanced Patch Embedding modules to improve the extraction of local information. At the same time, the model uses a Generality-Aware Self-Attention module to learn the generality shared by all images in the dataset and to guide each image toward producing more powerful image features. Experiments on the CUB200-2011 and CARS196 datasets show that the Attention-Enhanced Twins-SVT model achieves better retrieval accuracy than other Transformer models.

Secondly, to train the Attention-Enhanced Twins-SVT model more effectively, this paper proposes a Patch Diversity-Threshold loss that is used together with a contrastive loss. The Patch Diversity-Threshold loss is calculated from the sequence of patch tokens generated by the fourth stage of the model. It promotes the diversity of the patch tokens in the sequence and improves the expressive ability of each token. Experiments show that the Patch Diversity-Threshold loss effectively improves retrieval accuracy across image features of different dimensions, different ranking losses, and different Transformer models, which demonstrates its applicability and effectiveness. In addition, compared with some state-of-the-art methods published since 2018, the Attention-Enhanced Twins-SVT model achieves the highest retrieval accuracy when trained with the Patch Diversity-Threshold loss and the contrastive loss, which demonstrates the effectiveness of the proposed method.

Finally, to further improve retrieval accuracy, this paper proposes an image re-retrieval method based on the Attention-Enhanced Twins-SVT model. The query image and each database image are passed through the model to extract their sequences of patch tokens, and efficient image features are generated from these sequences by global average pooling. Then, the similarity between the query image and each database image is calculated from the pooled image features, and the database images are ranked by this similarity to complete the initial retrieval. For each pooled image feature ranked in the top k of the initial retrieval, the Look-at-Other (LatO) attention module uses it together with the sequence of patch tokens of the query image to generate the corresponding LatO feature. The similarity between each LatO feature and its corresponding pooled image feature is then used to complete the reranking. Experiments show that the image re-retrieval method based on the Attention-Enhanced Twins-SVT model effectively improves retrieval accuracy at the cost of a small amount of retrieval efficiency.
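
The two-stage retrieval flow described in the abstract can be sketched roughly as follows. This is only an illustrative sketch in NumPy, not the thesis implementation: the abstract does not specify the similarity measure or the internals of the Look-at-Other module, so cosine similarity and a parameter-free single-head attention are assumptions made here, and all function names (`global_average_pool`, `initial_retrieval`, `stand_in_lato_feature`, `rerank`) are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so that dot products are cosine similarities."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def global_average_pool(patch_tokens):
    """Pool a (num_patches, dim) sequence of patch tokens into one (dim,) feature."""
    return patch_tokens.mean(axis=0)


def initial_retrieval(query_tokens, db_features, k):
    """Rank database images by cosine similarity (an assumption) to the pooled query.

    db_features is a (num_images, dim) array of pooled database features.
    Returns the indices of the k most similar images and all similarities.
    """
    query_feature = l2_normalize(global_average_pool(query_tokens))
    sims = l2_normalize(db_features) @ query_feature
    top_k = np.argsort(-sims)[:k]
    return top_k, sims


def stand_in_lato_feature(candidate_feature, query_tokens):
    """Hypothetical stand-in for the Look-at-Other attention module.

    The pooled candidate feature acts as the attention query over the patch
    tokens of the query image. The real module is presumably learned; this
    version has no trainable parameters.
    """
    dim = query_tokens.shape[1]
    scores = query_tokens @ candidate_feature / np.sqrt(dim)
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ query_tokens


def rerank(query_tokens, db_features, top_k):
    """Reorder the top-k candidates by the similarity between each candidate's
    attention-derived feature and its own pooled feature."""
    new_sims = []
    for idx in top_k:
        feature = stand_in_lato_feature(db_features[idx], query_tokens)
        new_sims.append(float(l2_normalize(feature) @ l2_normalize(db_features[idx])))
    order = np.argsort(-np.array(new_sims))
    return top_k[order]
```

Only the top-k candidates pass through the attention step, which is consistent with the abstract's remark that reranking costs a small amount of retrieval efficiency: the extra work grows with k rather than with the size of the database.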

Keywords/Search Tags:

Image Retrieval, Metric Learning, Transformer Model, Attention Mechanism, Diversity Loss

Related items

1	Research And Application Of Image Content Understanding And Expression Method Based On Deep Learning
2	Research On Personalized Recommendation Methods Based On Deep Learning
3	Research And Application On Metric Learning Based On Attention Network
4	Research On Deep Metric Learning Image Retrieval Algorithm Based On Robust Loss And Enhanced Features
5	Streamlined Feature Representation For Content-based Image Retrieval
6	Metric Learning And Indexing For Large-Scale Image Retrieval
7	Deep Metric Learning For Cross-Modal Retrieval
8	Research On Sketch-Based Image Retrieval Using Deep Learning
9	Deep Metric Hashing Method For Large-scale Face Image Retrieval
10	Cross-modal Video Retrieval Algorithm Based On Multi-semantic Clues And Metric Learning