
Research On Image Retrieval Method Based On Twins-SVT

Posted on: 2022-12-12
Degree: Master
Type: Thesis
Country: China
Candidate: A B Zeng
GTID: 2518306776992789
Subject: Computer Software and Application of Computer
Abstract/Summary:
Image retrieval has long been a research hotspot in computer vision. Recently, Transformer models have achieved better image retrieval performance than convolutional neural networks. However, there is still little research on Transformers for image retrieval, and their potential for this task has not been fully exploited. Therefore, building on Twins-SVT, one of the Transformer models, and on the framework of deep metric learning, this thesis studies deep image retrieval from three perspectives (model structure, loss function, and retrieval process) in order to improve retrieval accuracy.

Firstly, this thesis proposes an Attention-Enhanced Twins-SVT model. The model replaces the original Patch Embedding modules in Twins-SVT with Attention-Enhanced Patch Embedding modules, which improves its ability to extract local information. The model also uses a Generality-Aware Self-Attention module to learn the generality shared by all images in the dataset and to guide each image toward more powerful image features. Experiments on the CUB200-2011 and CARS196 datasets show that the Attention-Enhanced Twins-SVT model achieves better retrieval accuracy than other Transformer models.

Secondly, to train the Attention-Enhanced Twins-SVT model more effectively, this thesis proposes a Patch Diversity-Threshold loss that is used together with contrastive loss. The Patch Diversity-Threshold loss is computed on the sequence of patch tokens produced by the fourth stage of the model. It promotes diversity within the sequence of patch tokens and improves the expressive ability of each token. Experiments show that the Patch Diversity-Threshold loss improves retrieval accuracy across image features of different dimensions, different ranking losses, and different Transformer models, which demonstrates its applicability and effectiveness. In addition, compared with state-of-the-art methods published since 2018, the Attention-Enhanced Twins-SVT model trained with the Patch Diversity-Threshold loss and contrastive loss achieves the highest retrieval accuracy, which demonstrates the effectiveness of the proposed method.

Finally, to further improve retrieval accuracy, this thesis proposes an image re-retrieval method based on the Attention-Enhanced Twins-SVT model. The model extracts a sequence of patch tokens for the query image and for each database image, and global average pooling turns each sequence into an efficient pooled image feature. The initial retrieval ranks the database images by the similarity between their pooled features and the pooled feature of the query image. For each pooled image feature ranked in the Top-k of the initial retrieval, the Look-at-Other (LatO) attention module combines it with the sequence of patch tokens of the query image to generate a corresponding LatO feature. The similarity between each LatO feature and its corresponding pooled image feature is then used to rerank the Top-k results. Experiments show that the image re-retrieval method based on the Attention-Enhanced Twins-SVT model improves retrieval accuracy at the cost of a small amount of retrieval efficiency.
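
The two-step retrieval process described in the abstract (pooled-feature ranking followed by Top-k reranking) might be sketched as follows. This is only a minimal NumPy illustration under stated assumptions, not the thesis's implementation: the function names are hypothetical, cosine similarity is assumed as the similarity measure, and `attend_to_candidate` is a simple single-head attention used as a stand-in for the Look-at-Other module, whose actual design the abstract does not specify.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to (approximately) unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def pool_tokens(tokens):
    """Global average pooling: (num_patches, dim) -> (dim,)."""
    return tokens.mean(axis=0)


def initial_retrieval(query_tokens, db_features, k):
    """Rank database images by cosine similarity to the pooled query feature.

    Cosine similarity is an assumption; the abstract only says "similarity".
    Returns the indices of the Top-k database images and all similarities.
    """
    q = l2_normalize(pool_tokens(query_tokens))
    db = l2_normalize(db_features)
    sims = db @ q
    order = np.argsort(-sims)  # descending similarity
    return order[:k], sims


def attend_to_candidate(query_tokens, candidate_feature):
    """Hypothetical stand-in for the Look-at-Other module.

    The pooled candidate feature acts as the attention query over the
    patch tokens of the query image (scaled dot-product, softmax weights).
    """
    dim = query_tokens.shape[1]
    scores = query_tokens @ candidate_feature / np.sqrt(dim)
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ query_tokens


def rerank(query_tokens, db_features, top_idx):
    """Reorder the Top-k candidates by candidate-conditioned similarity."""
    new_sims = []
    for i in top_idx:
        feat = attend_to_candidate(query_tokens, db_features[i])
        new_sims.append(float(l2_normalize(feat) @ l2_normalize(db_features[i])))
    return top_idx[np.argsort(-np.array(new_sims))]
```

In this sketch only the Top-k candidates pass through the attention step, so the extra cost grows with k rather than with the database size, which matches the abstract's remark that reranking sacrifices only a small amount of retrieval efficiency.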
Keywords/Search Tags: Image Retrieval, Metric Learning, Transformer Model, Attention Mechanism, Diversity Loss