Research On Image Retrieval Methods Based On Vision Transformer

Posted on:2024-08-29

Degree:Master

Type:Thesis

Country:China

Candidate:B N Xiu

GTID:2568306917996999

Subject:Software engineering

Abstract/Summary:

Image Retrieval (IR) is an important research task in computer vision. In recent years, more challenging subtasks such as Fine-grained Image Retrieval (FGIR) and person Re-identification (ReID) have been proposed and have attracted growing attention. On both subtasks, models based on Convolutional Neural Networks (CNNs) have achieved impressive performance. With CNNs, these methods can make full use of the global features of images. For FGIR and ReID, however, local features also play a very important role in the retrieval process. More recently, Vision Transformer (ViT) based approaches have achieved great success in traditional image analysis, which is attributed to ViT's natural advantage in capturing important regions and focusing on fine-grained features in an image. How to apply ViT to these more challenging tasks still requires further exploration. This thesis therefore studies ViT-based methods for the FGIR and ReID tasks.

First, the author uses ViT as the backbone and proposes a ViT-based fine-grained image retrieval method that makes better use of the local features of an image. In this method, the author designs a Local Aligned Loss (LAL) that dynamically calculates the minimum distance between paired regions of two images in order to align their local regions. The distance between the two images can then be calculated more accurately, so that their similarity is better measured. In this way, the discriminative regions of an image are captured effectively and the fine-grained features it contains are better utilized. The method also introduces a twice-sorting approach, which improves retrieval efficiency while preserving the accuracy of the retrieval results as far as possible.

On this basis, to utilize both the global and the local fine-grained information of images, the author introduces a novel hybrid ViT framework for fine-grained image retrieval and further explores how CNN and ViT can best work together in this architecture. Specifically, the author proposes a Critical Patches Reanalysis (CPReA) module, which uses the CNN to guide the selection of critical patches in the ViT so that more representative global features can be generated. In addition, the author designs a Cross Network Feature Fusion (CNFF) module to integrate the features of ViT and CNN effectively, so that the output features are more informative. The author also proposes a Global-Local Aligned Loss (GLAL) function that enhances LAL and better measures the similarity between two images.

To verify the generalization ability of the proposed hybrid ViT architecture on different tasks, the author proposes a hybrid ViT framework for person Re-identification and evaluates its performance on the ReID task. In this method, the author designs a Hierarchical Feature Fusion (HiFF) module to make full use of the image features generated by the intermediate layers of the CNN and the ViT. With this module, the final features used for retrieval contain richer coarse-grained and fine-grained information. Moreover, a Self-supervised Optimization Ranking (SSOR) module is introduced to further improve the retrieval efficiency and accuracy of the model.

To evaluate the proposed methods, the author conducts extensive comparative and ablation experiments on two typical fine-grained datasets (CUB-200-2011 and Cars-196) and two typical person ReID datasets (DukeMTMC and MSMT17). The results demonstrate the effectiveness of the proposed methods.
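
The local alignment and twice-sorting ideas described above can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: each patch of one image is matched to its nearest patch in the other image and the distances are averaged, and twice-sorting is read as a coarse ranking on global features followed by a re-ranking of the top candidates on the local distance. The function names `local_aligned_distance` and `twice_sorted_retrieval`, the parameter `k`, and the use of Euclidean distance are hypothetical and are not taken from the thesis.

```python
import numpy as np


def local_aligned_distance(a, b):
    """Assumed local distance between two images (not the thesis's exact LAL).

    a: (n, d) array of patch features for image A.
    b: (m, d) array of patch features for image B.
    Each patch in A is paired with its closest patch in B, and the
    resulting minimum distances are averaged.
    """
    diff = a[:, None, :] - b[None, :, :]        # (n, m, d) via broadcasting
    pairwise = np.linalg.norm(diff, axis=-1)    # (n, m) Euclidean distances
    return float(pairwise.min(axis=1).mean())


def twice_sorted_retrieval(query_global, query_local,
                           gallery_global, gallery_local, k=10):
    """Assumed two-stage ranking: coarse global sort, then local re-rank.

    query_global: (d,) global feature of the query.
    query_local: (n, d) patch features of the query.
    gallery_global: (g, d) global features of the gallery images.
    gallery_local: (g, n, d) patch features of the gallery images.
    Returns the indices of the top-k gallery images, best match first.
    """
    # Stage 1: cheap ranking of the whole gallery by global distance.
    coarse = np.linalg.norm(gallery_global - query_global, axis=1)
    candidates = np.argsort(coarse)[:k]
    # Stage 2: re-rank only the k candidates by the costlier local distance.
    fine = np.array([local_aligned_distance(query_local, gallery_local[i])
                     for i in candidates])
    return candidates[np.argsort(fine)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gallery_global = rng.normal(size=(100, 64))
    gallery_local = rng.normal(size=(100, 16, 64))
    ranking = twice_sorted_retrieval(gallery_global[7], gallery_local[7],
                                     gallery_global, gallery_local, k=5)
    print(ranking)
```

Under this reading, the second stage evaluates the patch-level distance only for `k` candidates rather than the whole gallery, which is one plausible way a two-pass ranking could improve efficiency while keeping the fine-grained comparison for the final order.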

Keywords/Search Tags:

Fine-grained Image Retrieval, Person Re-identification, Vision Transformer, Convolutional Neural Network

Related items

1	Research On Convolutional Neural Network Based Algorithm For Person Re-identification
2	Analysis And Research Of Key Technologies For Fine-grained Image Recognition Based On Convolutional Neural Networks
3	Research On Pedestrian Fine-grained Recognition And Re-identification Technology
4	Research On Fine-grained Image Classification Algorithm Based On Attention Guidance
5	Fine-grained Image Recognition Based On Deformable Transformer And Multi-Scale Attention
6	Fine-grained Image Retrieval Based On Deep Convolutional Feature Aggregation
7	Research On Fine-grained Image Classification Based On Multi-branch Attention And Fused Multi-level Features
8	Research On Fine-Grained Image Analysis Based On Machine Learning
9	Research On Fine-grained Image Classification Based On Deep Convolutional Neural Network
10	Research On Fine Grained Image Recognition Method Based On Visual Transformer And Data Optimization