Multi-feature ViT Person Re-identification Method With Fused CNN Attention

Posted on: 2024-02-17
Degree: Master
Type: Thesis
Country: China
Candidate: P T Li
GTID: 2568307139956129
Subject: Computer technology
Abstract/Summary:
Person re-identification (ReID) is a typical computer vision problem: quickly and accurately identifying a target person who appears in one camera’s field of view and later appears in another camera’s field of view. With the continuous development of surveillance networks and deep learning, ReID has become an increasingly important research topic. ReID technology can help smart cities collect and analyze data effectively, improve traffic management and urban security, help researchers obtain information about people more accurately and quickly, strengthen the tracking and analysis of human trajectories, provide technical support for social tracking and management, and enhance data security. Deep learning can solve some of the key technical problems in ReID: matching persons are retrieved according to their identity features. On the currently public ReID datasets, deep learning models have achieved unprecedented results. In particular, thanks to effective feature extraction networks such as Convolutional Neural Networks (CNNs) and the Vision Transformer (ViT), their performance greatly exceeds that of manual identification. In complex scenarios, however, existing models still leave considerable room for improvement. This thesis focuses on supervised deep learning methods for ReID feature learning and constructs a multi-feature ViT model with fused convolutional attention that improves performance in more complex scenarios. The main research contents are as follows:

(1) Person re-identification based on ViT. With the help of CNNs, vision models can obtain highly expressive feature vectors, and CNN-based methods account for a large share of supervised ReID work. However, some inherent problems of CNNs have brought vision models to a bottleneck. Whether they rely on local features or on attention, CNN-based models still leave considerable room for improvement in complex scenarios. Inspired by TransReID, many researchers have studied ViT, making it popular in the ReID field. This thesis studies the ViT architecture further.

(2) Person re-identification method fusing convolutional attention with the ViT architecture. Research shows that ViT-based models have shortcomings in extracting local details. To further improve ReID performance in complex scenarios, a method that fuses convolutional attention with ViT is proposed to strengthen ViT’s attention to local detail. The method embeds convolutional spatial attention and channel attention into the ViT architecture, strengthening the focus on important image regions and on important channel features, respectively. The experimental results show that the proposed model is lightweight and greatly improves accuracy in complex scenarios.

(3) Multi-feature training module. In ReID, factors such as lighting and changes in posture can lead the model to extract image features that do not fully generalize a person’s identity information. Considering that multiple features in deep learning can correspond to the same label at the same time, this chapter designs a multi-feature training module for the ViT architecture. On the one hand, the attention module generates attention global features, which are used together with the original global features to compute the identity loss and the triplet loss, so that the model reduces the bias introduced when attention features neglect secondary information. On the other hand, inspired by local feature learning methods that divide images horizontally, a horizontal partitioning model is designed to form a local branch. Finally, the global features, the attention global features, and the multiple local features are used to compute multiple cross-entropy losses and hard-sample triplet losses that optimize the ViT model. The experimental results show that this method improves the recognition accuracy of ReID.
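The multi-feature training idea in (3) can be illustrated with a minimal sketch in plain Python. It is not the thesis implementation: the abstract does not state the number of horizontal stripes, the triplet margin, the loss weights, or how local features are pooled. The sketch therefore assumes average pooling over stripes of patch tokens, a batch-hard triplet loss with a margin of 0.3, and an unweighted sum over branches. All function names are hypothetical.

```python
import math

def horizontal_stripes(tokens, grid_h, grid_w, num_stripes):
    """Average-pool a row-major grid of patch tokens into horizontal stripes.

    Assumption: local features are stripe-wise means of ViT patch tokens.
    """
    assert len(tokens) == grid_h * grid_w and grid_h % num_stripes == 0
    per_stripe = (grid_h // num_stripes) * grid_w
    dim = len(tokens[0])
    stripes = []
    for s in range(num_stripes):
        chunk = tokens[s * per_stripe:(s + 1) * per_stripe]
        stripes.append([sum(t[d] for t in chunk) / len(chunk) for d in range(dim)])
    return stripes

def cross_entropy(logits, label):
    """Identity (ID) loss for one sample, using a stable log-sum-exp."""
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits)) - logits[label]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def batch_hard_triplet(embeddings, labels, margin=0.3):
    """Hard-sample triplet loss: farthest positive vs. nearest negative per anchor."""
    losses = []
    for i, anchor in enumerate(embeddings):
        pos = [euclidean(anchor, e) for j, e in enumerate(embeddings)
               if j != i and labels[j] == labels[i]]
        neg = [euclidean(anchor, e) for j, e in enumerate(embeddings)
               if labels[j] != labels[i]]
        if pos and neg:  # skip anchors without a valid triplet
            losses.append(max(0.0, max(pos) - min(neg) + margin))
    return sum(losses) / len(losses) if losses else 0.0

def multi_feature_loss(branches, labels, margin=0.3):
    """Sum ID loss and triplet loss over all feature branches.

    branches: list of (logits_per_sample, embeddings_per_sample), one entry
    each for the global feature, the attention global feature, and every
    local stripe feature. Equal weighting is an assumption of this sketch.
    """
    total = 0.0
    for logits, embeddings in branches:
        id_loss = sum(cross_entropy(z, y) for z, y in zip(logits, labels)) / len(labels)
        total += id_loss + batch_hard_triplet(embeddings, labels, margin)
    return total
```

In a real training loop the logits would come from per-branch classifiers on top of the ViT outputs; here they are passed in directly so that the way the losses are combined stays visible.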
Keywords/Search Tags: Person Re-identification, deep learning, attention mechanism, Transformer, multi-feature training