Person Re-Identification (Re-ID) is mainly concerned with identifying and retrieving all images that match a given query image across scenes (non-overlapping cameras). Convolutional neural networks (CNNs) have already achieved considerable success in person re-identification. They have addressed its many challenges, such as occlusion, illumination changes, cross-domain shift, and pose variation. While CNNs have dominated the field, Transformer-based approaches have emerged over the past two years because of their strength in processing long sequences in computer vision. To exploit the complementary strengths of Transformers and CNNs in computer vision, this paper proposes a concise method that combines convolution and Transformers to improve performance. The main contributions of this paper are as follows.

(1) To address pose variation and occlusion, this paper proposes a pose-guided, feature-enhanced person re-identification algorithm based on ViT, named Coarse-grained Person Re-Identification Based on ViT and Pose Guidance. The method has two branches: a Pose Estimation branch and a ViT branch. In the Pose Estimation branch, a commonly used pose estimation network first extracts additional pose information and heatmaps, and a fully connected layer then produces the pose features. In the ViT branch, a Vision Transformer extracts features, and the PM (Patch Mechanism) splits the resulting features into blocks and stacks them. Finally, the Feature Reinforcement Module computes the similarity of the stacked features, using the pose information to enhance the features and ultimately generate discriminative pose-guided features. Experimental results verify that the features extracted by the Pose-Guide Feature Enhancement Module are more robust. However, the experiments also show that its mAP results fall short. This paper infers that using pose alone to enhance the features produced by the Transformer is not well suited to the complex task of person re-identification.

(2) Because the Transformer lacks a mechanism for extracting channel information, this paper proposes fine-grained person re-identification based on an attention mechanism and the Transformer. First, a convolutional network with a channel attention mechanism generates features that capture intra-channel and inter-channel relationships. A relationship-enhanced Transformer layer is then proposed. Combined with the pose-guided features produced by the Pose-Guide Feature Enhancement Module, this layer uses the Transformer to fuse the features and learn the relationships between them, so that channel information is integrated into the final learned features. The two parts are integrated into a single person re-identification model that fuses pose guidance and Transformer feature enhancement (Fusion Pose Guidance and Transformer Feature Enhancement, referred to as FPTE), effectively strengthening the relationships between features. Experiments show that the proposed methods achieve better results than related state-of-the-art methods on two large-scale person re-identification benchmark datasets and one occlusion dataset.
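Two mechanisms described above can be illustrated with a minimal pure-Python sketch. The first re-weights ViT part features by their similarity to a pose feature. The second re-scales CNN channels with channel attention. The abstract does not give the exact formulations, so every detail here is an assumption. `split_and_pool` is a hypothetical reading of the Patch Mechanism as equal-size grouping of patch tokens followed by mean pooling. `pose_guided_reweight` assumes cosine similarity followed by a softmax. `channel_attention` substitutes a standard squeeze-and-excitation block for the thesis's channel attention network. All weights are passed in by hand rather than learned.

```python
import math
from typing import List, Tuple

Vector = List[float]


def channel_attention(feature_map: List[Vector], w1: List[Vector], w2: List[Vector]) -> List[Vector]:
    """Squeeze-and-excitation style channel attention (assumed stand-in, not the thesis's exact module).

    feature_map: C channels, each a flattened list of H*W activations.
    w1: (C // r) x C weights of the reduction layer; w2: C x (C // r) weights
    of the expansion layer. Biases are omitted for brevity.
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    squeezed = [sum(ch) / len(ch) for ch in feature_map]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives one gate in (0, 1) per channel,
    # so each gate depends on the descriptors of all channels (inter-channel relations).
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Scale: every activation in a channel is multiplied by that channel's gate.
    return [[g * x for x in ch] for g, ch in zip(gates, feature_map)]


def split_and_pool(tokens: List[Vector], num_parts: int) -> List[Vector]:
    """Hypothetical Patch Mechanism: split patch tokens into equal groups and mean-pool each group."""
    if num_parts <= 0 or len(tokens) % num_parts != 0:
        raise ValueError("token count must be a positive multiple of num_parts")
    size = len(tokens) // num_parts
    parts = []
    for i in range(num_parts):
        group = tokens[i * size:(i + 1) * size]
        # zip(*group) iterates over feature dimensions across the tokens in the group.
        parts.append([sum(vals) / size for vals in zip(*group)])
    return parts


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def pose_guided_reweight(parts: List[Vector], pose: Vector) -> Tuple[List[Vector], Vector]:
    """Assumed pose guidance: weight each part by its softmax-normalised similarity to the pose feature.

    Weights are rescaled to sum to len(parts), so their mean is 1: parts that
    agree with the pose feature are amplified and the others are damped.
    """
    sims = [cosine(p, pose) for p in parts]
    peak = max(sims)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    weights = [len(parts) * e / total for e in exps]
    enhanced = [[w * x for x in p] for w, p in zip(weights, parts)]
    return enhanced, weights
```

For example, `split_and_pool([[1.0], [3.0], [5.0], [7.0]], 2)` returns `[[2.0], [6.0]]`. Rescaling the softmax weights so that they average 1, rather than sum to 1, is a design choice of this sketch. It keeps the enhanced features on the same overall scale as the inputs.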