
Research On Cross-Modality Person Re-Identification Technology Based On Fusion Of Global And Local Features

Posted on: 2024-04-03
Degree: Master
Type: Thesis
Country: China
Candidate: X F Ma
Full Text: PDF
GTID: 2568306941459974
Subject: Computer Science and Technology
Abstract/Summary:
Given the multi-modality data that is constantly being generated, how to retrieve or match the information users need accurately and efficiently is a problem worth studying. Visible-infrared cross-modality person re-identification is a cross-modality retrieval task of strong practical significance: it aims to match visible and infrared images of the same person identity, and is widely used in security, criminal investigation, smart cities, and other fields. Because the two modalities rest on different imaging principles, there is a large discrepancy between visible and infrared images. This thesis explores cross-modality person re-identification methods that narrow the feature discrepancy between the two heterogeneous modalities, so that the model focuses on discriminative, modality-shared features. The global and local features of person images are also fused to improve identification accuracy.

First, a contour-guided dual-granularity feature fusion network is designed. The key to cross-modality person re-identification is to make the model learn features that exist in both modalities. Features such as color and texture are unique to visible images and absent from infrared images, so they cannot be used directly for identification and retrieval. Contours, by contrast, have a degree of cross-modality invariance and are a relatively reliable identification cue. This thesis therefore introduces the contour image as an auxiliary modality and enhances the global feature representation of the contour by fusing the features of the contour image with those of the original person image. Because local features discriminate identity well under occlusion and deformation, the features of both granularities are further combined so that the resulting person features are more discriminative. To address the within-part inconsistency that uniform partitioning can cause, a soft partition method is introduced to refine the local features.

Second, to obtain complete spatial features, a feature-enhanced Transformer network based on a homogeneous middle modality is designed. The Vision Transformer models long-range dependencies through the self-attention mechanism, and because it contains no downsampling operators such as pooling, it can learn the complete spatial features of person images. An encoder-decoder structure generates homogeneous middle-modality images, and the features of the three modalities are jointly optimized in a unified feature space to reduce cross-modality variation. A sliding-window strategy and the fine-grained features of convolutional neural networks are further introduced to compensate for the vanilla Transformer's weak localization ability. The global features are also enhanced so that the model learns richer semantic features. Experiments and visual analyses demonstrate the effectiveness of the two proposed methods.
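The self-attention mechanism underlying the Vision Transformer discussed above can be illustrated with a minimal sketch. This is a generic single-head scaled dot-product attention over a toy sequence of patch embeddings, not the thesis's actual network; the shapes, random projections, and function name are assumptions made purely for illustration.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention.

    x: (n, d) patch embeddings; wq/wk/wv: (d, d) projection matrices.
    Every output token attends to every input token, which is how a
    Vision Transformer models long-range dependencies without pooling.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(x.shape[1])            # (n, n) pairwise affinities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)     # row-wise softmax
    return weights @ v                                # (n, d) attended features

# Toy example: 4 "patches" with 8-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
w = [rng.standard_normal((8, 8)) for _ in range(3)]
out = self_attention(x, *w)
print(out.shape)  # (4, 8)
```

Because the attention weights couple every patch with every other patch, the output preserves the full spatial extent of the input sequence, in contrast to pooled convolutional feature maps.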
Keywords/Search Tags:cross-modality, person re-identification, feature fusion, Transformer