Research on person re-identification has mainly focused on overcoming the limitations caused by complex changes in viewpoint, pose, occlusion, and lighting within a single modality. In poorly lit scenes, however, studying visible-light images alone is not enough, so cross-modal person re-identification has gradually attracted attention. Because the two modalities differ greatly, the major challenge is to discover the characteristics they share and to minimize the difference between the features of the same pedestrian in different modalities, so as to establish a connection between the two. To address these problems, this paper proposes using person attributes as auxiliary information for the visible-infrared cross-modal person re-identification task. The main research contents and innovations are as follows:

(1) A multi-level feature fusion network assisted by person attributes is proposed. First, the cross-modal person re-identification dataset SYSU-MM01 was annotated with attributes. Person attributes are mid- and high-level semantic information and are relatively robust to interference; even across modalities, some appearance attributes of the same pedestrian do not change. Twelve evenly distributed ID-level attributes shared by visible-light and infrared images were selected to annotate the dataset, and this attribute information is used to assist cross-modal person re-identification. In addition, shallow features have higher resolution than deep features and contain more detailed information such as location, but carry less semantic information, so features at different levels differ in expressive capability and semantic content. To combine the advantages of multi-level features and improve the representation ability and discrimination of features, this paper investigates multi-level feature fusion: the output features of the last three stages of ResNet50 are fused under different fusion schemes to find the best one. Experimental results show that introducing attribute information and fusing features from different levels enhances the generalization ability and robustness of the model.

(2) A feature learning network based on global-local feature guidance is proposed. To make full use of attributes in directing attention to more effective local details, this paper divides the annotated attributes into global attributes and local attributes following the global-local idea, and further divides the local attributes by human body region from top to bottom. Using a dual-stream SE-ResNet50 as the backbone, a global feature learning branch and a local feature learning branch are constructed to learn global and local features respectively. Softmax loss, triplet loss, and heterogeneity center loss are introduced to design a global loss and a local loss that better optimize model training. Experiments on SYSU-MM01 verify the effectiveness of the global-local division of attributes. The method learns more discriminative pedestrian feature representations and effectively improves recognition accuracy.

(3) A dual-stream multi-branch network based on the idea of multi-granularity is proposed. From the multi-granularity perspective, this paper again uses the annotated attribute information as its data basis and integrates the output features of different levels of the dual-stream ResNet50 into features of the corresponding levels, constructing three feature learning branches at different levels. The attribute labels are likewise divided into low-, medium-, and high-granularity information to match the multi-level feature learning branches. A suitable loss function is designed for each branch, and weight parameters are assigned to balance the branch losses and obtain the optimal total loss. Experimental results show that, under the multi-granularity idea, matching output features of different levels with attribute information of different granularities is effective for finding more information shared across modalities and for compensating for the differences between heterogeneous modalities.

In summary, the three research contents proposed in this paper all rely on the assistance of person attributes. To make better use of the attributes, they are divided according to the global-local idea and the multi-granularity idea respectively, and different network models and optimization strategies are proposed to improve the accuracy, generalization ability, and robustness of cross-modal person re-identification. These methods have been experimentally verified on the SYSU-MM01 dataset and achieve a certain degree of improvement.
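
The loss design mentioned in (2) and (3) can be illustrated with a minimal sketch. It assumes the commonly used form of the heterogeneity (hetero-) center loss, which penalizes the squared Euclidean distance between the visible and infrared feature centers of each identity, and it assumes the total loss is a plain weighted sum of branch losses. The function names, the toy feature vectors, and the weight values are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def modality_centers(features, ids):
    """Average the feature vectors of each identity (one center per ID)."""
    groups = defaultdict(list)
    for feat, pid in zip(features, ids):
        groups[pid].append(feat)
    return {
        pid: [sum(column) / len(feats) for column in zip(*feats)]
        for pid, feats in groups.items()
    }


def hetero_center_loss(vis_feats, vis_ids, ir_feats, ir_ids):
    """Sum of squared distances between visible and infrared centers.

    Assumed form: one term per identity that appears in both modalities.
    """
    vis_centers = modality_centers(vis_feats, vis_ids)
    ir_centers = modality_centers(ir_feats, ir_ids)
    shared_ids = vis_centers.keys() & ir_centers.keys()
    return sum(
        sum((v - r) ** 2 for v, r in zip(vis_centers[pid], ir_centers[pid]))
        for pid in shared_ids
    )


def total_loss(branch_losses, weights):
    """Weighted sum of per-branch losses (the weights are hypothetical)."""
    if len(branch_losses) != len(weights):
        raise ValueError("one weight is required per branch loss")
    return sum(w * loss for w, loss in zip(weights, branch_losses))


# Toy example: one identity, two visible samples and one infrared sample.
hc = hetero_center_loss(
    vis_feats=[[0.0, 0.0], [2.0, 0.0]], vis_ids=[1, 1],
    ir_feats=[[1.0, 2.0]], ir_ids=[1],
)
combined = total_loss([1.0, 2.0, 4.0], weights=[1.0, 0.5, 0.25])
```

In a real training setup these quantities would be computed on batched tensors alongside the softmax and triplet losses; the pure-Python version only shows the arithmetic.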