Person re-identification refers to finding a specific pedestrian in images or video sequences captured by several different cameras, and its application scenarios are mainly well-lit daytime scenes. In practical applications, however, many images and videos are captured by infrared cameras at night, and traditional person re-identification cannot handle such data. Person re-identification has therefore begun to evolve into cross-modal person re-identification. The large difference between infrared and visible-light images makes this task challenging. Compared with traditional features, deep features have significant advantages. This paper explores solutions to the cross-modal problem based on deep learning and proposes two deep-learning-based cross-modal person re-identification network frameworks, in which the features of the two modalities are unified in an end-to-end manner. The two proposed cross-modal person re-identification methods are as follows:

(1) In the first method, to reduce the modal difference and learn more uniform features, a channel-based partition network is proposed. First, to address the lack of discriminative information, a generator is constructed. Its main function is to disrupt the original channel structure of the image: by randomly reorganizing the channels, the image style is changed while the image content remains unchanged. The increased number of training samples helps the network generalize and learn cross-modal features. Second, at the feature level, a channel-based feature partition layer is proposed; through this layer, the original feature map is evenly divided into several sections along the channel dimension for local feature learning. Finally, at the end of the network, a feature converter trained with CycleGAN is added, through which visible-light features can be converted into infrared features, further reducing the modal difference. (Minimal sketches of these channel operations are given below.)

(2) In the second method, two kinds of auxiliary information are introduced and a multi-stream network is established to help the network perform better cross-modal learning. The first kind of auxiliary information is the transition state. A transition state simultaneously has some characteristics of both modalities and serves as a bridge that helps the two modalities establish a better connection. Based on the visible-light and infrared modalities, two different transition states are constructed and used to build two triplet losses for metric learning against the original modalities (an illustrative sketch follows below). The second kind of auxiliary information is the contour information map. A shape information filter is established to obtain the contour map of the original image; the contour information is used as an auxiliary input through an additional branch added to the base network, and the contour features are finally fused with the original features (an illustrative filter is also sketched below). Experiments on a widely used dataset and comparisons with existing mainstream methods demonstrate the effectiveness of the proposed models.
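As a concrete illustration of the first method's channel operations, the following PyTorch sketch shows one way the channel-reorganizing generator and the channel-based partition layer could work. The function names and shapes are assumptions for illustration, not the thesis's actual implementation.

```python
import torch

def channel_shuffle_generator(img: torch.Tensor) -> torch.Tensor:
    """Randomly reorder the channels of a batch of images (B, C, H, W).

    The spatial content is left untouched; only the channel order, and
    hence the apparent color style, changes. This is a hypothetical
    stand-in for the generator's style change with unchanged content.
    """
    perm = torch.randperm(img.size(1))
    return img[:, perm]

def channel_partition(feat: torch.Tensor, parts: int) -> list:
    """Evenly split a feature map (B, C, H, W) into `parts` groups along
    the channel axis for local feature learning (C must be divisible by
    `parts`)."""
    return list(torch.chunk(feat, parts, dim=1))

# Example: augment a visible-light batch and partition backbone features.
imgs = torch.randn(8, 3, 256, 128)           # a batch of RGB person crops
augmented = channel_shuffle_generator(imgs)  # same content, shuffled style
features = torch.randn(8, 2048, 8, 4)        # e.g. a ResNet-50 feature map
locals_ = channel_partition(features, 4)     # four 512-channel sections
```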
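The transition-state metric learning of the second method can likewise be sketched with two standard triplet losses. The exact anchor/positive/negative pairing below is an assumption; the abstract only states that the two transition states bridge the two original modalities.

```python
import torch
import torch.nn as nn

triplet = nn.TripletMarginLoss(margin=0.3)  # margin value is an assumption

def transition_bridge_loss(f_trans_v, f_trans_i,
                           f_vis_pos, f_vis_neg,
                           f_ir_pos, f_ir_neg):
    """Two triplet losses anchored on the transition-state features.

    f_trans_v / f_trans_i: features of the two transition states;
    f_*_pos: same-identity features from the original modality;
    f_*_neg: different-identity features. The pairing is illustrative.
    """
    loss_v = triplet(f_trans_v, f_vis_pos, f_vis_neg)  # bridge to visible
    loss_i = triplet(f_trans_i, f_ir_pos, f_ir_neg)    # bridge to infrared
    return loss_v + loss_i

# Example with dummy 2048-d embeddings.
f = lambda: torch.randn(8, 2048)
loss = transition_bridge_loss(f(), f(), f(), f(), f(), f())
```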
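The shape information filter is not specified in detail here; the sketch below uses fixed Sobel kernels as a plausible stand-in for extracting a contour information map from an input image.

```python
import torch
import torch.nn.functional as F

def contour_map(img: torch.Tensor) -> torch.Tensor:
    """Extract a contour (edge) map from a batch of images (B, C, H, W)
    with fixed Sobel kernels; returns a (B, 1, H, W) map. A hypothetical
    substitute for the shape information filter described above."""
    gray = img.mean(dim=1, keepdim=True)  # collapse channels to intensity
    kx = torch.tensor([[-1., 0., 1.],
                       [-2., 0., 2.],
                       [-1., 0., 1.]], device=img.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)               # vertical-gradient kernel
    gx = F.conv2d(gray, kx, padding=1)
    gy = F.conv2d(gray, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

edges = contour_map(torch.randn(8, 3, 256, 128))  # shape: (8, 1, 256, 128)
```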
The two methods explore different directions. The outstanding contribution of the first method is its innovation at the channel level, which is easily overlooked; the overall framework is simple and efficient, but the improvement brought by the CycleGAN-based feature converter is limited. The advantage of the second method is that it uses auxiliary information to guide network training so that the network extracts more unified modal features, and it proposes a new way of using CycleGAN that performs much better than the feature converter of the first method. Its disadvantage is that the overall model involves many modules, the network is more complex, and the training cost is high.