| Person re-identification uses computer vision to determine whether a specific pedestrian appears in a video sequence or image. It is widely regarded as a sub-problem of image retrieval: given an image of a person captured by one surveillance camera, the task is to retrieve images of the same person across other devices. However, because camera systems automatically switch from visible-light mode to infrared mode in dark conditions, person re-identification based on RGB-RGB matching cannot meet practical requirements, and cross-modal person re-identification based on IR-RGB matching has attracted increasing attention. Most existing cross-modal person re-identification methods focus on extracting the features common to the two modalities but ignore the features unique to each modality. In this thesis, a hybrid learning network is proposed that learns modality-specific features through the joint learning of single-modality and cross-modality branches. In addition, a graph convolution attention mechanism and granularity feature learning are used to enrich the input to the classifiers of the different branches and improve feature discriminability. Experiments on the cross-modal dataset SYSU-MM01 demonstrate the effectiveness of the proposed method and its modules. The main contributions of this thesis are as follows: (1) A cross-modal person re-identification method based on a hybrid learning network model is designed. The proposed hybrid learning network learns not only the features shared by the two modalities but also the features unique to each, so that effective features are extracted from the two modalities more comprehensively. (2) Based on the hybrid learning network model, a graph convolution attention mechanism is proposed to deal with modality differences. To
solve the problem of discrepancies between the feature representations of the two modalities extracted by the network model, this thesis adds graph convolution attention to the weight-shared branch of the two-stream network, so that it aggregates cross-modal features with the same identity and focuses on reducing the differences between modalities rather than discriminating features mainly by the appearance or structure of the person. (3) Based on the hybrid learning network model, this thesis proposes a granularity feature module that enriches the input to the classifiers and improves feature discriminability. To address the problem that the feature representation extracted by the network model is not comprehensive enough, this thesis adds the granularity feature module to the non-weight-shared branches of the two-stream network. The module extracts coarse- and fine-grained features, partitions the features into blocks corresponding to different parts of the human body, and classifies each block independently, thereby improving the feature representation. |
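
The block-wise processing described in contribution (3) can be illustrated with a minimal sketch. It assumes, purely for illustration, that the feature map is split into equal horizontal stripes, that each stripe is average-pooled into one vector, and that each stripe has its own linear classifier. The abstract does not specify these details, and the names `stripe_pool` and `classify_parts` are hypothetical, not taken from the thesis.

```python
from typing import List

# A feature map stored as nested lists: [channels][height][width].
FeatureMap = List[List[List[float]]]


def stripe_pool(fmap: FeatureMap, num_parts: int) -> List[List[float]]:
    """Split a C x H x W map into horizontal stripes and average-pool each.

    Assumption: stripes are equal in height, which roughly aligns them
    with body parts (head, torso, legs) in an upright pedestrian crop.
    """
    height = len(fmap[0])
    if height % num_parts != 0:
        raise ValueError("height must be divisible by num_parts")
    rows = height // num_parts
    parts = []
    for p in range(num_parts):
        vec = []
        for channel in fmap:
            block = channel[p * rows:(p + 1) * rows]
            values = [v for row in block for v in row]
            vec.append(sum(values) / len(values))
        parts.append(vec)  # one C-dimensional vector per stripe
    return parts


def classify_parts(parts, weights, biases):
    """Apply an independent linear classifier to each stripe vector.

    weights[p] has shape [num_classes][channels]; biases[p] has
    shape [num_classes]. No parameters are shared between stripes.
    """
    logits = []
    for vec, w, b in zip(parts, weights, biases):
        logits.append(
            [sum(wi * xi for wi, xi in zip(row, vec)) + bj
             for row, bj in zip(w, b)]
        )
    return logits


# Toy input: 2 channels, height 4, width 2, split into 2 stripes.
fmap = [
    [[1.0, 1.0], [1.0, 1.0], [3.0, 3.0], [3.0, 3.0]],
    [[0.0, 2.0], [0.0, 2.0], [4.0, 4.0], [4.0, 4.0]],
]
parts = stripe_pool(fmap, num_parts=2)  # [[1.0, 1.0], [3.0, 4.0]]

# Hypothetical per-stripe classifier parameters for 2 identity classes.
weights = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [1.0, -1.0]]]
biases = [[0.0, 0.0], [0.0, 0.5]]
logits = classify_parts(parts, weights, biases)
```

In a trained network, each stripe's logits would typically feed its own identity loss, so every body region is pushed to be discriminative on its own. That training step is outside this sketch.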