In recent years, the number of surveillance cameras has grown rapidly. In hospitals, schools, stations, airports, and many other places, surveillance cameras have been widely deployed to ensure public security, which also produces huge amounts of data to handle. The traditional manual approach no longer meets the demand, so intelligent video surveillance systems have been developed. Person re-identification (ReID) is an important component of an intelligent video surveillance system; it aims at retrieving all videos/images of a specific person from a candidate set. It enables quick person retrieval from the large-scale video database captured by a surveillance camera network and can be integrated into related tasks such as historical trajectory analysis, person tracking, and motion analysis. Furthermore, the image processing methods, network structures, and loss functions used in the ReID task can also be applied in other research areas to improve performance. Because of its great value in both research and practice, ReID has attracted much attention and has become an important research topic in computer vision. However, complicating factors such as pose variations, occlusion, illumination changes, and camera viewpoint differences make fast and accurate ReID challenging. This thesis conducts in-depth and systematic research on deep learning based ReID, investigating visible cross-camera ReID as well as non-visible and single-camera ReID. The main content and contributions of this thesis are summarized as follows:

1. Joint holistic and partial person re-identification based on spatial-channel parallelism. To handle the spatial uncertainty of feature positions and occlusion, a network based on spatial-channel parallelism supervision is proposed. In the training phase, two branches extract global features and local features respectively. The two branches supervise each other in parallel, and different channel parts of the global features are
forced to learn the corresponding local features. The new local features, extracted by the global branch, have a global view and better exploit contextual information. Furthermore, the corresponding local features can be located and extracted from the entire input, which better handles the spatial uncertainty of feature positions caused by pose variations and camera viewpoint differences. When the corresponding area is occluded, the new local features fall back to the original global features. In the testing phase, only the global branch is used to improve network efficiency. This method achieves strong performance on both holistic person re-identification datasets and partial person re-identification datasets with severe occlusion.

2. Person re-identification based on hypersphere embedding in the feature space. To overcome the problem that the widely used cross-entropy classification loss lacks an explicit constraint on the feature distribution, a sphere embedding loss is proposed, and a corresponding network is designed that, for the first time in ReID, maps the input image onto the surface of a hypersphere manifold in the feature space. In this way, the feature bias caused by the norms of feature vectors and the class bias caused by the norms of the classification neurons' weight vectors are eliminated, and the classification result depends only on the angles in the feature space, without interference from other factors, resulting in significant performance improvements on multiple public datasets. Besides, a learning-rate warm-up strategy is proposed, which improves training results without modifying networks or losses.

3. Visible-infrared cross-modality person re-identification. Visible cameras cannot work in dark conditions, so intelligent video surveillance systems introduce infrared cameras and need to process both visible and infrared images. To solve the visible-infrared cross-modality ReID problem, (1) a visible-infrared
cross-modality ReID method based on cross-spectrum dual-subspace pairing is proposed, which generates input images in multiple spectra, forcing the network to discover shared cross-modality features present in all spectra; (2) a visible-infrared cross-modality ReID method based on modality transfer and dual-level unified representation is proposed: at the image level, image information is fused, while at the feature level, a hierarchical granularity triplet loss is designed. On datasets with short-wave near-infrared or long-wave far-infrared images, both methods achieve state-of-the-art performance.

4. In-video person re-identification based on instance hard triplet loss. Existing methods are all designed for cross-camera scenarios, without considering the in-video person re-identification problem within a single camera. Thus, a new dataset is proposed to investigate the in-video person re-identification problem, and a ReID-Head network is designed to extract features of multiple persons within the same frame simultaneously. Finally, an instance hard triplet loss is designed, which can be applied in both the cross-camera and in-video person re-identification tasks, with good flexibility and lower computational complexity. Experimental results demonstrate that it achieves better performance with less training time.
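The core normalization step behind the hypersphere embedding of contribution 2 can be illustrated with a minimal NumPy sketch. It shows how L2-normalizing both the features and the classifier weights makes the logits depend only on the angles between them, eliminating the norm-induced feature and class biases the contribution describes. The function name and the `scale` temperature are illustrative assumptions, not details taken from the thesis:

```python
import numpy as np

def sphere_logits(features, weights, scale=14.0):
    """Cosine logits: L2-normalize features and class weights so that
    classification depends only on the angle between them.
    `scale` is an assumed temperature, not a value from the thesis."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    # Each entry is scale * cos(theta) between a feature and a class weight.
    return scale * (f @ w.T)
```

Because both norms are divided out, rescaling any feature vector or any class weight leaves the logits unchanged, which is exactly the bias-elimination property claimed above; a cross-entropy loss applied to these logits then constrains only the angular distribution.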
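The instance hard triplet loss of contribution 4 builds on hard-mining triplet losses. As background, a standard batch-hard triplet loss over a pairwise distance matrix can be sketched as follows; this is the generic formulation, not the thesis's exact instance hard variant, whose precise form the abstract does not give:

```python
import numpy as np

def batch_hard_triplet_loss(dist, labels, margin=0.3):
    """Generic batch-hard triplet loss: for each anchor, take its farthest
    positive and closest negative within the batch. A background sketch,
    not the thesis's instance hard triplet loss."""
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    pos = np.where(same, dist, -np.inf)   # candidates with the same identity
    neg = np.where(~same, dist, np.inf)   # candidates with other identities
    hardest_pos = pos.max(axis=1)         # farthest same-identity sample
    hardest_neg = neg.min(axis=1)         # closest different-identity sample
    return np.maximum(hardest_pos - hardest_neg + margin, 0.0).mean()
```

The loss is zero when every anchor's hardest negative is farther than its hardest positive by at least the margin, and grows as identities overlap; hard-mining variants of this form differ mainly in which pairs are eligible for mining.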