Research On Some Problems Of Person Re-Identification

Posted on: 2020-04-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Lin
GTID: 1368330623958170
Subject: Computer application technology
Abstract/Summary:
Person re-identification (re-id) refers to matching images of the same person captured by disjoint camera views. It is an increasingly important direction in computer vision, machine learning, and artificial intelligence, and its results can be widely applied in intelligent video surveillance and security. However, person re-id remains challenging because of complex interference factors, including illumination, viewpoint, pedestrian pose, occlusion, and background clutter, which cause dramatic appearance changes of the same person. Many challenges therefore remain in practical applications: (1) how to reduce the negative impact of complex interference by improving the matching process; (2) how to reduce the effect of occlusion and background clutter by focusing on a series of local salient regions when comparing pedestrian images; (3) how to deal with unconstrained spatial misalignment between image pairs caused by view angle changes and pedestrian pose variations; (4) how to formulate and solve a purely unsupervised person re-id problem, which does not require any manual annotation of the training data. This dissertation studies these four issues. The main contributions are summarized as follows:

(1) To address complex interference, a method for optimally organizing multiple similarity measures is proposed. First, a visual consistency measure (VCM) is presented to estimate whether two pedestrian images were captured under similar visual conditions. Then, the global-image and body-part training sets are each grouped into three sub-classes according to their VCM results. Finally, the VCM-specific similarity measures of pedestrian pairs and body-part pairs are selected and organized into an ensemble through reliability estimation and adaptive weighted combination. Experimental results demonstrate that the method improves the matching process by handling image pairs with different visual conditions separately, thereby alleviating the effect of complex interference.

(2) To make use of sequential salient regions when comparing image pairs, a recurrent model of visual concurrent attention (co-attention) is proposed. The method aims to simulate human eye movement. First, because reinforcement learning provides a flexible learning strategy for sequential decision-making, it is applied to the temporal re-id co-attention learning task. Then, recurrent neural networks (RNNs) are used to extract information from a sequence of attention regions; at each step, the internal representations memorized by the RNN decide where and what to co-attend to in the next step. Reward functions are designed to recursively optimize the prediction by rewarding or punishing the learning process. Finally, the joint features are used to learn the identification action and the triplet ranking action through the reward functions. Experimental results show that the method can dynamically attend to the optimal sequence of salient regions for a given image pair and increases robustness to occlusion and background clutter.

(3) To address spatial misalignment between image pairs, a recurrent matching network for spatial alignment learning is proposed, which combines local feature learning and sequential spatial correspondence learning in an end-to-end framework. First, the sequential local features of an image pair are extracted by a convolutional neural network and memorized by an RNN. Then, a location network is designed to perform sequential spatial region correspondence learning: it learns a location policy that decides where and what to attend to in one image at each time step, and it adaptively locates the corresponding region in the other image through interactions between the local pairwise internal representations. Finally, these procedures are repeated several times, and the internal representations at the last step are fed into the loss function to update the network. Experimental results validate the effectiveness of the method against local spatial misalignment.

(4) To avoid exhaustive manual identity labelling as pedestrian data grows, a multi-level descriptor clustering model for unsupervised person re-id is proposed, in which deep feature learning and image clustering are jointly optimized on unlabeled images. First, each individual sample is treated as its own cluster at the start of training, and multi-level features are extracted from each sample using the image cluster labels as supervisory signals. Then, similar clusters are merged at each step of the agglomerative merging process, and the new cluster labels are used as supervisory signals for training the convolutional neural network. Finally, deep feature learning and image clustering are iterated until the model converges, and a logistic regression function is adopted to optimize the model. Experiments on large-scale image and video datasets demonstrate the efficacy of the method for the unsupervised person re-id task.

In summary, this dissertation investigates the technical bottlenecks of person re-id. Starting from different aspects of the challenges, it completes research on optimally organizing multiple similarity measures, adaptively attending to sequential salient regions, spatial alignment learning, and unsupervised learning based on clustering. These studies have both theoretical significance and practical application value.
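
The bottom-up clustering loop in contribution (4) can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the abstract does not specify the merging criterion, so the centroid-distance rule, the fixed `target_clusters` stopping condition, and the function names `merge_step` and `bottom_up_cluster` are all assumptions made for illustration. The feature vectors are held fixed here, whereas the method described above retrains a convolutional neural network on the new cluster labels after every merge.

```python
from itertools import combinations
from math import dist


def merge_step(features, labels):
    """Merge the two clusters whose centroids are closest (assumed criterion)."""
    # Group sample indices by their current cluster label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        return list(labels)

    dim = len(features[0])

    def centroid(members):
        return [sum(features[i][d] for i in members) / len(members)
                for d in range(dim)]

    cents = {lab: centroid(members) for lab, members in clusters.items()}
    # Pick the pair of cluster labels with the smallest centroid distance.
    keep, drop = min(combinations(sorted(cents), 2),
                     key=lambda pair: dist(cents[pair[0]], cents[pair[1]]))
    # Relabel every sample of the dropped cluster with the kept label.
    return [keep if lab == drop else lab for lab in labels]


def bottom_up_cluster(features, target_clusters):
    """Start with one cluster per sample and merge until target_clusters remain."""
    labels = list(range(len(features)))  # each sample is its own cluster
    while len(set(labels)) > target_clusters:
        # In the described method, the network would be retrained here using
        # `labels` as supervisory signals; this sketch keeps features fixed.
        labels = merge_step(features, labels)
    return labels


# Hypothetical 2-D features: two tight groups far apart from each other.
features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
labels = bottom_up_cluster(features, target_clusters=2)
print(labels)
```

In this toy run the first two samples end up sharing one pseudo-label and the last two share another, which is the kind of signal the abstract says is fed back into feature learning.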
Keywords/Search Tags: person re-identification, intelligent video surveillance, deep learning, multiple similarity measures, co-attention mechanism