
Research On Convolutional Neural Network Based Algorithm For Person Re-identification

Posted on: 2019-09-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Shen
GTID: 1368330572488003
Subject: Electronic information technology and instrumentation
Abstract/Summary:
Person re-identification (re-ID) is an emerging research topic in computer vision. It aims to accurately match images of a person of interest across multiple disjoint camera views based on the appearance characteristics of pedestrians. Person re-ID technology can be widely used in video surveillance scenarios such as intelligent security, intelligent transportation, and smart shopping. Because of its scientific research value and practical significance, it has received increasing attention from the computer vision community. In recent years, many researchers have introduced deep learning algorithms based on convolutional neural networks (CNNs) into the person re-ID problem. These deep learning methods learn more robust and more discriminative feature embeddings through a "feature extraction + loss function optimization" procedure trained in an end-to-end fashion. In this way, they partially solve problems that traditional methods handle poorly and substantially improve person re-ID performance. However, person re-ID algorithms built on a generic CNN framework still face difficulties; for example, they are relatively insensitive to subtle local regions that carry strongly discriminative features. This dissertation therefore focuses on person re-ID, a popular and challenging topic of significant research and application value, and proposes improved feature embedding learning algorithms based on current CNN technology from three perspectives: employing multi-level similarity perception constraints, utilizing strong neural activations on high-level convolutional feature maps, and constructing a sampling-based sharp attention mechanism. All three algorithms share one core theme, "making CNNs focus more effectively on highly discriminative local detail features", and have both theoretical significance and practical engineering value. Specifically, the main contents and contributions of the three research works in this dissertation are as follows.

Firstly, this dissertation presents a novel person re-ID algorithm based on a deep Siamese network architecture and multi-level similarity perception. Following the distinct characteristics of different feature maps, different similarity constraints are applied to low-level and high-level feature maps during the training stage. Because appropriate similarity comparison mechanisms are introduced at different levels, the approach can adaptively learn discriminative local and global feature representations, respectively, and the local representations are more sensitive in localizing part-level prominent patterns relevant to re-identifying people across cameras. The approach has two further benefits. First, a multi-task learning architecture simultaneously optimizes classification and similarity constraints; multi-task learning imposes knowledge sharing while solving multiple correlated tasks and thereby combines the merits of both. Second, because the similarity comparison information is encoded in the learnable parameters of the network, the algorithm does not require the time-inefficient procedure of pairwise input at test time. Compared with traditional Siamese network-based methods, the algorithm is therefore more efficient and can extract image features to build an index in advance, which is essential for large-scale real-world applications. Experimental results on multiple challenging benchmarks show that the method outperformed other state-of-the-art methods of the time.

Secondly, this dissertation proposes a person re-ID algorithm that extracts and exploits, without supervision, strong neural activations on the feature maps of the highest convolutional layer. Careful observation and experimental verification show that the strong-activation regions extracted by the algorithm represent subtle local features with abstract semantic information, and the extraction is unsupervised, requiring no additional supervision. Furthermore, a deep feature embedding model that simultaneously encodes the original global information and the discriminative local features is proposed. This embedding effectively enlarges the gap between the inter-class variance and the intra-class variance, which significantly improves retrieval performance. The method is suitable not only for person re-ID but also for a wider range of fine-grained retrieval problems. Experimental results demonstrate that the proposed method outperformed other state-of-the-art methods of the time on both fine-grained retrieval and person re-ID tasks.

Finally, this dissertation presents a person re-ID algorithm based on a sharp attention mechanism, which obtains attention masks by adaptively sampling CNN feature maps. Because of this sampling-based attention model, the approach can adaptively generate sharper attention-aware feature masks. This differs greatly from gating-based attention mechanisms, which rely on soft gating functions to select the features relevant to person re-ID. Soft attention networks usually use the Sigmoid function to squash mask values into [0, 1], and the resulting soft attention masks carry large semantic uncertainty. In contrast, the proposed sampling-based attention mechanism effectively trims irrelevant features by forcing the resulting feature masks to focus on the most discriminative features (i.e., mask values close to either 0 or 1). It produces sharper attention that is more assertive in localizing subtle features relevant to re-identifying people across cameras, without attention ambiguity. To this end, a differentiable Gumbel-Softmax sampler is employed to approximate Bernoulli sampling, so that the sharp attention networks can be trained end to end through backpropagation. Extensive experimental evaluations on several challenging large-scale person re-ID datasets demonstrate the superiority of this sharp attention model over the baseline and other related methods.
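Two ideas from the first contribution, a multi-task objective that combines identity classification with a pairwise similarity constraint, and test-time retrieval from precomputed features without pairwise input, can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the dissertation's method: the abstract does not specify which similarity constraint is used at each level, so a standard contrastive loss is assumed, and the function names and the `margin` and `weight` parameters are hypothetical. The multi-level aspect (separate constraints on low-level and high-level feature maps) is not modeled.

```python
import math


def softmax_cross_entropy(logits, label):
    """Identity-classification loss for one image: -log softmax(logits)[label]."""
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def contrastive_loss(f1, f2, same_identity, margin=1.0):
    """ASSUMED similarity constraint (standard contrastive loss): pull pairs of the
    same identity together, push different identities at least `margin` apart."""
    d = math.dist(f1, f2)
    if same_identity:
        return 0.5 * d * d
    return 0.5 * max(0.0, margin - d) ** 2


def multitask_loss(logits1, label1, logits2, label2, f1, f2, weight=1.0):
    """Hypothetical multi-task objective for one Siamese pair: identity
    classification on both branches plus a weighted pairwise similarity term."""
    cls = softmax_cross_entropy(logits1, label1) + softmax_cross_entropy(logits2, label2)
    sim = contrastive_loss(f1, f2, label1 == label2)
    return cls + weight * sim


def rank_gallery(query_feature, gallery_index):
    """Test time: each gallery image was embedded once and stored as (id, feature),
    so ranking is a nearest-neighbour sort with no pairwise network pass."""
    return sorted(gallery_index, key=lambda item: math.dist(query_feature, item[1]))
```

Because `rank_gallery` only compares stored vectors, gallery features can be extracted once and indexed ahead of time; a Siamese model that needed both images as joint input would instead require a network pass for every query-gallery pair.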
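For the second contribution, one simple unsupervised way to locate strong activations is to sum the highest-level convolutional feature map over its channels and keep the positions whose summed response exceeds the map's mean. The sketch below assumes that mean-threshold rule and plain average pooling, then concatenates a global descriptor with a descriptor pooled only over the strong-activation positions. The dissertation's actual extraction and fusion steps may differ, and all function names are hypothetical. The feature map is a nested list of shape C x H x W.

```python
def activation_map(fmap):
    """Sum a C x H x W feature map over channels into an H x W map."""
    h, w = len(fmap[0]), len(fmap[0][0])
    return [[sum(ch[i][j] for ch in fmap) for j in range(w)] for i in range(h)]


def strong_activation_mask(fmap):
    """ASSUMED unsupervised rule: keep positions whose summed activation exceeds
    the mean of the summed map. No labels or extra supervision are used."""
    amap = activation_map(fmap)
    values = [v for row in amap for v in row]
    mean = sum(values) / len(values)
    return [[v > mean for v in row] for row in amap]


def pooled(fmap, mask=None):
    """Average-pool each channel, optionally only over positions where mask is True.
    Returns 0.0 for a channel if the mask selects no positions."""
    out = []
    for ch in fmap:
        vals = [ch[i][j]
                for i in range(len(ch))
                for j in range(len(ch[0]))
                if mask is None or mask[i][j]]
        out.append(sum(vals) / len(vals) if vals else 0.0)
    return out


def global_plus_local_embedding(fmap):
    """Hypothetical combined embedding: [global descriptor | strong-activation descriptor]."""
    mask = strong_activation_mask(fmap)
    return pooled(fmap) + pooled(fmap, mask)
```

For example, with two channels `[[0, 0], [0, 4]]` and `[[0, 1], [0, 3]]`, the summed map is `[[0, 1], [0, 7]]`, only the bottom-right position exceeds the mean of 2, and the embedding is `[1.0, 1.0, 4.0, 3.0]`: the local half keeps the strong response that global averaging dilutes.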
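The contrast in the third contribution between sigmoid gating and sampled masks can be illustrated with a relaxed Bernoulli sampler. In the sketch below, each mask value is drawn from a two-class Gumbel-Softmax over the logits `[z, 0]` ("keep" vs "drop"), which is a continuous relaxation of Bernoulli(sigmoid(z)); a low temperature `tau` pushes samples toward 0 or 1, whereas `soft_attention` returns the deterministic sigmoid used by soft attention networks. This is a stdlib-only illustration of the sampler, not the dissertation's network: no gradients are computed, and the function names and `tau` values are assumptions. In PyTorch, `torch.nn.functional.gumbel_softmax` provides a differentiable version of the same relaxation.

```python
import math
import random


def gumbel(rng):
    """Standard Gumbel noise: -log(-log U), with U ~ Uniform(0, 1)."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both logs finite
    return -math.log(-math.log(u))


def soft_attention(logit):
    """Gating-style soft attention: a deterministic sigmoid anywhere in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


def gumbel_softmax_bernoulli(logit, tau, rng):
    """Relaxed sample of Bernoulli(sigmoid(logit)) via a two-class Gumbel-Softmax
    over the logits [logit, 0]; returns the 'keep' probability of the sample."""
    a = (logit + gumbel(rng)) / tau
    b = (0.0 + gumbel(rng)) / tau
    m = max(a, b)  # subtract the max so exp() cannot overflow
    ea, eb = math.exp(a - m), math.exp(b - m)
    return ea / (ea + eb)


def sharp_mask(logits, tau=0.1, seed=0):
    """Hypothetical helper: one relaxed Bernoulli mask value per attention logit."""
    rng = random.Random(seed)
    return [gumbel_softmax_bernoulli(z, tau, rng) for z in logits]
```

With `z = 0`, `soft_attention` always returns 0.5, an ambiguous "half-attended" value. By contrast, `sharp_mask([0.0] * 1000, tau=0.01)` yields values that are almost all near 0 or 1 and average roughly 0.5, matching the Bernoulli probability, so the mask makes a near-binary keep or drop decision at each position.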
Keywords/Search Tags: person re-identification, fine-grained retrieval, convolutional neural network, multi-level similarity perception, strong neural activations, adaptive sampling, sharp attention