Person Search Algorithms Based On Convolutional Neural Networks

Posted on: 2022-04-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D Chen
GTID: 1488306755960059
Subject: Control Science and Engineering
Abstract/Summary:
In visual surveillance systems, the most fundamental problems are 1) how to locate persons within images, and 2) how to determine whether a query person is present in a particular set of images, typically captured by different cameras. These two problems are usually investigated as the independent tasks of Pedestrian Detection and Person Re-identification (re-ID). In practical applications, however, it is preferable to solve them in a joint framework, not only for convenience and efficiency but also for better performance. The task of Person Search aims to retrieve a query person from a gallery of uncropped images captured by different cameras, which makes it a combination of Pedestrian Detection and Person re-ID. This combination raises a major challenge, i.e., how to properly handle the relationship between detection and re-ID, especially when their objectives potentially contradict each other. This dissertation focuses on that challenge and investigates it in depth. Our contributions are summarized as follows.

Firstly, a mask-guided two-stream convolutional neural network is introduced for Person Search. Experiments and visualizations show that it is more reasonable to separate the detector from re-ID feature extraction than to share representations in a single joint model. We also summarize, for the first time, the contradictory relationship between detection and re-ID, i.e., detection captures human commonness while re-ID focuses on human uniqueness. In addition, we propose a simple yet effective re-ID method that models the foreground person and the original image patch individually and obtains enriched representations from two separate convolutional streams. In experiments on two standard person search benchmarks, our method surpasses the state of the art by a large margin.

Secondly, a keypoint message passing method is proposed for Person Search. Because re-ID accuracy is one of the performance bottlenecks of Person Search, improving it is a straightforward way to improve person search methods. A feasible way to boost performance is to enrich appearance features with temporal information, i.e., to replace static images with videos. Existing video-processing methods are mostly based on convolutional neural networks (CNNs), whose building blocks process only a local neighborhood of pixels at a time. We propose to capture long-range dependencies with a human-oriented graph method. Specifically, features located at person joint keypoints are extracted and connected into a spatial-temporal graph. These keypoint features are then updated by passing messages from their connected nodes through a graph convolutional network (GCN). During training, the GCN can be attached to any CNN-based person re-ID model to assist representation learning on feature maps, and it can be dropped after training for faster inference. Our method brings a significant improvement over the CNN-based baseline model and sets a new state of the art in terms of mean average precision compared with prior works.

Thirdly, a hierarchical online instance matching method is proposed for end-to-end Person Search. Separating detection and re-ID avoids the objective contradiction problem and yields better performance, but it neglects the inter-dependency between pedestrian detection and re-ID and incurs a higher computation cost. To reduce the computation cost for real-world applications, we focus on an end-to-end solution for Person Search and propose a Hierarchical Online Instance Matching (HOIM) loss, which exploits the hierarchical relationship between detection and re-ID to guide the learning of our network. The HOIM loss harmonizes the objectives of the two sub-tasks and encourages better feature learning. In addition, we improve the loss update policy by introducing Selective Memory Refreshment (SMR) for unlabeled persons, which takes advantage of the potential discriminative power of unlabeled data. In experiments on two standard person search benchmarks, we achieve state-of-the-art performance, which demonstrates the effectiveness of the proposed HOIM loss in learning robust features.

Finally, a norm-aware embedding method is proposed for efficient Person Search. Pedestrian Detection and Person re-ID are tightly entangled in Person Search, although their objectives are contradictory. It is therefore crucial to reconcile the relationship between detection and re-ID in a joint person search model. The Hierarchical Online Instance Matching method proposed above is a successful attempt, but it has a flaw, i.e., the extra computation incurred by a large-scale matrix multiplication. To address this, we present a better approach called Norm-Aware Embedding, which disentangles the person embedding into norm and angle for detection and re-ID respectively, allowing for both effective and efficient multi-task training. Norm and angle are two basic, mutually orthogonal geometric attributes of a vector, which fits the relationship between detection and re-ID well. We further extend the proposal-level person embedding to the pixel level, where its discriminative ability is less affected by misalignment. Norm-Aware Embedding achieves remarkable performance on both person search and multiple person tracking benchmarks, with the merit of being easy to train and resource-friendly, running at a near real-time frame rate.

In summary, the first two works avoid the contradiction between detection and re-ID by addressing the two tasks in independent models, while using additional information, e.g., background, temporal information, and pose keypoints, to improve re-ID performance. The last two works directly face the task contradiction problem, which is alleviated by optimizing the loss functions and feature representations. As a result, the efficiency of person search models is drastically improved by using joint models.
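
The norm/angle split described for Norm-Aware Embedding can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function names, the logistic mapping from norm to detection score, and the `scale` and `shift` parameters are all assumptions made for illustration, since the abstract only states that the norm serves detection and the angle serves re-ID.

```python
import math


def split_norm_and_angle(embedding):
    """Decompose a vector into its L2 norm and its unit direction (the 'angle')."""
    norm = math.sqrt(sum(v * v for v in embedding))
    if norm == 0.0:
        raise ValueError("a zero vector has no direction")
    direction = [v / norm for v in embedding]
    return norm, direction


def detection_score(norm, scale=1.0, shift=0.0):
    """Map the norm to (0, 1) with a logistic function.

    Assumption: scale and shift are hypothetical parameters; the abstract
    does not say how the norm is converted into a detection confidence.
    """
    return 1.0 / (1.0 + math.exp(-(scale * norm - shift)))


def reid_similarity(direction_a, direction_b):
    """Cosine similarity between two unit directions; the norm plays no role."""
    return sum(a * b for a, b in zip(direction_a, direction_b))


# Toy 2-D embeddings: two boxes of the same person and one background box.
query = [3.0, 4.0]
candidate = [6.0, 8.0]      # same direction as the query, larger norm
background = [0.03, -0.01]  # tiny norm, so it should score low as a person

q_norm, q_dir = split_norm_and_angle(query)
c_norm, c_dir = split_norm_and_angle(candidate)
b_norm, b_dir = split_norm_and_angle(background)

print("person score (candidate):", detection_score(c_norm, shift=5.0))
print("person score (background):", detection_score(b_norm, shift=5.0))
print("re-ID similarity (query vs candidate):", reid_similarity(q_dir, c_dir))
```

In this toy setup the candidate and the query point in the same direction, so their re-ID similarity is close to 1 regardless of their different norms, while the background vector is rejected by its small norm alone. That independence is the property the abstract attributes to the norm/angle decomposition.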
Keywords/Search Tags:Person Search, Pedestrian Detection, Person Re-Identification, Multi-task Learning, Convolutional Neural Network, Graph Convolutional Network