End-to-end Person Search

Posted on: 2024-07-17
Degree: Master
Type: Thesis
Country: China
Candidate: L Chen
Full Text: PDF
GTID: 2568307067494454
Subject: Electronic information
Abstract/Summary:
The most important function of an intelligent visual surveillance system is to "locate" and "identify" pedestrians, which correspond to the tasks of pedestrian detection and person re-identification (re-ID), respectively. However, pedestrian detection focuses only on the locations of pedestrians, while re-ID attends only to their identities. These two independent tasks are therefore not enough to meet the needs of surveillance systems in practical applications. To close the gap between research and practical application, the task of person search was introduced and has attracted considerable attention in the computer vision community. Person search aims to locate and identify a query person in a gallery of real, uncropped images. Existing person search methods can be broadly divided into two-step and one-step methods. One-step methods, also known as end-to-end methods, train detection and re-ID jointly in a unified framework. This thesis focuses on end-to-end person search. Our work is summarized as follows:

(1) Attributes such as the color and style of clothes have been widely used in the re-ID task. However, because the two public person search datasets have no attribute annotations, no existing method uses person attributes for person search. Motivated by this observation, we propose an attribute-based representation person search (AttrPS) model and annotate attribute labels on the person search datasets. AttrPS consists of a detection network, an attribute recognition network (AttrNet), and a re-ID head. First, the detection network predicts accurate bounding boxes, which are used to obtain the basic features of pedestrians. These features, together with pedestrian segmentation masks, are sent to AttrNet to extract fine-grained features for multiple attributes. The attribute features are then concatenated to the re-ID embeddings, and the attribute labels are used to supervise AttrNet. The proposed method is experimentally verified on the CUHK-SYSU dataset, where it achieves 91.3% mAP.

(2) Existing person search models, both two-step and one-step, are mainly built on CNNs, which demonstrates the strength of CNNs on this task. Meanwhile, the Transformer has achieved great success in pedestrian detection and re-ID, showing that it can handle related vision tasks, yet few Transformer-based models exist for person search. To combine the respective merits of CNNs and Transformers, we propose two methods that place both in one framework: a CNN-based detector with a Transformer-based re-identifier (CNN-TR), and a Transformer-based detector with a CNN-based re-identifier (TR-CNN). With these designs, we explore the capability of the Transformer in person search. Both methods are validated on the CUHK-SYSU dataset, and CNN-TR achieves 92.9% mAP.

(3) Person detection aims to distinguish persons from the background and other objects, so its objective is to find what persons have in common. Person re-ID aims to identify and distinguish different persons, so its objective is to find what is unique to each person. These contradictory objectives are a major challenge for one-step person search methods. To relieve this conflict, we propose a Sequential Transformer (SeqTR) model that solves detection and re-ID sequentially. The sequential framework decouples the feature maps used by the two sub-tasks, reducing their mutual influence during training. In addition, we design a novel re-ID Transformer that not only uses the context information within the same scene but also adaptively assigns attention weights to fine-grained features of pedestrian appearance. To handle scale variations of pedestrians, the re-ID Transformer uses multi-scale feature maps to generate scale-invariant re-ID embeddings. SeqTR outperforms all person search methods of the same period, reaching 59.3% mAP on the PRW dataset.
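
As a rough illustration of two ideas mentioned in the abstract, the sketch below concatenates attribute features to a base re-ID feature and then scores a ranked gallery with average precision (mAP is the mean of this value over all queries). It is a minimal pure-Python sketch, not the thesis implementation: the function names, the L2 normalization, the cosine-similarity ranking, and the toy vectors are all assumptions made for illustration, and the simplified AP here assumes every true match has been detected and is present in the gallery.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def build_embedding(base_feat, attr_feats):
    """Concatenate a base re-ID feature with per-attribute features.

    Assumption: the final embedding is L2-normalized so that a dot
    product between two embeddings equals their cosine similarity.
    """
    merged = list(base_feat)
    for feat in attr_feats:
        merged.extend(feat)
    return l2_normalize(merged)


def rank_gallery(query_emb, gallery_embs):
    """Return gallery indices sorted by descending cosine similarity."""
    scores = [sum(q * g for q, g in zip(query_emb, emb)) for emb in gallery_embs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def average_precision(ranked_ids, positive_ids):
    """Average precision for one query over a ranked list of gallery indices."""
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(ranked_ids, start=1):
        if idx in positive_ids:
            hits += 1
            precision_sum += hits / rank  # precision at this hit
    return precision_sum / len(positive_ids) if positive_ids else 0.0


# Toy example: a 2-D base feature plus two 1-D attribute features per person.
query = build_embedding([1.0, 0.0], [[1.0], [0.0]])
gallery = [
    build_embedding([0.0, 1.0], [[0.0], [1.0]]),  # different person
    build_embedding([1.0, 0.0], [[1.0], [0.0]]),  # same appearance and attributes
    build_embedding([1.0, 0.0], [[0.0], [1.0]]),  # same base feature, other attributes
]
ranking = rank_gallery(query, gallery)
ap = average_precision(ranking, positive_ids={1, 2})
```

In the toy gallery, the entry that shares the base feature but differs in attributes ranks below the exact match, which is the effect attribute features are meant to provide: they add a fine-grained signal on top of the base appearance feature.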
Keywords/Search Tags:Person Search, Pedestrian detection, Person re-identification, Attribute recognition, Transformer