A Study On The Feature Extraction For Deep Learning-based Person Re-identification

Posted on: 2022-06-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: K Wang
GTID: 1488306569970259
Subject: Information and Communication Engineering
Abstract/Summary:
Person re-identification (ReID) aims to retrieve pedestrian images belonging to the same identity across non-overlapping camera views. Owing to its broad range of potential applications, it has become a hot research topic. However, it remains a challenging task because of variations in camera resolution, complex backgrounds, changes in pedestrian pose and camera viewpoint, and pedestrian detection errors. This dissertation develops several models that target the feature extraction task of ReID. The details are as follows:

(1) We propose a convolutional deformable part model to extract semantically aligned part-level representations for ReID. Unlike existing methods, it adopts a “divide-and-conquer” strategy that performs part alignment in the vertical and horizontal directions. Specifically, it decouples the complex part alignment procedure into two sequential steps: first, a vertical alignment step detects each body part in the vertical direction; second, an attention-based horizontal refinement step suppresses the background information around each body part by refining the part boundaries. The vertical alignment module and the horizontal refinement module work cooperatively to perform part alignment, encouraging the model to extract high-quality part features. Extensive evaluation on three popular datasets demonstrates the effectiveness and superiority of the proposed model. For example, it achieves 95.9% Rank-1 accuracy and 87.2% mAP on the Market-1501 dataset.

(2) We propose a multi-task part-aware model that uses a multi-task framework to extract semantically aligned part-level representations. During training, the input for the main task is the holistic pedestrian image, while the input for the auxiliary task is the part-relevant prior. The model then transfers the part-relevant concept from the auxiliary task to the main task by imposing two constraints between the two tasks: parameter space alignment and feature space alignment. The former constrains the corresponding parameters of the two tasks to be consistent, and the latter optimizes the features extracted by the two tasks to be similar. During testing, the main task can independently and directly extract part features from the holistic image, which makes the model easier to deploy in real-world applications. Systematic experiments on four large-scale ReID datasets demonstrate that this model outperforms state-of-the-art approaches by significant margins. For example, it achieves 96.4% Rank-1 accuracy and 90.1% mAP on the Market-1501 dataset.

(3) We propose a batch coherence driven network to extract semantically aligned part-level representations for robust ReID. Compared with existing methods, its most notable innovation is that it bypasses body part detection entirely during both the training and testing phases. First, the model introduces a batch coherence driven channel attention module that highlights the relevant channels for each part by investigating the correspondence between channels and body parts using a batch of training images. Second, based on the semantic consistency between batches, the model uses a pair of spatial regularization terms. The part-level regularization term constrains the model's high responses for each part to lie within a predefined area. The holistic regularization term constrains the aggregated responses for all parts to cover the entire human body. Together, the channel attention module and the spatial regularization terms encourage the model to learn semantically aligned part features. Extensive experimental results demonstrate the effectiveness and superiority of this method. For example, it achieves 96.2% Rank-1 accuracy and 89.5% mAP on the Market-1501 dataset.

(4) We propose a context sensing representation learning model to extract high-quality video representations for video-based ReID. This method improves both the frame feature extraction and temporal aggregation steps. First, a context sensing channel attention module is introduced to emphasize responses from informative channels for each frame, using information from both the individual frame and the overall video sequence. This module thus exploits both the individuality of each frame and the global context of the sequence. Second, a context sensing feature aggregation module is designed to aggregate the frame features into a video feature by predicting frame weights for temporal aggregation. Here, the weight for each frame is determined in a contrastive manner, i.e., not only by the quality of the individual frame but also by the average quality of the other frames in the sequence. It therefore promotes the contribution of relatively good frames. Extensive experimental results on four datasets show that this method consistently achieves state-of-the-art performance. For example, it achieves 90.4% Rank-1 accuracy and 84.5% mAP on the MARS dataset.
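
The contrastive frame weighting described in (4) can be illustrated with a minimal sketch. The abstract does not give the actual formulation, so everything here is an assumption made for illustration: the per-frame quality scores are taken as given, the contrast is computed as each frame's score minus the mean score of the other frames, and a softmax turns the contrasts into weights. The function names `contrastive_frame_weights` and `aggregate_video_feature` are hypothetical and are not taken from the dissertation.

```python
import numpy as np


def contrastive_frame_weights(quality):
    """Turn per-frame quality scores into temporal aggregation weights.

    quality: sequence of T scores (hypothetical; in a real model they
    would be predicted by a learned module).
    Assumption: contrast = own score minus the mean score of the other
    frames, normalized with a softmax. This is one plausible reading of
    "contrastive", not the dissertation's confirmed method.
    """
    quality = np.asarray(quality, dtype=float)
    t = quality.shape[0]
    if t < 2:
        # With zero or one frame there are no "other frames" to compare to.
        return np.ones(t)
    mean_others = (quality.sum() - quality) / (t - 1)
    contrast = quality - mean_others
    # Subtract the maximum before exponentiating for numerical stability.
    exp = np.exp(contrast - contrast.max())
    return exp / exp.sum()


def aggregate_video_feature(frame_features, quality):
    """Weighted sum of frame features of shape (T, D) into one (D,) vector."""
    frame_features = np.asarray(frame_features, dtype=float)
    weights = contrastive_frame_weights(quality)
    return weights @ frame_features
```

Under this assumed formulation, a frame whose score exceeds the average of its peers receives a larger share of the weight, which matches the stated goal of promoting relatively good frames. When all frames have equal quality, the weights reduce to a plain average.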
Keywords/Search Tags: Person re-identification, Deep learning, Feature extraction, Part model