Video-based Person Representation Learning And Feature Aggregation

Posted on: 2022-10-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Y He
GTID: 1488306608477394
Subject: Physics
Abstract/Summary:
This thesis focuses on the video-based person re-identification (re-id) task. Unlike image-based person re-id, the core problem of video-based person re-id is how to aggregate image features and characterize person video sequences. In addition, this thesis studies the unsupervised person re-id task, whose core problem is how to learn ID-level person representations in the absence of manual annotations. Building on deep learning techniques, this thesis proposes three methods for video-based person re-id and one method for unsupervised person re-id. The main contributions are as follows.

First, this thesis proposes a video characterization method for person re-id based on a bidirectional recurrent neural network (BRNN). Through joint learning of a convolutional neural network (CNN) and the BRNN, the proposed method learns representations of person video sequences in an end-to-end manner. The CNN extracts spatial features and passes them to the BRNN. The BRNN parses the temporal dynamics of the person video sequence in both forward and reverse temporal order, captures the spatio-temporal cues in the sequence, and constructs a complete representation of the person. Experimental results on several datasets demonstrate the effectiveness of the method.

Second, this thesis proposes a reinforcement learning-based feature aggregation algorithm to aggregate the image features of a video sequence. Unlike some existing feature fusion methods based on temporal models, this method trains an agent to judge the quality of each video frame and decide whether to fuse its features. If the features of a frame would introduce more noise than benefit, the agent discards the frame during feature aggregation. In this way, the agent retains the most valuable frames in the video sequence and obtains a higher-quality video representation of the person. On some datasets, this method effectively improves person re-id performance.

Third, this thesis proposes a person re-id method based on a spatial-temporal attention mechanism, which aims to extract the most representative local regions in a video sequence from a global perspective. Specifically, the proposed mechanism unfolds the person video along both the temporal and spatial domains and evaluates all regions of every frame. Compared with some other attention-based methods, this method accounts for consistency across both the temporal and spatial domains: the important regions of the target person in the image tend to change continuously over time. On several video-based person re-id datasets, this method achieves better results than other state-of-the-art methods.

Finally, this thesis proposes an unsupervised representation learning method for person re-id to address the high cost of data annotation. The method iteratively updates the network's perception of the data distribution through clustering and contrastive learning, and learns ID-level person representations without manual annotations. Moreover, an analysis of existing work reveals a problem that must be handled carefully in unsupervised person representation learning: the hard positive problem. During unsupervised learning, positive samples that should fall into the same cluster are pushed apart because ID labels are absent, and so lose the opportunity to be grouped together, which poses a great challenge to optimization. Experiments on several datasets show that the proposed algorithm achieves good person re-id performance, comparable to that of supervised methods.

In summary, this thesis mainly focuses on the video-based person re-id task and conducts an exploratory study of unsupervised person re-id.
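The keep-or-discard aggregation step described for the reinforcement learning-based method can be illustrated with a minimal sketch. This is not the thesis implementation: the names `aggregate_selected_frames` and `threshold_policy` are hypothetical, the per-frame quality scores are made-up inputs, and a fixed threshold stands in for the trained agent's decision. Mean pooling over the kept frames is also an assumption, since the abstract does not specify the fusion operator.

```python
from typing import List, Sequence


def threshold_policy(quality_scores: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Hypothetical stand-in for the trained agent: keep a frame if its score is high enough."""
    return [score >= threshold for score in quality_scores]


def aggregate_selected_frames(
    frame_features: Sequence[Sequence[float]],
    keep_decisions: Sequence[bool],
) -> List[float]:
    """Average the features of the frames marked as kept (assumed fusion: mean pooling).

    If every frame is discarded, fall back to averaging all frames so that
    the sequence still receives a representation.
    """
    if not frame_features:
        raise ValueError("at least one frame is required")
    if len(frame_features) != len(keep_decisions):
        raise ValueError("one keep/discard decision per frame is required")

    kept = [feat for feat, keep in zip(frame_features, keep_decisions) if keep]
    if not kept:
        kept = list(frame_features)

    dim = len(kept[0])
    return [sum(feat[i] for feat in kept) / len(kept) for i in range(dim)]


# Toy example: the third frame is noisy and receives a low quality score.
features = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
scores = [0.9, 0.8, 0.1]
video_feature = aggregate_selected_frames(features, threshold_policy(scores))
print(video_feature)  # [0.5, 0.5]
```

In this toy run the noisy third frame is dropped, so the sequence representation is the mean of the first two frames only. In the method described above, the decision would come from a learned agent rather than a hand-set threshold.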
Keywords/Search Tags:Person Re-Identification, Representation Learning, Feature Aggregation, Deep Learning