Research On Spatio-Temporal Appearance Representation For Video-Based Person Re-identification

Posted on: 2017-04-08
Degree: Master
Type: Thesis
Country: China
Candidate: K Liu
GTID: 2308330488451959
Subject: Control Science and Engineering
Abstract/Summary:
Pedestrian re-identification is a difficult problem because a person's appearance varies widely with pose, viewpoint, illumination, and occlusion. To address the video-based pedestrian re-identification problem, we build an appearance representation. We use a concise low-level descriptor that combines color, texture, and gradients to describe the person's appearance. Fisher vectors are then built on these descriptors to obtain the final pedestrian appearance representation, which combines spatial and temporal information.

Furthermore, we consider the temporal alignment problem in addition to the spatial one, and propose a new approach that builds a spatio-temporal appearance representation for pedestrian re-identification. In the temporal dimension, we split the sequence into several segments corresponding to different phases of a walking cycle; in the spatial domain, we separate the body into parts. The spatial and temporal segmentation yields multiple video blobs. Each blob is a small chunk of data corresponding to a certain action primitive of a certain body part, and we call it a body-action unit. We extract a Fisher vector from each unit and concatenate them into a fixed-length feature vector that represents the appearance of a walking person. With this representation, the two pedestrians being compared are aligned both spatially and temporally.

The formation of each body-action unit can be flexible and can differ from person to person. It is even possible to use different body part models for different action primitives, or vice versa, as long as the numbers of parts and primitives are fixed. The result is a very flexible joint body-action model whose final representation is nevertheless a consistent feature vector across different people, which makes comparison easy.

Finally, we combine the proposed unsupervised appearance modeling method with supervised distance metric learning methods to address the video-based pedestrian re-identification task. Metric learning on labeled training samples reduces the distance between representations of the same person observed in different cameras. The final re-identification results achieve state-of-the-art performance.

The benefits of the proposed representation are:
1) It describes a person's appearance over a walking cycle, and hence covers almost the entire variety of poses and shapes;
2) It aligns the appearance of different people both spatially and temporally;
3) The formation of each body-action unit can be very flexible and can differ from person to person, while Fisher vectors can work with any volume topology, so the final representation is a consistent feature vector.
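The pipeline described above (temporal split, spatial split, one Fisher vector per body-action unit, concatenation) can be illustrated with a minimal sketch. It is not the thesis implementation and rests on several assumptions: frames are split uniformly in time rather than by detected walking-cycle phases, body parts are approximated by horizontal bands of rows, each row of each frame already carries one local descriptor, the Gaussian mixture model is diagonal and fitted beforehand, and only the gradient with respect to the mixture means is kept. All function and variable names are hypothetical.

```python
import math
import random


def split_evenly(n, k):
    """Return k contiguous (start, stop) index ranges that cover range(n)."""
    bounds = [i * n // k for i in range(k + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(k)]


def fisher_vector_means(descriptors, weights, means, sigmas):
    """Fisher vector restricted to the mean gradients of a diagonal GMM.

    Simplification: gradients w.r.t. weights and variances are omitted.
    """
    K, D = len(means), len(means[0])
    fv = [0.0] * (K * D)
    for x in descriptors:
        # Log of w_k * N(x; mu_k, sigma_k^2), up to a constant shared by all k.
        logp = [
            math.log(weights[k])
            - sum(
                math.log(sigmas[k][d])
                + 0.5 * ((x[d] - means[k][d]) / sigmas[k][d]) ** 2
                for d in range(D)
            )
            for k in range(K)
        ]
        top = max(logp)
        post = [math.exp(v - top) for v in logp]  # unnormalized posteriors
        z = sum(post)
        for k in range(K):
            gamma = post[k] / z
            for d in range(D):
                fv[k * D + d] += gamma * (x[d] - means[k][d]) / sigmas[k][d]
    n = max(len(descriptors), 1)  # an empty unit yields an all-zero vector
    return [
        fv[k * D + d] / (n * math.sqrt(weights[k]))
        for k in range(K)
        for d in range(D)
    ]


def body_action_representation(video, n_primitives, n_parts, gmm):
    """Concatenate one Fisher vector per (action primitive, body part) unit.

    video[t][r] is the local descriptor of row r in frame t (assumed layout).
    """
    weights, means, sigmas = gmm
    n_rows = len(video[0])
    feature = []
    for t0, t1 in split_evenly(len(video), n_primitives):  # temporal segments
        for r0, r1 in split_evenly(n_rows, n_parts):        # spatial bands
            unit = [video[t][r] for t in range(t0, t1) for r in range(r0, r1)]
            feature.extend(fisher_vector_means(unit, weights, means, sigmas))
    return feature


# Usage with synthetic data: two sequences of different lengths.
rng = random.Random(0)


def make_video(n_frames, n_rows=6, dim=3):
    return [[[rng.random() for _ in range(dim)] for _ in range(n_rows)]
            for _ in range(n_frames)]


gmm = ([0.5, 0.5], [[0.25] * 3, [0.75] * 3], [[0.2] * 3, [0.2] * 3])
feat_a = body_action_representation(make_video(8), 2, 3, gmm)
feat_b = body_action_representation(make_video(13), 2, 3, gmm)
print(len(feat_a), len(feat_b))  # both have length 2 * 3 * 2 * 3 = 36
```

Because the number of primitives, the number of parts, and the mixture size are fixed, sequences of different lengths map to vectors of the same length, which is what allows two pedestrians to be compared unit by unit.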
Keywords/Search Tags: Pedestrian Re-identification, Appearance Representation, Fisher Vector, Video-based, Spatio-Temporal Alignment