
Research On Video-based Person Re-identification

Posted on: 2021-07-16
Degree: Master
Type: Thesis
Country: China
Candidate: Z Yang
Full Text: PDF
GTID: 2518306503491094
Subject: Electronics and Communications Engineering
Abstract/Summary:
In recent years, with the construction of the National Skynet Project and growing public concern for safety, person re-identification has been widely applied in video surveillance, smart security, criminal investigation, and other fields. Driven by these urgent practical needs, person re-identification technology has developed rapidly. Video-based person re-identification exploits richer semantic and motion information than its image-based counterpart and has gradually attracted more research attention. Mapping a pedestrian sequence to a single feature representation is the key problem in video-based person re-identification.

Average pooling and recurrent neural networks are the classic approaches to aggregating frame-level features, but they often struggle with the spatial misalignment caused by occlusion, posture changes, and camera viewpoints. We therefore introduce the Non-local mechanism to adaptively learn the spatiotemporal attention within a sequence. At the same time, we use a feature erasing mechanism to build a local feature learning branch, so that the network attends to local and global features simultaneously, which improves the discriminability of the overall representation. The appearance model based on Non-local attention and feature erasing achieves mAP = 81.9% and rank-1 = 87.0% on the large-scale public MARS dataset, which is comparable to state-of-the-art methods.

In addition, in practical applications, existing methods based on appearance features often perform poorly when pedestrians change clothes. We therefore introduce a human biometric characteristic, gait, as auxiliary information. The proposed network combines appearance and gait features and outperforms either single feature in a variety of scenarios. In particular, on the CL (change clothing) subset of the CASIA-B dataset, rank-1 accuracy is improved significantly to 75.95%, surpassing either single feature by more than 20%. In the fusion network, we make full use of the pedestrian mask: it serves not only as the input to the gait feature extraction network but also as spatial attention in the appearance model to construct a foreground appearance feature branch. Extensive ablation experiments on the Mask-MARS and CASIA-B datasets verify the performance of the proposed fusion network.
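For illustration, below is a minimal sketch of the kind of non-local block that can aggregate frame-level feature maps of a clip into a single sequence-level descriptor. It assumes PyTorch as the framework; the class name `NonLocalBlock`, the channel counts, the reduction ratio, and the final average pooling are illustrative assumptions for this example, not the exact configuration used in the thesis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLocalBlock(nn.Module):
    """Embedded-Gaussian non-local block over all T*H*W positions of a clip."""
    def __init__(self, channels, reduction=2):
        super().__init__()
        inner = channels // reduction
        self.theta = nn.Conv3d(channels, inner, kernel_size=1)   # query embedding
        self.phi = nn.Conv3d(channels, inner, kernel_size=1)     # key embedding
        self.g = nn.Conv3d(channels, inner, kernel_size=1)       # value embedding
        self.out = nn.Conv3d(inner, channels, kernel_size=1)     # restore channel dim

    def forward(self, x):                                # x: (B, C, T, H, W)
        b, c, t, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)     # (B, THW, C')
        k = self.phi(x).flatten(2)                       # (B, C', THW)
        v = self.g(x).flatten(2).transpose(1, 2)         # (B, THW, C')
        attn = F.softmax(q @ k, dim=-1)                  # pairwise spatiotemporal attention
        y = (attn @ v).transpose(1, 2).reshape(b, -1, t, h, w)
        return x + self.out(y)                           # residual connection

# Usage: aggregate an 8-frame clip of feature maps into one sequence-level vector.
feats = torch.randn(4, 256, 8, 16, 8)                      # (batch, channels, frames, H, W)
clip_feat = NonLocalBlock(256)(feats).mean(dim=(2, 3, 4))  # (4, 256) descriptor
```

The softmax over all pairwise position similarities plays the role of the adaptively learned spatiotemporal attention described above; the feature-erasing local branch and the mask-guided appearance/gait fusion discussed in the abstract are omitted from this sketch for brevity.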
Keywords/Search Tags: Video-based Person Re-Identification, Non-local Attention, Feature Fusion