Research On Video Person Re-identification Based On Two-stream Multi-level Attentive Promotion

Posted on: 2021-04-02
Degree: Master
Type: Thesis
Country: China
Candidate: W G Lin
GTID: 2428330611966532
Subject: Computer Science and Technology
Abstract/Summary:
With the development of visual big data and artificial intelligence, the analysis of video data has become central to the future of computer vision. In particular, recognizing, retrieving and analyzing pedestrians in surveillance video is an increasingly important task. Matching pedestrians across multiple cameras is called person re-identification: once pedestrians have been located, the task is to tell them apart. Existing research covers both image-based and video-based person re-identification; the video-based setting is the harder of the two and will remain a difficult problem in video analysis.

With the development of deep learning, person re-identification methods based on convolutional neural networks and recurrent neural networks have achieved many breakthroughs. Many image-based person re-identification algorithms have been extended with temporal designs and applied to video person re-identification with good results. Other methods build networks around the continuity of the video sequence, or describe the video from different dimensions so that multi-modal information supervises learning, and these also achieve significant results.

However, video person re-identification still has shortcomings. On the one hand, the temporal correlation between features of consecutive frames is not exploited, so the importance of individual video features is not effectively distinguished, which weakens the video representation. On the other hand, methods that describe the video through different dimensions fail to capture, from a global perspective across dimensions and modalities, the key features that are discriminative overall. To address these two problems, this thesis proposes a multi-level attentive optimization scheme for video features, which optimizes features at both the video frame level and the video segment level to improve feature discrimination.

The main contributions of this thesis are:

1) Based on the two-stream "RGB + optical flow" model, this thesis designs frame-level feature optimization structures for context perception and multi-modal perception, and constructs a two-stream recurrent interactive attentive network, TS-RCAN. The network extracts basic features with convolutional neural networks and connects them in series through a recurrent neural network. Two gate structures are designed, one for context awareness and one for multi-modal perception. Both gates use channel attention to distinguish the importance of features, which optimizes the features and improves the expressive power of each frame.

2) This thesis extends the Non-local self-attention mechanism to the two-stream person re-identification network. It learns the relationships among, and the relative importance of, the spatiotemporal features of the entire video segment in each modality, forming the two-stream segment-level attention perception promotion network, TS-SAPN. Unlike other self-attention mechanisms, TS-SAPN uses the frame-level optimized features from the previous stage to generate association weight masks, which introduces multi-modal perception information and improves the discriminative power of the attended features.

This thesis combines the frame-level and segment-level optimization networks to obtain a two-stream multi-level feature-aware optimization network, TS-MLPN. The two-stream video features are optimized at two levels and through three perception processes. The final video representation is obtained by fusing the frame-level and segment-level optimized features along the channel dimension, which effectively improves the overall expressive power of the two-stream features.

The proposed TS-RCAN and TS-MLPN are tested on two public datasets and compared with recent state-of-the-art video person re-identification algorithms, showing that both networks achieve better performance. The re-identification results validate the perceptual optimization strategy proposed in this thesis.
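
The abstract does not specify how the channel-attention gates are built, so the following is only a minimal sketch of the general idea, assuming a squeeze-and-excitation style gate. The function name `channel_attention_gate`, the weight matrices `w1` and `w2`, the reduction ratio `r`, and the use of a pooled context vector (for example, the previous recurrent state or the pooled feature of the other stream) are all hypothetical choices for illustration, not the thesis's actual design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention_gate(feat, context, w1, w2):
    """Re-weight the channels of one frame feature using a context vector.

    feat:    (C, H, W) frame-level feature map from one stream.
    context: (C,) vector; assumed here to be either the previous recurrent
             state (context awareness) or the other stream's pooled
             feature (multi-modal perception).
    w1, w2:  hypothetical bottleneck weights, shapes (C//r, 2C) and (C, C//r).
    """
    squeezed = feat.mean(axis=(1, 2))            # global average pool -> (C,)
    joint = np.concatenate([squeezed, context])  # (2C,)
    hidden = np.maximum(0.0, w1 @ joint)         # ReLU bottleneck -> (C//r,)
    weights = sigmoid(w2 @ hidden)               # per-channel importance in (0, 1)
    gated = feat * weights[:, None, None]        # scale every channel
    return gated, weights

# Toy example with random values (illustrative only, not real data).
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
feat = rng.standard_normal((C, H, W))
context = rng.standard_normal(C)
w1 = rng.standard_normal((C // r, 2 * C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1

gated, weights = channel_attention_gate(feat, context, w1, w2)
print(gated.shape, weights.shape)
```

The gate leaves the spatial layout of the feature map untouched and only rescales channels, which is what lets the same structure serve both the context gate and the multi-modal gate by swapping the context vector.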
Keywords/Search Tags:video person re-identification, deep learning, convolutional neural network, recurrent neural network, context information, multimodal learning, attention aware