
Research On Video-Based Person Re-Identification Method Based On Multi-Scale Enhancement And Temporal Sequential Fusion

Posted on: 2024-01-24 | Degree: Master | Type: Thesis
Country: China | Candidate: J J Zhao | Full Text: PDF
GTID: 2568307118985319 | Subject: Electronic information
Abstract/Summary:
Video-based person re-identification is a pedestrian retrieval technology based on video sequences, which aims to retrieve the video sequences of a designated pedestrian across cameras and is of great significance for maintaining public safety. With the development of deep learning, video-based person re-identification has attracted wide attention. Compared with person re-identification on single-frame images, video data contains rich temporal and spatial information, but it also suffers from problems such as changes in human posture, occlusion of the target pedestrian, and background interference. How to make full use of the rich spatio-temporal information in video and obtain discriminative video features is the focus of current research. Therefore, from the two perspectives of feature enhancement and feature fusion, this thesis proposes video-based person re-identification methods based on multi-scale enhancement and temporal fusion, addressing the loss of detail in high-level features and the inability to effectively extract complementary local and global features, so as to improve the feature representation ability of video-based person re-identification. The main work is as follows.

(1) A video-based person re-identification method based on multi-scale sub-pixel convolution feature enhancement is proposed, which improves re-identification accuracy through more effective feature learning and temporal modeling. Firstly, a multi-scale sub-pixel convolution feature pyramid is constructed to extract spatial features from each frame: while extracting high-level semantic features of the pedestrian image, the network also fuses the last-layer features with multi-scale intermediate-layer features, and a sub-pixel convolution operation is introduced to exploit the rich channel information in the high-level features and effectively retain image details. Secondly, a feature enhancement module applies spatial and channel attention to strengthen discriminative features and suppress irrelevant ones. Then, dilated convolution and a self-attention mechanism model short-term and long-term temporal relationships respectively, and the frame-level features are aggregated into sequence-level features to form the final pedestrian video feature. Finally, a feature enhancement loss function is constructed to optimize network training. Experimental results on the MARS and DukeMTMC-VideoReID datasets show that the proposed method effectively improves the performance of video-based person re-identification.
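The sub-pixel convolution fusion step can be illustrated with a short sketch. The following is a minimal PyTorch example of upsampling a high-level feature map with sub-pixel convolution (PixelShuffle) and fusing it with a mid-level feature map; it is not the thesis's actual architecture, and the module name, channel sizes, and 1x1/3x3 convolution choices are assumptions made for illustration.

```python
# Minimal sketch of sub-pixel convolution feature fusion (illustrative only;
# module and channel choices assume ResNet-style backbone features).
import torch
import torch.nn as nn


class SubPixelFusion(nn.Module):
    """Upsample high-level features via PixelShuffle and fuse with a mid-level map."""

    def __init__(self, high_channels: int, mid_channels: int, scale: int = 2):
        super().__init__()
        # 1x1 conv expands channels so PixelShuffle can redistribute them into
        # spatial resolution: (C*r^2, H, W) -> (C, H*r, W*r).
        self.expand = nn.Conv2d(high_channels, mid_channels * scale ** 2, kernel_size=1)
        self.shuffle = nn.PixelShuffle(scale)
        self.smooth = nn.Conv2d(mid_channels, mid_channels, kernel_size=3, padding=1)

    def forward(self, high: torch.Tensor, mid: torch.Tensor) -> torch.Tensor:
        up = self.shuffle(self.expand(high))   # channel information -> spatial detail
        return self.smooth(up + mid)           # fuse with the mid-level features


if __name__ == "__main__":
    # e.g. stage-4 output (2048 ch, 8x4) fused with stage-3 output (1024 ch, 16x8)
    high = torch.randn(2, 2048, 8, 4)
    mid = torch.randn(2, 1024, 16, 8)
    print(SubPixelFusion(2048, 1024)(high, mid).shape)  # torch.Size([2, 1024, 16, 8])
```

The design intent is that PixelShuffle turns redundant channel information in the high-level map into finer spatial resolution, so detail is preserved rather than interpolated.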
(2) A video-based person re-identification method based on temporal complementary feature fusion is proposed, which addresses the insufficient extraction of complementary pedestrian features and the lack of multi-scale temporal modeling in the video-based person re-identification task. Firstly, the frame-level features are divided into two parts using their temporal relationship: the first part focuses on salient information, while the second part enlarges the attention region of subsequent frames by taking the difference between the original features and the saliency features of each frame, thereby capturing a wider range of pedestrian cues. The two parts complement each other, so the final fused video feature carries richer pedestrian information. Then, temporal convolution kernels of different sizes are used to model multi-scale temporal relationships, dynamically capturing short-term and long-term dependencies to improve the robustness of the video-based person re-identification model. Finally, a feature fusion loss function is constructed to supervise network training and improve the generalization ability of the network. Experiments on the MARS and PRID-2011 datasets demonstrate the effectiveness of the proposed method.

This thesis contains 31 figures, 18 tables, and 95 references.
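The complementary splitting and multi-scale temporal convolution of the second method can likewise be sketched. The minimal PyTorch example below assumes frame-level features of shape (batch, time, channels); the saliency gate, kernel sizes, and fusion choices are illustrative assumptions, not the exact design described in the thesis.

```python
# Minimal sketch of complementary feature splitting plus multi-scale temporal
# convolution, assuming frame-level features of shape (batch, time, channels).
# Layer choices are illustrative, not the thesis's exact design.
import torch
import torch.nn as nn


class ComplementaryTemporalFusion(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Saliency branch: a per-frame gate that highlights the most discriminative responses.
        self.saliency_gate = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )
        # Fuse the salient part and its complement back into one descriptor.
        self.reduce = nn.Linear(2 * channels, channels)
        # Multi-scale temporal convolutions: plain kernel for short-term cues,
        # dilated kernel for longer-range dependencies.
        self.short_term = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.long_term = nn.Conv1d(channels, channels, kernel_size=3, padding=2, dilation=2)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (B, T, C) frame-level features
        salient = frames * self.saliency_gate(frames)       # part 1: salient responses
        complementary = frames - salient                      # part 2: difference widens the attended region
        fused = self.reduce(torch.cat([salient, complementary], dim=-1))
        x = fused.transpose(1, 2)                             # (B, C, T) for Conv1d
        temporal = self.short_term(x) + self.long_term(x)     # short- and long-term relationships
        return temporal.mean(dim=2)                           # aggregate into a clip-level feature


if __name__ == "__main__":
    clip = torch.randn(2, 8, 2048)                            # 2 clips, 8 frames, 2048-dim features
    print(ComplementaryTemporalFusion(2048)(clip).shape)      # torch.Size([2, 2048])
```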
Keywords/Search Tags:Video-based person re-identification, Deep learning, Multi-scale feature enhancement, Fusion of complementary features