Person re-identification (re-ID) is a key technology for retrieving pedestrians of interest across multiple non-overlapping cameras, and it is in great demand in the construction of smart cities, smart security, smart malls, and other systems. Compared with traditional person re-ID methods based on hand-crafted features, deep-learning-based methods achieve higher recognition accuracy. Image-based person re-ID demands high image quality; moreover, because a single pedestrian image carries limited information, it handles occlusion poorly. Video-based person re-ID can exploit information from multiple frames and thus compensates for the insufficient information of single-frame pedestrian images. However, existing video-based person re-ID datasets are still small, while deep learning methods depend heavily on data, so deep models trained on small datasets are prone to over-fitting. In addition, unlike a single-frame image, a pedestrian image sequence contains not only spatial information but also temporal information. These factors limit the accuracy of video-based person re-ID. Accordingly, the research in this thesis comprises the following two points:

(1) To address the over-fitting of deep models trained on small datasets, this thesis designs a pedestrian data augmentation method based on pose-guided image-sequence generation. The method uses a progressive pose-transfer generative adversarial network to generate pedestrian image sequences. The generated dataset composed of these sequences is used together with the original dataset to train the video-based person re-ID model, and a label smoothing regularization strategy is adopted during training. On the two public person re-ID datasets PRID2011 and iLIDS-VID, experiments show that, on the baseline person re-ID model, Rank-1 increases by 4.5% and 3.3%, and mAP increases by 2.2% and 1.2%, respectively.

(2) To make better use of the spatial and temporal information in image sequences and obtain more discriminative sequence-level feature representations, this thesis designs a video-based person re-ID model built on channel attention and multi-scale temporal relation reasoning, which combines the channel-attention network SE-ResNet, a temporal attention model based on multi-scale temporal relation reasoning, and the BNNeck structure. To extract the more valuable spatial content in pedestrian images, the method introduces a channel attention mechanism and adopts SE-ResNet as the backbone network for feature extraction. To exploit the temporal information in the image sequence, the method introduces a multi-scale temporal relation reasoning module: a multi-scale temporal attention model that, through temporal relation reasoning, selects the more discriminative frames from the sequence and assigns them larger weights. The network combines the weights produced by this module with the frame-level features extracted by SE-ResNet and obtains the sequence representation by weighted averaging. Training uses cross-entropy loss and triplet loss jointly; to alleviate the inconsistent optimization objectives of the two loss types during training, the BNNeck structure is introduced to improve the convergence of both losses. On the two public person re-ID datasets PRID2011 and iLIDS-VID, experiments show that the Rank-1 accuracy of this method reaches 92.1% and 83.3%, respectively, while the model's computational complexity and parameter count remain moderate, which together demonstrate the effectiveness of the proposed model. Furthermore, when the data augmentation method in (1) is applied to the original data, Rank-1 accuracy further increases by 0.3% and 0.7%, respectively.
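The label smoothing regularization adopted in (1) replaces the one-hot classification target with a mixture of the one-hot vector and a uniform distribution, which discourages over-confident predictions on a small training set. A minimal NumPy sketch, not the thesis's implementation (the function names and the value ε = 0.1 are illustrative assumptions):

```python
import numpy as np

def smoothed_targets(labels, num_classes, eps=0.1):
    """Mix one-hot targets with a uniform distribution (label smoothing)."""
    onehot = np.eye(num_classes)[labels]
    return (1.0 - eps) * onehot + eps / num_classes

def cross_entropy(logits, targets):
    """Cross-entropy between smoothed targets and softmax(logits)."""
    z = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -(targets * log_probs).sum(axis=1).mean()
```

With ε = 0.1 and four identity classes, the true class receives a target of 0.925 and every other class 0.025, instead of the hard 1/0 targets of plain cross-entropy.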
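The aggregation step in (2), combining frame-level features with the weights from the temporal attention module into a single sequence feature by weighted averaging, can be sketched as follows. This is a simplified stand-in: in the thesis the weights come from the multi-scale temporal relation reasoning module, whereas here they are derived from a raw per-frame score vector for illustration:

```python
import numpy as np

def softmax(scores):
    """Normalize raw per-frame scores into attention weights that sum to 1."""
    e = np.exp(scores - scores.max())
    return e / e.sum()

def aggregate_sequence(frame_feats, frame_scores):
    """Weighted average of frame-level features (T x D) into one sequence feature (D,)."""
    w = softmax(frame_scores)                 # (T,) temporal attention weights
    return (w[:, None] * frame_feats).sum(axis=0)
```

Frames judged more discriminative receive larger weights and dominate the average; with equal scores the result reduces to the plain mean of the frame features.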
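The role of BNNeck in (2) is to decouple the two losses: the triplet loss operates on the feature before batch normalization, while the ID (cross-entropy) loss operates on the normalized feature through the classifier. A minimal sketch of that wiring, under illustrative assumptions (random data, toy shapes, no learned BN affine parameters or running statistics):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Per-dimension normalization over the batch (no learned affine, for brevity)."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))   # f_t: sequence features, fed to the triplet loss
feats_bn = batch_norm(feats)       # f_i: normalized features, fed to the ID loss
W = rng.normal(size=(16, 4))       # classifier weights (4 identities, illustrative)
logits = feats_bn @ W              # cross-entropy is computed on these logits
```

Because the triplet loss prefers an unconstrained Euclidean feature space while cross-entropy prefers a normalized one, inserting the BN layer between them lets each loss act on a feature distribution suited to its objective, easing their joint convergence.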