
Research On Key Technologies Of Person Re-identification In Video Surveillance System

Posted on: 2021-09-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W R Song
GTID: 1488306557962889
Subject: Signal and Information Processing
Abstract
In recent years, motivated by Safe City construction, video surveillance systems have been widely deployed in public places such as city streets, stations, and campuses. However, processing the massive volume of surveillance video manually consumes a great deal of manpower and time. With the rapid development of computer vision and artificial intelligence, research and applications related to intelligent video analysis and processing systems have emerged, and person re-identification is an important part of such systems. The purpose of person re-identification is to match people across non-overlapping cameras at different times and places. Current studies can be divided into two types according to the research object: image-based and video-based. Compared with image data, video data provide more information and are closer to practical applications. Therefore, this thesis focuses on video-based person re-identification. The video-based task remains challenging due to cluttered backgrounds, occlusion, and variations in viewpoint, pose, and illumination, which limits the development of person re-identification research. Thus, by exploring the discriminative information of pedestrians in videos, this thesis aims to build effective spatio-temporal feature extraction algorithms and to improve the accuracy of person re-identification. The work focuses on three key points of video feature research: feature extraction for small-scale datasets, attribute-constrained feature extraction, and global-local feature extraction. The main research contents and innovations are as follows.

(1) To solve the problem of video-based person re-identification on small-scale datasets, a temporal feature extraction algorithm based on the fusion of traditional and deep features is proposed. Considering that some current video datasets contain only a small number of samples, this thesis first combines several low-level features to construct an effective image-level feature for each pedestrian. A two-branch extraction model is then designed on top of the proposed image-level feature. One branch obtains a deep-learned sequence-level feature by constructing a deep feature fusion network. The other branch, exploiting the characteristics of the video itself, uses the frame-wise features to extract the average appearance feature and the motion feature of the sequence; fusing these two yields the hand-crafted sequence-level feature. Finally, considering the respective advantages of the two sequence-level features, the temporal feature of the video is obtained through feature fusion. The proposed algorithm reduces the limitation of hand-crafted features in describing temporal information and the overfitting caused by deep networks. Experimental results demonstrate that the proposed algorithm performs well on small-scale video datasets.
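The following is a minimal sketch of the two-branch fusion described in (1), assuming each frame is already encoded as an image-level feature vector. The specific low-level descriptors, the layout of the deep fusion network, the frame-difference motion statistic, the feature dimensions, and fusion by simple concatenation are illustrative assumptions rather than the exact design used in the thesis.

```python
# Sketch of the two-branch sequence-level feature fusion (assumed details noted above).
import torch
import torch.nn as nn


class DeepFusionBranch(nn.Module):
    """Learns a sequence-level feature from frame-wise image-level features."""
    def __init__(self, in_dim: int, out_dim: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (T, in_dim) image-level features of one sequence
        return self.encoder(frames).mean(dim=0)            # temporal average pooling


def handcrafted_branch(frames: torch.Tensor) -> torch.Tensor:
    """Fuses the average appearance feature with a simple motion feature."""
    appearance = frames.mean(dim=0)                         # average appearance over time
    motion = (frames[1:] - frames[:-1]).abs().mean(dim=0)   # frame-difference statistic (assumed)
    return torch.cat([appearance, motion], dim=0)


def temporal_feature(frames: torch.Tensor, deep_branch: DeepFusionBranch) -> torch.Tensor:
    """Fuses hand-crafted and deep sequence-level features (concatenation assumed)."""
    return torch.cat([handcrafted_branch(frames), deep_branch(frames)], dim=0)


# Usage with made-up sizes: a 20-frame sequence, 512-d image-level features per frame.
frames = torch.randn(20, 512)
feat = temporal_feature(frames, DeepFusionBranch(in_dim=512))
print(feat.shape)  # torch.Size([1280]) = 512 + 512 (hand-crafted) + 256 (deep)
```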
(2) To address the limited representation ability of identity-constrained features, an attribute-assisted feature extraction algorithm is proposed for video-based person re-identification. Deep classification models are commonly used for feature learning; however, pedestrian images are limited by the external environment and the camera resolution, and the learned model may perform poorly if only identity labels are used to train the network. Attributes can serve as auxiliary information for identities. Hence, five general attributes are manually annotated for the three most commonly used video-based person re-identification datasets and are then jointly used to train the attribute-constrained sequence-level feature learning network. Numerous experiments verify the feasibility and effectiveness of the annotated attributes. Furthermore, based on the motion of pedestrians in the video, the annotated attributes are divided into two classes, namely static appearance attributes and dynamic appearance attributes. By modeling the information shared between tasks, the two categories of attributes supervise the image-level feature learning stage and the sequence-level feature learning stage, respectively. Additionally, to enlarge inter-class distances and shorten intra-class distances, multi-attribute and identity classification loss functions including a center loss are designed to train the network. Experiments show the role of the attribute division in improving video-based person re-identification performance and verify the generalization ability of the proposed two-stage feature learning model.
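A minimal sketch of the joint training objective outlined in (2) is given below: identity and multi-attribute classification losses combined with a center loss that shortens intra-class distances. The loss weight `lam`, the single-stage formulation, and the class counts are assumptions for illustration; in the thesis the static and dynamic attribute groups supervise the image-level and sequence-level stages separately.

```python
# Sketch of a joint identity + multi-attribute loss with a center-loss term (assumptions noted above).
import torch
import torch.nn as nn


class CenterLoss(nn.Module):
    """Pulls sequence-level features toward their identity centers."""
    def __init__(self, num_ids: int, feat_dim: int):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_ids, feat_dim))

    def forward(self, feats: torch.Tensor, ids: torch.Tensor) -> torch.Tensor:
        # feats: (B, feat_dim), ids: (B,) identity labels
        return ((feats - self.centers[ids]) ** 2).sum(dim=1).mean()


def joint_loss(feats, id_logits, attr_logits, ids, attrs, center_loss, lam=0.005):
    """Identity cross-entropy + per-attribute cross-entropy + weighted center loss."""
    ce = nn.CrossEntropyLoss()
    loss = ce(id_logits, ids) + lam * center_loss(feats, ids)
    for logits, labels in zip(attr_logits, attrs):   # one classification head per annotated attribute
        loss = loss + ce(logits, labels)
    return loss
```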
(3) To overcome the limited performance gain obtained by using only global-level attributes, a global-local attribute-driven feature extraction algorithm is proposed. Local features of the pedestrian play an important role in feature representation, providing a fine-grained description of pedestrian appearance, and identity-level attributes often describe local appearance. This thesis therefore further exploits the effect of attributes on the person re-identification task. First, each pedestrian in the video datasets is partitioned in two ways: by human body proportion and by human keypoints. The attributes are then classified into two types, global attributes and local attributes, and the classified attributes are used to annotate the corresponding image sequences and region sequences. Finally, a four-stream multi-task network model is proposed to explore global and local spatio-temporal cues from the annotated samples. The model contains a global feature learning module and three local feature learning modules, each of which uses a channel attention-based Convolutional Neural Network and a Long Short-Term Memory network to learn the features of a different region. Compared with the result obtained without attribute division, the proposed algorithm achieves a performance advantage of about 6% and 4% on the PRID2011 and iLIDS-VID datasets, respectively, which shows the reasonableness and validity of the attribute division. Extensive comparison experiments also demonstrate the superiority of the proposed global-local attribute-driven feature extraction algorithm.

(4) To handle the serious occlusions and abundant redundant information in video person re-identification datasets, a global-local spatio-temporal feature extraction algorithm is proposed. Considering the differences between video data and image data, this thesis extends the notion of "local" in the video-based task: it exploits not only regions of a frame but also partial frames of the video. On this basis, a novel model is designed to extract the global-local spatio-temporal feature of the pedestrian in the video. The model first applies a Convolutional Neural Network based on channel attention and a bidirectional Long Short-Term Memory network to obtain holistic temporal features from an overall perspective. Next, the concept of the "key image group" is introduced for the video; this group carries the important spatial information of all frames and can be regarded as part of the video. The extended local feature is obtained by exploring the appearance information and the spatial context information of pedestrians in these partial frames. Finally, since the global and extended local features are complementary and can jointly improve the discriminative power of the final feature, the two features are fused by a late fusion method at the similarity measurement stage. The experimental results demonstrate the importance of the extended local feature in improving the performance of the final algorithm. Moreover, the proposed algorithm achieves Rank-1 accuracies above 91%, 80%, and 81% on the PRID2011, iLIDS-VID, and MARS datasets, respectively, which quantitatively verifies its good performance for video-based person re-identification.
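Below is a minimal sketch of the global temporal branch shared by (3) and (4), i.e., a channel-attention CNN followed by a bidirectional LSTM, together with the late fusion of global and extended-local distances at the similarity measurement stage. The ResNet-18 backbone, the SE-style attention block, the hidden size, and the weighted-sum fusion with weight `alpha` are assumptions; the key image group selection and the local streams are omitted.

```python
# Sketch of a channel-attention CNN + BiLSTM global branch and late distance fusion (assumptions noted above).
import torch
import torch.nn as nn
import torchvision.models as models


class SEBlock(nn.Module):
    """Squeeze-and-excitation style channel attention (one common realisation, assumed here)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                          # x: (N, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))            # squeeze over spatial dims -> (N, C)
        return x * w.unsqueeze(-1).unsqueeze(-1)   # re-weight channels


class GlobalBranch(nn.Module):
    """Channel-attention CNN + bidirectional LSTM + temporal pooling for one sequence."""
    def __init__(self, hidden: int = 256):
        super().__init__()
        backbone = models.resnet18()                                 # randomly initialised backbone
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])    # keep conv feature maps
        self.se = SEBlock(512)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.lstm = nn.LSTM(512, hidden, batch_first=True, bidirectional=True)

    def forward(self, clip):                                   # clip: (T, 3, H, W)
        f = self.pool(self.se(self.cnn(clip))).flatten(1)      # per-frame features (T, 512)
        out, _ = self.lstm(f.unsqueeze(0))                     # (1, T, 2 * hidden)
        return out.mean(dim=1).squeeze(0)                      # sequence-level global feature


def fused_distance(g_probe, g_gallery, l_probe, l_gallery, alpha=0.5):
    """Late fusion of global and extended-local distance matrices (weighted sum assumed)."""
    d_global = torch.cdist(g_probe, g_gallery)
    d_local = torch.cdist(l_probe, l_gallery)
    return alpha * d_global + (1 - alpha) * d_local
```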
Keywords: Person re-identification, video, feature representation, pedestrian attribute, global-local feature, feature aggregation network