
Research On Occluded Person Re-Identification Based On Vision Transformer

Posted on: 2024-02-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y B Zhao
GTID: 2568307136996239
Subject: Control engineering
Abstract/Summary:
With the rapid development of information technology and the economy in China, the number of surveillance cameras, and with it the volume of surveillance video, is growing quickly. This places a heavy workload on public security work: searching footage manually is time-consuming and can lead to missed detections. In recent years, with the development of deep learning, person re-identification, the task of retrieving images of the same pedestrian across multiple cameras, has become a hot topic in computer vision research. In practice, however, differing viewpoints, low image resolution, illumination changes, occlusion, and complex camera environments all introduce uncertainty into the recognition process, and person re-identification remains an unsolved problem.

Most pedestrian images captured by real cameras contain substantial occlusion and background clutter. When person re-identification is carried out in crowded places such as railway stations, shopping malls, and hospitals, occlusion is unavoidable, and effective methods must be found to handle it. This problem is called occluded person re-identification. To address it, this thesis studies the structure of convolutional neural networks and the Vision Transformer (ViT). Compared with existing mainstream methods based on convolutional neural networks or on ViT, the proposed methods achieve excellent performance on Occluded-Duke, the current mainstream public dataset for occluded person re-identification, and on other, smaller occluded person re-identification datasets. The specific research contents are as follows:

(1) This thesis proposes PFTransformer (PFT), a deep learning framework based on the vision transformer. An input image of an occluded person is first split into patches, which are fed into the network. A linear projection of the flattened patches yields a patch feature sequence, and a newly designed module then performs image enhancement on this sequence. After a class token and positional encodings are added, the sequence is fed into the encoder of the vision transformer. Finally, the feature sequence is passed to the patch fusion and reconstruction module and the space cutting module designed in this work. These modules are designed specifically for occluded person re-identification and further improve the robustness of the patch feature sequence. The resulting features are then grouped and trained with ID loss and triplet loss, which completes the training of the network and improves its overall performance.

(2) This thesis proposes DB-Res HViT, a dual-branch deep learning model based on a hybrid of a convolutional residual network and a vision transformer. In the vision transformer branch, a novel partial patch pre-convolution module performs image enhancement and helps the branch establish global feature relations that also contain local feature information. In the residual hybrid vision transformer branch, a residual mobile vision transformer module processes the input features and extracts local features. Finally, the features extracted by the two branches are combined and some low-frequency information is filtered out to obtain the final pedestrian features, which improves overall performance and makes the extracted features more robust.
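The first steps of contribution (1), splitting the image into patches, linearly projecting the flattened patches, and adding a class token and positional encodings, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the nested-list image layout, and the use of plain Python instead of a deep learning framework are all assumptions made for illustration, and the enhancement, fusion and reconstruction, and space cutting modules are not shown. In a real model the projection weights, class token, and positional encodings would be learned parameters.

```python
def split_into_patches(image, patch):
    """Split an H x W x C image (nested lists) into flattened, non-overlapping patches.

    Patches are returned in row-major order; each has patch * patch * C values.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            flat = [value
                    for row in image[top:top + patch]
                    for pixel in row[left:left + patch]
                    for value in pixel]
            patches.append(flat)
    return patches


def linear_projection(patches, weight, bias):
    """Project each flattened patch to D dimensions; weight is D rows of length P."""
    return [[sum(w * x for w, x in zip(row, p)) + b
             for row, b in zip(weight, bias)]
            for p in patches]


def build_token_sequence(patch_embeddings, class_token, pos_encodings):
    """Prepend the class token, then add one positional encoding per token."""
    tokens = [class_token] + patch_embeddings
    if len(tokens) != len(pos_encodings):
        raise ValueError("need one positional encoding per token")
    return [[t + p for t, p in zip(token, pos)]
            for token, pos in zip(tokens, pos_encodings)]


if __name__ == "__main__":
    # Hypothetical 4 x 4 single-channel image and 2 x 2 patches (4 patches).
    image = [[[float(r * 4 + c)] for c in range(4)] for r in range(4)]
    patches = split_into_patches(image, 2)
    weight = [[0.25, 0.25, 0.25, 0.25], [1.0, 0.0, 0.0, -1.0]]  # D = 2, P = 4
    embeddings = linear_projection(patches, weight, [0.0, 0.0])
    sequence = build_token_sequence(embeddings, [0.0, 0.0], [[0.1, 0.1]] * 5)
    print(len(sequence), len(sequence[0]))
```

The resulting sequence has one more token than there are patches, which is the form a transformer encoder expects as input.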
Keywords/Search Tags: occluded person re-identification, ViT, ResNet, patch sequence