Single-Person Target Tracking Based On Multi-Scale Feature Fusion

Posted on: 2022-03-25 | Degree: Master | Type: Thesis
Country: China | Candidate: J Ren | Full Text: PDF
GTID: 2518306557967499 | Subject: Computer technology
Abstract/Summary:
Given the size and position of a target in the initial frame of a video sequence, single-target tracking predicts the target's size and position in subsequent frames. Although deep learning has developed rapidly in the field of single-target tracking in recent years, the many challenging factors in real scenes mean that there is still no tracking model that can accurately track a single pedestrian. This thesis addresses single-person target tracking based on multi-scale feature fusion, with a particular focus on improving single-pedestrian detection and tracking in video.

The thesis aims to design an accurate single-person target tracking model based on deep learning. First, the accuracy of the pedestrian detection model is improved by incorporating semantic features, laying the foundation for single-person target tracking. Second, a single-person target tracking model based on multi-scale feature fusion and dilated convolution is designed. Finally, a real-time single-person target tracking model based on a self-attention mechanism and an IoU loss function is designed. The main work of this thesis is as follows:

(1) Semantic features are incorporated into an existing detection model to address the problem of some hard objects being falsely detected as pedestrians. Because the target occupies only a small number of pixels in the image, ROI Pooling is used to combine high-level and low-level features. Experiments on the Caltech dataset show that the improved model effectively addresses the problem of pedestrian misdetection.

(2) Features with a larger receptive field, obtained with dilated convolution, are fused with features obtained by standard convolution to yield richer features. The features output by the low-level and high-level layers of the network are then merged to capture both detailed information, such as image texture, and high-level semantic information. Finally, an existing single-target tracking model is used to generate predictions for the image sequence; because its tracking results are accurate for some images, this saves considerable time and effort in building a dataset. The improved model is tested on the dataset, and the experimental results show that it achieves higher tracking accuracy than the original model.

(3) Multi-layer feature fusion is used to improve the model so that it extracts more feature information from the image, and a self-attention mechanism is used to capture dependencies across multiple frames, addressing the loss of tracking quality caused by relying only on the previous frame. Finally, an IoU loss is added to the original loss function so that the predicted box coincides more closely with the ground-truth box, improving the performance of the tracking model. Experiments on the dataset show that the improved model obtains richer features, which increases its robustness.
Keywords/Search Tags:Object Detection, Object Tracking, Dilated Convolution, Multi-scale Feature Fusion, Self-attention Mechanism
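
The idea in item (3) of adding an IoU term to an existing loss can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not give the exact form of the IoU loss, so the `1 - IoU` form, the corner box format `(x1, y1, x2, y2)`, the `weight` parameter, and all function names below are assumptions made for illustration.

```python
from typing import Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def box_area(box: Box) -> float:
    """Area of a box; degenerate boxes are clamped to zero area."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(pred: Box, target: Box) -> float:
    """Intersection over union of a predicted box and a ground-truth box."""
    # Corners of the overlapping rectangle (if any).
    ix1 = max(pred[0], target[0])
    iy1 = max(pred[1], target[1])
    ix2 = min(pred[2], target[2])
    iy2 = min(pred[3], target[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = box_area(pred) + box_area(target) - inter
    return inter / union if union > 0.0 else 0.0


def iou_loss(pred: Box, target: Box) -> float:
    """Assumed loss form: 0 for a perfect match, 1 for no overlap."""
    return 1.0 - iou(pred, target)


def total_loss(base_loss: float, pred: Box, target: Box,
               weight: float = 1.0) -> float:
    """Hypothetical combination: original loss plus a weighted IoU term."""
    return base_loss + weight * iou_loss(pred, target)


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)
    ground_truth = (1.0, 1.0, 3.0, 3.0)
    print(iou(predicted, ground_truth))
    print(total_loss(0.5, predicted, ground_truth))
```

Under this assumed form, the added term shrinks as the predicted box overlaps the ground-truth box more closely, which matches the stated goal of making the two boxes coincide. Other forms, such as a negative log of the IoU, would serve the same purpose.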