Feature Representation For Multi-object Tracking Based On Attention Mechanism And Feature Decoupling

Posted on: 2022-06-11 | Degree: Master | Type: Thesis
Country: China | Candidate: J Ma | Full Text: PDF
GTID: 2518306512452304 | Subject: Signal and Information Processing
Abstract/Summary:
Multi-object tracking (MOT) is an important topic in computer vision. Its task is to determine the motion trajectories of all instances in a video sequence. As a fundamental research problem, MOT has been widely applied in autonomous driving, intelligent monitoring, human-computer interaction, and other fields. In recent years, multi-object tracking has advanced considerably thanks to deep learning. However, the task remains challenging because of the variable number of tracked targets, mutual occlusions among targets, interference from complex backgrounds, tracking drift caused by detectors, and so on.

Currently, tracking-by-detection is the mainstream framework for MOT. It consists of three parts: global target detection, design of the affinity model (also called the association model), and inference of the association state. Global target detection detects all targets of interest in the video sequence frame by frame. The affinity model extracts features for each detection response (or trajectory) and measures the similarities between them. Based on these similarities, the inference module solves a global optimization problem for association and generates the motion trajectories of all targets of interest. Within the tracking-by-detection framework, this thesis uses deep learning to study feature representation learning in the affinity model in depth. The main contents are as follows:

(1) An affinity model based on a spatial attention mechanism. Spatial attention is an efficient means of handling mutual occlusions and detector drift. This thesis studies and improves a spatial attention network built on a Siamese architecture. Specifically, to address the shortcoming that the original network ignores the spatial structure information present in each channel, Intersection over Union (IoU) is proposed to replace weighted pooling as the feature fusion strategy. The outputs of the improved model are used to calculate the similarity score of each pair of detection responses, and the Hungarian algorithm is then applied for state association, yielding the trajectories of multiple targets. Experimental results demonstrate that the proposed model improves the accuracy of data association and achieves better multi-target tracking performance.

(2) An affinity model based on a spatial-temporal attention mechanism. In complex scenes, it is hard to guarantee tracking performance by relying solely on spatial attention. In this situation, the dynamic information of the tracked targets in the time domain can be exploited to improve the robustness of the affinity model. This thesis proposes a spatial-temporal attention network that models the spatio-temporal relationships among detection responses. Compared with the spatial attention network of the preceding chapter, it learns more discriminative spatio-temporal features, which strengthens the feature representation. Experiments on the MOT Challenge dataset show the validity of the proposed network model.

(3) Feature representation learning based on decoupling foreground from background. The core of the spatial-temporal attention mechanism is to suppress interference and strengthen effective information. Following this line, and from the perspective of feature decoupling, this thesis attempts to introduce generative adversarial networks and generative representation learning into multi-object tracking. To this end, a suitable network architecture and the corresponding loss functions are designed so that the model decouples the foreground from the background. The foreground features are discriminative across different identities, while the background features correspond to the scene clutter outside the foreground. A self-encoder-decoder framework and a self-attention mechanism are employed in the model. Experimental results show that, compared with several state-of-the-art approaches, the proposed method achieves comparable or superior tracking performance.
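
The association step described in contribution (1), where pairwise similarity scores are turned into a one-to-one matching, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `associate`, the `min_similarity` threshold, and the example scores are all hypothetical, and an exhaustive search over assignments stands in for the Hungarian algorithm (both return an optimal assignment, but exhaustive search is only practical for a handful of targets).

```python
from itertools import permutations


def associate(similarity, min_similarity=0.5):
    """Match tracks to detections so that the total similarity is maximal.

    similarity[i][j] is the affinity score between track i and detection j.
    Exhaustive search is used here as a stand-in for the Hungarian algorithm.
    Pairs scoring below min_similarity (a hypothetical gate) are discarded.
    """
    n_tracks = len(similarity)
    n_dets = len(similarity[0]) if similarity else 0
    if n_tracks == 0 or n_dets == 0:
        return []

    best_pairs, best_total = [], float("-inf")
    if n_tracks <= n_dets:
        # Every track receives a distinct detection.
        candidates = (
            list(zip(range(n_tracks), perm))
            for perm in permutations(range(n_dets), n_tracks)
        )
    else:
        # Every detection receives a distinct track.
        candidates = (
            list(zip(perm, range(n_dets)))
            for perm in permutations(range(n_tracks), n_dets)
        )

    for pairs in candidates:
        total = sum(similarity[t][d] for t, d in pairs)
        if total > best_total:
            best_pairs, best_total = pairs, total

    # Drop weak matches; those tracks and detections stay unassigned.
    return sorted((t, d) for t, d in best_pairs if similarity[t][d] >= min_similarity)


# Two existing tracks, three new detections (made-up scores).
scores = [
    [0.9, 0.2, 0.1],
    [0.3, 0.8, 0.4],
]
print(associate(scores))  # [(0, 0), (1, 1)]
```

Optimizing the total score rather than matching greedily matters when one track's best detection is also the only plausible match for another track; the global assignment resolves such conflicts jointly.
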
Keywords/Search Tags: multi-object tracking, affinity model, attention mechanism, feature decoupling, generative adversarial networks