
Learning Robust Multimodal Representations For RGBT Object Tracking

Posted on: 2022-09-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y Gao
Full Text: PDF
GTID: 2518306542463814
Subject: Computer technology
Abstract/Summary:
RGBT object tracking is an emerging research topic that combines visible-light (RGB) and thermal infrared (T) video data to track objects. In recent years, object tracking based on a single modality has made important breakthroughs, but such algorithms still perform poorly in complex scenes or under extreme conditions, such as insufficient illumination, bad weather, background clutter, and target occlusion. Thermal infrared imaging can overcome these challenges and compensate for the shortcomings of visible-light imaging, while visible-light images supply the color and detail information lost in thermal infrared imaging; properly exploiting the complementarity of the two modalities is therefore of great significance for improving tracking performance. Researchers have proposed RGBT tracking methods based on traditional visual models such as sparse representation, correlation filtering, and dynamic graphs, but these are weaker than deep-learning-based RGBT tracking methods in both accuracy and speed. Therefore, building on deep learning techniques, this dissertation studies the robust learning of multimodal features. The main work includes the following two aspects:

First, a high-performance RGBT object tracking method based on a deep adaptive fusion network is proposed. Existing deep-learning-based RGBT tracking methods usually fuse the modality features of only a single layer, ignoring the performance gains brought by multi-level feature fusion. In addition, if the features of the two modalities are fused directly without considering modality reliability, excessive noise may be introduced, which hinders robust multimodal fusion and degrades subsequent tracking. To address these problems, this work proposes a high-performance RGBT tracking method based on a deep adaptive fusion network (DAFNet). The method is built as a fusion chain that performs multi-layer feature fusion. On this basis, an adaptive fusion module is proposed that assesses the reliability of each modality's features before fusion and uses the generated modality weights to suppress noise and redundant information, realizing adaptive modality fusion. Because the adaptive fusion operations are simple and efficient, DAFNet runs at near real-time speed. Experiments on the public GTOT and RGBT234 datasets verify the superior tracking accuracy and efficiency of DAFNet.

Second, an RGBT object tracking method based on global context modeling and memory information guidance is proposed. Obtaining robust multimodal fusion features has always been a central concern of RGBT object tracking research. Inspired by the self-attention mechanism, this dissertation proposes a feature fusion module based on global context information. The module mines the global context within each modality's features and combines the global context of both modalities for feature fusion, achieving a robust multimodal feature representation. In addition, as tracking proceeds, challenges such as occlusion and illumination changes often arise, under which the tracker is more likely to drift, and it is difficult to obtain a robust feature representation from the current frame's features alone. Considering the rich temporal information generated during tracking, this work proposes a memory information guidance network. The network uses temporal information to guide the features of the current frame, optimizing the feature representation, reducing the probability of tracker drift, and further improving tracking accuracy. A series of experiments on the GTOT and RGBT234 datasets verifies the effectiveness of the proposed method.
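To make the adaptive fusion idea concrete, the following is a minimal PyTorch-style sketch of a reliability-weighted fusion block. It is an illustrative assumption rather than the thesis's actual DAFNet code: the class name AdaptiveFusion, the pooling-plus-MLP gating head, and all layer sizes are hypothetical, chosen only to show how per-modality weights can suppress the noisier modality before the RGB and thermal features are combined.

# Hypothetical sketch of reliability-weighted RGB/thermal fusion
# (not the thesis's DAFNet implementation).
import torch
import torch.nn as nn


class AdaptiveFusion(nn.Module):
    """Fuses RGB and thermal feature maps with learned modality weights.

    Weights are predicted from globally pooled features, so an unreliable
    modality (e.g., RGB under low illumination) can be down-weighted
    before the two streams are summed.
    """

    def __init__(self, channels: int):
        super().__init__()
        # Small gating head: pooled features of both modalities -> 2 weights.
        self.gate = nn.Sequential(
            nn.Linear(2 * channels, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, 2),
        )

    def forward(self, feat_rgb: torch.Tensor, feat_t: torch.Tensor) -> torch.Tensor:
        # Global average pooling summarizes each modality's feature map.
        pooled = torch.cat(
            [feat_rgb.mean(dim=(2, 3)), feat_t.mean(dim=(2, 3))], dim=1
        )  # (B, 2C)
        w = torch.softmax(self.gate(pooled), dim=1)  # (B, 2) modality reliabilities
        w_rgb = w[:, 0].view(-1, 1, 1, 1)
        w_t = w[:, 1].view(-1, 1, 1, 1)
        # Reliability-weighted sum suppresses the noisier modality.
        return w_rgb * feat_rgb + w_t * feat_t


if __name__ == "__main__":
    fusion = AdaptiveFusion(channels=96)
    rgb = torch.randn(1, 96, 25, 25)
    tir = torch.randn(1, 96, 25, 25)
    print(fusion(rgb, tir).shape)  # torch.Size([1, 96, 25, 25])

In the abstract's fusion-chain setting, one such block per feature level would produce the multi-layer fused features passed on to the tracker head.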
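Similarly, the sketch below illustrates global-context-guided fusion in the spirit of the self-attention-inspired module described above. GCNet-style attention pooling is assumed here; the class name GlobalContextFusion and all layer shapes are illustrative, not the thesis's implementation. Each modality's feature map is pooled into a global context vector, the two contexts are combined, and the shared context is broadcast back into both feature maps before fusion. The memory information guidance network, which additionally conditions the current frame on features from earlier frames, is omitted from this sketch.

# Hypothetical sketch of cross-modal fusion guided by global context
# (attention-pooled context vectors; not the thesis's exact module).
import torch
import torch.nn as nn


class GlobalContextFusion(nn.Module):
    """Mines a global context vector per modality and injects the combined
    context back into both feature maps before summing them."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convs produce spatial attention maps used to pool each context.
        self.att_rgb = nn.Conv2d(channels, 1, kernel_size=1)
        self.att_t = nn.Conv2d(channels, 1, kernel_size=1)
        # Transform applied to the concatenated context before re-injection.
        self.transform = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.LayerNorm([channels, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
        )

    @staticmethod
    def _pool_context(feat: torch.Tensor, att: nn.Module) -> torch.Tensor:
        b, c, h, w = feat.shape
        weights = torch.softmax(att(feat).view(b, 1, h * w), dim=-1)  # (B,1,HW)
        context = torch.bmm(feat.view(b, c, h * w), weights.transpose(1, 2))
        return context.view(b, c, 1, 1)  # (B,C,1,1) global context vector

    def forward(self, feat_rgb: torch.Tensor, feat_t: torch.Tensor) -> torch.Tensor:
        ctx_rgb = self._pool_context(feat_rgb, self.att_rgb)
        ctx_t = self._pool_context(feat_t, self.att_t)
        # Combine both modalities' global context and broadcast it back.
        ctx = self.transform(torch.cat([ctx_rgb, ctx_t], dim=1))
        return (feat_rgb + ctx) + (feat_t + ctx)


if __name__ == "__main__":
    fusion = GlobalContextFusion(channels=96)
    fused = fusion(torch.randn(1, 96, 25, 25), torch.randn(1, 96, 25, 25))
    print(fused.shape)  # torch.Size([1, 96, 25, 25])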
Keywords/Search Tags:RGBT object tracking, Deep learning, Information fusion, Attention mechanism