| Object tracking, one of the fundamental research problems in computer vision, aims to track an object of interest by learning from the object labeled in the first frame of a video and then outputting its scale and position in the subsequent frames. With the rapid development of remote sensing earth observation technology, videos captured from satellites can efficiently collect rich surface information, which has important practical value in military monitoring, battlefield analysis, urban intelligent monitoring, and other scenarios. In recent years, object tracking methods for natural videos have made great progress, and a large number of tracking algorithms have emerged. However, remote sensing videos often involve small objects, occlusion, similar background interference, and object scale changes, so directly applying existing tracking algorithms designed for natural videos does not yield good tracking performance. In this thesis, a series of studies is carried out on object tracking in remote sensing videos with complex backgrounds. Building on existing object tracking techniques, four deep learning methods are proposed to improve tracking performance in remote sensing videos. The specific theoretical and applied contributions are as follows: (1) To address the occlusion problem in remote sensing object tracking, an object tracking method based on deep reinforcement learning is proposed. The method formulates the tracking task as a Markov decision process, in which the tracking algorithm decides the movement direction and distance of the object in the video frame based on state and action parameters. This framework tracks occluded objects without any additional network structure. The spatio-temporal
context awareness information between consecutive frames in a remote sensing sequence, the object appearance model learned by the network, and the motion vector obtained from the action parameters in reinforcement learning are used to provide occlusion information for determining the direction and distance of the tracked object’s movement, and to improve the robustness and precision of the tracking algorithm. Experimental results on remote sensing videos show that even when the object is completely occluded, the proposed algorithm can still accurately track the object when it reappears, which verifies the effectiveness of the deep reinforcement learning-based tracking method. (2) A Siamese multi-scale adaptive search network framework is proposed for object tracking against complex backgrounds in remote sensing videos. First, multi-scale cross-correlation exploits multiple image features so that they complement one another, yielding a discriminative model and a more comprehensive feature representation. Then, if the network still cannot accurately track the object, an adaptive search module is introduced that combines a Kalman filter with a partitioned search strategy. The Kalman filter re-detects and localizes the object, and the partitioned search strategy helps the Kalman filter select candidate regions more accurately, thereby estimating the object's motion and re-localizing the tracked object. Experimental results on remote sensing datasets containing complex backgrounds such as background clutter, occlusion, and scale changes show that the proposed Siamese multi-scale adaptive search network obtains more accurate object bounding boxes, thereby improving tracking performance. (3) A deep motion network-based tracking method is proposed to address the common problem of small objects in remote sensing videos. Many existing tracking methods are detection-based
tracking algorithms, which treat video sequences as frame-by-frame images and lack information correlation between adjacent frames. Especially for small objects, whose features are more difficult to extract, it is crucial to focus on the motion information of the tracked object between video frames. The deep motion network-based tracking method introduces an optical flow network into a Transformer-based tracking model, using the spatio-temporal awareness information of remote sensing videos to estimate the motion of the tracked object more accurately. Then, relative displacement theory is used to improve the accuracy of the model: the information of the difficult-to-track object is discarded, the motion of a relative object is observed instead, and the motion state of the tracked object is inferred from the positional relationship between the tracked object and the relative object. By combining the optical flow network with relative displacement theory, the proposed method achieves good tracking results. (4) To address the shortage of remote sensing video samples, a remote sensing video object tracking method based on knowledge distillation and attention fusion is proposed. Acquiring high-resolution remote sensing videos is costly, and annotating them manually is very time-consuming. However, deep learning research needs a large amount of annotated data. For this reason, a teacher-student knowledge distillation network model is constructed for remote sensing video object tracking. During training, the teacher model, which learns from a large number of non-remote-sensing videos, guides the student model, which learns from a small number of remote sensing videos. The knowledge learned by the teacher model is transferred to the student model, so that the student model is optimized continuously under this guidance, which in turn enhances the discriminative
precision of the student model. Meanwhile, an attention loss function is designed to improve the robustness and precision of the object tracking algorithm. The experimental results show that the proposed teacher-student knowledge distillation network model makes the predicted bounding box of the tracked object more accurate and improves the precision of remote sensing video object tracking. |
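
The teacher-student guidance described in contribution (4) can be illustrated with a minimal sketch. The abstract does not state the form of the distillation objective, so the code below assumes a common soft-target formulation: the student matches the teacher's temperature-softened score distribution over candidate positions, combined with a hard-label cross-entropy term. The function names, the `temperature` and `alpha` parameters, and the use of plain score lists are all hypothetical, and the attention loss mentioned in the abstract is not reproduced here.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of raw scores."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(student_logits, teacher_logits, target_index,
                      temperature=2.0, alpha=0.5):
    """Assumed combined loss: soft teacher guidance plus a hard-label term."""
    # Soft term: the student matches the teacher's softened distribution.
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    soft_term = (temperature ** 2) * kl_divergence(soft_teacher, soft_student)
    # Hard term: cross-entropy against the annotated target position.
    hard_student = softmax(student_logits)
    hard_term = -math.log(hard_student[target_index] + 1e-12)
    # alpha (hypothetical) balances teacher guidance against the annotation.
    return alpha * soft_term + (1.0 - alpha) * hard_term
```

In this sketch, a student whose scores agree with the teacher incurs no soft-term penalty, while a student that ranks the candidates in the opposite order is penalized by both terms. In an actual tracker, the score lists would come from the teacher and student network outputs rather than hand-written values.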