| Video object tracking is widely used in video surveillance, autonomous driving, human-computer interaction, battlefield situational awareness, and other fields. However, RGB-based object tracking methods struggle when the target resembles the background or when the target appears too small. Hyperspectral images carry rich spectral information and can finely characterize target spectra, so hyperspectral object tracking methods can address these problems. With the development of artificial intelligence, deep learning has been applied to major computer vision tasks with great success. Deep learning can automatically learn the intrinsic patterns and hierarchical representations of data, and a sufficiently deep network can capture richer semantic information. However, hyperspectral video data are high-dimensional and complex, and using deep learning to effectively mine their semantic information still faces great challenges: 1) Training samples of hyperspectral video data are difficult to obtain, and manual labeling is costly and time-consuming. Only two hyperspectral video datasets are publicly available, and their training samples are limited. 2) Owing to the lack of training samples, deep learning-based hyperspectral video object tracking methods face the “data hungry” problem: when hyperspectral video samples are insufficient, their accuracy cannot reach that of traditional machine learning methods, which greatly limits their development. Meanwhile, although hyperspectral target tracking is a video task that analyzes dynamic targets, existing hyperspectral tracking methods run at less than 1 frame per second and therefore cannot reflect the dynamic changes of targets in real time. 3) During 
the tracking process in complex scenes, the target appearance, target background, target spectrum, and other information change dynamically, which degrades existing hyperspectral video object tracking models and causes tracking drift. This paper explores how to train deep network-based hyperspectral video object tracking models under “data hungry” conditions. The main research contents are as follows. (1) The difficulties and challenges of applying deep learning to hyperspectral object tracking are comprehensively analyzed, and the current status and problems of hyperspectral video object tracking methods in China and abroad are summarized. The theories underlying Siamese network tracking methods and correlation filter tracking methods are introduced in detail, and the application potential of these two methods in hyperspectral tracking is analyzed. (2) To address the lack of hyperspectral video datasets and the high cost of manual labeling, a high spectral-spatial-temporal (H³) resolution benchmark dataset and an unsupervised hyperspectral video object tracking method based on cycle consistency (H³Net) are proposed. The H³ dataset covers nine challenging attributes, so the performance of an algorithm can be verified under each of them. H³Net uses cycle consistency to train the video tracking network without labeled training samples, overcoming the high cost of manual labeling, and uses spatial-spectral branches to extract spatial and spectral information separately, improving the model's discriminative ability. (3) To address the “data hungry” problem of deep hyperspectral video object tracking methods, a double Siamese network object tracking method (Siam HYPER) based on RGB-hyperspectral fusion is proposed. Based on an analysis of hyperspectral video data, which contain both spatial and spectral information, the double Siamese network architecture is proposed. A preliminary classification 
result is obtained using an RGB-based Siamese network object tracking module trained on massive RGB data, and this result is then refined using a hyperspectral target-aware Siamese network module. A spectral-spatial cross-attention module is designed to assign different weights to the two Siamese network modules and enhance the information interaction between RGB features and hyperspectral features. The RGB-based Siamese network object tracking module guides the training of the hyperspectral target-aware Siamese network module, and finally the RGB classification information is fused with the hyperspectral classification information, so that a hyperspectral object tracking model based on the deep Siamese network can be trained under “data hungry” conditions. (4) To address the model degradation caused by dynamic changes in target appearance, target background, and target spectrum (the spectral changes being caused by sensor imaging) as the target moves through complex scenes, a deep Siamese network object tracking method with online temporal-spatial-spectral update (HA-Net) is proposed. By solving a positive definite quadratic problem with the conjugate gradient algorithm, the hyperspectral features can be quickly fine-tuned online to adapt to new targets and scenes while reducing the risk of model overfitting. A hyperspectral template update strategy is proposed: the initial hyperspectral template is retained to mitigate template degradation and error accumulation, and a dynamic hyperspectral template is updated adaptively to increase the model's robustness to challenges such as target deformation, background changes, and target spectral changes. (5) A prototype hyperspectral video object tracking system is constructed. By combining the hyperspectral video object tracking methods proposed from multiple perspectives, a prototype hyperspectral video object tracking system for multi-source object situational awareness 
is constructed. This paper builds a deep learning hyperspectral tracking model step by step, from shallow features to semantic features to temporal-spatial-spectral semantic features, and thereby provides a new solution for the development of hyperspectral video object tracking methods that starts from the difficulties faced by deep learning-based hyperspectral object tracking. |
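Item (4) mentions solving a positive definite quadratic problem with the conjugate gradient algorithm. The abstract does not give the actual objective used in HA-Net, so the following is only a minimal, generic sketch of conjugate gradient for minimizing 0.5·xᵀAx − bᵀx with a symmetric positive definite matrix; the names `conjugate_gradient`, `A`, and `b` and the toy 2×2 system are illustrative assumptions, not the author's implementation.

```python
def matvec(A, v):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive definite A.

    Equivalent to minimizing the quadratic 0.5 * x^T A x - b^T x.
    A and b are hypothetical stand-ins for whatever system the
    online update step actually builds.
    """
    n = len(b)
    x = list(x0) if x0 is not None else [0.0] * n
    if max_iter is None:
        max_iter = 10 * n
    # Residual r = b - A x is the negative gradient of the quadratic.
    r = [bi - ci for bi, ci in zip(b, matvec(A, x))]
    p = list(r)  # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break  # converged
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)  # exact line search along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        # New direction is A-conjugate to the previous ones.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Toy symmetric positive definite system (illustrative values only).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)
```

In exact arithmetic, conjugate gradient reaches the minimizer of an n-dimensional positive definite quadratic in at most n iterations, which is one reason it suits fast online updates where only a few iterations can be afforded per frame.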