| Multi-object tracking is an important problem in computer vision with a wide range of applications in video surveillance, intelligent driving, and intelligent robotics. Despite great recent progress, multi-object tracking in complex scenes remains a very challenging task. In such scenes, the interpretability, correlation characteristics, and real-time performance of multi-object tracking algorithms are important factors that restrict their application. This work therefore designs a variety of models to address these challenges. The major innovations of this research are as follows: 1) To understand and interpret increasingly deep tracking models, a novel neuroanatomically aligned brain-like tracking model is proposed. This method attempts to resolve the contradiction between tracking accuracy and the mechanism by which the human brain itself processes visual tracking. First, a deep tracking neural network model aligned with the anatomy of the human brain is proposed, bringing it more in line with the anatomical smooth tracking pathway of the cerebral cortex. Second, human cortical response data and behavioral data are analyzed to identify the cortical areas and corresponding activation responses related to the visual tracking task. Finally, to reasonably evaluate the brain-like performance of the proposed model, a novel metric is designed to measure the similarity between deep neural networks and both human cerebral cortex responses and human eye behavior. By deeply exploring the similarities between the tracking performance of the model and the behavior of the cerebral cortex and the human eye, the correlation between the designed model and the human brain in processing visual tracking tasks is shown and explained in terms of model structure, cortical activation response, and human eye behavior. 2) In order to deal with the problems of target appearance, attitude 
change, frequent occlusion, etc. in multi-object tracking, and to address both the incompleteness of the historical trajectory features that traditional methods use when associating current detection results with historical trajectories and the purely local feature extraction of convolution operations, a global attention model covering the spatial-temporal scope is proposed to achieve better association. First, non-local attention layers are embedded in a traditional convolutional neural network to adaptively extract global features across spatial and temporal regions, using the non-local attention mechanism to mitigate partially inaccurate detections and occlusion. Second, an attention association network is proposed to handle sequence association and occlusion during multi-object tracking. When associating current detection results with historical trajectories, the proposed method not only generates similarities between object detections and the observations in each trajectory, but also measures their overall consistency with the pedestrian sequence, alleviating the impact of unreliable sample pairs in the associated historical trajectories. Finally, a data association training approach and a data augmentation strategy are proposed to deal with insufficient data and overfitting during model training. Extensive experiments demonstrate that the proposed non-local attention mechanism and attention association network effectively handle these problems in multi-object tracking and improve performance. 3) To handle the lack of temporal features in the current detection results and the discrepancy between these results and the features of historical sequences, a spatial-temporal mutual representation learning method is proposed so that current candidate detections can be associated well with historical trajectories during object association. First, this paper proposes a spatial-temporal mutual representation 
learning architecture to resolve the feature difference between the spatial features of the current detection results and the spatial-temporal features of the associated historical trajectory sequences. Second, to enhance the mutual learning and discriminative ability of the proposed method, three loss functions are designed: cross loss, modality loss, and similarity loss, which help the detection learning network obtain temporal features. Finally, a prediction-based multi-object tracking paradigm is designed that uses the features learned through spatial-temporal mutual representation learning to alleviate the drift problem of single-target tracking. Experiments on multiple multi-object tracking datasets show that the proposed method achieves strong tracking results in complex tracking scenarios. 4) Because target features are extracted repeatedly when detection and tracking are processed as two separate tasks, this paper proposes an end-to-end training and prediction method that jointly performs detection and data association, which effectively handles the discrepancy between the detection and tracking tasks and better meets the real-time requirements of applications. First, an end-to-end model architecture is designed to jointly handle object detection and online multi-object tracking. Second, to resolve the inconsistency between the output of the target detection submodule and the input of the target association submodule in the end-to-end model, a joint submodule and an appropriate training data generation method are proposed. Finally, a two-stage iterative training method is designed to train the proposed detection submodule and association submodule and to perform online multi-object tracking directly in a fully end-to-end mode. Extensive ablation experiments on the multi-object tracking benchmark dataset show that the proposed method achieves competitive tracking accuracy and operational efficiency relative to many other 
online multi-object tracking methods. Based on the proposed algorithms, a multi-object tracking system was developed and its application was verified in an autonomous driving scenario, indicating that the proposed methods have good practical value in engineering. In summary, from the perspectives of brain-like similarity, the non-local attention mechanism, spatial-temporal mutual representation learning, and the end-to-end model, this paper explores in depth the difficulties and challenges of multi-object tracking and introduces advanced concepts and methods. Based on these ideas, a number of effective tracking methods are designed and better tracking performance is obtained. This paper therefore offers useful inspiration and reference for future research and development in this field. |