Research On Video Multi-object Tracking Algorithms Based On Deep Learning

Posted on: 2023-01-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J M Yang
Full Text: PDF
GTID: 1528306794960519
Subject: Control Science and Engineering
Abstract/Summary:
Multi-object tracking (MOT), which tracks multiple objects of interest in a video while keeping each object's identity fixed, has long attracted the attention of researchers. It is a key component of more complex computer vision systems used in security, industry, transportation, and the military. Offline MOT can use the entire video, and therefore complete spatiotemporal information, to support tracking, whereas online MOT can build target trajectories only from information that is already known, namely the current and historical frames. Online MOT can meet the real-time requirements of scenarios such as video surveillance and autonomous driving. This dissertation therefore focuses on online MOT algorithms capable of real-time tracking and on the problems encountered in realizing them.

Traditional MOT algorithms rely on manually specifying the objects to be tracked. The tracking-by-detection paradigm instead obtains the positions and categories of all objects in the scene from the per-frame detection results of a given video, and then determines which objects the algorithm should follow. Since deep learning was applied to object detection with outstanding results, researchers have concentrated on the tracking-by-detection paradigm and designed a series of video multi-object trackers with excellent tracking performance. However, in video scenes with severe occlusion or a large number of targets, current detection algorithms cannot completely and accurately identify and locate the targets of interest, and the detection results contain many missed and false detections. The main challenge for tracking-by-detection MOT is therefore to overcome the adverse effects of noise in the detection results and accurately recover the full trajectory of each target. Through in-depth study of the tracking-by-detection paradigm in online video MOT, this dissertation explores solutions to problems such as motion position prediction, bounding box refinement, appearance feature modeling, track management, and the compatibility of the tracking modules with one another and their suitability for the MOT task. The main work is summarized as follows:

(1) Online pedestrian multi-object tracking with prediction refinement and track classification. The classification and handling of severely occluded targets of different occlusion types is studied in depth, with the aim of reducing the association errors caused by inaccurate detection and prediction, the identity switches caused by overlapping targets, and the false negatives caused by severe occlusion. Specifically, the method first uses a motion model that combines a Kalman filter with the Enhanced Correlation Coefficient (ECC) method to improve the accuracy of position prediction. Second, a bounding box regression network is used to refine the target position after motion prediction with detection attributes, thereby improving localization accuracy. Third, severely occluded objects are classified and each class is handled differently. Fourth, a simple greedy matching algorithm associates the tracked targets with the detection responses. Finally, pedestrian re-identification is used to restore the identity of a lost target when it re-enters the scene, thereby improving online multi-object tracking performance.

(2) Online multi-object tracking using multi-function integration and tracking simulation training. To make the modules of the multi-object tracking framework more compatible with one another, the algorithm integrates bounding box refinement and object appearance feature extraction into a single network model and uses separate downstream branches to realize the two functions. This method also uses the combined Kalman filter and ECC motion model to improve prediction accuracy. To improve the adaptability of each network module to the tracking task itself, this study proposes a tracking simulation training method. It simulates the online multi-object tracking process during training, uses the positions predicted by the motion model to expand the training data, and trains the appearance feature extraction module with a metric loss that exploits the historical appearance features of the targets, so that the network weights can be optimized end to end.

(3) Transformer-based two-source motion model for multi-object tracking. In complex video scenes with many non-linear motions such as turning, acceleration, and deceleration, simple linear motion models often perform poorly. This work therefore proposes building the motion model on the Transformer architecture, which has demonstrated excellent sequence processing capability in text translation and speech recognition, so that the model can learn how the target position changes and predict the position in subsequent frames. The historical position differences of the target are used to extract the non-rigid position change caused by the motion of the target itself, and the affine vector between consecutive video frames, extracted by the ECC method, provides the rigid position change. These two types of information are each expanded by a fully connected layer and then fed into the network model to predict the current position of the target, thereby improving prediction accuracy. The motion model can also be easily deployed in other tracking frameworks to improve their tracking performance.

The proposed video multi-object tracking algorithms are evaluated on several widely used public datasets and compared quantitatively with other strong tracking algorithms. The experimental results show that the proposed methods effectively alleviate the key problems described above and improve the overall performance of deep-learning-based video multi-object tracking.
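
The camera-motion compensation and greedy association steps named in item (1) can be illustrated with a minimal sketch. It is not the dissertation's implementation: the abstract does not state the affinity measure, the threshold, or the box format, so the use of intersection-over-union (IoU), the `min_iou` value of 0.3, the `(x1, y1, x2, y2)` box layout, and the function names `iou`, `warp_box`, and `greedy_match` are all assumptions made for illustration. The 2x3 affine matrix is taken as given here; in practice it might come from an ECC estimate such as OpenCV's `cv2.findTransformECC`.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def warp_box(box, affine):
    """Move a box by a 2x3 affine matrix (assumed to be supplied, e.g. by ECC).

    The four corners are transformed and the axis-aligned hull is returned.
    """
    (a, b, tx), (c, d, ty) = affine
    x1, y1, x2, y2 = box
    corners = [(x1, y1), (x2, y1), (x1, y2), (x2, y2)]
    xs = [a * x + b * y + tx for x, y in corners]
    ys = [c * x + d * y + ty for x, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))


def greedy_match(tracks, detections, min_iou=0.3):
    """Greedily pair track boxes with detection boxes, best IoU first.

    Returns a list of (track_index, detection_index) pairs. Each track and
    each detection is used at most once; pairs below min_iou are rejected.
    """
    candidates = sorted(
        ((iou(t, d), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(detections)),
        reverse=True,
    )
    matches, used_tracks, used_dets = [], set(), set()
    for score, ti, di in candidates:
        if score < min_iou:
            break  # every remaining candidate scores lower still
        if ti in used_tracks or di in used_dets:
            continue
        matches.append((ti, di))
        used_tracks.add(ti)
        used_dets.add(di)
    return matches


# Hypothetical data: the camera pans so the image content shifts 5 px right.
camera_shift = ((1.0, 0.0, 5.0), (0.0, 1.0, 0.0))
previous_boxes = [(0, 0, 10, 10), (20, 0, 30, 10)]
predicted = [warp_box(box, camera_shift) for box in previous_boxes]
detections = [(26, 0, 36, 10), (4, 0, 14, 10)]
matches = greedy_match(predicted, detections)
print(sorted(matches))  # [(0, 1), (1, 0)]
```

Greedy matching is cheaper than an optimal assignment but can pick a locally best pair that blocks a better overall pairing, which is one reason accurate position prediction before association matters.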
Keywords/Search Tags:multiple object tracking, deep learning, neural network, motion model, appearance model