
Research On Target Tracking Via Multiple-feature Based On Deep Reinforcement Learning

Posted on: 2023-12-17
Degree: Master
Type: Thesis
Country: China
Candidate: Z M Wang
Full Text: PDF
GTID: 2568306782974299
Subject: Computer Science and Technology

Abstract/Summary:
Deep Neural Networks (DNNs) extract non-linear, potentially structured representations of data by building multi-level, multi-neuron, large-scale learning systems. They have driven the rapid development of computer vision and achieved breakthrough results in applications ranging from image recognition to autonomous driving. Given the location and shape of a target object in the previous frame, an object tracking model predicts its location and shape in the current frame, thereby tracking and localizing the target through a video sequence. The combination with DNNs has brought a leap forward in object tracking, from specialized models to general-purpose ones and from online training to completely offline training. In practical applications, however, tracking in hard scenes remains an open challenge in this field. Most existing DNN frameworks concentrate on strengthening the 2D features of objects, which improves tracking performance in general scenes; in hard scenes they struggle, because reliance on a single 2D feature cannot withstand interference from the surrounding environment. Ordinary DNN-based trackers therefore find it difficult to achieve competitive performance in hard scenes. Since latent 3D features, with their geometric invariance, provide a more robust description of the target, we propose a novel offline tracking framework, based on Deep Reinforcement Learning (DRL), that fuses latent 3D information with 2D features to handle object tracking in hard scenes. Moreover, to address the low sample efficiency and unstable performance characteristic of DRL algorithms, we propose a general DRL optimization method that yields a more robust offline-trained tracking model. The main contents are as follows:

1. In hard-scene tracking, conventional frameworks pay excessive attention to strengthening the 2D features of objects, which leads to undesirable performance because such features are sensitive to interference such as occlusion and illumination change. To address this, we propose a novel offline tracking framework, based on DRL, that fuses latent 3D information with 2D features to handle object tracking in hard scenes. First and foremost, inspired by DRL's ability to solve temporal tasks and its strong discrete sampling ability, we develop a deep reinforcement generator (DRG), which is used to construct latent 3D features that distinguish the target from the background. The generated features are made both realistic and discriminative by a joint loss function defined between the generated features and the 2D features, and between the IoU predicted from the generated features and the ground truth. Part of the 2D features is optimized to resemble the generated features, and the different features are concatenated and fed into a fully connected neural network to achieve feature fusion; a minimal sketch of this fusion step is given below. Results on the popular tracking benchmarks OTB-50 and OTB-100 demonstrate the effectiveness and rationality of our framework.
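The fusion step can be pictured with a minimal PyTorch-style sketch. The module name FusionHead, all layer sizes, the MSE form of both loss terms, and the weighting factor alpha are illustrative assumptions; in particular, the thesis's DRG is a reinforcement-learned generator, for which the plain MLP below is only a stand-in.

    import torch
    import torch.nn as nn

    class FusionHead(nn.Module):
        """Illustrative sketch of the described 2D / latent-3D feature fusion.
        Architecture and sizes are assumptions, not the thesis's configuration."""

        def __init__(self, feat_dim=512, latent_dim=128):
            super().__init__()
            # DRG stand-in: maps a 2D appearance feature to a latent 3D-like feature
            self.generator = nn.Sequential(
                nn.Linear(feat_dim, 256), nn.ReLU(),
                nn.Linear(256, latent_dim),
            )
            # learned projection of the 2D feature that is pushed toward the
            # generated feature ("optimizing part of the 2D features")
            self.project = nn.Linear(feat_dim, latent_dim)
            # fusion: concatenated features -> predicted IoU of a candidate box
            self.fusion = nn.Sequential(
                nn.Linear(feat_dim + latent_dim, 256), nn.ReLU(),
                nn.Linear(256, 1),
            )

        def forward(self, feat2d):
            latent3d = self.generator(feat2d)
            proj2d = self.project(feat2d)
            fused = torch.cat([feat2d, latent3d], dim=-1)
            pred_iou = self.fusion(fused).squeeze(-1)
            return latent3d, proj2d, pred_iou

    def joint_loss(latent3d, proj2d, pred_iou, gt_iou, alpha=0.5):
        # realism: generated features stay close to a projection of the 2D features
        # discriminability: predicted IoU is regressed to the ground-truth overlap
        realism = nn.functional.mse_loss(latent3d, proj2d)
        discrim = nn.functional.mse_loss(pred_iou, gt_iou)
        return alpha * realism + (1 - alpha) * discrim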
2. Most DRL algorithms applied in sparse-reward environments (where only a very small portion of the sampled trajectories contain non-zero reward) suffer from low data utilization, long training times, and unstable performance. Asynchronous Advantage Actor-Critic (A3C) builds a parallel DRL framework in which Workers compute in parallel and asynchronously update the parameters of a Learner, enabling efficient exploration of large-scale, complex environments and greatly accelerating training. However, this scheme produces high-variance solutions when the Agent confronts a complex environment, so the Learner may fail to obtain the globally optimal policy. We propose a network compression and knowledge extraction model based on supervised exploration, called Compact Asynchronous Advantage Actor-Critic (compact_A3C). First, the model freezes the Workers of a pre-trained A3C. It then measures the performance of all Workers on common states and maps those performances to probability values with a Softmax. The Learner is updated according to these probabilities, which yields the globally optimal sub-model (Worker) and improves resource utilization. Furthermore, the updated Learner serves as a Teacher Network that supervises a Student Network in the early exploration stage, and a linear decay factor reduces the Teacher Network's guidance over time to encourage free exploration by the Student Network; a sketch of the update and decay rules follows. We verified the generalization ability of the proposed model on the popular Gym Classic Control and Atari 2600 environments. The method is also used to improve the original DRG, significantly reducing training time and enhancing stability, with a performance loss of less than 3%.
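The two compact_A3C mechanisms just described can be sketched in plain NumPy. The softmax temperature, the probability-weighted mixture rule in update_learner, and beta0 are our assumptions; the abstract specifies only the Softmax mapping and a linear decay factor, and selecting the single highest-probability Worker is an equally plausible reading of the update.

    import numpy as np

    def softmax(scores, temp=1.0):
        # map Worker performance scores on common states to probabilities
        z = np.asarray(scores, dtype=np.float64) / temp
        z -= z.max()  # numerical stability
        e = np.exp(z)
        return e / e.sum()

    def update_learner(worker_params, worker_scores):
        """Probability-weighted mixture of frozen Worker parameters.
        The exact update rule is assumed; the abstract only says the
        Learner is updated according to the Softmax probabilities."""
        probs = softmax(worker_scores)
        return sum(p * w for p, w in zip(probs, worker_params))

    def teacher_weight(step, total_steps, beta0=1.0):
        # linear decay factor: strong Teacher guidance early,
        # free Student exploration later
        return beta0 * max(0.0, 1.0 - step / total_steps)

    # Per-step Student objective (schematic):
    #   loss = rl_loss + teacher_weight(step, T) * KL(teacher_policy || student_policy)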
Keywords: object tracking, deep reinforcement learning, hard scene, asynchronous advantage actor-critic, proximal policy optimization