
Research Of Saliency Detection And Tracking Algorithm Based On Deep Visual Attention Mechanism

Posted on: 2019-05-13
Degree: Master
Type: Thesis
Country: China
Candidate: T Sun
GTID: 2348330545998798
Subject: Computer technology
Abstract/Summary:
Computer vision draws on machine learning to "teach" computers to imitate human visual perception, so that machines can stand in for human eyes in understanding the world. It has long been an active research direction in computer science because it helps people accomplish many visual tasks. Saliency detection and visual tracking are two classical problems in the field. As the technology has developed, many algorithms have been proposed and used; they solve these problems to some extent, but significant deficiencies remain. On the one hand, multi-modal information is applied in the hope of obtaining a more accurate saliency map, but it is unclear which modality matters more for the result. On the other hand, the complexity of real-world scenes (such as object occlusion, target deformation, and scale change) severely degrades object tracking performance.

The attention mechanism proposed in recent years imitates human visual observation: it temporarily ignores the global information of a scene, selectively focuses on certain parts, and builds up a global understanding from those parts. This makes it possible to filter out useless information quickly and concentrate on the regions of interest. At the same time, with the resurgence of neural networks, computer vision has gradually shifted from combining traditional hand-crafted features such as HOG and SIFT with shallow models to deep learning models, represented by convolutional neural networks (CNNs). Deep learning has powerful feature representation capabilities: it automatically learns low-level and high-level features from the bottom up. These feature hierarchies are arranged in layers and represent different semantic concepts in the real world. This dissertation exploits these characteristics, designs suitable neural networks, and builds new attention mechanisms on top of deep learning, extending the concept of attention with new ideas. The two contributions of this dissertation are as follows:

(1) To produce a more accurate, higher-precision saliency result, we propose a novel attention mechanism that fuses multiple modalities. It adaptively judges the quality of each modality's result and exploits the results accordingly: better modalities receive higher weights, and the weighted fusion of the modalities yields a significant overall improvement. In terms of implementation, the algorithm combines deep learning with reinforcement learning. The model has three stages. First, coarse saliency maps are obtained from the different models using a classical encoder-decoder network. Second, generative adversarial network (GAN) training on top of this lets the generator learn the data distribution of the dataset, which refines the resulting saliency maps. Finally, reinforcement learning assigns different attention weights to the results of the different modalities. To the best of our knowledge, this is the first application of reinforcement learning to the attention weighting problem.

(2) Tracking algorithms based on deep learning have achieved remarkable results. In real scenes, however, the tracked target is often disturbed by deformation, illumination changes, occlusion, or scale change, which greatly limits tracking performance. Moreover, many algorithms lack a restart mechanism for lost targets, so once the target is lost it is difficult to recover it in subsequent frames. To overcome these problems, we propose a novel target-driven attention mechanism that searches for the target globally by training a deep CNN to generate a target attention map for each frame. Generative adversarial networks then further optimize the attention map. Finally, combining the attention map with a tracking algorithm effectively avoids interference from the factors above. Experiments show that the attention map generalizes well and can help tracking algorithms achieve substantial improvements.
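
The weighted fusion described in contribution (1) can be illustrated with a minimal sketch. The abstract does not give the fusion formula, so this assumes the simplest form: per-modality scores are normalized with a softmax into attention weights, and the fused saliency map is the per-pixel weighted sum. In the dissertation the weights come from reinforcement learning; here the scores are plain inputs, and the names `softmax` and `fuse_saliency_maps` are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw per-modality scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_saliency_maps(maps, scores):
    """Fuse same-sized saliency maps by a weighted per-pixel sum.

    maps   -- list of 2D lists (one saliency map per modality)
    scores -- one raw score per modality (assumed given; in the
              dissertation the weighting is learned with
              reinforcement learning, which is not reproduced here)
    """
    weights = softmax(scores)
    height, width = len(maps[0]), len(maps[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for weight, saliency in zip(weights, maps):
        for y in range(height):
            for x in range(width):
                fused[y][x] += weight * saliency[y][x]
    return fused, weights


if __name__ == "__main__":
    # Two toy 1x2 "modalities"; the first is scored as more reliable.
    rgb_map = [[1.0, 0.0]]
    depth_map = [[0.0, 1.0]]
    fused_map, attention = fuse_saliency_maps([rgb_map, depth_map], [2.0, 0.0])
    print(attention, fused_map)
```

Because the weights are non-negative and sum to 1, the fused map stays in the same value range as its inputs, and a modality with a higher score contributes proportionally more to every pixel.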
Keywords/Search Tags:Deep learning, convolutional neural networks, saliency detection, reinforcement learning, attention mechanism, visual tracking