
Research Of Single Object Visual Tracking Based On Natural Language Description

Posted on: 2022-07-16
Degree: Master
Type: Thesis
Country: China
Candidate: S Wu
GTID: 2518306314973189
Subject: Control Science and Engineering
Abstract/Summary:
Computer vision (CV) and natural language processing (NLP) are areas of artificial intelligence that attract much attention and have been applied in face recognition and other technologies. With the development of deep learning in both fields, especially Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN), tasks that combine CV and NLP, such as Video Captioning and Visual Question Answering, provide more possibilities for human-machine interaction. The success of these tasks makes tracking by natural language possible. The main contents of this thesis are as follows.

Single object tracking requires predicting the location of a target specified in the first frame as it undergoes deformation, displacement, occlusion and other changes throughout the video sequence. A natural language annotation, by contrast, can describe either the state of the target in the first frame or the overall movement of the target across the whole video. Moreover, available datasets with language annotations usually describe the overall movement of targets, and it is impractical to annotate every single frame with a sentence. A natural language annotation therefore cannot serve as a global constraint for the single object tracking task. To solve these problems, this thesis proposes an RNN-based feature updating module that uses deep visual features to update the encoded natural language feature, so that the language feature changes as the target moves through the video sequence and assists the tracking algorithm in predicting the target's location.

Deep learning based tracking algorithms usually ignore temporal order in the training phase: they randomly select groups of positive and negative samples and assemble them into batches for training. This training method can also be applied to algorithms that take the language annotation as a global constraint. Using an RNN-based updating module to assist object tracking, however, requires a redesigned training strategy: the video sequence is divided into several segments that serve as batch data, and the first frame of each segment is taken as the input of the updating module, so that the language feature, held as the hidden state of the RNN, is expected to change from segment to segment.

Extensive experiments are carried out on two single object tracking datasets that contain language annotations, LaSOT and Lingual OTB, and the experimental results show that the updating module improves the accuracy of tracking algorithms guided by a language specification.
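The segment-based strategy described in the abstract can be illustrated with a small sketch. It is not the thesis's implementation: the abstract does not state the RNN cell type, the feature dimensions, or the segment length, so the plain Elman-style cell `h' = tanh(W_h h + W_x x)` and every name below (`split_into_segments`, `rnn_update`, `run_over_video`, `segment_len`) are assumptions chosen for illustration. The sketch only shows the data flow: the encoded language feature is held as the hidden state and is updated once per segment from the visual feature of that segment's first frame.

```python
import math
import random


def split_into_segments(num_frames, segment_len):
    """Return ordered (start, end) pairs covering frames 0..num_frames-1 (end exclusive)."""
    return [(s, min(s + segment_len, num_frames))
            for s in range(0, num_frames, segment_len)]


def rnn_update(hidden, visual, w_h, w_x):
    """One Elman-style step (assumed cell): h' = tanh(W_h h + W_x x).

    hidden: language feature used as the hidden state (length H)
    visual: visual feature of one frame (length V)
    w_h: H x H weight matrix, w_x: H x V weight matrix
    """
    new_hidden = []
    for i in range(len(hidden)):
        acc = sum(w_h[i][j] * hidden[j] for j in range(len(hidden)))
        acc += sum(w_x[i][k] * visual[k] for k in range(len(visual)))
        new_hidden.append(math.tanh(acc))
    return new_hidden


def run_over_video(frame_features, lang_feature, w_h, w_x, segment_len):
    """Update the language feature once per segment, from the segment's first frame.

    Returns a list of ((start, end), updated_language_feature), in temporal order.
    """
    hidden = list(lang_feature)
    per_segment = []
    for start, end in split_into_segments(len(frame_features), segment_len):
        hidden = rnn_update(hidden, frame_features[start], w_h, w_x)
        per_segment.append(((start, end), hidden))
    return per_segment


if __name__ == "__main__":
    rng = random.Random(0)
    H, V, T = 3, 2, 10  # hypothetical sizes: language dim, visual dim, frame count
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(H)] for _ in range(H)]
    w_x = [[rng.uniform(-0.5, 0.5) for _ in range(V)] for _ in range(H)]
    frames = [[rng.uniform(-1, 1) for _ in range(V)] for _ in range(T)]
    lang = [0.1, -0.2, 0.3]  # stand-in for an encoded sentence
    for (start, end), feature in run_over_video(frames, lang, w_h, w_x, segment_len=4):
        print(f"frames [{start}, {end}) -> language feature {feature}")
```

Because the hidden state is carried from one segment to the next, the segments have to be processed in temporal order, which is why random sampling of positive and negative pairs does not fit this kind of module. In a real tracker the updated feature for each segment would be passed to the tracking head for all frames in that segment; that part is omitted here.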
Keywords/Search Tags:Deep Learning, Visual Tracking, Natural Language Processing, Updating