Research Of Single Object Visual Tracking Based On Natural Language Description

Posted on:2022-07-16

Degree:Master

Type:Thesis

Country:China

Candidate:S Wu

GTID:2518306314973189

Subject:Control Science and Engineering

Abstract/Summary:

Computer vision (CV) and natural language processing (NLP) are fields of artificial intelligence that attract much attention and have been applied in face recognition and other technologies. With the development of deep learning, especially Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) in CV and NLP, tasks that combine the two fields, such as video captioning and visual question answering, provide more possibilities for human-machine interaction. The success of these tasks makes tracking by natural language possible. The main contents of this thesis are as follows.

Single object tracking must predict the location of a target throughout a video sequence, given the target in the first frame, while the target undergoes deformation, displacement, occlusion and other changes. A natural language annotation, by contrast, can describe either the state of the target in the first frame or the overall movement of the target across the whole video. Moreover, available datasets with language annotations usually describe the overall movement of targets, and it is impractical to annotate every single frame with a sentence. A natural language annotation therefore cannot serve as a global constraint for single object tracking. To address these problems, this thesis proposes an RNN-based feature updating module that uses deep visual features to update the encoded natural language feature, so that the language feature changes as the target moves through the video sequence and helps the tracking algorithm predict the target's location.

Deep learning based tracking algorithms usually ignore temporal order in the training phase: they randomly select groups of positive and negative samples to form training batches. This training method can also be applied to algorithms that take the language annotation as a global constraint. Using an RNN-based updating module to assist tracking, however, requires a redesigned training strategy: the video sequence is divided into several segments that serve as batch data, and the first frame of each segment is taken as the input of the updating module, so that the language feature is updated as the hidden state of the RNN.

Extensive experiments are carried out on two single object tracking datasets that contain language annotations, LaSOT and Lingual OTB, and the experimental results show that the updating module can improve the accuracy of tracking algorithms that use language specification.
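The segment-based strategy in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The abstract does not state the RNN cell type, the feature dimensions, or the segment length, so the sketch assumes a plain Elman-style cell, and the names `split_into_segments`, `rnn_update`, `run_video`, `w_h`, and `w_x` are all hypothetical. The encoded language feature is used as the initial hidden state and is updated once per segment from the visual feature of that segment's first frame.

```python
import math


def split_into_segments(num_frames, segment_len):
    """Split frame indices 0..num_frames-1 into consecutive, ordered segments."""
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    return [list(range(start, min(start + segment_len, num_frames)))
            for start in range(0, num_frames, segment_len)]


def rnn_update(hidden, visual, w_h, w_x):
    """One Elman-style step (an assumed cell type): h' = tanh(W_h h + W_x x)."""
    new_hidden = []
    for row_h, row_x in zip(w_h, w_x):
        total = sum(a * b for a, b in zip(row_h, hidden))
        total += sum(a * b for a, b in zip(row_x, visual))
        new_hidden.append(math.tanh(total))
    return new_hidden


def run_video(language_feature, visual_features, segment_len, w_h, w_x):
    """Return (frame_index, language_feature) pairs in temporal order.

    The language feature acts as the RNN hidden state. It is updated once
    per segment, using the visual feature of the segment's first frame,
    and is then shared by every frame in that segment.
    """
    hidden = list(language_feature)
    per_frame = []
    for segment in split_into_segments(len(visual_features), segment_len):
        hidden = rnn_update(hidden, visual_features[segment[0]], w_h, w_x)
        per_frame.extend((frame, hidden) for frame in segment)
    return per_frame
```

Unlike random sampling of positive and negative pairs, this keeps frames in temporal order, which an RNN hidden state needs in order to carry information from one segment to the next.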

Keywords/Search Tags:

Deep Learning, Visual Tracking, Natural Language Processing, Updating

Related items

1	Algorithm Study On Object Tracking Via Language And Visual Model
2	Research On Machine Learning For Natural Language Processing And Transmission
3	Research And Application On Method Of Generating SQL Through Natural Language Based On Deep Learning
4	A Research Of Visual Tracking Algorithm Based On Deep Learning
5	Research On Natural Language Generation Techniques In The Large Language Model Era Of Deep Learning
6	Modeling And Learning Of Representations For Natural Language Sentence-level Structures
7	Deep Learning Natural Language Generation System For Scientific Literature Based On Microservices
8	Natural Language Processing Of Ancient Books Of Chinese Traditional Medicine Based On Deep Learning
9	Life Language Processing:Exploration Of Gene Sequence Splicing Method Based On Deep Learning And Natural Language Processing
10	Research On Visual Question Answering Method Based On Deep Learning