Video Moving Object Detection And Retrieval Based On Deep Learning

Posted on: 2019-04-09  Degree: Master  Type: Thesis
Country: China  Candidate: T M Yang  Full Text: PDF
GTID: 2428330566495999  Subject: Software engineering
Abstract/Summary:
Detecting and retrieving objects in video is an important task in computer vision. The main difficulty lies in using deep learning algorithms to locate objects in video and to search for the correct object according to query criteria. In recent years, deep learning algorithms have made great progress in recognizing single static images and generating text descriptions, but they do not yet meet the requirements of retrieving targets in video. The goal of this thesis is to detect and retrieve target objects in video. First, a target localization algorithm based on a boundary-probability convolutional neural network model is proposed to identify and locate objects in video. Then, a 3D convolutional neural network based on spatio-temporal feature fusion is used to detect the actions of people in video. Finally, a natural language target search algorithm based on the Gated Recurrent Unit (GRU) is used to complete object retrieval in video. Object detection experiments are carried out on PASCAL VOC, human action detection experiments on UCF-101 and HMDB51, and natural language object retrieval experiments on ReferIt. The results show that the deep-learning-based object detection and retrieval algorithms improve on existing methods to a certain extent.

The work of the thesis is mainly reflected in the following three aspects:
(1) A convolutional neural network model based on the border probabilities of candidate boxes computes, for each of the four sides of a candidate bounding box, a probability over a search region, yielding a candidate box closer to the ground-truth box; this model is combined with the object recognition model through iteration.
(2) Pre-trained spatial and temporal networks are fused at a deep convolutional layer; the combined spatio-temporal model extracts spatio-temporal features, and a 3D convolutional neural network completes action detection for people in video.
(3) A convolutional neural network extracts features of the local object region and the global image in parallel, and a two-layer gated recurrent unit combines these two features with the feature of the natural language statement for natural language object retrieval.
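The retrieval step relies on gated recurrent units to combine region, image, and sentence features. The abstract does not give the thesis's actual network, so the following is only a minimal sketch of a single GRU update in its standard textbook form. The parameter names (`W_z`, `U_z`, `b_z`, and so on) and the helper `zero_params` are hypothetical and chosen for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    # Multiply a matrix (list of rows) by a vector (list of floats).
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def gru_step(x, h, p):
    """One standard GRU update.

    x: input vector, h: previous hidden state,
    p: dict of hypothetical parameters W_*, U_*, b_* for gates z, r, h.
    """
    def affine(gate, h_in):
        wx = matvec(p["W_" + gate], x)
        uh = matvec(p["U_" + gate], h_in)
        return [a + b + c for a, b, c in zip(wx, uh, p["b_" + gate])]

    z = [sigmoid(v) for v in affine("z", h)]        # update gate
    r = [sigmoid(v) for v in affine("r", h)]        # reset gate
    gated_h = [ri * hi for ri, hi in zip(r, h)]     # reset applied to old state
    cand = [math.tanh(v) for v in affine("h", gated_h)]  # candidate state
    # One common convention; some libraries swap the roles of z and (1 - z).
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def zero_params(n_in, n_hidden):
    # Hypothetical helper: all-zero parameters, useful only for checking shapes.
    p = {}
    for gate in ("z", "r", "h"):
        p["W_" + gate] = [[0.0] * n_in for _ in range(n_hidden)]
        p["U_" + gate] = [[0.0] * n_hidden for _ in range(n_hidden)]
        p["b_" + gate] = [0.0] * n_hidden
    return p
```

With all-zero parameters both gates evaluate to 0.5 and the candidate state is zero, so the update simply halves the previous hidden state. A two-layer variant, as mentioned in the abstract, would feed the hidden state of one such unit as the input of a second.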
Keywords/Search Tags: Deep Learning, Object Detection, Action Detection, Object Retrieval, Gated Recurrent Unit (GRU), Natural Language