
Video Visual Relation Detection And Reasoning Based On 3D Convolution Neural Network

Posted on: 2022-10-02
Degree: Master
Type: Thesis
Country: China
Candidate: W K Shao
GTID: 2518306572950969
Subject: Software engineering
Abstract/Summary:
Video visual relation detection is a significant yet challenging research problem that aims to mine the dynamic visual relationships between objects in videos. As a bridge between dynamic vision and language, visual relation instances in the form of relation triplets <subject, predicate, object>, such as "person-ride-horse" and "dog-toward-person", provide a richer and more comprehensive understanding of the visual content of a video. Much of the existing work has studied visual relation instances in still images, whereas understanding visual relationships in videos has received limited attention; consequently, only a few datasets are currently available in this research field. In this paper, we contribute a new dataset named Vid PDR (Video Predicate Detection and Reasoning) for video visual relation detection, which contains 1000 videos with manually labeled visual relations. In addition, this paper investigates existing video visual relation detection methods. Most of them divide the whole video into short-term video segments, predict visual relations within each segment using hand-designed features, and fuse the segment-level predictions into complete video-level visual relation instances. These features are designed based on human cognition and include boundary information and motion information of the objects in the video. By encoding the spatial position and motion information between objects, feature vectors are obtained and used to predict the short-term visual relations. Building on the existing methods, this paper proposes a new method that consists of object tracklet proposal, short-term relation prediction, and greedy relational association. First, we use a state-of-the-art object detection method to ensure the accuracy of object trajectory detection. Second, we extract spatio-temporal features with a 3D Convolutional Neural Network to help predict short-term relations between pairs of objects. Finally, we greedily associate the short-term relation segments into complete relation instances. We conducted extensive experiments on several public datasets with different emphases to demonstrate the effectiveness of the spatio-temporal features extracted by the 3D Convolutional Neural Network. The results verify that our method achieves an improvement over other methods on the task of video visual relation detection.
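
The final step, greedy relational association, can be illustrated with a minimal sketch. It is not the thesis implementation: the `Segment` fields, the track-id pairing, the overlap test, and the mean-score rule are all assumptions made for illustration. The sketch takes the highest-scoring short-term prediction as a seed and absorbs every remaining prediction with the same triplet and the same object tracklets whose frame range overlaps or touches the growing instance.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # All field names are hypothetical, chosen for this illustration only.
    triplet: tuple  # (subject, predicate, object) labels
    tracks: tuple   # (subject_track_id, object_track_id)
    start: int      # first frame, inclusive
    end: int        # last frame, exclusive
    score: float    # confidence of the short-term prediction


def greedy_associate(segments):
    """Merge short-term relation segments into longer relation instances."""
    # Visit predictions from most to least confident.
    pending = sorted(segments, key=lambda s: s.score, reverse=True)
    used = [False] * len(pending)
    results = []
    for i, seed in enumerate(pending):
        if used[i]:
            continue
        used[i] = True
        start, end, scores = seed.start, seed.end, [seed.score]
        grew = True
        while grew:  # keep absorbing until the instance stops growing
            grew = False
            for j, cand in enumerate(pending):
                if used[j]:
                    continue
                same = cand.triplet == seed.triplet and cand.tracks == seed.tracks
                # Assumed rule: frame ranges must overlap or be adjacent.
                touches = cand.start <= end and cand.end >= start
                if same and touches:
                    used[j] = True
                    start, end = min(start, cand.start), max(end, cand.end)
                    scores.append(cand.score)
                    grew = True
        # Assumed rule: the instance score is the mean of its segment scores.
        mean_score = sum(scores) / len(scores)
        results.append(Segment(seed.triplet, seed.tracks, start, end, mean_score))
    return sorted(results, key=lambda s: s.score, reverse=True)
```

Under these assumptions, three overlapping "person-ride-horse" segments covering frames 0 to 60 collapse into one instance, while a later segment of the same triplet that starts at frame 90 remains a separate instance because its frame range does not touch the first.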
Keywords/Search Tags: computer vision, video content analysis, video annotation, video visual relation detection, 3D convolutional neural network