Video Visual Relation Detection And Reasoning Based On 3D Convolution Neural Network

Posted on:2022-10-02

Degree:Master

Type:Thesis

Country:China

Candidate:W K Shao

Full Text:PDF

GTID:2518306572950969

Subject:Software engineering

Abstract/Summary:

Video visual relation detection is a significant yet challenging research problem that aims to mine the dynamic visual relationships between objects in videos. As a bridge between dynamic vision and language, visual relation instances in the form of relation triplets <subject, predicate, object>, such as "person-ride-horse" and "dog-toward-person", provide a richer and more comprehensive understanding of the visual content of a video. Most existing work has studied visual relation instances in still images, while understanding visual relationships in videos has received limited attention. Consequently, only a few datasets are currently available in this research field. In this paper, we contribute a new dataset named Vid PDR (Video Predicate Detection and Reasoning) for video visual relation detection, which contains 1000 videos with manually labeled visual relations.

In addition, this paper investigates existing video visual relation detection methods. Most of them divide the whole video into short-term video segments, predict visual relations within each segment using hand-designed features, and then fuse the segment-level predictions into complete video-level visual relation instances. These hand-designed features are based on human cognition and include boundary information and motion information of the objects in the video. Encoding the spatial position and motion information between objects yields feature vectors, which are then used to predict the short-term visual relations.

Building on the existing video visual relation detection methods, this paper proposes a new method that consists of object tracklet proposal, short-term relation prediction, and greedy relational association. First, we use a state-of-the-art object detection method to ensure the accuracy of object trajectory detection. Second, we extract spatio-temporal features with a 3D convolutional neural network to help predict short-term relations between pairs of objects. Finally, we greedily associate the short-term relation segments into complete relation instances. We conducted extensive experiments on several public datasets with different emphases to demonstrate the effectiveness of the spatio-temporal features extracted by the 3D convolutional neural network. The results show that our method achieves some improvement over other methods in the task of video visual relation detection.
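
The abstract does not describe how the greedy relational association step is implemented. The following is only a minimal sketch of the general idea: start from the most confident short-term prediction and merge it with overlapping or adjacent predictions that carry the same triplet. The names ShortTermRelation, RelationInstance, and greedy_associate are hypothetical, and matching objects by track id (rather than by trajectory overlap) and averaging the merged scores are simplifying assumptions, not the thesis's method.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ShortTermRelation:
    """One relation predicted on a short video segment (hypothetical structure)."""
    triplet: Tuple[str, str, str]  # (subject, predicate, object)
    subject_track: int             # assumed id of the subject tracklet
    object_track: int              # assumed id of the object tracklet
    start: int                     # first frame, inclusive
    end: int                       # last frame, exclusive
    score: float                   # prediction confidence


@dataclass
class RelationInstance:
    """A video-level relation built from merged short-term predictions."""
    triplet: Tuple[str, str, str]
    subject_track: int
    object_track: int
    start: int
    end: int
    scores: List[float]

    @property
    def score(self) -> float:
        # Assumption: the instance score is the mean of the merged scores.
        return sum(self.scores) / len(self.scores)


def greedy_associate(preds: List[ShortTermRelation]) -> List[RelationInstance]:
    # Visit predictions from most to least confident.
    pending = sorted(preds, key=lambda p: p.score, reverse=True)
    used = [False] * len(pending)
    instances: List[RelationInstance] = []

    for i, seed in enumerate(pending):
        if used[i]:
            continue
        used[i] = True
        inst = RelationInstance(seed.triplet, seed.subject_track,
                                seed.object_track, seed.start, seed.end,
                                [seed.score])
        grown = True
        while grown:  # keep extending until no more segments can be merged
            grown = False
            for j, cand in enumerate(pending):
                if used[j]:
                    continue
                same_relation = (cand.triplet == inst.triplet
                                 and cand.subject_track == inst.subject_track
                                 and cand.object_track == inst.object_track)
                # Segments that overlap or share a boundary frame are merged.
                touches = cand.start <= inst.end and cand.end >= inst.start
                if same_relation and touches:
                    inst.start = min(inst.start, cand.start)
                    inst.end = max(inst.end, cand.end)
                    inst.scores.append(cand.score)
                    used[j] = True
                    grown = True
        instances.append(inst)

    # Rank the final video-level instances by their score.
    instances.sort(key=lambda r: r.score, reverse=True)
    return instances
```

Under these assumptions, three "person-ride-horse" predictions covering frames 0-30, 15-45, and 30-60 on the same pair of tracks would be merged into a single instance spanning frames 0-60, while a prediction with the same triplet on frames 90-120 would remain a separate instance because of the temporal gap.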

Keywords/Search Tags:

computer vision, video content analysis, video annotation, video visual relation detection, 3D convolutional neural network

Related items

1	A Research On Visual Content Analysis Towards Video Mining
2	Research On Video Annotation Technology Based On Multimodality
3	Research On Key Technologies Of Video Retrieval Based On Convolutional Neural Network
4	Fusion And Reasoning Of Video Visual Relation Detection Based On Graph Neural Network
5	Research On Video Classification And Detection With Deep Learning
6	Research On Video Sentiment Content Analysis Method Based On Protagonist And Convolutional Neural Network
7	Video Semantic Annotation Methods And Theoretical Research
8	Research And Realization On Content-Based Video Retrieval
9	Video Content Structure Technology To Achieve
10	Near Duplicate Video Detection Based On Short Video