
Video Recognition Based On Deep Visual Representation

Posted on: 2020-11-12
Degree: Master
Type: Thesis
Country: China
Candidate: X S Qiao
GTID: 2428330626453276
Subject: Computer application technology
Abstract/Summary:
Recently, with the rapid development of computer vision, video recognition has become a popular research direction. It underpins applications such as video surveillance, autonomous vehicles, and virtual reality, and has therefore attracted wide attention from researchers. Specifically, action recognition aims to predict the actions performed by people in a video through automatic analysis based on pattern recognition and machine learning algorithms. Most existing work focuses on detection, tracking, or the design of robust features to encode motion information, while ignoring the high-level semantic information among samples. To address this problem, in this paper we propose a unified framework for analysing Spatial-Temporal representation across the Grassmannian manifold and Euclidean space (ST-AGE). ST-AGE constructs a new Spatial-Temporal Representation Volume and then projects this volume onto different spaces to measure similarity and analyse high-level semantic information. The major contributions of our work are summarized as follows:

(1) We design a new video representation named the Spatial-Temporal Representation Volume (STRV), which contains spatial and temporal information simultaneously. Exploiting the representational capability of convolutional neural networks, we construct the spatial part from fully connected layer features while preserving their sequential order. In addition, using improved dense trajectories, we extract motion information within the region of interest to reinforce the temporal part.

(2) We propose analysing the relationships among samples with manifold learning. The STRV is decomposed into two parts: the spatial representation is projected onto the Grassmannian manifold, while the temporal representation is projected onto Euclidean space. The two resulting metrics are then fused linearly into a single kernel, and finally an efficient multi-kernel SVM is used to classify the videos.

(3) We evaluate the performance of ST-AGE on four datasets, namely KTH, HMDB-51, UCF-50, and UCF-101, and compare results under different conditions from multiple aspects.

By modeling the three-dimensional structure of the video and analysing it across multiple spaces, ST-AGE achieves satisfactory performance on all four datasets.
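As an illustration of contribution (1), the STRV can be pictured as a pair of arrays: an ordered sequence of per-frame fully connected features for the spatial part, and a motion descriptor taken from trajectories inside the region of interest for the temporal part. The abstract does not say how the trajectory descriptors are aggregated, so the minimal Python sketch below assumes simple average pooling over trajectories whose mean position falls inside an axis-aligned ROI box. The class `STRV` and the functions `roi_motion_descriptor` and `build_strv` are hypothetical names, and practical improved-dense-trajectory pipelines usually encode descriptors more elaborately (for example with Fisher vectors).

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class STRV:
    """Hypothetical container for a Spatial-Temporal Representation Volume."""
    spatial: np.ndarray   # (T, d): per-frame fully connected CNN features, in frame order
    temporal: np.ndarray  # (k,): motion descriptor pooled from trajectories inside the ROI


def roi_motion_descriptor(traj_points, traj_descs, roi) -> np.ndarray:
    """Average-pool descriptors of trajectories whose mean (x, y) lies inside roi.

    Assumption for illustration only (not the thesis's stated method).
    traj_points: (N, L, 2) tracked positions of N trajectories of length L
    traj_descs:  (N, k) one descriptor per trajectory (e.g. HOF/MBH-like)
    roi:         (x0, y0, x1, y1) box in pixel coordinates
    """
    traj_points = np.asarray(traj_points, dtype=float)
    traj_descs = np.asarray(traj_descs, dtype=float)
    x0, y0, x1, y1 = roi
    centers = traj_points.mean(axis=1)  # (N, 2) mean position of each trajectory
    inside = ((centers[:, 0] >= x0) & (centers[:, 0] <= x1) &
              (centers[:, 1] >= y0) & (centers[:, 1] <= y1))
    if not inside.any():
        return np.zeros(traj_descs.shape[1])  # no motion found inside the ROI
    return traj_descs[inside].mean(axis=0)


def build_strv(frame_feats, traj_points, traj_descs, roi) -> STRV:
    """Combine the ordered spatial sequence and the ROI motion descriptor."""
    spatial = np.asarray(frame_feats, dtype=float)
    if spatial.ndim != 2:
        raise ValueError("frame_feats must be a (T, d) array")
    return STRV(spatial=spatial,
                temporal=roi_motion_descriptor(traj_points, traj_descs, roi))
```

Keeping the spatial part as a (T, d) array rather than a single pooled vector preserves the frame order, which is what later allows the sequence to be treated as a subspace.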
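For the spatial part, one common way to place a frame sequence on the Grassmannian manifold is to take the span of its dominant left singular vectors, and one common similarity on that manifold is the projection kernel k(Y1, Y2) = ||Y1^T Y2||_F^2. The abstract does not name the metric actually used in ST-AGE, so the sketch below is an illustration under that assumption: `grassmann_point`, `projection_kernel`, and `grassmann_gram` are hypothetical helpers, and the subspace dimension `p` is a free parameter.

```python
import numpy as np


def grassmann_point(frame_feats: np.ndarray, p: int) -> np.ndarray:
    """Map a (T, d) sequence of frame features to a point on the Grassmannian G(p, d).

    Returns a (d, p) matrix with orthonormal columns spanning the dominant
    p-dimensional subspace of the frame features.
    """
    X = np.asarray(frame_feats, dtype=float).T       # (d, T): one column per frame
    if not 1 <= p <= min(X.shape):
        raise ValueError("p must satisfy 1 <= p <= min(d, T)")
    U, _, _ = np.linalg.svd(X, full_matrices=False)  # columns sorted by singular value
    return U[:, :p]


def projection_kernel(Y1: np.ndarray, Y2: np.ndarray) -> float:
    """Projection kernel ||Y1^T Y2||_F^2 (sum of squared cosines of the principal angles)."""
    return float(np.linalg.norm(Y1.T @ Y2, "fro") ** 2)


def grassmann_gram(videos, p: int) -> np.ndarray:
    """Pairwise projection-kernel matrix for a list of (T, d) feature sequences."""
    pts = [grassmann_point(v, p) for v in videos]
    n = len(pts)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            K[i, j] = K[j, i] = projection_kernel(pts[i], pts[j])  # kernel is symmetric
    return K
```

Because each point has orthonormal columns, every diagonal entry of the Gram matrix equals p, and the kernel value is unchanged if a basis Y is replaced by Y Q for any orthogonal p×p matrix Q. This is what makes it a function of the subspace rather than of a particular basis.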
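Contribution (2) fuses the Grassmannian and Euclidean metrics linearly into one kernel before SVM classification. The sketch below rests on several assumptions. It uses a Gaussian (RBF) kernel on fixed-length temporal descriptors and applies cosine normalization so that both kernels have a unit diagonal before mixing. It also uses a single fixed weight `w`, whereas a multiple-kernel learning solver would learn the weights (not shown). `rbf_gram`, `normalize_gram`, and `fuse_kernels` are hypothetical names.

```python
import numpy as np


def rbf_gram(A: np.ndarray, B: np.ndarray, gamma: float) -> np.ndarray:
    """Gaussian kernel exp(-gamma * ||a - b||^2) between the rows of A and B."""
    sq = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip small negatives caused by round-off


def normalize_gram(K: np.ndarray) -> np.ndarray:
    """Cosine normalization K_ij / sqrt(K_ii * K_jj), giving a unit diagonal."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)


def fuse_kernels(K_grass: np.ndarray, K_eucl: np.ndarray, w: float) -> np.ndarray:
    """Linear (convex) fusion w * K_grass + (1 - w) * K_eucl with a fixed weight w."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    if K_grass.shape != K_eucl.shape:
        raise ValueError("kernel matrices must have the same shape")
    return w * K_grass + (1.0 - w) * K_eucl
```

The fused training Gram matrix can then be passed to an SVM that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`. Its `fit` method takes the (n_train, n_train) matrix, and its `predict` method takes the (n_test, n_train) cross-kernel between test and training videos. A convex combination of positive semidefinite kernels is itself positive semidefinite, so the fused matrix remains a valid SVM kernel.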
Keywords/Search Tags: Video recognition, Dense trajectory, Deep learning, Grassmannian manifold, Euclidean space