Font Size: a A A

Fined-grained Action Recognition Using Graph Model

Posted on:2021-05-27Degree:MasterType:Thesis
Country:ChinaCandidate:W LuoFull Text:PDF
GTID:2518306503472574Subject:Electronics and Communications Engineering
Abstract/Summary:PDF Full Text Request
Action recognition in video is always one of the most popular and unresolved issues in the computer vision community.Action recognition can be directly applied not only to intelligent monitoring,intelligent drive,human-computer interaction,etc,but also is the the basis for multiple video research tasks.With the development of deep learning technology,many new algorithms based on deep neural networks have emerged in the field of action recognition.Mainstream approaches can be roughly divided into three families:3D convolution,two-stream,and recurrent neural networks.By relying on these methods,the temporal and spatial characteristics of the video can be effectively exploited and utilized.These models work well in UCF101,HMDB51,Kinetics and other general datasets,but they are not satisfactory on some tasks of fine-grained action recognitions.We believe that these methods focus on the modeling of appearance and motion characteristics,but more or less ignore the interactions between objects in the videos.Therefore,their performances are not good enough.This paper presents a novel framework for modeling and reasoning interactions in video using graph models.The objects in the video and some class-related scene regions are detected first.Their visual features and spatiotemporal position features are extracted as nodes of the graph model,and the interaction between objects is represented as the edge of the model.We tried to use the inner product,bilinear weighting,and multi-layer perceptron to obtain the weights of the edges of the graph model.Then a graph convolution network(GCN[1])is used on this graph,which explores the relation between objects and achieves the fusion of scene information and object information.What's more,we propose weight sharing and relation normalization to improve inner-product-based relation modules,which is widely used for the relative exploiting problem.Experiments have shown that our algorithms work better in fine-grained action recognition than traditional CNN meth-ods because it is good at representing the interaction between objects:the accuracy up to 43.6%at the validation set of EPIC Kitchen for the verb classification,and 47.0%(m AP)at the test set of VLOG,which outperform the-state-of-the-arts.
Keywords/Search Tags:Computer Vision, Action Recognition, Graph Model, Scene Modeling
PDF Full Text Request
Related items