
Action Recognition Based On Interactions

Posted on: 2021-03-15
Degree: Master
Type: Thesis
Country: China
Candidate: J Xia
Full Text: PDF
GTID: 2518306503480694
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the great success of deep networks on a variety of image tasks, more and more research has focused on the more complex task of video understanding. Action recognition, one of the most important video understanding tasks, aims to localize every person in a video in space and time and to identify their actions. The topic has great value in both academia and industry: action recognition can be widely applied to surveillance cameras, autonomous driving, platform video review and commercialization, human behavior research, and so on.

This thesis studies action recognition based on a variety of interactions in the video, where an interaction refers to the relationship between people and their environment. We observe three types of interactions that help recognize actions: person-person interaction, person-object interaction, and temporal interaction. To model these interactions, we first use deep video networks together with object detection models to extract features of the people and objects that appear in the video. On top of these extracted features, we propose a general interaction module based on the dot-product attention mechanism to model each of the three interactions. To fuse the three types of interactions, we propose a serial reasoning structure in which the interaction blocks are connected in series; as the interaction network deepens, the human action features are continuously strengthened, and the fused model can then capture complex interactions.

Long-term temporal interactions are important but complicated for action recognition, and previous algorithms that model them consume a large amount of computing resources. To address this, we propose a feature pool together with a dynamic read-write algorithm. The feature pool stores the action features of the video over a long period of time; during training, the model uses the dynamic read-write algorithm to read and update the features in the pool. This lets us store features that are temporally far away while avoiding direct convolution over the entire video, yielding a more efficient model with better results.

The proposed model is evaluated on the large-scale Atomic Visual Actions (AVA) dataset, currently the largest and most discriminative dataset of its kind. We set up multiple groups of experiments to verify the computational and accuracy advantages of each proposed module. At the same or even lower computational cost, a single model of ours outperforms other state-of-the-art methods by at least 5 mAP. Our interaction-based action recognition model thus achieves a new state-of-the-art performance.
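To make the interaction module concrete, the following is a minimal sketch of a dot-product attention block with a residual connection, in the spirit of the one described above. The function name `interaction_block` and the shapes are illustrative assumptions, not the thesis implementation; the real model operates on features from a deep video network and a detector.

```python
import numpy as np

def interaction_block(person_feats, context_feats):
    """Dot-product attention sketch: each person feature attends to
    context features (other people, detected objects, or features from
    other time steps), and the attended context is added back as a
    residual, strengthening the action representation."""
    d = person_feats.shape[-1]
    # Scaled dot-product scores between every person and every context entity.
    scores = person_feats @ context_feats.T / np.sqrt(d)
    # Row-wise softmax (numerically stable form).
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Residual connection preserves the original person feature.
    return person_feats + weights @ context_feats

# Illustrative shapes: 2 person boxes, 5 context entities, 8-dim features.
rng = np.random.default_rng(0)
people = rng.standard_normal((2, 8))
objects = rng.standard_normal((5, 8))
out = interaction_block(people, objects)
print(out.shape)  # (2, 8)
```

Because the block keeps the input and output shapes identical, several such blocks (person-person, person-object, temporal) can be chained in series, which is exactly what the serial reasoning structure above requires.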
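The feature pool idea can likewise be sketched as a small data structure. The class name `FeaturePool`, the `window` parameter, and the timestamp keying are assumptions made for illustration; the point is only that long-range features are read from storage instead of being recomputed by convolving over the whole video.

```python
import numpy as np

class FeaturePool:
    """Sketch of a long-term feature pool with dynamic read-write.

    Per-clip person features are stored keyed by timestamp. `write`
    dynamically updates the entry for the current clip during training;
    `read` gathers stored features within a temporal window, providing
    long-range context without re-running the video backbone."""

    def __init__(self, window=30):
        self.window = window   # temporal radius, in clips (assumed unit)
        self.pool = {}         # timestamp -> feature array

    def write(self, t, feats):
        # Overwrite with the freshest features for clip t.
        self.pool[t] = feats

    def read(self, t):
        # Collect neighbors of clip t within the window (excluding t itself).
        keys = sorted(k for k in self.pool
                      if k != t and abs(k - t) <= self.window)
        if not keys:
            return None
        return np.concatenate([self.pool[k] for k in keys], axis=0)

# Usage: store features for 5 clips, then read long-range context for clip 2.
pool = FeaturePool(window=2)
for t in range(5):
    pool.write(t, np.full((1, 4), float(t)))
context = pool.read(2)
print(context.shape)  # (4, 4): clips 0, 1, 3, 4
```

Reading from the pool is a dictionary lookup plus a concatenation, which is why this scheme avoids the heavy computation of applying convolutions directly across a long video.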
Keywords/Search Tags:Video understanding, action recognition, deep learning, interactions, dot-product attention