Action Recognition Based On Spatiotemporal Convolution Networks

Posted on:2021-05-02

Degree:Master

Type:Thesis

Country:China

Candidate:K Yang

GTID:2518306308970169

Subject:Computer Science and Technology

Abstract/Summary:

Recognizing actions in videos based on temporal information is a challenging problem in computer vision. Most previous methods, such as 3D CNNs (convolutional neural networks) and two-stream CNNs, use only features containing global temporal information as the video representation and ignore the importance of local temporal features. Action recognition based on spatiotemporal convolutional networks can analyze spatial and temporal information simultaneously, and so can effectively understand the intention of the action performer in a video. To address this problem, we propose long and short sequence concerned networks (LSCN), built on a temporal interaction perception module that combines different kinds of temporal information. LSCN uses the interactions between temporal features from different convolution layers to enhance the temporal structure and representation of videos, and it takes into account the temporal information that both long-sequence and short-sequence actions require. This paper also proposes a new data augmentation method, background enhancement. The background enhancement algorithm takes the background information of a video and fuses it with the video frames to generate new data, which resembles video captured by a camera at a long distance. The algorithm broadens the data distribution that a model can learn during training, which helps optimize the parameters and improves the model's generalization ability. Based on the characteristics of real-time video streams, this paper designs an online action recognition system and puts forward solutions to the problems encountered in its practical application. Experimental results show that LSCN, built on 3D ResNeXt-101 and trained with the background enhancement algorithm, generalizes well on two public datasets (UCF101 and HMDB51). Compared with the base network, accuracy improves by 0.5% and 3.1%, respectively.
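
The abstract does not specify how background enhancement estimates the background or fuses it with the frames, so the following is only a minimal sketch of one plausible reading, not the author's implementation. It assumes the background can be approximated by a per-pixel temporal median and that "fusion" means pasting a shrunken copy of each frame onto that background, which makes the actor appear farther from the camera. The function names, the `scale` parameter, and the centre placement are all hypothetical.

```python
import numpy as np


def estimate_background(frames):
    """Crude background estimate: per-pixel median over time (an assumption)."""
    return np.median(frames, axis=0).astype(frames.dtype)


def shrink_nearest(frame, scale):
    """Downscale one H x W x C frame by nearest-neighbour sampling."""
    h, w = frame.shape[:2]
    new_h, new_w = max(1, int(h * scale)), max(1, int(w * scale))
    rows = np.arange(new_h) * h // new_h  # source row for each output row
    cols = np.arange(new_w) * w // new_w  # source column for each output column
    return frame[rows][:, cols]


def background_enhance(frames, scale=0.5):
    """Paste a shrunken copy of every frame onto the estimated background.

    frames: array of shape (T, H, W, C).
    Returns a new clip of the same shape and dtype (hypothetical fusion rule).
    """
    background = estimate_background(frames)
    t, h, w = frames.shape[:3]
    out = np.repeat(background[None], t, axis=0)  # one background copy per frame
    for i, frame in enumerate(frames):
        small = shrink_nearest(frame, scale)
        sh, sw = small.shape[:2]
        top, left = (h - sh) // 2, (w - sw) // 2  # centre placement (assumption)
        out[i, top:top + sh, left:left + sw] = small
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)
    augmented = background_enhance(clip, scale=0.5)
    print(augmented.shape, augmented.dtype)
```

In a training pipeline, a clip augmented this way would be added alongside the original with the same action label. A median background is only reasonable for a mostly static camera, and the thesis may well use a different estimate.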

Keywords/Search Tags:

action classification, action recognition, temporal features, features fusion, background enhancement

Related items

1	Research On Action Recognition Based On Deep Network Learning Of Spatio-temporal Features
2	Research On The Approach Of Human Action Recognition Based On Spatio-temporal Features
3	Research Of Action Recognition From Videos Using Deep Neural Networks
4	Research On Human Skeleton Action Recognition Method Based On Graph Convolutional Network
5	Research On The Approach Of Human Action Recognition Based On Mutil-features Fusion
6	Research On Action Recognition Based On Spatio-temporal Features
7	Research On Some Problems Of Human Action Recognition In Videos
8	Research On Multi-modal Human Action Recognition Based On Features Fusion And Attention Mechanisms
9	A Study Of Human Action Recognition Based On Spatio-temporal Features
10	Research On Temporal Action Detection Based On Neural Network